Clera - Your AI talent agent
Firmus Technologies

Senior DevOps Engineer, AI & Applications

Full-time • Singapore, Australia

Summary

Location: Singapore, Australia
Type: Full-time
Experience: 5-10 years


About this role

Role Summary

Every AI feature we ship touches thousands of GPUs. The Senior DevOps Engineer will build the release engineering backbone (CI/CD pipelines, automated testing gates, and one-click deployments with instant rollback) that lets Firmus scale fast and responsibly.

You're the bridge between engineering and operations: setting Firmus standards for how code gets to production, mentoring the team on deployment safety, and driving a blameless culture when things go wrong. Ship safely. Ship often. Ship at scale.

Key Responsibilities

  • Design and maintain team-wide CI/CD pipelines (Jenkins, GitHub Actions, ArgoCD, or equivalent) with automated testing gates, artifact management, and deployments aligned with GPU cluster standards.
  • Implement release engineering best practices: repeatable releases, GitOps workflows, automated rollback, and change management procedures.
  • Build and manage test infrastructure: environment provisioning, data seeding, and long-running job validation (especially for distributed training templates and multi-node job submissions).
  • Establish engineering protocols and standards: repo organization, PR templates, code quality gates, dependency scanning, and static analysis.
  • Partner with infra teams to ensure that deployment practices for AI product features meet compliance and security standards for massive GPU clusters.
  • Mentor the team on testing strategies, deployment safety, and incident response procedures.

Skills & Experience

  • 5–7 years of CI/CD engineering, release engineering, or DevOps experience.
  • Deep expertise in GitHub Actions, GitLab CI, ArgoCD, or Jenkins, including multi-stage pipeline design and testing gate implementation.
  • Strong automation scripting (Python, Go, or Bash) for build orchestration and environment templating.
  • Strong Kubernetes fundamentals (hands-on): deep understanding of Pod lifecycle and failure modes (Pending/Running/CrashLoopBackOff/Evicted), Deployments/ReplicaSets, Jobs/CronJobs, Services/Ingress, and how these primitives behave under load and during rollouts.
  • Config & secret management: practical experience designing and operating ConfigMaps and Secrets (including secret rotation patterns), with strong hygiene around least privilege, auditability, and preventing credential leakage into logs/artifacts.
  • Safe rollout patterns: proven experience implementing and operating safe rollout strategies (rolling updates, canary, blue/green), readiness/liveness/startup probes, PodDisruptionBudgets, and rollback procedures, ensuring zero/low-downtime deployments for customer-facing services.
  • Deployment safety & debugging: ability to debug common Kubernetes rollout issues end-to-end (bad probes, misconfigured resources/limits, image pull failures, secret/config drift, node pressure/evictions) and convert learnings into automated CI/CD gates and runbooks.
  • Familiarity with artifact management, versioning strategies, and rollback procedures.
  • Experience integrating testing frameworks into CI pipelines (unit, integration, end-to-end).

Key Competencies

  • Engineering Velocity & Time-to-Release improves quarter-over-quarter while release standards remain consistent (gates, tests, approvals, auditability).
  • Platform Reliability & Customer Trust remains strong: release-related incidents are rare and recovery is fast; reliability targets are met without "surprise outages."
  • Developer Productivity & Team Scale improves: engineers spend less time fighting CI/CD and more time shipping as the team grows.
  • Cost Efficiency & Resource Optimization improves: CI/CD and test infrastructure costs stay controlled (or decrease per unit of output) as usage scales.
  • Knowledge & Culture Multiplier effect is visible: release/reliability practices become the default across the org and repeat incident classes decline.

Success Metrics

  • Engineering Velocity & Time-to-Release improves quarter-over-quarter while release standards remain consistent (gates, tests, approvals, auditability).
  • Platform Reliability & Customer Trust remains strong: release-related incidents are rare and recovery is fast; reliability targets are met without “surprise outages.”
  • Developer Productivity & Team Scale improves: engineers spend less time fighting CI/CD and more time shipping as the team grows.
  • Cost Efficiency & Resource Optimization improves: CI/CD and test infrastructure costs stay controlled (or decrease per unit of output) as usage scales.
  • Knowledge & Culture Multiplier effect is visible: release/reliability practices become the default across the org and repeat incident classes decline.

Location & Reporting

  • Singapore or Australia (Launceston, TAS or Sydney, NSW)
  • Reporting to Head of AI & Applications

Employment Basis

Full-time

Diversity

At Firmus, we are committed to building a diverse and inclusive workplace. We encourage applications from candidates of all backgrounds who are passionate about creating a more sustainable future through innovative engineering solutions.

Join us in our mission to revolutionize the AI industry through sustainable practices and cutting-edge engineering. Apply now to be part of shaping the future of sustainable AI infrastructure.

What you'll do

  • The Senior DevOps Engineer will design and maintain CI/CD pipelines and implement release engineering best practices to ensure safe and efficient code deployment. They will also mentor the team on deployment safety and establish engineering protocols.


Frequently Asked Questions

What does a Senior DevOps Engineer, AI & Applications do at Firmus Technologies?

As a Senior DevOps Engineer, AI & Applications at Firmus Technologies, you will design and maintain CI/CD pipelines and implement release engineering best practices to ensure safe and efficient code deployment. You will also mentor the team on deployment safety and establish engineering protocols.

Is the Senior DevOps Engineer, AI & Applications position at Firmus Technologies remote?

The Senior DevOps Engineer, AI & Applications position at Firmus Technologies is based in Singapore or Australia (Launceston, TAS or Sydney, NSW). Contact the company through Clera for specific work arrangement details.

How do I apply for the Senior DevOps Engineer, AI & Applications position at Firmus Technologies?

You can apply for the Senior DevOps Engineer, AI & Applications position at Firmus Technologies directly through Clera. Click the "Apply Now" button above to start your application. Clera's AI-powered platform will help match your profile with this opportunity and guide you through the application process.
© 2026 Clera Labs, Inc.

This role is hosted on Firmus Technologies' careers site.