Clera - Your AI talent agent
Firmus Technologies

AI Engineer, AI & Applications

full-time • Singapore, Australia

Summary

Location

Singapore, Australia

Type

full-time

Experience

5-10 years

About this role

Role Summary

The AI Engineer will establish Firmus AI Factory as the foundation for efficient, production-grade distributed training by delivering pre-built training recipes (TorchTitan, Megatron, etc.), evaluation benchmarks, and model guidance. You'll work with customers and internal teams to optimize training efficiency, define baselines, and document best practices. Your templates and benchmarks are the anchor point for our hyperscale customers' training workflows and our model arena differentiator.

Key Responsibilities

  • Build production-ready training recipes using TorchTitan and Megatron-LM: model configs, parallelism strategies (FSDP, tensor/pipeline parallelism), and checkpointing patterns.
  • Document parameter tuning for different scales (e.g., "to train Llama 7B on 8xH100s, use this config and expect X throughput").
  • Create and validate multi-node NCCL communication patterns on AI Factory K8s/Slurm clusters.
  • Design and build benchmarking suites: accuracy, latency, throughput (tokens/sec), cost per token, energy efficiency, and MFU.
  • Implement offline evaluation harnesses for standardized model comparison and leaderboard tracking.
  • Conduct fine-tuning experiments (LoRA, QLoRA) where they improve product outcomes (e.g., ops domain data), and document the gains.
  • Create training efficiency playbooks and publish benchmark results so customers can optimize workloads.
  • Partner with job scheduling and orchestration engineers on template integration, and with other AI engineers and software engineers on model optimization trade-offs for inferencing and AI applications.

Skills & Experience

  • 5–7 years of experience in distributed machine learning (PyTorch/JAX, FSDP, DeepSpeed, multi-node training on 10+ GPUs).
  • Expert-level understanding of GPU optimization: utilization, memory patterns, and communication bottlenecks (NCCL collectives).
  • Hands-on distributed training at scale: debugged convergence issues, profiled bottlenecks, optimized throughput.
  • Strong benchmarking methodology: design controlled experiments, measure noise, and communicate results rigorously.
  • Familiarity with TorchTitan, Megatron-LM, or similar production training frameworks.
  • Understanding of model parallelism strategies and trade-offs (FSDP vs. tensor parallelism vs. pipeline parallelism, etc.).

Key Competencies

  • Distributed Systems Mastery: can explain NCCL, collective communications, and scaling inefficiency.
  • Benchmarking Rigor: doesn't just run benchmarks; validates assumptions, explains variance, communicates uncertainty.
  • Production Thinking: understands checkpointing, recovery, resource constraints, and cost optimization.
  • Mentorship: can guide engineers on training best practices and debugging distributed training issues.
  • Documentation: creates clear, actionable playbooks that customers can follow.

Success Metrics

  • Benchmark credibility and decision impact increase: benchmarks are trusted and used to drive model/hardware/product decisions.
  • Training efficiency leadership: sustained improvement in benchmarked training efficiency on representative workloads.
  • Shorter time-to-validate new models: model candidates can be evaluated quickly and consistently end-to-end.
  • Template effectiveness improves: recipes reduce misconfigurations and repeated setup failures; fewer training config escalations.
  • Competitive differentiation strengthens: model arena outputs influence customer adoption and internal roadmap priorities.

Location & Reporting

  • Singapore or Australia (Launceston, TAS or Sydney, NSW)
  • Reporting to Head of AI & Applications

Employment Basis

Full-time

Diversity

At Firmus, we are committed to building a diverse and inclusive workplace. We encourage applications from candidates of all backgrounds who are passionate about creating a more sustainable future through innovative engineering solutions.

Join us in our mission to revolutionize the AI industry through sustainable practices and cutting-edge engineering. Apply now to be part of shaping the future of sustainable AI infrastructure.

What you'll do

  • The AI Engineer will build production-ready training recipes and benchmarking suites to optimize training efficiency for customers. They will also document best practices and partner with other engineers to enhance model optimization.

Ready to join Firmus Technologies?

Take the next step in your career journey

Frequently Asked Questions

What does an AI Engineer, AI & Applications do at Firmus Technologies?

As an AI Engineer, AI & Applications at Firmus Technologies, you will build production-ready training recipes and benchmarking suites to optimize training efficiency for customers. You will also document best practices and partner with other engineers to enhance model optimization.

Is the AI Engineer, AI & Applications position at Firmus Technologies remote?

The AI Engineer, AI & Applications position at Firmus Technologies is based in Singapore or Australia. Contact the company through Clera for specific work arrangement details.

How do I apply for the AI Engineer, AI & Applications position at Firmus Technologies?

You can apply for the AI Engineer, AI & Applications position at Firmus Technologies directly through Clera. Click the "Apply Now" button above to start your application. Clera's AI-powered platform will help match your profile with this opportunity and guide you through the application process.
© 2026 Clera Labs, Inc.

This role is hosted on Firmus Technologies's careers site.
Join our talent pool first to get notified about similar roles that match your profile.