
What we're looking for:
We are seeking a Machine Learning Engineer with at least 2 years of full-time experience in production ML systems. You should be comfortable working with GPUs and debugging CUDA issues, and you should have a track record of optimizing large models for performance. Bonus points if you have experience with video or audio models and with low-level optimization techniques such as writing CUDA kernels.
What you'll do:
Collaborate with researchers to operationalize models, moving them from experimental checkpoints to production-ready systems.
Take ownership of model performance, profiling and improving inference for latency and throughput using techniques like quantization, pruning, and distillation.
Work closely with the research and infrastructure teams to design and implement efficient data pipelines for video data at petabyte scale.
Apply model acceleration tools (e.g., TensorRT, ONNX, vLLM) to optimize multimodal models, including video diffusion models, LLMs, and speech models.
Lead the development of evaluation frameworks to measure model quality and guide continuous improvement.
Design and build scalable data pipelines for high-bandwidth media processing and training workflows.