ML Inference Optimization Engineer

Full-time • On-site • Seattle • $250k - $450k

Summary

Location

Seattle

Salary

$250k - $450k

Type

full-time

Workplace

On-site

Experience

2+ years

Visa

Visa sponsorship available

About this role

What we're looking for

We are seeking a Machine Learning Engineer with at least 2 years of full-time experience in production ML systems. You should be comfortable working with GPUs and debugging CUDA issues, and you should have a track record of optimizing large models for performance. Bonus points if you have experience with video or audio models and low-level optimization techniques like CUDA kernels.

What you'll do

Take ownership of model performance, profiling and improving inference for latency and throughput using techniques like quantization, pruning, and distillation.

Work closely with the research and infrastructure teams to design and implement efficient data pipelines for video data at petabyte scale.

Apply model acceleration techniques (e.g., TensorRT, ONNX, vLLM) to optimize multimodal models including video diffusion, LLMs, and speech models.

Lead the development of evaluation frameworks to measure model quality and guide continuous improvement.

About Nuance Labs

Nuance Labs is building visual conversational AI that feels human. We are powering new user experiences where users can jump on a video chat with an AI.

Frequently Asked Questions

What does Nuance Labs pay for an ML Inference Optimization Engineer?

Nuance Labs offers a competitive compensation package for the ML Inference Optimization Engineer role. The salary range is USD 250k - 450k per year. Apply through Clera to learn more about the full compensation details.

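What does an ML Inference Optimization Engineer do at Nuance Labs?

The ML Inference Optimization Engineer at Nuance Labs owns model performance: profiling and improving inference latency and throughput, applying model acceleration techniques to multimodal models (video diffusion, LLMs, and speech), building petabyte-scale video data pipelines with the research and infrastructure teams, and leading the development of evaluation frameworks for model quality. The role calls for at least 2 years of full-time experience in production ML systems and comfort working with GPUs and debugging CUDA issues.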

Is the ML Inference Optimization Engineer position at Nuance Labs remote?

The ML Inference Optimization Engineer position at Nuance Labs is based in Seattle, WA, USA and is on-site. Contact the company through Clera for specific work arrangement details.

How do I apply for the ML Inference Optimization Engineer position at Nuance Labs?

You can apply for the ML Inference Optimization Engineer position at Nuance Labs directly through Clera. Click the "Apply Now" button above to start your application. Clera's AI-powered platform will help match your profile with this opportunity and guide you through the application process.
