Clera - Your AI talent agent
Netflix

Machine Learning Scientist (L4/L5) - Audio & Speech for Games

full-time • Los Gatos, Los Angeles • $466k - $750k

Summary

Location

Los Gatos, Los Angeles

Salary

$466k - $750k

Type

full-time

Experience

5-10 years

About this role

At Netflix, our mission is to entertain the world. Together, we are writing the next episode - pushing the boundaries of storytelling, global fandom and making the unimaginable a reality. We are a dream team obsessed with the uncomfortable excitement of discovering what happens when you merge creativity, intuition and cutting-edge technology. Come be a part of what’s next.

The Team

The Studio Media Algorithms team is at the forefront of algorithmic innovation to enhance and support the creation of Netflix’s entertainment content, including games. In this role, you will be embedded within this team while collaborating very closely with a specialized Games Studio R&D team. This incubation-style team is chartered to lead our investments in building new kinds of games leveraging emerging technologies to support our creators and reach player audiences in new ways.

The Role

We are seeking a Machine Learning Scientist to redefine the auditory experience in Netflix games. This role requires an impact-oriented mindset, where you strategically balance the use of existing state-of-the-art tools with bespoke internal research to deliver high-quality, interactive audio at scale.

Responsibilities

  • Strategic Model Integration: Evaluate and integrate open-source and commercial models (e.g., for ASR, TTS, diarization, VAD), balancing quality, latency, and cost to accelerate game development.

  • Model R&D & Adaptation: Lead the fine-tuning and adaptation of speech models (TTS, Voice Cloning, ASR) using parameter-efficient techniques (e.g., LoRA/PEFT) to address production gaps and creative constraints that external/commercial solutions cannot satisfy.

  • Data & Alignment: Lead audio and multimodal data curation, including cleaning, segmentation, labeling, and synthetic data generation to support speech model training and adaptation.

  • Scalable Training: Scale speech model training using distributed data pipelines and training strategies for large datasets.

About You

  • Speech & Audio Research Expertise: Deep foundation in deep learning and modern sequence modeling, with a proven track record of building and training speech/audio pipelines (ASR, TTS, diarization, VAD) at scale using PyTorch.

  • Data-Centric Mindset: Skilled in data cleaning, curation, and the creation of synthetic data for complex evaluation and training pipelines.

  • Pragmatic Innovator: Proven ability to navigate "build vs. buy" decisions, prioritizing production impact and "Netflix-quality" bars.

  • Programming: Expert proficiency in Python; experience with C++ or Java.

  • Creative Mindset: Passionate about using technology to solve creative challenges in game design, such as procedural storytelling or immersive soundscapes.

Bonus Experience

  • Experience with multi-modal audio-visual models.

  • Familiarity with digital signal processing (DSP) as it relates to real-time game audio engines.

Generally, our compensation structure consists solely of an annual salary; we do not have bonuses. You choose each year how much of your compensation you want in salary versus stock options. To determine your personal top of market compensation, we rely on market indicators and consider your specific job family, background, skills, and experience to determine your compensation in the market range. The range for this role is $466,000.00 - $750,000.00.

Netflix provides comprehensive benefits including Health Plans, Mental Health support, a 401(k) Retirement Plan with employer match, Stock Option Program, Disability Programs, Health Savings and Flexible Spending Accounts, Family-forming benefits, and Life and Serious Injury Benefits. We also offer paid leave of absence programs. Full-time hourly employees accrue 35 days annually for paid time off to be used for vacation, holidays, and sick paid time off. Full-time salaried employees are immediately entitled to flexible time off. See more details about our Benefits here.

Netflix is a unique culture and environment. Learn more here.

Inclusion is a Netflix value and we strive to host a meaningful interview experience for all candidates. If you want an accommodation/adjustment for a disability or any other reason during the hiring process, please send a request to your recruiting partner.

We are an equal-opportunity employer and celebrate diversity, recognizing that diversity builds stronger teams. We approach diversity and inclusion seriously and thoughtfully. We do not discriminate on the basis of race, religion, color, ancestry, national origin, caste, sex, sexual orientation, gender, gender identity or expression, age, disability, medical condition, pregnancy, genetic makeup, marital status, or military service.

What you'll do

  • The role involves evaluating and integrating models for audio processing in games, as well as leading the adaptation of speech models to meet production needs. Additionally, the candidate will curate audio data and scale training processes for speech models.

About Netflix

Netflix is one of the world's leading entertainment services, with over 300 million paid memberships in over 190 countries enjoying TV series, films and games across a wide variety of genres and languages. Members can play, pause and resume watching as much as they want, anytime, anywhere, and can change their plans at any time.

Frequently Asked Questions

What does Netflix pay for a Machine Learning Scientist (L4/L5) - Audio & Speech for Games?

Netflix offers a competitive compensation package for the Machine Learning Scientist (L4/L5) - Audio & Speech for Games role. The salary range is USD 466k - 750k per year. Apply through Clera to learn more about the full compensation details.

What does a Machine Learning Scientist (L4/L5) - Audio & Speech for Games do at Netflix?

As a Machine Learning Scientist (L4/L5) - Audio & Speech for Games at Netflix, you will evaluate and integrate models for audio processing in games and lead the adaptation of speech models to meet production needs. You will also curate audio data and scale training processes for speech models.

Is the Machine Learning Scientist (L4/L5) - Audio & Speech for Games position at Netflix remote?

The Machine Learning Scientist (L4/L5) - Audio & Speech for Games position at Netflix is based in Los Gatos, California, United States and Los Angeles, California, United States. Contact the company through Clera for specific work arrangement details.

How do I apply for the Machine Learning Scientist (L4/L5) - Audio & Speech for Games position at Netflix?

You can apply for the Machine Learning Scientist (L4/L5) - Audio & Speech for Games position at Netflix directly through Clera. Click the "Apply Now" button above to start your application. Clera's AI-powered platform will help match your profile with this opportunity and guide you through the application process.
© 2026 Clera Labs, Inc.

This role is hosted on Netflix's careers site.
Join our talent pool first to get notified about similar roles that match your profile.