Site Reliability Engineer 5 - Cloud Platform SRE
full-time, Panama City

Summary

Location

Panama City

Type

full-time

About this role

Netflix is one of the world's leading entertainment services, with over 300 million paid memberships in over 190 countries enjoying TV series, films and games across a wide variety of genres and languages. Members can play, pause and resume watching as much as they want, anytime, anywhere, and can change their plans at any time.

About the role

As a Compute Cloud Platform SRE, you will ensure the reliability, scalability, and operational excellence of Netflix’s compute platforms, including EC2, Titus (our Kubernetes-based container platform), managed batch, and function abstractions. You will do this by building automation and tooling, driving observability and monitoring, and partnering closely with engineering teams across runtime, abstractions, and capacity to reduce toil and complexity for customers. You’ll play a key role in modernizing our stack toward industry-standard technologies and improving launch latency and efficiency. You will also participate in on-call rotations and incident response, and collaborate cross-functionally to deliver integrated and consistent cloud compute experiences at scale.

Responsibilities 

  • Design, implement, and enhance monitoring, alerting, and observability for compute services and abstractions.

  • Partner with engineering teams to modernize and migrate to industry-standard container, orchestration, and capacity technologies.

  • Collaborate on the design and rollout of new managed abstractions (e.g., intent-based APIs, managed scaling, centralized capacity).

  • Lead and participate in post-incident reviews, driving actionable improvements to system reliability.

  • Develop tools and dashboards to provide visibility into utilization, efficiency, and cost for compute resources.

Qualifications

  • 5+ years of experience operating and scaling large-scale, high-performance cloud infrastructure or distributed systems.

  • Deep knowledge of Kubernetes, container runtimes (containerd, Docker), and related cloud-native ecosystem tools.

  • Deep expertise in Linux/Unix systems, networking fundamentals, and cloud platforms (AWS strongly preferred).

  • Proficient in at least one programming language (Go, Python, Rust, Java, etc.).

  • Familiarity with auto scaling, fleet management, and capacity planning at scale.

  • Familiarity with open source observability and telemetry tooling for logs, metrics, and traces, including Prometheus and OpenTelemetry.

  • [Preferred] Experience with cost optimization, utilization analytics, and/or cloud efficiency initiatives.

Inclusion is a Netflix value and we strive to host a meaningful interview experience for all candidates. If you want an accommodation/adjustment for a disability or any other reason during the hiring process, please send a request to your recruiting partner.

We are an equal-opportunity employer and celebrate diversity, recognizing that diversity builds stronger teams. We approach diversity and inclusion seriously and thoughtfully. We do not discriminate on the basis of race, religion, color, ancestry, national origin, caste, sex, sexual orientation, gender, gender identity or expression, age, disability, medical condition, pregnancy, genetic makeup, marital status, or military service.

Job is open for no less than 7 days and will be removed when the position is filled.

Other facts

Tech stack
Kubernetes, Container Runtimes, Linux/Unix Systems, Networking Fundamentals, Cloud Platforms, Programming, Auto Scaling, Fleet Management, Capacity Planning, Observability, Telemetry, Cost Optimization, Utilization Analytics, Cloud Efficiency

About Netflix

Team size: 10,001+ employees
Industry: Entertainment Providers
Founding Year: 1997

What you'll do

  • The role involves ensuring the reliability and operational excellence of Netflix’s compute platforms by building automation and tooling, driving observability, and collaborating with engineering teams. Responsibilities include designing monitoring systems, modernizing technologies, and leading post-incident reviews.

Frequently Asked Questions

What does a Site Reliability Engineer 5 - Cloud Platform SRE do at Netflix?

As a Site Reliability Engineer 5 - Cloud Platform SRE at Netflix, you will ensure the reliability and operational excellence of Netflix’s compute platforms by building automation and tooling, driving observability, and collaborating with engineering teams. Responsibilities include designing monitoring systems, modernizing technologies, and leading post-incident reviews.

Why join Netflix as a Site Reliability Engineer 5 - Cloud Platform SRE?

Netflix is one of the world's leading entertainment services, with over 300 million paid memberships in over 190 countries.

Is the Site Reliability Engineer 5 - Cloud Platform SRE position at Netflix remote?

The Site Reliability Engineer 5 - Cloud Platform SRE position at Netflix is based in Panama City, Panamá Province, Panama. Contact the company through Clera for specific work arrangement details.

How do I apply for the Site Reliability Engineer 5 - Cloud Platform SRE position at Netflix?

You can apply for the Site Reliability Engineer 5 - Cloud Platform SRE position at Netflix directly through Clera. Click the "Apply Now" button above to start your application. Clera's AI-powered platform will help match your profile with this opportunity and guide you through the application process. You can also learn more about Netflix on their website.