Site Reliability Engineer

New York +1 · On-site · $200k – $300k + Equity · Visa Sponsorship Available

About this role

You are the infrastructure expert who enables our rapid product development and ensures 99.9%+ stability and performance for our clinical AI platform, which serves major health systems. Your work directly affects patient access to life-saving treatment, so reliability and developer experience matter at every step. You’ll own the production environment and drive reliability improvements across both infrastructure and application code.

What you'll do

Infrastructure Ownership: Design, implement, and maintain the production environment, drawing on experience deploying 500+ machines.
Kubernetes Mastery: Own containerized infrastructure using Kubernetes and Helm to manage deployment, scaling, and operational health.
CI/CD & Deployment Optimization: Optimize TypeScript and Python/ML deployment pipelines for fast, reliable releases.
DevX Support: Improve developer experience and CI/CD systems to streamline workflows.
Infrastructure as Code: Manage infrastructure definitions using Terraform.
Service Reliability Strategy: Define, implement, and evolve SLIs/SLOs aligned with patient and customer impact.
Observability Standards: Extend in-application observability with consistent metrics, OpenTelemetry traces, and events/logs, plus dashboards and alerts.
SLO-Driven Operations: Monitor reliability against SLOs and drive corrective actions when violations occur.
Performance & Scalability Improvements: Use trace and metrics data to identify bottlenecks and drive improvements in code and configurations.
Database & Dependency Optimization: Improve reliability and latency through indexing, query optimization, pooling, caching, and safe timeouts/retries.
Incident Learnings to Engineering Outcomes: Lead or participate in incident response and post-incident reviews, turning findings into durable fixes and guardrails.

What Latent is looking for

7+ years in Site Reliability Engineering positions.
Experience automating single-tenant infrastructure provisioning and optimizing deployment pipelines for application code or ML models.
Extensive time at high-growth startups (no recent Big Tech).
Experience transitioning from backend software development to infrastructure roles.
Proficiency in Python and TypeScript.
Experience with AWS, PostgreSQL, Redis, and Kafka.
Experience in regulated industries, particularly health tech and HIPAA.
Experience with IaC and orchestration tools including Kubernetes, Helm, Terraform, and Terragrunt.
Demonstrated ability to drive systematic improvements across complex, distributed systems with high-availability requirements.
Open-source contributions and LinkedIn recommendations.

Latent develops AI-driven medical intelligence software that optimizes hospital and clinic workflows, speeds medication access for patients, and boosts provider revenues. The company works with major healthcare systems to expand fast, affordable provider access for every patient in the United States.

Industry: HealthTech

Team Size: 51-200

Workspace: On-site

Stage: Series A

Founded: 2022

Locations

New York, United States · San Francisco, United States

Investors

Conviction · General Catalyst · McKesson Ventures · Spark Capital · Transformation Capital · Y Combinator

Website: latenthealth.com

About the Team

Team Distribution

Engineering: 37%
Sales: 35%
Operations: 19%
Leadership: 3%
Product & Design: 3%

Where the Team Studied

1. Cornell University
2. University of California, Berkeley
3. Case Western Reserve University
4. Stanford University
5. Princeton University

Team Worked At

Broad Institute of MIT and Harvard
CVS Health
Blue Shield of California
Hartford HealthCare
Modern Health

Funding History

Series A · Apr 2026: $80M raised

Latent Health, a clinical-AI company focused on medication access, raised $80M in a Series A co-led by Spark Capital and Transformation Capital with participation from Conviction, McKesson Ventures, General Catalyst, and Y Combinator.
