Clera - Your AI talent agent
Latent

Site Reliability Engineer

On-site • New York, San Francisco • $200k - $300k + competitive equity

Summary

Location: New York, San Francisco
Salary: $200k - $300k
Equity: Competitive
Workplace: On-site
Experience: 7+ years
Visa: Visa sponsorship available (all sponsorships available except net new H1Bs)


About this role

You are the infrastructure expert who enables our rapid product development and guarantees 99.9%+ stability and performance of our clinical AI platform for major health systems. Your focus on operational excellence is directly tied to a patient’s access to life-saving treatment.

As our SRE, you will own our entire production environment and improve the developer experience across both infrastructure and application reliability.

Infrastructure Responsibilities

  • Infrastructure Ownership: Design, implement, and maintain the production environment, drawing on prior experience handling deployments of 500+ machines.

  • Kubernetes Mastery: Own our containerized infrastructure, leveraging deep expertise in Kubernetes and Helm to manage deployment, scaling, and operational health.

  • CI/CD & Deployment Optimization: Optimize and streamline both the TypeScript and Python/ML deployment pipelines to support high-velocity feature release while maintaining the highest reliability.

  • DevX Support: Support Developer Experience (DevX) work to streamline developer workflows, enhance tool proficiency, and improve CI/CD systems.

  • Infrastructure as Code (IaC): Manage and maintain infrastructure definitions using Terraform.

Application Reliability Responsibilities

  • Service Reliability Strategy (SLIs/SLOs): Define, implement, and evolve SLIs and SLOs for existing and new services; partner with engineering and product to align targets with patient and customer impact.

  • Observability Standards: Extend and standardize in-application observability by introducing consistent metrics, OpenTelemetry traces, and events/logs (including naming conventions, required attributes, and dashboards/alerts).

  • SLO-Driven Operations: Monitor reliability and performance against SLOs; respond to SLO violations by driving corrective actions and validating outcomes (not just mitigating symptoms).

  • Performance & Scalability Improvements: Use trace and metrics data to identify bottlenecks (e.g., slow endpoints, expensive queries, queue backlogs) and implement or drive improvements in application code and configurations.

  • Database & Dependency Optimization: Improve reliability and latency through targeted changes such as database indexing, query optimization, connection pooling, caching strategies, and safe dependency timeouts/retries.

  • Incident Learnings to Engineering Outcomes: Lead/participate in incident response and post-incident reviews with a bias toward durable fixes—shipping instrumentation, tests, guardrails, and code changes that prevent recurrence.

The ideal candidate for this role enjoys both working on infrastructure configuration and contributing performance and reliability enhancements to application code. You have likely spent several years as a full-time backend software engineer, but have also contributed heavily to Terraform IaC projects and have deep knowledge of deploying and running applications on Kubernetes.

About Latent

Latent is solving complex challenges in healthcare through robust and reliable Medical Intelligence. Our AI-driven solutions significantly enhance workflows for hospitals and clinics, giving patients faster access to medication while substantially boosting healthcare provider revenues. We are developing a suite of software solutions in conjunction with major healthcare systems to create fast, effective, and affordable access to a provider for every patient in the United States.


Frequently Asked Questions

What does Latent pay for a Site Reliability Engineer?

Latent offers a competitive compensation package for the Site Reliability Engineer role. The salary range is USD 200k - 300k per year, plus competitive equity. Apply through Clera to learn more about the full compensation details.

Is the Site Reliability Engineer position at Latent remote?

The Site Reliability Engineer position at Latent is based in New York, United States and San Francisco, United States and is on-site. Contact the company through Clera for specific work arrangement details.

How do I apply for the Site Reliability Engineer position at Latent?

You can apply for the Site Reliability Engineer position at Latent directly through Clera. Click the "Apply Now" button above to start your application. Clera's AI-powered platform will help match your profile with this opportunity and guide you through the application process.
© 2026 Clera Labs, Inc.