About us Encord is the universal data layer for AI that helps 300+ AI teams train and run models on the right data. Our platform indexes, curates, annotates, and evaluates data across the full AI lifecycle, from developm…
Skills: Site Reliability Engineering, DevOps, Platform Engineering, Kubernetes, GCP
Full-time
Competitive Salary, Commission, Equity, 25 Days Annual Leave, UK Public Holidays, Annual Learning & Development Budget
Posted 3d ago
~40 hrs/week
Responsibilities
Lead the planning and execution of core infrastructure to ensure the platform is performant, reliable, and scalable as data grows to petabyte scale. Responsibilities include managing cloud infrastructure on GCP and AWS, defining SLIs/SLOs, and improving developer productivity through automation.
Requirements
Requires hands-on experience in SRE, DevOps, or platform engineering within a production environment. Candidates must have strong fundamentals in distributed systems, networking, and observability tools.
Full job description
About us
Encord is the universal data layer for AI that helps 300+ AI teams train and run models on the right data. Our platform indexes, curates, annotates, and evaluates data across the full AI lifecycle, from development through production.
Trusted by Woven by Toyota, AXA, UiPath, Zipline, and more. We're an ambitious team of 100+ working at the frontier of AI and have raised $60M in Series C funding from Wellington Management, CRV, Next47 and Y Combinator.
The role
We're looking for a Senior Site Reliability Engineer to join our growing platform engineering team. You'll be embedded in the teams building and operating Encord's core infrastructure, ensuring our platform is performant, reliable, observable, and scalable.
You will lead the planning and execution of the work needed as we grow our customer base from hundreds to thousands of AI teams worldwide, and the volume of AI training and supervision data managed by our platform from TBs to PBs.
You'll drive a culture of performant and resilient software through individual contributions and collaboration with multiple squads.
What You'll Do
Performance & Capacity — Profile and optimise services handling large-scale data pipelines; perform capacity planning for storage and compute-intensive workloads. Work with squads to establish performance benchmarks and expectations.
Collaboration — Partner closely with backend and ML engineers to improve deployment pipelines (CI/CD), review infrastructure changes, and champion reliability best practices.
Reliability & Availability — Define and own SLIs/SLOs/SLAs for critical services; build alerting, runbooks, and incident response processes; lead postmortems with a blameless culture.
Infrastructure & Cloud — Design, deploy, and maintain cloud infrastructure on GCP and AWS; manage Kubernetes clusters, networking, and storage at petabyte scale.
Automation & Tooling — Work to improve developer productivity and guide and review automation and tooling efforts across the engineering group.
Observability — Instrument services with distributed tracing, logging, and metrics (Prometheus, Grafana, OpenTelemetry, Datadog or similar); build infrastructure, define best practices and work with each squad to ensure every service is observable before it goes to production.
What We're Looking For
Hands-on experience in SRE, DevOps, platform engineering, or similar in a production environment.
Strong fundamentals in designing, building, and maintaining resilient distributed and/or high-performance systems.
Solid understanding of networking, operating systems, and database technologies.
Experience with observability fundamentals — metrics, logs, traces, and alerting.
Comfortable with on-call rotations and incident management.
Tech stack
We are technology agnostic at Encord and not looking for experience across all of these — as long as you're open to learning, please apply.
Backend: Python and Rust
Frontend: TypeScript and React
Deployment: Kubernetes
Infrastructure: GCP
Why Encord
Competitive salary, commission, and meaningful equity in a high-growth startup
Strong in-person culture — most of the team works from our London office 4+ days/week
25 days annual leave + UK public holidays
Annual learning & development budget
Travel for customer visits, events, and conferences across the UK and Europe
Encord is the data layer for physical AI. We're how the world's most ambitious AI teams turn messy, multimodal data into production systems - from humanoid robots to autonomous vehicles to smart infrastructure. 300+ teams including Toyota, Skydio, and Maxar rely on Encord to curate, manage, and align the data their models actually need. $110M raised. San Francisco, New York, and London.
Offices: San Francisco, California, US · London, GB
Active Learning, Data Engine, Artificial Intelligence, Computer Vision, Machine Learning, Data Annotation, Image Annotation, Video Annotation, Automated Labeling, Ground Truth Data