We are seeking a skilled Site Reliability Engineer (SRE) with 3–5 years of experience to join our reliability and infrastructure team. The ideal candidate will have a strong background in systems engineering, cloud platforms, and automation, with a passion for building resilient, scalable, and observable systems. This role involves both hands-on engineering and collaboration with cross-functional teams to improve reliability and developer productivity.
System Reliability & Operations
Ensure high availability and performance of production systems.
Participate in on-call rotations and incident response, driving root cause analysis and postmortems.
Implement monitoring, alerting, and observability solutions to proactively detect issues.
Infrastructure & Automation
Design, build, and maintain CI/CD pipelines for seamless deployments.
Automate infrastructure provisioning and scaling using Infrastructure-as-Code tools such as Terraform and Ansible.
Manage containerized workloads with Docker and Kubernetes.
Performance & Scalability
Conduct capacity planning, load testing, and performance tuning.
Optimize system reliability through fault-tolerant design and distributed systems best practices.
Collaborate with developers to improve application performance and resilience.
Security & Compliance
Implement security best practices in infrastructure and operations.
Ensure compliance with organizational and regulatory standards.
Contribute to disaster recovery and business continuity planning.
Collaboration & Continuous Improvement
Work closely with development teams to embed reliability into the software lifecycle.
Document processes, runbooks, and operational standards.
Contribute to a culture of continuous learning and improvement.
Qualifications
3–5 years of experience in SRE, DevOps, or systems engineering roles.
Strong knowledge of Linux/Unix systems and shell scripting.
Hands-on experience with at least one major cloud platform (AWS, Azure, or GCP).
Proficiency with container orchestration (Kubernetes) and CI/CD pipelines.
Familiarity with monitoring and observability tools (Prometheus, Grafana, ELK, Datadog).
Solid understanding of networking fundamentals (TCP/IP, DNS, HTTP).
Strong problem-solving and troubleshooting skills.
Bachelor’s degree in Computer Science, Information Technology, or a related field (or equivalent practical experience).
Cloud certifications (AWS Certified Solutions Architect, Google Cloud Professional Cloud Architect, etc.) or Kubernetes certifications (CKA/CKAD) are a plus.
This role is hosted on Prodapt's careers site.