We are seeking a DevOps Engineer Lead to design, build, and operate reliable, secure, and cost-efficient infrastructure across AWS and on-premises environments. This role is hands-on and leadership-oriented, responsible for infrastructure provisioning, system reliability, CI/CD orchestration, container platforms, security controls, and cost optimization.
You will work closely with engineering and product teams to ensure scalable, observable, and resilient systems while mentoring junior DevOps engineers.
Key Responsibilities
Infrastructure & Platform Engineering:-
-Design, provision, and manage infrastructure across AWS and on-prem environments using Infrastructure as Code (IaC).
-Provision and manage Linux-based servers, virtual machines, and bare-metal systems.
-Configure and manage network components including VPCs, subnets, routing, firewalls, load balancers, VPNs, and network switches (on-prem).
-Lead setup and maintenance of PostgreSQL databases, including backups, replication, performance tuning, and high availability.
Containerization & Orchestration:-
-Design, deploy, and operate Kubernetes clusters (EKS and/or self-managed).
-Build and maintain Docker-based container workflows.
-Manage Kubernetes workloads, namespaces, ingress, secrets, autoscaling, and rolling deployments.
-Establish best practices for container security, image scanning, and runtime hardening.
CI/CD & Automation:-
-Design and maintain CI/CD pipelines for application and infrastructure code.
-Orchestrate automated builds, tests, deployments, and rollbacks.
-Implement GitOps or pipeline-driven deployment strategies.
-Write and maintain automation scripts using Bash, Python, or similar scripting languages.
Monitoring, Reliability & Operations:-
-Implement system monitoring, logging, and alerting for infrastructure and applications.
-Define and track SLIs/SLOs and lead incident response and root cause analysis.
-Ensure high availability, fault tolerance, and disaster recovery readiness.
-Own operational runbooks and establish on-call best practices.
Security & Compliance:-
-Implement infrastructure and platform security best practices.
-Manage IAM, secrets, encryption, network security, and access controls.
-Support vulnerability management, patching, and audit readiness.
-Collaborate with security teams to align with compliance and governance requirements.
Cost Optimization & Performance:-
-Monitor and optimize AWS cloud spend using cost visibility and reporting tools.
-Identify opportunities for rightsizing, Reserved Instances, autoscaling, and storage optimization.
-Track on-prem infrastructure utilization and lead capacity planning.
-Balance performance, reliability, and cost trade-offs.
Leadership & Collaboration:-
-Act as a technical lead and mentor for DevOps engineers.
-Define DevOps standards, best practices, and documentation.
-Partner with engineering teams to improve developer experience and deployment velocity.
-Participate in architecture discussions and infrastructure roadmap planning.
Requirements
Required Qualifications
-3+ years of hands-on experience in DevOps, SRE, or Platform Engineering roles.
-Strong experience with AWS (EC2, VPC, IAM, RDS, EKS, S3, CloudWatch, etc.).
-Proven experience managing on-prem infrastructure (servers, virtualization, networking).
-Deep Linux system administration experience.
-Hands-on experience with PostgreSQL administration.
-Strong knowledge of Docker and Kubernetes.
-Experience with CI/CD tools and infrastructure automation.
-Proficiency in scripting (Bash, Python, or similar).
-Strong understanding of infrastructure security principles.
-Experience with monitoring, logging, and incident response.
Preferred / Nice-to-Have Qualifications
-Experience with Infrastructure as Code tools (Terraform, CloudFormation, Ansible, etc.).
-Experience operating hybrid or multi-cloud environments.
-Exposure to GitOps, service meshes, or advanced Kubernetes patterns.
-Prior experience leading or mentoring DevOps engineers.
-Experience supporting regulated or compliance-driven environments.
What Success Looks Like
-Stable, secure, and scalable infrastructure across AWS and on-prem.
-Highly automated provisioning and deployment pipelines.
-Strong observability and fast incident resolution.
-Measurable reductions in cloud and infrastructure costs.
-Improved developer velocity and platform reliability.