Experience Required: 8+ Years
Key Responsibilities:
• Monitor, maintain, and improve reliability, availability, and performance of enterprise applications and infrastructure.
• Implement ITSM processes such as incident, problem, and change management to ensure operational excellence.
• Identify and eliminate bottlenecks by developing automation and proactive monitoring solutions.
• Collaborate with development and infrastructure teams to ensure smooth deployment and reliable operation of applications.
• Participate in on-call rotations and shift operations, ensuring prompt response to critical incidents and their timely resolution.
• Conduct root cause analysis (RCA) for high-impact incidents and drive permanent fixes.
• Develop and maintain runbooks, standard operating procedures (SOPs), and service documentation.
• Gather metrics, generate performance reports, and support continuous improvement initiatives.
Required Skills and Competencies:
• Strong understanding of ITSM frameworks (preferably ITIL) and service operations for enterprise-scale environments.
• Experience in application monitoring, alerting, and observability tools (e.g., Prometheus, Grafana, Splunk, AppDynamics, or Dynatrace).
• Familiarity with cloud infrastructure (AWS, Azure, or GCP) and key DevOps/SRE practices.
• Proficiency in incident response, system troubleshooting, and performance optimization.
• Basic scripting or automation skills (Python, Shell, or PowerShell) for operational efficiency.
• Excellent collaboration and communication skills with a proactive problem-solving mindset.
• Willingness to work in rotational shifts and support 24×7 production environments.
Salary: As per market standard
Natobotics Ltd, founded in 2012, provides boutique technology, digital customer experience, consulting, and software development services. Over the years, we have supported our clients in the areas of cloud transformation, Big Data & Analytics, Digital Journey, Machine Learning, and Platforms.