Job Summary
Vertiv is seeking a skilled Platform Operations Engineer (Site Reliability Engineer) to serve as the owner of cross-platform observability, incident management, and operational reliability within Vertiv’s Digital organization. This individual contributor role is responsible for designing, implementing, and continuously improving monitoring and alerting solutions across Vertiv’s digital platform ecosystem (including Compass AI, Writer AI, Site Scope, UiPath, Workato, Cursor, and other approved enterprise tools) while owning incident response processes, SLA management, and operational governance. The Platform Operations / SRE Engineer will operate within the Digital organization and play a central role in advancing Vertiv’s Operational Excellence strategic priority by ensuring the availability, performance, and resilience of platforms that power critical digital workflows and business functions.
As an individual contributor in a lead capacity, the Platform Operations / SRE Engineer practices proactive reliability engineering, applying SRE principles such as SLOs, error budgets, and blameless post-mortems, and embeds secure coding and operational governance practices across the Digital organization. The engineer will define and enforce observability standards, lead incident response and root cause analysis, manage platform-level SLAs, and partner with engineering, security, and business stakeholders to ensure that all digital platforms meet agreed availability and performance targets.
This position partners closely with IT Security, NPDI, Digital delivery teams, and business operations, and is based on site at Vertiv’s Westerville, OH headquarters.
Responsibilities
- Own Cross-Platform Monitoring & Observability: Design, implement, and maintain end-to-end monitoring, alerting, and observability solutions across Vertiv’s digital platform ecosystem — including AI platforms, automation tools, and internal applications — ensuring real-time visibility into system health, performance, and availability.
- Lead Incident Response & Management: Serve as the primary escalation point and incident commander for P1/P2 incidents across Digital platforms; lead root cause analysis (RCA), blameless post-mortems, and corrective action tracking to prevent recurrence and reduce mean time to resolution (MTTR).
- Manage Platform SLAs & Reliability Targets: Define, instrument, and enforce service level objectives (SLOs), service level indicators (SLIs), and error budgets across Digital platforms; produce regular SLA performance reports for leadership and drive platform improvements to meet or exceed agreed availability and performance targets.
- Drive Secure Coding & Operational Governance: Champion secure coding practices and DevSecOps standards within Digital delivery teams; conduct operational readiness reviews for new platform deployments, enforce configuration management and change control processes, and partner with IT Security and NPDI to ensure all platforms meet Vertiv’s security and compliance requirements.
- Automate Operations & Reduce Toil: Identify and eliminate manual operational toil through automation, including automated remediation runbooks and anomaly detection built with scripting, IaC tools, and approved automation platforms.
- Capacity Planning & Performance Engineering: Analyze platform utilization trends and conduct capacity planning across Digital environments; proactively identify performance bottlenecks and recommend architectural improvements to ensure platforms scale reliably with business demand.
- CI/CD Pipeline Reliability & Deployment Support: Partner with Digital delivery teams to ensure CI/CD pipelines are instrumented for reliability, deployment risk is managed through progressive rollout strategies, and production deployments are supported with appropriate rollback and health-check capabilities.
- Evaluate & Advance Observability Tooling: Stay current on advancements in observability, AIOps, and SRE tooling; evaluate and recommend new tools and practices that enhance Vertiv’s platform operations maturity, and drive adoption of modern reliability engineering standards across the Digital organization.
Requirements
- Bachelor’s degree in Computer Science, Information Systems, Engineering, or a related field; equivalent practical experience considered.
- 5+ years of professional experience in platform operations, site reliability engineering, DevOps, or a related software/infrastructure engineering discipline.
- 3+ years of hands-on experience with enterprise monitoring and observability platforms (e.g., Datadog, Grafana, Prometheus, Azure Monitor, Splunk, or equivalent) in a multi-platform environment.
- Demonstrated experience owning and managing incident response processes, post-mortem facilitation, and SLA/SLO governance.
- Experience implementing secure coding practices, DevSecOps standards, or operational governance frameworks in an enterprise software delivery environment.
Technical Skills
- Proficiency with monitoring and observability tools (Datadog, Grafana, Prometheus, Azure Monitor, Splunk, or equivalent) for cross-platform health and performance tracking.
- Strong knowledge of SRE principles, including SLOs, SLIs, blameless post-mortems, and toil reduction practices.
- Hands-on experience with cloud platforms (AWS preferred) and familiarity with containerized environments (Docker, Kubernetes) and infrastructure-as-code tooling (Terraform, Ansible, or equivalent).
- Proficiency in multiple programming languages (Python, Ruby, PowerShell, Java, JavaScript, C#, etc.) for automation and runbook development.
- Experience with CI/CD platforms (GitLab, Jenkins, GitHub Actions, Azure DevOps, or equivalent) and deployment reliability practices including progressive rollout, feature flags, and automated health checks.
Preferred Qualifications
- Google SRE certification, AWS DevOps Professional, Azure certifications, or equivalent SRE/cloud operations certification.
- Experience with AIOps tooling or AI-assisted anomaly detection and automated remediation capabilities.
- Familiarity with the Vertiv digital platform ecosystem: Workato, UiPath, Power Automate, Compass AI, Writer AI, or Cursor.
- Experience applying DevSecOps practices, including SAST/DAST scanning, secrets management, and compliance-as-code in enterprise environments.
- Experience working in Agile/Scrum delivery environments; familiarity with ITIL incident and change management frameworks.
The successful candidate will embrace Vertiv’s Core Principles & Behaviors to help execute our Strategic Priorities.
OUR CORE PRINCIPLES: Safety. Integrity. Respect. Teamwork. Diversity & Inclusion.
OUR STRATEGIC PRIORITIES
• Customer Focus
• Operational Excellence
• High-Performance Culture
• Innovation
• Financial Strength
OUR BEHAVIORS
• Own It
• Act With Urgency
• Foster a Customer-First Mindset
• Think Big and Execute
• Lead by Example
• Drive Continuous Improvement
• Learn and Seek Out Development
About Vertiv
Vertiv is a $10.2 billion global critical infrastructure and data center technology company. We ensure customers’ vital applications run continuously by bringing together hardware, software, analytics and ongoing services. Our portfolio includes power, cooling and IT infrastructure solutions and services that extend from the cloud to the edge of the network. Headquartered in Columbus, Ohio, USA, Vertiv employs around 20,000 people and does business in more than 130 countries. Visit Vertiv.com to learn more.
Work Authorization
No calls or agencies please. Vertiv will only employ those who are legally authorized to work in the United States. This is not a position for which sponsorship will be provided. Individuals with temporary visas such as E, F-1, H-1, H-2, L, B, J, or TN or who need sponsorship for work authorization now or in the future, are not eligible for hire.
Equal Opportunity Employer
Vertiv is an Equal Opportunity/Affirmative Action employer. We promote equal opportunities for all with respect to hiring, terms of employment, mobility, training, compensation, and occupational health, without discrimination as to age, race, color, religion, creed, sex, pregnancy status (including childbirth, breastfeeding, or related medical conditions), marital status, sexual orientation, gender identity / expression (including transgender status or sexual stereotypes), genetic information, citizenship status, national origin, protected veteran status, political affiliation, or disability. If you have a disability and are having difficulty accessing or using this website to apply for a position, you can request help by sending an email to [email protected].