Director, Site Reliability Engineering

full-time • United Kingdom

Summary

Location: United Kingdom
Type: full-time
Experience: 10+ years


About this role

Req ID: 26-480

Come join our passionate team! Barracuda is a leading cybersecurity company providing complete protection against complex threats. Our platform protects email, data, applications, and networks with innovative solutions, and a managed XDR service, to strengthen cyber resilience. Hundreds of thousands of IT professionals and managed service providers worldwide trust us to protect and support them with solutions that are easy to buy, deploy, and use.

We know a diverse workforce adds to our collective value and strength as an organization. Barracuda Networks is proud to be an employer that complies with all applicable national, state and local laws pertaining to nondiscrimination and equal opportunity regardless of race, gender, religion, sex, sexual orientation, national origin, or disability.

Envision Yourself at Barracuda:

We are seeking a strategic and visionary Director of Site Reliability Engineering (SRE), in the Cloud Operations group, to lead global reliability initiatives across Barracuda’s SaaS portfolio. You will oversee a distributed team of Site Reliability Engineers and partner closely with Product Engineering, Security & Compliance, and other Cloud Operations teams to ensure our platforms are highly available, scalable, secure, and cost-efficient. This role will also drive AI-powered automation and agentic systems adoption to transform reliability operations.

What will you be working on:

Strategic Leadership: Define and execute Barracuda’s global SRE strategy, aligning reliability goals with business objectives and customer SLAs.
Operational Excellence: Drive continuous improvement in availability, latency, performance, and cost optimization across all cloud services.
AI & Agentic Systems Integration: Implement AI-driven observability and anomaly detection for proactive incident prevention; deploy agentic automation systems to manage routine operational tasks, optimize cloud resources, and accelerate remediation workflows; explore LLM-based runbooks and autonomous agents for incident triage and root cause analysis.
Cross-Functional Collaboration: Partner with Engineering, Security, and FinOps teams to embed reliability into product design and delivery pipelines.
Architecture & Governance: Influence architectural decisions for reliability, disaster recovery, and observability systems; ensure compliance with security and regulatory standards.
Automation & Tooling: Champion Infrastructure-as-Code and CI/CD automation at scale using Terraform, CloudFormation, GitHub Actions, and Jenkins.
Incident & Risk Management: Facilitate incident response protocols, conduct executive-level postmortems, and implement proactive risk mitigation strategies.
Service Level Management: Define and enforce SLIs and SLOs across global services; report reliability metrics to executive leadership.
Team Development: Build and mentor a high-performing SRE organization; foster a culture of ownership, innovation, and collaboration across regions.
Cloud Optimization: Lead initiatives for cost governance and performance tuning in AWS and Azure environments.
Executive Communication: Present reliability roadmaps, KPIs, and risk assessments to senior leadership and stakeholders.

What you bring to the role:

Experience: 12+ years in infrastructure, cloud operations, or SRE roles, including 5+ years in leadership positions managing distributed teams.
Cloud Expertise: Deep knowledge of AWS and Azure architectures, security, and operations in large-scale SaaS environments.
AI & Automation: Experience implementing AI-driven observability, predictive analytics, and autonomous remediation systems.
Infrastructure as Code: Proven success implementing tools such as Terraform or CloudFormation at enterprise scale.
CI/CD & Automation: Advanced experience with GitHub Actions, Jenkins, and deployment strategies (blue/green, canary, rolling).
Container Orchestration: Expertise in Kubernetes (EKS, AKS) and containerized workloads.
Observability & Resilience: Strong background in Prometheus, Grafana, ELK, and APM tools; experience designing self-healing systems.
Programming: Proficiency in Python, Go, or similar languages for automation and tooling.
Leadership Skills: Exceptional ability to lead globally distributed teams, influence cross-functional stakeholders, and drive cultural change.
Certifications: AWS Solutions Architect/DevOps Professional and Kubernetes certifications (CKA, CKAD) preferred.

What You Will Get from Us:

A leadership role where your vision shapes the reliability of mission-critical systems.
Opportunities for career growth and executive visibility.
High-quality health benefits, retirement plan with employer match, and flexible time off.
The chance to work on cutting-edge cloud reliability challenges at scale.

#LI-hybrid

What you'll do

The Director of Site Reliability Engineering will lead global reliability initiatives across Barracuda’s SaaS portfolio and oversee a distributed team of Site Reliability Engineers. This role involves driving AI-powered automation and ensuring platforms are highly available, scalable, secure, and cost-efficient.

About Barracuda Networks Inc.

Barracuda is a leading global cybersecurity company providing complete protection against complex threats for businesses of all sizes. Our AI-powered BarracudaONE platform secures email, data, applications, and networks with innovative solutions, managed XDR, and a centralized dashboard to maximize protection and strengthen cyber resilience. Trusted by hundreds of thousands of IT professionals and managed service providers worldwide, Barracuda delivers powerful defenses that are easy to buy, deploy, and use.


Frequently Asked Questions

What does a Director, Site Reliability Engineering do at Barracuda Networks Inc.?

As a Director, Site Reliability Engineering at Barracuda Networks Inc., you will lead global reliability initiatives across Barracuda’s SaaS portfolio and oversee a distributed team of Site Reliability Engineers. This role involves driving AI-powered automation and ensuring platforms are highly available, scalable, secure, and cost-efficient.

Is the Director, Site Reliability Engineering position at Barracuda Networks Inc. remote?

The Director, Site Reliability Engineering position at Barracuda Networks Inc. is based in the United Kingdom. Contact the company through Clera for specific work arrangement details.

How do I apply for the Director, Site Reliability Engineering position at Barracuda Networks Inc.?

You can apply for the Director, Site Reliability Engineering position at Barracuda Networks Inc. directly through Clera. Click the "Apply Now" button above to start your application. Clera's AI-powered platform will help match your profile with this opportunity and guide you through the application process.

