Site Reliability Operations (SRO) Engineer III
full-time | Westlake Village | $75k - $130k

Summary

Location

Westlake Village

Salary

$75k - $130k

Type

full-time

About this role

PENNYMAC

Pennymac (NYSE: PFSI) is a specialty financial services firm with a comprehensive mortgage platform and integrated business focused on the production and servicing of U.S. mortgage loans and the management of investments related to the U.S. mortgage market.

 

At Pennymac, our people are the foundation of our success and at the heart of our dynamic work culture. Together, we work towards a unified goal of helping millions of Americans achieve aspirations of homeownership through the complete mortgage journey.


A Typical Day

As a member of the Site Reliability Operations (SRO) team, you will help provide 24/7 monitoring and support of the company’s IT Infrastructure. Ideal candidates have experience in Windows and Linux administration as well as in AWS, since Pennymac has now almost completely migrated to the AWS cloud. Individuals in this role should be comfortable working in a fast-paced environment. Multitasking and communicating quickly and accurately are critical to success in this role.

 

The Site Reliability Operations (SRO) Engineer III will:

  • Monitoring – Oversee 24/7 health monitoring of the company’s IT Infrastructure using tools such as AWS CloudWatch and New Relic. Drive observability maturity across the organization by identifying coverage gaps and implementing targeted improvements.
  • Alert Management – Own the ongoing refinement of operational alerts. Implement advanced alerting rules and thresholds to proactively identify issues, reduce noise, and ensure every alert drives action.
  • Observability Gap Analysis – Partner with Incident Management to identify monitoring and alerting gaps discovered during incident triage; prioritize and implement enhancements to prevent recurrence.
  • App Team Engagement – Serve as an observability resource to application teams, assessing current instrumentation and providing actionable recommendations to improve monitoring maturity.
  • Alert Quality Ownership – Lead initiatives to reduce alert noise, improve signal-to-noise ratio, and ensure every alert is actionable with clear runbook linkage.
  • Operational Dashboard Development – Design and maintain operationally-focused dashboards in New Relic that support 24/7 triage, SLA tracking, and real-time incident response.
  • Incident Management – Serve as an escalation point for complex incidents. Collaborate closely with the Incident Management team, Application Developers, Internal Support Teams, and 3rd Party Vendors to ensure timely and accurate resolution of service disruptions.
  • Advanced Systems Administration – Perform and troubleshoot a wide range of administrative tasks across Windows and Linux environments. Assist in optimizing system performance, conducting root-cause analyses, and implementing long-term fixes.
  • Virtual Server and Desktop Management – Handle more complex tasks associated with maintaining and troubleshooting the company’s virtual infrastructure. Provide guidance to junior engineers for routine issues.
  • Technical Troubleshooting and Investigation – Tackle advanced technical issues that are escalated from Engineer I/II. Conduct deep dives into infrastructure and application logs to pinpoint underlying problems.
  • Internal and External Escalation – Act as a liaison between multiple internal teams and external vendors for high-priority incidents. Ensure swift coordination and minimize downtime.
  • Change Management – Strictly follow and help refine the company’s established Change Management processes. Provide risk assessments and validation for proposed changes before approval.
  • Communication – Monitor and respond to incoming Calls, Chats, and Emails directed to the SRO team. Offer structured feedback to stakeholders when complex issues are underway.
  • Ticket Queue Management – Lead by example in managing multiple ticket queues (ServiceNow, JIRA, etc.). Take ownership of priority tickets and oversee distribution among the team.
  • Documentation – Maintain and expand the SRO team’s knowledge base. Author new Standard Operating Procedures (SOPs) that incorporate best practices gained from resolving advanced incidents.
  • Deployments – Coordinate and execute application and website code deployments using Jenkins, GitLab, or other CI/CD tools. Help optimize deployment workflows to reduce errors and downtime.
  • Data Backup and Compliance – Oversee backup tasks using CommVault, AWS Backup, and related tools. Ensure data retention meets or exceeds corporate and regulatory requirements.
  • Project Management – Drive or co-lead medium to large-scale projects related to infrastructure improvements, migrations, or optimizations. Collaborate with stakeholders to define scope, timelines, and resource needs.
  • Mentorship – Provide guidance to Engineer I and II staff on advanced troubleshooting methods, best practices in cloud administration, and effective incident response.

What You’ll Bring

  • Bachelor’s Degree in Computer Science or comparable experience.
  • Advanced AWS Certifications strongly preferred.
  • 3–5+ years of experience working in both Windows and Linux environments, with demonstrated success in advanced troubleshooting and administration.
  • Hands-on experience with New Relic (dashboards, NRQL queries, alerting configuration).
  • Demonstrated success improving monitoring coverage and alert quality.
  • Ability to consult with application teams on observability best practices.
  • Strong analytical skills for identifying patterns in incident data.
  • Strong scripting or programming skills in PowerShell, Python, or a similar language; ability to automate repetitive tasks and streamline operations.
  • Excellent organizational skills, with the ability to manage competing priorities and urgent issues in a fast-paced setting.
  • Strong written and verbal communication skills; able to explain complex technical issues to stakeholders at various technical levels.
  • Comfortable completing annual role-based training and certification assignments; dedicated to continual learning and development.
  • Demonstrated ability to work independently on complex tasks and to collaborate effectively with cross-functional teams.

Why You Should Join

As one of the top mortgage lenders in the country, Pennymac has helped over 4 million lifetime homeowners achieve and sustain their aspirations of home. Our vision is to be the most trusted partner for home. Together, 4,000 Pennymac team members across the country are guided by our core values: to be Accountable, Reliable and Ethical in all that we do. 

Pennymac is committed to conducting a business that makes positive contributions and promotes long-term sustainable growth and to fostering an equitable and inclusive environment, where all employees and customers feel valued, respected and supported. 

 

Benefits That Bring It Home: Whether you're looking for flexible benefits for today, setting up short-term goals for tomorrow, or planning for long-term success and retirement, Pennymac's benefits have you covered. Some key benefits include: 

  • Comprehensive Medical, Dental, and Vision
  • Paid Time Off Programs including vacation, holidays, illness, and parental leave 
  • Wellness Programs, Employee Recognition Programs, and onsite gyms and cafe style dining (select locations)
  • Retirement benefits, life insurance, 401k match, and tuition reimbursement 
  • Philanthropy Programs including matching gifts, volunteer grants, charitable grants and corporate sponsorships
  • We value the hard work and dedication of our employees. In addition to a competitive salary, positions may offer bonus opportunities.



To learn more about our benefits, visit: https://pennymacnews.page.link/benefits

 

Compensation: Individual salary may vary based on multiple factors including specific role, geographic location / market data, and skills and experience as defined below:

  • Lower in range - Building skills and experience in the role
  • Mid-range - Experience and skills align with proficiency in the role 
  • Higher in range - Experience and skills add value above typical requirements of the role 

 

Some roles may be eligible for performance-based compensation and/or stock-based incentives awarded to employees based on company and individual performance.


Salary

$75,000 - $130,000

Work Model

OFFICE

Other facts

Tech stack
Windows Administration, Linux Administration, AWS, Monitoring, Alert Management, Observability, Incident Management, Technical Troubleshooting, Scripting, Communication, Project Management, Data Backup, Change Management, Documentation, Virtual Server Management, Mentorship

About Pennymac

PennyMac Loan Services, LLC (NMLS #35953) is a top national mortgage lender with over 4 million lifetime customers. Whether you are new to the home loan process or an experienced buyer, Pennymac is dedicated to offering competitive rates and superior service.

Being an online mortgage lender means Pennymac can focus on the needs of its customers rather than maintaining a network of branches and banking products.

Our mission is to build a foundation of homeownership by enabling our customers to achieve and sustain their aspirations of home.

Equal Housing Opportunity, PennyMac Loan Services, LLC, 3043 Townsgate Road, Suite 200, Westlake Village, CA 91361, 818-224-7442. NMLS ID # 35953 (www.nmlsconsumeraccess.org). For a complete listing of state licenses and important notices, please visit www.pennymac.com/state-licenses. Not all property types qualify. Some loan products may not be available in all states. Information, property type eligibility, rates and pricing are subject to change without prior notice at the sole discretion of PennyMac Loan Services, LLC. Ask your loan officer for details. © 2025 Private National Mortgage Acceptance Company, LLC, Pennymac and all related marks are trademarks of Private National Mortgage Acceptance Company, LLC and/or its subsidiaries or affiliates. Third-party content is the property of its respective owners. All rights reserved. (01-2025)

Team size: 1,001-5,000 employees
Industry: Financial Services
Founding Year: 2008

What you'll do

  • The Site Reliability Operations Engineer III will oversee 24/7 health monitoring of the IT infrastructure and manage operational alerts to proactively identify issues. They will also engage with application teams to improve monitoring maturity and serve as an escalation point for complex incidents.

Frequently Asked Questions

What does Pennymac pay for a Site Reliability Operations (SRO) Engineer III?

Pennymac offers a competitive compensation package for the Site Reliability Operations (SRO) Engineer III role. The salary range is USD 75k - 130k per year. Apply through Clera to learn more about the full compensation details.

What does a Site Reliability Operations (SRO) Engineer III do at Pennymac?

As a Site Reliability Operations (SRO) Engineer III at Pennymac, you will oversee 24/7 health monitoring of the IT infrastructure and manage operational alerts to proactively identify issues. You will also engage with application teams to improve monitoring maturity and serve as an escalation point for complex incidents.

Why join Pennymac as a Site Reliability Operations (SRO) Engineer III?

Pennymac is a leading financial services company. The Site Reliability Operations (SRO) Engineer III role offers competitive compensation.

Is the Site Reliability Operations (SRO) Engineer III position at Pennymac remote?

The Site Reliability Operations (SRO) Engineer III position at Pennymac is based in Westlake Village, California, United States, and the listed work model is office-based. Contact the company through Clera for specific work arrangement details.

How do I apply for the Site Reliability Operations (SRO) Engineer III position at Pennymac?

You can apply for the Site Reliability Operations (SRO) Engineer III position at Pennymac directly through Clera. Click the "Apply Now" button above to start your application. Clera's AI-powered platform will help match your profile with this opportunity and guide you through the application process. You can also learn more about Pennymac on their website.