Senior Manager, GovCloud Site Reliability Operations



Denver, CO, USA
Posted on Monday, November 20, 2023

To get the best candidate experience, please consider applying for a maximum of 3 roles within 12 months to ensure you are not duplicating efforts.

Job Category

Software Engineering

Job Details

About Salesforce

We’re Salesforce, the Customer Company, inspiring the future of business with AI + Data + CRM. Leading with our core values, we help companies across every industry blaze new trails and connect with customers in a whole new way. And, we empower you to be a Trailblazer, too, driving your performance and career growth, charting new paths, and improving the state of the world. If you believe in business as the greatest platform for change and in companies doing well and doing good, you’ve come to the right place.

Senior Manager, GovCloud Site Reliability Operations

Salesforce is looking for a Senior Manager to lead the Site Reliability Operations team for one of the largest and most trusted cloud platforms in the world! The Senior Manager will lead a team supporting our GovCloud platform. In this position, you will play a key role within the GovCloud leadership team, helping to shape our ongoing transformation into ever-improving, true site reliability operations that deliver consistently high SLAs.

Your team provides the Incident Management, Detection, Operational and Software Engineering expertise, as well as any Root Cause Analysis/remediation and other proactive measures to improve the stability of customer performance and minimize TTR. The team will draw on your past experience as a people and technical leader managing software engineering efforts, keeping your team healthy and productive.

The Senior Manager is responsible for leading a team of Incident Response operators, setting a vision for growth and driving operational transformation, while automating how we do Operations. The constant goal is to reduce toil and fragility. As such, you have a balance of technical expertise, leadership skills and managerial experience. Your operational skills are sufficiently advanced to enable you to set technical direction on incident bridges and marshal resources accordingly, as well as to ensure that investigations follow the appropriate troubleshooting paths and that monitoring, triage and change execution remain optimal. As a leader in this role, you demonstrate a strong focus on engineering practices, service ownership, agile leadership and people management skills. Your scope will span the full breadth of our 1P and Hyperforce public cloud infrastructure. This position will involve fostering and maintaining strong relationships with other connected areas of the business, ensuring the SRO team are vital stakeholders within a continuous cycle of engineering and process.

In this role:

  • You will be responsible for managing and supervising the day-to-day responsibilities of front-line Site Reliability Engineers.

  • You will act in a key support role during incidents.

  • You will find, hire and retain the best technical talent. Be accountable for the success of your team, providing coaching, mentorship and support to help them develop professionally as well as achieve their delivery goals.

  • As a technical leader, you will both create the strategy for your team’s role in a larger movement to DevOps principles within Salesforce, and set the tactical direction across multiple teams as you drive incident investigations.

  • You will drive your team and partner product teams to contribute to and participate in RCAs that lead to permanent resolution of sophisticated issues. Collaborate to be proactive in design, management, and improvement of high-quality customer-facing services, with a focus on automation, reliability, and observability.

  • You will collaborate successfully with both internal and external stakeholders to carry out the strategy for SRO to maintain high SLAs.

  • You will create and improve processes that help SREs respond to and mitigate incidents against quantitative goals.

  • You will work successfully with other cross-cloud service owners (Developers, DBAs, Network, etc.), building positive relationships while exercising influence.

  • We want a leader who will use data to solve underlying problems in our systems.


  • U.S. citizen (U.S. born or naturalized) who does not hold dual citizenship. You agree to complete a Minimum Background Investigation (MBI) for a Moderate Public Trust position with the U.S. federal government or other clearances as deemed appropriate for the role

  • A related technical degree required

  • 8+ years of Infrastructure Engineering or Technical Operations experience

  • 5+ years leading Site Reliability Engineering, Operations, or Software Development teams, preferably in globally distributed environments

  • Available to run 24x7 teams across the US, with the flexibility to manage teams on off-hour shifts (may require work outside of traditional office hours)

  • Experience with management and troubleshooting of Internet services running on traditional data centers and Public Cloud (AWS, GCP) infrastructure

  • Past experience in Incident Management, strong understanding of ITIL processes, and Scrum agile development methodologies

  • Expertise with enterprise observability and monitoring systems, such as Prometheus, OpenTSDB, and Splunk

  • Experience in leading and driving team Transformations that showcase Teamwork and Collaboration, Adaptability, Customer Focus, Results, and Innovation

  • Experience successfully coaching individuals to achieve goals and focus on employee development

  • Experience in delivering Engineering Productivity, working in a Service Ownership model and a consistent record of Customer Success

  • Strong communication, organizational, analytical and problem solving skills and attention to detail

Nice to Haves:

  • Experience developing security & compliance into pipelines (OPA, Checkov, Twistlock, Prisma, Casbin)

  • Experience with monitoring tools such as Nagios, Graphite, Datadog, Cloudwatch, Prometheus, Zabbix, etc.

  • Experience in managing large scale web applications in production


If you require assistance applying for open positions due to a disability, please submit a request via this Accommodations Request Form.

Posting Statement

At Salesforce we believe that the business of business is to improve the state of our world. Each of us has a responsibility to drive Equality in our communities and workplaces. We are committed to creating a workforce that reflects society through inclusive programs and initiatives such as equal pay, employee resource groups, inclusive benefits, and more. Learn more about Equality at www.equality.com and explore our company benefits at www.salesforcebenefits.com.

Salesforce is an Equal Employment Opportunity and Affirmative Action Employer. Qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender perception or identity, national origin, age, marital status, protected veteran status, or disability status. Salesforce does not accept unsolicited headhunter and agency resumes. Salesforce will not pay any third-party agency or company that does not have a signed agreement with Salesforce.

Salesforce welcomes all.

For Colorado-based roles, the base salary hiring range for this position is $156,800 to $215,600.

Compensation offered will be determined by factors such as location, level, job-related knowledge, skills, and experience. Certain roles may be eligible for incentive compensation, equity, and benefits. More details about our company benefits can be found at the following link: https://www.salesforcebenefits.com.