Engineering Team Lead - Cloud Stability
Bloomberg
New York, NY, USA
The Cloud Stability team manages Bloomberg’s private cloud, a massive, critical infrastructure hosting over 100k virtual machines that underpin Bloomberg's most vital applications, including the flagship Terminal and News systems. The team's core mission is to efficiently deliver the highly available infrastructure that all clients depend on. Key responsibilities include ensuring the reliability of cloud services, establishing comprehensive observability across the fleet, managing infrastructure maintenance and failure response, and executing predictive capacity planning. The team drives this through extensive automation built on tools like Ansible, Airflow, and Flask. The private cloud consists of two key offerings: Bloomberg Cloud Compute (BCC), an in-house cloud leveraging open source technologies such as OpenStack and Ceph, and a VMware-based commercial software cloud.
We are seeking an experienced Team Lead for the Cloud Stability team based out of New York. This leadership role is pivotal: you will define and drive the strategic execution for running the private cloud, ensuring it continuously enables the vast majority of engineering workflows within Bloomberg. Furthermore, you will be trusted to articulate a future-state vision for the Bloomberg cloud's operation and forge critical partnerships across internal and external teams to realize that vision.
Some of our initiatives include:
Set a strategy for capacity management that takes into account real-time changes in capacity and planned demand
Develop orchestration to coordinate competing automation
Enable twice-annual reboots of a fleet of ~7k servers
Expand BCC’s global footprint with new production clusters
We’ll trust you, as Team Lead, to:
Inspire and motivate the team to achieve outstanding results, while supporting individual growth and development.
Build partnerships with internal teams and external stakeholders so that problems are well understood and solutions are aligned with expectations.
Organize and prioritize the backlog of work with the team and stakeholders so that the most important and impactful work is addressed first.
Work with the engineers in the team to deliver high quality solutions that adhere to best practices.
Develop a vision of an optimally run cloud and a roadmap for getting there.
You'll need to have:
4+ years of experience as a Team Lead of a software development team
BS/MS/PhD in Computer Science, Engineering, or a related technology field
Ability to foster a collaborative team environment by driving a strong culture of teamwork and drawing on team diversity
Ability to effectively listen, communicate, challenge, and influence team members, peers, and senior managers
Experience building trust-based relationships with stakeholders to pave the way for cross-team collaboration and alignment
A solid foundation in software development, including best practices, code quality, modular design, testing strategies, CI/CD pipelines, and maintainability.
Ability to reason about system behavior, failure domains, and scaling characteristics, enabling effective guidance on stability, reliability, and performance.
Architectural fluency across compute, storage, networking, and orchestration technologies.
We'd love to see experience in:
OpenStack
Cloud infrastructure or SRE teams
Virtual networking and software-defined networking (SDN)
Workflow orchestrators such as Airflow and AWX/Ansible
Unix and distributed systems
Agile Scrum