Software Engineer II
Microsoft
The Azure Compute team builds a fault-tolerant, distributed system on top of commodity datacenter hardware to deliver infrastructure for hosting cloud applications in virtual machines (VMs). The team creates the illusion that resources are limitless, infinitely elastic, and always available.
This role is in the Availability Platform team within Azure Compute, which focuses on ensuring every Azure virtual machine achieves a Service Level Agreement (SLA) of 99.99 percent or higher. Meeting and exceeding this target requires innovative thinking, supported by data-driven decisions and intelligent automation. The team owns services that monitor the health of millions of Azure machines and the control plane services that make all repair decisions in Azure. We use artificial intelligence (AI) and machine learning to build predictive failure models that proactively live-migrate virtual machines before failures occur, minimizing customer impact and improving platform resilience.
We are also exploring the use of generative artificial intelligence to enhance diagnostics, automate root cause analysis, and accelerate incident resolution. Our collaboration with data scientists and AI researchers enables us to continuously evolve our platform with smarter, self-healing capabilities. As a Software Engineer II, you will join a team that invests in people and technology for the long term. We emphasize comprehensive designs, incremental development with high quality, frequent shipping, and rapid adaptation to customer feedback. If you want hands-on experience with services architecture at hyperscale, this is the role for you.
Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.
Responsibilities
- Partners with appropriate stakeholders across teams and organizations to determine project requirements and build intelligent observability pipelines that leverage anomaly detection and trend analysis.
- Leads the design and architecture of change management features and services in Azure Compute. Identifies dependencies and authors design documents for features and services.
- Works with appropriate stakeholders, applying technical expertise, to develop project plans, release plans, and work items. Develops high-quality, extensible, maintainable code and coaches others to do the same.
- Supports livesite as a Designated Responsible Individual (DRI), mentoring engineers across products and solutions and working on-call to monitor systems, products, and services for degradation, downtime, or interruptions.
- Proactively seeks new knowledge and adapts to new trends, technical solutions, and patterns that improve the availability, reliability, efficiency, observability, and performance of products. Drives consistency in monitoring and operations at scale and shares knowledge with other engineers.
- Collaborates with data scientists and ML engineers to design and integrate predictive models that proactively detect hardware anomalies and trigger live migrations, improving VM uptime and SLA compliance.
- Leads initiatives to embed AI-driven diagnostics and root cause analysis into availability services, reducing time-to-resolution for incidents and improving operational efficiency.
- Drives the adoption of generative AI tools to automate documentation, incident summaries, and engineering workflows, enhancing team productivity and knowledge sharing.
Qualifications
Required Qualifications
- Bachelor's Degree in Computer Science or related technical field AND 2+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
- OR equivalent experience.
Other Requirements
- Candidates must be able to meet Microsoft, customer, and/or government security screening requirements. These requirements include, but are not limited to, the following specialized security screenings:
- Microsoft Cloud Background Check: The successful candidate will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.
Preferred Qualifications
- Master's Degree in Computer Science or related technical field AND 3+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
- OR Bachelor's Degree in Computer Science or related technical field AND 5+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
- OR equivalent experience.
Software Engineering IC3 - The typical base pay range for this role across the U.S. is USD $100,600 - $199,000 per year. A different range applies to specific work locations within the San Francisco Bay Area and the New York City metropolitan area, where the base pay range for this role is USD $131,400 - $215,400 per year.
Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here:
https://careers.microsoft.com/us/en/us-corporate-pay
This position will be open for a minimum of 5 days, with applications accepted on an ongoing basis until the position is filled.
Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance with religious accommodations and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations.