Senior Software Engineer
Microsoft
The Azure High Performance Computing and AI Platform (HPC/AI) group is the team behind Azure’s cloud offering that powers some of the most demanding and largest-scale AI training and inference workloads in the industry. The virtual machine (VM) series that our team owns combine cutting-edge GPUs and accelerators with a state-of-the-art scale-out network infrastructure to enable these workloads. We collaborate with many Microsoft teams and our industry partners to design and bring up the underlying platform, and we build the software to expose this platform as an Azure service.
As a Senior Software Engineer in the Azure HPC/AI team, you will play a critical role in designing and delivering the next generations of our platform by solving technical problems at all levels of the stack, contributing to our codebases to enable new features on our VMs, working on architectural proposals, and collaborating with our industry partners.
This position involves deep technical work that primarily focuses on hardware/software interactions, device virtualization, and performance analysis of GPU workloads in VMs. Since our team is also responsible for the vertical integration of our VM offerings, you will also have the opportunity to work with upper layers of the Azure infrastructure software.
It is an exciting time for the team as we are working on expanding the capacity and range of supported scenarios to fuel the next growth wave. This position offers a unique opportunity to have a huge impact on Microsoft’s AI infrastructure and AI initiatives.
Responsibilities
- Analyzes functionality, integration, and performance issues at various levels of the HW/SW stack on current and future generations of AI training platforms.
- Designs and codes solutions that improve the functional correctness, stability, and performance of VM offerings and related services oriented toward AI training. When appropriate, drives internal partner teams or industry partners to implement such solutions.
- Optimizes, debugs, refactors, and reuses code to improve performance, maintainability, effectiveness, and return on investment (ROI). Applies metrics to drive the quality and stability of code, as well as appropriate coding patterns and best practices.
- Holds accountability as a Designated Responsible Individual (DRI) and collaborates with other engineers across products/solutions, working on call to monitor the system/product/service for degradation, downtime, or interruptions.
- Develops a playbook for the team to resolve issues.
- Maintains communication with key partners across the Microsoft ecosystem of engineers.
Qualifications
Required Qualifications:
- Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
- OR equivalent experience.
Other Requirements:
- Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings:
- Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.
Preferred Qualifications:
- Master's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
- OR Bachelor's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
- Familiarity with AI Infrastructure
- Familiarity with Operating Systems fundamentals and virtualization technologies
- Experience with Distributed Systems
- Experience with High Performance Computing / Machine Learning middleware
Software Engineering IC4 - The typical base pay range for this role across the U.S. is USD $119,800 - $234,700 per year. A different range applies to specific work locations within the San Francisco Bay area and the New York City metropolitan area, where the base pay range for this role is USD $158,400 - $258,000 per year.
Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here:
https://careers.microsoft.com/us/en/us-corporate-pay
This position will be open for a minimum of 5 days, with applications accepted on an ongoing basis until the position is filled.
Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance with religious accommodations and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations.