Principal Machine Learning Engineer - CoreAI

Microsoft

Software Engineering
USD 163,000-296,400 / year
Posted on Jul 12, 2025

Redmond, Washington, United States

Date posted: Jul 11, 2025
Job number: 1845974
Work site: Microsoft on-site only
Travel: 0-25%
Role type: Individual Contributor
Profession: Research, Applied, & Data Sciences
Discipline: Research Sciences
Employment type: Full-Time

Overview

The Microsoft CoreAI Post-Training team is dedicated to advancing post-training methods for both OpenAI and open-source models. Their work encompasses continual pre-training, large-scale deep reinforcement learning running on extensive GPU resources, and significant efforts to curate and synthesize training data. In addition, the team employs various fine-tuning approaches to support both research and product development.

The team also develops advanced AI technologies that integrate language and multi-modality for a range of Microsoft products. It is particularly active in developing code-specific models, including those used in GitHub Copilot and Visual Studio Code, such as code completion models and software engineering (SWE) agent models.

As by-products of this work, the team has also produced publications and releases such as LoRA, DeBERTa, Oscar, Rho-1, Florence, and the open-source Phi models.

We are looking for a Principal Machine Learning Engineer - CoreAI with significant experience in large-scale model deployment, production systems, and engineering excellence, ideally from leading technology companies. You will build and optimize production systems for LLMs, SLMs, multimodal, and coding models using both proprietary and open-source frameworks. Key responsibilities include ensuring model reliability, inference performance, and scalability in production environments, and managing the full engineering pipeline from model training and serving through monitoring to continuous deployment.

Our team values startup-style efficiency and practical problem-solving. We are seeking a curious, adaptable problem-solver who thrives on continuous learning, embraces changing priorities, and is motivated by creating meaningful impact. Candidates must be self-driven, able to write production-grade code and debug complex distributed systems, document engineering decisions, and demonstrate a track record in shipping ML systems at scale. The ability to quickly translate ideas into working code for rapid experimentation would be a plus. You may include information about any individual who can serve as your referral in your application.

Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

In alignment with our Microsoft values, we are committed to cultivating an inclusive work environment for all employees to positively impact our culture every day.

Qualifications

Required/Minimum Qualifications

  • Doctorate in relevant field AND 3+ years related research experience
    • OR equivalent experience
  • 3+ years of coding experience in Python and experience with ML frameworks such as PyTorch and Triton
  • 3+ years of proven ability to design and scale training infrastructure and pipelines in production environments
  • 3+ years of experience in production ML systems, especially in fine-tuning LLMs, SLMs, multimodal, or code-specific models
  • 3+ years of expertise in system architecture and scalability, designing and implementing large-scale distributed systems, including techniques such as batching, caching, load balancing, and model parallelism
  • 3+ years of proficiency in building and maintaining ML production pipelines

Other Requirements:

The ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings:

  • Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.

Preferred/Additional Qualifications

  • Proven track record of shipping generative AI products, preferably at leading technology companies, with demonstrated impact on production systems
  • Extensive experience with generative AI infrastructure, including model training, inference, quantization, deployment, and performance optimization for foundation models
  • Hands-on experience with large-scale distributed systems - systems thinking and experience with high-availability, low-latency inference services
  • Proficiency in containerization (Docker / Kubernetes)
  • Experience with MLOps and DevOps practices, CI/CD pipelines, automated testing, deployment strategies, and infrastructure as code
  • Demonstrated ability to work in cross-functional teams and collaborate effectively with researchers, product managers, and other engineers to deliver complex ML solutions
  • Leadership and influence with the ability to lead projects and influence others across teams and disciplines
  • AI-forward approach with a demonstrated willingness to incorporate AI tools in day-to-day work to enhance productivity and innovation
  • Self-driven and organized with the ability to take ownership of projects and document findings clearly and effectively
  • Startup-style mindset - agile, solution-oriented, and able to operate with minimal overhead

Research Sciences IC6 - The typical base pay range for this role across the U.S. is USD $163,000 - $296,400 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $220,800 - $331,200 per year.

Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here: https://careers.microsoft.com/us/en/us-corporate-pay

Microsoft posts positions for a minimum of 5 days, with applications accepted on an ongoing basis until the position is filled.

Responsibilities

Core Qualifications & Responsibilities

  • Large-scale model training - Design and implement large-scale training, especially for LLMs, SLMs, multimodal, or code-specific models.
  • System optimization and performance - Optimize latency, throughput, and resource utilization for training and inference workloads.
  • Hands-on coding - Write efficient, production-quality code and debug complex distributed systems.
  • Cross-platform integration - Work with both proprietary and open-source frameworks to build robust ML pipelines.
  • Production model deployment - Design and implement scalable serving infrastructure for LLMs, SLMs, multimodal, and code-specific models.

Research & Innovation

  • Contribute to or build on existing innovations, such as the technical reports of well-known models.
  • Develop novel AI solutions that bridge language, vision, and code understanding.
  • Help develop models powering tools like GitHub Copilot, Cursor, and VS Code suggestions.

Other:


Benefits/perks listed below may vary depending on the nature of your employment with Microsoft and the country where you work.

  • Industry-leading healthcare
  • Educational resources
  • Discounts on products and services
  • Savings and investments
  • Maternity and paternity leave
  • Generous time away
  • Giving programs
  • Opportunities to network and connect

Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations.