AI Infra Engineer
Lenovo
Software Engineering, Data Science
Morrisville, NC, USA
Description and Requirements
Job Summary:
The Sustainable Computing Research Team in Lenovo’s CTO (Chief Technology Office) organization is seeking an AI Infra Engineer to support the delivery of cutting-edge, energy-efficient product offerings. The team focuses on software-hardware co-design for energy-efficient computing clusters, covering DVFS (dynamic voltage and frequency scaling), intelligent task scheduling, liquid cooling optimization, and more. Our primary internal customer is Lenovo’s Infrastructure Solutions Group.
In this role, you will act as the bridge between our R&D team and Lenovo’s global business teams. Your core mission is to localize and adapt our sustainable computing technologies, integrate them into Lenovo AI infra & device products, and support business deployment. You will also represent Lenovo in industry alliances such as OCP (Open Compute Project) to track and influence cutting-edge trends in green computing.
This position requires a hybrid schedule of three days onsite and two days remote per week.
Responsibilities:
- Adapt and optimize the team’s existing energy-saving solutions (DVFS, scheduling, liquid cooling control, etc.) for AI clusters.
- Conduct cutting-edge research on energy-efficient management of heterogeneous computing clusters.
- Build power monitoring, predictive analytics and energy-efficient software solutions for AI Infra.
- Work closely with cross-functional worldwide teams to transfer R&D outcomes into commercial products and solutions.
- Represent Lenovo in industry alliances such as OCP (Open Compute Project) to track and influence cutting-edge trends in sustainable computing.
Required Qualifications:
- Background: DCIM (Data Center Infrastructure Management) or Linux kernel software development experience.
- Hands-on experience in at least one of the following areas:
  - Data center operations or AI infrastructure monitoring
  - GPU/CPU energy efficiency optimization (e.g., power capping, DVFS, frequency scaling)
  - Benchmarking and evaluation frameworks such as MLPerf and Energy Star
  - BMC, RMC (Rack Management Controller), or Redfish development
  - Inference framework optimization (e.g., vLLM, SGLang)
  - AI workload characterization and performance analysis
- Programming skills: Proficiency in Golang, Python, or similar languages.
- Communication: Strong verbal and written communication skills in English; ability to work effectively with both technical and business stakeholders across regions.
Preferred Qualifications:
- A bachelor’s or master’s degree in Computer Science, Computer Engineering, Electrical Engineering, or a related field.
- Experience with sustainable computing / green IT projects (e.g., power-aware scheduling, thermal management, liquid cooling integration).
- Direct experience with NVIDIA GPU software stacks (CUDA, NVML, DCGM, etc.).
- Familiarity with OCP standards or contributions to open-source energy-efficiency frameworks.
- Excellent problem-solving skills and a passion for tackling complex, open-ended challenges.
- Fluency in Mandarin (preferred but not required).