Principal Architect, GPU Tools & Diagnostics

Lenovo

IT
Morrisville, NC, USA
Posted on Dec 23, 2024

General Information

Req #: WD00076807
Career area: Hardware Engineering
Country/Region: United States of America
State: North Carolina
City: Morrisville
Date: Monday, December 23, 2024
Working time: Full-time
Additional Locations:
* United States of America - North Carolina - Morrisville

Why Work at Lenovo

We are Lenovo. We do what we say. We own what we do. We WOW our customers.
Lenovo is a US$57 billion revenue global technology powerhouse, ranked #248 in the Fortune Global 500, and serving millions of customers every day in 180 markets. Focused on a bold vision to deliver Smarter Technology for All, Lenovo has built on its success as the world’s largest PC company with a full-stack portfolio of AI-enabled, AI-ready, and AI-optimized devices (PCs, workstations, smartphones, tablets), infrastructure (server, storage, edge, high performance computing and software defined infrastructure), software, solutions, and services. Lenovo’s continued investment in world-changing innovation is building a more equitable, trustworthy, and smarter future for everyone, everywhere. Lenovo is listed on the Hong Kong stock exchange under Lenovo Group Limited (HKSE: 992) (ADR: LNVGY).
To find out more, visit www.lenovo.com and read about the latest news via our StoryHub.

Description and Requirements

Lenovo’s Infrastructure Solutions Group (ISG) is seeking an experienced platform RAS, diagnostics, and software architect to define, design, and implement RAS and monitoring solutions for GenAI servers. The ideal candidate will have a deep understanding of GPU and ARM architectures, RAS principles, and system monitoring considerations.

Job Responsibilities:

  • Lead the architecture, design, and development of GPU tools for diagnostics, debugging, and performance analysis across a variety of GPU hardware and software environments.
  • Review current and future technology roadmaps to identify serviceability and supportability requirements for GPU tool development.
  • Work closely with cross-functional teams, including GPU hardware engineers, software developers, and system architects, to integrate diagnostic tools into GPU subsystems, identify and troubleshoot complex issues, and optimize serviceability.
  • Work with our technology partners to understand their products and support capabilities, enhance support for partners and Lenovo’s end customers, and facilitate knowledge transfer to the support organization.
  • Establish a win-win relationship with technology partners to identify and drive operational (vs. design validation) support solutions that address the needs of end customers.
  • Conduct market research and competitive support benchmarking.
  • Keep up with industry standards and best practices for serviceability and supportability of new technologies.
  • Write scripts or code to automate serviceability tasks and develop diagnostic tools.

Basic Requirements:

  • 10+ years of relevant industry experience in GPU or CPU architecture and networking protocols (or other equivalent experience).
  • 5+ years of experience in designing software and firmware for various compute environments.
  • Ph.D. or MS in Computer Science, Electrical Engineering, or Computer Engineering, or equivalent experience.

Preferred Requirements:

  • Expertise in designing and building debugging and diagnostic tools for CPU/GPU subsystems.
  • Background in computer architecture, graphics algorithms, and parallel processing.
  • Comprehensive knowledge of server components, operating systems, and network protocols.
  • Strong hands-on programming in C, C++, Perl, and Python, plus GPU programming languages and APIs such as CUDA, OpenCL, or Vulkan.
  • Expertise in current data center operating systems and software development methodologies.
  • Proficiency in high performance computing, DDR, PCIe, and communication protocols such as Redfish, I2C, SPI, and MDIO.
  • Strong technical documentation skills and excellent written and verbal communication skills.

We are an Equal Opportunity Employer and do not discriminate against any employee or applicant for employment because of race, color, sex, age, religion, sexual orientation, gender identity, national origin, status as a veteran, and basis of disability or any federal, state, or local protected class.
