Advisory AI Infrastructure Engineer

Lenovo

Software Engineering, Other Engineering, Data Science

Edinburgh, UK

Posted on Jun 22, 2026

General Information

Req #
WD00100762
Career area:
Hardware Engineering
Country/Region:
United Kingdom
City:
Edinburgh
Date:
Thursday, June 11, 2026
Working time:
Full-time
Additional Locations:
* United Kingdom

Why Work at Lenovo

We are Lenovo. We do what we say. We own what we do. We WOW our customers.
Lenovo is a US$83 billion revenue global technology powerhouse, ranked #196 in the Fortune Global 500, and serving millions of customers every day in 180 markets. Focused on a bold vision to deliver Smarter Technology for All, Lenovo has built on its success as the world’s largest PC company with a full-stack portfolio of AI-enabled, AI-ready, and AI-optimized devices (PCs, workstations, smartphones, tablets), infrastructure (server, storage, edge, high performance computing and software defined infrastructure), software, solutions, and services. Lenovo’s continued investment in world-changing innovation is building a more equitable, trustworthy, and smarter future for everyone, everywhere. Lenovo is listed on the Hong Kong Stock Exchange under Lenovo Group Limited (HKSE: 992) (ADR: LNVGY).
To find out more, visit www.lenovo.com and read about the latest news via our StoryHub.

Description and Requirements

Lenovo is seeking an accomplished Advisory AI Infrastructure Engineer to take a leadership role within our Advanced AI Technology Center. In this position, you will architect, build, and evolve the large-scale infrastructure and platforms that underpin AI model development, deployment, and operation across the organization. You will serve as a technical authority on infrastructure strategy, mentor junior engineers, drive best practices, and collaborate cross-functionally with research, engineering, and product teams. Your deep expertise will be instrumental in scaling our AI capabilities and ensuring production-grade reliability, performance, and security. If you are passionate about delivering Smarter Technology for All, come help us realize our Hybrid AI vision!

Responsibilities

  • AI Infrastructure Strategy and Architecture: Define and drive the long-term infrastructure strategy for AI workloads. Architect scalable, cost-efficient, and resilient compute, storage, and networking solutions that support the full AI lifecycle, from experimentation through production.
  • Technical Leadership and Mentorship: Serve as a technical lead and mentor to infrastructure engineers, establishing engineering standards, conducting design reviews, and fostering a culture of operational excellence across the team.
  • AI Model Deployment and Platform Engineering: Design and own the platforms, frameworks, and processes for deploying, monitoring, and managing AI models at scale in production environments, including model serving, A/B testing infrastructure, and rollback mechanisms.
  • Advanced Automation and CI/CD: Architect and maintain sophisticated automation pipelines for AI model training, evaluation, testing, and deployment. Champion infrastructure-as-code practices and drive continuous improvement of CI/CD workflows.
  • Cross-Functional Collaboration: Partner with data scientists, ML engineers, product managers, and leadership to align infrastructure capabilities with research and business objectives. Act as the primary infrastructure point of contact for cross-team initiatives.
  • Performance Engineering: Lead efforts to profile, benchmark, and optimize AI infrastructure and model serving for throughput, latency, GPU utilization, scalability, and cost efficiency at scale.
  • Security, Compliance, and Governance: Establish and enforce security best practices, access controls, and compliance frameworks across AI infrastructure. Ensure adherence to relevant regulatory requirements and internal governance policies.
  • Capacity Planning and Cost Management: Own capacity forecasting, resource planning, and cost optimization for GPU clusters and associated infrastructure, balancing performance needs with budget constraints.

Qualifications

  • Bachelor’s or Master’s degree in Computer Engineering, Electrical Engineering, Computer Science, or a related field. Advanced degree preferred.
  • 8+ years of experience in software engineering, infrastructure engineering, DevOps/SRE, or a related field, with at least 4 years focused on AI/ML infrastructure.
  • Demonstrated experience leading or mentoring engineering teams on infrastructure projects.
  • Deep expertise in computer systems, distributed systems, and cloud computing architectures.
  • Extensive experience designing, deploying, and managing multi-node distributed GPU clusters using Slurm, Kubernetes, or equivalent orchestration platforms.
  • Expert-level Linux system administration, including package management, user/group management, file system internals, shell scripting (e.g., Bash), networking, and system configuration (e.g., systemd, kernel tuning).
  • Strong proficiency in Python and at least one additional systems language (e.g., Go, C++, Rust, Java).
  • Deep experience with AI-specific hardware and software stacks (e.g., NVIDIA GPUs, CUDA, cuDNN, NCCL, InfiniBand/RoCE networking).
  • Proven track record managing high-performance computing (HPC) environments, including job scheduling, resource allocation, cluster maintenance, and performance tuning.
  • Significant experience with AI model deployment, serving infrastructure, and lifecycle management in production.
  • Strong architectural thinking with the ability to balance trade-offs across performance, reliability, security, and cost.
  • Excellent communication and collaboration skills, with the ability to influence technical decisions across teams and present to senior leadership.
  • Ability to thrive in a fast-paced, ambiguous environment and drive clarity through technical leadership.

Bonus Points

  • Experience with AI and machine learning frameworks at scale (e.g., PyTorch, DeepSpeed, Megatron-LM, vLLM).
  • Hands-on experience with major cloud platforms (e.g., AWS, GCP, Azure) and hybrid cloud architectures.
  • Advanced experience with containerization (Docker) and orchestration (Kubernetes), including custom operators, Helm charts, and GPU scheduling plugins.
  • Expertise in observability stacks (e.g., Prometheus, Grafana, ELK/OpenSearch, Datadog) for infrastructure and model monitoring.
  • Experience with infrastructure-as-code tools (e.g., Terraform, Ansible, Pulumi).
  • Contributions to open-source infrastructure or ML tooling projects.
  • Experience with large language model (LLM) training and inference infrastructure.


We are an Equal Opportunity Employer and do not discriminate against any employee or applicant for employment on the basis of race, color, sex, age, religion, sexual orientation, gender identity, national origin, veteran status, disability, or any other federal, state, or local protected class.

AI PROCESSING NOTICE
We use AI-based tools to support some of our processes (e.g., recordings and transcripts of online interviews) to improve efficiency and accuracy and for documentation purposes. AI can make mistakes, so we always ensure that its outputs are manually reviewed by a human. You can opt out at any time or contact us if you have any questions.