Network Solutions Architect, AI Factory Services
Lenovo
Software Engineering, IT, Data Science
Morrisville, NC, USA
Description and Requirements
Lenovo seeks a Network Solutions Architect to join the Hybrid Cloud Solutions and AI Offering Engineering team within Lenovo's Solutions and Services Group (SSG). This senior-level role designs, deploys, and validates high-performance network infrastructure for Lenovo's AI Factory and GigaFactory service offerings. It covers both enterprise-scale and rack-scale GPU environments built on NVIDIA Spectrum-X Ethernet and InfiniBand fabrics. The architect produces network reference architectures, deployment runbooks, and field-ready engineering documentation used directly by Lenovo Professional Services (PS) and Managed Services (MS) teams worldwide. The scope includes GPU cluster fabric design, BlueField DPU offload architectures, multi-tenant network isolation, and performance-validated RDMA and RoCEv2 configurations for AI and HPC workloads.
The ideal candidate brings expert-level depth in NVIDIA Spectrum-X, Quantum InfiniBand, and BlueField DPU architectures. They also have proven experience designing networks for large-scale GPU clusters, high-density liquid-cooled environments, and multi-tenant AI Factory deployments. This role is based in Morrisville, NC, and includes some customer-facing responsibilities.
In this role, you will:
- Design GPU cluster network architectures for NVIDIA AI Factory environments. Enterprise deployments use Spectrum-X Ethernet (Spectrum-4 SN5600 switches with BlueField-3 DPUs). Rack-scale GigaFactory deployments use Q3400 XDR InfiniBand with Spectrum-4 SN5600 Ethernet.
- Develop and validate RDMA/RoCEv2 configuration guides and deployment runbooks for AI and HPC workloads on NVIDIA GPU clusters.
- Architect multi-tenant network designs for NeoCloud and enterprise AI Factory customers, aligned with the NVIDIA Cloud Partner Reference Architecture. This includes namespace isolation, east-west traffic segmentation, and site resiliency.
- Design BlueField-3 DPU deployment architectures that use the NVIDIA DOCA framework for network function offload, security services, and service mesh in multi-tenant GPU environments.
- Define rail-optimized InfiniBand topologies and configure Spectrum-X Adaptive Routing, SHARP in-network computing, and congestion control for AI training and inference workloads.
- Develop network architectures for high-density, liquid-cooled rack-scale GPU deployments that use power-aware design patterns to optimize performance.
- Create operational runbooks for network provisioning, troubleshooting, and performance validation, consumed directly by PS and MS field teams.
- Publish network performance benchmarks, including RDMA throughput and RoCEv2 latency baselines, to support field delivery quality and pre-sales sizing.
- Collaborate with Infrastructure Solutions Group (ISG) product and engineering teams to validate fabric designs against NVIDIA Enterprise Reference Architectures and hardware roadmaps.
- Stay current with emerging NVIDIA networking technologies and contribute to Lenovo's AI infrastructure innovation roadmap.
Required qualifications:
- Bachelor's or Master's degree in Computer Science, Electrical Engineering, or a related field.
- 5+ years of experience designing and deploying high-performance network infrastructure for AI, HPC, or large-scale GPU cluster environments.
- Expert-level technical depth in NVIDIA networking: the Spectrum-X platform (Spectrum-4 SN5600 Ethernet), Quantum InfiniBand (HDR/NDR/XDR), BlueField DPUs, the DOCA SDK, and RoCE/RDMA.
- Deep knowledge of AI fabric design: rail-optimized InfiniBand topologies, Adaptive Routing, SHARP in-network computing, and congestion control for optimizing GPU-to-GPU traffic.
- Proficiency in multi-tenant network design: namespace isolation, east-west traffic segmentation, and NeoCloud- or MSP-scale network architecture.
- Strong understanding of routing, switching, and overlay protocols: BGP, OSPF, EVPN, VXLAN, MPLS, IS-IS.
- Demonstrated ability to produce field-ready architecture documentation, deployment runbooks, and technical content consumed by PS and MS delivery teams.
- Cross-functional collaboration skills across product engineering, ISG, and service delivery organizations.
Preferred qualifications:
- NVIDIA Certified Networking Professional or InfiniBand Specialist certification.
- Cisco CCIE/CCNP Data Center or Enterprise Infrastructure certification.
- Experience with NVIDIA Base Command Manager for cluster network management and monitoring integration.
- Familiarity with Omniverse DSX Boost for power-aware network design in high-density GPU environments.
- Experience with network automation and Infrastructure-as-Code tooling, such as Terraform, Ansible, Python, and REST APIs.
- Knowledge of multi-tenant NeoCloud or managed service provider network design patterns.
- Understanding of compliance and security frameworks (SOC 2, FedRAMP, TIC 3.0) and secure network segmentation practices.
- Familiarity with DevOps and observability platforms for network health monitoring, telemetry, and capacity planning.