IT AI System Administrator - AI Offerings
Lenovo
Software Engineering, IT, Data Science
Morrisville, NC, USA
Description and Requirements
Job Description
We are seeking a highly skilled Unix/Linux System Administrator to manage, maintain, and optimize the infrastructure supporting advanced Agentic AI systems built on the NVIDIA AI Enterprise platform. This role blends traditional Unix/Linux administration with modern AI/ML infrastructure operations and requires expertise in GPU-accelerated environments, automation, and distributed systems.
You will play a critical role in ensuring the reliability, scalability, and performance of autonomous AI agents and their supporting pipelines across on-premises and/or cloud environments.
Responsibilities:
System Administration & Infrastructure
- Administer and maintain Unix/Linux systems (RHEL, Ubuntu, or similar) in high-performance computing (HPC) and AI environments
- Manage GPU-enabled servers and clusters optimized for AI workloads
- Install, configure, and maintain software components within the NVIDIA AI Enterprise stack
- Monitor system performance, availability, and security across environments
Agentic AI Platform Operations
- Support deployment and lifecycle management of Agentic AI systems (autonomous agents, orchestration frameworks, inference pipelines)
- Ensure high availability and fault tolerance for AI agents operating in production
- Collaborate with AI engineers to optimize runtime environments for Agentic AI workloads
Automation & DevOps
- Develop and maintain automation scripts (Bash, Python) for provisioning, configuration, and monitoring
- Implement Infrastructure as Code (IaC) using tools such as Terraform or Ansible
- Support CI/CD pipelines for AI model deployment and updates
- Integrate observability tools (Prometheus, Grafana, ELK stack) for system and AI workload monitoring
Containerization & Orchestration
- Manage containerized workloads using Docker and Kubernetes
- Deploy and maintain GPU-aware Kubernetes clusters (e.g., NVIDIA GPU Operator)
- Optimize resource allocation for AI workloads across clusters
Security & Compliance
- Enforce system hardening, patching, and vulnerability management
- Implement access controls and secure configurations for AI environments
- Ensure compliance with organizational and regulatory standards
Required Qualifications:
- Bachelor's degree in Computer Science, Information Systems, or a related field (or equivalent experience)
- 5+ years of Unix/Linux system administration experience
- Hands-on experience with GPU infrastructure and drivers (CUDA, NVIDIA drivers)
- Experience with the NVIDIA AI Enterprise ecosystem or similar AI platforms
- Strong scripting skills (Bash, Python)
- Experience with containerization (Docker) and orchestration (Kubernetes)
- Ability to work independently and as part of a team
- Excellent verbal and written communication skills, including the ability to convey technical information to a non-technical audience
- Strong interpersonal skills and the ability to build relationships with stakeholders
If you are passionate about AI and advanced system administration and are looking for a challenging, rewarding role, we encourage you to apply.