TechOps - DE - CloudOps - Infra Support Engineer - Senior

Customer Service

Chennai, Tamil Nadu, India

Posted on Jan 29, 2026

At EY, you’ll have the chance to build a career as unique as you are, with the global scale, support, inclusive culture and technology to become the best version of you. And we’re counting on your unique voice and perspective to help EY become even better, too. Join us and build an exceptional experience for yourself, and a better working world for all.

Infra Support Engineer

The opportunity

As a Lead Infra Support Engineer, you will take end‑to‑end ownership of critical infrastructure operations while guiding a team of engineers to deliver high‑quality, reliable, SRE‑driven support.

You will act as a technical lead, provide architectural recommendations, drive automation at scale, mentor junior team members, proactively identify risks, and collaborate closely with cross‑functional stakeholders.

The role demands strong leadership, deep hands‑on expertise, and the ability to drive continuous improvement for environment stability, reliability, and operational maturity.

Your key responsibilities

Technical Leadership & Ownership

Lead environment management activities across multiple platforms (on‑prem & cloud).
Provide SME‑level guidance on architecture, provisioning, configuration, and operational best practices.
Own high‑priority, complex issues and ensure timely resolution and proper RCA with corrective actions.
Drive reliability improvements through automation, observability, and SRE practices.

Team Leadership & Mentoring

Mentor, coach, and develop L1/L2/L3 engineers on technical and process skills.
Conduct code reviews for automation scripts and configuration management workflows.
Promote knowledge sharing through KT sessions, runbooks, SOPs, and internal workshops.
Act as the escalation point for the team during major incidents.

Proactive Operations Management

Identify operational inefficiencies and lead optimization initiatives.
Drive proactive monitoring and trend analysis to identify reliability risks before failures occur.
Define and measure operational KPIs such as SLAs, SLOs, latency, uptime, and MTTR.

ITSM, Incident & Change Leadership

Lead Major Incident Management (MIM) calls for infra‑related issues.
Ensure adherence to change/process governance and quality documentation.
Champion Problem Management by driving deep RCA and preventive actions across teams.

Automation, SRE & Innovation

Architect and implement automation workflows using Shell/Bash/Python/Ansible.
Enhance observability by contributing to dashboards, alert rules, logs, and metrics.
Accelerate adoption of GitOps, config‑drift controls, self‑healing scripts, and infrastructure-as-code practices.

Stakeholder Management

Collaborate with platform owners, product teams, cybersecurity, and environment management teams.
Provide regular updates, operational insights, and reliability metrics to leadership.
Influence decision‑making with data‑driven recommendations.

Lifecycle, Modernization & Migration

Lead end‑of‑life remediation planning, cloud migration initiatives, and infrastructure modernization.
Drive dependency mapping, cutover planning, and post‑migration validation.
Maintain updated asset and environment inventories.

Skills and attributes for success

Technical Excellence

Advanced Linux/Unix administration with strong debugging and performance‑tuning skills.
Deep understanding of Cloud Infra (Azure/AWS/GCP) including networking, IAM, storage, and compute stacks.
Ability to design, review, and optimize provisioning workflows (cloud/on‑prem).
Expertise in automation using Shell, Bash, Python, and Ansible with code-quality standards.
Strong understanding of observability—logs, metrics, traces, dashboards, alert tuning.
Ability to identify systemic issues across infra layers and drive long‑term fixes.
Hands‑on experience improving reliability through SRE principles (SLO/SLI, error budgets, chaos testing).
Strong knowledge of patching strategies, vulnerability remediation, and compliance requirements.

Leadership & People Skills

Ability to lead a team of L1/L2/L3 engineers with clarity, direction, and accountability.
Strong mentoring and coaching mindset—helping team members grow in technical and soft skills.
Capable of conducting peer reviews, approving changes, and challenging designs in constructive ways.
Ability to manage escalations calmly and guide the team through critical incidents.
Skilled in conflict resolution and fostering collaboration across distributed teams.

To qualify for the role, you must have

6–9 years of experience in Infrastructure Support (Cloud & On‑Prem) with demonstrated leadership responsibilities
Advanced / SME-level Unix/Linux expertise in logs, performance tuning, configs, kernel parameters, and deep troubleshooting
Hands-on experience architecting & supporting Cloud platforms (Azure/AWS/GCP) beyond day-to-day operations
Strong automation engineering skills using Shell/Bash/Python and experience reviewing team scripts for quality
Proven experience applying SRE principles (SLO/SLI definition, error budgets, automation at scale, proactive reliability improvements)
Experience leading complex provisioning, patching, migration, and environment modernization activities
Ability to lead Major Incident calls, coordinate cross‑functional teams, and drive post‑incident RCAs
Strong ITSM expertise, capable of influencing process improvements across incident, problem & change functions
Demonstrated capacity planning, performance analysis, and optimization experience
Experience mentoring L1/L2 teams, conducting KT and knowledge-sharing sessions, and enforcing engineering discipline
Strong stakeholder management with ability to interact effectively with product owners, app teams, cloud teams & leadership
Experience driving automation, monitoring uplift, and operational excellence initiatives
Willingness to support a 24x7 follow‑the‑sun model and act as the escalation point during critical issues
No location constraints and ability to work with global teams

Technologies and Tools

Must have

Cloud Infra Certification(s) – Azure/AWS/GCP (Architect/Engineer level preferred)
Expert-level Linux/Unix Administration – deep kernel tuning, performance debugging, advanced filesystem analysis
Strong scripting & automation engineering – Shell, Bash, Python with reusable framework-level coding
Advanced Patching & Vulnerability Management – experience designing patch strategies, automating patch cycles, and leading compliance programs
Monitoring & Observability Leadership – building dashboards, creating alerting strategies, log pipeline optimization (Splunk, Prometheus, Grafana, ELK, Dynatrace preferred)
Automation & Configuration Management Expertise – architecting solutions using Ansible, Git/GitHub, CI/CD workflows
ITSM – ServiceNow/Jira/Azure DevOps with ability to lead MIM calls and review RCAs
Infrastructure & Virtualization Depth – VMware vSphere, storage performance tuning, backup strategy design
Strong Networking Knowledge – advanced troubleshooting (TCP/IP, DNS, routing, firewalls, load balancers; tools like tcpdump, nmap, traceroute)
Experience with infra modernization – cloud migrations, refactoring, automation of legacy environments
Hands-on experience with infra-as-code concepts (GitOps, IaC fundamentals)
SRE Practitioner Certification or real-world SRE implementation exposure
Container & orchestration exposure – Docker, Kubernetes, AKS/EKS (design-level understanding preferred)

Good to have

Advanced networking certifications
ITIL Intermediate or Practitioner-level knowledge in addition to Foundation
Advanced Linux/Unix certifications (RHCE, LFCS, RHCSA)
Ansible Automation Expert / Terraform experience
Experience with security tooling – endpoint protection, compliance scanners, SIEM/SOAR familiarity
Cloud-native tooling exposure – Helm, Lambda, Functions, cloud monitoring stacks
Experience designing monitoring standards and operational playbooks for teams

What we look for

Enthusiastic learners with a passion for cloud technologies and DevOps practices.
Problem solvers with a proactive approach to troubleshooting and optimization.
Team players who can collaborate effectively in a remote or hybrid work environment.
Detail-oriented professionals with strong documentation skills.

What we offer

EY Global Delivery Services (GDS) is a dynamic and truly global delivery network. We work across six locations – Argentina, China, India, the Philippines, Poland and the UK – and with teams from all EY service lines, geographies and sectors, playing a vital role in the delivery of the EY growth strategy. From accountants to coders to advisory consultants, we offer a wide variety of fulfilling career opportunities that span all business disciplines. In GDS, you will collaborate with EY teams on exciting projects and work with well-known brands from across the globe. We’ll introduce you to an ever-expanding ecosystem of people, learning, skills and insights that will stay with you throughout your career.

Continuous learning: You’ll develop the mindset and skills to navigate whatever comes next.
Success as defined by you: We’ll provide the tools and flexibility, so you can make a meaningful impact, your way.
Transformative leadership: We’ll give you the insights, coaching and confidence to be the leader the world needs.
Diverse and inclusive culture: You’ll be embraced for who you are and empowered to use your voice to help others find theirs.

EY | Building a better working world

EY exists to build a better working world, helping to create long-term value for clients, people and society and build trust in the capital markets.

Enabled by data and technology, diverse EY teams in over 150 countries provide trust through assurance and help clients grow, transform and operate.

Working across assurance, consulting, law, strategy, tax and transactions, EY teams ask better questions to find new answers for the complex issues facing our world today.
