Lead Infrastructure Engineer
JPMorganChase
Other Engineering
Singapore
Posted on Mar 13, 2026
Assume a vital position as a key member of a high-performing team that delivers infrastructure and performance excellence. Your role will be instrumental in shaping the future at one of the world's largest and most influential companies.
As a Lead Infrastructure Engineer at JPMorganChase within the Infrastructure Platform, you apply deep knowledge of software, applications, and technical processes within the infrastructure engineering discipline. You continue to evolve your technical and cross-functional knowledge outside of your aligned domain of expertise.
Job responsibilities
- Lead production operations for critical services: act as incident commander for Priority 1/2 events, drive rapid restoration, clear communications, and post-incident reviews with owned, time-bound remediations.
- Own stability and resiliency improvements: implement and standardize patterns (timeouts/retries, circuit breakers, bulkheads, back-pressure, graceful degradation) and run failover/chaos exercises to validate recovery.
- Drive cross-platform architecture and modernization: partner with application, platform, and security teams to design and implement changes that reduce operational risk and improve reliability and performance.
- Deliver hands-on design, development, and troubleshooting for complex infrastructure issues; create durable fixes and automation that prevent recurrence and reduce manual toil.
- Manage workstreams end-to-end across one or more infrastructure domains (e.g., Kubernetes, Linux, networking, databases, cloud), ensuring clear scope, milestones, and measurable outcomes.
- Apply strong systems thinking: assess upstream/downstream dependencies and data flows; identify technical implications and advise on mitigation, rollout sequencing, and safe change strategies.
- Operate effectively in a 24/7 model: support on-call rotations, improve runbooks and diagnostics, and continually raise the bar on detection, alert quality, and response time.
Required qualifications, capabilities, and skills
- Bachelor’s degree in Computer Science, Cybersecurity, Data Science, or a related discipline.
- 5+ years of relevant infrastructure engineering experience, with increasing scope/ownership.
- Deep expertise in one or more core areas: compute and OS (Linux), networking, databases/storage, container orchestration, CI/CD and deployment practices, integration/automation, scaling, resiliency, and performance engineering.
- Strong observability and monitoring proficiency, including metrics, logs, distributed tracing, alerting, and SLO/SLA design.
- Demonstrated troubleshooting across heterogeneous platforms and services, with hands-on administration in Linux, middleware, and databases.
- Practical experience operating modern infrastructure stacks (Linux, Kubernetes, AWS, Terraform) and observability tooling such as Splunk, Grafana, Datadog, and AWS X-Ray.
- Database exposure with one or more of: Cassandra, Oracle, CockroachDB; ability to assess performance, capacity, and resilience trade-offs.
- Proficiency in scripting and software engineering for infrastructure (e.g., Bash, Python); ability to build automation, tooling, and integrations.
- Deep knowledge of cloud infrastructure and services across public and private clouds, including migration patterns and hybrid connectivity.
- Experience identifying and resolving production issues on public cloud platforms; ability to lead service improvement plans and problem management.
- Proven experience with LLM orchestration frameworks or custom agent runtimes; strong API design, reliability engineering, and end-to-end observability (tracing/metrics/logging); track record of delivering at least one agentic system to production with quantified impact (e.g., automation rate, latency, cost).
Preferred qualifications, capabilities, and skills
- Incident leadership: serves as incident commander for Sev1/Sev2 events, drives clear comms and rapid restoration, and ensures post-incident reviews with owned, time-bound remediations.
- SRE practices at scale: defines/enforces SLOs/SLIs and error budgets; improves on-call quality with actionable runbooks, sustainable alerting, and clear escalation paths.
- Observability and automation: advances metrics/logs/traces and synthetic probes; builds self-healing automation for diagnostics and common remediations.
J.P. Morgan is a global leader in financial services, providing strategic advice and products to the world’s most prominent corporations, governments, wealthy individuals and institutional investors. Our first-class business in a first-class way approach to serving clients drives everything we do. We strive to build trusted, long-term partnerships to help our clients achieve their business objectives.
We recognize that our people are our strength and the diverse talents they bring to our global workforce are directly linked to our success. We are an equal opportunity employer and place a high value on diversity and inclusion at our company. We do not discriminate on the basis of any protected attribute, including race, religion, color, national origin, gender, sexual orientation, gender identity, gender expression, age, marital or veteran status, pregnancy or disability, or any other basis protected under applicable law. We also make reasonable accommodations for applicants’ and employees’ religious practices and beliefs, as well as mental health or physical disability needs. Visit our FAQs for more information about requesting an accommodation.
Our professionals in our Corporate Functions cover a diverse range of areas from finance and risk to human resources and marketing. Our corporate teams are an essential part of our company, ensuring that we’re setting our businesses, clients, customers and employees up for success.
Carry out critical infrastructure engineering solutions across multiple technical areas as an integral part of an agile team.