Site Reliability Engineer II - AWS, Incident Response, Automation, Observability

JPMorganChase

Software Engineering

Hyderabad, Telangana, India

Posted on May 15, 2026

Join a team where your SRE expertise drives critical application reliability and operational excellence. Grow your skills in a collaborative, innovative environment.

As a Site Reliability Engineer at JPMorganChase within the Chief Technology Office team, you will manage and optimize production operations for critical applications. You will leverage your AWS and SRE skills to ensure service stability, performance, and resilience. You will collaborate with engineering and security teams to deliver secure, reliable solutions. Your contributions will help us maintain a robust and thriving operating environment.

Job responsibilities

Manage and support production operations for critical applications, ensuring stability and predictable performance
Proactively monitor health signals, identify risks, and prevent incidents
Execute operational routines including release readiness, change coordination, and controlled rollouts
Lead or participate in incident triage, recovery, communications, and post-incident reviews with clear root cause analysis and follow-up actions
Drive problem management to eliminate repeat incidents
Build and maintain dashboards, alerts, and operational documentation for improved detection and diagnosis
Automate manual operational tasks and improve tooling using scripting or coding (Python, Bash, Go)
Define and track SLIs/SLOs, manage error budgets, and partner with development teams to improve reliability
Perform capacity planning, resilience testing, and performance tuning

Required qualifications, capabilities and skills

Formal training or certification on security engineering concepts and 5+ years of applied experience
Experience supporting critical application production environments with strong operational discipline
Strong troubleshooting skills across Linux, application behavior, and networking fundamentals
Hands-on experience operating and diagnosing issues in AWS environments
Solid working knowledge of AWS IAM and access control best practices
Experience with observability tools (monitoring, logging, alerting)
Automation mindset with scripting/coding capability (Python, Bash, Go) and familiarity with CI/CD practices
Clear communication during incidents and strong documentation habits

Preferred qualifications, capabilities and skills

Experience with tracing tools for observability
Familiarity with resilience testing and performance tuning in cloud environments
Knowledge of operational security requirements and credential hygiene
Experience collaborating with platform and engineering teams

Site Reliability Engineer ensuring stable, resilient, and high-performing production services through AWS and SRE best practices.
