Lead Software Engineer - Production Management

JPMorganChase

Software Engineering
Mumbai, Maharashtra, India · Bengaluru, Karnataka, India
Posted on Mar 23, 2026

We have an opportunity to impact your career in the Production Management / Site Reliability Engineering team.

As a Lead Software Engineer at JPMorgan Chase within the Commercial & Investment Bank's Equities Technology team, you are an integral part of an agile team that ensures the stability, reliability, and performance of our production trading systems and market data infrastructure. You will bring deep technical expertise, a proactive approach to incident and problem management, and a forward-thinking mindset to drive automation and AI-driven operational excellence across the production environment.

Job Responsibilities

  • Oversee the end-to-end stability and availability of production systems supporting Markets trading desks across Equities. Lead incident management, root cause analysis, and problem resolution for critical production issues, ensuring minimal business impact.
  • Define and enforce SLAs, SLOs, and SLIs to measure and continuously improve system reliability.
  • Manage and optimize batch processing, real-time data flows, and upstream/downstream dependencies to ensure timely delivery of market data and index calculations. Drive capacity planning, performance tuning, and proactive monitoring to prevent service degradation.
  • Develop and maintain automation scripts and tools using Python and Unix/Linux shell scripting to streamline operational workflows, reduce manual intervention, and accelerate incident response.
  • Design and manage monitoring, alerting, and observability frameworks using Geneos, Splunk, and related tooling to provide real-time visibility into system health and performance. Administer and optimize MySQL databases, including query performance tuning, replication management, and data integrity validation.
  • Build and maintain dashboards and reporting solutions to provide actionable insights into production metrics, SLA adherence, and system trends.
  • Champion the integration of Artificial Intelligence and Machine Learning capabilities into production management processes, including predictive alerting, anomaly detection, and intelligent incident triage.
  • Evaluate and implement AI-driven tools to enhance monitoring, log analysis, and root cause identification, reducing mean time to detection and resolution. Leverage AI and automation to optimize runbook execution, event correlation, and capacity forecasting.
  • Collaborate with CDAO and engineering teams to integrate LLM-based solutions and AI assistants into operational workflows to improve efficiency and decision-making.
  • Act as a senior escalation point for critical production issues, coordinating across technology, business, and infrastructure teams to drive resolution. Partner closely with development, QA, and release management teams to ensure seamless production deployments and change management.
  • Mentor and guide junior team members, fostering a culture of operational excellence, continuous improvement, and knowledge sharing. Engage with business stakeholders across Markets trading desks to understand priorities, communicate production status, and align technology support with business objectives.

Required qualifications, capabilities, and skills

  • Formal training or certification in software engineering concepts and 5+ years of applied experience.
  • 12+ years of experience in Production Management, Site Reliability Engineering, or Infrastructure Engineering within financial services, preferably in a CIB Markets environment.
  • Strong proficiency in Python for scripting, automation, and tooling development.
  • Deep hands-on experience with Unix/Linux system administration and shell scripting.
  • Solid working knowledge of MySQL database administration, including performance tuning, replication, and troubleshooting.
  • Extensive experience with Geneos for real-time infrastructure and application monitoring.
  • Advanced proficiency in Splunk for log aggregation, search, alerting, and dashboard creation.
  • Demonstrated experience with incident, problem, and change management frameworks.
  • Strong understanding of trading systems, market data platforms, and batch/real-time processing architectures.
  • Knowledge of CI/CD pipelines, Spinnaker, and infrastructure-as-code practices.

Preferred qualifications, capabilities, and skills

  • Experience integrating AI/ML solutions into production operations, including predictive monitoring, intelligent alerting, and automated remediation.
  • Familiarity with LLM-based tools and platforms for operational use cases, and with messaging systems such as Kafka or MQ.
  • Experience with cloud platforms (AWS, Azure), containerized environments (Docker, Kubernetes), and APM tools such as Dynatrace or AppDynamics.
  • Exposure to orchestration and workflow tools such as Autosys, Control-M, or Airflow.


Carry out critical tech solutions across multiple technical areas as an integral part of an agile team