Site Reliability Engineer - PTA Production Engineering, VP

Deutsche Bank

Software Engineering
Bengaluru, Karnataka, India
Posted on Feb 13, 2026

Job Description:

Job Title: Site Reliability Engineer – PTA Production Engineering, VP

Location: Bangalore, India

Role Description

  • We are seeking a strong, senior-calibre Site Reliability Engineer with expertise in the Investment Banking domain, preferably Cash and FX Settlements, and hands-on experience delivering reliability, performance, and architectural modernization across mission-critical post-trade platforms.
  • This role demands a technologist who combines strong engineering depth, settlement domain understanding, and architectural influence, and who can elevate reliability and efficiency across both modern strategic platforms and tightly coupled legacy systems.
  • As an SRE embedded within Production Engineering, you will serve as a key technical authority: shaping architecture, engineering automated resilience, driving cloud-native transformation, reducing operational complexity, eliminating chronic failure modes, and embedding SRE best practices directly into the Post Trade ecosystem to ensure reliability and operational excellence.

What we’ll offer you

As part of our flexible scheme, here are just some of the benefits that you’ll enjoy:

  • Best in class leave policy
  • Gender-neutral parental leave
  • 100% reimbursement under childcare assistance benefit (gender neutral)
  • Sponsorship for Industry relevant certifications and education
  • Employee Assistance Program for you and your family members
  • Comprehensive Hospitalization Insurance for you and your dependents
  • Accident and Term life Insurance
  • Complimentary health screening for employees aged 35 and above

Your key responsibilities

Reliability Engineering & Development:

  • Engineer high-impact reliability improvements across Post Trade systems, ensuring availability, consistency, latency control, and operational predictability in a global, multi-entity environment.
  • Define, implement, and continuously refine Service Level Objectives (SLOs) and Service Level Indicators (SLIs) that accurately reflect settlement timeliness, straight-through-processing, operational risk, and business priority.
  • Design and embed advanced resilience patterns (auto-scaling, adaptive throttling, circuit breaking, bulkheading, retry/backoff strategies, and distributed failover) directly into application and platform components.
  • Aggressively eliminate toil by engineering automated, self-healing workflows for deployment, validation, reconciliation, incident response, break detection, and system health management—materially improving MTTR and change stability.
  • Lead complex root-cause investigations across hybrid architectures; translate findings into permanent code-based remediations that reduce repeat incidents and systemic fragility.
  • Champion modern SRE practices, including GitOps pipelines, full-stack telemetry with OpenTelemetry, distributed tracing, chaos/failure-mode testing, and reliability-driven development.

CI/CD Engineering & Automation Development:

  • Develop high-quality automation frameworks, provisioning workflows, environment consistency checks, and release-validation tooling to eliminate manual change execution and reduce operational variance.

Your skills and experience

  • Bachelor’s or Master’s degree in Computer Science, Engineering, or related technical field
  • Programming experience with Java, Python, or Shell would be a plus (preferably with production-grade reliability, resiliency, or real-time workflow implementations)
  • Proven expertise in reliability engineering, automation, and cloud-native transformation within complex financial systems.
  • Knowledge of Google Cloud Platform would be a strong advantage
  • Expertise in implementing advanced resilience patterns and self-healing workflows to improve system reliability and reduce operational toil.
  • Experience with modern SRE practices: GitOps, full-stack telemetry, distributed tracing, chaos engineering, and reliability-driven development.
  • Exceptional analytical and problem-solving skills for root-cause investigations and permanent remediation of systemic issues
  • Experience in ITIL Incident Management and Problem Management processes will be a big plus
  • Experience with cloud technologies (GCP, Azure, and/or AWS) will be a plus

Soft Skills:

  • Exceptional analytical, diagnostic, and problem-solving capability, especially in high-pressure, high-stakes production environments.
  • Excellent communication skills, including the ability to drive cross-team alignment and influence architectural direction.
  • High-ownership mindset with the ability to make reasoned, data-driven decisions and drive long-term improvements.
  • Strong adaptability within fast-paced, agile engineering environments.
  • Collaborative, confident, and passionate about quality, craftsmanship, learning, and mentoring.

How we’ll support you

  • Training and development to help you excel in your career
  • Coaching and support from experts in your team
  • A culture of continuous learning to aid progression
  • A range of flexible benefits that you can tailor to suit your needs

About us and our teams

Please visit our company website for further information:

https://www.db.com/company/company.html

We strive for a culture in which we are empowered to excel together every day. This includes acting responsibly, thinking commercially, taking initiative and working collaboratively.

Together we share and celebrate the successes of our people. Together we are Deutsche Bank Group.

We welcome applications from all people and promote a positive, fair and inclusive work environment.