Production Support Specialist, AVP
Deutsche Bank
Job Description:
Job Title: Production Support Specialist (SL3)
Corporate Title: Associate Vice President
Location: Pune, India
Role Description
- We are seeking a highly skilled and experienced Senior Production Support Specialist to join our dynamic Production Operations team. This critical role provides expert-level (L3) technical support, troubleshooting, and incident resolution for our complex, strategic global banking platform, which supports backend functions and infrastructure in a fast-paced environment.
- The ideal candidate should possess a strong technical background across various domains, a keen problem-solving attitude, excellent analytical skills, and the ability to operate autonomously while collaborating effectively with development, infrastructure, and business teams. This role demands proactive identification of issues, root cause analysis, and the implementation of permanent solutions to ensure optimal system performance and reliability.
What we’ll offer you
As part of our flexible scheme, here are just some of the benefits that you’ll enjoy:
- Best-in-class leave policy
- Gender-neutral parental leave
- 100% reimbursement under childcare assistance benefit (gender-neutral)
- Sponsorship for industry-relevant certifications and education
- Employee Assistance Program for you and your family members
- Comprehensive hospitalization insurance for you and your dependents
- Accident and term life insurance
- Complimentary health screening for employees aged 35 and above
Your key responsibilities
- Incident Management (L3 Support):
- Serve as the primary escalation point for complex production incidents, providing expert-level diagnosis and resolution for critical issues that cannot be resolved by L1/L2 teams.
- Lead incident resolution efforts, coordinating with multiple teams (Dev, QA, Infra, Network) to restore service rapidly and minimize business impact.
- Perform in-depth root cause analysis (RCA) for major incidents, identifying underlying technical problems and proposing long-term preventative measures.
- Participate in a 24/7 on-call rotation to provide support for critical production systems.
- Problem Management:
- Identify recurring issues and systemic problems, working collaboratively with development teams to implement permanent fixes and architectural improvements.
- Proactively monitor system health, performance, and trends to identify potential issues before they impact users.
- System Health & Performance:
- Utilize monitoring tools (e.g., Splunk, Grafana, ELK, Prometheus, Datadog) to analyze system performance, identify bottlenecks, and ensure optimal resource utilization.
- Develop and refine monitoring alerts and dashboards to provide early warnings for potential issues.
- Optimize application performance and stability through configuration tuning, code analysis, and infrastructure recommendations.
- Technical Expertise & Mentorship:
- Maintain deep technical expertise in the technologies underpinning the platform (e.g., Java/Spring, .NET, Python, SQL, NoSQL, Kafka, AWS/Azure/GCP).
- Act as a subject matter expert (SME) and provide technical guidance and training to L1/L2 support teams.
- Document troubleshooting procedures, runbooks, and knowledge articles to enhance the team's capabilities.
- Automation & Tooling:
- Develop and implement automation scripts and tools (e.g., Python, Bash, PowerShell) to streamline operational tasks, reduce manual effort, and improve efficiency.
- Contribute to the continuous improvement of our production support toolkit and processes.
- Deployment & Release Support:
- Support application deployments, environment refreshes, and production releases, ensuring stability and verifying post-deployment health.
- Conduct pre-release checks and post-release validation to minimize risks.
- Collaboration & Communication:
- Communicate effectively with technical and non-technical stakeholders during incidents, providing clear and concise updates.
- Collaborate closely with development teams to understand new features, provide feedback on supportability, and ensure smooth handovers.
- Participate in design and architecture reviews to represent operational requirements and provide input on system resilience and maintainability.
Your skills and experience
- Education:
- Bachelor's degree in Computer Science, Information Technology, Engineering, or a related field; or equivalent practical experience.
- Experience:
- 5+ years of hands-on experience in a Production Support, Site Reliability Engineering (SRE), or DevOps role, including at least 2 years at L3 level.
- Proven experience supporting complex, high-transaction, mission-critical applications in a follow-the-sun model.
- Technical Proficiency (Demonstrated expert-level knowledge in several of the following areas):
- Operating Systems: Linux (RHEL, CentOS, Ubuntu) and/or Windows Server administration.
- Databases: Strong proficiency in SQL (e.g., PostgreSQL, MySQL, MS SQL Server) including complex query writing, performance tuning, and troubleshooting. Experience with NoSQL databases (e.g., MongoDB, Cassandra, Redis) is a plus.
- Programming/Scripting: Proficiency in at least one scripting language (Python, Bash, PowerShell) for automation and data analysis.
- Application Servers/Web Servers: Experience with technologies like Tomcat, JBoss, WebLogic, Nginx, Apache HTTP Server, IIS.
- Cloud Platforms: Hands-on experience with at least one major cloud provider (AWS, Azure, GCP), including understanding of cloud services (EC2, S3, RDS, Lambda, AKS, GKE, etc.).
- Monitoring & Logging Tools: Extensive experience with tools such as Splunk, ELK Stack (Elasticsearch, Logstash, Kibana), Grafana, Prometheus, Datadog, AppDynamics, Dynatrace.
- Networking Fundamentals: Solid understanding of TCP/IP, DNS, Load Balancers (e.g., F5, Nginx, AWS ELB/ALB), Firewalls.
- Messaging Queues: Experience with Kafka, RabbitMQ, ActiveMQ, or similar.
- Containerization/Orchestration: Experience with Docker and Kubernetes is highly desirable.
- Problem-Solving & Analytical Skills:
- Exceptional analytical and diagnostic skills with the ability to quickly triage, isolate, and resolve complex technical issues under pressure.
- Strong ability to perform thorough root cause analysis and implement effective preventative measures.
- Soft Skills:
- Excellent written and verbal communication skills, with the ability to articulate complex technical issues to both technical and non-technical audiences.
- Strong interpersonal skills, with the ability to build relationships and collaborate effectively across teams.
- High degree of initiative, proactivity, and self-motivation.
- Ability to manage multiple priorities and work independently with minimal supervision.
- A strong sense of ownership and accountability.
- ITIL/Service Management:
- Familiarity with ITIL principles (Incident, Problem, Change Management) is a plus.
How we’ll support you
- Training and development to help you excel in your career.
- Coaching and support from experts in your team.
- A culture of continuous learning to aid progression.
- A range of flexible benefits that you can tailor to suit your needs.
About us and our teams
Please visit our company website for further information: https://www.db.com/company/company.html
We strive for a culture in which we are empowered to excel together every day. This includes acting responsibly, thinking commercially, taking initiative and working collaboratively.
Together we share and celebrate the successes of our people. Together we are Deutsche Bank Group.
We welcome applications from all people and promote a positive, fair and inclusive work environment.