Principal Infrastructure Engineer

JPMorganChase

Other Engineering
Hyderabad, Telangana, India
Posted on Monday, August 19, 2024

Job Description

As an experienced Public Cloud Platform Engineer Lead, you will be an integral part of the Public Cloud Enablement Team. You will guide application teams in assessing application readiness for moving to the public cloud, conduct post-production reviews, and resolve production incidents that impact our customers, clients, and businesses around the globe. Your experience with public cloud migrations of complex systems, anticipating problems, and finding ways to mitigate risks and issues will be key to leading numerous public cloud initiatives and incidents. The key pillars you will drive include technology life cycle management, problem management, resiliency, and automation. In addition, you will:

  • Collaborate with product and engineering teams to deliver robust cloud-based solutions that drive enhanced customer experiences.
  • Own end-to-end platform issues and problem management, and help provide solutions to platform production issues on the AWS Cloud to ensure applications are available as expected.
  • Guide product teams on the standards and best practices related to the Public Cloud process, and help them mitigate issues in the production cloud with minimal downtime.
  • Lead a team to develop, enhance, and maintain established standards and best practices.

While you'll be part of a tight-knit team that shares your passion for modern technology, you'll also gain access to the best minds in the business, both as part of the JPMorgan Chase & Co. global technology community and through our partnerships with some of the most important technology firms in the world.

Role/Responsibilities

  • Drive, support, and deliver on a strategy for building and operating on a broad range of Amazon's utility computing web services (e.g., AWS EC2, AWS S3, AWS RDS, AWS CloudFront, AWS EFS, CloudWatch, EKS)
  • Analyze upcoming platform-level changes to production and ensure their relevant impact is communicated.
  • Identify opportunities to improve the resiliency, availability, security, and performance of Public Cloud platforms using JPMC best practices
  • Improve reliability and quality, and reduce the time to resolve production incidents on software applications
  • Implement continuous process improvement, including but not limited to policy, procedures, and production monitoring
  • Identify, coordinate, and implement initiatives/projects and activities that create efficiencies and optimize technical processing
  • Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve
  • Provide primary operational support and engineering for the public cloud platform. Show leadership during any production issue, manage the corresponding teams as they work toward a fix, and ensure minimal customer impact
  • Debug and optimize systems and automate routine tasks.
  • Collaborate with a cross-functional team to identify potential risks in production and opportunities to improve user experiences at every interaction.
  • Drive work streams to ensure applications meet strict non-functional requirements for Public Cloud onboarding
  • Evaluate production readiness through game days, resiliency tests and chaos engineering exercises.
  • Utilize programming languages such as Java, Python, SQL, Node, Go, and Scala; open-source RDBMS and NoSQL databases; container orchestration services including Docker and Kubernetes; and a variety of AWS tools and services
  • Monitor metrics and program health, anticipate and clear blockers, and manage escalations
  • Roll up your sleeves for deep problem solving

Required Qualifications

  • A strong understanding of business technology drivers and their impact on architecture design, performance, monitoring, and best practices
  • A dynamic individual with excellent communication skills, who can adapt verbiage and style to the audience at hand and deliver critical information in a clear and concise message.
  • The candidate must be a strong analytical thinker with business acumen, the ability to assimilate information quickly, and a solution-based focus on incident and problem management.
  • 10+ years of experience across the SDLC process (design, development, and/or support), with at least 5 years in a technology leadership role
  • 5-7 years of experience or knowledge building or supporting environments on AWS, including working with services such as EC2, ELB, RDS, and S3
  • Experience using DevOps tools in a cloud environment, such as Ansible, Artifactory, Docker, GitHub, Jenkins, Kubernetes, Maven, and SonarQube
  • Experience/Knowledge using monitoring solutions like CloudWatch, Prometheus, Datadog
  • Experience/Knowledge of writing Infrastructure-as-Code (IaC), using tools like CloudFormation or Terraform
  • Experience with one or more public cloud platforms like AWS, GCP, Azure
  • Experience with one or more automation tools like Terraform, Puppet, Ansible
  • Experience with high-volume, mission-critical applications and their interdependencies with other applications and databases
  • Ability to leverage Splunk and Dynatrace to identify and troubleshoot issues.
  • Experience with ITIL processes such as incident, problem, and life cycle management
  • Experience with high-volume, mission-critical applications and building upon messaging and/or event-driven architectures.
  • Knowledge of container platforms such as Docker and Kubernetes.
  • Strong understanding of architecture, design, and business processes
  • Keen understanding of financial and budget management, including control and optimization of Public Cloud expenses
  • Experience working in large, collaborative teams to achieve organizational goals
  • Passionate about building an innovative culture
  • Experience with production/non-production support of highly available applications
  • Experience with system performance monitoring and operational capacity management
  • Strong communication and collaboration skills

Preferred Qualifications

  • Bachelor’s degree in computer science or other technical, scientific discipline
  • A proactive approach to spotting problems, areas for improvement, and performance bottlenecks
  • SRE mindset and approach: running better production systems by creating engineering solutions to operational problems.
  • Ability to program (structured and OO) in one or more high-level languages, such as Python, Java, C/C++, Ruby, and JavaScript
  • Experience with Ansible and other DevOps tools is an added advantage.
  • AWS Certification.