DevEx Platform Engineer
The Carlyle Group
Software Engineering
Washington, USA
USD 160k-180k / year
Position Summary
- Operational Reliability: Treats running the platform as a first-class responsibility. Experience defining SLOs and SLIs, instrumenting services with metrics, logging, and tracing, responding to incidents, and conducting blameless postmortems. Reduces operational toil through automation.
- Observability: Hands-on experience with modern observability platforms (Datadog, Grafana, OpenTelemetry, or similar). Builds dashboards, alerts, and traces that surface the right signals at the right time, and partners with engineering teams to instrument their services for production readiness.
- Communication: Able to articulate the value of AI and platform investments to a range of stakeholders, and to build credibility with both technical and non-technical colleagues.
- Hands-On Engineering: Spends a significant portion of working hours writing code and building automation. Comfortable with production code, building integrations, debugging pipelines, and shipping features end-to-end across distributed systems.
- Adaptability: Comfortable with ambiguity and committed to ongoing learning as the AI tooling landscape evolves. Looks for practical ways to improve the developer experience.
Responsibilities
- Operate and monitor AI-enabled developer platforms in production: define SLOs and SLIs, build dashboards and alerting, and ensure the reliability, performance, and availability of the services engineering teams depend on every day.
- Drive troubleshooting and remediation of production issues when platform services degrade, and run blameless postmortems that turn incidents into lasting reliability improvements.
- Automate operational tasks to reduce toil, and continuously tune capacity, cost, and performance as platform adoption grows.
- Design, build, and operate AI-enabled developer platforms serving a development community of over 500 engineers across the technology organization.
- Build and maintain agentic frameworks, skills and tool registries, MCP gateways, and the supporting infrastructure that allows engineering teams to safely and effectively use AI across the software delivery lifecycle.
- Develop self-service cloud development environments and internal developer portal capabilities (using platforms such as Coder) that let application teams provision, configure, and operate their environments with minimal central involvement.
- Prototype and evaluate new AI development tools, agent runtimes, and orchestration patterns, and incorporate the most useful capabilities into the platform.
- Partner directly with engineering teams across the firm to drive adoption of AI-enabled development practices and translate user feedback into platform improvements.
- Collaborate with security, networking, cloud operations, and architecture teams to ensure platform capabilities are secure, compliant, and well-integrated with existing infrastructure, and contribute to internal documentation and reference implementations that help engineers get the most out of AI-driven development.
Qualifications
- Bachelor’s Degree required
- Concentration in Computer Science, Software Engineering, Information Technology, or a similar discipline strongly preferred
- AWS Certified Solutions Architect, AWS Certified Developer, or similar cloud certifications preferred
- Industry-standard certifications in platform engineering, Kubernetes, or AI/ML preferred
- 5+ years of relevant software or platform engineering experience required.
- Hands-on experience integrating AI platforms and tooling into enterprise engineering workflows, including agent runtimes, AI coding assistants, MCP-based integrations, and the supporting infrastructure required to operate them safely at scale.
- Practical experience implementing and managing AI development tools such as Claude Code, Cursor, MCP servers, and Coder.
- Hands-on experience operating production services using SRE practices, including defining and measuring SLOs/SLIs, managing error budgets, and reducing operational toil through automation.
- Proficiency with observability and monitoring tooling (such as ELK, Datadog, OpenTelemetry, or CloudWatch) to instrument, alert on, and troubleshoot distributed systems.
- Experience troubleshooting and remediating production issues, including triage, mitigation, and blameless postmortems.
- Strong hands-on experience with AWS, including the services that underpin modern developer platforms (EKS, ECS, Lambda, API Gateway, IAM, VPC, S3, and similar).
- Strong proficiency with GitHub Enterprise, including GitHub Actions, reusable workflows, and platform-level patterns such as IssueOps and workflow orchestration.
- Strong proficiency with infrastructure as code (Terraform) and modern CI/CD practices for both application and platform delivery.
- Strong proficiency in at least one general-purpose programming language (Python, TypeScript, Go, or similar) and comfort writing production code.
- Strong understanding of cloud security principles, identity and access management, and secure-by-default platform patterns.
- Commitment to ongoing learning as AI and developer tooling continue to evolve.
- Strong problem-solving skills, with a track record of troubleshooting distributed systems and developer tooling issues.
- Strong interpersonal and communication skills, with the ability to explain complex topics and present to both technical and non-technical audiences.
- Proven collaboration skills across multiple technology and project teams to deliver effective solutions.
- Ability to communicate clearly in presentations and written documentation.
- Ability to prioritize and deliver across competing demands.
Company Information
The Carlyle Group (NASDAQ: CG) is a global investment firm with $475 billion of assets under management across 678 investment vehicles as of March 31, 2026. Founded in 1987 in Washington, DC, Carlyle has grown into one of the world's largest and most successful investment firms, with more than 2,500 professionals operating in 28 offices in North America, Europe, the Middle East, Asia, and Australia.
Carlyle’s purpose is to connect people, ideas, and capital to fuel growth for companies and performance for investors, which range from public and private pension funds to wealthy individuals and families to sovereign wealth funds, unions and corporations. Carlyle invests across three segments – Global Private Equity, Global Credit and Carlyle AlpInvest – and has deep expertise across industries, markets, and geographies.
At Carlyle, we believe that a wide spectrum of experiences and viewpoints drives performance and success. Our CEO, Harvey Schwartz, has stated that, "To build better businesses and create value for all of our stakeholders, we are focused on assembling leadership teams with the strongest insights from a range of perspectives." Reflecting this view, emphasis is placed on development, retention and inclusion through our internal processes and seven Employee Resource Groups (ERGs). We cultivate a culture where ideas are openly shared and challenged, connecting diverse expertise and perspectives to drive enduring value.