Senior Software Engineer - AI App Enablement & Observability
Bloomberg
Software Engineering, Data Science
Dublin, Ireland
Posted on Apr 27, 2026
Platform Engineering builds the core platforms, tooling, and paved roads that Bloomberg engineers rely on to ship reliable, secure, and high-performing systems at scale.
The AI App Enablement & Observability team accelerates the development of AI products across Bloomberg Industry Group. Our mission is to make AI systems reliable, performant, cost-efficient, and continuously improving through platform tooling, deep observability, and automated feedback loops.
We build developer-facing platforms and workflows that enable teams to experiment with, deploy, and operate AI and agent-based systems with confidence. This includes LLM gateways, agent platforms, benchmarking systems, telemetry pipelines, and self-improving infrastructure that closes the loop between observability and action. We emphasise strong developer experience, intuitive APIs/SDKs, and end-to-end ownership.
What’s in it for you?
You will help define how Bloomberg Industry Group builds and operates AI systems at scale by working on platforms that:
- Accelerate AI product development through reusable tooling and paved roads
- Provide end-to-end observability across AI systems (models, agents, pipelines, applications)
- Enable self-improving systems through telemetry-driven feedback loops
- Optimise cost, performance, and reliability of AI workloads
- Support both production AI systems and internal engineering agents
You’ll collaborate across AI product, infrastructure, and platform teams to deliver foundational systems.
We’ll trust you to:
Platform & Enablement
- Build and evolve AI platform tooling (e.g., developer workflows, benchmarking systems)
- Design developer-friendly APIs, SDKs, and interfaces
- Contribute to systems across the Model Development Lifecycle (experimentation, deployment, evaluation)
Observability & Telemetry
- Build and operate observability platforms and telemetry pipelines (logs, metrics, traces, events)
- Provide visibility into latency, token usage, cost, quality, drift, and reliability
- Define instrumentation standards, schemas, and conventions
- Implement distributed tracing using modern approaches (e.g., OpenTelemetry)
AI System Insights & Debugging
- Enable end-to-end debugging of AI and agent workflows (model calls, tool usage, retrieval, orchestration)
- Build benchmarking, regression detection, and performance analysis capabilities
- Support observability for both production systems and internal engineering agents
Closed-loop Optimisation & Automation
- Develop systems that turn telemetry into action (automated experimentation, regression detection, alerting)
- Build feedback loops that continuously improve model quality and system behaviour
- Enable self-healing and self-optimising workflows
Cost, Performance & Reliability
- Build tooling for cost visibility, forecasting, and optimisation
- Define SLOs, alerting, and performance tuning practices
- Improve reliability and scalability of AI infrastructure
Ownership & Collaboration
- Own projects end-to-end (RFCs, architecture, implementation, rollout, production support)
- Partner with AI teams to drive adoption of platform tooling and standards
- Produce high-quality documentation and improve developer experience
You’ll need to have:
- Demonstrated experience building production software or platform systems
- Strong engineering fundamentals with distributed systems or backend platforms
- Experience or strong interest in observability and debugging complex systems
- Experience or strong interest in AI/ML systems, LLMs, or agent-based architectures
- Strong ownership mindset and ability to drive ambiguous problems to production
- Hands-on experience with modern agentic coding tools (e.g., Claude Code, Codex CLI, Cursor) and multi-model workflows
- Working knowledge of agent architecture internals (context engineering, tool loops, sub-agent orchestration)
We’d love to see:
- Experience with OpenTelemetry and modern observability ecosystems, including instrumentation, collectors, exporters, and tools like Prometheus, Grafana, and tracing/log systems
- Experience designing and operating telemetry pipelines, including sampling, retention, cardinality, and cost tradeoffs, as well as integrating observability into CI/CD and developer workflows
- Familiarity with AI/agent frameworks, including instrumentation of LLM calls, tool usage, workflows, and evaluation signals (quality metrics, benchmarking, regression detection)
- Experience building cost monitoring, forecasting, and optimisation systems for AI workloads
- Familiarity with cloud and infrastructure tooling (e.g., AWS, Azure, Kubernetes, Terraform)
- Experience with agentic infrastructure concepts such as MCP servers, hooks, skills, sub-agents, sandboxing, and persistent memory patterns
- Active engagement with the agentic engineering frontier, including emerging patterns (e.g., harness vs. model, review debt, feedback loops)
- Demonstrated agent-native development practices (iterating with agents using testing, verification, and feedback loops)
- Strong security awareness for autonomous systems, including sandboxing, prompt injection risks, credential exposure, and guardrails