Senior Software Engineer - AI App Enablement & Observability
Bloomberg
Software Engineering, Data Science
Dublin, Ireland
Posted on Apr 28, 2026
Location
Dublin
Business Area
Engineering and CTO
Ref #
10050677
Description & Requirements
Platform Engineering builds the core platforms, tooling, and paved roads that Bloomberg engineers rely on to ship reliable, secure, and high-performing systems at scale.
The AI App Enablement & Observability team accelerates how AI products are built across Bloomberg Industry Group. Our mission is to make AI systems reliable, performant, cost-efficient, and continuously improving through platform tooling, deep observability, and automated feedback loops.
We build developer-facing platforms and workflows that enable teams to experiment, deploy, and operate AI and agent-based systems with confidence. This includes LLM gateways, agent platforms, benchmarking systems, telemetry pipelines, and self-improving infrastructure that closes the loop between observability and action. We emphasise strong developer experience, intuitive APIs/SDKs, and end-to-end ownership.
What’s in it for you?
You will help define how Bloomberg Industry Group builds and operates AI systems at scale by working on platforms that:
- Accelerate AI product development through reusable tooling and paved roads
- Provide end-to-end observability across AI systems (models, agents, pipelines, applications)
- Enable self-improving systems through telemetry-driven feedback loops
- Optimise cost, performance, and reliability of AI workloads
- Support both production AI systems and internal engineering agents
We’ll Trust You To
Platform & Enablement
- Build and evolve AI platform tooling (e.g., developer workflows, benchmarking systems)
- Design developer-friendly APIs, SDKs, and interfaces
- Contribute to systems across the Model Development Lifecycle (experimentation, deployment, evaluation)
- Build and operate observability platforms and telemetry pipelines (logs, metrics, traces, events)
- Provide visibility into latency, token usage, cost, quality, drift, and reliability
- Define instrumentation standards, schemas, and conventions
- Implement distributed tracing using modern approaches (e.g., OpenTelemetry)
- Enable end-to-end debugging of AI and agent workflows (model calls, tool usage, retrieval, orchestration)
- Build benchmarking, regression detection, and performance analysis capabilities
- Support observability for both production systems and internal engineering agents
- Develop systems that turn telemetry into action (automated experimentation, regression detection, alerting)
- Build feedback loops that continuously improve model quality and system behaviour
- Enable self-healing and self-optimising workflows
- Build tooling for cost visibility, forecasting, and optimisation
- Define SLOs, alerting, and performance tuning practices
- Improve reliability and scalability of AI infrastructure
- Own projects end-to-end (RFCs, architecture, implementation, rollout, production support)
- Partner with AI teams to drive adoption of platform tooling and standards
- Produce high-quality documentation and improve developer experience
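The benchmarking and regression-detection responsibilities above can be illustrated with a minimal telemetry-to-action check. This is a sketch only; `detect_latency_regression`, the threshold, and the sample latencies are hypothetical and not part of any Bloomberg system:

```python
from statistics import mean

def detect_latency_regression(baseline_ms, candidate_ms, threshold=1.2):
    """Flag a regression when the candidate's mean latency exceeds
    the baseline's mean latency by more than `threshold`x."""
    ratio = mean(candidate_ms) / mean(baseline_ms)
    return ratio > threshold, ratio

# Example: the candidate build is ~1.48x slower than baseline, so it is flagged.
regressed, ratio = detect_latency_regression(
    baseline_ms=[100, 110, 105],   # per-request latencies from the baseline run
    candidate_ms=[150, 160, 155],  # per-request latencies from the candidate run
)
print(regressed)  # True
```

In a real pipeline, the latency samples would come from the telemetry store, and a flagged regression would trigger an alert or block the rollout rather than print a value.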
You’ll Need to Have
- Demonstrated experience building production software or platform systems
- Strong engineering fundamentals with distributed systems or backend platforms
- Experience or strong interest in observability and debugging complex systems
- Experience or strong interest in AI/ML systems, LLMs, or agent-based architectures
- Strong ownership mindset and ability to drive ambiguous problems to production
We’d Love to See
- Hands-on experience with modern agentic coding tools (e.g., Claude Code, Codex CLI, Cursor) and multi-model workflows
- Working knowledge of agent architecture internals (context engineering, tool loops, sub-agent orchestration)
- Experience with OpenTelemetry and modern observability ecosystems, including instrumentation, collectors, exporters, and tools like Prometheus, Grafana, and tracing/log systems
- Experience designing and operating telemetry pipelines, including sampling, retention, cardinality, and cost tradeoffs, as well as integrating observability into CI/CD and developer workflows
- Familiarity with AI/agent frameworks, including instrumentation of LLM calls, tool usage, workflows, and evaluation signals (quality metrics, benchmarking, regression detection)
- Experience building cost monitoring, forecasting, and optimisation systems for AI workloads
- Familiarity with cloud and infrastructure tooling (e.g., AWS, Azure, Kubernetes, Terraform)
- Experience with agentic infrastructure concepts such as MCP servers, hooks, skills, subagents, sandboxing, and persistent memory patterns
- Active engagement with the agentic engineering frontier, including emerging patterns (e.g., harness vs. model, review debt, feedback loops)
- Demonstrated agent-native development practices (iterating with agents using testing, verification, and feedback loops)
- Strong security awareness for autonomous systems, including sandboxing, prompt injection risks, credential exposure, and guardrails
Discover what makes Bloomberg unique - watch our video for an inside look at our culture, values, and the people behind our success.