Senior Software Engineer - AI App Enablement & Observability

Bloomberg

Software Engineering, Data Science

Dublin, Ireland

Posted on Apr 28, 2026

Location

Dublin

Business Area

Engineering and CTO

Ref #

10050677

Description & Requirements

Platform Engineering builds the core platforms, tooling, and paved roads that Bloomberg engineers rely on to ship reliable, secure, and high-performing systems at scale.

The AI App Enablement & Observability team accelerates how AI products are built across Bloomberg Industry Group. Our mission is to make AI systems reliable, performant, cost-efficient, and continuously improving through platform tooling, deep observability, and automated feedback loops.

We build developer-facing platforms and workflows that enable teams to experiment, deploy, and operate AI and agent-based systems with confidence. This includes LLM gateways, agent platforms, benchmarking systems, telemetry pipelines, and self-improving infrastructure that closes the loop between observability and action. We emphasise strong developer experience, intuitive APIs/SDKs, and end-to-end ownership.

What’s in it for you?

You will help define how Bloomberg Industry Group builds and operates AI systems at scale by working on platforms that:

  • Accelerate AI product development through reusable tooling and paved roads
  • Provide end-to-end observability across AI systems (models, agents, pipelines, applications)
  • Enable self-improving systems through telemetry-driven feedback loops
  • Optimise cost, performance, and reliability of AI workloads
  • Support both production AI systems and internal engineering agents

You’ll collaborate across AI product, infrastructure, and platform teams to deliver foundational systems.

We’ll Trust You To

Platform & Enablement

  • Build and evolve AI platform tooling (e.g., developer workflows, benchmarking systems)
  • Design developer-friendly APIs, SDKs, and interfaces
  • Contribute to systems across the Model Development Lifecycle (experimentation, deployment, evaluation)

Observability & Telemetry

  • Build and operate observability platforms and telemetry pipelines (logs, metrics, traces, events)
  • Provide visibility into latency, token usage, cost, quality, drift, and reliability
  • Define instrumentation standards, schemas, and conventions
  • Implement distributed tracing using modern approaches (e.g., OpenTelemetry)

AI System Insights & Debugging

  • Enable end-to-end debugging of AI and agent workflows (model calls, tool usage, retrieval, orchestration)
  • Build benchmarking, regression detection, and performance analysis capabilities
  • Support observability for both production systems and internal engineering agents

Closed-loop Optimization & Automation

  • Develop systems that turn telemetry into action (automated experimentation, regression detection, alerting)
  • Build feedback loops that continuously improve model quality and system behavior
  • Enable self-healing and self-optimising workflows

Cost, Performance & Reliability

  • Build tooling for cost visibility, forecasting, and optimization
  • Define SLOs, alerting, and performance tuning practices
  • Improve reliability and scalability of AI infrastructure

Ownership & Collaboration

  • Own projects end-to-end (RFCs, architecture, implementation, rollout, production support)
  • Partner with AI teams to drive adoption of platform tooling and standards
  • Produce high-quality documentation and improve developer experience

You’ll Need To Have

  • Demonstrated experience building production software or platform systems
  • Strong engineering fundamentals with distributed systems or backend platforms
  • Experience or strong interest in observability and debugging complex systems
  • Experience or strong interest in AI/ML systems, LLMs, or agent-based architectures
  • Strong ownership mindset and ability to drive ambiguous problems to production
  • Hands-on experience with modern agentic coding tools (e.g., Claude Code, Codex CLI, Cursor) and multi-model workflows
  • Working knowledge of agent architecture internals (context engineering, tool loops, sub-agent orchestration)

We’d Love To See

  • Experience with OpenTelemetry and modern observability ecosystems, including instrumentation, collectors, exporters, and tools like Prometheus, Grafana, and tracing/log systems
  • Experience designing and operating telemetry pipelines, including sampling, retention, cardinality, and cost tradeoffs, as well as integrating observability into CI/CD and developer workflows
  • Familiarity with AI/agent frameworks, including instrumentation of LLM calls, tool usage, workflows, and evaluation signals (quality metrics, benchmarking, regression detection)
  • Experience building cost monitoring, forecasting, and optimization systems for AI workloads
  • Familiarity with cloud and infrastructure tooling (e.g., AWS, Azure, Kubernetes, Terraform)
  • Experience with agentic infrastructure concepts such as MCP servers, hooks, skills, subagents, sandboxing, and persistent memory patterns
  • Active engagement with the agentic engineering frontier, including emerging patterns (e.g., harness vs. model, review debt, feedback loops)
  • Demonstrated agent-native development practices (iterating with agents using testing, verification, and feedback loops)
  • Strong security awareness for autonomous systems, including sandboxing, prompt injection risks, credential exposure, and guardrails

Where years of experience are indicated, they are a guide; we will consider applications from all candidates who can demonstrate the skills necessary for the role.

Discover what makes Bloomberg unique, with an inside look at our culture, values, and the people behind our success.