Lead Software Cloud Observability Engineer
Wells Fargo
About this role:
Wells Fargo is seeking a...
In this role, you will:
- Own the architecture, reliability, and scalability of enterprise logging platforms.
- Lead design and implementation of high‑volume, resilient log ingestion pipelines across hybrid and cloud environments.
- Define and enforce logging standards, schemas, and governance aligned with enterprise observability strategy.
- Design and integrate AI/ML models for anomaly detection, log classification, predictive alerting, and signal enrichment.
- Build and operationalize agentic AI systems capable of:
  - Autonomous log analysis and root‑cause hypothesis generation
  - Context‑aware remediation recommendations
  - Intelligent correlation across logs, metrics, and traces
- Partner with platform and SRE teams to embed AI‑driven insights into incident response workflows.
- Develop self‑service onboarding, configuration, and compliance automation for logging consumers.
- Enable OpenTelemetry‑aligned ingestion patterns and standardized integrations.
- Drive automation to reduce manual toil and improve MTTR across application and infrastructure observability.
- Ensure platform availability, performance, and data quality through proactive monitoring and SLI/SLO ownership.
- Lead production issue resolution, root cause analysis (RCA), and continuous improvement initiatives.
- Partner with security and compliance teams to support auditability, retention, and access controls.
Required Qualifications:
- 10+ years of experience in software engineering or platform engineering, including at least 3 years in a lead role.
- Deep hands‑on expertise with Splunk (search, data models, dashboards, alerts, ES, APIs, ingestion patterns).
- Strong experience designing distributed, high‑throughput data platforms.
- Proven experience applying machine learning to operational data (logs, metrics, events).
- Hands‑on experience with agentic AI frameworks or autonomous agents (LLM‑based or rule‑driven).
- Strong understanding of prompt engineering, tool‑using agents, feedback loops, and guardrails.
- Proficiency in one or more languages: Python, Java, Go, or Scala.
- Experience with cloud platforms, containerization, and Kubernetes/OpenShift.
- Familiarity with OpenTelemetry, observability standards, and telemetry correlation.
Desired Qualifications:
- Experience working on large Splunk infrastructure, including clustered environments, multi-site deployments, and cloud/SaaS deployments.
- Exposure to containerization and orchestration tools (Docker, Kubernetes).
- Familiarity with DevOps practices and CI/CD pipelines.
- Certifications in Splunk, Cribl, or cloud technologies (AWS, Azure).
- Experience applying AI/ML techniques to operational or telemetry data.
Job Expectations:
- Develop complex dashboards, reports, and alerts tailored to business and operational needs.
- Develop migration strategies, including data ingestion, configuration, and app compatibility assessments.
- Design data pipelines to optimize Splunk ingestion, reduce licensing costs, and improve system performance.
- Proactively identify and resolve bottlenecks in ingestion, indexing, and search processes.
Posting End Date:
10 Apr 2026
*Job posting may come down early due to volume of applicants.
We Value Equal Opportunity
Wells Fargo is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, status as a protected veteran, or any other legally protected characteristic.
Employees support our focus on building strong customer relationships balanced with a strong risk mitigating and compliance-driven culture which firmly establishes those disciplines as critical to the success of our customers and company. They are accountable for execution of all applicable risk programs (Credit, Market, Financial Crimes, Operational, Regulatory Compliance), which includes effectively following and adhering to applicable Wells Fargo policies and procedures, appropriately fulfilling risk and compliance obligations, timely and effective escalation and remediation of issues, and making sound risk decisions. There is emphasis on proactive monitoring, governance, risk identification and escalation, as well as making sound risk decisions commensurate with the business unit’s risk appetite and all risk and compliance program requirements.
Candidates applying to job openings posted in Canada: Applications for employment are encouraged from all qualified candidates, including women, persons with disabilities, aboriginal peoples and visible minorities. Accommodation for applicants with disabilities is available upon request in connection with the recruitment process.
Applicants with Disabilities
To request a medical accommodation during the application or interview process, visit Disability Inclusion at Wells Fargo.
Drug and Alcohol Policy
Wells Fargo maintains a drug-free workplace. Please see our Drug and Alcohol Policy to learn more.
Wells Fargo Recruitment and Hiring Requirements:
a. Third-party recordings are prohibited unless authorized by Wells Fargo.
b. Wells Fargo requires you to directly represent your own experiences during the recruiting and hiring process.