Senior Data Software Engineer (Python & PySpark) - Vice President
Citi
Software Engineering
Singapore
The Senior Data Software Engineer is a senior-level position responsible for establishing and implementing new or revised application systems and programs in coordination with the Technology team. The overall objective of this role is to lead applications systems analysis and programming activities.
Responsibilities
Partner with multiple management teams to ensure appropriate integration of functions to meet goals, and identify and define the system enhancements needed to deploy new products and process improvements
Resolve a variety of high-impact problems and projects through in-depth evaluation of complex business processes, system processes, and industry standards
Provide subject-matter expertise and advanced knowledge of applications programming, and ensure application design adheres to the overall architecture blueprint
Utilize advanced knowledge of system flow and develop standards for coding, testing, debugging, and implementation
Develop comprehensive knowledge of how areas of business, such as architecture and infrastructure, integrate to accomplish business goals
Provide in-depth analysis with interpretive thinking to define issues and develop innovative solutions
Serve as advisor or coach to mid-level developers and analysts, allocating work as necessary
Appropriately assess risk when business decisions are made, demonstrating particular consideration for the firm's reputation and safeguarding Citigroup, its clients, and its assets. This includes driving compliance with applicable laws, rules, and regulations; adhering to Policy; applying sound ethical judgment regarding personal behavior, conduct, and business practices; and escalating, managing, and reporting control issues with transparency.
Requirements
Bachelor's or Master's degree in Computer Science, Engineering, or a related quantitative field.
7+ years of experience in data engineering, with a strong focus on Python and big data technologies.
Proven expertise in designing and implementing large-scale data processing solutions using PySpark.
Extensive experience with distributed computing frameworks like Apache Spark.
Strong understanding of data warehousing concepts, dimensional modeling, and ETL/ELT principles.
Proficiency in SQL and experience with various relational and NoSQL databases.
Experience with cloud platforms (AWS, Azure, GCP) and their data services (e.g., S3, ADLS, Google Cloud Storage, Redshift, Snowflake, BigQuery, Databricks).
Familiarity with workflow orchestration tools (e.g., Apache Airflow, Azure Data Factory, AWS Step Functions).
Experience with version control systems (e.g., Git).
Excellent problem-solving, analytical, and communication skills.
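To illustrate the ETL/ELT principles named in the requirements, here is a minimal, hedged sketch in plain Python with only the standard library. The table and column names (trades, trade_id, etc.) are hypothetical; in practice the extract step would read from files, APIs, or source databases, and the load target would be a warehouse rather than an in-memory SQLite database.

```python
import sqlite3

# Extract: raw source rows (hypothetical trade records with messy formatting)
raw_trades = [
    {"trade_id": "T1", "amount": "1,250.50", "currency": "usd"},
    {"trade_id": "T2", "amount": "300.00", "currency": "SGD"},
    {"trade_id": "T1", "amount": "1,250.50", "currency": "usd"},  # duplicate row
]

# Transform: normalize types, standardize codes, and deduplicate on the key
def transform(rows):
    seen, out = set(), []
    for r in rows:
        if r["trade_id"] in seen:
            continue
        seen.add(r["trade_id"])
        out.append((r["trade_id"],
                    float(r["amount"].replace(",", "")),
                    r["currency"].upper()))
    return out

# Load: write the cleaned rows into a target table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (trade_id TEXT PRIMARY KEY, amount REAL, currency TEXT)")
conn.executemany("INSERT INTO trades VALUES (?, ?, ?)", transform(raw_trades))
total = conn.execute("SELECT SUM(amount) FROM trades").fetchone()[0]
```

The same transform logic scales conceptually to PySpark, where the deduplication and type coercion would be expressed as DataFrame operations over distributed partitions.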
Preferred
Experience with streaming data technologies (e.g., Kafka, Kinesis).
Familiarity with containerization technologies (Docker, Kubernetes).
Knowledge of data governance, data lineage, and metadata management tools.
Experience with CI/CD pipelines for data solutions.
Understanding of machine learning concepts and MLOps principles.
Certifications in cloud data engineering (e.g., AWS Certified Data Analytics, Google Cloud Professional Data Engineer).
Technical Skills
Programming Languages
Python: Advanced proficiency, including data manipulation libraries (Pandas, NumPy) and object-oriented programming.
PySpark: Expert-level knowledge for data processing, transformations, and performance tuning on Spark.
SQL: Advanced proficiency for complex queries, database design, and optimization.
Big Data Frameworks
Apache Spark (PySpark)
Data Warehousing & Databases
Data Modeling (Star Schema, Snowflake Schema)
ETL/ELT Methodologies
Relational Databases (e.g., PostgreSQL, MySQL, SQL Server, Oracle)
Cloud Data Warehouses (e.g., Snowflake, Amazon Redshift, Google BigQuery)
NoSQL Databases (e.g., MongoDB, Cassandra, DynamoDB)
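The dimensional modeling listed above (star schema: a central fact table of measures joined to descriptive dimension tables) can be sketched minimally with sqlite3 so it runs anywhere. The fact_sales and dim_product tables and their columns are hypothetical illustrations, not a prescribed design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Dimension table: descriptive attributes keyed by a surrogate key
conn.execute("CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT)")
# Fact table: numeric measures plus foreign keys into the dimensions
conn.execute("CREATE TABLE fact_sales (sale_id INTEGER PRIMARY KEY, product_key INTEGER, qty INTEGER, revenue REAL)")

conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                 [(1, "Widget", "Hardware"), (2, "Gadget", "Hardware"), (3, "License", "Software")])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                 [(10, 1, 2, 50.0), (11, 2, 1, 40.0), (12, 3, 5, 500.0)])

# Typical star-schema query: join fact to dimension, aggregate by a dimension attribute
rows = conn.execute("""
    SELECT d.category, SUM(f.revenue)
    FROM fact_sales f JOIN dim_product d USING (product_key)
    GROUP BY d.category ORDER BY d.category
""").fetchall()
# rows -> [("Hardware", 90.0), ("Software", 500.0)]
```

A snowflake schema differs only in that the dimension tables themselves are further normalized into sub-dimensions.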
Cloud Platforms
AWS: S3, EMR, Glue, Lambda, Redshift, Athena, Kinesis
Azure: Data Lake Storage, Databricks, Synapse Analytics, Azure Data Factory, Event Hubs
GCP: Cloud Storage, Dataproc, BigQuery, Dataflow, Pub/Sub
Orchestration & Automation
Apache Airflow
Azure Data Factory
AWS Step Functions
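Orchestrators such as the ones above model a pipeline as a directed acyclic graph (DAG) of tasks and run each task only after its upstream dependencies complete. A minimal stdlib sketch of that scheduling idea, with hypothetical task names, uses Python's graphlib (3.9+):

```python
from graphlib import TopologicalSorter

# Pipeline DAG: each task maps to the set of tasks it depends on
dag = {
    "extract": set(),
    "validate": {"extract"},
    "transform": {"validate"},
    "load_warehouse": {"transform"},
    "publish_report": {"load_warehouse"},
}

# An orchestrator executes tasks in a dependency-respecting order;
# for this linear chain the order is fully determined
order = list(TopologicalSorter(dag).static_order())
```

Real orchestrators add scheduling, retries, backfills, and parallel execution of independent branches on top of this ordering.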
Other Tools & Concepts
Version Control (Git, GitHub, GitLab, Bitbucket)
Containerization (Docker, Kubernetes)
CI/CD Principles
Data Governance & Security
Performance Optimization & Tuning
#LI-Hybrid
Job Family Group: Technology
Job Family: Applications Development
Time Type: Full time
Most Relevant Skills: Please see the requirements listed above.
Other Relevant Skills: For complementary skills, please see above and/or contact the recruiter.
Citi is an equal opportunity employer, and qualified candidates will receive consideration without regard to their race, color, religion, sex, sexual orientation, gender identity, national origin, disability, status as a protected veteran, or any other characteristic protected by law.
If you are a person with a disability and need a reasonable accommodation to use our search tools and/or apply for a career opportunity, review Accessibility at Citi.
View Citi’s EEO Policy Statement and the Know Your Rights poster.