About the Role:
We are seeking a mid-level Data Engineer to join our team and develop and maintain scalable data pipelines that support growing data volume and complexity. The role involves close collaboration with cross-functional teams to improve the data models that feed our business intelligence tools, increasing data accessibility and fostering data-driven decision making across the organization.
Responsibilities:
Data Pipeline Development and Maintenance:
• Design, build, and maintain data pipelines and ETL processes on AWS to support analytics and machine learning initiatives.
• Use AWS services such as RDS, Glue, Lambda, and S3 to manage and process large data sets.
Machine Learning Infrastructure:
• Enable model training, testing, and deployment using SageMaker and other AWS ML services.
• Design and build scalable infrastructure for deploying machine learning models into production, incorporating best practices for monitoring, logging, and model retraining.
• Implement and automate workflows that support the end-to-end ML lifecycle, including data preprocessing, feature engineering, and model evaluation.
DevOps and Automation:
• Incorporate DevOps best practices into data engineering workflows, focusing on automation, CI/CD, and infrastructure as code (IaC) for repeatable, scalable solutions.
• Develop and maintain IaC templates (e.g. Terraform, CloudFormation) to provision and manage AWS resources for data engineering and ML workloads.
• Build monitoring, logging, and alerting systems to ensure data pipeline and model uptime, performance, and data quality.
Qualifications:
• A degree in computer science, mathematics, or another science-based subject is typically required.
Knowledge, Skills and Abilities Required:
• Hands-on experience with ETL processes.
• Cloud Expertise: Strong experience with AWS, including services like S3, Glue, Lambda, and SageMaker.
• Programming: Advanced proficiency in Python, including experience with data processing libraries (e.g. Pandas) and automation.
• Machine Learning: Familiarity with machine learning pipelines, especially using SageMaker for model training, deployment, and monitoring.
• DevOps: Experience with DevOps practices, including CI/CD pipelines, Infrastructure as Code (Terraform, CloudFormation), and monitoring (CloudWatch).
• Data Engineering: Solid understanding of data warehousing and ETL/ELT processes.
• Collaboration: Strong communication skills and experience working in cross-functional Agile teams.
• Excellent analytical and problem-solving skills.
• Effective listening skills to understand business requirements.
• Planning, time management and organizational skills.
• The ability to deliver under pressure and to tight deadlines.