This role is for one of Weekday's clients.
We are seeking a Data Engineer with expertise in Python, PySpark, SQL, and Azure to design, develop, and optimize scalable data pipelines and architectures. You will play a crucial role in processing large datasets, ensuring data quality, and enabling analytics teams to drive business decisions. If you are passionate about data engineering and cloud-based solutions, this role is for you.
Key Responsibilities
- Data Pipeline Development: Design, build, and maintain scalable ETL/ELT pipelines using Python and PySpark.
- Big Data Processing: Work with large datasets and optimize processing using distributed computing frameworks.
- SQL & Database Management: Write efficient SQL queries to extract, transform, and load data across relational and non-relational data stores.
- Azure Data Engineering: Leverage Azure Data Factory, Azure Databricks, Azure Synapse, and Azure Storage to manage and optimize data workflows in the cloud.
- Performance Optimization: Improve the efficiency and performance of data pipelines by tuning queries, applying caching, and following big data best practices.
- Data Quality & Governance: Implement data validation, cleansing, and integrity checks to ensure high-quality data is available for business use.
- Collaboration: Work closely with data scientists, analysts, and software engineers to support data-driven decision-making and real-time analytics.
- Automation & Monitoring: Develop automated data ingestion and processing workflows, and set up monitoring and alerting mechanisms for data pipelines.
Required Skills & Qualifications
- 3+ years of experience in data engineering or a related field.
- Strong programming skills in Python and experience with PySpark for big data processing.
- Expertise in SQL for data querying, optimization, and transformation.
- Hands-on experience with Azure cloud services like Azure Data Factory, Azure Databricks, Azure Synapse Analytics, and Azure Storage.
- Experience with data modeling, ETL processes, and data warehouse concepts.
- Familiarity with orchestration tools such as Apache Airflow or Azure-native alternatives.
- Knowledge of distributed computing frameworks and big data technologies.
- Strong problem-solving skills and ability to optimize data workflows for efficiency and performance.
- Excellent communication and teamwork skills, with a collaborative mindset.
Preferred Qualifications
- Experience with containerization and orchestration (Docker, Kubernetes).
- Understanding of data security, governance, and compliance in a cloud environment.
- Familiarity with CI/CD pipelines and infrastructure-as-code (Terraform, ARM templates).
- Exposure to real-time data processing using Kafka or similar streaming platforms.
"Success seems to be connected with action. Successful people keep moving."
- Conrad Hilton