Job Category: Tech Jobs
Hiring For: Tech Mahindra
Job Location: Bangalore / Pune
Referral Bonus: Up to 15,000
Experience: 4 to 8 years
Salary: Up to 1,800,000

Job Description:

We are looking for a skilled Data Engineer with expertise in Python, PySpark, and AWS to join our team. The ideal candidate has a strong foundation in data structures, data transformations, and cloud technologies. You will be responsible for developing and optimizing data pipelines, managing large datasets, and working with a range of AWS services.


Key Responsibilities:

  • Python Programming:
    • Proficient in data structures (lists, dictionaries).
    • Skilled in list comprehensions, loops (while, for), and control flow (if/else).
    • Ability to modularize code by creating reusable functions.
    • Debugging and optimizing Python code for data handling.
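For candidates self-assessing against these Python basics, here is a minimal sketch covering reusable functions, loops with control flow, dictionaries, and a list comprehension (the function and field names are illustrative only, not from the role):

```python
# Illustrative only: a reusable function over basic data structures.
def normalize_prices(records, discount=0.1):
    """Apply a discount to each valid record; returns a list of dicts."""
    result = []
    for r in records:                  # plain for-loop
        if r["price"] > 0:             # if/else-style guard against bad data
            result.append({"item": r["item"],
                           "price": r["price"] * (1 - discount)})
    return result

raw = [{"item": "pen", "price": 10.0}, {"item": "pad", "price": -1.0}]
clean = normalize_prices(raw)
names = [r["item"] for r in clean]     # list comprehension
```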
  • PySpark:
    • Implement narrow and wide transformations on large datasets.
    • Experience writing user-defined functions (UDFs) that take multiple parameters.
    • Strong knowledge of PySpark optimization techniques (bucketing, partitioning).
    • Ability to tune performance for large-scale data processing jobs.
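The PySpark ideas above (narrow per-row steps, a multi-parameter UDF, and a wide grouped aggregation that forces a shuffle) can be mimicked in plain Python for illustration; a real pipeline would use pyspark.sql DataFrames, so treat this purely as a conceptual sketch:

```python
from collections import defaultdict

# Conceptual sketch only -- real code would use pyspark.sql DataFrames.
def add_tax(amount, rate):             # stands in for a two-parameter UDF
    return round(amount * (1 + rate), 2)

rows = [("a", 100.0), ("b", 50.0), ("a", 25.0)]

# Narrow transformation: each output row depends on exactly one input row
# (in PySpark, e.g. df.withColumn with a UDF over two columns).
taxed = [(k, add_tax(v, 0.10)) for k, v in rows]

# Wide transformation: grouping pulls matching keys together, which in
# Spark triggers a shuffle (in PySpark, e.g. df.groupBy("k").sum()).
totals = defaultdict(float)
for k, v in taxed:
    totals[k] += v
```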
  • SQL:
    • Expertise in joins (left, right, inner, outer).
    • Proficient in pattern matching and using regular expressions in SQL queries.
    • Writing complex SQL queries to extract, transform, and analyze data.
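These SQL skills can be self-checked with Python's built-in sqlite3; the sketch below joins two invented tables and filters with a regular expression (SQLite has no built-in REGEXP, so one is registered; table and column names are assumptions for illustration):

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite's REGEXP operator calls a user function regexp(pattern, value).
conn.create_function("REGEXP", 2,
                     lambda pat, s: re.search(pat, s) is not None)

conn.executescript("""
    CREATE TABLE users  (id INTEGER, email TEXT);
    CREATE TABLE orders (user_id INTEGER, total REAL);
    INSERT INTO users  VALUES (1, 'a@corp.com'), (2, 'b@gmail.com');
    INSERT INTO orders VALUES (1, 20.0), (1, 5.0);
""")

# LEFT JOIN keeps users even without orders; REGEXP filters the emails.
rows = conn.execute("""
    SELECT u.id, SUM(o.total)
    FROM users u
    LEFT JOIN orders o ON o.user_id = u.id
    WHERE u.email REGEXP '@corp\\.'
    GROUP BY u.id
""").fetchall()
```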
  • AWS:
    • Hands-on experience with AWS Lambda for serverless architecture.
    • Proficient with AWS Glue for ETL (Extract, Transform, Load) processes.
    • Ability to schedule AWS Lambda and Glue jobs based on specific events.
    • Familiarity with other AWS services related to data processing and storage.
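A Lambda function for data work is ultimately just a Python handler; this hypothetical sketch shows the shape of an S3-triggered handler (the object key and downstream processing are assumptions, and boto3 wiring is omitted so the example stays self-contained):

```python
import json

def lambda_handler(event, context):
    """Hypothetical S3-event handler: collect the object keys to process.

    In a real deployment this would be attached to an S3 trigger or an
    EventBridge schedule, and would use boto3 to read and write data.
    """
    keys = [rec["s3"]["object"]["key"] for rec in event.get("Records", [])]
    return {"statusCode": 200, "body": json.dumps({"processed": keys})}

# Sample event in the shape S3 sends to Lambda (trimmed to the fields used).
sample_event = {"Records": [{"s3": {"object": {"key": "raw/2024/data.csv"}}}]}
```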
  • Good-to-have Skills:
    • Experience with Shell Scripting for automating tasks.
    • Exposure to Exploratory Data Analysis (EDA) and some data analysis skills.

Please fill out the details below and attach your resume. We will contact you shortly after you submit your application.

Refer Now