Key Responsibilities:
Design, develop, and optimize scalable, high-performance Spark applications using Scala.
Work on mission-critical projects, ensuring high availability, reliability, and performance.
Analyze and optimize Spark jobs for efficient data processing and resource utilization.
Collaborate with cross-functional teams to deliver robust, production-ready solutions.
Troubleshoot and resolve complex issues related to Spark applications and data pipelines.
Integrate Spark applications with Kafka for real-time data streaming and MongoDB for data storage and retrieval (a minimal integration sketch follows this list).
Follow best practices in coding, testing, and deployment to ensure high-quality deliverables.
Mentor junior team members and provide technical leadership.
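For illustration only, here is a minimal sketch of the Kafka-to-MongoDB integration described above, using Spark Structured Streaming in Scala. The broker, topic, schema, URIs, and paths are all placeholder assumptions, and it presumes the spark-sql-kafka and MongoDB Spark Connector 10.x packages are on the classpath; treat it as a pattern sketch, not a reference implementation.

    import org.apache.spark.sql.{DataFrame, SparkSession}
    import org.apache.spark.sql.functions.{col, from_json}
    import org.apache.spark.sql.types.{DoubleType, StringType, StructType, TimestampType}

    object KafkaToMongoSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("kafka-to-mongo-sketch").getOrCreate()

        // Assumed shape of the incoming JSON events (illustrative only).
        val schema = new StructType()
          .add("eventId", StringType)
          .add("amount", DoubleType)
          .add("ts", TimestampType)

        // Consume a Kafka topic as a streaming DataFrame; broker and topic are placeholders.
        val events = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")
          .option("subscribe", "events")
          .load()
          .select(from_json(col("value").cast("string"), schema).as("e"))
          .select("e.*")

        // Persist each micro-batch to MongoDB; URI, database, and collection are placeholders.
        val query = events.writeStream
          .foreachBatch { (batch: DataFrame, _: Long) =>
            batch.write
              .format("mongodb")
              .mode("append")
              .option("connection.uri", "mongodb://localhost:27017")
              .option("database", "analytics")
              .option("collection", "events")
              .save()
          }
          .option("checkpointLocation", "/tmp/checkpoints/events") // placeholder path
          .start()

        query.awaitTermination()
      }
    }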
Mandatory Skills and Qualifications:
7+ years of hands-on experience in Scala programming and Apache Spark.
Strong expertise in Spark architecture, including RDDs, DataFrames, and Spark SQL.
Proven experience in performance tuning and optimization of Spark applications (an illustrative tuning sketch follows this list).
Hands-on experience with Spark Streaming for real-time data processing.
Solid understanding of distributed computing and big data processing concepts.
Proficient in Linux, with the ability to develop, deploy, and troubleshoot in a Linux environment.
Strong knowledge of data structures and algorithms, with a focus on space and time complexity analysis.
Ability to work independently and deliver results in a fast-paced, high-pressure environment.
Excellent problem-solving, debugging, and analytical skills.
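To make the tuning expectation concrete, here is a small, hedged sketch of common Spark optimization levers in Scala (broadcast joins, shuffle-partition sizing, selective caching, output coalescing). The paths, column names, and partition counts are placeholder assumptions, not recommendations.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.broadcast

    object TuningSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("tuning-sketch")
          // Right-size shuffle parallelism for the cluster; 64 is a placeholder,
          // and the default of 200 is often wrong in both directions.
          .config("spark.sql.shuffle.partitions", "64")
          .getOrCreate()

        // Hypothetical inputs: a large fact table and a small dimension table.
        val facts = spark.read.parquet("/data/facts")
        val dims  = spark.read.parquet("/data/dims")

        // Broadcasting the small side turns a shuffle join into a map-side join.
        val joined = facts.join(broadcast(dims), Seq("dimId"))

        val result = joined.groupBy("region").count()

        // Cache only when the result feeds multiple downstream actions.
        result.cache()

        // Coalesce before writing to avoid emitting many tiny output files.
        result.coalesce(8).write.mode("overwrite").parquet("/data/out")

        spark.stop()
      }
    }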
Good-to-Have Skills:
Experience with Apache Kafka for real-time data streaming.
Knowledge of MongoDB or other NoSQL databases.
Familiarity with cloud platforms (e.g., AWS, Azure, GCP) and containerization (e.g., Docker, Kubernetes).
Understanding of DevOps practices and CI/CD pipelines.
Interview Focus Areas:
Coding Exercise in Scala: A hands-on assessment of problem-solving and implementation skills in Scala (a representative warm-up appears after this list).
Spark Integration with Other Technologies: Practical understanding of how Spark integrates with tools such as Kafka and MongoDB.
Spark Streaming: Demonstrated experience with real-time data processing using Spark Streaming.
Best Practices and Optimization in Spark: In-depth knowledge of Spark job optimization, resource management, and performance tuning.
Data Structures, Space, and Time Complexity Analysis: Strong grasp of core data structures and algorithms, and the ability to reason about their space and time trade-offs.
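As a flavour of the coding and complexity discussion, a classic Scala warm-up is sketched below; the problem choice is illustrative and not the actual assessment. The hash-map variant trades O(n) extra space for O(n) time, versus O(n^2) time and O(1) space for the nested-loop brute force.

    import scala.collection.mutable

    object TwoSum {
      // Return the indices of two numbers that sum to target, if any.
      // A value -> index map gives O(n) time at the cost of O(n) space;
      // the nested-loop alternative is O(n^2) time but O(1) space.
      def twoSum(nums: Array[Int], target: Int): Option[(Int, Int)] = {
        val seen = mutable.HashMap.empty[Int, Int]
        for (i <- nums.indices) {
          seen.get(target - nums(i)) match {
            case Some(j) => return Some((j, i))
            case None    => seen(nums(i)) = i
          }
        }
        None
      }

      def main(args: Array[String]): Unit = {
        println(twoSum(Array(2, 7, 11, 15), 9)) // Some((0,1))
      }
    }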