Job Category: Tech Jobs
Hiring For: Tech Mahindra
Job Location: Hyderabad
Referral Bonus: 20000
Experience: 4-8 Years
Salary Up To: 2000000
Key Responsibilities:
- Design, implement, and manage scalable, secure, and highly available AWS infrastructure using services like EC2, S3, RDS, Lambda, and VPC.
- Build and maintain CI/CD pipelines to automate the deployment process using tools like Jenkins, GitLab CI, or AWS CodePipeline.
- Monitor and manage cloud infrastructure to ensure 24×7 availability and reliability, using tools such as CloudWatch, Prometheus, and Grafana.
- Collaborate with cross-functional teams to enhance application performance, security, and resilience in production.
- Troubleshoot and resolve complex infrastructure and application issues related to AWS, Linux/Unix systems, and networking.
- Implement Infrastructure as Code (IaC) using Terraform, CloudFormation, or similar tools to automate provisioning and configuration.
- Establish SRE practices, including Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Service Level Agreements (SLAs).
- Conduct post-mortem analyses on outages and incidents and take corrective actions to prevent recurrence.
- Implement automation for repetitive tasks and reduce manual intervention using scripting languages (Python, Shell, etc.).
- Ensure compliance with security and governance standards in cloud environments and collaborate with security teams on best practices.
- Participate in the on-call rotation to support incident resolution in production environments.
Required Skills & Experience:
- 4-8 years of experience in DevOps and Site Reliability Engineering (SRE) with a focus on AWS cloud infrastructure.
- Strong expertise in AWS services (EC2, S3, RDS, Lambda, CloudFormation, CloudWatch, etc.).
- Proficiency in Infrastructure as Code (IaC) tools like Terraform or CloudFormation.
- Experience with CI/CD pipeline automation using Jenkins, GitLab CI, AWS CodePipeline, or similar tools.
- Strong knowledge of containerization and orchestration tools such as Docker and Kubernetes.
- Experience with monitoring and observability tools such as Prometheus, Grafana, Nagios, or AWS CloudWatch.
- Expertise in Linux/Unix system administration, with knowledge of networking, storage, and virtualized environments.
- Familiarity with scripting languages such as Python or Bash/shell for automation tasks.
- Strong problem-solving and troubleshooting skills with a focus on incident resolution and root cause analysis.
- Hands-on experience with security best practices in cloud environments (IAM, Security Groups, NACLs, Encryption, etc.).
- Familiarity with Agile methodologies and version control systems like Git.
- Excellent verbal and written communication skills, with the ability to collaborate across teams.