AWS DevOps / SRE Engineer

Design, implement, and manage scalable, secure, and highly available AWS infrastructure using services like EC2, S3, RDS, Lambda, and VPC.
Build and maintain CI/CD pipelines to automate the deployment process using tools like Jenkins, GitLab CI, or AWS CodePipeline.
Monitor and manage cloud infrastructure to ensure 24×7 availability and reliability, using tools such as CloudWatch, Prometheus, and Grafana.
Collaborate with cross-functional teams to enhance application performance, security, and resilience in production.
Troubleshoot and resolve complex infrastructure and application issues related to AWS, Linux/Unix systems, and networking.
Implement Infrastructure as Code (IaC) using Terraform, CloudFormation, or similar tools to automate provisioning and configuration.
Establish SRE practices, including Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Service Level Agreements (SLAs).
Conduct post-mortem analyses on outages and incidents and take corrective actions to prevent recurrence.
Automate repetitive tasks using scripting languages (Python, Shell, etc.) to reduce manual intervention.
Ensure compliance with security and governance standards in cloud environments and collaborate with security teams on best practices.
Participate in the on-call rotation to support incident resolution in production environments.

4-8 years of experience in DevOps and Site Reliability Engineering (SRE) with a focus on AWS cloud infrastructure.
Strong expertise in AWS services (EC2, S3, RDS, Lambda, CloudFormation, CloudWatch, etc.).
Proficiency in Infrastructure as Code (IaC) tools like Terraform or CloudFormation.
Experience with CI/CD pipeline automation using Jenkins, GitLab CI, AWS CodePipeline, or similar tools.
Strong knowledge of containerization and orchestration tools such as Docker and Kubernetes.
Experience with monitoring and observability tools such as Prometheus, Grafana, Nagios, or AWS CloudWatch.
Expertise in Linux/Unix system administration, with knowledge of networking, storage, and virtualized environments.
Familiarity with scripting languages such as Python or shell (e.g. Bash) for automation tasks.
Strong problem-solving and troubleshooting skills with a focus on incident resolution and root cause analysis.
Hands-on experience with security best practices in cloud environments (IAM, Security Groups, NACLs, Encryption, etc.).
Familiarity with Agile methodologies and version control systems like Git.
Excellent communication skills, both verbal and written, with the ability to collaborate across teams.
