
Site Reliability Engineer

Company: Checkmate

Job Location: India

Job Type: Full-time (On-site)

Date Posted: April 06, 2025


Responsibilities:

● Ensure the reliability and availability of production systems and services by monitoring, troubleshooting, and responding to incidents.

● Develop and maintain tools and automation for system monitoring, alerting, and incident response to minimize manual intervention.

● Collaborate with development teams to plan for capacity scaling and performance improvements based on usage patterns and growth forecasts.

● Collaborate with development and product teams to ensure that new features and services are designed with reliability in mind.

● Maintain documentation for operational processes, system configurations, and best practices.

Requirements:

● Bachelor's degree in computer science, information technology, or a related field (or equivalent work experience).

● Proven experience in software development and/or system administration.

● Strong scripting and coding skills (e.g., Python, Go, Shell) for automation and tool development.

● Familiarity with containerization and orchestration technologies like Docker and Kubernetes.

● Experience with cloud platforms (e.g., AWS, Azure, GCP) and infrastructure as code tools (e.g., Terraform).

● Proficiency in monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack).

● Knowledge of networking, security, and database concepts.

● Strong problem-solving skills and the ability to work well under pressure.

● Understanding of agile and DevOps methodologies.

● Excellent communication and collaboration skills.

● Availability to work during US business hours, until 3 pm ET, is essential for this role.

● Candidates must have their own computer and work setup for remote work.
