Site Reliability Engineer

October 12, 2023

Job Description

Location – Bengaluru, Karnataka

Responsibilities

• Design and implement the full lifecycle of services from conception to completion, including system design, build, and deployment.

• Develop software solutions to enable operability of large-scale distributed systems capable of handling millions of transactions and petabytes of data.

• Manage capacity and performance to help scale the infrastructure on both public and private clouds around the world.

• Define and implement standards and best practices for system architecture, deployment, metrics, and operational tasks.

• Support services through activities such as monitoring availability and system health, and responding to incidents.

• Improve system performance, application delivery, and efficiency through automation, process refinement, postmortem reviews, and in-depth configuration analysis.

• Engage in communications across all areas of the organization.

• Troubleshoot and monitor production systems to maintain the highest possible uptime.

• Support and improve existing high-availability architecture solutions, and manage their operational activity.

Educational Requirements

Bachelor of Engineering

Service Line Quality

Additional Responsibilities:

Must have:

• Experience in one or more high-level programming languages such as Python, Ruby, or Go, and familiarity with object-oriented programming.

• Proficiency in designing, deploying, and managing distributed systems and service-oriented architectures.

• Experience designing and implementing CI/CD/CT pipelines on one or more tool stacks such as Jenkins, Bamboo, Azure DevOps, and AWS CodePipeline, with hands-on experience in common DevOps tools (e.g., Jenkins, Sonar, Maven, Git, Nexus, UCD).

• Experience in deploying, managing, and monitoring applications and services on one or more cloud and on-premises infrastructures such as AWS, Azure, OpenStack, Cloud Foundry, and OpenShift.

• Proficiency in one or more infrastructure-as-code tools (e.g., Terraform, CloudFormation, Azure ARM templates).

• Experience developing and managing monitoring and log analysis tools to support operations, with exposure to tools such as AppDynamics, Datadog, Splunk, Kibana, Prometheus, Grafana, and Elasticsearch.

• Proven ability to maintain enterprise-scale production software, with knowledge of heterogeneous system landscapes (e.g., Linux, Windows).

• Expertise in analyzing and troubleshooting large-scale distributed systems and microservices, with experience in Unix/Linux operating system internals and administration (e.g., file systems, inodes, system calls) and networking (e.g., TCP/IP, routing, network topologies).

Preferred Skills: DevOps