Site Reliability Engineer Job at Interswitch Group - NewBalancejobs

Please read the job requirements carefully before applying for this position.

Interswitch is an Africa-focused integrated digital payments and commerce company that facilitates the electronic circulation of money as well as the exchange of value between individuals and organisations on a timely and consistent basis. We started operations in 2002 as a transaction switching and electronic payments processing company, and have progressively evolved into an integrated payment services company, building and managing payment infrastructure as well as delivering innovative payment products and transactional services throughout the African continent. At Interswitch, we offer unique career opportunities for individuals capable of playing key roles and adding value in an innovative and fun environment.

We are recruiting to fill the position below:

Job Title: Site Reliability Engineer

Location: Lagos
Employment Type: Permanent
Department: Technology

Job Description

  • Manage the Availability and Capacity of the Core Applications. Provide support for the Applications and ensure their optimal performance. Set up new Applications in the company’s environment.

Duties and Responsibilities

  • Deployment of Applications
    • Support the deployment of Applications on the production environment
    • Implement projects involving the setup and deployment of new Applications and the enhancement of existing Applications
  • Automation
    • Automate the activities involved in the management of Applications
  • Application Environment Management
    • Ensure 24×7 Availability of all Core Applications
    • Carry out Capacity planning to ensure Applications are always available to meet demand
    • Create visibility into site health and key performance indicators of the Application Systems
    • Ensure up-to-date patching and full compliance with security standards for the Application Systems
    • Ensure up-to-date documentation on all Core Applications as well as changes made
    • Balance feature development speed and reliability with well-defined Service Level Objectives (SLO) and Service Level Indicators (SLI)
  • Monitor Systems
    • Monitor the performance, health, and capacity of:
      • Servers
      • Databases
      • Services
      • Storage
      • Network Links
    • Use a variety of monitoring tools like Nagios, SolarWinds, Kibana, PagerDuty, AppDynamics, etc.
  • Troubleshooting
    • Troubleshoot reported issues, and proactively identify areas in need of optimization
    • Work with technical support engineers to resolve critical incidents
    • Create and update clear troubleshooting guides for Applications
  • Requests Fulfilment
    • Implement Requests relevant to the operation and enhancement of the Core Processing Applications

Qualifications

  • Academic Qualification(s) – Good First Degree in Computer Science, Computer Engineering or a related field
  • Professional Qualification(s) – Service Management Certifications (e.g. ITIL) are an advantage.
  • Experience (Number of relevant years) – Minimum of one (1) year of relevant experience.

Requirements:

  • Expertise in Linux and Windows Operating systems and Shell scripting
  • Technical experience working with cloud technologies
  • Build and Deployment Management (Jenkins) in a CI/CD workflow
  • Experience with Chef, Puppet or Ansible, automating all aspects of system and server management
  • Good understanding of distributed systems and container technologies such as Docker and Kubernetes, including container infrastructure and orchestration
  • Good understanding of SLO and SLI for Applications
  • Experience with DNS, Networking and High Availability solutions
  • Proficient in at least one of the following languages: Python, Ruby, Go
  • Ability to work across teams to continuously analyze system performance in production, troubleshoot reported issues, and proactively identify areas in need of optimization
  • Previous experience developing and driving real-time monitoring solutions that provide visibility into site health and key performance indicators
  • Working knowledge of databases
  • Working understanding of load-balancing technologies.
  • Working understanding of IT service management (Incident, Problem, Change and Knowledge management).
  • Ability to work within a technical team of support engineers through day-to-day operations and critical incidents.

Application Closing Date
8th September, 2022.

Method of Application

Interested and qualified candidates should:
Click here to apply online