Who are we?

Sporty's sites are some of the most popular on the internet, consistently staying in Alexa's list of top websites for the countries they operate in.

In addition to our DevOps Team, we are building a Site Reliability Team whose purpose is to focus on site reliability and security. It will also be involved in deployment, configuration, and monitoring, as well as the availability, latency, change management, emergency response, and capacity management of services in production.

Responsibilities

Work with a team of DevOps/SRE and DBA professionals

Improve existing infrastructure and processes in the 6 countries we're currently deployed in, and streamline the processes for deploying to new countries in the future

Holistically improve all aspects of our current infrastructure, including reducing costs, streamlining environment provisioning, lowering response times, and incorporating the latest techniques and technologies

Monitor and maintain the existing cloud infrastructure via autoscaling, automated alerts, and OpsWorks and Grafana dashboards

Take ownership and responsibility for our cloud operation activities

Liaise with external security agencies for annual audits and perform our own internal security sweeps

Aid in reconfiguring existing architecture to allow for rapid deployments to new countries

Mentor less experienced team members

Requirements

3+ years of SRE experience
Experience independently leading the planning and deployment of a project
Experienced with cloud platforms, especially AWS, including solid knowledge of how to use cloud resources to meet demand from other teams and production
Familiar with at least one programming or scripting language (Python, Java, etc.)
Experience managing multiple Kubernetes clusters in production (virtualization, orchestration, scalability, security, and high availability), with tools such as Helm, Rancher, and ArgoCD
Solid knowledge of networking protocols and cybersecurity, especially the TCP/IP stack and HTTP
A strong understanding of caching, including CDNs and HTTP caching (Cloudflare, AWS CloudFront)
Experienced with cloud-native monitoring solutions in large distributed systems using the observability model (traces, metrics, logs), with tools such as Prometheus, Jaeger, Loki, ELK, and Grafana
Excellent troubleshooting skills, including Linux OS issue diagnosis and OS parameter optimization

Beneficial

Experience working with other cloud platforms (GCP, Azure, AliCloud) is a plus
Familiar with at least one Infrastructure as Code tool (Terraform, CloudFormation)
Experience designing and implementing CI/CD workflows (Jenkins, GitHub Actions) is a plus
Experience with system automation tools (Ansible, Salt, Chef)
Understanding of modern microservices and service mesh concepts (containers, Istio) is a plus

Interview Process

HackerRank Test
Remote video screening + ID check with our Talent Acquisition Team
Remote 90-minute video interview with 3 team members (30 minutes each)
24-72 hour feedback loops throughout the process

Benefits

Quarterly and flash bonuses

Flexible working hours

Top-of-the-line equipment

Referral bonuses

28 days paid annual leave

Annual company retreat - we all went to Dubai in 2022 and are planning 2 more retreats for 2023!

Highly talented, dependable co-workers in a global, multicultural organisation

Payment via DEEL, a world-class online wallet system

We score 100% on The Joel Test

Our teams are small enough for you to be impactful

Our business is globally established and successful, offering stability and security to our Team Members

Site Reliability Engineer

SportyBet
