Site Reliability Engineer
Casumo is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.
Join us at Casumo, where you are invited to be your authentic YOU-MO!

Company Overview:
Welcome to Casumo, your passport to a world of fun, excitement, and responsible gaming. We're an international online casino company with a knack for creating unforgettable gaming experiences. Our secret sauce? A blend of innovation, security, and a dash of playful charm.
Right now, we're on the hunt for a curious, problem-solving Site Reliability Engineer!
Position Overview:
As a Site Reliability Engineer, you’ll play a key role in ensuring the reliability, scalability, and performance of our production systems. You’ll work closely with engineering teams to build resilient infrastructure, improve observability, and drive operational excellence.
Responsibilities:
Operate, scale, and continuously improve our production Kubernetes clusters on Google Cloud Platform (GCP).
Manage and provision cloud infrastructure using Infrastructure as Code (Terraform).
Maintain and optimise critical messaging and event-streaming systems (RabbitMQ, Kafka).
Manage edge networking, traffic routing, and security using Cloudflare.
Improve CI/CD pipelines to enable safe, fast, and reliable deployments.
Partner with development teams to optimise Java services (JVM tuning, connection pooling, container resource allocation).
Manage and troubleshoot logging and observability tools (e.g. Elasticsearch, Kibana).
Support and advise on high-availability data stores such as MySQL and ClickHouse.
Reliability & Incident Management
Lead incident response as Incident Commander during major production events.
Coordinate cross-functional teams and communicate effectively with both technical and non-technical stakeholders.
Conduct blameless postmortems and drive improvements to prevent recurring issues.
Design and execute load testing strategies to validate system performance under peak conditions.
Define and maintain Service Level Objectives (SLOs) and Service Level Indicators (SLIs).
Improve monitoring and alerting using Prometheus and Grafana, reducing noise and improving MTTR.
Requirements:
We’re looking for a hands-on engineer who thrives in modern cloud environments and is passionate about reliability, automation, and developer experience.
3+ years of experience in Site Reliability Engineering, DevOps, or similar roles.
Strong hands-on experience with Kubernetes in production environments.
Solid experience with GCP (or another major cloud provider).
Proficiency with Infrastructure as Code tools (Terraform preferred).
Experience with messaging systems or event streaming platforms (RabbitMQ, Kafka).
Strong troubleshooting skills across infrastructure, networking, and application layers.
Experience handling production incidents and conducting postmortems.
Scripting and automation skills (e.g. Bash, Python).
Strong Linux systems knowledge.
Preferred Experience
Experience running MySQL or ClickHouse in production (HA, replication, failover, backups).
Experience in high-scale environments (e.g. iGaming, fintech, SaaS).
Experience implementing and evolving SLO/SLI frameworks.
Think we're a good match? Apply now!
The Perks (Malta Office)
Being a part of the Casumo group provides an unparalleled experience. You’ll find yourself surrounded by the brightest minds within the most inspiring and collaborative office spaces! In addition to that, you’ll enjoy:
Private health insurance
Wellness incentives, including a fitness allowance and mental well-being services
Flexible national holidays: public holidays mean more time off, and you choose how and when to enjoy them!
2 weeks of Work From Anywhere (10 days), increasing to 4 weeks (20 days) with longer tenure at the Company: explore the world while working remotely
Gourmet lunches and healthy snacks prepared by our in-house chef
Variety of discounts from local vendors
Access to some of the greatest tools and platforms for developing your professional skills and building success within your role
A range of training courses, known as Casumo College, for continuous learning and growth
Social events for building strong relationships with colleagues from all across the organisation
Our ABC values:
ASPIRE
At Casumo, "aspire" means pushing beyond the ordinary and transforming obstacles into stepping stones. Challenges are our breakfast of champions, and comfort zones are out of bounds. Mediocrity? Left behind. Our mantra? Dream big, aim high, and always be ready for the next adventure in innovation.
BELIEVE
Belief at Casumo isn't just a feel-good sticker; it's the glue that binds us. Turning "me" achievements into "we" victories, we're a tight-knit crew of dreamers, doers, and relentless supporters. With a high-five arsenal and a trusty cheerleading squad, we're on a mission to prove that together, we're not just strong; we're Casumo strong.
CARE
Care is our secret ingredient, the cherry on top of our game. It's not only about ensuring our players have a blast (responsibly, of course); it's about weaving a fabric of support so tight, even the toughest challenges can't tear us apart. From tailoring player experiences to being there for each other, we're all about creating memorable moments.
- Department: Tech
- Locations: Swieqi, Malta
- Remote status: Hybrid
- Employment type: Full-time