Job Details
Principal Site Reliability Engineer
- ID: 15460
- Location: Dublin, Ireland
- Role Type: Permanent
Principal TechOps Engineer – SRE
Overview
We are seeking a Principal TechOps Engineer (SRE) to play a key role in designing, building, and operating highly available cloud infrastructure. This position involves close collaboration with engineering teams to drive initiatives from concept through to production.
You will work within a modern, multi-region Kubernetes environment (AWS EKS) supporting mission-critical workloads, helping to shape infrastructure strategy and improve reliability, scalability, and automation across the platform.
This is a high-impact opportunity to influence cloud architecture, deployment practices, and operational excellence in a fast-paced, collaborative environment.
Key Responsibilities
- Partner with engineering teams to deliver infrastructure and platform initiatives end-to-end
- Design and operate highly available, secure, and scalable cloud-native systems
- Manage and optimize Kubernetes environments (AWS EKS) across multiple regions and availability zones
- Lead efforts in infrastructure automation and infrastructure-as-code (IaC)
- Build and maintain CI/CD pipelines and deployment frameworks
- Define and implement monitoring, logging, and alerting strategies
- Drive adoption of DevOps best practices and an automation-first mindset
- Provide technical leadership and mentorship to SRE / Cloud Engineering teams
- Collaborate cross-functionally with product, engineering, and risk stakeholders
- Champion reliability, performance, and operational excellence across all systems
Required Skills & Experience
- 5+ years of hands-on experience with AWS in production environments
- Strong experience with Docker and containerized workloads
- Proven experience running and managing Kubernetes workloads (preferably AWS EKS)
- Experience deploying and managing Kubernetes clusters
- Hands-on experience with CI/CD tools (Jenkins preferred)
- Experience creating and managing Helm charts and libraries
- Strong knowledge of monitoring and observability tools (e.g., CloudWatch, Datadog, Splunk)
- Solid experience with UNIX/Linux systems and shell scripting
- Experience working in large-scale AWS environments (multi-account, IAM, SSO)
- Strong communication skills with the ability to engage across all levels
- Ability to work independently and take ownership of initiatives
Preferred Experience
- Infrastructure-as-code experience (Terraform preferred)
- Programming experience (Python preferred)
- Experience with Git or other distributed version control systems
- Experience with Kafka / Confluent Kafka
- Familiarity with agile methodologies (Kanban preferred)
- Experience with CDN providers (e.g., Akamai)
Desirable Traits
- Strong automation mindset – sees problems as opportunities to improve processes
- Proven leadership experience within SRE / Cloud Engineering teams
- Passion for building resilient, scalable systems
- Ability to thrive in a fast-moving, evolving environment
Team & Environment
You will join a highly skilled Technical Operations team focused on cloud transformation, reliability engineering, and scalable infrastructure.
The team operates with a strong DevOps culture, emphasizing:
- Infrastructure-as-code
- Automation and continuous delivery
- Security and resilience
- High availability and system reliability