Senior SRE
Viafoura
Toronto, ON, Canada · Remote
Senior Site Reliability Engineer
About Viafoura
The Role
Required Qualifications
Kubernetes & Container Orchestration
- Amazon EKS (Elastic Kubernetes Service): Demonstrated experience designing, deploying, and managing production EKS clusters from the ground up
- Cluster Architecture: Hands-on experience with end-to-end cluster setup including:
  - Designing and implementing cluster networking architecture
  - Configuring and managing Container Network Interfaces (CNI)
  - Deploying and managing the AWS Load Balancer Controller for ingress management
  - Implementing ExternalDNS for automated DNS management
  - Setting up and maintaining service mesh solutions, particularly Istio, for advanced traffic management, observability, and security
- Strong understanding of Kubernetes security best practices, including RBAC, Pod Security Standards (Pod Security Admission), and network policies
- Experience with cluster upgrades, scaling strategies, and disaster recovery procedures
AWS Cloud Infrastructure
- Networking: Deep expertise in AWS networking services including VPC design, subnets, security groups, NACLs, Transit Gateway, VPC peering, and PrivateLink
- EC2: Extensive experience managing EC2 instances, AMI management, Auto Scaling Groups, and instance optimization
- RDS: Production experience with Amazon RDS including database engine selection, Multi-AZ deployments, read replicas, backup strategies, and performance tuning
- Strong understanding of AWS security best practices, IAM policies, and compliance frameworks
- Experience with additional AWS services such as CloudWatch, CloudTrail, S3, and Route 53
Infrastructure as Code (IaC)
- Terraform: Advanced proficiency in writing, testing, and maintaining Terraform modules and configurations
- Terragrunt: Hands-on experience using Terragrunt for managing multiple environments, DRY configurations, and remote state management
- Strong understanding of IaC best practices including state management, module design, version control, and testing strategies
- Experience with infrastructure testing frameworks and validation tools
CI/CD & Automation
- GitHub Actions: Proven experience designing and implementing CI/CD pipelines using GitHub Actions
- Experience with workflow automation, deployment strategies (blue-green, canary), and rollback procedures
- Knowledge of GitOps principles and tooling
- Strong scripting skills (Bash, Python, or similar) for automation
Key Responsibilities
- Design, deploy, and manage production Kubernetes clusters on AWS EKS with a focus on reliability, security, and performance
- Architect and maintain AWS infrastructure including networking, compute, and database layers
- Develop and maintain infrastructure as code using Terraform and Terragrunt, following best practices
- Build and optimize CI/CD pipelines using GitHub Actions for automated testing and deployment
- Implement comprehensive monitoring, logging, and alerting solutions
- Participate in the on-call rotation and incident response
- Collaborate with development teams to improve application reliability and performance
- Lead capacity planning and cost optimization initiatives
- Mentor junior engineers and contribute to team knowledge sharing
Preferred Qualifications
- Experience with observability tools (Prometheus, Grafana, Datadog)
- Knowledge of HashiCorp Vault or similar secrets management solutions
- Experience with disaster recovery planning and execution
- Contributions to open-source projects
What We Offer
- Competitive salary
- Comprehensive health benefits
- Professional development opportunities
- Remote-friendly work environment
- Collaborative and innovative team culture
