Senior Site Reliability Engineering (SRE) Developer
Helcim Inc.
Helcim is searching for a Senior Site Reliability Engineering (SRE) Developer to build and maintain the next generation of payments technology. We're looking for a talented individual with a passion for cloud computing, infrastructure automation, and security, who enjoys learning about the latest technologies and brings strong problem-solving skills and a user-focused approach.
As a member of our team, you will be working on numerous ongoing and new projects. Helcim is a fast-paced, high-demand company and team members must work well under pressure. The ideal candidate will embody our values and culture, and be a steward of The Way of the Helcim (see link below to our culture book).
Responsibilities:
- Implement and manage CI/CD pipelines for automating the build, test, and deployment processes, ensuring safe and efficient release of software updates.
- Collaborate with development teams to ensure smooth integration of code changes into the pipeline.
- Implement and maintain infrastructure as code (IaC) practices using tools like Terraform or Ansible to manage infrastructure changes.
- Automate infrastructure provisioning and configuration to support scalability and reliability.
- Build and maintain Docker containers for applications and services.
- Orchestrate container deployments using Kubernetes, Docker Compose, and other relevant technologies.
- Ensure the reliability and availability of services by proactively identifying and mitigating potential issues and responding to incidents.
- Participate in on-call rotations to respond to critical incidents and minimize downtime.
- Conduct post-incident reviews to identify root causes and prevent future occurrences.
- Collaborate with security teams to ensure the security and compliance of systems and applications.
- Identify and address performance bottlenecks and optimize systems for efficiency and cost-effectiveness.
- Develop and test disaster recovery plans and procedures to minimize data loss and downtime in case of failures.
- Maintain documentation for infrastructure, processes, and procedures to facilitate knowledge sharing and team collaboration.
- Continuously seek opportunities for process improvement and optimization.
- Mentor and provide guidance to junior team members, fostering a culture of knowledge sharing and skill development.
- Collaborate with cross-functional teams to define and implement DevOps best practices and standards.
- Evaluate emerging technologies and tools, providing recommendations for adoption based on the organization's needs.
- Lead and contribute to strategic initiatives, such as optimizing cost structures, implementing advanced monitoring and automation, and driving innovation.
- Take ownership of complex technical issues and coordinate resolutions across teams.
- Define and enforce best practices in areas such as performance and reliability.
Qualifications:
- Bachelor's or technical degree in computer science or related field.
- Extensive experience (5+ years) in a DevOps/Site Reliability role, demonstrating a track record of successfully leading and implementing complex projects.
- In-depth knowledge of advanced DevOps/SRE concepts, methodologies, and best practices.
- Proven experience in designing and implementing CI/CD pipelines for large-scale, distributed systems.
- Strong leadership skills with the ability to influence and guide team members towards achieving common goals.
- Experience with architectural design and planning for highly available and scalable systems.
- Proficient in conducting root cause analysis and implementing preventive measures.
- Advanced knowledge of cloud computing platforms (e.g. GCP, AWS) and container orchestration technologies (e.g. Docker, Docker Compose, Kubernetes).
- Strong programming and scripting skills (e.g. Golang, Python, Bash).
- Experience with database administration (e.g. MySQL, PostgreSQL).
- Proficient in infrastructure as code (IaC) tools and practices (e.g. Terraform, Ansible).
- Expertise with monitoring and alerting tools (e.g. Elastic, Prometheus, Grafana).
- Excellent problem-solving and troubleshooting skills.
- Effective communication and collaboration skills.
- Commitment to continuous learning and professional development.
What it’s like working at Helcim
At Helcim we build teams of engaged, caring and intelligent people. In return we provide an environment where you’ll be excited to come to work each day and tackle challenges with your colleagues. Learn more about working at Helcim in our culture book The Way of the Helcim.
Our approach to total rewards
As part of our team you’ll receive amazing benefits including salary, paid health benefits, stock options and generous vacation time. You’ll also enjoy the opportunity to recharge and connect with your team members at company social events.
Hybrid work and flexibility
Being together helps everyone learn and grow really fast while keeping us all focused on our mission. This is why we've embraced hybrid work: it allows for the best of remote and in-person interactions while giving us time for heads-down, focused work and opportunities for collaboration. We know hybrid work is not for everyone, and that's ok. But if you want to combine flexibility and being surrounded by amazing people, this is the place for you.
Helcim uses a hybrid work structure where team members can work 3 days in the Calgary office (Monday/Wednesday/Friday) and 2 days (Tuesday/Thursday) at home.
Join our team
We invest a lot of time and energy imagining and creating a company and culture that encourage discussion, the trade of ideas, and the execution of amazing products and services. We’re friendly and collaborative, working together to achieve big goals. If you want to join our team and feel you can contribute to the growth and success of our company, we want to hear from you!
Candidates must be eligible to work in Canada and be located in Calgary for this position.