Staff Site Reliability Developer @ Vena Solutions

Vena is looking for a Staff SRE to join our SaaS Technology and Operations (STO) team. This role is a match for you if you love building highly scalable, resilient and automated services.

We are an innovative team that aims to provide an exceptional customer experience by leveraging best-in-class automation and orchestration practices for Vena's SaaS platform. As a Staff Site Reliability Developer, you will use your software and systems engineering background to build and run large-scale, distributed, fault-tolerant systems and services. We strive to hire people who want to make an impact and who thrive in a flexible work environment driven by business objectives.

Your role is to ensure that our systems, both internal and external facing, are designed to maximize resiliency and uptime. Our team focuses on optimizing existing systems, building infrastructure, and reducing toil through automation. Practices such as limiting time spent on manual operational work, conducting post-mortems, and proactively identifying potential outages drive the iterative improvement that is key to both product quality and technical standards.

What you will do:

Be the STO team’s subject matter expert (SME) on how the Vena platform operates. You will foster and maintain relationships with other Staff-level engineers and Vena’s Architecture team.
Work closely with the Sr. Director, SaaS Engineering, to help develop the STO team’s technical roadmap, and partner with the Architecture team to determine where the team needs to be several quarters from now.
Help Vena's technology organization build scalable systems, using best practices around automation (reliability) and developer self-service (velocity).
Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, planning and reviews.
Define and document runbooks and standard operating procedures.
Maintain services once they are live by measuring and monitoring availability, latency and overall system health.
Provide mentorship and training on emerging technologies and new processes to other Vena SREs and to members of the Product and Technology organization, and drive education and knowledge transfer of design patterns and technical practices.
Drive high standards around incident response practices and policies with a focus on automated response and remediation.
Participate in on-call rotation.

What we use:

Please note this reflects only a portion of our current technical stack, and we are constantly evolving and revisiting our stack as we grow:

A modern AWS cloud infrastructure managed through infrastructure-as-code (Terraform), configuration-as-code (Ansible), and CI/CD (Jenkins)
RDS MySQL, Redshift, Redshift Spectrum, MongoDB, and Elasticsearch
Kinesis, SQS, and RabbitMQ
DevOps tools written in Python
Back-end applications written using Java, Dropwizard, Spring Boot, and Hibernate
Front-end applications written using TypeScript, JavaScript, React (Context API and Hooks), and Redux
Monitoring with Datadog and CloudWatch

What you will bring:

10+ years of experience in an IT Operations, DevOps, Site Reliability Engineering, or Software Engineering role.
You hold professional-level AWS and Azure certifications, such as Solutions Architect Professional or DevOps Professional.
You are an authority on and evangelist of SRE concepts such as SLOs, SLIs, and error budgets, and you have direct experience helping multiple teams implement them across an organization.
You have mastery of cloud computing platforms (AWS and Azure) and expert-level experience setting up and managing cloud infrastructure using various IaC and orchestration tools.
You can write code in any language. You have deployed your work to a production environment and can back it up with examples.
You have mastery of tools and platforms such as AWS, Azure, Ansible, artifact storage (such as Artifactory or ECR), build/release pipelines (such as Jenkins, GitLab, GitHub Actions, or equivalents), Docker, GitHub, Kubernetes, and Terraform.
Direct experience with large-scale distributed systems in the cloud, using observability and telemetry to oversee code deployments.
Experience with the operational aspects of software systems using telemetry, centralized logging, and alerting, with tools such as CloudWatch, Datadog, and Prometheus.

This job is no longer accepting applications

Communitech Hub

151 Charles Street West, Suite 100,
Kitchener, Ontario, Canada, N2G 1H6

Hours: Monday - Thursday 8:30 a.m. - 5 p.m. ET
Friday 8:30 a.m. - 4 p.m. ET
Phone: +1 (519) 888-9944
Email: front.desk@communitech.ca

Communitech acknowledges that the Hub is situated on the Haldimand Tract, land that was granted to the Haudenosaunee of the Six Nations of the Grand River, and is within the territory of the Neutral, Anishinaabe, and Haudenosaunee peoples.

To access services in French please email marketinghelp@communitech.ca