Staff DevOps Engineer, Observability & Monitoring
Posted on Saturday, April 15, 2023
StackAdapt is a self-serve advertising platform that specializes in multi-channel solutions including native, display, video, connected TV, audio, in-game, and digital out-of-home ads. We empower hundreds of digitally-focused companies to deliver outcomes and exceptional campaign performance every day. StackAdapt was founded with a vision to be more than an advertising platform: it’s a hub of innovation, imagination and creativity.
Our system handles over 3 million requests per second and billions of data points in the real-time advertising ecosystem. We're seeking an Infrastructure Engineer to join our growing team and focus on Infrastructure as Code, Security, and Business Continuity planning. Our technologies hosted on AWS include Ruby on Rails, Go, React, Aerospike, Redis, ScyllaDB, Elasticsearch, and others.
Learn more about our engineering culture here: https://www.youtube.com/watch?v=LXM2NrkhKZc
Watch our talk at Amazon Tech Talks: https://www.youtube.com/watch?v=lRqu-a4gPuU
StackAdapt is a Remote First company, and we are open to candidates located anywhere in Canada for this position.
What you will be doing:
- Develop and maintain a robust observability platform, leveraging tools such as Grafana, to enable comprehensive monitoring, logging, and tracing of system performance and behavior.
- Design and implement custom dashboards and visualizations in Grafana, providing real-time insights and actionable data for informed decision-making and issue resolution.
- Drive the adoption of log aggregation and analysis tools, such as Loki or similar, to centralize and streamline log management, enhancing visibility and understanding of system health.
- Work with service owners to implement tracing solutions, such as OpenTelemetry and Tempo, to gain deeper insights into system performance and identify bottlenecks and areas for optimization.
- Collaborate with cross-functional teams to define and establish key performance indicators (KPIs) and service-level objectives (SLOs) for monitoring, alerting, and incident response.
- Drive the adoption of observability best practices across the organization, promoting a data-driven approach to problem-solving and continuous improvement.
- Provide training and support to team members on the use of observability tools, fostering a culture of shared responsibility for system performance and reliability.
- Stay up-to-date with the latest trends and advancements in observability, ensuring the organization's monitoring and analysis capabilities remain cutting-edge and effective.
- Create and maintain documentation for infrastructure and application components, including installation, configuration, and troubleshooting guides.
We will be reaching out to candidates who have the following:
- Strong problem-solving skills and a track record of successfully executing complex projects
- Familiarity with establishing KPIs on critical systems and operations
- At least 6 years of experience in Infrastructure/DevOps/Site Reliability
- At least 3 years of experience with AWS
- Expert experience with Linux, Kubernetes, and containerized services
- Experience managing instrumentation for complex AWS environments at scale
- Experience with as many of our existing technologies as possible is a big plus: Terraform, Vault, Nomad, Packer, Vagrant, Boundary, Consul, Waypoint, Redis, Aerospike, Go, ScyllaDB, Elasticsearch, RDS, Grafana, Prometheus, Kafka, Envoy, Nginx
What we offer:
- Competitive salary + equity
- RRSP matching
- 3 weeks vacation + 3 personal care days + 1 Culture & Belief day + birthdays off
- Access to a comprehensive mental health care platform
- Full benefits from day one of employment
- Work from home reimbursements
- Optional global WeWork membership for those who want a change from their home office
- Robust training and onboarding program
- Coverage and support of personal development initiatives (conferences, courses, etc.)
- Access to StackAdapt programmatic courses and certifications to support continuous learning
- Mentorship opportunities with industry leaders
- An awesome parental leave policy
- A friendly, welcoming, and supportive culture
- Our social and team events!
StackAdapt is a diverse and inclusive team of collaborative, hardworking individuals trying to make a dent in the universe. No matter who you are, where you are from, who you love, follow in faith, disability (or superpower) status, ethnicity, or the gender you identify with (if you’re comfortable, let us know your pronouns), you are welcome at StackAdapt. If you have any requests or requirements to support you throughout any part of the interview process, please let our Talent team know.
We've been recognized for our diverse and supportive workplace, high-performing campaigns, award-winning customer service, and innovation. We've been awarded:
2023 Best Workplaces for Women by Great Place to Work®
2023 Best Workplaces in Canada by Great Place to Work®