QTG - Digital Investing Engineering - Senior Production Support Engineer

Flexiti Financial

Customer Service
Canada · North York, Toronto, ON, Canada
CAD 100k-120k / year
Posted on Feb 20, 2026

What’s in it for you as an employee of QFG?

  • Health & wellbeing resources and programs

  • Paid vacation, personal, and sick days for work-life balance

  • Competitive compensation and benefits packages

  • Work-life balance in a hybrid environment with at least 3 days in office

  • Career growth and development opportunities

  • Opportunities to contribute to community causes

  • Work with diverse team members in an inclusive and collaborative environment

This job posting is for an existing vacancy.

We’re looking for our next Senior Production Support Engineer. Could It Be You?

We are searching for a Senior Production Support Engineer who acts as a pillar of platform stability. At Questrade, we don’t just watch dashboards—we build the intelligence that anticipates failures. Our mission is to achieve absolute reliability and resilience, ensuring our high-frequency trading platforms remain unshakable under the pressure of global markets.

In this role, you don't just fix problems in Production; you engineer them out before they can occur. You are a practitioner of Site Reliability Engineering (SRE) principles who views manual toil as a bug to be automated. Most importantly, you are at the forefront of the AI revolution, leveraging Machine Learning (ML) and Large Language Models (LLMs) to transform reactive ops into predictive, self-healing systems.

Need more details? Keep reading…

Your Domain: The Intersection of SRE & AI

As a Senior Production Support Engineer within the Questrade Brokerage Engineering Group, your playground involves:

  • SRE-First Operations: Treat operations as a software problem. Define and manage SLIs, SLOs, and Error Budgets to balance the velocity of innovation with the necessity of stability.

  • AI-Powered Automation: Lead the charge in applying ML models for "Smart Alerting" and anomaly detection. You will move beyond static thresholds to systems that understand patterns and mitigate outages before they happen.

  • Autonomous Resilience: Architect and implement self-healing workflows. Whether it's auto-scaling GKE clusters or building LLM-assisted incident response playbooks, you are building the "autopilot" for our production environments.

  • Cloud Native Mastery: Orchestrate highly available systems using GKE (Google Kubernetes Engine), Terraform, and Helm, ensuring our infrastructure is declarative, versioned, and resilient.

  • The Calm in the Storm: When critical incidents occur, you are the Incident Commander. You lead cross-functional "War Rooms" with Engineering, DBAs, and Network teams, always followed by a blameless post-mortem to drive long-term system improvement.

  • Domain Expertise: Strong foundational knowledge of trading systems is desired but not required.
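
To make the SLO and error budget idea above concrete, here is a minimal Python sketch. The function name, the 99.9% target, and the request counts are illustrative assumptions, not figures or tooling from Questrade.

```python
def error_budget_remaining(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Return the fraction of the error budget still unspent.

    slo_target is the availability SLO as a fraction (e.g. 0.999 for 99.9%).
    A negative result means the budget for the window is overspent.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    # The error budget is the number of failures the SLO tolerates in the window.
    allowed_failures = (1.0 - slo_target) * total_requests
    return 1.0 - failed_requests / allowed_failures


# Hypothetical window: 1,000,000 requests, 250 failures, 99.9% availability SLO.
remaining = error_budget_remaining(0.999, 1_000_000, 250)
print(f"Error budget remaining: {remaining:.1%}")
```

With these made-up numbers the SLO tolerates about 1,000 failed requests, so 250 failures leave roughly three quarters of the budget. A team might slow risky releases as that figure approaches zero.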

So are YOU our next Senior Production Support Engineer? You are if you…

You are a high-level problem solver who bridges the gap between deep systems knowledge and cutting-edge software engineering.

Your Technical Arsenal

  • Automating Support: You have 5+ years of experience in production support and strong engineering knowledge, with mastery of Python, C#, and Bash.

  • Cloud & Orchestration: Mastery of Google Cloud Platform (GCP) and Google Kubernetes Engine (GKE); advanced proficiency in container orchestration and IaC using Kubernetes (kubectl), Helm, and Terraform.

  • Modern Observability: Advanced capability in Splunk (Administration/Querying) and Datadog, with experience building AI-enhanced dashboards.

  • System Versatility: You bring strong experience with Linux systems and a good understanding of Windows IIS server, allowing you to troubleshoot and optimize service layers across a diverse application stack.

  • Database & Network Mastery: Strong grasp of MS SQL/MySQL and a deep understanding of network protocols (TCP/IP, UDP).

Mastery of the Incident Lifecycle

  • Incident Command: Act as the primary lead during high-severity events. You will coordinate between Engineering, DBA, Network, and Executive stakeholders to drive rapid service restoration.

  • The "Resolution First" Mindset: You possess the technical intuition to quickly distinguish between symptoms and root causes under pressure, applying immediate remediation to restore service, followed by a rigorous post-mortem analysis to eliminate technical debt.

  • Automated Incident Response: You leverage Python, Bash, and AI-driven tools to automate common remediation patterns (e.g., auto-restarting stalled pods, triggering failovers, or clearing cache bottlenecks).

  • Predictive Analysis: Use AI and smart alerting to identify "near-misses"—anomalies that didn't cause an outage this time but will if left unaddressed.
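
As a rough illustration of spotting near-misses without a static threshold, the sketch below labels each metric sample by its z-score against a trailing window. It is a simple statistical stand-in, not an ML model and not Questrade's alerting. The function name, window size, and z-score cut-offs are all hypothetical.

```python
from collections import deque
from statistics import mean, stdev


def classify_samples(samples, window=20, min_history=10, warn_z=2.0, alert_z=3.0):
    """Label each sample 'ok', 'near-miss', or 'alert'.

    Each sample is compared with the mean and standard deviation of the
    preceding samples (up to `window` of them). Nothing is flagged until
    at least `min_history` samples have been seen.
    """
    history = deque(maxlen=window)
    labels = []
    for value in samples:
        label = "ok"
        if len(history) >= min_history:
            mu = mean(history)
            sigma = stdev(history)
            if sigma > 0:
                z = abs(value - mu) / sigma
                if z >= alert_z:
                    label = "alert"
                elif z >= warn_z:
                    label = "near-miss"
        labels.append(label)
        history.append(value)
    return labels


# Hypothetical latency samples in milliseconds: steady, then a drift, then a spike.
latencies = [100, 102, 98, 101, 99, 100, 102, 98, 101, 99, 104, 130]
for value, label in zip(latencies, classify_samples(latencies)):
    print(value, label)
```

In this made-up series the drift to 104 ms lands in the near-miss band and the jump to 130 ms is labelled an alert. A production system would more likely use seasonality-aware or learned baselines.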

The SRE Mindset

  • Toil’s Worst Enemy: You aggressively automate repetitive tasks, freeing the team to focus on high-value engineering.

  • AI Enthusiast: You are curious about how Generative AI can be used to summarize logs, generate root cause analyses, or enhance internal documentation.

  • Strategic Leader: You take ownership of high-stakes tasks, mentoring junior members of the team and setting the gold standard for technical excellence.

  • Resilient Under Pressure: You possess the focus and detail-oriented nature required to navigate high-pressure trading environments, including on-call rotations and market-critical batch windows.

Why You’ll Thrive at Questrade

  • Leading the Fintech Frontier: Work at the center of Canada's most innovative brokerage, where your work directly impacts the financial future of our clients.

  • A Culture of Ownership: You aren't a cog; you are a decision-maker. We empower you to propose architectural shifts and lead technical investments.

  • Infinite Learning: We don't just support your growth; we demand it. From AI workshops to advanced cloud certifications, you’ll be constantly evolving.

Compensation Information:

  • Base salary range: $100,000 - $120,000

  • The final compensation package will be commensurate with the successful candidate's experience, skills, and geographic location (Canada). It includes a comprehensive benefits plan and a competitive incentive (bonus) program for Full-Time Permanent roles.

Sounds like you? Click below to apply!
