LLMOps Engineer
Thrive Career Wellness Platform
Location: Hybrid – Toronto, ON (317 Adelaide Street West)
About the Role
What You’ll Do
- Lead LLM infrastructure efforts across multiple engineering teams, enabling scalable and secure AI-powered features
- Design, build, and operate production-grade LLM systems, including:
  - Model and prompt versioning
  - A/B testing and evaluation workflows
  - Rollback and deployment strategies
- Partner with the AI team to implement prompt management, prompt versioning, and token optimization strategies
- Monitor and optimize:
  - Inference latency and throughput
  - Caching strategies
  - Multi-provider cost management (OpenAI, Anthropic, AWS Bedrock, etc.)
- Build observability pipelines, including quality metrics, error monitoring, evaluation frameworks, and user feedback loops
- Implement and maintain Retrieval-Augmented Generation (RAG) systems, embedding pipelines, and vector databases
- Support fine-tuning workflows and manage model registries for proprietary and open-source models
- Implement AI safety guardrails, content filtering, and compliance measures
- Contribute ~10% of your time to general DevOps initiatives (CI/CD, cloud infrastructure improvements)
- Maintain clear documentation of LLM infrastructure, workflows, and best practices
The Problem You’ll Solve
What We’re Looking For
- 3+ years of experience in LLMOps, MLOps, or production-focused AI/ML roles
- Strong Python programming skills and experience with LLM frameworks and tooling
- Hands-on experience with LLM providers such as OpenAI, Anthropic, AWS Bedrock, Azure, Vertex, or Databricks
- Experience working with vector databases (Pinecone, Weaviate, Qdrant, Chroma, or similar)
- Familiarity with model serving tools (vLLM, TGI, Ray Serve)
- Strong knowledge of Docker, Kubernetes, and cloud infrastructure (AWS preferred)
- Experience with prompt engineering, token optimization, and LLM evaluation metrics
- Familiarity with LLM observability and experimentation tools (LangSmith, Weights & Biases, Phoenix, MLflow)
- Ability to troubleshoot LLM-specific challenges such as latency, hallucinations, and context window constraints
- Strong communication skills and the ability to collaborate with both technical and non-technical stakeholders
Nice to Have
- Experience with open-source LLMs (Llama, Mistral, etc.)
- Knowledge of advanced RAG techniques (hybrid search, re-ranking)
- Exposure to agent frameworks and real-time LLM applications
- Background in traditional MLOps, data engineering, or multimodal models
- Experience working in Ruby on Rails environments
- Understanding of AI safety, governance, and alignment principles
Our Hiring Process
- Talent Acquisition Screen – 30 minutes
- Take-Home Technical Assignment – 3 days to complete
- Hiring Manager Interview (Ali) – 30–45 minutes
- Live PR / Pairing Session with Staff Engineer – 60 minutes
- Meet the Leadership Team
Life at Thrive
- Fast-paced, high-trust environment with meaningful ownership
- Opportunity to shape Thrive’s AI infrastructure from the ground up
- Strong mentorship and long-term career growth
Total Rewards
- 3 weeks' paid vacation + 1-week holiday shutdown
- Health insurance & wellness coverage
- Annual Learning & Development allowance
- Annual workspace allowance
