We are seeking an experienced and highly skilled LLMOps Engineer to join our team at Thrive. This newly created role will be responsible for deploying, optimizing, and scaling large language model (LLM) applications across our platform. The successful candidate will own the operational backbone of our AI-driven products, ensuring performance, reliability, and cost-efficiency while collaborating closely with our AI and engineering teams.

If you are someone who thrives in fast-paced environments, enjoys building scalable AI infrastructure, and is excited about shaping the future of LLM capabilities at Thrive, this is the role for you.

Key Responsibilities:

Lead LLM infrastructure efforts across multiple engineering teams, ensuring scalable, secure, and efficient delivery of AI-powered features.
Design, build, and maintain production-grade systems for deploying and managing LLMs, including versioning, A/B testing, and rollback strategies.
Collaborate with the AI team to implement prompt management systems, prompt versioning, and token optimization strategies.
Monitor and optimize inference latency, throughput, caching strategies, and multi-provider cost management (OpenAI, Anthropic, AWS Bedrock, etc.).
Develop observability pipelines covering quality metrics, evaluation workflows, error monitoring, and user feedback loops.
Implement and maintain Retrieval-Augmented Generation (RAG) systems, embedding pipelines, and vector database operations.
Support fine-tuning workflows and manage model registries for both proprietary and open-source models.
Implement AI safety guardrails, content filtering, and compliance measures to ensure responsible deployment.
Support general DevOps initiatives (about 10% of the time), including CI/CD improvements and cloud infrastructure updates.
Maintain thorough documentation of all LLM infrastructure, processes, and best practices.

Business Problem the LLMOps Engineer Will Solve:

This role will serve as the foundation of Thrive’s AI infrastructure, ensuring our LLM-powered features are reliable, cost-effective, and scalable. By establishing strong operational systems and evaluation pipelines, the LLMOps Engineer will directly accelerate Thrive’s ability to deliver meaningful, AI-driven career solutions for our customers.

Ideal Candidate Profile:

3+ years of experience in LLMOps, MLOps, or similar production-focused AI/ML roles.
Strong Python programming skills and familiarity with LLM libraries and frameworks.
Hands-on experience with LLM providers (OpenAI, Anthropic, AWS Bedrock, Azure, Vertex, Databricks).
Experience with vector databases such as Pinecone, Weaviate, Qdrant, or Chroma.
Knowledge of model serving tools (vLLM, TGI, Ray Serve).
Proficiency with Docker, Kubernetes, and cloud environments (AWS preferred).
Familiarity with prompt engineering, token optimization, chain-of-thought approaches, and evaluation metrics.
Experience with LLM-specific tooling (LangSmith, Weights & Biases, Phoenix, MLflow).
Ability to troubleshoot LLM issues such as high latency, hallucinations, and context window limits.
Strong communication skills with both technical and non-technical stakeholders.

Nice-to-Have:

Experience with open-source LLMs (Llama, Mistral, etc.).
Knowledge of advanced RAG techniques including hybrid search and re-ranking.
Exposure to agent frameworks and real-time LLM applications.
Background in traditional MLOps, data engineering, or multimodal models.
Experience with Ruby on Rails.
Understanding of AI safety and alignment principles.

Our Hiring Process:

Talent Acquisition Screening – 30 minutes

Take-Home Technical – 3 days to complete

Meet Ali (Hiring Manager) – 30–45 minutes

Live PR with our Staff Engineer – 1 hour

Meet the Leaders

Life at Thrive:

Fast-paced, high-trust environment with significant ownership.
Opportunity to shape the foundation of Thrive’s AI infrastructure from day one.
Strong career progression and mentorship opportunities.

Total Rewards Package:

3 weeks of paid vacation plus a 1-week holiday shutdown
Health insurance & wellness coverage
Yearly Learning & Development Allowance
Yearly Workspace Allowance

At Thrive, we understand and value diversity in our employees and are proud to be an Equal Opportunity Employer. If you require accommodation at any time during the recruitment process, please let us know.

Only those who are legally entitled to work in Canada will be considered for interview and employment.
