Senior MLOps Engineer
Deep Genomics
Toronto, ON, Canada
CAD 175k-200k / year + Equity
Key Responsibilities
- Maintain and improve cloud infrastructure (GCP) using Infrastructure-as-Code tools (Terraform).
- Manage IAM, RBAC, and permission policies across cloud environments.
- Own and evolve CI/CD pipelines (CircleCI, GitHub Actions) and ensure best practices are followed across the engineering and ML teams.
- Administer and support workflow orchestration platforms (e.g., Seqera/Nextflow, Argo, Kubeflow).
- Operate and configure ML experiment tracking and registry tooling (e.g., W&B, MLflow).
- Build and maintain containerized environments (Docker) and manage Kubernetes clusters.
- Manage GPU resources: provisioning, scheduling, and debugging hardware and driver issues.
- Write and maintain Python tooling, scripts, and integrations that support ML infrastructure.
- Help deploy ML models to production environments and monitor their performance.
Basic Qualifications
- 4+ years of experience operating production infrastructure.
- Proficiency with cloud platforms (GCP preferred; AWS/Azure acceptable) and Infrastructure-as-Code (Terraform).
- Extensive hands-on experience with Kubernetes and containerization (Docker).
- Solid background in CI/CD systems (CircleCI, GitHub Actions, or similar).
- Experience managing GPU compute (provisioning, debugging, driver management).
- Familiarity with Python package and environment management (e.g., pip, conda, pixi).
- Strong Python programming skills.
- Self-motivated problem solver with excellent communication skills.
Preferred Qualifications
- Understanding of ML frameworks (e.g., PyTorch, PyTorch Lightning), ML workflows (training, inference, evaluation), and the model lifecycle.
- Familiarity with MLOps tooling (e.g., W&B, Ray, Vertex AI) and distributed compute patterns (e.g., DDP, real-time/batch inference, multi-node training).
- Familiarity with Kubernetes CRDs and batch/gang schedulers (e.g., Volcano, Kueue).
- Experience working with large-scale datasets (storage, versioning, efficient access patterns).
- Experience working directly with scientists and researchers in an interdisciplinary setting.
- Knowledge of biology and/or machine learning science.
- Familiarity with data compliance and governance frameworks (e.g., HIPAA, SOC 2).
- Previous startup experience.
What We Offer
- A collaborative and innovative environment at the frontier of computational biology, machine learning, and drug discovery.
- Highly competitive compensation, including meaningful stock ownership.
- Comprehensive benefits, including health, vision, and dental coverage for employees and their families, plus an employee and family assistance program.
- Flexible work environment, including flexible hours, extended long weekends, a holiday shutdown, and unlimited personal days.
- Maternity and parental leave top-up coverage, as well as new parent paid time off.
- A focus on learning and growth for all employees, including a learning and development budget and lunch-and-learns.
- Facilities in the heart of Toronto (the epicenter of machine learning and AI research and development) and in Kendall Square, Cambridge, Massachusetts (a global center of biotechnology and life sciences).
