(Senior) Research Scientist - Large Language Models for Genomics
Deep Genomics
Key Responsibilities
- Design and implement multi-agent workflows that integrate internal foundation models (e.g., BigRNA, REPRESS, FlashRNA) and external tools to identify new biological hypotheses.
- Develop systems that leverage Retrieval-Augmented Generation (RAG) by connecting LLMs to internal scientific documents, SOPs, and structured biological databases.
- Collaborate with the machine learning team on model distillation strategies to create smaller, faster models suitable for a real-time, interactive chat interface.
- Build out and maintain the infrastructure for the LLM agent, including databases and Model Context Protocol (MCP) endpoints.
- Work closely with end-users in therapeutic design, target discovery, and experimental biology to identify key use cases, gather feedback, and rapidly iterate on the product.
- Ensure the system is transparent and trustworthy by building "explainable AI" features that help users understand and verify the AI's outputs and decisions.
Basic Qualifications
- MSc or PhD in Computer Science, Computational Biology, Bioinformatics, or a related field.
- 3+ years of hands-on experience architecting and building complex applications using Large Language Models.
- Expert knowledge of Python and modern MLOps frameworks and tools; experience with agentic frameworks like LangChain is essential.
- Demonstrated experience in building multi-agent systems that can plan, execute tasks, and interact with external tools and APIs.
- Familiarity with high-performance computing environments and cloud services (e.g., AWS, GCP).
- Excellent communication skills and the ability to work effectively in a multidisciplinary team, translating the needs of biologists and drug developers into technical solutions.
- Intellectual curiosity, critical thinking, and a commitment to innovation and scientific rigor.
Preferred Qualifications
- A strong background in genomics, computational biology, or bioinformatics, including experience with NGS data analysis or large-scale biological datasets.
- Prior experience in the biotech or pharmaceutical industry, particularly in a drug discovery context.
- Experience with model distillation or creating smaller, specialized models from larger foundation models.
- Familiarity with scientific workflow management systems and related containerization and environment tools (e.g., Docker, Conda).
What we offer
- A collaborative and innovative environment at the frontier of computational biology, machine learning, and drug discovery.
- Highly competitive compensation, including meaningful stock ownership.
- Comprehensive benefits - including health, vision, and dental coverage for employees and families, and an employee and family assistance program.
- Flexible work environment - including flexible hours, extended long weekends, a holiday shutdown, and unlimited personal days.
- Maternity and parental leave top-up coverage, as well as new parent paid time off.
- Focus on learning and growth for all employees - including a learning and development budget and lunch-and-learns.
- Facilities located in the heart of Toronto - the epicenter of machine learning and AI research and development, and in Kendall Square, Cambridge, Mass. - a global center of biotechnology and life sciences.