Senior Software Engineer, Data Infrastructure
Software Engineering, Other Engineering
Toronto, ON, Canada
Posted on Thursday, July 20, 2023
Who are we?
We’re a team of engineers, thinkers, and champions whose aim is to give technology language. Every day our team is breaking new ground, as we build transformational AI technology and products for enterprises and developers who wish to harness the power of Large Language Models.
We're driven by ambition, as we firmly believe that our technology has the potential to revolutionise the way industries engage with natural language. Our strong technical foundation speaks for itself, with our team composed of world-class experts who have collectively accumulated hundreds of thousands of citations in academia.
The Cohere team is a collective of college dropouts, PhDs, alumni of big tech and scrappy start-ups, new grads and career pivots, who believe a diverse team is the key to a safer, more responsible technology. At Cohere, work isn’t the opposite of play, as we build the future of language AI with team members on almost every continent in the world, working from high rises, cabins, tour buses, and dog-friendly offices.
There’s no better time to herald the next step with us as we shape the future of Generative AI.
Why this role?
At Cohere, we strive to continually improve our large language models. Academic research and real-world experience have demonstrated that high-quality, diverse datasets can contribute as much to the performance and capabilities of LLMs as the underlying model architecture and training regimen. We at Cohere believe data will play a central role in accelerating the advancement of our already world-class language models.
Data is therefore critical to our success. Our ability to acquire data that is accurate, relevant, and timely is key to improving the quality of our models. We strive to continuously improve our data acquisition processes and systems to ensure that we have the data we need to stay competitive and meet the needs of our customers. We run frequent experiments to learn more about the role of data in model quality, from data mixtures, to cleaning techniques, to quality control.
This role will be part of the Data Acquisition team, which broadly provides data for training models and is responsible for building and maintaining the infrastructure that acquires, cleans, and formats data for model training. We are looking for a technically skilled, resourceful problem-solver who is able to work in areas of ambiguity and find efficient and sometimes creative solutions. The main responsibility of this role is to improve our internal data acquisition infrastructure, which includes data crawlers, formatters, and integrations with data providers. This role would also work closely with different teams at Cohere to support their data acquisition needs, as well as engage in more experimental work to develop highly informative data signals.
Please Note: We have offices in Toronto, Palo Alto, and London but embrace being remote-first! There are no restrictions on where you can be located for this role.
As a Senior Software Engineer specializing in Data Infrastructure, you will:
- Design, build, and maintain data infrastructure to support large-scale data pipelines
- Establish infrastructure to enable rigorous experimentation and evaluation
- Implement best practices for data management and governance
- Develop technical proposals and lead vendor selection processes
- Collaborate cross-functionally to ensure infrastructure alignment across the organization
- Mentor junior team members and contribute to their professional growth
You may be a good fit if:
- You have more than 5 years of experience working as a software engineer specializing in data infrastructure.
- You are proficient in Python and have used distributed processing technologies such as Spark or Dask.
- You have experience working with job orchestrators (e.g. Airflow, Dagster, Prefect) and/or MLOps frameworks (e.g. Metaflow, Pachyderm, DVC).
- You have built data pipelines and ETL processes to support large-scale datasets.
- You have experience working with unstructured and/or human-annotated data (e.g., collecting or assessing sample quality).
- You have worked with ML frameworks such as TensorFlow, TF Serving, JAX, and XLA/MLIR.
- You have strong communication and problem-solving skills, and prefer using the right tool for the job even if it’s outside your wheelhouse.
- You have a demonstrated passion for applied NLP models and products.
If some of the above doesn’t line up perfectly with your experience, we still encourage you to apply! If you consider yourself a thoughtful worker, a lifelong learner, and a kind and playful team member, Cohere is the place for you.
We value and celebrate diversity and strive to create an inclusive work environment for all. We welcome applicants of all kinds and are committed to providing an equal opportunity process. Cohere provides accessibility accommodations during the recruitment process. Should you require any accommodation, please let us know and we will work with you to meet your needs.
🤝 An open and inclusive culture and work environment
🧑‍💻 Work closely with a team on the cutting edge of AI research
🍽 Free daily lunch
🦷 Full health and dental benefits, including a separate budget to take care of your mental health
🐣 100% Parental Leave top-up for 6 months for employees based in Canada, the US, and the UK
🎨 Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement
🏙 Remote-flexible, with offices in Toronto, Palo Alto, San Francisco, and London, plus a co-working stipend
✈️ 6 weeks of vacation
Note: This post is co-authored by both Cohere humans and Cohere technology.