Software Engineer - Data Engineer
BenchSci
You Will:
- Collaborate with Machine Learning engineers, Full-stack engineers, and the Science team to solve complex document mining challenges, helping us capture and model additional scientific experiments
- Scale data pipelines to allow our data to go from research to platform quickly and reliably
- Work with sources that contain both semi-structured and unstructured data
- Use your experience to help define and apply best practices for a broad platform of technologies in a cloud-based environment
- Architect and maintain robust data pipelines that ingest diverse sources and utilize LLMs for high-fidelity entity extraction into structured formats
- Implement evaluation frameworks to monitor the accuracy, drift, and hallucination rates of extraction models within the production pipeline
- Lead or consult on the authoring of engineering design proposals that follow the unified Platform Stream roadmap at BenchSci
- Leverage a deep understanding of the business context and the team’s goals to make independent technical decisions in the face of open-ended requirements
- Proactively identify new opportunities (from both internal and external sources) and advocate for and implement improvements to the current state of projects
- Respond to operational issues with urgency, drive that urgency within your own team, and own resolution within your sphere of responsibility
- Challenge the status quo and propose newer technologies or ways of working
You Have:
- A degree in Computer Science/Engineering or a related field within science
- 3+ years of experience working as a software developer in the industry
- Proficiency with Python
- Proficiency with SQL
- Experience using LLMs for structured data extraction
- Experience with event-driven architecture with Pub/Sub
- A track record of building high-quality, maintainable code
- Experience with cloud computing (for example: GCP, Azure, AWS)
Nice To Have:
- ML/Data science exposure
- Experience with Auth0 and Terraform
- Experience with data warehouse solutions like BigQuery and databases including AlloyDB and Spanner
- Experience with agentic development and AI-based tools like Cursor or Claude Code
- Experience building conversational AI solutions
