At DNAnexus we are solving the most challenging computer science problems you’re ever likely to see.
In the last few years, there has been a dramatic development in the world of genomics that has created a huge new opportunity. The price to sequence the full human genome (all of your DNA, not just a sample of it) has fallen to the point where it will soon be affordable for a patient to have multiple samples of their whole genome sequenced to help treat their disease. Want to know what specific gene mutation caused a patient’s cancer? We are building the platform to answer that kind of question. One of the many challenges is the huge amount of data. Think you’ve seen big-data problems? Think again – with each genome comprising 100 GB and months of CPU time to crunch the information, DNA is the next big-data problem, requiring exabytes of storage and parallel workloads distributed across 100,000 servers. We are tackling this by combining web technologies, big-data analytics, and scalable systems on cloud computing infrastructure.
We are a well-funded start-up backed by Google Ventures, TPG Biotech, and First Round Capital. Our founders, Andreas Sundquist, Arend Sidow, and Serafim Batzoglou, are world-renowned genomics and bioinformatics experts from Stanford University.
You are a computer scientist with expertise in computational genomics, algorithms, or machine learning, and you aspire to work on challenging compute and data-intensive problems. You may not have a background in all three areas, but you are very strong in at least one area and are able to learn quickly.
Research, implement, and test new analysis methods in computational genomics
Design parallel methods to orchestrate data transfer and compute across thousands of nodes
Innovate on storage encoding and transfer technologies for efficiently handling petabyte-scale datasets
Computational Genomics / Bioinformatics background: data structures, statistics, classic genomics algorithms including Smith-Waterman, BLAST, Burrows-Wheeler transform. Comfortable implementing these methods from scratch and innovating on them.
Algorithms background: Dynamic programming, complexity, randomized algorithms, distributed algorithms, online algorithms, compression. Good understanding of practical implementation issues.
Machine learning background: Probabilistic/Bayesian modeling and inference, classification, regression, optimization.
Deep understanding of machine code execution, operating systems, memory hierarchy and locality, multi-processor concurrency
Fluency in C++ ideal, not afraid to code low-level modules from scratch
Strong mathematical skills, especially statistics
Data mining skills
BS, MS or PhD in Bioinformatics, Computer Science or relevant technical or life sciences discipline
Competitive base salary, stock options and health benefits.