Metaweb's data pipeline operations team is responsible for making sure that the cutting-edge algorithms written by our world-class semantic engineers are continuously (and consistently) populating Freebase with ever more topics and assertions. We need your help building out our pipeline architecture, integrating new algorithms, and writing monitoring frameworks to make sure everything is running smoothly. We've also got a number of data sources that need to be analyzed and mined.
Your efforts will help us grow Freebase into a compendium of all the world's knowledge!
If this sounds like you, then please apply with your resume in HTML or PDF.
Stand out from the crowd by sending us your thoughts on the following:
1. A reliable data pipeline consists of more than just continuously running code that was successful for a single data load. What are some design strategies for a data pipeline that will increase reliability, auditability, and maintainability? Are there other important characteristics that a data pipeline should have?
2. Complex data operations often produce output that cannot be automatically audited for correctness, because there is no "gold standard" to compare to (other than the algorithm itself). Discuss three strategies for assuring that these algorithms are indeed running properly.
3. As the number of data sources and algorithms added to a data pipeline increases, the chances of system failure also increase. What failure modes are to be expected as the pipeline's complexity increases? How can they be prevented?
- Architecting stable, scalable data processing frameworks
- Creating innovative parallel processing systems
- Designing solutions for problems of massive scale
- Working with the best, brightest, and most fun people in the industry
- Experience with large-scale operational data processes, such as news processing, sentiment analysis, preference matching, data-centric web applications, and/or web indexing operations
- A passion for process, detail, and quality
- Experience building real-time monitoring and test frameworks to assure operational quality
- An understanding of the difference between production code and quick scripts, and comfort creating either when required