Funding: 20M-50M USD
Employees: Shhhh...
Salary: Shhhh...
Fulltime: Yes
Telecommute: No
Industry: Web 2.0, Consumer, Database, Semantic Web, Infrastructure

Data Pipeline Engineer

at Metaweb in San Francisco, CA — Jul 23, 2014
Overview
Metaweb's data pipeline operations team is responsible for making sure that the cutting-edge algorithms written by our world-class semantic engineers are continuously (and consistently) populating Freebase with ever more topics and assertions. We need your help building out our pipeline architecture, integrating new algorithms, and writing monitoring frameworks to make sure everything is running smoothly. We also have a number of data sources that need to be analyzed and mined.
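To give a flavor of the kind of monitoring framework this role involves, here is a minimal sketch (the function and stage names are hypothetical, not part of Metaweb's actual codebase) of a pipeline-stage wrapper that records throughput and error counts so a monitor can flag stages that slow down or start dropping records:

```python
import logging
import time

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("pipeline")

def run_stage(name, func, records):
    """Run one pipeline stage over a batch of records, logging counts
    and timing so downstream monitoring can audit each load."""
    start = time.time()
    out, errors = [], 0
    for rec in records:
        try:
            out.append(func(rec))
        except Exception:
            errors += 1  # count the failure, don't crash the whole load
            log.exception("stage %s failed on record %r", name, rec)
    log.info("stage %s: in=%d out=%d errors=%d elapsed=%.2fs",
             name, len(records), len(out), errors, time.time() - start)
    return out

# usage: a toy stage that normalizes topic names
cleaned = run_stage("normalize", str.strip, ["  Freebase ", "Metaweb"])
```

The key design choice is that a failing record is counted and logged rather than aborting the run, so a single bad input cannot stall the whole data load.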

Your efforts will help us grow Freebase into a compendium of all the world's knowledge!

If this sounds like you, then please apply with your resume in HTML or PDF.

Let yourself stand out from the crowd by sending us your thoughts on the following:

1. A reliable data pipeline consists of more than just the continuous running of code that was successful for a single data load. What are some design strategies for a data pipeline that will increase reliability, auditability, and maintainability? Are there other important characteristics that a data pipeline should have?
2. Complex data operations often produce output that cannot be automatically audited for correctness, because there is no "gold standard" to compare to (other than the algorithm itself). Discuss three strategies for assuring that these algorithms are indeed running properly.
3. As the number of data sources and algorithms added to a data pipeline increases, the chances of system failure also increase. What failure modes are to be expected as the pipeline's complexity increases? How can they be prevented?
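As one illustration of the second question, a common strategy when no gold standard exists is to compare each run's output against the previous run's: a large unexplained swing in volume more likely signals a broken extractor than a real change in the underlying data. A minimal sketch (the function name and tolerance are illustrative assumptions, not a prescribed answer):

```python
def sanity_check(current_count, previous_count, tolerance=0.2):
    """With no gold standard to audit against, flag a run whose output
    volume swings more than `tolerance` (fractional) from the last run."""
    if previous_count == 0:
        return current_count == 0
    change = abs(current_count - previous_count) / previous_count
    return change <= tolerance

assert sanity_check(105, 100)      # 5% swing: plausible drift
assert not sanity_check(40, 100)   # 60% drop: flag for human review
```

Other strategies in the same spirit include spot-checking a random sample by hand and running two independent implementations of the algorithm against each other.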
Responsibilities
Architecting stable, scalable data processing frameworks
Creating innovative parallel processing systems
Designing solutions for problems of massive scale
Working with the best, brightest, and funnest people in the industry
Experience
Experience with large-scale operational data processes, such as news processing, sentiment analysis, preference matching, data-centric web applications, and/or web indexing operations
A passion for process, detail, and quality
Experience building real-time monitoring and test frameworks to assure operational quality
An understanding of the difference between production code and quick scripts, and comfort creating either when required