Funding: 20M-50M USD
Employees: Shhhh...
Salary: Shhhh...
Fulltime: Yes
Telecommute: No
Industry: Web 2.0, Consumer, Database, Semantic Web, Infrastructure

Distributed Data Processing Guru

at Metaweb in San Francisco, CA | Jun 23, 2014
Overview
Freebase needs YOU in its quest to build a database of the world's public knowledge. We are developing highly scalable methods for creating, loading, and reconciling large data sets. We're looking for someone who lives and breathes distributed processing.

If you choose to accept this mission, please:

1. Submit a cover letter and resume in plain text, HTML, or PDF.
2. Distinguish yourself from others by responding in writing to the following questions:
2a. What is your favorite programming language? Why?
2b. What's most broken with SQL as an API of database access? How would you fix or replace it? What would a representation of your personal music collection information in your new, improved design allow you to do that you couldn't easily do with a standard relational database?
2c. Imagine a graph that consists of directional links between nodes identified by small non-negative integers < 2**16. We define a "cycle" in the graph as a nonempty set of links that connect a node to itself. Imagine an application that allows insertion of links, but wants to prevent insertion of links that close cycles in the graph. For example, starting from an empty graph, inserting links 1 -> 2 and 2 -> 3 would succeed; but inserting a third link 3 -> 1 would fail, since it would close the cycle 1 -> 2 -> 3 -> 1. However, inserting a link 1 -> 3 instead would succeed. In your favorite programming language, declare data structures to represent your graph, and provide code to populate your data structures with a starting graph and to perform an "insert link" function that fails if a new link would close a cycle. What is the time and space complexity of your solution? Hint: a good solution performs an insert much more efficiently than in O(e) time, where e is the number of edges in the graph.
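
For illustration only, here is one possible shape of an answer to question 2c, not a reference solution from Metaweb. It is a minimal Python sketch that assumes the graph is small enough to keep, for every node, the full set of nodes it can reach and the full set of nodes that can reach it. The names AcyclicGraph, insert_link, desc, and anc are hypothetical. Under that assumption the cycle check itself is a single expected O(1) set lookup; the cost moves to the bookkeeping after a successful insert, which touches up to (ancestors of src) x (descendants of dst) entries, and to memory, which can grow to O(n^2) in the number of nodes. Whether that trade meets the hint's bar depends on the workload.

```python
from collections import defaultdict

MAX_NODE = 2 ** 16  # node ids are integers in [0, 2**16), per the question


class AcyclicGraph:
    """Directed graph that rejects any link that would close a cycle."""

    def __init__(self, links=()):
        self.succ = defaultdict(set)  # direct links: node -> its targets
        self.desc = defaultdict(set)  # node -> every node reachable from it
        self.anc = defaultdict(set)   # node -> every node that can reach it
        # Populate from a starting graph; a cyclic starting graph is an error.
        for src, dst in links:
            if not self.insert_link(src, dst):
                raise ValueError(f"starting graph has a cycle at {src} -> {dst}")

    def insert_link(self, src, dst):
        """Insert src -> dst; return False (and change nothing) on a cycle."""
        if not (0 <= src < MAX_NODE and 0 <= dst < MAX_NODE):
            raise ValueError("node id out of range")
        # The link closes a cycle if it is a self-loop or if src is
        # already reachable from dst.
        if src == dst or src in self.desc[dst]:
            return False
        self.succ[src].add(dst)
        # Everything that reaches src now also reaches everything dst reaches.
        sources = self.anc[src] | {src}
        targets = self.desc[dst] | {dst}
        for a in sources:
            self.desc[a] |= targets
        for d in targets:
            self.anc[d] |= sources
        return True
```

Replaying the example from the question: after AcyclicGraph([(1, 2), (2, 3)]), a call to insert_link(3, 1) is rejected, while insert_link(1, 3) is accepted.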
Responsibilities
Developing and evolving a declarative query language on a distributed compute cluster
Writing large-scale data manipulation operators
Creating directed search crawler operations for structure extraction from the web
Hard problems that require you to learn quickly and take ownership
A chance to work with world-class people to change the world
Freedom to do things the right way
Experience
We want you to have a passion for working with data and giving it away to the world.
Database or OLAP engine internals, or analytical business applications
Distributed computing
Language design, interpreters or compilers
Map/Reduce algorithms