Funding: 20M-50M USD
Employees: Shhhh...
Salary: Shhhh...
Full-time: Yes
Telecommute: No
Industry: Web 2.0, Consumer, Database, Semantic Web, Infrastructure

Senior Identity Services Engineer

at Metaweb in San Francisco, CA | Jun 23, 2014
Overview
Metaweb has built a large database to support collaborative web applications. One problem with a large database is that many things that are the same have slightly different names, while many things that share a name are actually different. You'll recognize this problem as entity resolution, also known as reconciliation.
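
As a toy illustration of the two cases described above (a minimal sketch, not a description of Metaweb's actual system), the Python below compares records by name and by the overlap of their facts. The `Record` type, the `jaccard` and `same_name` helpers, and the tiny fact sets are all hypothetical and chosen only to make the problem concrete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    # Hypothetical record shape for illustration; not Freebase's schema.
    name: str
    facts: frozenset  # set of (property, value) pairs


def same_name(r1: Record, r2: Record) -> bool:
    """Naive matcher: treats records as identical when their names agree."""
    return r1.name.lower() == r2.name.lower()


def jaccard(a: frozenset, b: frozenset) -> float:
    """Fraction of facts shared by two records, from 0.0 to 1.0."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Same thing, different names.
twain = Record("Mark Twain", frozenset({("born", 1835), ("profession", "author")}))
clemens = Record("Samuel Clemens", frozenset({("born", 1835), ("profession", "author")}))

# Same name, different things.
mercury_planet = Record("Mercury", frozenset({("type", "planet")}))
mercury_element = Record("Mercury", frozenset({("type", "chemical element")}))

for left, right in [(twain, clemens), (mercury_planet, mercury_element)]:
    print(left.name, "/", right.name,
          "| same name:", same_name(left, right),
          "| fact overlap:", jaccard(left.facts, right.facts))
```

Name matching alone gets both pairs wrong, while even a crude overlap of facts separates them. A real reconciliation service would need far richer evidence and a calibrated confidence score rather than a raw overlap ratio.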

We're looking for a principal engineer and architect to play a key role in what we call "Identity Services", a set of reusable capabilities related to reconciliation. You will devise novel methods for high-performance, distributed matching of records, which we will put to use in a number of applications. You'll create mechanisms for assigning confidence scores to potential matches.

Ideally, you'd have experience with graph-based entity resolution in either an academic or commercial setting. A background in clustering and classification algorithms would be a big plus (e.g., k-means, SVM, and Bayesian classifiers).

We are passionate about building a large-scale, community-driven repository for structured information. We like to create tools that make crowd-sourced databases fun, interesting, and easy to use. Freebase provides you with an enormous, real-world set of topics for developing innovations in similarity, clustering, and classification.
Instructions

If this seems like your kind of scene, please submit a cover letter and resume in PDF, plain text, or HTML, and include your answers to the following questions:

1. What is your favorite programming language? Why?
2. Why is cosine similarity considered a good similarity measure for use with tf-idf in information retrieval? Why not use Euclidean distance?
3. How does the use of kernels play into creating a good similarity measure?
4. Devise a similarity measure that compares descriptions of objects or topics. Consider the following pairs: 1) Mark Twain and Samuel Clemens, 2) Mark Twain and Kurt Vonnegut, and 3) Mark Twain and Shania Twain. Ideally, your similarity measure should predict that the first pair is identical or very similar, and that each successive pair is less so. If it does not, explain why. What kinds of descriptions of objects are better than others, and why?