Senior Software Engineer – Crawler at TheFind
We’re looking for a top-notch crawler engineer. We have a web-scale distributed crawler that crawls a billion web pages. We face some unique crawling problems, such as crawling dynamic websites, handling very deep crawls, and avoiding crawler traps. One part of the crawler is a distributed file system, built from scratch, that stores billions of objects in a distributed, fault-tolerant way. Another component is a web-scale duplicate detection system for which we have been granted a patent. Your responsibility will be to manage and build upon all of these systems. You’ll work on both content analysis (trap detection, web graph mining, etc.) and the crawling itself (scheduling, politeness, the storage system, etc.).
If you have experience building or managing a large-scale crawler, that’s a huge plus. If you’ve architected high-transaction-volume distributed systems in C++, you might be the right person for the job. You will be the sole owner of the crawler, and your primary responsibility will be improving its performance and scalability. You’ll work with the systems architects and the CTO as needed to identify crawler enhancements, design solutions, and develop and deploy them to production.
TheFind is a pre-IPO startup in the Bay Area working in e-commerce, a $200 billion market. We’re building a shopping search engine that will be the premier shopping destination on the web, beating Amazon and Google. TheFind uses machine learning to solve fundamental problems in comprehensiveness and information extraction in the shopping domain.
We successfully crawl a billion web pages and use automated machine learning methods for product extraction and search relevance. We focus on all areas of shopping search, including comprehensiveness, price comparison, coupons and deals, reviews, local shopping search, eco-friendly shopping, hot trends, personalized and social shopping, visual shopping search, and mobile shopping.
The vision of TheFind is to become the starting point for shopping search on the web.
Some stats about TheFind:
- Targeting $5.2B in online shopping advertising spend. Funded by Redpoint, Lightspeed, and Bain Capital Ventures
- Leadership team from Stanford, Princeton, UCB, Harvard, IIT, BITS Pilani, GSB, Yahoo/Inktomi/Altavista, Oracle, Microsoft, VMware, eBay, Verity
- 450 million products, 500,000 stores. Compare this with Shopzilla/Shopping.com/PriceGrabber/Amazon (~20 million products, ~5,000 stores)
- Largest coupon search index, largest local shopping search index, largest visual search index, largest social commerce index
- 6 patents granted
Requirements:
- MS/PhD in Computer Science
- 3+ years of relevant industry experience
- Expert in developing, maintaining, and troubleshooting C++ systems on Linux
- Experience in distributed systems design, specifically network and multi-threaded programming
- Ability to work both independently and collaboratively
- Excellent communication skills, a solid work ethic, and a strong desire to write production-quality code
We offer a competitive salary, stock options, health benefits, a 401(k), and more. You’ll be joining a group of experienced people who love to innovate.