Please apply at www.castlighthealth.com/careers
The ideal candidate loves the web and web technologies and, rather than continuing to build their own sites, wants to make the web more useful by understanding the data already there.
Building a great web crawling operation combines formal artificial-intelligence techniques, systems scalability work, and clever heuristics, with a helping of good judgment. For the right person, this should sound like a really fun problem.
Responsibilities:
- Work with the team to develop software that mines structured data from the web
- Develop and scale a distributed web crawling engine
- Use a combination of formal and heuristic techniques to cover the largest possible set of cases with the minimum amount of effort
- Continually test, refine, and improve the accuracy of the extraction technology
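To give a flavor of the crawling work described above, here is a minimal sketch of the crawl-frontier pattern at the heart of any crawler. It is illustrative only and not taken from the posting: `fetch_links` and the tiny in-memory link graph are stand-ins for real HTTP fetching and link extraction.

```python
from collections import deque

def crawl(seed, fetch_links, max_pages=100):
    """Breadth-first crawl starting from `seed`.

    `fetch_links` stands in for a real fetch-and-extract step;
    here it is any callable mapping a URL to a list of URLs.
    """
    frontier = deque([seed])   # URLs waiting to be fetched
    visited = set()            # URLs already fetched (dedup)
    order = []                 # fetch order, for inspection

    while frontier and len(order) < max_pages:
        url = frontier.popleft()
        if url in visited:
            continue
        visited.add(url)
        order.append(url)
        for link in fetch_links(url):
            if link not in visited:
                frontier.append(link)
    return order

# Tiny in-memory "web" so the sketch runs without network access.
graph = {
    "a": ["b", "c"],
    "b": ["c", "d"],
    "c": ["a"],
    "d": [],
}

print(crawl("a", lambda u: graph.get(u, [])))  # → ['a', 'b', 'c', 'd']
```

A production crawler layers politeness delays, robots.txt handling, and persistent frontier storage on top of this same loop.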
Requirements:
- Strong overall software engineering background, as evidenced by a degree in CS or equivalent industry experience
- Current working knowledge of one of: Ruby, Java/Scala, Python
- Demonstrated experience with multi-threaded systems in Unix-like environments
- Fluency with web services
- Significant experience with one or more web crawling technologies
- Familiarity with RabbitMQ or other queuing systems
- A knack for producing simple, "just enough" solutions to complex problems
- 5 or more years of experience writing software
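The queuing and multi-threading requirements above come together in the classic work-queue pattern. The sketch below is illustrative, not from the posting: it uses Python's in-process `queue.Queue` where a distributed system would use a broker such as RabbitMQ, and `handle` is a placeholder for per-URL work (fetch, parse, extract).

```python
import queue
import threading

def run_workers(urls, handle, n_workers=4):
    """Fan URLs out to a pool of worker threads via a shared queue."""
    q = queue.Queue()
    results, lock = [], threading.Lock()

    def worker():
        while True:
            url = q.get()
            if url is None:          # sentinel: shut this worker down
                q.task_done()
                return
            out = handle(url)
            with lock:               # results list is shared across threads
                results.append(out)
            q.task_done()

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for url in urls:
        q.put(url)
    for _ in threads:
        q.put(None)                  # one sentinel per worker
    q.join()                         # wait until every item is processed
    for t in threads:
        t.join()
    return results

out = run_workers(["u1", "u2", "u3"], lambda u: u.upper(), n_workers=2)
print(sorted(out))                   # completion order varies across runs
```

Swapping the in-process queue for a broker lets workers run on separate machines, which is the essence of scaling a crawl horizontally.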