Overview
Evri’s Entity Store team uses cloud-based technologies to build a distributed, scalable system that identifies, categorizes, and retrieves data records for as many distinct newsworthy entities as possible. Our natural language processing system consumes these entities to tag news articles, which are then served through our website and various mobile applications.
We are looking for a strong engineer to expand our automated system for targeting and importing structured data from the web, and to design methods for minimizing the number of duplicate and erroneously merged entities created by our automated import pipeline.
Responsibilities
Implement a production-quality data importer that uses ontological type definitions to harvest structured information from the web
Develop solutions for minimizing the manual tasks required of a human data curator by improving data quality measurement tools and streamlining curation tasks
Integrate a sports statistics database with the rest of the entity store so real-time data can be associated with the entities it reflects
Develop a general system for handling structured data that changes in real time for other kinds of entities, such as stock quotes
Identify redundant or overly complex code and systems in an effort to simplify and harden the existing pipeline
Experience
Rock-solid software engineering fundamentals are the only absolute requirement, but experience with Hadoop, Lucene, Solr, SQL, Ruby, and Java is a big plus
Experience with ontologies, knowledge representation, and/or processing large, heterogeneous data sets is another plus
An obsession with creating things, whether it’s code or personal projects