The Open Refine Reconciliation API allows Open Refine users to match company names to legal corporate entities. This is especially useful when you have an existing spreadsheet or dataset featuring lots of companies. Matching (or reconciling) to legal entities allows you to get more information about the companies (for example the registered address or statutory filings), and makes it easier to match your data with other datasets or exchange it with other organisations.
https://opencorporates.com/reconcile
Please note that OpenCorporates' reconciliation endpoint is now https only. If you have previously used it, you will need to update your reconciliation service URLs to use https rather than http (e.g. https://opencorporates.com/reconcile rather than http://opencorporates.com/reconcile).
Setting up Open Refine to match against companies
Start by following this webcast, which shows you how to install Open Refine and start reconciling companies against the OpenCorporates database. Once you've done that and understood the basics, you'll want to improve the quality of matching by restricting reconciliations to jurisdictions and other attributes. We're planning a screencast on this, but this page gives you the information you need to do so. For those familiar with Open Refine, the URL (endpoint) of the OpenCorporates reconciliation service is https://opencorporates.com/reconcile.
The scoring system
Open Refine expects the API to return a matching score with each possible result. OpenCorporates computes this as follows: we search for all the companies matching the name of the company, doing a certain amount of normalising of the company name, and restricting the search to a jurisdiction if requested. We then compare the possible results against the given term and calculate the score based on the similarity. Finally, we adjust the score downwards if the company is inactive, and also if it's a foreign branch of a company. This ensures that home companies and current companies score more highly.
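To make the idea concrete, here is a purely illustrative sketch of a similarity score with downward adjustments. It is not OpenCorporates' actual algorithm: the normalisation rules, the use of Python's difflib for similarity, and the two penalty factors are all assumptions chosen for the example.

```python
from difflib import SequenceMatcher

# Hypothetical penalty factors; the real adjustments are not documented here.
INACTIVE_PENALTY = 0.8
BRANCH_PENALTY = 0.8


def normalise(name):
    """Lowercase, drop punctuation and collapse whitespace (assumed rules)."""
    kept = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def score(term, candidate_name, inactive=False, branch=False):
    """Return a 0-100 similarity score, reduced for inactive or branch companies."""
    similarity = SequenceMatcher(None, normalise(term), normalise(candidate_name)).ratio()
    result = similarity * 100
    if inactive:
        result *= INACTIVE_PENALTY
    if branch:
        result *= BRANCH_PENALTY
    return result
```

With these assumed factors, an active home company whose name matches exactly scores higher than an inactive company or a foreign branch with the same name, which is the ordering described above.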
Restricting to jurisdictions
Many company names are quite common across different locations, and there is the additional problem of foreign branches, which are usually named the same as the home company. To avoid false matches, it's a good idea to limit matches to a given jurisdiction. This is very easy to do with OpenCorporates, as we provide a specific reconciliation URL (or endpoint) for each jurisdiction. This blog post explains it fully, but basically you just add the jurisdiction_code for the jurisdiction to the end of the normal reconciliation URL, e.g. https://opencorporates.com/reconcile/es for Spain, instead of https://opencorporates.com/reconcile.
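If you generate these URLs in a script (for example, to configure several services at once), the rule can be sketched as a small helper. The function name is hypothetical, and lowercasing the code is an assumption based on the "es" example above.

```python
BASE_RECONCILE_URL = "https://opencorporates.com/reconcile"


def reconcile_endpoint(jurisdiction_code=None):
    """Return the general endpoint, or the jurisdiction-specific one if a code is given."""
    if not jurisdiction_code:
        return BASE_RECONCILE_URL
    # Assumption: jurisdiction codes are written in lowercase, as in "es".
    return f"{BASE_RECONCILE_URL}/{jurisdiction_code.strip().lower()}"


print(reconcile_endpoint("es"))
```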
You can currently submit two additional parameters with each company: the jurisdiction code and a relevant date. The jurisdiction code restricts the search to a specific jurisdiction, which may vary from company to company (e.g. in an EU dataset). A submitted date changes the scoring behaviour: a company will score higher if it was active at that date, and significantly lower if it didn't yet exist, or was inactive, at that date. This is useful if you are matching data from the past and want to match the companies that were relevant at the date of the datapoint.
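When calling the service from a script rather than from Open Refine, per-company parameters are normally sent as "properties" on each query in the reconciliation protocol. The sketch below builds such a batch. The property ids "jurisdiction_code" and "date", and the YYYY-MM-DD date format, are assumptions made for illustration and should be checked against the service before use.

```python
import json


def build_query(name, jurisdiction_code=None, date=None):
    """Build one reconciliation query, attaching optional per-company properties."""
    query = {"query": name}
    properties = []
    if jurisdiction_code:
        # Assumed property id for the jurisdiction restriction.
        properties.append({"pid": "jurisdiction_code", "v": jurisdiction_code})
    if date:
        # Assumed property id and ISO date format for the relevant date.
        properties.append({"pid": "date", "v": date})
    if properties:
        query["properties"] = properties
    return query


def build_batch(rows):
    """Serialise (name, jurisdiction_code, date) rows as a keyed batch of queries."""
    return json.dumps({f"q{i}": build_query(*row) for i, row in enumerate(rows)})


print(build_batch([("Acme SL", "es", "2010-06-30"), ("Acme GmbH", "de", None)]))
```

Because each row carries its own jurisdiction and date, a mixed dataset can be sent to the general endpoint without splitting it by country first.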
Can I use the Open Refine API without using Open Refine?
Yes. This is what OpenSpending does to match individual transactions to companies. The API is a simple REST API, and is easy to call from within a program.
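As a rough sketch of what such a call might look like, the following uses only Python's standard library. It assumes the service accepts the usual reconciliation-protocol "queries" form parameter and returns a "result" list with a "score" per candidate; those field names, the helper names, and the absence of any authentication are assumptions, not confirmed details of the OpenCorporates service.

```python
import json
import urllib.parse
import urllib.request

RECONCILE_URL = "https://opencorporates.com/reconcile"


def build_request(names, url=RECONCILE_URL):
    """Prepare a POST request carrying one query per company name."""
    queries = {f"q{i}": {"query": name} for i, name in enumerate(names)}
    body = urllib.parse.urlencode({"queries": json.dumps(queries)}).encode("utf-8")
    return urllib.request.Request(url, data=body)  # supplying data makes it a POST


def reconcile(names, url=RECONCILE_URL, timeout=30):
    """Send the request and return the decoded JSON response (needs network access)."""
    with urllib.request.urlopen(build_request(names, url), timeout=timeout) as resp:
        return json.load(resp)


def best_candidates(response):
    """Pick the highest-scoring candidate for each query key, or None if there are none."""
    best = {}
    for key, answer in response.items():
        results = answer.get("result", [])
        best[key] = max(results, key=lambda r: r.get("score", 0), default=None)
    return best
```

A typical use would be best_candidates(reconcile(["Acme Ltd", "Example GmbH"])), followed by a check that each top score is high enough for your purposes before accepting the match.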
Is the reconciliation data under an open licence?
Yes, it's under the same share-alike attribution Open Database Licence as the rest of OpenCorporates. This means you can reuse the data, even commercially, provided you release the resultant data under the same share-alike attribution licence and, of course, attribute OpenCorporates. That way the open data community benefits from more open data, and OpenCorporates gets due attribution.
Is it possible to use this for our proprietary/internal data?
Yes. Contact us at email@example.com to get a non-share-alike licence, which allows you to use the data without sharing the results back into the community as open data.