
Everything gets matched against first line of CSV #23

Open
NumerousHats opened this issue Sep 1, 2015 · 2 comments

Comments

@NumerousHats

I have a canonical list of law firm names, and I am trying to fuzzy match them against a column of messy, user-generated names in OpenRefine.

Everything seems to work without errors, except that every name in the OpenRefine column appears to match the first line of the canonical list, even though exact matches are present.

Here is the CSV file being read into reconcile-csv:

firm,key
Aaronson Rappaport,1
Adams Reese,2
Adelson Testan,3
Adler Pollock,4
Ahlers Cooney,5
Ahmuty Demers,6
Akerman,7
Akin Gump,8
Allen Kopet,9
Allen Matkins,10
Alston Bird,11
Alston Hunt,12
Alvarado Smith,13
Anderson Kill,14
Andrews Kurth,15
Archer Greiner,16
Archer Norris,17
Arent Fox,18
Armstrong Teasdale,19
Arnall Golden,20
Arnold Porter,21
Arnstein Lehr,22
Arthur Chapman,23

And here is some made-up data that I have in OpenRefine:

Akerman
Akin Gump Something Something Else
Whatsa
Allen Thingy
Alston Bird
Alston Hunter
Alvarado Gracioso
Anderson Killer
Andrews Girth
Archer Greiner
Archer Norris Joe & Bob
Aberrant Fox
Armstrong Teasdale
Arnall Golden Dawn
Arnold Porter

As you can see, this contains exact matches, various misspellings, "extra text", and complete non-matches.
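For comparison, here is roughly what a working fuzzy matcher should do with this data. This is a minimal sketch, not reconcile-csv's implementation: the Sorensen-Dice coefficient on character bigrams is an assumed similarity measure, and the helper names load_canonical, bigrams, dice and best_match are hypothetical. Under that assumption, an exact input scores 1.0 against its own canonical entry, so it cannot lose to "Aaronson Rappaport".

```python
# Hypothetical sketch: rank canonical firm names against messy input using
# the Sorensen-Dice coefficient on character bigrams. This is an assumed
# similarity measure for illustration, not reconcile-csv's own code.
import csv
import io
from collections import Counter

# The canonical list from the issue, embedded so the sketch is self-contained.
CANONICAL_CSV = """firm,key
Aaronson Rappaport,1
Adams Reese,2
Adelson Testan,3
Adler Pollock,4
Ahlers Cooney,5
Ahmuty Demers,6
Akerman,7
Akin Gump,8
Allen Kopet,9
Allen Matkins,10
Alston Bird,11
Alston Hunt,12
Alvarado Smith,13
Anderson Kill,14
Andrews Kurth,15
Archer Greiner,16
Archer Norris,17
Arent Fox,18
Armstrong Teasdale,19
Arnall Golden,20
Arnold Porter,21
Arnstein Lehr,22
Arthur Chapman,23
"""

# The made-up messy names from the OpenRefine column.
MESSY = [
    "Akerman", "Akin Gump Something Something Else", "Whatsa", "Allen Thingy",
    "Alston Bird", "Alston Hunter", "Alvarado Gracioso", "Anderson Killer",
    "Andrews Girth", "Archer Greiner", "Archer Norris Joe & Bob",
    "Aberrant Fox", "Armstrong Teasdale", "Arnall Golden Dawn", "Arnold Porter",
]


def load_canonical(text):
    """Parse the firm,key CSV into a {firm: key} dict (keys kept as strings)."""
    return {row["firm"]: row["key"] for row in csv.DictReader(io.StringIO(text))}


def bigrams(text):
    """Overlapping two-character chunks of the lowercased string."""
    s = text.lower()
    return [s[i:i + 2] for i in range(len(s) - 1)]


def dice(a, b):
    """Dice coefficient 2*|A & B| / (|A| + |B|) over bigram multisets."""
    ba, bb = Counter(bigrams(a)), Counter(bigrams(b))
    total = sum(ba.values()) + sum(bb.values())
    if total == 0:  # both strings are shorter than two characters
        return 1.0 if a.lower() == b.lower() else 0.0
    return 2 * sum((ba & bb).values()) / total


def best_match(query, canonical):
    """Return (firm, key, score) for the highest-scoring canonical entry."""
    return max(
        ((firm, key, dice(query, firm)) for firm, key in canonical.items()),
        key=lambda t: t[2],
    )


if __name__ == "__main__":
    canonical = load_canonical(CANONICAL_CSV)
    for name in MESSY:
        firm, key, score = best_match(name, canonical)
        print(f"{name!r:40} -> {firm} (key {key}, score {score:.2f})")
```

With this sketch, exact inputs such as "Akerman" and "Arnold Porter" pair with themselves at a score of 1.00, and "Whatsa" scores low against every entry.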

I started reconcile-csv with java -Xmx2g -jar reconcile-csv-0.1.2.jar canonical.csv firm key. After adding the local reconciliation service in OpenRefine and running the reconciliation, the result looks like this:

[Screenshot of the reconciliation results]

It looks like everything is matching to "Aaronson Rappaport" (the first line of the CSV file). Is this a bug, or am I doing something stupid?
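One way to tell a service bug from a client-side problem is to query reconcile-csv directly, bypassing OpenRefine. This minimal sketch posts a batch in the shape used by the OpenRefine reconciliation API (a form field named queries holding JSON such as {"q0": {"query": "Akerman"}}) and reports the top-scored candidate for each name. The URL http://localhost:8000/reconcile is an assumption about the service's default address, and build_queries, top_candidates and reconcile are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request

# Assumption: reconcile-csv's default address; change it if yours differs.
SERVICE_URL = "http://localhost:8000/reconcile"


def build_queries(names):
    """Batch names as {"q0": {"query": ...}, "q1": ...} per the reconciliation API."""
    return {f"q{i}": {"query": name} for i, name in enumerate(names)}


def top_candidates(response, queries):
    """Map each query string to its highest-scoring candidate name, or None."""
    out = {}
    for qid, q in queries.items():
        results = response.get(qid, {}).get("result", [])
        best = max(results, key=lambda r: r.get("score", 0), default=None)
        out[q["query"]] = best["name"] if best is not None else None
    return out


def reconcile(names, url=SERVICE_URL):
    """POST the batch as a form-encoded 'queries' field and summarise the reply."""
    queries = build_queries(names)
    body = urllib.parse.urlencode({"queries": json.dumps(queries)}).encode("utf-8")
    with urllib.request.urlopen(url, data=body, timeout=10) as resp:
        return top_candidates(json.load(resp), queries)


if __name__ == "__main__":
    try:
        for query, top in reconcile(["Akerman", "Alston Bird", "Arnold Porter"]).items():
            print(f"{query!r} -> {top!r}")
    except (OSError, ValueError) as exc:  # service not running, or non-JSON reply
        print(f"Could not query {SERVICE_URL}: {exc}")
```

If the service itself returns "Aaronson Rappaport" for "Akerman", the problem most likely lies in how the CSV was loaded rather than in OpenRefine.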

@mihi-tr
Contributor

mihi-tr commented Sep 8, 2015

This looks like an interesting bug, as if reconcile-csv can only read the first line. Which platform are you running on?

@NumerousHats
Author

MacOS 10.10.5 (I initially mistyped this as 10.5.5.)
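Because the maintainer's guess is that only the first line is being read, and the reporter is on a Mac, one thing worth ruling out (a hypothesis, not a confirmed diagnosis) is the file's line endings: classic Mac OS used a bare carriage return (CR) as its line terminator, and some Mac applications still write CSV files that way. This minimal sketch counts each kind of terminator and can write a copy with plain LF endings; the script name check_newlines.py and both helper functions are hypothetical.

```python
import sys


def line_ending_counts(raw):
    """Count CRLF, bare LF and bare CR line terminators in raw bytes."""
    crlf = raw.count(b"\r\n")
    return {
        "CRLF": crlf,
        "LF only": raw.count(b"\n") - crlf,
        "CR only": raw.count(b"\r") - crlf,
    }


def normalize_newlines(raw):
    """Rewrite CRLF and bare CR terminators as LF."""
    return raw.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


if __name__ == "__main__":
    # Hypothetical usage: python check_newlines.py canonical.csv [fixed.csv]
    if len(sys.argv) > 1:
        with open(sys.argv[1], "rb") as f:
            raw = f.read()
        counts = line_ending_counts(raw)
        print(counts)
        if counts["CR only"] and not counts["LF only"] and not counts["CRLF"]:
            print("File uses bare CR line endings only.")
        if len(sys.argv) > 2:
            # Write an LF-only copy that could be passed to reconcile-csv instead.
            with open(sys.argv[2], "wb") as f:
                f.write(normalize_newlines(raw))
```

For example, python check_newlines.py canonical.csv canonical-lf.csv reports the counts and writes an LF-only copy for comparison.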
