Description
When we applied string2vocabulary to strings representing cities and towns in order to match them with Geonames in SILKNOW, we obtained many incorrect matches.
Example: http://data.silknow.org/production/41481202-0c96-3171-82ca-099088faf425.
The original city mentioned is simply "Saint Etienne", identified by http://www.geonames.org/2980291/. Strangely, string2vocabulary matched it with a much smaller town, "Saint-Étienne-du-Rouvray", identified by http://sws.geonames.org/2980236/. That said, there are around a hundred cities and towns in France named "Saint Etienne something".
This shows the limits of pure fuzzy string matching. Should we consider more complex matching techniques, e.g. relying on pre-trained word embeddings? It is possible that "Saint Etienne", taken together with the other contextual words (satin, faille, soie, tissu façonné), would have led to the right city.
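To make the ambiguity concrete, here is a minimal sketch of how a string-only scorer can fail to separate these candidates. It does not reproduce string2vocabulary's actual algorithm, which is not described here. It assumes a hypothetical token-containment score (the share of query tokens found in the candidate label), and the candidate list is a small illustrative sample rather than a real Geonames query result.

```python
import unicodedata


def normalize(name: str) -> str:
    """Lowercase, strip accents, and treat hyphens as spaces."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return without_accents.lower().replace("-", " ").strip()


def token_containment(query: str, candidate: str) -> float:
    """Hypothetical scorer: fraction of query tokens present in the candidate."""
    query_tokens = set(normalize(query).split())
    candidate_tokens = set(normalize(candidate).split())
    if not query_tokens:
        return 0.0
    return len(query_tokens & candidate_tokens) / len(query_tokens)


# Illustrative sample of French place names sharing the same prefix.
CANDIDATES = [
    "Saint-Étienne",
    "Saint-Étienne-du-Rouvray",
    "Saint-Étienne-de-Montluc",
]

scores = {name: token_containment("Saint Etienne", name) for name in CANDIDATES}

for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

Under this assumed scorer every candidate receives the same score, so the choice between them is arbitrary unless some other signal, such as the surrounding words, is taken into account.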