Feature Request: Geolocation using alternate Postal Code DB

Started by ThankYou, November 18, 2024, 03:35:30 AM


ThankYou

Is it possible to build the geolocation database using the Postal Code dataset from GeoNames?

I have been using the geolocation feature of exiftool, which is awesome. The issue I've encountered is inconsistencies in the source data from the full GeoNames dataset. I noticed that GeoNames also provides a Postal Code database which is simpler but more consistent, and I think some users, including myself, may prefer that to the larger dataset.

https://download.geonames.org/export/zip/

StarGeek

Phil is currently away for a couple of weeks, so there will be some time before you get a response on this.

See this post for instructions on how to get an email notification when there's an update.

FrankB

Maybe this gives you an idea for an alternative: you can query https://Geocode.maps.co for postal codes. It works for the Netherlands. You will need to get a free API key!

https://geocode.maps.co/search?q=5421%20cc&api_key=Your_Api_Key
 
place_id: 339196343
licence: "Data © OpenStreetMap contributors, ODbL 1.0. https://osm.org/copyright"
boundingbox: ["51.394593186486", "51.714593186486", "5.5298097189189", "5.8498097189189"]
lat: "51.554593186486485"
lon: "5.6898097189189185"
display_name: "Gemert, Gemert-Bakel, North Brabant, Netherlands, 5421 CC, Netherlands"
class: "place"
type: "postcode"
importance: 0.325
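
For anyone who wants to script this lookup outside the GUI, here is a minimal Python sketch. The endpoint and the q and api_key parameters come from the example URL above; the assumption that the service returns a JSON list of result objects (or a single object) is mine, and the function names are only illustrative.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

BASE_URL = "https://geocode.maps.co/search"  # endpoint taken from the example URL


def build_search_url(query, api_key):
    """Build the search URL; quote_via=quote encodes the space as %20."""
    params = urlencode({"q": query, "api_key": api_key}, quote_via=quote)
    return BASE_URL + "?" + params


def first_postcode_hit(results):
    """Return (lat, lon, display_name) of the first result typed 'postcode', else None."""
    if isinstance(results, dict):  # assumption: tolerate a single object too
        results = [results]
    for item in results:
        if item.get("type") == "postcode":
            return float(item["lat"]), float(item["lon"]), item["display_name"]
    return None


def lookup(query, api_key):
    """Perform the network request (needs a valid API key)."""
    with urlopen(build_search_url(query, api_key), timeout=10) as resp:
        return first_postcode_hit(json.load(resp))


if __name__ == "__main__":
    print(build_search_url("5421 cc", "Your_Api_Key"))
```

The coordinates returned by lookup() could then be written to the files with ExifTool in a separate step.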

You can test this also with ExifToolGui.

Frank

stoffball

Quote from: ThankYou on November 18, 2024, 03:35:30 AM
I noticed that geonames also provides a Postal Code database which is a simpler but more consistent,

This is not generally true. For Germany, for example, the postal code database also contains many companies that have their own postal code. And bigger cities with several postal codes have several entries, one for each sub-region of the city. So the postal code database is not simpler per se.

Also, exiftool uses the feature code and the population number to find the nearest match. Both of these search criteria are missing from the postal code database, so this would require more changes to the code.

ThankYou

Thank you everyone for the comments. @FrankB, I think your suggestion seems like it would be exactly what I'm looking for, but is that something that can be called via the exiftool terminal command?

My use case is that I am reorganizing a large library of photos and images. I'm using exiftool to create an XMP sidecar for every image, applying the metadata refactoring rules I've developed to suit my needs.

Ideally, I would like to include the city, county, state/province, and country metadata in the XMP file.

The challenge I've found with the GeoNames Gazetteer dataset (the one exiftool uses) is that population data is missing for a lot of cities, so using the population filter on a large, diverse set of images gives inconsistent results. This has nothing to do with exiftool directly.

I've tried to remedy this first by tweaking my exiftool geolocation settings to craft the output, and then by tweaking the build_geolocation settings to craft the dataset. My ultimate conclusion is that the GeoNames Gazetteer dataset is just too inconsistent for my use case.

GeoNames provides a second dataset, Postal Codes, that lacks the population data and is limited to cities and postal codes, and in my cursory review it is much more consistent than the Gazetteer dataset. I've considered taking the Postal Code dataset, manually adapting it to the GeoNames Gazetteer dataset format, and then running the exiftool build_geolocation tool as-is to produce a postal code dataset that works with exiftool. The issue I perceive with this approach is that the geolocation function will return the postal code center nearest the image's GPS coordinates, which may not be the actual postal code the image is located in. That's a compromise I'm willing to live with, but correctly representing the city location of the image is preferred.
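
To make the nearest-center compromise concrete, here is a small Python sketch of such a lookup. It is not how exiftool searches its database; it is only a brute-force illustration using the haversine formula, and the three sample rows are the Korean postal code entries quoted later in this thread. The function names and the simplified (postal code, place, lat, lon) tuples are illustrative assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_centroid(lat, lon, rows):
    """Return the row whose postal code center is closest to (lat, lon).

    The nearest center is not necessarily the postal code area that
    actually contains the point; the file has no area boundaries.
    """
    return min(rows, key=lambda row: haversine_km(lat, lon, row[2], row[3]))


# (postal code, place name, latitude, longitude)
ROWS = [
    ("52317", "북천면", 35.1119, 127.8557),
    ("52318", "북천면", 35.1098, 127.8886),
    ("52319", "횡천면", 35.1245, 127.8189),
]

if __name__ == "__main__":
    print(nearest_centroid(35.12, 127.82, ROWS))
```

A linear scan like this is fine for a sketch, but a file covering all countries would need a spatial index or a pre-sorted search to be fast.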

I'm an exiftool newbie in the sense this is my first exiftool project, but I have tried to read as much of the resources as possible. Looking for feedback from the experts here.

Thanks in advance for all the help.

FrankB

Quote from: ThankYou on November 21, 2024, 01:26:42 AM
@FrankB, I think your suggestion seems like it would be exactly what I'm looking for, but is that something that can be called via the exiftool terminal command?

No. The GUI uses the GeoCode web service to get the data (lat, lon, country, city, etc.) and ExifTool to update the files.

ThankYou

Quote from: FrankB on November 21, 2024, 01:56:47 AM
Quote from: ThankYou on November 21, 2024, 01:26:42 AM
@FrankB, I think your suggestion seems like it would be exactly what I'm looking for, but is that something that can be called via the exiftool terminal command?

No. GUI uses the GeoCode webservice to get the data (lat, lon, country, city etc) and ExifTool to update the files.

Ahhh, I didn't even know ExifToolGUI existed, so thanks for sharing. It looks like it will come in handy, if not for this dilemma.


Phil Harvey

I spent a couple of hours on this.  It isn't going to work.  I wrote some test code to import the postal code database but the database is horribly incomplete and doesn't match up with the admin codes used in the main database.  There would be no way to get alternate languages to work, and all of the admin regions and city names would be inconsistent.   :(

I think your best bet is to build a database with a zero population threshold using build_geolocation.

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

ThankYou

Hi Phil, I appreciate your response and your effort on this whole project, and I apologize for the delayed response.

I'm wondering if my request could be implemented without reconciling the PostalCode database against the Gazetteer database. This option would limit the output to data contained solely in the PostalCode database, but I think it would suit my case and other users' as well.

Here is an example of what I'm thinking from a user perspective, using some pseudo-code to help explain my use case. Do you think something like this is feasible? Thank you kindly in advance.

1. Download the PostalCode database from this link (allCountries.zip)
https://download.geonames.org/export/zip/

2. Extract allCountries.zip to DIR

3. Run the build_geolocation utility in DIR with a new option that selects the PostalCode database as an alternative to the Gazetteer database. I think that when the proposed -db option below is used, all other build_geolocation options (except -o) could be ignored, because they act on data that doesn't exist in the PostalCode dataset.
build_geolocation -db postal

4. Follow typical steps of locating the geolocation database in the correct directory for exiftool usage.

5. Run exiftool with a new API option (GeolocPostal) to indicate PostalCode database usage.
exiftool -api GeolocPostal=1 -api Geolocation "-geolocation*" test.jpg

6. Using the test.jpg example from the ExifTool Geolocation Feature page, the tool would match this single line in the PostalCode database and return the results below:
country code    : CA
postal code     : J2R
place name      : Saint-Hyacinthe Northwest
admin name1     : Quebec
admin code1     : QC
admin name2     : Montérégie
admin code2     : 16
admin name3     : Saint-Hyacinthe
admin code3     : 54048
latitude        : 45.6567
longitude       : -72.9237
accuracy        : 1

The output of the above exiftool command would be limited to the following. Exiftool would need to translate the Country Code to determine the Country since only the CountryCode is saved in the PostalCode database, but there is no need to cross-reference the Gazetteer database in this scenario.
Geolocation City                : Saint-Hyacinthe
Geolocation Region              : Quebec
Geolocation Subregion           : Montérégie
Geolocation Country Code        : CA
Geolocation Country             : Canada
Geolocation Position            : 45.6567, -72.9237
Geolocation Distance            : {recalculate} km
Geolocation Bearing             : {recalculate}
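
As a rough illustration of the mapping described above, the following Python sketch parses one tab-delimited PostalCode line and fills in the proposed tags. It is only a sketch of the proposal, not ExifTool code. The choice of admin name3 as the city (falling back to place name), the tiny COUNTRY_NAMES table, and the direction of the bearing (from the first point to the second) are all assumptions; the distance calculation is left out.

```python
import math

# Column order of the PostalCode file, as listed in the header line above.
FIELDS = ["country_code", "postal_code", "place_name",
          "admin_name1", "admin_code1", "admin_name2", "admin_code2",
          "admin_name3", "admin_code3", "latitude", "longitude", "accuracy"]

# Hypothetical lookup table; a real one would cover every country code.
COUNTRY_NAMES = {"CA": "Canada"}


def parse_row(line):
    """Split one tab-delimited line into a dict keyed by FIELDS (empty fields kept)."""
    return dict(zip(FIELDS, line.rstrip("\n").split("\t")))


def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2, in degrees (0 to 360)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(p2)
    x = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dlon)
    return (math.degrees(math.atan2(y, x)) + 360.0) % 360.0


def to_geolocation(row):
    """Map a parsed row to the proposed output tags.

    Assumption: admin_name3 is used as the city when present, else place_name.
    """
    code = row["country_code"]
    return {
        "Geolocation City": row["admin_name3"] or row["place_name"],
        "Geolocation Region": row["admin_name1"],
        "Geolocation Subregion": row["admin_name2"],
        "Geolocation Country Code": code,
        "Geolocation Country": COUNTRY_NAMES.get(code, code),
        "Geolocation Position": f'{row["latitude"]}, {row["longitude"]}',
    }


LINE = ("CA\tJ2R\tSaint-Hyacinthe Northwest\tQuebec\tQC\tMontérégie\t16\t"
        "Saint-Hyacinthe\t54048\t45.6567\t-72.9237\t1")

if __name__ == "__main__":
    for tag, value in to_geolocation(parse_row(LINE)).items():
        print(f"{tag:<32}: {value}")
```

Note that the city rule is exactly where the data gets ambiguous: for this row the place name is "Saint-Hyacinthe Northwest" while admin name3 is "Saint-Hyacinthe", and other countries fill these columns differently.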

I'm certain I'm misunderstanding some complexities here, but this approach seems to be a simplified variant of the geolocation API. The current API gives the user an incredible amount of flexibility in building the base database, in querying the data from it, and in structuring the output. It has a phenomenal breadth of capability. I see the PostalCode database as an alternative that provides little customization but produces consistent output across large image sets. My personal preference for the Postal Code database is that it is based on publications from national postal services.

Supported countries: nearly 100 countries are currently supported. New countries are added when the national postal service starts publishing data under a compatible license.

I also understand that there are language limitations when using the PostalCode database, but the entries for some countries seem to be in the native national language, which I think would also be acceptable for this use case (see the examples below for Korea and Russia):
KR    52317    북천면    경상남도    20    하동군    38360    북천면    3836040    35.1119    127.8557    6
KR    52318    북천면    경상남도    20    하동군    38360    북천면    3836040    35.1098    127.8886    6
KR    52319    횡천면    경상남도    20    하동군    38360    횡천면    3836034    35.1245    127.8189    6

RU    431756    Починки    Мордовия Республика    46    БОЛЬШЕБЕРЕЗНИКОВСКИЙ РАЙОН                54.2    45.85    4
RU    431757    Косогоры    Мордовия Республика    46    БОЛЬШЕБЕРЕЗНИКОВСКИЙ РАЙОН                54.2167    45.7833    4
RU    431758    Старые Найманы    Мордовия Республика    46    БОЛЬШЕБЕРЕЗНИКОВСКИЙ РАЙОН                53.6242    46.9492    1
RU    431759    Николаевка    Мордовия Республика    46    БОЛЬШЕБЕРЕЗНИКОВСКИЙ РАЙОН                52.4558    49.2142    4


What are your thoughts?

Phil Harvey

I've been busy recently and don't have time to deal with this in detail at the moment, but I'll come back to this when I have more time.

My idea was the same as yours, to adapt build_geolocation to read the postal code database, but I wanted to shoehorn this into the same format as the existing ExifTool Geolocation database.  I think this may be possible, but my first stumbling block was consistently determining the city name from a postal code database entry.  I also seem to remember that there was an issue with the GPS location being the same for many different postal code entries.  I don't know the best way to deal with this.

- Phil
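
One way to gauge how often different postal code entries share the same GPS location is to group the rows by their coordinates. The Python sketch below is only an illustration of that check, not anything from build_geolocation. The sample rows are hypothetical (not taken from the GeoNames file), and coordinates are compared as the text strings that appear in the file.

```python
from collections import defaultdict


def shared_locations(rows):
    """Group (postal_code, place_name, lat, lon) rows by coordinates.

    Returns only the coordinate pairs that are used by more than one entry.
    """
    groups = defaultdict(list)
    for postal_code, place_name, lat, lon in rows:
        groups[(lat, lon)].append((postal_code, place_name))
    return {pos: entries for pos, entries in groups.items() if len(entries) > 1}


# Hypothetical sample rows for illustration only.
SAMPLE = [
    ("A1A", "Northtown", "45.0000", "-73.0000"),
    ("A1B", "Northtown East", "45.0000", "-73.0000"),
    ("B2B", "Southville", "44.5000", "-73.2000"),
]

if __name__ == "__main__":
    for pos, entries in shared_locations(SAMPLE).items():
        print(pos, entries)
```

Running this over a country file would show how many entries collapse onto a single point, which is where a nearest-point search cannot tell the postal codes apart.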