Spelling of German cities

Started by herb, March 19, 2024, 04:25:25 AM


herb

Hello Phil,

I started to play with the new geolocation feature. Thanks again for it.

Sorry for being so nitpicky, but the German spelling of some city names should be changed:
The character sequences "oe" and "ue" on the right side should be "ö" and "ü", respectively.

In file GeoLang\de.pm, at the following line numbers, we have:
348     'Donauwörth' => 'Donauwoerth',          should be 'Donauwörth', as on the left side, so this line can be omitted
945     'Munich' => 'Muenchen',                 should be 'München'
956     'Mülheim' => 'Muelheim an der Ruhr',    should be 'Mülheim an der Ruhr'

Thanks in advance
Best regards
herb

Phil Harvey

Hi Herb,

I agree that the geonames.org database isn't as good as one would have hoped.  The only way to really fix this is to contribute to the geonames.org Creative Commons database.

But I have given you a way to fix this locally by including the ability to add user-defined geoname translations.

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

herb

Hello Phil,

thanks for the info.

Best regards
herb

FrankB

@Herb

I'm from the Netherlands, where we don't have that many cities with 'umlauts'. But I can relate to your remark, because I visit Germany regularly.

Now I understand that you want your cities correctly spelled, but there is something else that I find even more annoying. Suppose I'm searching for a city: Should I type in 'Munich', 'Muenchen' or 'München'?
The solution for this, and I agree fully with Phil, is to correct the data in the source, being GeoNames. But how do you find all cities that should be corrected?

I may have a solution, but I would like to know your opinion.
In ExifToolGui (which you also use) you can search for a city using Geocode.maps.co. That provider is very good at finding a city. No matter what I type, 'Munich', 'Muenchen' or 'München', it will find it, and you have the coordinates.

For the reverse (finding a city by coordinates) you can use the Overpass API, which uses OpenStreetMap data. Overpass displays the city names better.

Now what I could do, in the GUI or in a standalone program, is get a list of all cities with their coordinates (in Germany, those with an umlaut ü, ö or ä, for example), and then run two queries: one in Overpass and one with ExifTool geolocation.
That would give you a list of possible errors.
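
The comparison described above could be sketched roughly like this in Python. It is only an illustration: the two dictionaries stand in for the results of the two queries, the coordinates are approximate and keying by rounded coordinates is an assumption, and none of this is real output from Overpass or ExifTool. The check only covers the German ä/ö/ü/ß transliterations.

```python
# Hypothetical sketch: flag places whose name in one source is just the
# ASCII transliteration (ae/oe/ue/ss) of the name in a reference source.

UMLAUT_MAP = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
}


def transliterate(name: str) -> str:
    """Replace German umlauts and ß with their ASCII digraphs."""
    return "".join(UMLAUT_MAP.get(ch, ch) for ch in name)


def possible_errors(reference: dict, candidate: dict) -> list:
    """Return (key, reference_name, candidate_name) for suspicious pairs.

    A pair is suspicious when the names differ, but the candidate name
    equals the transliterated reference name.
    """
    flagged = []
    for key, ref_name in reference.items():
        cand_name = candidate.get(key)
        if cand_name is None or cand_name == ref_name:
            continue
        if cand_name == transliterate(ref_name):
            flagged.append((key, ref_name, cand_name))
    return flagged


# Illustrative data only, keyed by approximate, rounded coordinates.
reference_names = {
    (48.14, 11.58): "München",
    (51.43, 6.88): "Mülheim an der Ruhr",
    (52.52, 13.41): "Berlin",
}
candidate_names = {
    (48.14, 11.58): "Muenchen",
    (51.43, 6.88): "Muelheim an der Ruhr",
    (52.52, 13.41): "Berlin",
}

for key, ref_name, cand_name in possible_errors(reference_names, candidate_names):
    print(f"{key}: '{cand_name}' should probably be '{ref_name}'")
```

Names that differ for other reasons (for example 'Munich' versus 'München') would need a separate rule; this sketch deliberately flags only the transliteration case.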

You can take some samples in ExifToolGui. If you don't have a Geocode API key and don't want to register, I can send you one in a PM.

Frank

Geocode search: Using 'Munich, de', 'Muenchen, de' or 'München, de'
geocode search.jpg

Overpass Api result:
overpass api.jpg

Phil Harvey

geonames.org provides a more comprehensive list of alternate names for a city.  Below are the alternate names for Munich.  Perhaps in a future version I can add support for these, but the database size will increase significantly, and the performance will take a hit.

- Phil

Lungsod ng Muenchen, Lungsod ng München, MUC, Minca, Minche, Minga, Minhen, Minhene, Minkhen, Miunchenas, Mjunkhen, Mnichov, Mnichow, Mníchov, Monachium, Monacho, Monaco de Baviera, Monaco di Baviera, Monaco e Baviera, Monacu, Monacu di Baviera, Monacum, Muenchen, Muenegh, Muenhen, Muenih, Munchen, Munhen, Munic, Munich, Munich ed Baviera, Munih, Munike, Munique, Munix, Munkeno, Munkhen, Munîh, Mynihu, Myunxen, Myunxén, Mònacu, Mùnich ëd Baviera, Múnic, Múnich, München, Münegh, Münhen, Münih, mi wnik, mi'unikha, miunkheni, miyunik, mu ni hei, mwinhen, mwnykh, mynkn, myunhen, myunik, myunikha, myunsena, mywnkh, mywnykh, Μόναχο, Минхен, Мюнхен, Мүнхен, Мүнхэн, Мӱнхен, Մյունխեն, מינכן, مونیخ, ميونخ, ميونيخ, میونخ, म्युन्शेन, म्यूनिख, মিউনিখ, மியூனிக், ಮ್ಯೂನಿಕ್, มิวนิก, မြူးနစ်ချ်မြို့, მიუნხენი, ミュンヘン, 慕尼黑, 뮌헨

Phil Harvey

But if you have a list of possibilities, you could pass them all to ExifTool using a regular expression, like this:

> exiftool -api geolocation="ci/^(Munich|Muenchen|München)$/"
Geolocation City                : Munich
Geolocation Region              : Bavaria
Geolocation Subregion           : Upper Bavaria
Geolocation Country Code        : DE
Geolocation Country             : Germany
Geolocation Time Zone           : Europe/Berlin
Geolocation Feature Code        : PPLA
Geolocation Population          : 1300000
Geolocation Position            : 48.1375, 11.5755

- Phil
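
If the list of possibilities comes from a file or another tool, the argument could be assembled programmatically. Below is a minimal Python sketch that produces the same ci/^(...)$/ form as the command above. The function name is made up for illustration, and escaping the names with re.escape assumes the expression is handled as an ordinary regular expression on the ExifTool side.

```python
import re


def build_geolocation_arg(names):
    """Join alternate city names into a ci/^(A|B|C)$/ style expression."""
    # re.escape guards against regex metacharacters in a name.
    # Note: a name containing "/" would still clash with the /.../ delimiters
    # and is not handled by this sketch.
    alternatives = "|".join(re.escape(name) for name in names)
    return f"ci/^({alternatives})$/"


arg = build_geolocation_arg(["Munich", "Muenchen", "München"])
print(f'exiftool -api geolocation="{arg}"')
```

Depending on the shell, the quoting around the argument may need to change, because the expression contains a dollar sign.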

Phil Harvey

I have a test version working that includes a check for all alternate city names supplied by geonames.org.  This adds 12 MB to the database size, and increases processing time for a single search by 75%.

- Phil

Edit: here is an example

> time exiftool -api geolocation="big apple"
Geolocation City                : New York City
Geolocation Region              : New York
Geolocation Country Code        : US
Geolocation Country             : United States
Geolocation Time Zone           : America/New_York
Geolocation Feature Code        : PPL
Geolocation Population          : 8800000
Geolocation Position            : 40.7143, -74.0060
0.532u 0.026s 0:00.56 98.2% 0+0k 0+0io 0pf+0w

Edit2: I can get the database size down to 7.5 MB (3.2 MB zipped) if I throw out entries with very high Unicode codepoints (first character 0xd0 or higher), so maybe this is do-able as an optional ExifTool feature.  But this doesn't improve the performance much.  As an example, this reduces the Munich alternates to the following:

Lungsod ng Muenchen, Lungsod ng München, MUC, Minca, Minche, Minga, Minhen, Minhene, Minkhen, Miunchenas, Mjunkhen, Mnichov, Mnichow, Mníchov, Monachium, Monacho, Monaco de Baviera, Monaco di Baviera, Monaco e Baviera, Monacu, Monacu di Baviera, Monacum, Muenchen, Muenegh, Muenhen, Muenih, Munchen, Munhen, Munic, Munich, Munich ed Baviera, Munih, Munike, Munique, Munix, Munkeno, Munkhen, Munîh, Mynihu, Myunxen, Myunxén, Mònacu, Mùnich ëd Baviera, Múnic, Múnich, München, Münegh, Münhen, Münih, mi wnik, mi'unikha, miunkheni, miyunik, mu ni hei, mwinhen, mwnykh, mynkn, myunhen, myunik, myunikha, myunsena, mywnkh, mywnykh, Μόναχο
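
For illustration, here is how such a filter might look in Python. It assumes that "first character 0xd0 or higher" refers to the first byte of the UTF-8 encoded name, which is consistent with the reduced list above (the Greek name survives, the Cyrillic and later ones do not). This is a guess at the rule, not the actual ExifTool code, and the function name is made up.

```python
def keep_name(name: str, limit: int = 0xD0) -> bool:
    """Keep a name only if its first UTF-8 byte is below the limit."""
    encoded = name.encode("utf-8")
    return bool(encoded) and encoded[0] < limit


# A few of the Munich alternates from the full list.
alternates = ["Munich", "München", "Μόναχο", "Мюнхен", "מינכן", "ミュンヘン", "뮌헨"]
reduced = [name for name in alternates if keep_name(name)]
print(reduced)
```

Note that the test looks only at the first byte, so a name that starts with a Latin letter is kept even if later characters are from another script.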

Marsu42

Quote from: Phil Harvey on March 20, 2024, 02:13:11 PM
I have a test version working that includes a check for all alternate city names supplied by geonames.org.  This adds 12 MB to the database size, and increases processing time for a single search by 75%.

Being from Germany, too, I'm very happy to see this problem/feature might get resolved.

Just my 2ct: As long as the results are accurate (maybe even including smaller locations <500 residents), I personally don't care about db size and performance. In my DSLR image processing workflow, I do a lot of processing with exiftool before importing into Lightroom - so it's very slow anyway.

For higher performance and simple commands, I'm using native c exiv2 - so perl & exiftool is for "features" :-)

Quote from: Phil Harvey on March 20, 2024, 11:38:31 AM
Lungsod ng Muenchen, Lungsod ng München, MUC, Minca, Minche, Minga, Minhen, Minhene, Minkhen, Miunchenas, Mjunkhen, Mnichov, Mnichow, Mníchov, Monachium, Monacho, Monaco de Baviera, Monaco di Baviera, Monaco e Baviera, Monacu, Monacu di Baviera, Monacum, Muenchen, Muenegh, Muenhen, Muenih, Munchen, Munhen, Munic, Munich, Munich ed Baviera, Munih, Munike, Munique, Munix, Munkeno, Munkhen, Munîh, Mynihu, Myunxen, Myunxén, Mònacu, Mùnich ëd Baviera, Múnic, Múnich, München, Münegh, Münhen, Münih, mi wnik, mi'unikha, miunkheni, miyunik, mu ni hei, mwinhen, mwnykh, mynkn, myunhen, myunik, myunikha, myunsena, mywnkh, mywnykh, Μόναχο, Минхен, Мюнхен, Мүнхен, Мүнхэн, Мӱнхен, Մյունխեն, מינכן, مونیخ, ميونخ, ميونيخ, میونخ, म्युन्शेन, म्यूनिख, মিউনিখ, மியூனிக், ಮ್ಯೂನಿಕ್, มิวนิก, မြူးနစ်ချ်မြို့, მიუნხენი, ミュンヘン, 慕尼黑, 뮌헨

To improve performance, might it be possible to locally prune the db and only leave one (or specified) language(s)?

It's great to have apps support unicode and multiple languages, but realistically, I will keep being German for the remainder of this life...

Phil Harvey

It may be possible to generate the alternate names based on the names used in a specific country. I'll keep this in mind as I continue to play with this new feature.

- Phil

Phil Harvey

Quote from: Marsu42 on March 24, 2024, 04:50:15 PM
To improve performance, might it be possible to locally prune the db and only leave one (or specified) language(s)?

The full ExifTool 12.83 distribution includes a new "build_geolocation" utility script which allows you to generate your own database with different population limits and different included features on a per-country/region basis.  It also allows you to specify which languages to include/exclude.  For more information, read the last 3 paragraphs of the "Alternate databases" section here.

- Phil

Marsu42

Quote from: Phil Harvey on April 18, 2024, 11:55:12 AM
Quote from: Marsu42 on March 24, 2024, 04:50:15 PM
To improve performance, might it be possible to locally prune the db and only leave one (or specified) language(s)?
For more information, read the last 3 paragraphs of the "Alternate databases" section here.

Thanks, sounds nice, but what puzzles me is...

1. the new db link from https://exiftool.org/geolocation.html#Alt still points to the old Geolocation500-1.02.zip ... took me a while to realize, the correct new link is https://exiftool.org/Geolocation500-20240417.zip

2. the Geolocation::altDir description is gone from the above page, but is still referred to on https://exiftool.org/ExifTool.html#Geolocation - is it still necessary to set this to the specific location of AltNames.dat, or is it found simply by putting it in the same location as Geolocation.dat?


Phil Harvey

Quote from: Marsu42 on April 19, 2024, 02:28:02 PM
1. the new db link from https://exiftool.org/geolocation.html#Alt still points to the old Geolocation500-1.02.zip ... took me a while to realize, the correct new link is https://exiftool.org/Geolocation500-20240417.zip

Ooops, sorry.  Fixed.

Quote
2. the Geolocation::altDir description is gone from the above page, but is still referred to on https://exiftool.org/ExifTool.html#Geolocation - is it still necessary to set this to the specific location of AltNames.dat, or is it found simply by putting it in the same location as Geolocation.dat?

The altDir may still be used, but I'm removing it from the documentation because I've combined these into a single download file.  With 12.83 only geoDir needs to be set if both are in the same directory.  I will remove the altDir references from the documentation.

Thanks for pointing this out.  I spend nearly as much time updating the documentation as I do programming these features, but it seems I have missed a few things here.

- Phil

Marsu42

Quote from: Phil Harvey on April 18, 2024, 11:55:12 AM
The full ExifTool 12.83 distribution includes a new "build_geolocation" utility script which allows you to generate your own database with different population limits and different included features on a per-country/region basis.

The customization is great, I've got populated places working now.

1. Your sample for -cp CODE in https://exiftool.org/build_geolocation.txt seems odd because "if above minimum population" is probably never valid for historical, abandoned or destroyed places? In my command line, I've moved +PPLX,+PPLCH,+PPLH,+PPLW,+PPLQ into -c

2. Is this system set up to work only with populated places, or for everything that is included in the geonames db - natural or other places (https://www.geonames.org/export/codes.html)?

I'm trying to geolocate the nearest railroad station/stop to wherever I've taken the picture... this is not a mere test, this is actually something that would be useful for me.

For the sublocation XMP-iptcCore:Location mentioned in https://exiftool.org/forum/index.php?topic=15898.0 I'd be happy to copy the geolocation result there if I could geolocate any natural place to the default XMP-photoshop:City

Phil Harvey

Quote from: Marsu42 on April 19, 2024, 04:14:29 PM
1. Your sample for -cp CODE in https://exiftool.org/build_geolocation.txt seems odd because "if above minimum population" is probably never valid for historical, abandoned or destroyed places? In my command line, I've moved +PPLX,+PPLCH,+PPLH,+PPLW,+PPLQ into -c

Fair point.  I really don't know if these should be included anyway, so I'm OK with this.  Edit: Actually, there are 10 historical, 18 abandoned and 4 destroyed features in the database with populations >= 2000, but these will show up as "Other" in an ExifTool v1.02 database.

The option should be like -c +pplx,pplch,pplh,pplw,pplq  (the "+" is only at the start of the list, case is not significant).
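
As a small illustration of that format, the following Python sketch rewrites a list written with a "+" on every code into the form described above. The function name is made up, and it only reformats the text: it does not check that the codes are valid geonames feature codes, and it keeps a leading "+" only if the input started with one.

```python
def normalize_code_list(codes: str) -> str:
    """Turn '+PPLX,+PPLCH' into '+pplx,pplch' (a single leading '+')."""
    keep_plus = codes.strip().startswith("+")
    # Strip whitespace and any per-item "+", and lowercase each code
    # (case is not significant, so this is purely cosmetic).
    parts = [part.strip().lstrip("+").lower() for part in codes.split(",")]
    parts = [part for part in parts if part]  # drop empty entries
    return ("+" if keep_plus else "") + ",".join(parts)


print("-c", normalize_code_list("+PPLX,+PPLCH,+PPLH,+PPLW,+PPLQ"))
```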

Quote
2. Is this system set up to work only with populated places, or for everything that is included in the geonames db - natural or other places (https://www.geonames.org/export/codes.html)?

You can add any other feature codes you want.  You will notice that the database will automatically write version 1.03 if you do this, but ExifTool 12.83 will read this OK.  Version 1.03 can store a maximum of 32 different feature codes.  You can include more than this, but they will be saved as "Other".

- Phil

Marsu42

Quote from: Phil Harvey on April 19, 2024, 04:22:23 PM
You can add any other feature codes you want.  You will notice that the database will automatically write version 1.03 if you do this, but ExifTool 12.83 will read this OK.  Version 1.03 can store a maximum of 32 different feature codes.  You can include more than this, but they will be saved as "Other".

Quote from: Phil Harvey on April 21, 2024, 09:33:34 PM
Version 12.84 will have a cool new ability

Speaking of 12.84: I've tried to build a 1.03 db including RSTN,RSTNQ,RSTP,RSTPQ,ADM1,ADM2,ADM3,ADM4,ADM5 and to select only these with -api GeolocFeature=RSTN,RSTNQ,RSTP,RSTPQ to get the nearest railway stop and -api GeolocFeature=ADM1,ADM2,ADM3,ADM4,ADM5 to get the current county.

Even building the db (using Windows perl) was a pita, this probably was the cause of bugs later on. I had to change the defaults in build_geolocation and it wouldn't auto-update from 1.02 to 1.03 either way - but the fields are at the end of Geolocation.dat

However, with 12.83 geolocating is buggy with this db - when using -api GeolocFeature=PPL results are returned from RSTN,... and ADMx with the population of PPL,... so I had to revert to a db with only populated places like the 1.02 defaults.

I gave up at that point, and this is not a complete or usable bug report - but if you have some time on your hands, maybe you could double-check that this is actually working for one of the next releases.