Spelling of German cities

Started by herb, March 19, 2024, 04:25:25 AM

Phil Harvey

Thanks for this report.  You're right: a bug in Geolocation.pm means that additional feature codes are not properly supported in searches.  Here is a patch that fixes this:

diff -u -r1.95 Geolocation.pm
--- lib/Image/ExifTool/Geolocation.pm    21 Apr 2024 21:49:26 -0000    1.95
+++ lib/Image/ExifTool/Geolocation.pm    22 Apr 2024 16:27:29 -0000
@@ -527,6 +527,7 @@
 #
 # perform reverse Geolocation lookup to determine GPS based on city, country, etc.
 #
+    my $fbits = $dbVer eq '1.02' ? 0x0f : 0x1f;
     while (defined $city and (@coords != 2 or $both)) {
         my $cargs = join(',', @cargs, $pop||'', $maxDist||'', $fcodes||'');
         my $i = 0;
@@ -574,7 +575,7 @@
                 $str !~ $_ or next Entry foreach @{$regex{19}};
             }
             # test feature code and population
-            next if $fcmask and not $fcmask & (1 << (ord(substr($cityList[$i],12,1)) & 0x0f));
+            next if $fcmask and not $fcmask & (1 << (ord(substr($cityList[$i],12,1)) & $fbits));
             my $pc = substr($cityList[$i],6,2);
             if (not defined $minPop or $pc ge $minPop) {
                 $lastFound{$i} = $pc;
@@ -642,7 +643,7 @@
         abs($lt - $lat) > $minDistC and $n = $end - $inc, next;
         # ignore if population is below threshold
         next if defined $minPop and $minPop ge substr($cityList[$i],6,2);
-        next if $fcmask and not $fcmask & (1 << (ord(substr($cityList[$i],12,1)) & 0x0f));
+        next if $fcmask and not $fcmask & (1 << (ord(substr($cityList[$i],12,1)) & $fbits));
         $ln = ($ln << 4) | ($f & 0x0f);
         # calculate great circle distance to this city on unit sphere
         my ($p1, $t1) = ($lt*$pi/0x100000 - $pi/2, $ln*$pi/0x080000 - $pi);
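
To see why the wider mask matters, here is a minimal Python sketch of the feature test changed by the patch. It is an illustration only, not ExifTool code: the function name `passes_feature_filter` is hypothetical, and the entry byte is assumed to hold nothing but the feature number. The feature numbers (PPLA2 = 3, RSTN = 19) are taken from the "Features kept" listing further down in this post.

```python
def passes_feature_filter(flag_byte: int, fcmask: int, fbits: int) -> bool:
    """Python rendering of the patched test: keep the entry when no mask is
    set, or when the bit for the entry's feature number is set in fcmask."""
    if not fcmask:
        return True
    return bool(fcmask & (1 << (flag_byte & fbits)))


# Feature numbers as shown in the build_geolocation verbose output.
PPLA2, RSTN = 3, 19

# Assumption for illustration: the byte carries only the feature number.
rstn_entry = RSTN

# Searching for RSTN: the 4-bit mask folds 19 down to 3, so the entry is skipped.
print(passes_feature_filter(rstn_entry, 1 << RSTN, 0x0f))   # False
# With the 5-bit mask the feature number survives and the entry matches.
print(passes_feature_filter(rstn_entry, 1 << RSTN, 0x1f))   # True
# The 4-bit mask also gives a false match: an RSTN entry looks like PPLA2.
print(passes_feature_filter(rstn_entry, 1 << PPLA2, 0x0f))  # True
```

Because 19 & 0x0f equals 3, any feature number of 16 or above is mistaken for a lower-numbered one under the old mask, which is consistent with the failed searches reported here.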

With this patch, I get the following:

> exiftool -api geolocation=52.4,-1.5 -api GeolocFeature=RSTN,RSTNQ,RSTP,RSTPQ
Geolocation City                : Coventry Railway Station
Geolocation Region              : England
Geolocation Subregion           : Coventry
Geolocation Country Code        : GB
Geolocation Country             : United Kingdom
Geolocation Time Zone           : Europe/London
Geolocation Feature Code        : RSTN
Geolocation Population          : 0
Geolocation Position            : 52.4008, -1.5141
Geolocation Distance            : 0.96 km
Geolocation Bearing             : 273

> exiftool -api geolocation=52.4,-1.5 -api GeolocFeature=ADM1,ADM2,ADM3,ADM4,ADM5
Geolocation City                : Baginton
Geolocation Region              : England
Geolocation Subregion           : Warwickshire
Geolocation Country Code        : GB
Geolocation Country             : United Kingdom
Geolocation Time Zone           : Europe/London
Geolocation Feature Code        : ADM4
Geolocation Population          : 0
Geolocation Position            : 52.3634, -1.4849
Geolocation Distance            : 4.19 km
Geolocation Bearing             : 158

using a database built with an unmodified build_geolocation script using this command:

build_geolocation -c RSTN,RSTNQ,RSTP,RSTPQ,ADM1,ADM2,ADM3,ADM4,ADM5 -l '' -v DIR

I don't have the problems you mention with build_geolocation, and it builds the v1.03 database properly, as seen in the verbose output:

Languages to read from input database(s):
  <none>
Parameters for reading scripts/geolocation_dir//allCountries.txt:
  Minimum populations (??=any country):
    ??=2000
  Features to keep regardless of population:
    ??=ADM1,ADM2,ADM3,ADM4,ADM5,RSTN,RSTNQ,RSTP,RSTPQ
  Features to keep for population >= minimum:
    ??=PPL,PPLA,PPLA2,PPLA3,PPLA4,PPLA5,PPLC,PPLCH,PPLF,PPLG,PPLH,PPLL,PPLQ,PPLR,PPLS,PPLW,PPLX,STLMT
Reading scripts/geolocation_dir//allCountries.txt... Done.
Some feature codes not supported by version 1.02, writing as 1.03 instead.
  000240 countries  (0x00f0)
  003707 regions    (0x0e7b)
  036948 subregions (0x9054)
Not writing alternate languages (scripts/geolocation_dir//alternateNamesV2.txt not found)
Processing scripts/geolocation_dir//allCountries.txt... Done.
Writing scripts/geolocation_dir//Geolocation_out/Geolocation.dat (version 1.03)...
Features kept:
   246 (22) ADM1
 22312 (21) ADM2
122684 (20) ADM3
207208 (23) ADM4
 14895 (27) ADM5
 56653 ( 1) PPL
  3063 ( 2) PPLA
 13453 ( 3) PPLA2
  9405 ( 4) PPLA3
  3726 ( 5) PPLA4
    21 ( 6) PPLA5
   224 ( 7) PPLC
     1 ( 8) PPLCH
    16 ( 9) PPLF
     6 (10) PPLG
    10 (11) PPLH
   202 (12) PPLL
    18 (13) PPLQ
     2 (14) PPLR
    14 (15) PPLS
     4 (16) PPLW
  4811 (17) PPLX
 66128 (19) RSTN
  1858 (25) RSTNQ
  9252 (24) RSTP
   358 (26) RSTPQ
    25 (18) STLMT
Output file size(s):
   15.23 MB Geolocation.dat (536595 entries)

I would revert your modifications to build_geolocation because I think they may be causing problems.

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

Phil Harvey

I've just released ExifTool 12.84, which should fix the problem addressed by the above patch.  This version also increases the maximum number of named feature codes to 64 (since you were starting to get close to the old limit of 32 with all the additional codes you were adding).

And I've improved build_geolocation to allow backslashes in directory names, which you may have had problems with in Windows. @Marsu42:  If you had other issues, please tell me -- I'm guessing here.

- Phil

Phil Harvey

OK.  I broke down and loaded the necessary geonames files into my Windows virtual machine to test this in Windows.  Yes, there was a problem due to the different floating-point print format in ActivePerl which messed up the population exponent.  Here is a version of build_geolocation that fixes this.
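
As an illustration of how a floating-point print format can cause this kind of problem, here is a hedged Python sketch. It is not the build_geolocation code, and the actual bug may have been different. It assumes that some C runtimes print scientific notation with a three-digit exponent ("5.7e+004") where others print two digits ("5.7e+04"), and that the population is reduced to a two-character code made of the decimal exponent followed by the leading digit. Both function names are hypothetical.

```python
import re


def pop_code_fixed_offset(sci: str) -> str:
    # Fragile: assumes the exponent digit always sits at index 6, which
    # only holds for a two-digit exponent such as "5.7e+04".
    return sci[6] + sci[0]


def pop_code_parsed(sci: str) -> str:
    # More robust: parse the exponent as a number so leading zeros are ignored.
    m = re.fullmatch(r"(\d)\.\de([+-]\d+)", sci)
    if m is None:
        raise ValueError(f"unexpected format: {sci!r}")
    exponent = int(m.group(2))
    if not 0 <= exponent <= 9:
        raise ValueError(f"exponent out of range: {exponent}")
    return f"{exponent}{m.group(1)}"


two_digit = f"{56653:.1e}"   # Python prints '5.7e+04'
three_digit = "5.7e+004"     # simulated three-digit-exponent output

print(pop_code_fixed_offset(two_digit), pop_code_fixed_offset(three_digit))  # 45 05
print(pop_code_parsed(two_digit), pop_code_parsed(three_digit))              # 45 45
```

In this sketch the fixed-offset version silently turns a population of roughly 57,000 into an exponent of 0, while the parsed version gives the same code for both print formats.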

- Phil

Marsu42

Quote from: Phil Harvey on April 24, 2024, 08:11:19 AM
OK. I broke down and loaded the necessary geonames files into my Windows virtual machine to test this in Windows.

Great, I can report 12.84 is geolocating correctly with "-api GeolocFeature=RSTN,RSTP" or "-api GeolocFeature=ADM3"

Your new script now correctly auto-upgrades the db to 1.03 (using Strawberry Perl 5.38 portable).

I didn't test whether the country-specific -p, -c and -cp options work on Windows because I've changed the %defaults inside the Perl file to "kiss".

I didn't try geolocating natural features like mountains yet because I have to get a grip on how complete the Geonames sources are... but of course that's not exiftool's issue.

This is much more powerful than I could have ever hoped for, and I hope lots of people make use of this!