FileName encoding not specified when changing timestamps of files with ¿¿

Started by EM1336, December 28, 2018, 06:28:32 AM

EM1336

Hello, I'm using a command string in ExifToolGUI to change the timestamps of genealogical scanned photos to match the dates when the photos were taken.  For each group of photos from a specific date (for example, photos from Norwegian Xmas 1972), I simply modify the string and re-execute:

-filecreatedate="1972:12:24 14:15:16" -filemodifydate="1972:12:24 14:15:16" -alldates="1972:12:24 14:15:16"

Many photos have an unknown exact date; I only know the month and year, or sometimes only the year, so I use two Spanish inverted question marks, ¿¿, as placeholders in ISO 8601-style dates:

1972-12-¿¿, pic01
1935-¿¿-¿¿, pic01

Note:  I'm using the upside-down question marks because no other question-mark-like symbol (1) displays properly in Windows Explorer, (2) displays properly in Google Drive and other cloud storage, and (3) doesn't cause ExifToolGUI to freak out.  For example, the Unicode double question mark symbol U+2047 causes ExifToolGUI to crash on my computer!  I could use "xx" instead of "¿¿" but then the filenames look naughty.

When I run the command string on the files with exact dates, everything proceeds perfectly.  However, when I run the command string on the files with estimated dates that have the upside-down question marks, ExifToolGUI changes the timestamps properly, but hassles me with every file.  For example:

======== ./1972-12-�� pic01.jpg
1 image files updated
Warning: FileName encoding not specified - ./1972-12-¿¿ pic01.jpg


I know this has something to do with Latin or UTF8 character sets but I can't figure out the syntax to make the warning disappear.  (I prefer to use UTF8 exclusively if at all possible.)  What do I add to the end of my command string?  I've tried the following at the end of the string and none of these have worked:

-charset UTF8
-charset=UTF8
-charset filename=UTF8
-charset UTF8 -ext jpg .
-codedcharacterset UTF8
-codedcharacterset=UTF8
-charset=UTF8 -codedcharacterset=UTF8

What should I be typing instead at the end of my string?

Thank you.

Phil Harvey

If it works, then you must be specifying the filename using your system code page (whatever that is), and you can just ignore the warning.  But specifying -charset filename=YOUR_SYSTEM_CODE_PAGE should get rid of the warning.  Clearly this isn't UTF-8 if you are getting this warning.

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

EM1336

I had no idea what my system code page is, so I searched and found these two commands to display your ANSI (Windows) and OEM (console) code pages:

reg query HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage /v OEMCP
My result is 437.

reg query HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage /v ACP
My result is 1252.

The command chcp displays only your OEM code page, and you can change it temporarily to UTF8 (code page 65001):

chcp
Active code page:  437

chcp 65001
Active code page:  65001

I learned that it is not a good idea to change your code pages to UTF8 globally (in the Registry), as it typically makes Windows unbootable with all of its legacy code and dependencies.  Furthermore, even changing the console code page to 65001 temporarily can cause problems with certain commands such as find, more, and anything involving piping or redirection.
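
For readers who would rather query these values from a script than from the Registry, here is a minimal sketch in Python. It assumes a Windows machine and calls the Win32 functions GetACP, GetOEMCP and GetConsoleOutputCP through ctypes; the helper name windows_code_pages is made up for this example, and on any other platform it simply returns None.

```python
import sys


def windows_code_pages():
    """Return (ansi, oem, console) code page numbers on Windows, else None.

    Hypothetical helper for illustration only; not part of ExifTool.
    """
    if sys.platform != "win32":
        return None
    import ctypes  # imported here because ctypes.windll only exists on Windows

    kernel32 = ctypes.windll.kernel32
    # GetACP: ANSI code page, GetOEMCP: OEM code page,
    # GetConsoleOutputCP: the console's active output code page
    # (0 if the process has no console attached).
    return kernel32.GetACP(), kernel32.GetOEMCP(), kernel32.GetConsoleOutputCP()


pages = windows_code_pages()
if pages is None:
    print("Not running on Windows; no code pages to report.")
else:
    print("ANSI=%d OEM=%d console=%d" % pages)
```

On the machine described above, the first two numbers would presumably match the Registry values (1252 and 437), and the third would follow whatever chcp last set.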

I looked into the inverted question mark symbol and it is part of extended ASCII (128-255): character # 168 in code page 437 (in code page 1252 the same symbol is character # 191).
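
The character number depends on which code page is doing the counting. As a quick check, this small Python sketch prints the byte values that Python's standard codecs use for "¿" in each encoding mentioned in this thread; the function name byte_values is just an illustrative choice.

```python
CHAR = "\u00bf"  # the inverted question mark, U+00BF


def byte_values(ch, encodings=("cp437", "cp1252", "latin-1", "utf-8")):
    """Map each encoding name to the list of byte values it uses for ch."""
    return {enc: list(ch.encode(enc)) for enc in encodings}


table = byte_values(CHAR)
for enc, values in table.items():
    print(f"{enc:8} {values}")
# cp437    [168]
# cp1252   [191]
# latin-1  [191]
# utf-8    [194, 191]
```

So the same symbol is one byte with a different value in cp437 and cp1252, and two bytes in UTF-8.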

So, after this crash course on everything I ever wanted to know about code pages, I tried every combination of active code page (437, 1252, or 65001) and one of the following commands (I use the 15th if I don't know the exact date):

-filecreatedate="1972:12:15 14:15:16" -filemodifydate="1972:12:15 14:15:16" -alldates="1972:12:15 14:15:16" -charset filename=437
-filecreatedate="1972:12:15 14:15:16" -filemodifydate="1972:12:15 14:15:16" -alldates="1972:12:15 14:15:16" -charset filename=1252
-filecreatedate="1972:12:15 14:15:16" -filemodifydate="1972:12:15 14:15:16" -alldates="1972:12:15 14:15:16" -charset filename=65001

So, 3 possible active code pages, 3 commands, 9 different combinations, and none of them made the annoying warnings go away.

I guess don't fix what ain't broke.  ExifToolGUI changes the timestamps just fine to 1972-12-15, in spite of the filenames 1972-12-¿¿ pic01, 1972-12-¿¿ pic02, 1972-12-¿¿ pic03, and so on generating the weird warning messages.
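
The "use the 15th when the day is unknown" convention above can be sketched in Python as follows. This is only an illustration: the function name, the default_month parameter and the regular expression are assumptions made for this example, and the 14:15:16 time is simply copied from the commands above. The thread does not say which month is used when only the year is known, so the sketch refuses to guess unless a month is supplied.

```python
import re

PLACEHOLDER = "\u00bf\u00bf"  # the "¿¿" marker used in the filenames above

# Matches names that start with YYYY-MM-DD, where MM and DD may be "¿¿".
DATE_RE = re.compile(r"(\d{4})-(\d{2}|\u00bf\u00bf)-(\d{2}|\u00bf\u00bf)")


def exif_date_from_name(name, default_day="15", default_month=None,
                        time="14:15:16"):
    """Build a 'YYYY:MM:DD HH:MM:SS' string from a dated filename."""
    match = DATE_RE.match(name)
    if match is None:
        raise ValueError(f"no leading date in {name!r}")
    year, month, day = match.groups()
    if month == PLACEHOLDER:
        if default_month is None:
            raise ValueError("month unknown and no default_month given")
        month = default_month
    if day == PLACEHOLDER:
        day = default_day
    return f"{year}:{month}:{day} {time}"


print(exif_date_from_name("1972-12-\u00bf\u00bf pic01.jpg"))
# 1972:12:15 14:15:16
```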

Phil Harvey

Code page 1252 is supported by ExifTool, but 437 is not.  Also, the character set has to be given by name rather than by bare number; the syntax is

-charset filename=cp1252

or

-charset filename=latin

or

-charset filename=latin1

See FAQ 10 for a complete list of available character sets and some help.

- Phil

EM1336

Success!  Thank you for your help.

I tried each of the suffixes you posted (I'm assuming they are different versions of the same code page command) on my ¿¿-containing filename and all of them worked.  No annoying warning message anymore.

-filecreatedate="1972:12:24 14:15:16" -filemodifydate="1972:12:24 14:15:16" -alldates="1972:12:24 14:15:16" -charset filename=cp1252
-filecreatedate="1972:12:24 14:15:16" -filemodifydate="1972:12:24 14:15:16" -alldates="1972:12:24 14:15:16" -charset filename=latin
-filecreatedate="1972:12:24 14:15:16" -filemodifydate="1972:12:24 14:15:16" -alldates="1972:12:24 14:15:16" -charset filename=latin1
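
Since the same string gets edited and re-executed for every batch of photos, it can also be generated. The following Python sketch only assembles the text of the command string shown above; it does not run ExifTool, and the function name timestamp_args is an assumption made for this example.

```python
def timestamp_args(stamp, filename_charset=None):
    """Return the command string for one date, optionally with -charset.

    stamp is a 'YYYY:MM:DD HH:MM:SS' string; filename_charset is a name
    such as 'cp1252', 'latin' or 'latin1' (the three that worked above).
    """
    args = [
        f'-filecreatedate="{stamp}"',
        f'-filemodifydate="{stamp}"',
        f'-alldates="{stamp}"',
    ]
    if filename_charset:
        args += ["-charset", f"filename={filename_charset}"]
    return " ".join(args)


print(timestamp_args("1972:12:24 14:15:16", "cp1252"))
```

The printed line is identical to the first of the three working commands above, so it can be pasted into ExifToolGUI as before.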

I looked at FAQ 10 and couldn't make heads or tails of what I was reading.  I do understand that -charset filename=CHARSET (such as -charset filename=cp1252) changes the external filename encoding.  What was the original external filename encoding?  UTF8?  Is the ¿ character not included in UTF8, but is included in cp1252?

What is the internal filename encoding?

How does cp437 fit into the picture, if it's incompatible with ExifTool?

As I try to read up on this topic, I'm more and more confused.

So at this point, I don't really understand what's happening under the hood and why the ¿ character needs -charset filename=cp1252 to suppress that warning, but it works and I'm happy.

Phil Harvey

cp1252, latin and latin1 are all equivalent names for the same character set.

-charset filename specifies the character set used for file names on the command line.  The internal coding used for filenames is not relevant, but it is probably UCS-2.

cp437 is an old MS-DOS character set, generally not used by Windows.

- Phil
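
As a footnote to the questions above: "¿" does exist in UTF-8, but it is stored as two bytes there, whereas cp1252 stores it as a single byte that is not valid UTF-8 on its own. The Python sketch below illustrates that difference with the filename from this thread. It does not reproduce ExifTool's internal check; it only shows why a cp1252-encoded name cannot be read as UTF-8, and the helper name decodes_as is made up for this example.

```python
NAME = "1972-12-\u00bf\u00bf pic01.jpg"  # "1972-12-¿¿ pic01.jpg"


def decodes_as(data, encoding):
    """Return True if the bytes are valid in the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


as_cp1252 = NAME.encode("cp1252")  # each "¿" becomes the single byte 0xBF
as_utf8 = NAME.encode("utf-8")     # each "¿" becomes the two bytes 0xC2 0xBF

print(decodes_as(as_cp1252, "utf-8"))       # False: a lone 0xBF is not valid UTF-8
print(as_cp1252.decode("cp1252") == NAME)   # True: the right code page recovers the name
print(as_cp1252.decode("cp437") == NAME)    # False: the wrong code page yields other symbols
print(len(as_cp1252), len(as_utf8))         # 20 22
```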