emojis, DOS 8.3 filename

Started by Waexto, March 29, 2019, 10:46:20 PM

Waexto

Hello. Right now I use ExifTool 11.33 for Windows.

When the file name contains emojis (even a single one), such as smileys or flags, ExifTool performs its intended tagging action, but it converts the image's file name to DOS 8.3 format (all capital letters plus a tilde). As a result, some metadata can't be tagged correctly. For example, with "-OriginalFileName<${filename;s/\.jpg$//i}" the short DOS 8.3 file name is written instead of the long file name with the emojis. I think that whatever the tagging job is, ExifTool will always convert the name to DOS 8.3 format as long as the original long file name contains an emoji.

YouTube, for example, has some video titles with emojis, and I use those titles as file names for the thumbnails I download (e.g. hqdefault.jpg, maxresdefault.jpg). I use ExifTool in a batch file to handle multiple images easily. So if I have many images with emojis, I won't be able to remember their original long file names after they are converted to DOS 8.3 format.

Thank you and more power.

Phil Harvey

Thanks for this report, but I don't think there is much I can do about this because of the poor support for special characters in Windows file names in Perl.

You'll have to find some external utility to work around this.  For example, if you could write the filename in UTF8 to a sidecar.txt file, then you could do this in ExifTool:

exiftool "-originalfilename<=%d%f.txt" ...
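A minimal Python sketch of the external step Phil describes, assuming Python 3 is available. In ExifTool's naming, %d is the directory and %f the file name without its extension, so for each .jpg the sidecar has to be <same name>.txt in the same folder. The function name write_sidecars is hypothetical. The text it writes mirrors ${filename;s/\.jpg$//i} and has no trailing newline, so no newline ends up in the tag. This assumes ExifTool sees the long file name; if it only sees an 8.3 short name, %f will not match the sidecar's name.

```python
from pathlib import Path
import re


def write_sidecars(folder, ext=".jpg"):
    r"""Write <name>.txt next to each file with the given extension.

    The text mirrors ${filename;s/\.jpg$//i}: the file name with a
    trailing .jpg removed, case-insensitively. Returns the sidecar paths.
    """
    written = []
    strip_ext = re.compile(re.escape(ext) + r"$", re.IGNORECASE)
    # sorted() builds the full listing first, so new .txt files are not revisited
    for image in sorted(Path(folder).iterdir()):
        if not image.is_file() or image.suffix.lower() != ext.lower():
            continue
        original = strip_ext.sub("", image.name)
        sidecar = image.with_suffix(".txt")  # same name as %d%f.txt
        # UTF-8, no trailing newline: the file holds exactly the name
        sidecar.write_text(original, encoding="utf-8")
        written.append(sidecar)
    return written
```

Usage would be write_sidecars("DIR") followed by Phil's command above on the same folder.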

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

herb

Hello Yagop, hello Phil,

please allow an additional question:
Quote
but it converts the image's filename into DOS 8.3 format
When is this done by Exiftool?

I also use Exiftool 11.33 on a Windows 7 system and exiftool.exe is started with -stay_open by my C++ application.
I did a short test with a filename that contained 1 emoji.
exiftool.exe -iptc:header<${filename} testfile.jpg with proper charsets UTF8 for filename and IPTC worked properly.

What did I understand wrong?

Best regards
Herb

Phil Harvey

Quote from: herb on March 30, 2019, 11:37:06 AM
Quote
but it converts the image's filename into DOS 8.3 format
When is this done by Exiftool?

ExifTool doesn't do this.  It is possible that this is done somehow in the standard libraries that ExifTool uses.

Quote
I also use Exiftool 11.33 on a Windows 7 system and exiftool.exe is started with -stay_open by my C++ application.
I did a short test with a filename that contained 1 emoji.
exiftool.exe -iptc:header<${filename} testfile.jpg with proper charsets UTF8 for filename and IPTC worked properly.

I have seen similar problems where the behaviour seems to depend somehow on the system settings.

Yagop: What version of ExifTool are you using?  Newer versions try to use the Windows-specific I/O libraries if possible, rather than the standard libraries.

- Phil

Waexto

Thanks to both of you and your examples/tips. So I just then used an external unicode/UTF8 text file (eg. z.txt) wherein it contains the emoji text, then used "-OriginalFileName<=z.txt". It worked and the emojis are tagged inside but the images with the emoji filenames are still converted to DOS 8.3 format. I'm OK with that since I can still rename them back, for example using ExifToolGUI (just tested now too) to view and copy the emoji tags I just tagged earlier.

My intention is to preserve whatever the filenames are no matter how peculiar (unicode/emoji/etc) at least as tags inside, because I lost a few hard disks before and I lost the filenames of my many files even after recovery. Thus if I could embed the filenames, then I could rename them back.

Quote from: Phil Harvey on March 30, 2019, 11:02:45 PM
Yagop: What version of ExifTool are you using?  Newer versions try to use the Windows-specific I/O libraries if possible, rather than the standard libraries.
Right now I use ExifTool 11.33, and Windows 7 64-bit.

Quote from: Phil Harvey on March 30, 2019, 09:37:43 AM
Thanks for this report, but I don't think there is much I can do about this because of the poor support for special characters in Windows file names in Perl.
Sad to hear that, but I'm OK with that if that's the case. There was a similar emoji case with MKVToolNix, which only recently updated one of its libraries to handle emojis for the first time.

Again, thanks.

herb

Hello,

@Phil: Thanks for the clarifications.
One more piece of info: I repeated the test from my previous post with Perl as well, specifically CitrusPerl 5.24.1 (with no further packages installed) and, of course, the ExifTool Perl package. The described error did not occur.

@Yagop: So I would be interested which "environment" you are using that changes the "unicode-filename" to a "dos-8.3-filename"

Thanks and best regards
Herb

Waexto

Quote from: herb on March 31, 2019, 11:31:45 AM
@Yagop: So I would be interested which "environment" you are using that changes the "unicode-filename" to a "dos-8.3-filename"
Apologies for the very late reply; I don't visit the web very often nowadays. I'm not sure I understood correctly, but here goes. Again, I use Windows 7 SP2 64-bit.

My goal is to preserve the file names as tags (along with any other available info) so that I can recover that info later, for example to rename files back to their original names if a disastrous hard disk failure ever strikes again. Back when I didn't have ExifTool, I lost all the file names of my accumulated images and other files forever.

One scenario where I am forced to embed emojis is the YouTube videos I download that have emojis in their titles. After my past experiences, including those disk failures, I ended up using Matroska as my preferred media container, especially because of its very flexible tagging support. Perhaps just last year, with the help of the author of chapterEditor, the developer of MKVToolNix updated one of its libraries to handle emojis in its next release. So apparently the Matroska format itself and the related third-party programs benefitted.

In my experience, there are two occasions where ExifTool inadvertently converts the filenames to DOS 8.3 format.

  • If the name of a JPG (or any image file) contains emojis, it is converted to 8.3 after I use ExifTool to tag it.
  • If a folder contains a file of any type (image, movie, audio, it doesn't matter) whose name contains an emoji, and I use ExifTool to tag an image in that same folder, that file's name is converted to 8.3. It doesn't matter whether the image I tag with ExifTool has a plain Latin alphanumeric name or an emoji in its own name: all other files with emojis in their names are inadvertently converted to 8.3.
I ended up tagging the MKVs with the emojis as usual (using, for example, MKVToolNix itself, Mp3tag, chapterEditor), but I simply removed the emojis from the file names, since they are preserved as tags inside the MKV anyway. That prevents any accidental 8.3 conversion. I take the same approach with image files (using ExifTool). And even if I overlook one and it is accidentally converted to 8.3, I can still recover the original long file name from the Matroska and JPG tags.
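Waexto's workaround (keep the emoji in the tags, drop it from the file name) can be planned before anything is renamed. A minimal Python sketch, assuming "emoji" here means any character above U+FFFF, i.e. one that needs a UTF-16 surrogate pair on Windows, which is what triggers the 8.3 problem. BMP symbols such as U+2764 are deliberately left alone. The function names are hypothetical, and the sketch only builds a plan; it renames nothing.

```python
from pathlib import Path
import re


def has_astral(text):
    """True if text contains a code point above U+FFFF (needs a surrogate pair)."""
    return any(ord(ch) > 0xFFFF for ch in text)


def strip_astral(name):
    """Drop non-BMP characters and tidy the whitespace they leave behind."""
    kept = "".join(ch for ch in name if ord(ch) <= 0xFFFF)
    return re.sub(r"\s+", " ", kept).strip()


def plan_renames(folder):
    """Return (old_path, new_path, original_name) for each affected file.

    Nothing is renamed here; original_name is the text to embed as a tag
    before the file is renamed to new_path.
    """
    plan = []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file() or not has_astral(path.name):
            continue
        new_stem = strip_astral(path.stem) or "unnamed"
        plan.append((path, path.with_name(new_stem + path.suffix), path.name))
    return plan
```

Before acting on the plan, it is worth checking that no two files collapse to the same new name.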

Phil Harvey

Quote from: Yagop on July 30, 2019, 12:35:20 PM
If inside a folder I have a filetype (be it an image file, movie file, audio file, doesn't matter) which contains an emoji, and after I use ExifTool to tag an image in that same folder, the filetype is converted to 8.3.

You're saying that using ExifTool to write to one file in a folder causes the names of other files in the folder to be converted to 8.3 format?  This can't be ExifTool that is doing this.

- Phil

herb

Hello,

@Yagop: Can you please tell us how you call ExifTool (e.g. via a DOS box), and can you please give an example of your ExifTool command, including an explicitly used file name etc.?

Thanks and
Best regards
Herb

herb

Hello Phil,

there is really something strange when Exiftool has to work with filenames that contain surrogate characters:

I did the following test with the WIN-version of Exiftool 11.61.
I have a directory whose path name contains only ASCII characters (F:\dirtest), and in the directory there is one image.
The file name contains ASCII characters and also an emoji (which is stored as a surrogate pair and is represented here with X): P11982XX.JPG
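Some background on the "surrogate" in the warning below: Windows stores file names as UTF-16, and a character above U+FFFF (most emoji) does not fit in one 16-bit unit, so it is written as a high/low surrogate pair. A small Python sketch of the standard UTF-16 formula, cross-checked against Python's own UTF-16 encoder. The sample name P11982😀.JPG is hypothetical.

```python
def surrogate_pair(ch):
    """Split one non-BMP character into its UTF-16 high and low surrogates."""
    cp = ord(ch)
    if cp <= 0xFFFF:
        raise ValueError("character is in the BMP and needs no surrogates")
    offset = cp - 0x10000            # 20-bit value
    high = 0xD800 + (offset >> 10)   # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits
    return high, low


def utf16_units(text):
    """The 16-bit code units a UTF-16 file name would contain."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


# U+1F600 (grinning face) becomes the pair D83D DE00
print([hex(u) for u in surrogate_pair("\U0001F600")])
```

So a name such as P11982😀.JPG is 11 characters but 12 UTF-16 code units, which is exactly what a library without surrogate support trips over.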

Using the following command to e.g. create an IPTC tag
exiftool.exe -charset filename=utf8 -IPTC:Caption-Abstract=caption -progress -ext jpg F:\dirtest
I get as response from Exiftool:
Warning: [Win32::FindFile] No support for unicode surrogates - F:/dirtest
Error renaming F:/Work_Eixm/Emotics/dirtest/P11982~1.JPG


The strange thing now also is:
- the original image is deleted/removed
- an image file is created with 8.3-format filename: P11982~1.JPG_original

Which part of exiftool or Perl can do this?

Hint:
When I start ExifTool and specify the file name (with the emoji) explicitly, everything works properly.

Thanks for your help in advance.

Best regards
Herb

Phil Harvey

This is unfortunate.  It must be the Win32::FindFile package that is somehow renaming the file.  I've looked into this package and it looks like it just calls the Windows function FindNextFileW.  I can't find any references for problems like this with FindNextFileW, and I don't understand why it should do anything to the file names.  Unfortunately I don't know what I can do to help with this problem.

- Phil

obetz

Quote from: Phil Harvey on August 12, 2019, 08:01:57 AM
This is unfortunate.  It must be the Win32::FindFile package that is somehow renaming the file.

The result looks like the intended rename of the original file (there is no -overwrite_original).

Without support for surrogate pairs, it might simply fall back to the short file name.

Quote from: herb on August 12, 2019, 02:35:32 AM
The strange thing now also is:
- the original image is deleted/removed
- an image file is created with 8.3-format filename: P11982~1.JPG_original

I guess it's not "deleted/removed" but renamed.

After all, I would never even consider using surrogate pairs in file names. I'm sure there are other applications that don't support them correctly. I taught my colleagues, friends and family to use POSIX-compatible file names: plain ASCII, no whitespace.

Oliver

Phil Harvey

Quote from: obetz on August 12, 2019, 10:01:25 AM
The result looks like the intended rename of the original file (there is no -overwrite_original).

Without support for surrogate pairs, it might simply fall back to the short file name.

You may be right.  I'll look into this.

- Phil

herb

Hello Phil, hello Oliver

thanks to both of you for looking into this.

In the meantime I did some tests in order to get an overview:
A)
Giving one single file (fully qualified) to Exiftool
- path and/or filename also contains emoji (surrogates)
- with or without option -overwrite_original_in_place
all is working properly

B)
Giving files to Exiftool with (e.g.) -ext jpg and path-information
- path contains emoji (surrogates)
  I get the following information:
    1 directories scanned
    0 image files read

- only the filename contains emoji (surrogates)
  -- with option -overwrite_original_in_place
      All files are updated properly and
      I get a warning - surrogates not supported
     
      ======== F:/Work_Eixm/Emotics/dirtest/P11982~1.JPG [1/2]
      ======== F:/Work_Eixm/Emotics/dirtest/P11982~3.JPG [2/2]
          1 directories scanned
          2 image files updated
      Warning: [Win32::FindFile] No support for unicode surrogates - F:/Work_Eixm/Emotics/dirtest

  -- without option -overwrite_original_in_place
      Original image file is "replaced" with file <8.3-filename>.jpg_original
      ======== F:/Work_Eixm/Emotics/dirtest/P11982~1.JPG [1/2]
      ======== F:/Work_Eixm/Emotics/dirtest/P11982~3.JPG [2/2]
          1 directories scanned
          0 image files read
      Warning: [Win32::FindFile] No support for unicode surrogates - F:/Work_Eixm/Emotics/dirtest
      Error renaming F:/Work_Eixm/Emotics/dirtest/P11982~1.JPG
      Error renaming F:/Work_Eixm/Emotics/dirtest/P11982~3.JPG


Best regards
Herb

Phil Harvey

Ah, interesting.  Thanks Herb.

- Phil

Phil Harvey

I spent 30 minutes trying, but failed to generate a folder name containing an emoji in my VirtualBox Windows 10 running on Mac.  The only technique I found was to press WIN-; or WIN-., but this didn't work in the virtual environment with a Mac keyboard (the right command key is supposed to be the WIN key, but it failed to bring up the emoji panel).

So it will be difficult for me to test this.

- Phil

obetz

Quote from: Phil Harvey on August 13, 2019, 07:58:53 AM
I spent 30 minutes trying, but failed to generate a folder name containing an emoji in my VirtualBox Windows 10 running on Mac.  The only technique I found was to press WIN-; or WIN-.,

I usually copy/paste such characters from other sources like https://unicode.org/charts/ or https://en.wikipedia.org/wiki/Unicode_block

You can also create the files on a different system and copy them to the VM.

You can try to extract the attached ZIP file in the VM
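One more way to get such names into the VM without an emoji panel or copy/paste, assuming Python 3 is installed there: build the names from escape sequences. The folder and file names are hypothetical, patterned on Oliver's Emoji_SMP_..._folder, and the source JPEG must be a real image so ExifTool has something to write to.

```python
from pathlib import Path
import shutil

EMOJI = "\U0001F600"  # U+1F600 GRINNING FACE, outside the BMP


def make_emoji_test(root, sample_jpg):
    """Copy sample_jpg to <root>/Emoji_SMP_<emoji>_folder/Emoji_SMP_<emoji>_file.jpg."""
    folder = Path(root) / f"Emoji_SMP_{EMOJI}_folder"
    folder.mkdir(parents=True, exist_ok=True)
    target = folder / f"Emoji_SMP_{EMOJI}_file.jpg"
    shutil.copyfile(sample_jpg, target)
    return target
```

For example, make_emoji_test("emoji", "test.jpg") and then exiftool -artist=me -r emoji from cmd.exe.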

Oliver

Phil Harvey

Thanks Oliver.  I had tried that, but the emoji character didn't survive through the zip process using the available "zip" command on the Mac.

I put your folder inside a folder called "emoji" and ran this command:

exiftool -artist=me -r emoji

Everything works as expected.  The file is added and the name retains the emoji.  An "_original" file is created with the emoji in the name.

It also works with -overwrite_original added.  But it gives this error when I use -overwrite_original_in_place:


Error opening /home/phil/Desktop/emoji/Emoji_SMP_XXX_folder/Emoji_SMP_XXX_file.jpg

(where XXX are box-looking characters in the command window)

So, basically I get exactly the opposite of what Herb observed.  This is normal for Windows, really, and probably has to do with some system settings somewhere.  This makes it really hard to troubleshoot/fix because it looks like the results are different on different systems.  :(

I never got the error about "No support for unicode surrogates".


Edit:  Oops.  I was running in a Cygwin shell.   I see the problem when I run in cmd.exe.  Now I can try to figure out if I can fix this...

obetz

Quote from: Phil Harvey on August 13, 2019, 12:28:33 PM
Thanks Oliver.  I had tried that, but the emoji character didn't survive through the zip process using the available "zip" command on the Mac.

Mac zip not supporting SMP characters confirms my aversion to using silly characters in file names.

Oliver

Phil Harvey

I've just released ExifTool 11.62 with a patch to prevent it from writing files with surrogate characters in their names unless either the file is being renamed or the -overwrite_original_in_place option is used.
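The 11.62 rule as Phil states it, restated as a small Python predicate for clarity. This paraphrases the sentence above and is not ExifTool's actual Perl code; the parameter names are made up, and long_name stands for the full Windows file name (ExifTool itself detects the case via the Win32::FindFile error rather than by inspecting the name).

```python
def may_write(long_name, renaming=False, in_place=False):
    """Restates the 11.62 rule described above (not ExifTool's own code).

    Files whose names need UTF-16 surrogates are written only when the
    file is being renamed or -overwrite_original_in_place is in effect.
    """
    needs_surrogates = any(ord(ch) > 0xFFFF for ch in long_name)
    return not needs_surrogates or renaming or in_place
```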

The problem was that ExifTool falls back to the standard I/O library if Win32::FindFile gives an error. Apparently the standard library accesses these files using their 8.3 file names, so when ExifTool renamed the file to add the "_original" suffix, the 8.3 file name got burned in.
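To see why the short name gets burned in, here is the generic backup-then-replace pattern sketched in Python. It is an illustration under assumptions, not ExifTool's Perl code: the function name and the "_tmp" suffix are hypothetical, and the new data is simply passed in as bytes. The key point is that the backup name is derived from whatever path the program was handed, so if that path is an 8.3 alias such as P11982~1.JPG, the backup becomes P11982~1.JPG_original. Writing in place never renames, which is why -overwrite_original_in_place avoided the problem.

```python
from pathlib import Path
import os


def write_with_backup(path, new_data, overwrite_in_place=False):
    """Generic sketch: update a file, keeping an "_original" backup by default."""
    path = Path(path)
    if overwrite_in_place:
        # Rewrite the existing file: its directory entry (and long name) is kept.
        with open(path, "r+b") as fh:
            fh.write(new_data)
            fh.truncate()
        return None
    tmp = path.with_name(path.name + "_tmp")
    tmp.write_bytes(new_data)
    # The backup's name is built from `path` as given; an 8.3 alias stays 8.3.
    backup = path.with_name(path.name + "_original")
    os.replace(path, backup)
    os.replace(tmp, path)
    return backup
```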

- Phil

herb

Hello Phil,

Thanks again for looking into this problem and thanks for the corrected version of Exiftool 11.62.
A short test showed that everything is working fine.

Sorry that my answer to your correction comes so late; I was on a trip with friends and we decided to go without internet.

Please do not misunderstand, but I have an additional small request:
Inside the test directory I had 2 files with different file extensions (a.jpg and a.jpg_original), and I got the following from ExifTool:
Warning: [Win32::FindFile] No support for unicode surrogates - F:/Work_Eixm/Emotics/dirtest
Not writing F:/Work_Eixm/Emotics/dirtest/P11982~1.JPG_original
Use -overwrite_original_in_place to write files with Unicode surrogate characters
Not writing F:/Work_Eixm/Emotics/dirtest/P11982~2.JPG
Use -overwrite_original_in_place to write files with Unicode surrogate characters

But ExifTool was asked to work only with the .jpg extension, so I think it would be better to avoid the warning about the file a.jpg_original.
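Herb's point in code form: an -ext style filter should compare the whole final extension, so a.jpg_original is not a JPG. A minimal Python sketch of such a case-insensitive match; the function name is hypothetical and this is not how ExifTool implements -ext.

```python
from pathlib import PurePath


def matches_ext(file_name, ext="jpg"):
    """Case-insensitive match on the final extension only.

    PurePath("a.jpg_original").suffix is ".jpg_original", so it does not match.
    """
    return PurePath(file_name).suffix.lower() == "." + ext.lower().lstrip(".")
```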

Please allow also the following hint:
On my Windows 7 system I use Babelmap.exe from http://www.babelstone.co.uk/Software/BabelMap.html to copy a unicode character.
Babelmap supports unicode up to version 12.01.

Thanks in advance
Best regards
Herb

Phil Harvey

Hi Herb,

Quote from: herb on August 19, 2019, 10:08:26 AM
But Exiftool was asked only to work with extension .jpg and so for me it would be better to avoid the warning about file a.jpg_original.

Right.  Thanks.  I'll fix this in the next release.

And thanks for the Babelmap hint.

- Phil

herb

Hello Phil,

thanks for the new version containing the requested enhancement.


Best regards
Herb