No support for unicode surrogates | emoji

Started by Anonan, January 01, 2019, 01:58:36 PM

Anonan

The program throws the exception "No support for unicode surrogates at script/exiftool line 3553." when you use it on files that contain emoji in a file name.

Examples of such file names: (see the attachment).
This forum does not support emoji either, so I can't post examples of file names that contain emoji here.


And yes, I don't like emoji either. I don't use them, but other people do, so support for them is needed.
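
For context on the word "surrogates" in the error message: Windows stores file names as UTF-16, and characters above U+FFFF (most emoji) need two 16-bit code units, a surrogate pair. The following is a minimal Python sketch of that encoding, not ExifTool code; the function names are made up for illustration.

```python
from typing import List


def utf16_units(text: str) -> List[str]:
    """Return the UTF-16 code units of text as 4-digit hex strings."""
    data = text.encode("utf-16-le")
    return [
        f"{int.from_bytes(data[i:i + 2], 'little'):04X}"
        for i in range(0, len(data), 2)
    ]


def needs_surrogates(text: str) -> bool:
    """True if any character lies outside the Basic Multilingual Plane."""
    return any(ord(ch) > 0xFFFF for ch in text)


# U+1F600 (an emoji) takes two code units: a high and a low surrogate.
print(utf16_units("\U0001F600"))       # ['D83D', 'DE00']
# U+00BA (masculine ordinal indicator) fits in a single code unit.
print(utf16_units("\u00ba"))           # ['00BA']
print(needs_surrogates("\U0001F600"))  # True
print(needs_surrogates("360\u00ba"))   # False
```

This suggests why a name such as "360º Test.mp4", reported later in this thread, is a different kind of problem: it contains no surrogates at all.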

Phil Harvey

Windows special characters are really a pain.  (I'm assuming you are on Windows.)

What version of ExifTool are you using?

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

Anonan

11.2.2.0 and 11.2.3.0 (I have tested this version right now. The result is the same). Yes, I use Windows 10.

I have also tried using both cmd.exe and Git Bash.

Anonan

It also does not support symbols like https://en.wiktionary.org/wiki/º (Do not confuse with https://en.wikipedia.org/wiki/Degree_symbol, ExifTool sees ° normally.)
Example of file name: "360º Test.mp4"
In this case the program just writes "No matching files".

StarGeek

Quote from: Alternation on January 01, 2019, 03:03:00 PM
It also does not support symbols like https://en.wiktionary.org/wiki/º (Do not confuse with https://en.wikipedia.org/wiki/Degree_symbol, ExifTool sees ° normally.)
Example of file name: "360º Test.mp4"
In this case the program just writes "No matching files".

This would seem to be a FAQ #18 answer, as when I change the code page to 65001, it works fine.

C:\>exiftool -g1 -a -s -PNG:all "Y:\!temp\bb\360º Test.png"
---- PNG ----
ImageWidth                      : 336
ImageHeight                     : 509
BitDepth                        : 8
ColorType                       : Grayscale with Alpha
Compression                     : Deflate/Inflate
Filter                          : Adaptive
Interlace                       : Noninterlaced
Gamma                           : 2.2
WhitePointX                     : 0.3127
WhitePointY                     : 0.329
RedX                            : 0.64
RedY                            : 0.33
GreenX                          : 0.3
GreenY                          : 0.6
BlueX                           : 0.15
BlueY                           : 0.06
BackgroundColor                 : 255
Label                           : FinalDesignArt
ModifyDate                      : 2018:11:15 11:02:46
"It didn't work" isn't helpful. What was the exact command used and the output.
Read FAQ #3 and use that cmd
Please use the Code button for exiftool output

Please include your OS/Exiftool version/filetype

Phil Harvey

I can't figure out that line number.  Line 3553 of exiftool version 11.22 doesn't do anything that could possibly generate a warning like that. :/

I guess I'll have to try this myself when I can.

What was the exact command you used?  (Maybe do a screen grab of the command and the warning you get.)

- Phil

Anonan

It's strange, but today I get the exception on line 3547. (The result is the same for both 11.2.2 and 11.2.3; Win 10, RUS; "chcp 65001" does not affect the results.)

I run "exiftool.exe *" in a folder that contains one or more files with emoji in their names.
File names: https://pastebin.com/gtNj96mg (I cannot post them here; otherwise I get the forum error "The message body was left empty.")
Finally I get:
"No support for unicode surrogates at script/exiftool line 3547."
Nothing else is printed to the console.



> Maybe do a screen grab of the command and the warning you get.
Ok, I will do this later.

Phil Harvey

OK.  Line 3547 would be an error in the Win32::FindFile package.  There isn't much I can do about this.

Try not using wildcards when you specify file names on the command line.

- Phil

Anonan

Oh, wait. The error on line 3553 occurs when I just use "exiftool.exe FILENAME".
The wildcard usage works fine when there are no files with these names.

Look at the attachment.
(Mirror: https://i.imgur.com/opg7Rj9.png)

CMD displays emoji incorrectly, but handles them correctly.
I can even copy these ⍰⍰ and paste them into a text editor that can display Unicode surrogates, and see the correct "icon".

Or I can concatenate all files into one with "copy /b *.txt concated.txt", and this command works fine even if the file names contain Unicode surrogates (CMD just displays them as ⍰⍰).

Phil Harvey

OK.  The underlying problem is that Win32::FindFile does not support these surrogate codes.  The reason I'm using Win32::FindFile in the first place is because of the lack of built-in support in ActivePerl for Windows Unicode file names.  The situation is unfortunate, but one possible work-around could be to create a hard link with a plain ASCII name to the file with the surrogate characters, then run exiftool on the hard link.
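
The hard-link work-around could be scripted along these lines. This is only a sketch in Python, not something ExifTool provides: the function name and the "link_NNNN" naming scheme are invented for illustration, and it assumes the link directory is on the same volume as the original file, since hard links cannot cross volumes.

```python
import os


def link_with_ascii_name(src: str, link_dir: str, index: int) -> str:
    """Create a hard link to src under a plain ASCII name and return its path.

    The "link_NNNN" scheme is arbitrary (hypothetical). link_dir must be on
    the same volume as src, and its own path should be plain ASCII too.
    """
    ext = os.path.splitext(src)[1]
    # Keep the extension only if it is plain ASCII itself.
    safe_ext = ext if ext.isascii() else ""
    dst = os.path.join(link_dir, f"link_{index:04d}{safe_ext}")
    os.link(src, dst)  # hard link: same file data, second directory entry
    return dst
```

The returned path can then be passed to exiftool in place of the original name. Deleting the link afterwards removes only the extra directory entry, not the file itself.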

- Phil

Anonan

Can the program just skip files with Unicode surrogates in their names instead of stopping?
And at the end write the names of the files that were skipped to be processed manually by me.

I need to get metadata from a lot of files, and only a few of them have Unicode surrogates in their names, but the program does not work at all in this case.
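
Until the program can do this itself, the split could be done outside ExifTool. A minimal Python sketch, assuming the affected names are exactly those with a character above U+FFFF (or a stray lone surrogate); the function names are hypothetical:

```python
import os
from typing import Iterable, List, Tuple


def has_surrogates(name: str) -> bool:
    """True if name needs UTF-16 surrogates or contains a lone surrogate."""
    return any(
        ord(ch) > 0xFFFF or 0xD800 <= ord(ch) <= 0xDFFF
        for ch in name
    )


def partition_names(names: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split names into (safe to process, to be handled manually)."""
    ok: List[str] = []
    skipped: List[str] = []
    for name in names:
        (skipped if has_surrogates(name) else ok).append(name)
    return ok, skipped


def scan_directory(directory: str) -> Tuple[List[str], List[str]]:
    """Partition the entries of one directory (not recursive)."""
    return partition_names(sorted(os.listdir(directory)))
```

The first list can be processed as usual, and the second is the list of names to rename or handle by hand.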

Phil Harvey

I've managed to reproduce this.  (The hardest part was figuring out how to create a file with a surrogate character in its name.  I finally did it by creating the file on a Mac then sending it to the Windows machine.)

I will patch ExifTool 11.24 to catch this error from Win32::FindFile and issue a warning or error instead.

Thanks for this report.

- Phil

Anonan

> And at the end write the names of the files that were skipped to be processed manually by me.
It would probably be better to also show them at the start (in the "err" stream), so that I can stop the program, fix the names, and restart it, rather than running it twice.
A run can take several minutes when you have several gigabytes of data.


> The hardest part was figuring out how to create a file with a surrogate character in its name.
For example, right-click a text input in Chrome/Opera and choose the first option in the context menu.




Phil Harvey

Quote from: Anonan on January 02, 2019, 11:11:56 AM
It would probably be better to also show them at the start (in the "err" stream), so that I can stop the program, fix the names, and restart it.

This is problematic.  For one, there will likely be a problem interpreting the file name(s) in the ExifTool stderr messages due to character set problems.  I'll be outputting these messages in UTF-8.  The other thing is that it would be very hard for me to find these files beforehand.  So you will unfortunately be stuck trying to process them in a second pass.
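
For the second pass, one way to sidestep the console's character set is to capture stderr as raw bytes and decode it as UTF-8 yourself. A rough Python sketch: the helper name is hypothetical, nothing is assumed about the wording of the messages, and the only assumption is the UTF-8 encoding mentioned above.

```python
import subprocess
from typing import List, Tuple


def run_and_read_stderr(cmd: List[str]) -> Tuple[int, List[str]]:
    """Run cmd, leave stdout alone, and return (exit code, stderr lines).

    stderr is captured as bytes and decoded as UTF-8 explicitly, so the
    console code page never touches the file names in the messages.
    """
    proc = subprocess.run(cmd, stderr=subprocess.PIPE)
    text = proc.stderr.decode("utf-8", errors="replace")
    return proc.returncode, text.splitlines()
```

The returned lines can be written to a UTF-8 text file and reviewed after the main run has finished.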

- Phil