More Unicode Challenges

Started by jhaneyzz, September 20, 2013, 03:27:49 PM

jhaneyzz



I am trying to get this encoding problem solved but I am stuck in an either-or dilemma.

When I issue the following command:

exiftool -T -d "%Y/%m/%d %H:%M:%S" -charset utf8 -f -r -fast2 -P -Title -FileName "/Users/jhaney/Documents/USG MacBook Local/Server Indexing/Problem files/" >  "/Users/jhaney/Documents/USG MacBook Local/Server Indexing/exiftool utf8 tests B.txt"

An output file is correctly created, and it contains data, but there is a strange issue.

When I open the file in BBEdit, it is read as Western (Mac OS Roman) and the Unix (LF) line breaks are detected properly. However, while the special symbols in the -FileName fields are displayed correctly, those in the -Title field are garbled.


-FileName                                    -Title
------------------------------------   ----------------------
AquaTough™.black.eps   AquaTough,Ñ¢.pms300.eps
AquaTough™.pms485.eps   AquaTough,Ñ¢.pms485.eps
Durock®Brand.WHITE [Converted].eps   Durock¬ÆBrand.WHITE.eps
Fiberock®Brand.WHITE.eps   Fiberock¬ÆBrand.WHITE.eps
Fiberock®Brand.White [Converted   Fiberock¬Æwhite.eps
D:\My Documents\ʰåÈù¢\Hi-Lo(1) Model (1)   Hi-Lo 2 2012-07-18 Model.eps
HumiTek™ Logo_CMYK.eps   HumiTek,Ñ¢ Logo_CMYK.eps
TUFF-HIDE.eps   TUFF-HIDE¬Æ.eps


When I use the BBEdit feature "Reopen Using Encoding" and choose "Unicode (UTF-8)", the situation is reversed.

-FileName                                    -Title
------------------------------------   ----------------------
AquaTough�.black.eps   AquaTough™.pms300.eps
AquaTough�.pms485.eps   AquaTough™.pms485.eps
Durock�Brand.WHITE [Converted].eps   Durock®Brand.WHITE.eps
Fiberock�Brand.WHITE.eps   Fiberock®Brand.WHITE.eps
Fiberock�Brand.White [Converted   Fiberock®white.eps
D:\My Documents\桌面\Hi-Lo(1) Model (1)   Hi-Lo 2 2012-07-18 Model.eps
HumiTek� Logo_CMYK.eps   HumiTek™ Logo_CMYK.eps
TUFF-HIDE.eps   TUFF-HIDE®.eps


How can I get the -FileName tag and the -Title tag to be in the same encoding?
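
The two views above are consistent with a single output file whose columns hold bytes in two different encodings. The following Python lines are a minimal sketch of that situation, not a diagnosis of the actual file. The sample strings are taken from the first row of the listing, and the assignment of Mac OS Roman to one column and UTF-8 to the other is an assumption made for illustration.

```python
# Sample values from the first row of the listing above.
# Assumption: one column was written as Mac OS Roman bytes, the other as UTF-8.
col1_bytes = "AquaTough™.black.eps".encode("mac_roman")
col2_bytes = "AquaTough™.pms300.eps".encode("utf-8")

# exiftool -T separates columns with tabs.
line = col1_bytes + b"\t" + col2_bytes

# View 1: the whole line interpreted as Mac OS Roman.
# The UTF-8 column turns into multi-character garbage.
as_mac_roman = line.decode("mac_roman")

# View 2: the whole line interpreted as UTF-8.
# The Mac OS Roman byte is invalid UTF-8 and becomes U+FFFD.
as_utf8 = line.decode("utf-8", errors="replace")

print(as_mac_roman)
print(as_utf8)
```

Neither whole-file interpretation can be correct for both columns at once, which matches the either-or behaviour described above.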

Phil Harvey

Due to inconsistent use of encoding in various metadata types, it is not possible to always return consistently encoded results.  In general, the only metadata type with consistent encoding is XMP.  See FAQ 10 for some of the gory details.  EPS files aren't mentioned in the FAQ, which means that the strings are passed through without recoding.
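
Since the strings are passed through without recoding, one possible workaround is to post-process the tab-separated output and decode each field separately. The sketch below is only a heuristic under stated assumptions: the helper names `decode_field` and `normalize_line` are hypothetical, and Mac OS Roman is assumed as the fallback encoding for any field that is not valid UTF-8. A Mac OS Roman string that happens to form valid UTF-8 would be decoded incorrectly.

```python
def decode_field(raw: bytes, fallback: str = "mac_roman") -> str:
    """Decode one field as UTF-8 if possible, else with the assumed fallback."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: anything that is not valid UTF-8 is Mac OS Roman.
        return raw.decode(fallback)


def normalize_line(raw_line: bytes) -> str:
    """Decode each tab-separated field on its own and rejoin as one string."""
    fields = raw_line.rstrip(b"\n").split(b"\t")
    return "\t".join(decode_field(field) for field in fields)


if __name__ == "__main__":
    # Hypothetical sample line: first field Mac OS Roman, second field UTF-8.
    sample = (
        "Durock®Brand.WHITE [Converted].eps".encode("mac_roman")
        + b"\t"
        + "Durock®Brand.WHITE.eps".encode("utf-8")
        + b"\n"
    )
    print(normalize_line(sample))
```

To apply this to a real file, open it in binary mode (`open(path, "rb")`), pass each line through `normalize_line`, and write the results to a new file opened with `encoding="utf-8"`. Decoding per field rather than per column also covers rows where a field that is usually in the legacy encoding is already stored as UTF-8.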

Also, file names in Windows are a known problem.

- Phil