Unicode2UTF8 bug?

Started by Archive, May 12, 2010, 08:54:06 AM

Archive

[Originally posted by humyo on 2007-06-24 21:31:30-07]

exiftool strips the final character from the values of ID3v2 tags in certain MP3s.

An example file which exhibits this may be found at http://www.dansplace.co.uk/sample.mp3. In the exiftool output the genre is 'Roc', not 'Rock', etc.:

Code:
# exiftool -g1 sample.mp3
-cut-
---- ID3v2_3 ----
Band                            : Girls Alou
Title                           : I Think We're Alone No
Genre                           : Roc
Track                           : 1
Encoder Settings                : Audiograbber 1.80, LAME dll 3.93, 128 Kbit/s, Joint Stereo, Normal quality
Album                           : The Sound of Girls Aloud - The Greatest Hit
Artist                          : Girls Alou
Year                            : 200
Publisher                       : Fascinatio
Comment                         : BF0D120
Picture                         : (Binary data 60286 bytes, use -b option to extract)
-cut-

id3v2 (http://id3v2.sourceforge.net/) extracts the values correctly (with the final character), so I think the values are stored in the MP3 correctly:

Code:
# id3v2 -l sample.mp3
-cut-
TPE2 (Band/orchestra/accompaniment): Girls Aloud
TIT2 (Title/songname/content description): I Think We're Alone Now
TCON (Content type): Rock (17)
TRCK (Track number/Position in set): 15
TSSE (Software/Hardware and settings used for encoding): Audiograbber 1.80, LAME dll 3.93, 128 Kbit/s, Joint Stereo, Normal quality
TALB (Album/Movie/Show title): The Sound of Girls Aloud - The Greatest Hits
TPE1 (Lead performer(s)/Soloist(s)): Girls Aloud
TYER (Year): 2006
TPUB (Publisher): Fascination
WCOM (Commercial information): http://www.amazon.co.uk/gp/redirect.html%3FASIN=B000JFXT72%26tag=softpointer-20%26lcode=xm2%26cID=2025%26ccmID=165953%26location=/o/ASIN/B000JFXT72%253FSubscriptionId=0RXJS26C80QSDEB56CR2
COMM (Comments): ()[eng]: BF0D120F
APIC (Attached picture): ()[, 3]: image/jpg, 60286 bytes

I have tracked this to the line:

   
Code:
$outVal = pack('C0U*',unpack("$fmt*",$val));

This is in Unicode2UTF8(), and the value is truncated by this code. Tested with Perl 5.8.7 and 5.8.8. I understand the intention of the code, but I don't know enough about UTF-8 in Perl to know why it causes the last character to be stripped.

Does anyone have any ideas? Does this occur on earlier versions of perl without native utf8?

Archive

[Originally posted by exiftool on 2007-06-25 12:45:36-07]

Thanks for pointing out this problem.  And thanks for the sample, it
was essential in helping me to figure this out -- I didn't have an MP3
sample with little-endian unicode text, and the problem was in
decoding this specific text encoding.  I was truncating any terminating
null character before decoding the text, which erroneously truncated
half of the last unicode character in the case of little-endian unicode.

I have fixed this, plus some other problems decoding URL fields, and uploaded a
6.92 pre-release (https://exiftool.org/Image-ExifTool-6.92.tar.gz)
for you to test out if you want before the official release.

- Phil
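
The truncation described above can be reproduced outside ExifTool. The following is a minimal Python sketch, not ExifTool's actual Perl code: the function names are hypothetical, the byte order mark is left out for brevity, and it assumes the value is plain little-endian UTF-16 with a 16-bit null terminator. Stripping trailing null bytes one byte at a time also removes the high (zero) byte of the last character, and the leftover odd byte is then ignored when the data is read as 16-bit units.

```python
import struct

def decode_utf16_units(data: bytes, little_endian: bool = True) -> str:
    """Decode whole 16-bit code units; a leftover odd byte is ignored,
    similar to how a repeated 16-bit unpack template skips a partial unit."""
    count = len(data) // 2
    fmt = ("<" if little_endian else ">") + f"{count}H"
    units = struct.unpack(fmt, data[: count * 2])
    # Basic Multilingual Plane only; surrogate pairs are not handled here.
    return "".join(chr(u) for u in units)

def strip_bytewise(data: bytes) -> bytes:
    """Problematic approach: strip trailing NUL bytes one byte at a time."""
    return data.rstrip(b"\x00")

def strip_unitwise(data: bytes) -> bytes:
    """Safer approach: strip only whole 16-bit 0x0000 terminators."""
    while len(data) >= 2 and len(data) % 2 == 0 and data[-2:] == b"\x00\x00":
        data = data[:-2]
    return data

# "Rock" as little-endian UTF-16, followed by a 16-bit terminator.
raw = "Rock".encode("utf-16-le") + b"\x00\x00"

buggy = decode_utf16_units(strip_bytewise(raw))
fixed = decode_utf16_units(strip_unitwise(raw))
print(buggy)  # Roc
print(fixed)  # Rock
```

With big-endian text the zero byte of each ASCII character comes first, so byte-wise stripping only removes the terminator, which would explain why the problem showed up only with little-endian samples.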

Archive

[Originally posted by humyo on 2007-06-25 16:38:34-07]

Phil,

I can confirm that 6.92 does indeed solve the problem!

Thank you so much for your quick response.

Dan