We have an uploader in our web app that uses a combination of "exiftool" and "file" to correctly identify filetypes.
We have discovered that text files saved as Unicode are incorrectly identified by ExifTool as MP3 audio files.
I have tried this on Windows 7 (exiftool 8.78) and Ubuntu (exiftool 8.60).
Is there a workaround so that I can identify the correct mimetype/filetype with exiftool, or am I doing something wrong? ;D
INPUT file:
Open Notepad (Windows 7).
Write any text.
Save with the Unicode encoding, e.g. as unicode.txt (see attachment).
OUTPUT generated when run:
# exiftool -v unicode.txt
ExifToolVersion = 8.60
FileName = unicode.txt
Directory = .
FileSize = 36
FileModifyDate = 1329216673
FilePermissions = 33268
FileType = MP3
MIMEType = audio/mpeg
MPEGAudioVersion = 3
AudioLayer = 3
AudioBitrate = 7
SampleRate = 1
ChannelMode = 0
MPEG_Audio_Bit26 = 0
ModeExtension = 0
MPEG_Audio_Bit27 = 0
CopyrightFlag = 0
OriginalMedia = 0
Emphasis = 0
If I use the "file" command, it correctly identifies the file as Unicode text.
# file -i -b unicode.txt
text/plain; charset=utf-16le
Many thanks,
Anthony
Hi Anthony,
Thanks for this report.
Unfortunately, the file recognition for MP3 files is very weak since those files don't include a strong magic number. ExifTool will identify many unknown files as MP3 due to this, but I wasn't aware of the Unicode text overlap. The problem is that the initial byte order mark (FF FE) of the Unicode text is a valid MP3 frame synchronization word.
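To illustrate the overlap, here is a minimal Python sketch of an 11-bit frame sync test. This is not ExifTool's actual code, and the function name is made up for this example. It assumes the usual MPEG audio header layout, in which a frame starts with eleven set bits, and shows that the UTF-16LE byte order mark passes that test.

```python
import codecs


def has_mpeg_frame_sync(data: bytes) -> bool:
    """Return True if the first 11 bits of data are all set.

    Hypothetical helper for illustration only: it checks the frame
    sync and nothing else, which is why it is so easy to fool.
    """
    return len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0


# A UTF-16LE text file as Notepad would write it: BOM (FF FE) plus the text.
utf16le_text = codecs.BOM_UTF16_LE + "any text".encode("utf-16-le")

# The same text as plain ASCII, with no BOM.
ascii_text = "any text".encode("ascii")

print(has_mpeg_frame_sync(utf16le_text))  # True: FF FE looks like a frame sync
print(has_mpeg_frame_sync(ascii_text))    # False
```

A sync-only test like this accepts any data that happens to begin with FF followed by a byte of E0 or higher, so more header bits have to be validated to rule out text files.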
There may be something I can do to avoid this specific mis-identification (regarding Unicode text), but I doubt there is any way I can eliminate this problem with other types of files.
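For an uploader that already runs both tools, one possible workaround is to cross-check the two results. The sketch below is only an illustration under assumptions: `reconcile_mime` is a hypothetical helper, and it expects the MIMEType value reported by exiftool and the output of `file -i -b` to be passed in as strings.

```python
def reconcile_mime(exiftool_mime: str, file_mime: str) -> str:
    """Pick a MIME type when exiftool and file(1) disagree.

    Hypothetical helper: it distrusts an "audio/mpeg" verdict from
    exiftool whenever file(1) reports some kind of text.
    """
    # Drop parameters such as "; charset=utf-16le" from the file(1) output.
    file_base = file_mime.split(";", 1)[0].strip().lower()
    exif_base = exiftool_mime.strip().lower()

    # MP3 detection is weak, so prefer the text verdict in this case.
    if exif_base == "audio/mpeg" and file_base.startswith("text/"):
        return file_base
    return exif_base


# Values taken from the report above.
print(reconcile_mime("audio/mpeg", "text/plain; charset=utf-16le"))  # text/plain
```

This only covers the text-versus-MP3 case; other unknown file types that exiftool reports as MP3 would need their own checks.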
I'll look into this to see what I can do.
- Phil
I found 2 more bits that I can validate in MP3 files, and luckily this avoids the mis-identification of UTF-16LE text files with a BOM.
So ExifTool 8.79 will solve this specific problem when it is released.
- Phil
Thank you very much Phil!
ExifTool 8.79 is now available.
- Phil