Bug? exif tool is identifying unicode text files as MP3 audio/mpeg

Started by Anthony, February 16, 2012, 06:47:52 AM


Anthony

We have an uploader in our web app that uses a combination of "exiftool" and "file" to identify file types correctly.
We have discovered that text files saved as Unicode are being incorrectly identified by ExifTool as MP3 audio files.

I am trying this on Windows 7 (exiftool 8.78) and Ubuntu (exiftool 8.60).
Is there a workaround so that I can identify the correct MIME type/file type with exiftool, or am I doing something wrong?


INPUT file:
   Open Notepad (Windows 7)
   Write any text
   Save with the "Unicode" encoding, e.g. as unicode.txt (see attachment)


OUTPUT generated when run:

# exiftool -v unicode.txt
  ExifToolVersion = 8.60
  FileName = unicode.txt
  Directory = .
  FileSize = 36
  FileModifyDate = 1329216673
  FilePermissions = 33268
  FileType = MP3
  MIMEType = audio/mpeg
  MPEGAudioVersion = 3
  AudioLayer = 3
  AudioBitrate = 7
  SampleRate = 1
  ChannelMode = 0
  MPEG_Audio_Bit26 = 0
  ModeExtension = 0
  MPEG_Audio_Bit27 = 0
  CopyrightFlag = 0
  OriginalMedia = 0
  Emphasis = 0



If I use the "file" command, it correctly identifies the file as Unicode text.

# file -i -b unicode.txt         
  text/plain; charset=utf-16le


Many thanks,
Anthony

Phil Harvey

Hi Anthony,

Thanks for this report.

Unfortunately, the file recognition for MP3 files is very weak since those files don't include a strong magic number.  ExifTool will identify many unknown files as MP3 due to this, but I wasn't aware of the Unicode text overlap.  The problem is that the initial byte order mark (FF FE) of the Unicode text is a valid MP3 frame synchronization word.
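To illustrate the overlap, here is a minimal Python sketch that reads the first four bytes of a file as an MPEG audio frame header. It is not ExifTool's actual code: the function name parse_mpeg_audio_header and the sample text "test" are assumptions made for this example, and the only validation performed is the 11-bit frame sync check.

```python
import codecs


def parse_mpeg_audio_header(data: bytes):
    """Return raw MPEG audio header fields, or None if the sync bits are missing.

    Hypothetical helper for illustration; it checks only the 11-bit frame sync.
    """
    if len(data) < 4:
        return None
    word = int.from_bytes(data[:4], "big")
    # The frame sync is the top 11 bits, all set to 1.
    if (word >> 21) != 0x7FF:
        return None
    return {
        "version": (word >> 19) & 0x3,
        "layer": (word >> 17) & 0x3,
        "bitrate_index": (word >> 12) & 0xF,
        "sample_rate_index": (word >> 10) & 0x3,
        "channel_mode": (word >> 6) & 0x3,
    }


# UTF-16LE text with a BOM, as Notepad writes it (the text "test" is an assumption).
sample = codecs.BOM_UTF16_LE + "test".encode("utf-16-le")
fields = parse_mpeg_audio_header(sample)
print(fields)
```

The BOM bytes FF FE supply all eleven sync bits, so the weak check passes and the following text bytes are read as bitrate, sample rate, and channel fields. Plain ASCII text without a BOM fails the same check.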

There may be something I can do to avoid this specific mis-identification (regarding Unicode text), but I doubt there is any way I can eliminate this problem with other types of files.

I'll look into this to see what I can do.

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

Phil Harvey

I found 2 more bits that I can validate in MP3 files, and luckily this avoids the misidentification of UTF-16LE text files with a BOM.

So ExifTool 8.79 will solve this specific problem when it is released.
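Until then, one possible workaround on the uploader side is to look for a UTF-16LE byte order mark before trusting an audio/mpeg result. The Python sketch below is only an illustration under stated assumptions: corrected_mime is a hypothetical helper, reported_mime stands for whatever MIME type the uploader obtained from ExifTool, and head is the first few bytes of the uploaded file. A genuine MPEG audio stream could in principle begin with the same two bytes, so this is a heuristic rather than a guarantee.

```python
import codecs


def corrected_mime(reported_mime: str, head: bytes) -> str:
    """Override an audio/mpeg result when the file starts with a UTF-16LE BOM.

    Hypothetical helper: reported_mime is the MIME type reported for the file,
    head is the first bytes of the file.
    """
    if reported_mime == "audio/mpeg" and head.startswith(codecs.BOM_UTF16_LE):
        # Same string that "file -i -b" prints for such a file.
        return "text/plain; charset=utf-16le"
    return reported_mime


# First bytes of a UTF-16LE text file with a BOM (sample content is an assumption).
text_head = codecs.BOM_UTF16_LE + "te".encode("utf-16-le")
print(corrected_mime("audio/mpeg", text_head))
```

Results other than audio/mpeg, and files that do not start with FF FE, pass through unchanged.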

- Phil

