ExifTool Forum

ExifTool => Bug Reports / Feature Requests => Topic started by: secshoggoth on February 20, 2018, 07:55:02 PM

Title: Office Doc Binary Metadata bug?
Post by: secshoggoth on February 20, 2018, 07:55:02 PM
I think I found a bug in the way ExifTool extracts binary metadata from Office documents.

I was analyzing a malicious composite-format Word document whose Title metadata contained a mix of ASCII and binary data that the malware decodes to download its second stage. The analysis can be found here. (https://secshoggoth.blogspot.com/2018/02/the-case-of-tricky-tool.html)

I noticed that exiftool was converting characters when extracting the Title metadata. I tried multiple options, including -binary, and found that it expands certain byte values from one byte to two bytes. The image below shows one instance, but it happened multiple times in the extraction.

(https://4.bp.blogspot.com/-ZH08FWdUF-4/Woyu-2gV5SI/AAAAAAAAASg/v3oLO6I5WOEnIV_0oLRpoKmPw9TP8KySgCLcBGAs/s1600/convert.png)

I tried both the version of exiftool that came with REMnux (9.x) and the latest version from the website, with the same results.

The document can be downloaded from https ://drive.google.com/ open?id=1gLgXDVRqdK-VifZ5iE8rHdkN5tizq6Mt.

NOTE THIS IS MALWARE! BE CAREFUL!!!

Let me know if this is a bug, or if I am doing something wrong in running exiftool. Thanks!

PH Edit: split malware URL to avoid accidental downloading
Title: Re: Office Doc Binary Metadata bug?
Post by: Phil Harvey on February 20, 2018, 09:33:34 PM
ExifTool is attempting to convert the output text to UTF-8.
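
To illustrate why a one-byte value can come out as two bytes, here is a minimal Python sketch. It assumes the stored bytes are interpreted as Latin-1 before being re-encoded as UTF-8; the actual code page depends on the document, and the sample bytes are made up for illustration. Any byte value of 0x80 or above becomes a two-byte sequence in UTF-8.

```python
# Assumption: the title bytes are treated as Latin-1 (ISO-8859-1).
# The real code page of a given document may differ.
raw = bytes([0x41, 0xE9, 0x42])  # hypothetical sample: 'A', 0xE9, 'B'

# Interpreting as Latin-1 and re-encoding as UTF-8 expands 0xE9 to C3 A9.
converted = raw.decode("latin-1").encode("utf-8")
print(raw.hex(), "->", converted.hex())
print(len(raw), "bytes ->", len(converted), "bytes")

# If the assumed charset is right and the mapping was lossless,
# the conversion can be reversed to get the original bytes back.
restored = converted.decode("utf-8").encode("latin-1")
print(restored == raw)
```

The reversal in the last step only works when the assumed source charset matches the one actually used and no characters were dropped or substituted along the way.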

- Phil
Title: Re: Office Doc Binary Metadata bug?
Post by: secshoggoth on February 20, 2018, 10:08:31 PM
Is there a way to prevent that? I couldn't find an option that gives me raw output for this.
Title: Re: Office Doc Binary Metadata bug?
Post by: Phil Harvey on February 21, 2018, 07:12:04 AM
The only way, short of removing all of the calls to Decode() in lib/Image/ExifTool/FlashPix.pm, would be to use the -v4 output and convert the hex dump in that output back to binary.
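
The hex-to-binary step could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: it assumes each dump line has the shape "offset: hex bytes [ASCII preview]", possibly preceded by indentation and "|" characters, and the sample text and the helper name hex_dump_to_bytes are hypothetical, not real exiftool output or API. It also assumes the dump lines for the tag of interest have already been isolated, since every matching line is concatenated.

```python
import re

# Assumed line shape: optional indentation and "|" prefix, a hex offset,
# a colon, space-separated hex byte pairs, then "[" starting the preview.
HEX_LINE = re.compile(r"^[\s|]*[0-9a-fA-F]+:\s+((?:[0-9a-fA-F]{2}\s+)+)\[")

def hex_dump_to_bytes(dump_text: str) -> bytes:
    """Concatenate the bytes from every line that looks like a hex dump row."""
    out = bytearray()
    for line in dump_text.splitlines():
        match = HEX_LINE.match(line)
        if match:
            # Drop all whitespace before converting the hex pairs to bytes.
            out.extend(bytes.fromhex("".join(match.group(1).split())))
    return bytes(out)

# Hypothetical sample in the assumed format (not real exiftool output).
sample = """\
  | 2)  Title = A.B..C
  |     - Tag 0x0002 (6 bytes, string):
  |         0000: 41 e9 42 0d                                     [A.B.]
  |         0004: 0a 43                                           [.C]
"""

title_bytes = hex_dump_to_bytes(sample)
print(title_bytes)
```

Lines that do not match the assumed shape are skipped, so if the dump is truncated or formatted differently, the result will be incomplete and the pattern would need adjusting.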

- Phil