I think I found a bug in the way that exiftool extracts binary metadata from office documents.
I was analyzing a malicious Word document (OLE compound document format), and its Title metadata contained a mix of ASCII and binary data that the malware decodes to download its second stage. The analysis can be found here: (https://secshoggoth.blogspot.com/2018/02/the-case-of-tricky-tool.html)
I noticed that exiftool was converting characters when extracting the Title metadata. I tried multiple options, including the -binary option, and found that it converted certain byte values from one byte into two bytes. The image below shows one instance, but it happened multiple times in the extraction.
(https://4.bp.blogspot.com/-ZH08FWdUF-4/Woyu-2gV5SI/AAAAAAAAASg/v3oLO6I5WOEnIV_0oLRpoKmPw9TP8KySgCLcBGAs/s1600/convert.png)
I tried both the version of exiftool that comes with REMnux (9.x) and the latest version from the website, with the same results.
The document can be downloaded from https ://drive.google.com/ open?id=1gLgXDVRqdK-VifZ5iE8rHdkN5tizq6Mt.
NOTE THIS IS MALWARE! BE CAREFUL!!!
Let me know if this is a bug or if I am doing something wrong when running exiftool. Thanks!
PH Edit: split malware URL to avoid accidental downloading
ExifTool is attempting to convert the output text to UTF-8.
- Phil
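The one-byte-to-two-bytes growth follows from that UTF-8 conversion. A minimal Python sketch, assuming the raw Title bytes are first decoded through a single-byte code page and then re-encoded as UTF-8 (the function name is hypothetical, and Latin-1 is only an assumed default; ExifTool's own charset handling may differ): bytes below 0x80 pass through unchanged, every byte from 0x80 to 0xFF becomes two UTF-8 bytes under Latin-1, and under cp1252 some values in 0x80-0x9F become three.

```python
def simulate_utf8_conversion(raw: bytes, codepage: str = "latin-1") -> bytes:
    """Hypothetical model of the conversion: decode each byte as a character
    in an assumed single-byte code page, then re-encode the text as UTF-8."""
    return raw.decode(codepage).encode("utf-8")


raw = b"abc\xe9\x7f\xff"
converted = simulate_utf8_conversion(raw)
# 0xE9 -> U+00E9 -> C3 A9 and 0xFF -> U+00FF -> C3 BF; ASCII bytes are unchanged.
print(converted)  # b'abc\xc3\xa9\x7f\xc3\xbf'
```

Because the malware consumes the raw bytes, any byte at or above 0x80 in the Title ends up different in the extracted output, which matches the behaviour described above.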
Is there a way to prevent that? I couldn't find an option that gives me the raw output.
Short of removing all of the calls to Decode() in lib/Image/ExifTool/FlashPix.pm, the only way would be to use the -v4 output and convert the hex bytes it prints back to binary.
- Phil
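Converting the -v4 hex dump back to binary can be scripted. A minimal Python sketch, assuming each dump line has the shape `<prefix>OFFSET: xx xx xx ... [ascii]`, where the prefix (spaces and `|` characters) varies with nesting depth; that layout is an assumption, so check it against your own -v4 output. The function name is hypothetical. Only feed it the dump lines that belong to the Title tag, since -v4 prints dumps for other tags too.

```python
import re

# Assumed dump line layout: "<prefix>OFFSET: xx xx xx ... [ascii]".
# Group 1 is the hex offset, group 2 is the run of two-digit hex bytes.
# The ASCII column starts with "[", so the byte run stops before it.
HEX_LINE = re.compile(r"\b([0-9a-fA-F]{4,8}):((?:[ \t]+[0-9a-fA-F]{2}\b)+)")


def hexdump_to_bytes(lines):
    """Rebuild raw bytes from hex dump lines, ignoring the ASCII column
    and any line that does not look like a dump line."""
    out = bytearray()
    for line in lines:
        m = HEX_LINE.search(line)
        if not m:
            continue  # e.g. the tag name/value line above the dump
        out.extend(bytes.fromhex("".join(m.group(2).split())))
    return bytes(out)
```

For example, after copying the Title dump lines into a file, `hexdump_to_bytes(open("title_dump.txt"))` (file name hypothetical) returns the original bytes, including the values above 0x80 that the normal output converts to UTF-8.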