ExifTool Forum

ExifTool => Bug Reports / Feature Requests => Topic started by: Hank on May 25, 2017, 12:21:11 PM

Title: Bug report: Skipping and Scanning files with $FileType
Post by: Hank on May 25, 2017, 12:21:11 PM
Version: 10.40
OS: Linux, Windows
Contents of the directory:

gsdll32.dll
gsdll32.lib
gsdll64.dll
gswin32c.exe
gswin32.exe

(Ghostscript in this example)

Command:
exiftool -r -SourceFile -AssemblyVersion -Comments -CompanyName -FileDescription -FileName -FileSize -FileType -FileVersion -FileVersionNumber -InternalName -LegalCopyright -LegalTrademarks -if '$FileType=~/DLL|EXE|LIB|OCX|SO|AXF|BIN|ELF|PRX|PUFF|DYLIB|MSI/i' -json .


Output:

[{
  "SourceFile": "./gsdll32.dll",
  "FileName": "gsdll32.dll",
  "FileSize": "11 MB",
  "FileType": "Win32 DLL"
},
{
  "SourceFile": "./gsdll64.dll",
  "FileName": "gsdll64.dll",
  "FileSize": "12 MB",
  "FileType": "Win64 DLL"
},
{
  "SourceFile": "./gswin32.exe",
  "FileName": "gswin32.exe",
  "FileSize": "144 kB",
  "FileType": "Win32 EXE"
},
{
  "SourceFile": "./gswin32c.exe",
  "FileName": "gswin32c.exe",
  "FileSize": "136 kB",
  "FileType": "Win32 EXE"
}]
    1 directories scanned
    4 image files read

Skipped the LIB file.

Command:
exiftool -r -SourceFile -AssemblyVersion -Comments -CompanyName -FileDescription -FileName -FileSize -FileType -FileVersion -FileVersionNumber -InternalName -LegalCopyright -LegalTrademarks -json .

Output:

[{
  "SourceFile": "./gsdll32.dll",
  "FileName": "gsdll32.dll",
  "FileSize": "11 MB",
  "FileType": "Win32 DLL"
},
{
  "SourceFile": "./gsdll64.dll",
  "FileName": "gsdll64.dll",
  "FileSize": "12 MB",
  "FileType": "Win64 DLL"
},
{
  "SourceFile": "./gswin32.exe",
  "FileName": "gswin32.exe",
  "FileSize": "144 kB",
  "FileType": "Win32 EXE"
},
{
  "SourceFile": "./gswin32c.exe",
  "FileName": "gswin32c.exe",
  "FileSize": "136 kB",
  "FileType": "Win32 EXE"
}]
    1 directories scanned
    4 image files read

Command:
exiftool -r -SourceFile -AssemblyVersion -Comments -CompanyName -FileDescription -FileName -FileSize -FileType -FileVersion -FileVersionNumber -InternalName -LegalCopyright -LegalTrademarks -json *.lib

Output:

[{
  "SourceFile": "gsdll32.lib",
  "FileName": "gsdll32.lib",
  "FileSize": "7.8 kB",
  "FileType": "Static library"
}]


Also I have noticed that exiftool has another bug with OCX files.  Microsoft Word files (*.DOCX) are captured as well.

Thanks
Title: Re: Bug report: Skipping and Scanning files with $FileType
Post by: Phil Harvey on May 25, 2017, 12:57:39 PM
Hi Hank,

Quote from: Hank on May 25, 2017, 12:21:11 PM
Skipped the LIB file.

Yes.  This is not currently in the list of supported extensions.  You can force processing of all extensions with -ext "*" if you want.

Quote
Also I have noticed that exiftool has another bug with OCX files.  Microsoft Word files (*.DOCX) are captured as well.

I don't understand.  Do you mean that DOCX files are processed when you didn't expect them to be?  What was the command you used?

- Phil
Title: Re: Bug report: Skipping and Scanning files with $FileType
Post by: StarGeek on May 25, 2017, 01:17:33 PM
Quote from: Hank on May 25, 2017, 12:21:11 PM
Also I have noticed that exiftool has another bug with OCX files.  Microsoft Word files (*.DOCX) are captured as well.

If your command includes the above -if '$FileType=~/DLL|EXE|LIB|OCX|SO|AXF|BIN|ELF|PRX|PUFF|DYLIB|MSI/i' then the OCX part of the regex is going to match DOCX.  Without any boundary limits to your regex, it will match any string with the listed character sequence.  For example, your regex will match "Staypuff Marshmallow Man" because of the PUFF character sequence.

Also, you're assuming that FileType is the same as the extension.  As the last output shows, LIB files return "Static library", not LIB.  That still matches, because LIB occurs inside LIBrary, but not for the reason you think.
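
To illustrate the substring behaviour described above, here is a minimal Python sketch. It assumes Python's `re` module treats this simple pattern the same way Perl does inside -if; the helper name `matches_unanchored` is made up for the example, and the sample strings are FileType values and text quoted in this thread.

```python
import re

# The unanchored pattern from the original -if condition.
UNANCHORED = re.compile(
    r"DLL|EXE|LIB|OCX|SO|AXF|BIN|ELF|PRX|PUFF|DYLIB|MSI",
    re.IGNORECASE,
)


def matches_unanchored(file_type: str) -> bool:
    """Return True if any listed character sequence occurs anywhere in file_type."""
    return UNANCHORED.search(file_type) is not None


if __name__ == "__main__":
    samples = ("Win32 DLL", "DOCX", "Static library", "Staypuff Marshmallow Man", "JPEG")
    for value in samples:
        print(f"{value!r}: {matches_unanchored(value)}")
```

Every sample except "JPEG" matches: "DOCX" through OCX, "Static library" through the "lib" in "library", and the last string through "puff".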
Title: Re: Bug report: Skipping and Scanning files with $FileType
Post by: Hank on May 25, 2017, 01:25:28 PM
Thank you, Phil, for explaining that the LIB extension is not supported.

As for the DOCX files: yes, they were scanned when I did not expect it.
This output is from Windows, but the same thing happens on Linux:

exiftool -if "$FileType=~/DLL|EXE|LIB|OCX|SO|AXF|BIN|ELF|PRX|PUFF|DYLIB|MSI/i" -json .
[{
  "SourceFile": "./P_U_D.docx",
  "ExifToolVersion": 10.28,
  "FileName": "P_U_D.docx",
  "Directory": ".",
  "FileSize": "169 kB",
  "FileModifyDate": "2017:05:18 06:01:24-04:00",
  "FileAccessDate": "2017:05:22 12:16:26-04:00",
  "FileCreateDate": "2017:05:18 06:01:24-04:00",
  "FilePermissions": "rw-rw-rw-",
  "FileType": "DOCX",
  "FileTypeExtension": "docx",
  "MIMEType": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
  "ZipRequiredVersion": 20,
  "ZipBitFlag": "0x0006",
  "ZipCompression": "Deflated",
  "ZipModifyDate": "1980:01:01 00:00:00",
  "ZipCRC": "0x2578eb05",
  "ZipCompressedSize": 411,
  "ZipUncompressedSize": 1605,
  "ZipFileName": "[Content_Types].xml",
  "Template": "Normal",
  "TotalEditTime": "7.9 days",
  "Pages": 8,
  "Words": 1730,
  "Characters": 9865,
  "Application": "Microsoft Office Word",
  "DocSecurity": "None",
  "Lines": 82,
  "Paragraphs": 23,
  "ScaleCrop": "No",
  "HeadingPairs": ["Title",1],
  "TitlesOfParts": "",
  "Company": "",
  "LinksUpToDate": "No",
  "CharactersWithSpaces": 11572,
  "SharedDoc": "No",
  "HyperlinksChanged": "No",
  "AppVersion": 14.0000,
  "Creator": "admin",
  "LastModifiedBy": "admin",
  "RevisionNumber": 105,
  "CreateDate": "2016:01:05 21:22:00Z",
  "ModifyDate": "2016:01:22 20:54:00Z"
}]
    1 directories scanned
    1 files failed condition
    1 image files read
Title: Re: Bug report: Skipping and Scanning files with $FileType
Post by: Phil Harvey on May 25, 2017, 01:31:44 PM
OK.  You're looking for a substring in your -if expression.  Try this:

-if "$FileType=~/\b(DLL|EXE|LIB|OCX|SO|AXF|BIN|ELF|PRX|PUFF|DYLIB|MSI)\b/i"

This will look for the complete word instead (word breaks before and after the type).
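
As a rough check of the word-boundary version, the sketch below runs the same pattern through Python's `re` module (assumed to treat `\b` like Perl for these ASCII strings). The helper name `matches_whole_word` is hypothetical; the sample values are FileType strings reported in this thread.

```python
import re

# The pattern from the suggested -if condition, with word breaks on both sides.
WHOLE_WORD = re.compile(
    r"\b(DLL|EXE|LIB|OCX|SO|AXF|BIN|ELF|PRX|PUFF|DYLIB|MSI)\b",
    re.IGNORECASE,
)


def matches_whole_word(file_type: str) -> bool:
    """Return True only if one of the listed types appears as a complete word."""
    return WHOLE_WORD.search(file_type) is not None


if __name__ == "__main__":
    for value in ("Win32 DLL", "Win64 DLL", "Win32 EXE", "DOCX", "Static library"):
        print(f"{value!r}: {matches_whole_word(value)}")
```

"Win32 DLL" and "Win32 EXE" still match because DLL and EXE stand as separate words, while "DOCX" no longer does. Note that "Static library" also stops matching, so a LIB file would need its actual FileType value in the pattern.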

- Phil
Title: Re: Bug report: Skipping and Scanning files with $FileType
Post by: StarGeek on May 25, 2017, 01:59:29 PM
Quote
OK.  You're looking for a substring in your -if expression.  Try this:

-if "$FileType=~/\b(DLL|EXE|LIB|OCX|SO|AXF|BIN|ELF|PRX|PUFF|DYLIB|MSI)\b/i"

But this will no longer match several other files.  DLL files can return Win32 DLL/Win64 DLL.  EXE files can return Win32 EXE/Win64 EXE.  MSI files aren't ever going to match because they return FPX.

This part of the command should be dropped and replaced with -ext, unless some files are of the correct type but lack the proper extension.  In that case, the FileType values that can actually be returned need to be researched.
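
One way to sketch the -ext approach is to build the argument list with one -ext option per extension. The function name `build_exiftool_args` is made up for this example, and the sketch only constructs and prints the command; it does not run exiftool. Whether every extension you pass is handled the way you expect is something to verify yourself.

```python
import shlex


def build_exiftool_args(directory: str, extensions: list[str]) -> list[str]:
    """Build an exiftool argument list that selects files by extension, not FileType."""
    args = ["exiftool", "-r", "-json"]
    for ext in extensions:
        # One -ext option per extension to process.
        args += ["-ext", ext]
    args.append(directory)
    return args


if __name__ == "__main__":
    # Print the command in a form that can be pasted into a POSIX shell.
    print(shlex.join(build_exiftool_args(".", ["dll", "exe", "lib", "ocx"])))
```

Passing `["*"]` as the extension list produces the -ext "*" form mentioned earlier in the thread, quoted so that the shell does not expand it.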
Title: Re: Bug report: Skipping and Scanning files with $FileType
Post by: Phil Harvey on May 25, 2017, 02:06:01 PM
Quote from: StarGeek on May 25, 2017, 01:59:29 PM
But this will no longer match several other files.  DLL files can return Win32 DLL/Win64 DLL.  EXE files can return Win32 EXE/Win64 EXE.

My expression will match these, because DLL and EXE are a separate word within FileType.

Quote
MSI files aren't ever going to match because they return FPX.

True.
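
If filtering on FileType is still wanted, an alternative is to post-filter the -json output against an explicit set of values. The sketch below is only an illustration: the set contains the FileType strings reported in this thread and is not a researched, complete list, and `filter_by_file_type` and `SAMPLE_JSON` are names invented for the example.

```python
import json

# FileType values reported in this thread; a real list would need research.
WANTED_FILE_TYPES = {
    "Win32 DLL",
    "Win64 DLL",
    "Win32 EXE",
    "Win64 EXE",
    "Static library",
    "FPX",  # reported in this thread as the FileType returned for MSI files
}

# A trimmed sample shaped like the -json output shown in this thread.
SAMPLE_JSON = """[
  {"SourceFile": "./gsdll32.dll", "FileType": "Win32 DLL"},
  {"SourceFile": "gsdll32.lib", "FileType": "Static library"},
  {"SourceFile": "./P_U_D.docx", "FileType": "DOCX"}
]"""


def filter_by_file_type(exiftool_json: str, wanted: set[str]) -> list[dict]:
    """Keep only the records whose FileType is exactly one of the wanted values."""
    records = json.loads(exiftool_json)
    return [record for record in records if record.get("FileType") in wanted]


if __name__ == "__main__":
    for record in filter_by_file_type(SAMPLE_JSON, WANTED_FILE_TYPES):
        print(record["SourceFile"], "->", record["FileType"])
```

Exact comparison avoids both the DOCX false positive and the "Static library" miss, at the cost of maintaining the list by hand.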

- Phil
Title: Re: Bug report: Skipping and Scanning files with $FileType
Post by: StarGeek on May 25, 2017, 02:18:35 PM
Quote from: Phil Harvey on May 25, 2017, 02:06:01 PM
My expression will match these, because DLL and EXE are a separate word within FileType.

D'oh again.  Yep, I should have seen that.
Title: Re: Bug report: Skipping and Scanning files with $FileType
Post by: Phil Harvey on May 26, 2017, 08:50:45 AM
I should have mentioned:

If you want ExifTool to process LIB files, add this to your .ExifTool_config configuration file (https://exiftool.org/config.html):

# Treat the LIB extension as an alias for the existing EXE file type
%Image::ExifTool::UserDefined::FileTypes = (
    LIB => 'EXE',
);


- Phil