Bug report: Skipping and Scanning files with $FileType

Started by Hank, May 25, 2017, 12:21:11 PM

Hank

Version: 10.40
OS: Linux, Windows
Contents of the directory:

gsdll32.dll
gsdll32.lib
gsdll64.dll
gswin32c.exe
gswin32.exe

(Ghostscript in this example)

Command:
exiftool -r -SourceFile -AssemblyVersion -Comments -CompanyName -FileDescription -FileName -FileSize -FileType -FileVersion -FileVersionNumber -InternalName -LegalCopyright -LegalTrademarks -if '$FileType=~/DLL|EXE|LIB|OCX|SO|AXF|BIN|ELF|PRX|PUFF|DYLIB|MSI/i' -json .


Output:

[{
  "SourceFile": "./gsdll32.dll",
  "FileName": "gsdll32.dll",
  "FileSize": "11 MB",
  "FileType": "Win32 DLL"
},
{
  "SourceFile": "./gsdll64.dll",
  "FileName": "gsdll64.dll",
  "FileSize": "12 MB",
  "FileType": "Win64 DLL"
},
{
  "SourceFile": "./gswin32.exe",
  "FileName": "gswin32.exe",
  "FileSize": "144 kB",
  "FileType": "Win32 EXE"
},
{
  "SourceFile": "./gswin32c.exe",
  "FileName": "gswin32c.exe",
  "FileSize": "136 kB",
  "FileType": "Win32 EXE"
}]
    1 directories scanned
    4 image files read

Skipped the LIB file.

Command:
exiftool -r -SourceFile -AssemblyVersion -Comments -CompanyName -FileDescription -FileName -FileSize -FileType -FileVersion -FileVersionNumber -InternalName -LegalCopyright -LegalTrademarks -json .

Output:

[{
  "SourceFile": "./gsdll32.dll",
  "FileName": "gsdll32.dll",
  "FileSize": "11 MB",
  "FileType": "Win32 DLL"
},
{
  "SourceFile": "./gsdll64.dll",
  "FileName": "gsdll64.dll",
  "FileSize": "12 MB",
  "FileType": "Win64 DLL"
},
{
  "SourceFile": "./gswin32.exe",
  "FileName": "gswin32.exe",
  "FileSize": "144 kB",
  "FileType": "Win32 EXE"
},
{
  "SourceFile": "./gswin32c.exe",
  "FileName": "gswin32c.exe",
  "FileSize": "136 kB",
  "FileType": "Win32 EXE"
}]
    1 directories scanned
    4 image files read

Command:
exiftool -r -SourceFile -AssemblyVersion -Comments -CompanyName -FileDescription -FileName -FileSize -FileType -FileVersion -FileVersionNumber -InternalName -LegalCopyright -LegalTrademarks -json *.lib

Output:

[{
  "SourceFile": "gsdll32.lib",
  "FileName": "gsdll32.lib",
  "FileSize": "7.8 kB",
  "FileType": "Static library"
}]


Also I have noticed that exiftool has another bug with OCX files.  Microsoft Word files (*.DOCX) are captured as well.

Thanks

Phil Harvey

Hi Hank,

Quote from: Hank on May 25, 2017, 12:21:11 PM
Skipped the LIB file.

Yes.  This is not currently in the list of supported extensions.  You can force processing of all extensions with -ext "*" if you want.

Quote
Also I have noticed that exiftool has another bug with OCX files.  Microsoft Word files (*.DOCX) are captured as well.

I don't understand.  Do you mean that DOCX files are processed when you didn't expect?  What was the command you used?

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

StarGeek

Quote from: Hank on May 25, 2017, 12:21:11 PM
Also I have noticed that exiftool has another bug with OCX files.  Microsoft Word files (*.DOCX) are captured as well.

If your command includes the above -if '$FileType=~/DLL|EXE|LIB|OCX|SO|AXF|BIN|ELF|PRX|PUFF|DYLIB|MSI/i' then the OCX part of the regex is going to match DOCX.  Without any boundary limits to your regex, it will match any string with the listed character sequence.  For example, your regex will match "Staypuff Marshmallow Man" because of the PUFF character sequence.

Also, you're assuming that FileType is the extension.  As the last output shows, LIB files return "Static library", not LIB.  That still matches because LIB appears inside "LIBrary", but it is not matching for the reason you think.
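
To make the substring behaviour concrete, here is a minimal sketch in Python rather than Perl. ExifTool evaluates -if conditions as Perl expressions, but this simple alternation behaves the same way in Python's re module. The helper name matched_text is made up for illustration, and the sample strings are the FileType values shown in this thread plus the marshmallow example.

```python
import re

# The pattern from the original -if condition: no anchors, no word boundaries.
UNANCHORED = re.compile(
    r"DLL|EXE|LIB|OCX|SO|AXF|BIN|ELF|PRX|PUFF|DYLIB|MSI", re.IGNORECASE
)

def matched_text(pattern, value):
    """Return the substring that matched, or None if nothing matched."""
    m = pattern.search(value)
    return m.group(0) if m else None

# FileType values from the outputs in this thread, plus the example string.
samples = [
    "Win32 DLL",
    "Win32 EXE",
    "Static library",
    "DOCX",
    "Staypuff Marshmallow Man",
]

for value in samples:
    # Shows which piece of each string satisfied the condition.
    print(f"{value!r} matched on {matched_text(UNANCHORED, value)!r}")
```

"DOCX" passes because of the OCX inside it, and "Static library" passes because of the "lib" in "library", not because the extension is LIB.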
"It didn't work" isn't helpful. What was the exact command used and the output.
Read FAQ #3 and use that cmd
Please use the Code button for exiftool output

Please include your OS/Exiftool version/filetype

Hank

Thank you, Phil, for explaining that the LIB extension is not supported.

For the DOCX files, yes, DOCX files were scanned when I would not have expected it.
From Windows, but it also occurs in Linux:

exiftool -if "$FileType=~/DLL|EXE|LIB|OCX|SO|AXF|BIN|ELF|PRX|PUFF|DYLIB|MSI/i" -json .
[{
  "SourceFile": "./P_U_D.docx",
  "ExifToolVersion": 10.28,
  "FileName": "P_U_D.docx",
  "Directory": ".",
  "FileSize": "169 kB",
  "FileModifyDate": "2017:05:18 06:01:24-04:00",
  "FileAccessDate": "2017:05:22 12:16:26-04:00",
  "FileCreateDate": "2017:05:18 06:01:24-04:00",
  "FilePermissions": "rw-rw-rw-",
  "FileType": "DOCX",
  "FileTypeExtension": "docx",
  "MIMEType": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
  "ZipRequiredVersion": 20,
  "ZipBitFlag": "0x0006",
  "ZipCompression": "Deflated",
  "ZipModifyDate": "1980:01:01 00:00:00",
  "ZipCRC": "0x2578eb05",
  "ZipCompressedSize": 411,
  "ZipUncompressedSize": 1605,
  "ZipFileName": "[Content_Types].xml",
  "Template": "Normal",
  "TotalEditTime": "7.9 days",
  "Pages": 8,
  "Words": 1730,
  "Characters": 9865,
  "Application": "Microsoft Office Word",
  "DocSecurity": "None",
  "Lines": 82,
  "Paragraphs": 23,
  "ScaleCrop": "No",
  "HeadingPairs": ["Title",1],
  "TitlesOfParts": "",
  "Company": "",
  "LinksUpToDate": "No",
  "CharactersWithSpaces": 11572,
  "SharedDoc": "No",
  "HyperlinksChanged": "No",
  "AppVersion": 14.0000,
  "Creator": "admin",
  "LastModifiedBy": "admin",
  "RevisionNumber": 105,
  "CreateDate": "2016:01:05 21:22:00Z",
  "ModifyDate": "2016:01:22 20:54:00Z"
}]
    1 directories scanned
    1 files failed condition
    1 image files read

Phil Harvey

OK.  You're looking for a substring in your -if expression.  Try this:

-if "$FileType=~/\b(DLL|EXE|LIB|OCX|SO|AXF|BIN|ELF|PRX|PUFF|DYLIB|MSI)\b/i"

This will look for the complete word instead (word breaks before and after the type).
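
A rough way to check what the word-boundary version accepts, sketched in Python instead of Perl (\b and this alternation behave the same in both for these inputs). The function name passes is hypothetical, and the test strings are the FileType values that appear in this thread.

```python
import re

# Same alternation, wrapped in \b(...)\b so only whole words count.
BOUNDED = re.compile(
    r"\b(DLL|EXE|LIB|OCX|SO|AXF|BIN|ELF|PRX|PUFF|DYLIB|MSI)\b", re.IGNORECASE
)

def passes(file_type):
    """True if the FileType string contains one of the listed types as a whole word."""
    return BOUNDED.search(file_type) is not None

for file_type in ["Win32 DLL", "Win64 DLL", "Win32 EXE", "DOCX", "Static library", "FPX"]:
    print(f"{file_type!r}: {passes(file_type)}")
```

"Win32 DLL" and "Win32 EXE" still pass because DLL and EXE are separate words, while "DOCX" is rejected.  Note that "Static library" is rejected too, since "lib" is only part of the word "library".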

- Phil

StarGeek

Quote
OK.  You're looking for a substring in your -if expression.  Try this:

-if "$FileType=~/\b(DLL|EXE|LIB|OCX|SO|AXF|BIN|ELF|PRX|PUFF|DYLIB|MSI)\b/i"

But this will no longer match several other files.  DLL files can return Win32 DLL/Win64 DLL.  EXE files can return Win32 EXE/Win64 EXE.  MSI files aren't ever going to match because they return FPX.

This part of the command should be dropped and replaced with -ext, unless some files are of the correct type but don't have the proper extension.  If that is the case, then the FileType values that can actually be returned need to be researched.
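
As a sketch of the -ext approach, the snippet below only assembles a command line with one -ext option per extension; it does not run exiftool.  The helper build_exiftool_args and the short extension and tag lists are illustrative assumptions, not a complete replacement for the original command.

```python
import shlex

def build_exiftool_args(extensions, tags, directory="."):
    """Hypothetical helper: build an exiftool argument list that filters by extension."""
    args = ["exiftool", "-r"]
    for ext in extensions:
        # One -ext option per extension; exiftool accepts the option multiple times.
        args += ["-ext", ext]
    args += [f"-{tag}" for tag in tags]
    args += ["-json", directory]
    return args

# Illustrative subset of the extensions and tags used earlier in the thread.
args = build_exiftool_args(["dll", "exe", "ocx"], ["FileName", "FileSize", "FileType"])
print(shlex.join(args))
```

Filtering on the extension this way avoids having to know what FileType string each kind of file reports.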

Phil Harvey

Quote from: StarGeek on May 25, 2017, 01:59:29 PM
But this will no longer match several other files.  DLL files can return Win32 DLL/Win64 DLL.  EXE files can return Win32 EXE/Win64 EXE.

My expression will match these, because DLL and EXE are a separate word within FileType.

Quote
MSI files aren't ever going to match because they return FPX.

True.

- Phil

StarGeek

Quote from: Phil Harvey on May 25, 2017, 02:06:01 PM
My expression will match these, because DLL and EXE are a separate word within FileType.

D'oh again.  Yep, I should have seen that.

Phil Harvey

I should have mentioned:

If you want ExifTool to process LIB files, add this to your .ExifTool_config configuration file:

%Image::ExifTool::UserDefined::FileTypes = (
    LIB => 'EXE',
);


- Phil