How to exclude all raw files by tag, not by extension

Started by captured, August 24, 2018, 11:32:01 PM


captured

Hello.

System: Linux Mint 18.3 Cinnamon 64 bit

Desired output: -FileName ; -ImageWidth ; -ImageHeight ;  ?<exiftool tag for RAW files>.

I wish to ignore 'All' raw files, determined by the correct tag from exiftool.

Using the Linux command: $ exiftool <file.ext> | grep -i raw
Result:
Quality                          : RAW + JPEG
Quality2                        : RAW + JPEG

Question: Are Quality and Quality2 the tags I should use if I wish to ignore all RAW files?

Thank you.

StarGeek

Probably not.  You haven't checked whether the value "RAW + JPEG" also appears in the jpgs.  You grepped on the word raw, so the output doesn't include the matching file names.  Also, it probably depends upon the camera used.  If you're dealing with images from different brands of cameras, you are probably not going to find a single tag that has the value 'RAW'.

BitsPerSample might be a better one to check.  For jpgs it will be 8.  For tifs, the number would probably be repeated three times, though I have one sample where it's repeated four times and I haven't checked a grey scale image, which might be only once.

Any reason that you don't want to use --ext to exclude raw file types?
* Did you read FAQ #3 and use the command listed there?
* Please use the Code button for exiftool code/output.
 
* Please include your OS, Exiftool version, and type of file you're processing (MP4, JPG, etc).

captured

Thanks StarGeek.

Quote: Any reason that you don't want to use --ext to exclude raw file types?

I thought it would be more robust to determine whether an image is a raw file from a tag.
That would still give a correct result for an image whose extension has been renamed incorrectly or removed.

May I ask your opinion on the following?
find . -type f -exec exiftool -q -q -r -p '$filename $imagewidth $imageheight $mimetype' '{}' 2>/dev/null \+ | grep -i "image/x"

Found Raw types;
.RAF image/x-fujifilm-raf
.CR2 image/x-canon-cr2
.NEF image/x-nikon-nef
.ARW image/x-sony-arw

Question:
Wouldn't the -mimetype tag let me find or ignore RAW files (image/x)?

Regards.

Phil Harvey

OK, so you could use -if '$mimetype !~ m(image/x-)'

But then ExifTool will read each file, which is slower than just skipping a file by extension.

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

captured

Thanks for replying Phil.

Objectives;

  • Only process image files
  • Only process non-raw image files
  • Avoid reading the whole file (-fast3, but no width and height in fast3)
  • Suppress warnings
  • Recursive Processing
  • Check if file is a regular file
  • Check if file is non zero
  • Check if file is 'regular' (plain?)
  • Ignore multiple case insensitive file type extensions

So far...
exiftool -r -q -q -p '$filename; $filetype; $mimetype $imagewidth $imageheight' -if '$mimetype =~ "^image/" && $mimetype !~ "^image/x"' .

Total Files: 1759

Process Time (round up)
no fast = 30 seconds
-fast    = 28  seconds
-fast2   = 25  seconds
-fast3   = 3.6 seconds (Without imagewidth or imageheight)

Further questions please;
1) How do I incorporate Perl unary file tests so that a file is processed only if it passes them?
-e   File exists
-s   File has non-zero size
-f   File is a plain file (Is this the same as 'regular' file in Linux ?)

2) When using -if, '==' vs. '=~': is the speed increase significant for an exact match vs. a pattern match?
e.g. -if '$mimetype =~ "image/"' vs. -if '$mimetype == "^image/.*"'

3) What is the correct way to ignore multiple file extensions (case insensitive)...
a) -if '$mimetype =~ "^image/.*" && $mimetype !~ "^image/x" && $filetypeextension !~ "[NEFnef|RAFraf|ARWarw|CR2cr2]"'
b) multiple -ext outside the -if statement
c) Other

Thank you very much for your help.

Best regards.

Hayo Baan

Quote from: captured on August 27, 2018, 03:43:06 PM
Further questions please;
1) How do I incorporate Perl unary file tests so that a file is processed only if it passes them?
-e   File exists
-s   File has non-zero size
-f   File is a plain file (Is this the same as 'regular' file in Linux ?)

You can use those directly e.g. -if '-f $filepath && -s $filepath'
(and yes, a regular file == a plain file)
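
For readers more at home in Python than Perl, here is a rough illustration of what the two file tests check. It is only a sketch of the semantics of `-f` and `-s`; the helper name `is_nonempty_regular_file` is made up for this example and is not part of ExifTool.

```python
import os

def is_nonempty_regular_file(path):
    """Rough Python equivalent of Perl's `-f $path && -s $path`."""
    # -f : the path exists and is a regular (plain) file
    # -s : the file has a non-zero size
    return os.path.isfile(path) and os.path.getsize(path) > 0
```

A directory, a missing path, or a zero-byte file all return False here, which matches the intent of the three checks in the question (exists, non-zero, regular).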

Quote from: captured on August 27, 2018, 03:43:06 PM
2) When using -if, '==' vs. '=~': is the speed increase significant for an exact match vs. a pattern match?
e.g. -if '$mimetype =~ "image/"' vs. -if '$mimetype == "^image/.*"'

You can't do that: == tests for numerical equality, not string equality (that would be eq, but that won't work in this case since you are pattern matching!).

Quote from: captured on August 27, 2018, 03:43:06 PM
3) What is the correct way to ignore multiple file extensions (case insensitive)...
a) -if '$mimetype =~ "^image/.*" && $mimetype !~ "^image/x" && $filetypeextension !~ "[NEFnef|RAFraf|ARWarw|CR2cr2]"'
b) multiple -ext outside the -if statement
c) Other

My guess would be to use multiple -ext since that is processed without having to read the file first.

Cheers,
Hayo
Hayo Baan – Photography
Web: www.hayobaan.nl

StarGeek

Just a couple notes offhand.

== is a numeric comparison.  You will get unreliable results if you use non-numeric values with it.  The string equivalent is eq, but that returns true only if the two strings are exactly equal.  It will not match substrings, e.g. 'image/' will not match 'image/jpeg'.
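
To see why == is unreliable here, this is a small Python sketch that emulates, in simplified form, how Perl turns a string into a number before a numeric comparison. The functions `perl_num`, `perl_numeric_eq` and `perl_string_eq` are names invented for this illustration, and the emulation ignores special cases such as "inf" and "nan".

```python
import re

# Simplified emulation of Perl's string-to-number coercion for `==`:
# use a leading numeric prefix if there is one, otherwise 0.
_NUM_PREFIX = re.compile(r"\s*([+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?)")

def perl_num(s):
    m = _NUM_PREFIX.match(s)
    return float(m.group(1)) if m else 0.0

def perl_numeric_eq(a, b):
    """Emulates Perl's `a == b` on strings."""
    return perl_num(a) == perl_num(b)

def perl_string_eq(a, b):
    """Emulates Perl's `a eq b`."""
    return a == b

mime = "image/jpeg"
print(perl_numeric_eq(mime, "^image/.*"))  # True: both sides coerce to 0
print(perl_numeric_eq(mime, "video/mp4"))  # True: also 0 == 0
print(perl_string_eq(mime, "image/"))      # False: a substring is not equality
print(bool(re.search(r"^image/", mime)))   # True: pattern match, like =~
```

In other words, with == any two non-numeric strings compare as equal, so the condition would pass for every file.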

You can combine '$mimetype =~ "^image/" && $mimetype !~ "^image/x"' into '$mimetype =~ "^image/[^x]"'.
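
A quick way to convince yourself that the two forms agree is to run both against some sample MIME types. The sketch below uses Python's `re` module as a stand-in for Perl (these particular patterns behave the same in both); the raw MIME types are the ones listed earlier in this thread, and the function names are made up for the example.

```python
import re

samples = [
    "image/jpeg", "image/png", "image/tiff",
    "image/x-fujifilm-raf", "image/x-canon-cr2",
    "image/x-nikon-nef", "image/x-sony-arw",
    "video/mp4", "application/pdf",
]

def two_conditions(mime):
    # $mimetype =~ "^image/" && $mimetype !~ "^image/x"
    return bool(re.search(r"^image/", mime)) and not re.search(r"^image/x", mime)

def combined(mime):
    # $mimetype =~ "^image/[^x]"
    return bool(re.search(r"^image/[^x]", mime))

for mime in samples:
    assert two_conditions(mime) == combined(mime)
    print(f"{mime:22} keep={combined(mime)}")
```

The two forms differ only for a value that is exactly "image/" with nothing after the slash, which should not occur for a real MIME type.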

To recurse, use the -r option.

To suppress minor warnings, use the -m option.

The best way to ignore specific extensions is the --ext option (two hyphens instead of one).  It is case insensitive.
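
If the command is being assembled from a script, the exclusions can be generated from a list. The following is a minimal sketch, assuming exiftool is on the PATH when the command is eventually run; `build_exiftool_command` and `RAW_EXTENSIONS` are hypothetical names, and the option set simply mirrors the commands used in this thread.

```python
import shlex

# The raw extensions that turned up in this thread; extend as needed.
RAW_EXTENSIONS = ["NEF", "RAF", "ARW", "CR2"]

def build_exiftool_command(directory, excluded_exts):
    """Build an argument list that skips the given extensions via --ext."""
    cmd = ["exiftool", "-r", "-q", "-q", "-m",
           "-p", "$filename; $filetype; $mimetype $imagewidth $imageheight"]
    for ext in excluded_exts:
        cmd += ["--ext", ext]  # two hyphens: exclude this extension
    cmd.append(directory)
    return cmd

cmd = build_exiftool_command(".", RAW_EXTENSIONS)
print(shlex.join(cmd))  # shell-quoted form, for copy and paste
# To run it from Python instead:
#   import subprocess; subprocess.run(cmd, check=False)
```

Passing the arguments as a list avoids the shell quoting issues around the dollar signs in the -p format string.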

StarGeek

Quote from: captured on August 27, 2018, 03:43:06 PM
"[NEFnef|RAFraf|ARWarw|CR2cr2]"

This doesn't do what you think it does.  The brackets indicate a character class.  This will match any single one of these characters: |2AaCcEeFfNnRrwW.  What you want is non-capturing parentheses with the case-insensitive match option, anchored at the end of the line.  Putting the dot at the beginning of that is probably a good idea as well.

$filetypeextension !~ "\.(?:NEF|RAF|ARW|CR2)$"i
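
The difference is easy to see by running both patterns against a few strings. This sketch uses Python's `re` module in place of Perl, since character classes and non-capturing groups behave the same way in both for these patterns. The file names are hypothetical and include a leading dot before the extension, as the second pattern expects.

```python
import re

# Character class: matches ANY single character from the set.
char_class = re.compile(r"[NEFnef|RAFraf|ARWarw|CR2cr2]")

# Non-capturing group, case-insensitive, anchored at the end.
group = re.compile(r"\.(?:NEF|RAF|ARW|CR2)$", re.IGNORECASE)

# The distinct characters the class actually contains.
print("".join(sorted(set("NEFnef|RAFraf|ARWarw|CR2cr2"))))

for name in ["photo.nef", "IMG_0001.CR2", "portrait.jpeg", "notes.txt"]:
    print(f"{name:14} class={bool(char_class.search(name))} "
          f"group={bool(group.search(name))}")
```

The character class reports a match for all four names because each contains at least one of the listed letters, while the group matches only the two raw files.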

captured

Thank you Phil, Hayo Baan, Star Geek.

3rd attempt;
exiftool -r -q -q -m -p '$filename; $filetype; $mimetype $imagewidth $imageheight' -if '$mimetype =~ "^image/[^x]" && $filetypeextension !~ "\.(?:NEF|RAF|ARW|CR2)$"i ' .

Note: The command fails when I include the addition $filetypeextension !~ "\.(?:NEF|RAF|ARW|CR2)$"i

This is working, but it would be nice to get the non-capturing parentheses working:
exiftool -r -q -q -m -p '$filename; $filetype; $mimetype $imagewidth $imageheight' -if '$mimetype =~ "^image/[^x]"' .

BTW, is there any workaround for -fast3 to include imagewidth and imageheight without having to read all the metadata ?

Thank you again.

Best regards.

Phil Harvey

The "i" breaks things because you are binding to a string instead of a regular expression in your conditions.  Try this:

exiftool -r -q -q -m -p '$filename; $filetype; $mimetype $imagewidth $imageheight' -if '$mimetype =~ m"^image/[^x]" && $filetypeextension !~ m"\.(?:NEF|RAF|ARW|CR2)$"i ' .

Here I am still using quotes, but prefix with an "m" to make it a matching expression instead of a string.  Actually, I had even forgotten that you could bind to a string at all (if I ever knew), so I learned something here.

- Phil

captured

You are the king, Phil!

So happy now.

Is the regex you are using from Perl, bash, or something else? I couldn't find it in the documentation, and I am not fluent in regex or Perl.

Regarding -fast3: if I use any other setting (-fast, -fast2, or no -fast option), is my understanding correct that all metadata will be read, even if only one or two tags are needed, e.g. -ImageWidth -ImageHeight?

4th Try
Instead of multiple -ext options, I filter on the file type, since the same file type can have multiple extensions, e.g. jpg|jpeg:
exiftool -r -q -q -m -p '$filename; $filetype; $mimetype; $imagewidth; $imageheight' -if '$mimetype =~ m"^image/[^x]" && $filetypeextension !~ m"\.(?:NEF|RAF|ARW|CR2)$"i && $filetype =~ m"(JPEG|TIFF|PNG|WEBP|PSD|GIF)"i' .

Total Files: 1871
Real time:   0m25.742s

Thank you so much.


Best regards

Phil Harvey

Quote from: captured on August 28, 2018, 01:35:34 PM
Is the regex you are using from Perl, bash, or something else?

Perl.

Quote: Regarding -fast3: if I use any other setting (-fast, -fast2, or no -fast option), is my understanding correct that all metadata will be read, even if only one or two tags are needed, e.g. -ImageWidth -ImageHeight?

The only way to keep ExifTool from reading all metadata from a file is by using a -fast option (although you can also get ExifTool to extract metadata from embedded documents by adding -ee).

captured