Running on a Raspberry Pi 4B with Ubuntu 22.04 (a flavor of Linux); exiftool -ver reports 12.40.
I have a bunch of image files in various directories, along with files of other types (text, movies, archives, PDF, TeX, etc.). Apparently some of the images have the wrong extension, mostly WebP data in .jpg files, but there may be others. I cobbled together the following scripts, which get me pretty much what I want.
List all file extensions/image types and count the instances:
exiftool -progress: -fast2 -f -p '$FileName $FileType $MIMEType' -r. * | sed -e 's|^.*\.|\.|' | sort | uniq -c
NOTE: the .* in the sed expression is "greedy", so it gobbles up everything up to the last period on the line when there is more than one period.
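To see the greedy match at work without running exiftool, the same sed expression can be fed a few hand-made lines shaped like the -p '$FileName $FileType $MIMEType' output. This is a sketch: ext_key is a hypothetical helper name, and the sample lines (including the made-up MIME type application/vnd.example.document) are not real exiftool output. Because the regex runs over the whole printed line, not just the file name, a period inside the MIME type would count as the "last" period.

```shell
#!/bin/sh
# ext_key (hypothetical helper): apply the same greedy sed expression used in
# the counting pipeline to one line shaped like exiftool's
# -p '$FileName $FileType $MIMEType' output.
ext_key() {
  printf '%s\n' "$1" | sed -e 's|^.*\.|\.|'
}

# Hand-made sample lines (not real exiftool output).
ext_key 'has.two.periods - -'      # .periods - -
ext_key 'trailing_period. - -'     # . - -
ext_key 'no_periods - -'           # no_periods - -  (no period: line unchanged)
# The regex sees the whole line, so a period in the MIME type is the last one:
ext_key 'photo.xyz XYZ application/vnd.example.document'   # .document
```

The first three results match the ".periods", "." and "no_periods" rows in the counts further down.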
List files where the file extension does not match the recommended extension for that image type, in case I miss a bad combination in the list above:
exiftool -progress: -fast2 -f -if '$FileTypeExtension ne ($FileName =~ s/^.*\.//r)' -p '$Directory $FileName $FileTypeExtension' -r. * | sort
I tried putting the ($FileName =~ s/^.*\.//r) into the -p, but it doesn't work. Any suggestions? I would like to print it in case the result is unexpected.
Locate and list files of "interest"; change the -ext jpg and eq "WEBP" to whatever mismatch is desired:
exiftool -progress: -fast2 -ext jpg -if '$FileType eq "WEBP"' -p '$Directory $FileName' -r. * | sort
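Since only the -ext value and the FileType string change between runs, the command can be wrapped in a small shell function. This is only a sketch: find_type_in_ext is a hypothetical name, and the main point is the quoting. The -if expression has to be in double quotes so the shell can insert the type, which means $FileType must be escaped as \$FileType so that exiftool, not the shell, sees it. A type name containing a single quote would break the expression.

```shell
#!/bin/sh
# find_type_in_ext EXT TYPE (hypothetical helper name)
# Lists files with extension EXT whose detected FileType is TYPE, recursing
# from the current directory like the one-liner above.
find_type_in_ext() {
  want_ext=$1
  want_type=$2
  # Double quotes let the shell substitute $want_type; \$FileType is escaped
  # so that exiftool receives the literal tag name $FileType.
  exiftool -progress: -fast2 -ext "$want_ext" \
    -if "\$FileType eq '$want_type'" \
    -p '$Directory $FileName' -r. * | sort
}
```

For example, find_type_in_ext jpg WEBP should run the same search as the original command, and find_type_in_ext webp JPEG would look for JPEG data in .webp files.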
I am fully prepared to accept "the most awful code in the last 24 hours" award and complimentary flogging if needed, just tell me where and when. :-)
I started off asking for help, but stumbled over some examples on the site and managed to answer my own questions. Now I'd like to know whether there is a better way to do this.
Below are the results of running the scripts on real files, plus some manufactured mismatches and weird file names I added. The results are acceptable to me, errors, warnings, and all.
admin42@U-RPi4:~/Documents/work/India$
admin42@U-RPi4:~/Documents/work/India$ exiftool -progress: -fast2 -f -p '$FileName $FileType $MIMEType' -r. * | sed -e 's|^.*\.|\.|' | sort | uniq -c
Warning: Processing JPEG-like data after unknown 107-byte header - Alfa4/India - Alfa 1-16.rar
Error: File is empty - empty-file.ext
Error: File is empty - has.two.periods
Error: File is empty - no_periods
Error: File is empty - trailing_period.
147 directories scanned
3861 image files read
1 . - -
1 .ext - -
1 .jpeg JPEG image/jpeg
1 .jpeg WEBP image/webp
3838 .jpg JPEG image/jpeg
1 .jpg WEBP image/webp
1 .periods - -
10 .png PNG image/png
1 .rar - -
1 .tiff TIFF image/tiff
1 .txt TXT text/plain
1 .webp JPEG image/jpeg
2 .webp WEBP image/webp
1 no_periods - -
admin42@U-RPi4:~/Documents/work/India$
admin42@U-RPi4:~/Documents/work/India$ exiftool -progress: -fast2 -f -if '$FileTypeExtension ne ($FileName =~ s/^.*\.//r)' -p '$Directory $FileName $FileTypeExtension' -r. * | sort
147 directories scanned
3856 files failed condition
5 image files read
. mandrill-0.tiff tif
. mandrill-1.jpeg jpg
. mandrill-a.jpeg webp
. mandrill-a.jpg webp
. mandrill-a.webp jpg
admin42@U-RPi4:~/Documents/work/India$
admin42@U-RPi4:~/Documents/work/India$ exiftool -progress: -fast2 -ext jpg -if '$FileType eq "WEBP"' -p '$Directory $FileName' -r. * | sort
147 directories scanned
3838 files failed condition
1 image files read
. mandrill-a.jpg
admin42@U-RPi4:~/Documents/work/India$
It's a bit of a wall of text, so I didn't read it carefully, but it seems you just want to fix the file extensions.
See this post (https://exiftool.org/forum/index.php?topic=8105.msg41514#msg41514).
I prefer to have software (ExifTool in this case) identify possible problems so wetware (me, myself, and I, the three stooges) can investigate, determine what went wrong and why, and then correct it if there really is a problem. I might be able to trace a problem back to the source ... which may be a junior programmer in need of some reeducation.
I stumbled over ExifTool on Saturday and I'm not sure I'm using it properly or fully. I was asking if there were better, more efficient ways to identify the possible problems I'm looking for. I wouldn't be surprised if a third of the first script could be replaced by two options that I didn't properly understand.
Quote from: WaffleHouse2t2 on October 17, 2022, 10:36:28 PM
exiftool -progress: -fast2 -f -if '$FileTypeExtension ne ($FileName =~ s/^.*\.//r)' -p '$Directory $FileName $FileTypeExtension' -r. * | sort
I tried putting the ($FileName =~ s/^.*\.//r) into the -p, but it doesn't work. Any suggestions? I would like to print it in case the result is unexpected.
So the object here is to print only the extension for the file?
${Filename;s/^.*\.//}
I had to look up the r modifier. You don't need to use it in the -if option (https://exiftool.org/exiftool_pod.html#if-NUM-EXPR), and you wouldn't want to use it with the -p (-printFormat) option (https://exiftool.org/exiftool_pod.html#p-FMTFILE-or-STR--printFormat), because there you want the value itself to be edited, and /r leaves it unchanged. Note that any alterations made with the Advanced formatting feature (https://exiftool.org/exiftool_pod.html#Advanced-formatting-feature) only apply locally to that instance. So using something like
-p '${Filename;s/^.*\.//} $FileName'
will only change the output in the first instance. The original value is unchanged.
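Putting that together, the extension can be printed next to the recommended one by using the advanced formatting form in -p while keeping the /r form in -if. A sketch only: list_ext_mismatches is a hypothetical name, and otherwise the function repeats the original command. For the manufactured mismatch mandrill-a.jpg, the output line should then read ". mandrill-a.jpg jpg webp".

```shell
#!/bin/sh
# list_ext_mismatches (hypothetical helper name): same search as the original
# -if command, but the -p format also prints the extension that was compared.
# ${FileName;s/^.*\.//} edits only this printed copy of FileName.
list_ext_mismatches() {
  exiftool -progress: -fast2 -f \
    -if '$FileTypeExtension ne ($FileName =~ s/^.*\.//r)' \
    -p '$Directory $FileName ${FileName;s/^.*\.//} $FileTypeExtension' \
    -r. * | sort
}
```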
You may come across the -FileOrder option (https://exiftool.org/exiftool_pod.html#fileOrder-NUM---TAG), which can be used to sort on any tag. But in your case, piping into sort is better, as -FileOrder makes two passes over the files, so it is much slower for simply printing the data.