Mismatched file extension and image type: is there a better way?

Started by WaffleHouse2t2, October 17, 2022, 10:36:28 PM


WaffleHouse2t2

Running a Raspberry Pi 4B with Ubuntu 22.04 (a Linux distribution); exiftool -ver reports 12.40.

I have a bunch of image files in various directories, along with files of other types (text, movies, archives, PDF, TeX, etc.). Apparently some of the images have the wrong extension, mostly WebP files named .jpg, but there may be others. I cobbled together the following commands, which get pretty much what I want.

List all file extensions/image types and count the instances:

  exiftool -progress: -fast2 -f -p '$FileName    $FileType    $MIMEType' -r. * | sed -e 's|^.*\.|\.|' | sort | uniq -c

NOTE: the .* in the sed expression is "greedy", so it gobbles up everything up to the last period on the line when there is more than one period.
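A quick way to see what the greedy expression does to the odd names in the test run below is to feed a few lines in the same "$FileName    $FileType    $MIMEType" layout through the same sed | sort | uniq -c pipeline. The helper sample_lines is hypothetical, and so is its last line (a .docx entry). That line is there because sed sees the whole printed line, not just the file name, so a period inside the MIMEType column (vendor types such as application/vnd.openxmlformats-officedocument.wordprocessingml.document contain several) would be matched as well.

```shell
# Hypothetical sample lines in the "$FileName    $FileType    $MIMEType"
# layout printed by -p. The names come from the test run below; the
# last line is an invented .docx entry used to show the MIME caveat.
sample_lines() {
    printf '%s\n' \
        'mandrill-0.tiff    TIFF    image/tiff' \
        'mandrill-a.jpg    WEBP    image/webp' \
        'has.two.periods    -    -' \
        'trailing_period.    -    -' \
        'no_periods    -    -' \
        'report.docx    DOCX    application/vnd.openxmlformats-officedocument.wordprocessingml.document'
}

# Same pipeline as above: the greedy ".*" consumes everything up to the
# LAST period anywhere on the line, then sort/uniq -c count the rows.
sample_lines | sed -e 's|^.*\.|\.|' | sort | uniq -c
```

Names without a period pass through untouched, which is why no_periods gets its own row in the counts, and the .docx line collapses to just ".document". If that case matters, letting ExifTool strip the name itself with the advanced formatting StarGeek mentions later in the thread, e.g. -p '${FileName;s/^.*\.//}    $FileType    $MIMEType' with no sed step, should sidestep it (the extension is then printed without the leading period).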

List files where the file extension does not match the recommended extension for that image type, in case I miss a bad combination in the list above:

  exiftool -progress: -fast2 -f -if '$FileTypeExtension ne ($FileName =~ s/^.*\.//r)' -p '$Directory    $FileName    $FileTypeExtension' -r. * | sort

I tried putting the ($FileName =~ s/^.*\.//r) into the -p, but it doesn't work.  Any suggestions?  I would like to print it in case the result is unexpected.
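The -if condition above is an exact, case-sensitive string comparison against the lower-case $FileTypeExtension. That is why mandrill-0.tiff (tif) and mandrill-1.jpeg (jpg) appear in the output below even though .tiff and .jpeg are ordinary spellings, and why an upper-case .JPG would be listed too. Here is a plain-shell sketch of a more forgiving rule, not an ExifTool feature: the function name ext_matches and its alias list are assumptions for illustration, and the loop runs it on the file name / FileTypeExtension pairs from that output.

```shell
# ext_matches ACTUAL EXPECTED  (hypothetical helper, not part of ExifTool)
# Succeeds when ACTUAL is an acceptable spelling of EXPECTED, the
# lower-case extension ExifTool reports as $FileTypeExtension.
ext_matches() {
    act=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    want=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    # Fold alternative spellings onto the form ExifTool reports.
    # This alias list is an assumption; extend it as needed.
    case "$act" in
        jpeg) act=jpg ;;
        tiff) act=tif ;;
    esac
    [ "$act" = "$want" ]
}

# File name / FileTypeExtension pairs copied from the run below.
while read -r name type_ext; do
    ext=${name##*.}    # text after the last period, like s/^.*\.//
    if ext_matches "$ext" "$type_ext"; then
        echo "ok        $name ($ext ~ $type_ext)"
    else
        echo "MISMATCH  $name ($ext vs $type_ext)"
    fi
done <<'EOF'
mandrill-0.tiff tif
mandrill-1.jpeg jpg
mandrill-a.jpeg webp
mandrill-a.jpg webp
mandrill-a.webp jpg
EOF
```

With these aliases only the three genuine WebP/JPEG swaps are reported. The read loop splits on whitespace, so file names containing spaces would need a different field separator (a tab, for instance).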

Locate and list files of "interest" (change -ext jpg and eq "WEBP" to whatever mismatch is desired):

  exiftool -progress: -fast2 -ext jpg -if '$FileType eq "WEBP"' -p '$Directory    $FileName' -r. * | sort

I am fully prepared to accept the "most awful code in the last 24 hours" award and a complimentary flogging if needed; just tell me where and when.  :-)

I started off asking for help, but stumbled over some examples on the site and managed to answer my own questions. Now I'd like to know whether there is a better way to do this.

Below are the results of running the commands on real files, to which I added some manufactured mismatches and weird file names. The results, errors, warnings, and all, are acceptable to me.

admin42@U-RPi4:~/Documents/work/India$
admin42@U-RPi4:~/Documents/work/India$ exiftool -progress: -fast2 -f -p '$FileName    $FileType    $MIMEType' -r. * | sed -e 's|^.*\.|\.|' | sort | uniq -c
Warning: Processing JPEG-like data after unknown 107-byte header - Alfa4/India - Alfa 1-16.rar
Error: File is empty - empty-file.ext
Error: File is empty - has.two.periods
Error: File is empty - no_periods
Error: File is empty - trailing_period.
  147 directories scanned
 3861 image files read
      1 .    -    -
      1 .ext    -    -
      1 .jpeg    JPEG    image/jpeg
      1 .jpeg    WEBP    image/webp
   3838 .jpg    JPEG    image/jpeg
      1 .jpg    WEBP    image/webp
      1 .periods    -    -
     10 .png    PNG    image/png
      1 .rar    -    -
      1 .tiff    TIFF    image/tiff
      1 .txt    TXT    text/plain
      1 .webp    JPEG    image/jpeg
      2 .webp    WEBP    image/webp
      1 no_periods    -    -
admin42@U-RPi4:~/Documents/work/India$
admin42@U-RPi4:~/Documents/work/India$ exiftool -progress: -fast2 -f -if '$FileTypeExtension ne ($FileName =~ s/^.*\.//r)' -p '$Directory    $FileName    $FileTypeExtension' -r. * | sort
  147 directories scanned
 3856 files failed condition
    5 image files read
.    mandrill-0.tiff    tif
.    mandrill-1.jpeg    jpg
.    mandrill-a.jpeg    webp
.    mandrill-a.jpg    webp
.    mandrill-a.webp    jpg
admin42@U-RPi4:~/Documents/work/India$
admin42@U-RPi4:~/Documents/work/India$ exiftool -progress: -fast2 -ext jpg -if '$FileType eq "WEBP"' -p '$Directory    $FileName' -r. * | sort
  147 directories scanned
 3838 files failed condition
    1 image files read
.    mandrill-a.jpg
admin42@U-RPi4:~/Documents/work/India$

StarGeek

It's a bit of a wall of text, so I didn't read it carefully, but it seems you just want to fix the file extensions.

See this post.
* Did you read FAQ #3 and use the command listed there?
* Please use the Code button for exiftool code/output.
 
* Please include your OS, Exiftool version, and type of file you're processing (MP4, JPG, etc).

WaffleHouse2t2

I prefer to have software (ExifTool in this case) identify possible problems so that wetware (me, myself, and I, the three stooges) can investigate, determine what went wrong and why, and then correct the problem if there really is one. I might be able to trace a problem back to its source ... which may be a junior programmer in need of some reeducation.

I stumbled over ExifTool on Saturday, and I'm not sure I'm using it properly or fully. I was asking whether there are better, more efficient ways to identify the possible problems I'm looking for. I wouldn't be surprised if a third of the first script could be replaced by two options that I didn't properly understand.

StarGeek

Quote from: WaffleHouse2t2 on October 17, 2022, 10:36:28 PM
exiftool -progress: -fast2 -f -if '$FileTypeExtension ne ($FileName =~ s/^.*\.//r)' -p '$Directory    $FileName    $FileTypeExtension' -r. * | sort
I tried putting the ($FileName =~ s/^.*\.//r) into the -p, but it doesn't work.  Any suggestions?  I would like to print it in case the result is unexpected.

So the object here is to print only the extension for the file?
${Filename;s/^.*\.//}

I had to look up the r modifier. You don't need to use it in the -if option, and you wouldn't want to use it with the -p (-printFormat) option, since there you want the edited value. Note that any alterations made with the Advanced formatting feature apply only locally to that instance. So using something like
-p '${Filename;s/^.*\.//} $FileName'
will change only the first instance in the output. The original value is unchanged, so the second $FileName still prints the full name.

You may come across the -FileOrder option, which can be used to sort on any tag. In your case, though, piping into sort is the better choice: -FileOrder makes two passes over the files, so it is much slower when you are simply printing the data.