OCR tesseract -> exiftool

Started by StarGeek, March 25, 2024, 12:46:05 PM

StarGeek

A recent Reddit post about OCRing an image and saving the text into the file interested me enough to figure it out and even make a Windows BAT file to do it in batch.

I knew about tesseract but had never looked into it. Turns out it was Super Easy (insert Ryan George GIF here). The OCR step, with the text written to STDOUT, is simply
tesseract file.jpg -

From there, it's simple enough to pipe the tesseract output into exiftool and use the -TAG<=DATFILE option to save the text into Description
tesseract file.jpg - |exiftool "-Description<=-" file.jpg
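For anyone not on Windows, the same pipe can be sketched in Python. This is only a rough illustration, assuming tesseract and exiftool are both on the PATH; the function names (tesseract_cmd, exiftool_cmd, ocr_and_embed) are made up for this example and are not part of either tool.

```python
import subprocess
import sys


def tesseract_cmd(image):
    # "-" as the output base makes tesseract write the recognized text to STDOUT
    return ["tesseract", image, "-"]


def exiftool_cmd(image):
    # "-Description<=-" tells exiftool to read the tag value from STDIN
    return ["exiftool", "-Description<=-", image]


def ocr_and_embed(image):
    # Step 1: OCR the image and capture the text from STDOUT
    ocr = subprocess.run(tesseract_cmd(image), capture_output=True, check=True)
    # Step 2: feed that text to exiftool on STDIN (the equivalent of the shell pipe)
    subprocess.run(exiftool_cmd(image), input=ocr.stdout, check=True)


if __name__ == "__main__":
    # Hypothetical usage: python ocr_embed.py file.jpg
    for path in sys.argv[1:]:
        ocr_and_embed(path)
```

Passing the arguments as a list means no shell quoting is needed around "-Description<=-".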

The resulting BAT file
@echo off
setlocal
rem OCR_and_embed.bat
rem OCR images in a directory and its subdirectories and embed the results

REM Loop through all directories specified as arguments
for  %%a in (%*) do (
  echo "%%a"
  pushd "%%a"

  REM Loop through all jpg files in the current directory and its subdirectories
  for /r %%b in (*.jpg) do (
    REM Process the jpegs
    echo Processing "%%b" in "%%~dpb"
    tesseract "%%b" - |exiftool -P -overwrite_original "-Description<=-" "%%b"
  )

  popd
)

endlocal

The only thing I didn't like is that it launches exiftool once per file, but I couldn't figure out a way around that. I could have looped tesseract to write a text file matching each image and then run exiftool once, but I wanted to avoid writing temp files. I also figured that tesseract was going to be a bigger bottleneck than exiftool's startup time, though I haven't tested it. On the simple images I was using and with my CPU, tesseract was very quick to process the files.
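The alternative described above (one text file per image, then a single exiftool run) might look roughly like the Python sketch below. It is untested against real screenshots and rests on a few assumptions: tesseract and exiftool are on the PATH, tesseract appends ".txt" to the output base it is given, and exiftool's %d and %f codes in the "<=" file name expand to each image's directory and base name. The function names are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def sidecar_base(image):
    # Output base for tesseract: the image path without its extension;
    # tesseract adds ".txt" itself, so shots/a.jpg -> shots/a.txt
    return Path(image).with_suffix("")


def tesseract_cmd(image):
    return ["tesseract", str(image), str(sidecar_base(image))]


def exiftool_batch_cmd(root):
    # One exiftool run for the whole tree; %d%f.txt picks the text file
    # sitting next to each image (no %% doubling needed outside a BAT file)
    return ["exiftool", "-P", "-overwrite_original", "-r", "-ext", "jpg",
            "-Description<=%d%f.txt", str(root)]


def ocr_tree(root):
    # Pass 1: OCR every jpg into a matching text file
    # (rglob is case-sensitive on some platforms, so *.JPG may be missed there)
    for image in sorted(Path(root).rglob("*.jpg")):
        subprocess.run(tesseract_cmd(image), check=True)
    # Pass 2: embed all of the text files with a single exiftool call
    subprocess.run(exiftool_batch_cmd(root), check=True)


if __name__ == "__main__":
    for directory in sys.argv[1:]:
        ocr_tree(directory)
```

The trade-off is the one already mentioned: exiftool starts only once, but the text files stay on disk until they are deleted.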

The looping code was created by ChatGPT for a different BAT file and I simply replaced the command.

I'm now planning on running this on a bunch of video game screenshots to save the dialog and info into the files, which I'll then be able to search through in IMatch.