OCR tesseract -> exiftool

Started by StarGeek, March 25, 2024, 12:46:05 PM


StarGeek

A recent Reddit post about OCRing an image and saving the text into the file interested me enough to figure it out and even make a Windows BAT file to do it in batch.

I knew about tesseract but had never looked into it. Turns out it was Super Easy (insert Ryan George GIF here). The command to OCR an image and write the text to STDOUT is simply
tesseract file.jpg -

From there, it's simple enough to pipe the tesseract output into exiftool and use the -TAG<=DATFILE option, with - as the DATFILE so the text is read from the pipe, to save it into Description
tesseract file.jpg - |exiftool "-Description<=-" file.jpg

The resulting BAT file
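@echo off
rem setlocal pairs with the endlocal at the end of the script
setlocal
rem OCR_and_embed.bat
rem OCR images and embed results in a directory and its subdirectories

REM Loop through all directories specified as arguments
for %%a in (%*) do (
  REM %%~a strips any quotes already on the argument so it is not quoted twice
  echo "%%~a"
  pushd "%%~a"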

  REM Loop through all jpg files in the current directory and its subdirectories
  for /r %%b in (*.jpg) do (
    REM Process the jpegs
    echo Processing "%%b" in "%%~dpb"
    tesseract "%%b" - |exiftool -P -overwrite_original "-Description<=-" "%%b"
  )

  popd
)

endlocal
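
As a usage sketch: assuming the script is saved as OCR_and_embed.bat and that tesseract and exiftool are both on the PATH, it takes one or more directories as arguments. The directory and file names below are made-up placeholders.

```bat
rem Hypothetical directories; every *.jpg under each one is processed
OCR_and_embed.bat C:\Screenshots D:\Captures

rem Spot-check one file afterwards by reading the tag back
exiftool -Description C:\Screenshots\example.jpg
```

Because the script passes -overwrite_original, exiftool keeps no backup copies, so a trial run on a copy of a few images is a reasonable first step.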

The only thing I didn't like is that it starts exiftool once for every file, but I couldn't figure out a way around that. I could have looped only tesseract to make a text file matching each image and then run exiftool once, but I wanted to avoid writing temp files. I also figured that tesseract would be a bigger bottleneck than exiftool's startup time, though I haven't tested that. On the simple images I was using and with my CPU, tesseract was very quick to process the files.
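
For comparison, the two-pass alternative described above might look roughly like the sketch below. It is untested and rests on a few assumptions: tesseract is given an output base name so it writes image.txt next to image.jpg, and exiftool's -TAG<=FMT form with %d%f.txt picks up the matching text file for each image (the percent signs are doubled because this is inside a BAT file). The file name OCR_two_pass.bat is made up.

```bat
@echo off
setlocal
rem OCR_two_pass.bat (hypothetical name)
rem Pass 1 writes a .txt next to each jpg, pass 2 runs exiftool once per directory

for %%a in (%*) do (
  pushd "%%~a"

  REM Pass 1: "%%~dpnb" is the full path without the extension; tesseract appends .txt
  for /r %%b in (*.jpg) do tesseract "%%b" "%%~dpnb"

  REM Pass 2: one exiftool run; %%d%%f.txt becomes %d%f.txt, the text file matching each image
  exiftool -P -overwrite_original -r -ext jpg "-Description<=%%d%%f.txt" .

  popd
)

endlocal
```

This version leaves the .txt files behind, which is exactly the temp-file clutter the piped version avoids, so they would need to be deleted afterwards.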

The looping code was created by ChatGPT for a different BAT file and I simply replaced the command.

I'm now planning on running this on a bunch of video game screenshots to save the dialog and info into the files, which I'll then be able to search through in IMatch.