Capturing Exiftool StdOut from Powershell script (asynchronous communication)

Started by PatE, November 29, 2016, 11:22:39 PM


jean

> it seems to be a C++ function,
No, it's an SDK function; you can call it from Basic, C, C++, Pascal etc... and certainly Powershell.
A link on using the SDK from Powershell:
https://deletethis.net/dave/?q=dllimport

PatE

Thanks, I'll look into it once I get the meat of my application running. (JPG resizer app, now testing on 80.000 files)

Noticing strange things during my test runs, seeing ASCII chars getting translated (by exiftool ?).
E.g. sending via ASCII argfile a filename:
     xxxxx©xxxx.jpg
The char © is ASCII 169. (http://www.ascii-code.com/)
But after exiftool processing and having it return me the filename with the -filename option via StdOut, the character has changed to code 174, which is ®. Ok, they look alike, but it's a different character.

Not sure where this translation happens.
I can verify up to the argfile, which shows in notepad.exe as type ANSI (=ASCII), and in which the filename still has the correct character ©.
What happens internally within exiftool, or during its communication to StdOut, and afterwards how the exiftool response gets processed as it presumably passes through layers and layers of SDK, .NET etc. processing before it reaches my Powershell variable, I have no clue.

As a workaround I am now just filtering out filenames containing character codes >127, effectively barring files containing quite common accented characters such as é, à, etc. from processing. Those files still make up only about 0,01 % of all files to be processed, and my client doesn't care, but it's a loose end nevertheless.
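
One possible explanation for the © to ® swap, offered as a sketch rather than a diagnosis: the same byte means different characters in different Windows code pages. Byte 169 is © in the "ANSI" code page Windows-1252, but ® in the OEM code page 850 that Western European consoles commonly use. If the byte passes through exiftool untranslated and the .NET reader on StdOut decodes it with the console code page, the result would look exactly like this. Code page 850 is an assumption here; the actual console code page on the machine was not reported.

```powershell
# Byte 169 (0xA9) is what a Windows-1252 ("ANSI") argfile stores for the © character.
[byte[]]$bytes = 169

$ansi = [System.Text.Encoding]::GetEncoding(1252)   # Windows "ANSI" code page (Western Europe)
$oem  = [System.Text.Encoding]::GetEncoding(850)    # assumed console (OEM) code page

# Decode the very same byte with both code pages.
$asAnsi = $ansi.GetString($bytes)
$asOem  = $oem.GetString($bytes)

'{0} = U+{1:X4}' -f $asAnsi, [int][char]$asAnsi     # © = U+00A9
'{0} = U+{1:X4}' -f $asOem,  [int][char]$asOem      # ® = U+00AE
```

To see which code pages are actually in play, `[Console]::OutputEncoding` and `$p.StandardOutput.CurrentEncoding` can be inspected on the machine in question.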

Phil Harvey

You can use the -charset filename option to specify the character set you use for file names.  The -charset exiftool (or just -charset) option specifies the character set used by ExifTool for extracted values (including the FileName value).  If -charset filename is not specified, then the file names specified on the command line are not translated.

See this section of the application documentation for some pointers.

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

PatE

Thanks Phil, I was initially using -charset filename=Latin when I started the topic, after a quick read of your extensive help text, but don't remember the logic I followed in selecting that option.
But at that time I was still concentrated on getting exiftool stay_open to work from Powershell.
Only after that was solved, I started concentrating on the character conversion problems.
Those appeared while using the charset option, so I removed it somewhere along the way to eliminate possible causes of the unwanted conversions (there are several levels involved). I'll try to retrace my decision to use the charset Latin. I am now looking at how to read StdOut in different encodings, and may be onto something.
Patrick

PatE

Tried to create a Streamreader onto the exiftool process object $p, like:
    $ASCIIstream = New-Object System.IO.StreamReader($p.StandardOutput.BaseStream, [System.Text.Encoding]::ASCII)
    $s = $ASCIIstream.ReadLine()

(tip from http://stackoverflow.com/questions/2855675/process-standardinput-encoding-problem)
This seems to be the way to force StdOut, StdIn etc. into a certain encoding.

However, the exiftool response on the option -filename still contains any accented characters converted into "?" or something else.
Checked argfile using notepad.exe before sending execute \n, always has the intended encoding.
Tried -charset filename=Latin and -charset filename=UTF8, both with the same results:
- writing to argfile with UTF8 encoding -> getting the same 3 responses as earlier, although argfile looks OK and is in the intended format:
  0 files read
  1 file could not be read
  {ready}
- writing to argfile in Unicode -> argfile looks OK and is verifiably Unicode, but exiftool does not output anything: my program hangs on the readline() call on StdOut.
(The reason for trying the charset UTF8 + argfile encoding was the following documentation excerpt, which with my limited knowledge of the matter I may be taking too much at face value:
"...character set, preferably UTF8 (see the -charset option for a complete list). Setting this triggers the use of Windows wide-character i/o routines, thus providing support for all Unicode file names")

After further empirical multi-case input-output testing, I can confirm that exiftool does process the files with accented characters OK with the following settings:
- writing argfile in ASCII encoding
- charset filename=Latin (although I'm not sure if this has any effect)
- accented filenames appended to argfile via cmd prompt to retain correct Unicode->ASCII conversion.
So passing the accented filenames to exiftool is OK.

However, it is the response to the -filename option, "Filename      : xxxx", that does not pass OK from exiftool to Powershell. Accented characters get converted.
Given Phil's thorough approach, I assume this does not happen internally in exiftool but outside, in the OS/SDK/.NET etc. layers through which the result string passes from exiftool to Powershell.
Probably developers with a deeper understanding of these layers and encoding could find ways to investigate this.
I tried the Streamreader approach; this also returned converted characters, but given my lack of experience with the system level routines involved, I will not draw the conclusion that it is exiftool sending converted characters down its StdOut.

I'm going to content myself with writing a string compare function that ignores any mismatches between input and output filename on the positions where the input filename contains an accented char.

Thanks everyone for your input.
Patrick
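
For reference, a small sketch of what the encodings discussed above put on disk for the same file name. The name `café.jpg` and the helper `Add-ArgLine` are hypothetical, chosen only for illustration. In PowerShell, "Unicode" means UTF-16 LE, where every Latin character is followed by a zero byte, and the ASCII encoder replaces anything above 127 with "?". Whether these differences account for the behaviour reported in this thread is not established here.

```powershell
$name = "caf$([char]0xE9).jpg"    # hypothetical file name containing é (U+00E9)

$utf8NoBom = New-Object System.Text.UTF8Encoding($false)   # $false = do not emit a byte order mark
$encodings = [ordered]@{
    'ASCII'        = [System.Text.Encoding]::ASCII
    'Windows-1252' = [System.Text.Encoding]::GetEncoding(1252)
    'UTF8'         = $utf8NoBom
    'Unicode'      = [System.Text.Encoding]::Unicode        # UTF-16 LE
}

# Print the bytes each encoding would write for the same name.
foreach ($key in $encodings.Keys) {
    $hex = ($encodings[$key].GetBytes($name) | ForEach-Object { $_.ToString('X2') }) -join ' '
    '{0,-13} {1}' -f $key, $hex
}

# Hypothetical helper: append one argument line as UTF-8 without a BOM.
function Add-ArgLine {
    param([string]$Path, [string]$Line)
    # .NET resolves relative paths against the process directory, so pass a full path.
    [System.IO.File]::AppendAllText($Path, $Line + "`n", $utf8NoBom)
}
```

The é comes out as `3F` ("?") under ASCII, `E9` under Windows-1252, `C3 A9` under UTF-8 and `E9 00` under UTF-16 LE. A call such as `Add-ArgLine -Path $exifargfile -Line $fullname` would then be paired with -charset filename=UTF8, on the assumption that the charset option has to match the bytes in the argfile.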

PatE

Hello Phil,

my Powershell JPG downsampling application is aimed at processing +/- 300.000 - 400.000 JPGs in one run.
For every JPG, it sends commands to exiftool (stay_open mode) via ASCII argfile.
The following command strings are appended to argfile for every file in the processing loop:
-charset
filename=Latin
-m
-filename
-xmp-dc:MyTag1
-xmp-dc:MyTag2
-xmp-dc:MyTag3
c:\test\test\testfile0000001.jpg
-execute

I am noticing in several test runs that my app hangs after +/- 84.000 files - 3:30 hours processing time.
Difficult to debug, given that it happens only after a long time.

Just to be sure: is there any upper limit on argfile filesize ? Is there any risk of exiftool internal buffer/memory overrun after processing that many commands ?
Argfile filesize is about 22KB when app hangs.

Should I better clear argfile after each command i.e. truncate to 0 bytes ?
(not sure how to do it, assuming exiftool needs the file handle to remain valid, so I guess I cannot just delete argfile and recreate 0 size)
I could also stop exiftool with a "-stay_open\nFalse\n" every say 10.000 files, and restart it with a fresh argfile.
Your thoughts ?

Thanks,
Patrick

PatE

Note: my assumption is that Powershell is hanging while waiting for something to appear on StdOut from exiftool, since it immediately becomes responsive again when I manually kill exiftool.exe.
To catch this I tried to make my reading from StdOut safer by first looking if there is anything available in the StdOut buffer, using $p.standardoutput.peek(), before actually reading it with $p.standardoutput.readline() .
However, standardoutput.peek() does not seem to work correctly, as it many times returns -1 (=buffer empty) even when there is something available in the buffer.
So am going for an exiftool stay_open restart every n files.
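
A possible alternative to Peek(), sketched under the assumption that .NET 4.5 or later is available: StreamReader.Peek() is documented to return -1 also when the underlying stream does not support seeking, which is the case for a pipe, so it is not a reliable "data available" test. Waiting on ReadLineAsync() with a timeout avoids blocking forever. The function name and the default timeout below are illustrative only.

```powershell
function Read-LineWithTimeout {
    param(
        [System.IO.StreamReader]$Reader,
        [int]$TimeoutMs = 5000
    )
    # Start the read in the background and wait for it at most $TimeoutMs.
    $task = $Reader.ReadLineAsync()
    if ($task.Wait($TimeoutMs)) {
        return $task.Result      # a line of text, or $null at end of stream
    }
    # The read is still pending here; the reader cannot be reused for another read.
    throw "No output within $TimeoutMs ms"
}
```

It would be called as `$s = Read-LineWithTimeout $p.StandardOutput 10000` inside a try/catch. After a timeout the pending read still owns the reader, so the realistic recovery is to log the file name, kill exiftool and start a fresh process.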

PatE

Confirmed after logging to file: Powershell app hangs on standardoutput.readline() waiting for exiftool response, now after processing 92.000 files.
In previous tests it hung at +/- 84.000.
The total input filelist is 313.000 files that I want exiftool to read.
There are some 40-50 filenames with special characters é, à, ï and more exotic ones that do not make it intact to exiftool.
Exiftool handles those gracefully, responding just "{ready}" on StdOut, and maybe also something on StdErr, but I'm not reading that out.

I have now manually renamed some of the accented filenames in the first 90.000 to a "valid" filename without accents.
I am now witnessing the app advancing further into the inputlist, now hangs at about 94.000.
This is after 29 filenames have been rejected by exiftool.
Will now manually correct 10 more filenames in the preceding 90.000, to verify if the app still makes it further into the list without hanging.

Forgot to mention that I am restarting exiftool every 20.000 files, so probably is not exiftool related after all (cumulative memory probs etc.)

While (command-line) PS script still hanging, tried to write a "-stay_open`nFalse`n" to argfile from another PS script, see if exiftool exits gracefully.
No response from exiftool, process stays in memory. As soon as I kill the process, PS script terminates with an error about NULL processpointer in readline() read, which is to be expected.

A deep dive using Process Explorer and Procmon seems inevitable.

Hayo Baan

From your last reports it looks as if special characters in the filenames are at least part of the cause. You specify Latin as character set for the filenames, but that only holds a limited number of special characters and requires the input to be in Latin too. I'm quite sure that isn't the case on the Windows command line (and it certainly does not allow all special characters), so this is probably an area to have a look at. Best would be to be able to give exiftool UTF8 input, but that's not always possible on Windows (though I think with PowerShell this is possible).
Hayo Baan – Photography
Web: www.hayobaan.nl

PatE

Hello Hayo, thanks for chiming in. To be honest, I cannot really get my head around the encoding stuff. Have been raised on assembler and C in the '80s-'90s and then there was only ASCII, have not been programming at systems level for at least 20 yrs.

I would very much like to solve this, but as I explained earlier, when passing charset filename=UTF8 to exiftool, and writing the argfile also in UTF8 (am I correct that I need to write the argfile in the chosen charset encoding ? Or should I always write the argfile in ASCII, but use charset filename=UTF8 ?), I get the unexpected responses from exiftool (with any filename, not just ones containing special characters) :
  0 files read
  1 file could not be read
  {ready}

Have now changed my function that calls exiftool to not send the "execute\n" to the argfile, so that exiftool only reads the commands in the growing argfile but never executes them. My Powershell script has now been running happily all night and has processed 274.000 of 313.000 input files... much more than the 80.000 previously, so the problem definitely happens between sending "execute\n" to the argfile and exiftool's response.
So a simplified code excerpt:
...
"-xmp-dc:MyTag1" | Out-File $exifargfile -Append -Encoding ASCII   # ask exiftool to read the tags MyTag1 and MyTag2 from JPG file
"-xmp-dc:MyTag2" | Out-File $exifargfile -Append -Encoding ASCII
$fullname        | Out-File $exifargfile -Append -Encoding ASCII   # file to be read
"-execute`n"     | Out-File $exifargfile -Append -Encoding ASCII   # do it
$s = $p.StandardOutput.ReadLine()                                  # <----- here my application hangs, waiting for exiftool response
...

Hayo Baan

I'm not familiar with PowerShell (Unix/Mac user myself), but I'm sure the ASCII encoding won't be good enough; if you can use it, UTF-8 really is preferred.

But my suggested approach would be to forgo executing anything in exiftool to begin with. So only use powershell to write the arg file as a first step. Then have a good look at that file and look for anything that's off (especially file names). Actually from your experiment you should already have a good idea what file is causing the issue so have a look at how it's encoded in the arg file.

As a side note, I think you could/should make use of the -common_args option, that saves you having to pass the tag names each time in the arg file. This also allows you to easily reuse the same arg file and change the things you want exiftool to do. E.g., have it print only  the filename, etc.
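
A rough sketch of what the -common_args suggestion could look like from PowerShell. The argfile path, the exiftool.exe location and the helper `Add-ExifRequest` are assumptions for illustration. The options that were repeated for every file move to the launch command line after -common_args, so each request in the argfile shrinks to the file name plus -execute.

```powershell
$exifargfile = 'C:\test\exifargs.txt'    # hypothetical argfile path

$psi = New-Object System.Diagnostics.ProcessStartInfo
$psi.FileName  = 'exiftool.exe'          # assumed to be on the PATH
# Everything after -common_args is applied to every -execute'd command.
$psi.Arguments = "-stay_open True -@ `"$exifargfile`" -common_args " +
                 "-charset filename=Latin -m -filename " +
                 "-xmp-dc:MyTag1 -xmp-dc:MyTag2 -xmp-dc:MyTag3"
$psi.UseShellExecute        = $false
$psi.RedirectStandardOutput = $true
$psi.RedirectStandardError  = $true

# Hypothetical helper: queue one file and trigger processing.
function Add-ExifRequest {
    param([string]$ArgFile, [string]$FullName)
    # Same Out-File pattern as in the excerpt above; the encoding question is separate.
    $FullName  | Out-File $ArgFile -Append -Encoding ASCII
    '-execute' | Out-File $ArgFile -Append -Encoding ASCII
}

# To launch: $p = [System.Diagnostics.Process]::Start($psi)
```

Per file, the loop would then only call `Add-ExifRequest $exifargfile $fullname` before reading the response from StdOut.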

PatE

Reduced my input filename list to only +/- 8000 names with accents.
On this input exiftool tripped up systematically after processing only 16 filenames.

Response on each "bad" filename was {ready} instead of the processed filename requested by my -filename.
My application until now simply decided: if the -filename response is not equal to the filename passed, something must have gone wrong during exiftool processing (probably file not found due to the accented chars getting converted).
My code never read out exiftool's StdErr buffer.

Now I added code to read StdErr buffer (consume until empty) after each non-match between filename and exiftool response. Now exiftool does not trip up anymore, my application continues processing the 8000 accented filenames OK.

Conclusion: does exiftool's StdErr buffer overrun if it's not emptied by the calling app ?
Input filepaths were 234 chars, it tripped after the 16th file, meaning +/- 3.744 bytes in StdErr buffer.

@Phil: is this a plausible limitation ?
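
The symptom matches the deadlock the .NET documentation describes for redirected streams: when the parent never reads a redirected StdErr, the child blocks on its next write once the pipe is full, and the parent in turn waits forever on StdOut. Sixteen error lines of 234 path characters plus a short message each come to roughly 4 KB, which would fit a small default pipe buffer, although the actual buffer size is an assumption and was not measured here. A minimal sketch of one way to keep the pipe empty without reading it by hand; the function name is illustrative:

```powershell
function Start-ErrorDrain {
    param([System.IO.StreamReader]$ErrorReader)
    # ReadToEndAsync keeps reading in the background until the stream closes,
    # so the child process never finds its StdErr pipe full.
    return $ErrorReader.ReadToEndAsync()
}

# Usage sketch ($p is the exiftool process, started with RedirectStandardError = $true):
#   $errTask = Start-ErrorDrain $p.StandardError
#   ... stay_open loop reading $p.StandardOutput ...
#   $p.WaitForExit()
#   $allErrors = $errTask.Result    # everything exiftool wrote to StdErr
```

The trade-off is that the error text only becomes available once exiftool exits, so it cannot be matched to individual files while the loop is running. Detecting a failed file would still rely on the comparison between the requested and the returned filename.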

Phil Harvey

Sorry for the delay in responding.  I don't know much about the buffer limitations on Windows, so I can't say whether 3kB is plausible, but it wouldn't surprise me if it was.

- Phil