Inconsistent Behavior with Stdin + Exiftool + Curl/HTTP

Started by dylanj, May 28, 2022, 01:18:28 PM

Previous topic - Next topic

dylanj

Hello! I'm almost certain I'll be joining the club of "people who thought something might be wrong with exiftool but there wasn't" but I'm quite stumped.

I maintain a service written in Golang that reads file metadata over HTTP, piping the contents to stdin and reading it with exiftool. However, within the past few months I've found that the FileSize tag has been exhibiting some inconsistent behavior, with the exiftool output oscillating between the correct FileSize and "0 bytes".

I've managed to reproduce this behavior from the command line and my findings are below:

Initially I was running the following command but noticed at the bottom of the output I would occasionally get the error "failed writing body" as you can see at the bottom of the output:

curl -vvv "https://cdn-staging.bfldr.com/FGTMYU34/at/nsbrj5rhcg43q5r3cp7pmnjv/astro.jpeg" --output - | exiftool -fast -u -j -


[{
  "SourceFile": "-",
  "ExifToolVersion": 12.30,
  "FileSize": "0 bytes",
  "FileModifyDate": "2022:05:28 11:09:54-06:00",
  "FileAccessDate": "2022:05:28 11:09:54-06:00",
  "FileInodeChangeDate": "2022:05:28 11:09:54-06:00",
  "FilePermissions": "prw-rw----",
  "FileType": "JPEG",
  "FileTypeExtension": "jpg",
  "MIMEType": "image/jpeg",
  "JFIFVersion": 1.01,
  "ResolutionUnit": "None",
  "XResolution": 1,
  "YResolution": 1,
  "ImageWidth": 392,
  "ImageHeight": 500,
  "EncodingProcess": "Baseline DCT, Huffman coding",
  "BitsPerSample": 8,
  "ColorComponents": 3,
  "YCbCrSubSampling": "YCbCr4:2:0 (2 2)",
  "ImageSize": "392x500",
  "Megapixels": 0.196
}]
100 51513  100 51513    0     0   136k      0 --:--:-- --:--:-- --:--:--  138k
* Connection #0 to host cdn-staging.bfldr.com left intact
curl: (23) Failed writing body

However even after modifying the command as follows to ensure a full read:

curl "https://cdn-staging.bfldr.com/FGTMYU34/at/nsbrj5rhcg43q5r3cp7pmnjv/astro.jpeg" --output - | tac | tac | exiftool -fast -u -j -

If I run the command a few times in a row, I find that *sometimes* it will return a populated FileSize, and sometimes it will not. Below I've included two verbose curl outputs, one each with zero and non-zero FileSize and I cannot find any issue. Hence coming here!

Curl output with "0 bytes" FileSize

* Connected to cdn-staging.bfldr.com (151.101.70.217) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*  CAfile: /etc/ssl/cert.pem
*  CApath: none
* (304) (OUT), TLS handshake, Client hello (1):
} [326 bytes data]
* (304) (IN), TLS handshake, Server hello (2):
{ [102 bytes data]
* TLSv1.2 (IN), TLS handshake, Certificate (11):
{ [2867 bytes data]
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
{ [300 bytes data]
* TLSv1.2 (IN), TLS handshake, Server finished (14):
{ [4 bytes data]
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
} [37 bytes data]
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
} [1 bytes data]
* TLSv1.2 (OUT), TLS handshake, Finished (20):
} [16 bytes data]
* TLSv1.2 (IN), TLS change cipher, Change cipher spec (1):
{ [1 bytes data]
* TLSv1.2 (IN), TLS handshake, Finished (20):
{ [16 bytes data]
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server accepted to use h2
* Server certificate:
*  subject: CN=*.bfldr.com
*  start date: Mar 23 13:53:24 2022 GMT
*  expire date: Apr 24 13:53:23 2023 GMT
*  subjectAltName: host "cdn-staging.bfldr.com" matched cert's "*.bfldr.com"
*  issuer: C=BE; O=GlobalSign nv-sa; CN=GlobalSign Atlas R3 DV TLS CA 2022 Q1
*  SSL certificate verify ok.
* Using HTTP2, server supports multiplexing
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x7fe30a00b600)
> GET /FGTMYU34/at/nsbrj5rhcg43q5r3cp7pmnjv/astro.jpeg HTTP/2
> Host: cdn-staging.bfldr.com
> user-agent: curl/7.79.1
> accept: */*
>
< HTTP/2 200
< cache-control: max-age=300, public, no-transform
< content-disposition: inline
< content-type: image/jpeg
< etag: "ddd0588ba9fc957f95a07e0f22536735"
< last-modified: Fri, 27 May 2022 17:07:36 GMT
< x-goog-generation: 1653671256170851
< x-goog-metageneration: 1
< x-goog-stored-content-encoding: identity
< x-goog-stored-content-length: 51513
< access-control-allow-origin: *
< accept-ranges: bytes
< date: Sat, 28 May 2022 17:14:26 GMT
< age: 86062
< vary: Origin
< strict-transport-security: max-age=900
< content-length: 51513
<
{ [1090 bytes data]
100 51513  100 51513    0     0   320k      0 --:--:-- --:--:-- --:--:--  339k
* Connection #0 to host cdn-staging.bfldr.com left intact
[{
  "SourceFile": "-",
  "ExifToolVersion": 12.30,
  "FileSize": "0 bytes",
  "FileModifyDate": "2022:05:28 11:14:26-06:00",
  "FileAccessDate": "2022:05:28 11:14:26-06:00",
  "FileInodeChangeDate": "2022:05:28 11:14:26-06:00",
  "FilePermissions": "prw-rw----",
  "FileType": "JPEG",
  "FileTypeExtension": "jpg",
  "MIMEType": "image/jpeg",
  "JFIFVersion": 1.01,
  "ResolutionUnit": "None",
  "XResolution": 1,
  "YResolution": 1,
  "ImageWidth": 392,
  "ImageHeight": 500,
  "EncodingProcess": "Baseline DCT, Huffman coding",
  "BitsPerSample": 8,
  "ColorComponents": 3,
  "YCbCrSubSampling": "YCbCr4:2:0 (2 2)",
  "ImageSize": "392x500",
  "Megapixels": 0.196
}]


Curl output with correct FileSize

* Connected to cdn-staging.bfldr.com (151.101.70.217) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*  CAfile: /etc/ssl/cert.pem
*  CApath: none
* (304) (OUT), TLS handshake, Client hello (1):
} [326 bytes data]
* (304) (IN), TLS handshake, Server hello (2):
{ [102 bytes data]
* TLSv1.2 (IN), TLS handshake, Certificate (11):
{ [2867 bytes data]
* TLSv1.2 (IN), TLS handshake, Server key exchange (12):
{ [300 bytes data]
* TLSv1.2 (IN), TLS handshake, Server finished (14):
{ [4 bytes data]
* TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
} [37 bytes data]
* TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
} [1 bytes data]
* TLSv1.2 (OUT), TLS handshake, Finished (20):
} [16 bytes data]
* TLSv1.2 (IN), TLS change cipher, Change cipher spec (1):
{ [1 bytes data]
* TLSv1.2 (IN), TLS handshake, Finished (20):
{ [16 bytes data]
* SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
* ALPN, server accepted to use h2
* Server certificate:
*  subject: CN=*.bfldr.com
*  start date: Mar 23 13:53:24 2022 GMT
*  expire date: Apr 24 13:53:23 2023 GMT
*  subjectAltName: host "cdn-staging.bfldr.com" matched cert's "*.bfldr.com"
*  issuer: C=BE; O=GlobalSign nv-sa; CN=GlobalSign Atlas R3 DV TLS CA 2022 Q1
*  SSL certificate verify ok.
* Using HTTP2, server supports multiplexing
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x7fcd6b00d400)
> GET /FGTMYU34/at/nsbrj5rhcg43q5r3cp7pmnjv/astro.jpeg HTTP/2
> Host: cdn-staging.bfldr.com
> user-agent: curl/7.79.1
> accept: */*
>
< HTTP/2 200
< cache-control: max-age=300, public, no-transform
< content-disposition: inline
< content-type: image/jpeg
< etag: "ddd0588ba9fc957f95a07e0f22536735"
< last-modified: Fri, 27 May 2022 17:07:36 GMT
< x-goog-generation: 1653671256170851
< x-goog-metageneration: 1
< x-goog-stored-content-encoding: identity
< x-goog-stored-content-length: 51513
< access-control-allow-origin: *
< accept-ranges: bytes
< date: Sat, 28 May 2022 17:15:49 GMT
< age: 86144
< vary: Origin
< strict-transport-security: max-age=900
< content-length: 51513
<
{ [1090 bytes data]
100 51513  100 51513    0     0   444k      0 --:--:-- --:--:-- --:--:--  474k
* Connection #0 to host cdn-staging.bfldr.com left intact
[{
  "SourceFile": "-",
  "ExifToolVersion": 12.30,
  "FileSize": "50 KiB",
  "FileModifyDate": "2022:05:28 11:15:49-06:00",
  "FileAccessDate": "2022:05:28 11:15:49-06:00",
  "FileInodeChangeDate": "2022:05:28 11:15:49-06:00",
  "FilePermissions": "prw-rw----",
  "FileType": "JPEG",
  "FileTypeExtension": "jpg",
  "MIMEType": "image/jpeg",
  "JFIFVersion": 1.01,
  "ResolutionUnit": "None",
  "XResolution": 1,
  "YResolution": 1,
  "ImageWidth": 392,
  "ImageHeight": 500,
  "EncodingProcess": "Baseline DCT, Huffman coding",
  "BitsPerSample": 8,
  "ColorComponents": 3,
  "YCbCrSubSampling": "YCbCr4:2:0 (2 2)",
  "ImageSize": "392x500",
  "Megapixels": 0.196
}]


The above two curl outputs were generated simply by re-running the exact curl command above a few times until FileSize was observed as populated. I have reproduced this on MacOS Monterey 12.4, as well as Alpine 3.16 all with exiftool 12.30.

Thanks in advance for any time and attention!

StarGeek

Do you get the same results when you remove the -fast option.  As per the docs
     With -fast (or -fast1), ExifTool will not scan to the end of a JPEG image to check for an AFCP or PreviewImage trailer

I believe that exiftool is closing the pipe once it hits the SOS (Start Of Scan) marker, which is where the image data begins.  If you add the -v3 (-verbose3) option, you should see something like
JPEG SOS
100 51513  100 51513    0     0  47151      0  0:00:01  0:00:01 --:--:-- 47216

as the last output from exiftool.  Where if you remove the -fast option, you should see
0JPEG SOS
100 51513  100 51513    0     0  63522      0 --:--:-- --:--:-- --:--:-- 63596
JPEG EOI

which shows exiftool has reached the EOI (End of Image) marker.

Also, cURL returns a curl: (23) Failure writing output to destination error when the -fast option is included on my system.  Is that missing in your case?

Still, on my end, exiftool is returning either 8KiB or 16KiB regardless of if I include the -fast option or not.  I don't get the 50 KiB that is the actual size of the file.
"It didn't work" isn't helpful. What was the exact command used and the output.
Read FAQ #3 and use that cmd
Please use the Code button for exiftool output

Please include your OS/Exiftool version/filetype

dylanj

Thanks for the response. Just ran the following (removing the -fast option):
curl -v "https://cdn-staging.bfldr.com/FGTMYU34/at/nsbrj5rhcg43q5r3cp7pmnjv/astro.jpeg" --output - | tac | tac | exiftool -u -j -

Ran it 6 times or so and noted the same behavior. About 30% of the time I will receive a populated FileSize tag of "50 KiB". Anecdotally just playing around with different permutations of args when just running a vanilla "exiftool -" command, the data returned was consistently "0 bytes" which may be indicative of a full/partial scan based on filetypes.

Regarding your note about the "-fast" option in general and the reading of the whole curl body. I got that error before a quick google search instructed me to use a double-piped tac command to reverse read, then re-reverse the original input ensuring a full curl is complete before piping to exiftool

Sorry for so many edits but last one: The verbose option is helpful though I wasn't able to pick much out of the following, maybe someone else will note something
File size is present, last notation is JPEG SOS

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 51513  100 51513    0     0   364k      0 --:--:-- --:--:-- --:--:--  384k
  ExifToolVersion = 12.30
  FileSize = 51513
  FileModifyDate = 1653766635
  FileAccessDate = 1653766635
  FileInodeChangeDate = 1653766635
  FilePermissions = 4528
  FileType = JPEG
  FileTypeExtension = JPG
  MIMEType = image/jpeg
JPEG APP0 (14 bytes):
  + [BinaryData directory, 9 bytes]
  | JFIFVersion = 1 1
  | ResolutionUnit = 0
  | XResolution = 1
  | YResolution = 1
  | ThumbnailWidth = 0
  | ThumbnailHeight = 0
JPEG DQT (65 bytes):
JPEG DQT (65 bytes):
JPEG SOF0 (15 bytes):
  ImageWidth = 392
  ImageHeight = 500
  EncodingProcess = 0
  BitsPerSample = 8
  ColorComponents = 3
  YCbCrSubSampling = 2 2
JPEG DHT (28 bytes):
JPEG DHT (86 bytes):
JPEG DHT (26 bytes):
JPEG DHT (57 bytes):
JPEG SOS


Filesize is NOT present, but last notation is still JPEG SOS

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 51513  100 51513    0     0   309k      0 --:--:-- --:--:-- --:--:--  324k
  ExifToolVersion = 12.30
  FileSize = 0
  FileModifyDate = 1653766648
  FileAccessDate = 1653766648
  FileInodeChangeDate = 1653766648
  FilePermissions = 4528
  FileType = JPEG
  FileTypeExtension = JPG
  MIMEType = image/jpeg
JPEG APP0 (14 bytes):
  + [BinaryData directory, 9 bytes]
  | JFIFVersion = 1 1
  | ResolutionUnit = 0
  | XResolution = 1
  | YResolution = 1
  | ThumbnailWidth = 0
  | ThumbnailHeight = 0
JPEG DQT (65 bytes):
JPEG DQT (65 bytes):
JPEG SOF0 (15 bytes):
  ImageWidth = 392
  ImageHeight = 500
  EncodingProcess = 0
  BitsPerSample = 8
  ColorComponents = 3
  YCbCrSubSampling = 2 2
JPEG DHT (28 bytes):
JPEG DHT (86 bytes):
JPEG DHT (26 bytes):
JPEG DHT (57 bytes):
JPEG SOS


The command used for both of the above (again just running it multiple times by pressing up-arrow then enter in my terminal):
curl "https://cdn-staging.bfldr.com/FGTMYU34/at/nsbrj5rhcg43q5r3cp7pmnjv/astro.jpeg" --output - | tac | tac | exiftool -u -fast -j -v -

StarGeek

Sorry, but I don't have any other ideas.  The best I can say is that if you really need the file size, then you'll probably have to save to a temp file before running exiftool on it. Like I said, running on my Windows machine, I only get a files size 8 or 16 KiB, never 0 or the actual files size.
"It didn't work" isn't helpful. What was the exact command used and the output.
Read FAQ #3 and use that cmd
Please use the Code button for exiftool output

Please include your OS/Exiftool version/filetype

dylanj

Hmm, quite odd. Thank you or your replies and I definitely don't have a guess as to why you'd get those numbers on Windows. Luckily we're not particularly reliant on the FileSize tag itself in our workflows so really not the biggest deal, but just some concerning inconsistency for what I would imagine is a straightforward piece of metadata generally.

Appreciate the help, I'll leave this open for a bit in case there are any other bits of feedback

Hayo Baan

Interesting! I tried this too and indeed, sometimes I get the correct 50kiB size, sometimes I get 0kiB. I'm quite sure this has to do with how/when the temp file for the pipe is examined by ExifTool. I looked at the code that determines the file size tag and indeed it makes use of stat to determine the file size (which operates on a file).
Hayo Baan – Photography
Web: www.hayobaan.nl

StarGeek

And since Windows doesn't have a stat command, it gets completely different results.
"It didn't work" isn't helpful. What was the exact command used and the output.
Read FAQ #3 and use that cmd
Please use the Code button for exiftool output

Please include your OS/Exiftool version/filetype

Hayo Baan

Quote from: StarGeek on May 29, 2022, 09:51:59 AM
And since Windows doesn't have a stat command, it gets completely different results.

But perl (on Windows) does! So that's not the issue here ;)
It's the behaviour of the "file" for the pipe (at least that's what I must conclude from the code and behaviour).
Hayo Baan – Photography
Web: www.hayobaan.nl

StarGeek

Ah, I assumed it was calling the stat from the command line.
"It didn't work" isn't helpful. What was the exact command used and the output.
Read FAQ #3 and use that cmd
Please use the Code button for exiftool output

Please include your OS/Exiftool version/filetype

Phil Harvey

Interesting, but not unexpected.  ExifTool would have to read the entire file over the pipe to reliably determine the size.  stat seems to return the amount of the file that is currently buffered.  The fix would be for me to patch ExifTool to not generate FileSize for buffered files, but this is a low priority issue.

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).