Real length of tag content and Unicode characters

Started by herb, September 07, 2010, 01:27:47 PM


herb

Hello,

My camera (an Olympus one) writes some Exif-tags with trailing spaces.
e.g.: ImageDescription = "OLYMPUS DIGITAL CAMERA          ".
If I want to replace this value, I must know the real length of the tag content
(how many spaces I have to add after CAMERA).

I work on a Windows system, and I communicate with ExifTool via stdin, stdout and stderr.
I always get a newline directly after the word "CAMERA" when I send the display command exiftool -all:all image.jpg

Is this because ExifTool truncates the spaces at the end of each output line, or because Windows does it?

For now I will try to use the -json option to get the real length, because in this case the response is
"EXIF:ImageDescription" : "OLYMPUS DIGITAL CAMERA          "
This solves my problem as long as I work only with non-Unicode characters.


But one of my future goals is also to support Unicode characters within the tag-content and
because the -json option does not support the -charset or -L option and
because I do not have such images with unicode characters
I ask the question in general: how to solve this problem?

I hope someone can advise me on how to handle this.
Can someone explain to me how to manipulate tag content with a hex editor so that ExifTool thinks it contains Unicode characters?
Is it a must to work with HTML output via stdin and stdout?

Thanks in advance.
Best Regards
Herb

Phil Harvey

Quote from: herb on September 07, 2010, 01:27:47 PM
Is this because ExifTool truncates the spaces at the end of each output line, or because Windows does it?

In this case it is the exiftool application that is trimming the spaces.  The trimming is not done with any of the -b, -json or -X outputs.
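Since the -json output is not trimmed, the real length can be read from it directly. The following is a minimal Python sketch of that idea, not part of exiftool itself. It assumes an `exiftool` executable on the PATH for the optional `run_exiftool_json` helper, and the sample value uses ten trailing spaces purely for illustration.

```python
import json
import subprocess


def read_tags(json_text):
    """Parse exiftool -json output and return the tag dict of the first file."""
    records = json.loads(json_text)  # -json emits an array, one object per file
    return records[0]


def trailing_spaces(value):
    """Count the space characters at the end of a tag value."""
    return len(value) - len(value.rstrip(" "))


def run_exiftool_json(path):
    """Run exiftool on one file (assumes `exiftool` is on the PATH)."""
    result = subprocess.run(
        ["exiftool", "-json", "-G", path],
        capture_output=True, check=True,
    )
    return result.stdout.decode("utf-8")  # JSON output is UTF-8


# Sample shaped like the response quoted in this thread (space count assumed).
sample = json.dumps([{
    "SourceFile": "image.jpg",
    "EXIF:ImageDescription": "OLYMPUS DIGITAL CAMERA" + " " * 10,
}])
value = read_tags(sample)["EXIF:ImageDescription"]
print(len(value), trailing_spaces(value))
```

For a real file, pass the result of `run_exiftool_json("image.jpg")` to `read_tags` instead of the sample string.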

Quote
But one of my future goals is also to support Unicode characters within the tag-content and
because the -json option does not support the -charset or -L option and
because I do not have such images with unicode characters
I ask the question in general: how to solve this problem?

JSON doesn't support -charset because the JSON specification mandates UTF-8 encoding.

However, the -charset option works with both -b and -X.  (But only for supported XML character encodings with -X.)

Quote
Can someone explain to me how to manipulate tag content with a hex editor so that ExifTool thinks it contains Unicode characters?

I don't understand the question.  You can write special characters by generating them on the command line.  How to do this depends on your system.  On Mac, I press "Option-E" then "E" on the keyboard to get "é".  Then an argument like -usercomment=été will write Unicode characters.  But note that the ImageDescription tag is ASCII by the Exif specification, so the -charset option will have no effect on its value.
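Once special characters are involved, the "real length" of a value depends on whether characters or bytes are counted. A small Python sketch using the "été" example above shows the difference; it only illustrates encoding arithmetic and says nothing about how many bytes exiftool stores for a particular tag.

```python
def lengths(text):
    """Return the character count and the byte counts under two encodings."""
    return {
        "characters": len(text),
        "utf8_bytes": len(text.encode("utf-8")),
        "latin1_bytes": len(text.encode("latin-1")),
    }


value = "\u00e9t\u00e9"  # the string "été", written with escapes
print(lengths(value))
```

An application that pads or compares values should therefore decode the output first and count characters, rather than count raw bytes read from the pipe.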

If you just want examples with special characters, see t/images/MIE.mie in the full exiftool distribution.

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

herb

Hello Phil,

thanks for your comments and hints. In the past I did not work with XML-output. So your hint was very helpful.

Running some more tests with your comments in mind, I found a solution to get the 'real length of a tag content' and also to get the output in German using the options -X -l -lang de -charset Latin2.
But parsing the XML output is a lot of work when all I want is a "simple" output made for human eyes.
When I asked my question, I expected to get an output like this (maybe only when the -t option is also used):
      EXIF<tab>ImageDescription<tab>tag_content_with_leading_and_trailing_spaces<newline>
   or
   ImageDescription<tab>tag_content_with_leading_and_trailing_spaces<newline>

Such a solution would also work for list-tags when -sep is used.
Would this be possible?

Thanks also for your comments to unicode. I now have a better starting point for unicode input and output.

Best Regards
Herb

Phil Harvey

Quote from: herb on September 10, 2010, 12:45:49 PM
      EXIF<tab>ImageDescription<tab>tag_content_with_leading_and_trailing_spaces<newline>
   or
   ImageDescription<tab>tag_content_with_leading_and_trailing_spaces<newline>


This is the output with the -t option, with -G added in the first case.

Your only problem is that exiftool trims trailing spaces.  To get around this, you could write a simple script to do what you want, or comment out $val =~ s/\s+$//; at line 1453 in the "exiftool" script (version 8.29).  If you are running the Windows version, you will find the "exiftool" script in your TEMP\par-USER directory.
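One way to read the "simple script" suggestion is to let exiftool produce -json output (which is not trimmed) and convert it to tab-delimited lines yourself. The Python sketch below does this under stated assumptions: the input was produced with -json -G, so keys look like "EXIF:ImageDescription", and the function name `to_tab_lines` is made up for this example.

```python
import json


def to_tab_lines(json_text):
    """Turn `exiftool -json -G` output into group<TAB>tag<TAB>value lines."""
    lines = []
    for record in json.loads(json_text):
        for key, value in record.items():
            if key == "SourceFile":
                continue  # file name entry, not a metadata tag
            group, _, tag = key.rpartition(":")  # "EXIF:Tag" -> "EXIF", "Tag"
            if isinstance(value, list):
                value = ", ".join(str(item) for item in value)
            lines.append(f"{group}\t{tag}\t{value}")  # value is left untrimmed
    return lines


# Sample input (the ten trailing spaces are an assumed count).
sample = json.dumps([{
    "SourceFile": "image.jpg",
    "EXIF:ImageDescription": "OLYMPUS DIGITAL CAMERA" + " " * 10,
}])
for line in to_tab_lines(sample):
    print(repr(line))
```

Printing with `repr` makes the trailing spaces visible; a real script would write the lines to stdout or a file unchanged.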

- Phil

herb

Hello Phil,

thank you very much for your detailed comments.
I tried to follow them, but without success.

In the TEMP\par-USER directory I found only one script-file with name "exiftool".
I commented out line 1463 with content: $val =~ s/\s+$//; and started my test. It was the only line with the given content.
I tried it in a DOS box with the output redirected into a text file, and I also checked the exiftool output in my VB6 application
after I received it via a pipe from stdout (I am working on a WIN2000 system).
I never received any trailing spaces (although I had checked that the file contains them).

Sorry, when I asked my question I did not know that I would run into "such a big problem".

Thanks again and best regards
Herb

Phil Harvey

Hi Herb,

Oh well, it was worth a try.  I haven't ever done this myself but I thought it should work.

Changing the "exiftool" script is simple on other systems, but with the Windows .exe version it is more complex since the .exe bundles Perl, exiftool and all the necessary libraries together.  But I thought once it was unpacked in the TEMP directory that it was simply run from there.  I would try it myself but that would mean dragging out the Windows machine, which is a pain.  Maybe tomorrow.

BTW, why are you so interested in preserving the trailing spaces?

- Phil

herb

Hello Phil,

thanks for your very good cooperation and thanks for your planned investigations in advance.

Well, how did I come to that "problem"?
I have been working with exiftool for about 1 year and was never interested in leading and/or trailing spaces.
By accident I learned that we can write the content of a "string tag" with leading or trailing spaces.
So I also tested the replace feature for such tags.
Replace only worked when the leading and trailing spaces were entered exactly.
Getting the number of leading spaces was no problem.
Then I asked my question about trailing spaces in the forum and expected a simple answer;  ... ok, that's all.

Thanks again and best regards
Herb

Phil Harvey

Hi Herb,

You're right.  On my Windows XP machine, if I modify %TEMP%\par-Phil\cache-exiftool-8.30\inc\script\exiftool  then it doesn't work.  It seems that PAR has made a copy of the file and placed it in %TEMP%\par-Phil\cache-exiftool-8.30\43e5bff5.pl.  When I edit this file as I described, the patch works.  But I'm guessing the file may be named differently on your system.  I did a grep of all the files to find the one I needed to change.
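For Windows users without grep, the same search can be sketched in a few lines of Python. This is only an illustration of the search step: the function name is invented here, and the directory to search is whatever PAR cache directory exists on your machine.

```python
import os


def find_files_containing(root, needle):
    """Return the paths under root whose contents include the text needle."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # latin-1 decodes any byte sequence, so binary files do not fail
                with open(path, "r", encoding="latin-1") as handle:
                    if needle in handle.read():
                        matches.append(path)
            except OSError:
                continue  # unreadable file: skip it
    return sorted(matches)


# The statement to look for, as quoted earlier in this thread.
TRIM_STATEMENT = r"$val =~ s/\s+$//;"

# Hypothetical usage; substitute your own cache directory:
# print(find_files_containing(r"C:\path\to\TEMP\par-USER", TRIM_STATEMENT))
```

Every path returned is a candidate copy of the script; which copy is actually run may differ from system to system, as noted above.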

- Phil

herb

Hello Phil,

thank you very much for your investigations.
With your help I could modify ExifTool on my machine and now I really get all characters of all tag-contents.
Wonderful! - As long as only I use this feature it is ok to modify the *.pl script.
But I think that others will also use it.

I hope I am not annoying you too much, but would it be possible to get this as a standard feature of ExifTool?
I agree that the existing output interface should not be changed on the fly.
But would it be possible to get all trailing spaces in case of
- using the -t option or
- using a new option -t1, which means -t plus trailing spaces or
- enabling this feature via the .config file or
- ...
(I hope to talk about a low effort)

Waiting for a "yes" from your side, Best Regards
Herb 

Phil Harvey

Hi Herb,

Thanks for the suggestions.  I'll think about this.

My usual response is that the "exiftool" application isn't meant to do absolutely everything.  This is especially true since it is so easy to code up custom solutions using simple scripts and the API library.  Also, the more features I add to exiftool the harder it is for people to find the features that they are looking for, so I really try hard to avoid adding new options, which rules out the -t1 idea.

- Phil

Edit: fixed markup

herb

Hello Phil,

do you have any news on this?
I hope I have not overlooked anything.
Thanks in advance

Best regards
Herb

Phil Harvey

Hi Herb,

You haven't missed anything.  I haven't figured out how to handle this yet.

In my mind, the big problem is that the value for the -if condition is different from the value reported by exiftool.  If trailing spaces were removed from both values, would this make you happier?

- Phil

herb

Hello Phil,

thanks for your reply, thanks for your investigations and thanks for your patience.

It is very good news that trailing spaces are taken into account in the -if condition.
Please DO NOT change this behaviour.
When replacing tag content using e.g. the parameters -tag-="  value-old  " -tag="value-new" (where value-old has trailing spaces), these spaces are also taken into account properly.
Please also DO NOT change this behaviour.
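When the replace command is issued from an application, building the arguments as a list avoids shell quoting problems with the leading and trailing spaces. The Python sketch below assumes an `exiftool` executable on the PATH; the helper names and the new value "My holiday" are made up for illustration.

```python
import subprocess


def build_replace_args(tag, old_value, new_value, path):
    """Build the argument list for a conditional replace.

    -TAG-=OLD removes the tag only if its value equals OLD exactly,
    and -TAG=NEW then writes the new value.
    """
    return ["exiftool", f"-{tag}-={old_value}", f"-{tag}={new_value}", path]


def replace_tag(tag, old_value, new_value, path):
    """Run the replace (assumes `exiftool` is on the PATH)."""
    # A list is passed without shell parsing, so spaces in the values survive.
    subprocess.run(build_replace_args(tag, old_value, new_value, path), check=True)


# The old value must carry its trailing spaces (ten is an assumed count).
old = "OLYMPUS DIGITAL CAMERA" + " " * 10
args = build_replace_args("ImageDescription", old, "My holiday", "image.jpg")
print(args)
```

The old value would typically come from an untrimmed output such as -json, so that the number of trailing spaces matches what is stored in the file.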

My goal now is to find a way to get the trailing spaces in a command output
- that is simple to parse (e.g. no html, no xml, no binary output)
- and that supports sorting the output with -g0, -g1 and/or -g2

For me the solution is to replace the statement (near line 1463): $val =~ s/\s+$//;
because when it is removed, the trailing spaces are included.

For output read by human eyes it does not matter whether the trailing spaces are included or not.
But for output that is read by an application or stored in a file, it is important to have these trailing spaces.
I think that such output will typically be generated using the -t option.

So I ask you to replace the statement: $val =~ s/\s+$//;
with the following statement:              $val =~ s/\s+$// unless $tabFormat;
(remove trailing spaces only when the -t option is NOT used)

Thanks in advance.

Best Regards
Herb