ExifTool Forum

ExifTool => Developers => Topic started by: herb on July 11, 2018, 10:11:12 AM

Title: Question to NoDups
Post by: herb on July 11, 2018, 10:11:12 AM
Hello Phil,

in order to avoid duplicate entries in list-type tags, I did some tests with the NoDups feature.
In the forum I have seen that all NoDups examples use the separator ##.
exiftool -sep "##" "-keywords<${keywords;NoDups}" DirOrFile
During my tests I saw that using another separator, e.g. ? or |, leads to unexpected results.

OK, ## is a good separator, but it cannot be sent when -stay_open is used.

So my question is: which characters must not be used as the separator for the NoDups feature?

Thanks for your help in advance.

Best regards
Herb
Title: Re: Question to NoDups
Post by: Phil Harvey on July 13, 2018, 07:17:28 AM
Hi Herb,

Good point.  I'll fix this in version 11.07.  I should have been quoting these special characters in the regular expression.

Thanks for pointing this out.

- Phil
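
To illustrate the problem Phil describes: when a separator is placed into a regular expression without quoting, characters such as ? and | are read as regex operators instead of literal text. The following is only a Python analogue of that situation, not ExifTool's actual Perl code; the names split_list and no_dups are made up for this sketch.

```python
import re

def split_list(value, sep, quote=True):
    """Split a joined list value on sep.

    With quote=False the separator is used as a raw regular expression,
    which is the kind of mistake discussed above.
    """
    pattern = re.escape(sep) if quote else sep
    return re.split(pattern, value)

def no_dups(items):
    """Remove duplicates while keeping the first occurrence of each item."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result

joined = "cat|dog|cat"

# Quoted separator: "|" is matched literally, so the items are recovered.
print(no_dups(split_list(joined, "|")))

# Unquoted "|" is an alternation of two empty patterns, so the original
# items are not recovered correctly.
print(split_list(joined, "|", quote=False) == ["cat", "dog", "cat"])

# Unquoted "?" is not even a valid pattern on its own.
try:
    split_list("cat?dog?cat", "?", quote=False)
except re.error as err:
    print("invalid pattern:", err)
```

Quoting the separator (re.escape here, \Q...\E or quotemeta in Perl) makes any separator string safe to use.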
Title: Re: Question to NoDups
Post by: Phil Harvey on July 13, 2018, 07:53:49 AM
Hi again,

This also raises the issue of a big oversight on my part.  You should be able to start an argument with "#" in the argfile.  ExifTool 11.07 will allow you to add "##" as an argument like this:

#[NOT_COMMENT]##

So if you have any argument that starts with "#", just replace this with "#[NOT_COMMENT]#" in the argfile to avoid it being interpreted as a comment.


- Phil

Edit: I just realized that you can in fact add an argument of "##" by putting a space before it.  But then there is no mechanism to have an argument start with a space.  I will re-think this.
Title: Re: Question to NoDups
Post by: Phil Harvey on July 13, 2018, 08:27:46 AM
OK, second try.

ExifTool 11.07 will add a feature allowing an argfile line to begin with "#[CSTR]".  The rest of the line is taken as a standard C string.  As well as allowing spaces at the start of a line, this new feature also allows newlines to be embedded.  For example:

#[CSTR] this argument has a leading space

#[CSTR]this\nargument\nhas\nmultiple\nlines

This also provides a mechanism to have an empty argument if necessary (which wasn't allowed before):

#[CSTR]

As well as allowing "##" as an argument, which was the initial impetus:

#[CSTR]##

- Phil
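
The argfile rules described in this post can be sketched as a small parser. This is a simplified Python model, not ExifTool's implementation: the function names are hypothetical, only a handful of backslash escapes are handled, and the order of the checks (comment test before leading white space is stripped) is an assumption based on the remark above that " ##" yields the argument "##".

```python
CSTR_PREFIX = "#[CSTR]"

# Small, illustrative subset of backslash escapes (assumption: the real
# feature understands more than these).
ESCAPES = {"n": "\n", "r": "\r", "t": "\t", "\\": "\\"}

def unescape(text):
    """Expand backslash escapes; unknown escapes keep the escaped character."""
    out = []
    chars = iter(text)
    for ch in chars:
        if ch == "\\":
            nxt = next(chars, "")  # a trailing lone backslash is dropped
            out.append(ESCAPES.get(nxt, nxt))
        else:
            out.append(ch)
    return "".join(out)

def parse_argfile(lines):
    """Turn argfile lines into a list of arguments."""
    args = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(CSTR_PREFIX):
            # Everything after the prefix is the argument, even if empty.
            args.append(unescape(line[len(CSTR_PREFIX):]))
        elif line.startswith("#"):
            continue  # ordinary comment
        else:
            arg = line.lstrip()
            if arg:  # blank lines are ignored
                args.append(arg)
    return args

example = [
    "# a comment",
    "-sep",
    "#[CSTR]##",
    "#[CSTR] leading space",
    "#[CSTR]two\\nlines",
    "#[CSTR]",
]
print(parse_argfile(example))
```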
Title: Re: Question to NoDups
Post by: herb on July 27, 2018, 11:58:12 AM
Hello Phil,

thanks for the new version 11.07 and also many thanks for all the corrections.

A very short test showed that e.g. character "," is working as separator for NoDups feature.
The new feature #[CSTR] is also working properly when I start Exiftool from Dos-Box and call an argument file with -@ option.

But when I send a line starting with #[CSTR] via pipe to Exiftool, using the -stay_open feature, this line is still ignored by Exiftool.
Or is there a misunderstanding on my side?

Best regards
Herb
Title: Re: Question to NoDups
Post by: Phil Harvey on July 27, 2018, 12:05:27 PM
Hi Herb,

Ooops, you're right.  I have different code to filter arguments from the stay_open file.  The new feature needs to be applied in two different places.  :(

I'll fix this in the next release.

Thanks for letting me know.

- Phil
Title: Re: Question to NoDups
Post by: herb on July 28, 2018, 04:41:43 AM
Hello Phil,

thanks for your quick reply.

You created #[CSTR] feature and now we have a very easy possibility to enter e.g. NEWLINE into a textstring.

Please allow me to think aloud about the output of such a NEWLINE.
As far as I know, Exiftool in general changes all binary characters to a dot in the output.
Would it be helpful to modify this feature and change NEWLINE to "\n" instead of a dot?

Best regards
Herb
Title: Re: Question to NoDups
Post by: Phil Harvey on July 28, 2018, 07:52:45 AM
Hi Herb,

A change like that would require a new command-line option (otherwise it wouldn't be backward compatible), but I really try to avoid that if possible.

- Phil
Title: Re: Question to NoDups
Post by: herb on July 28, 2018, 10:24:21 AM
Hello Phil,

thanks for your reply.
I agree Exiftool should always be backward compatible and I also agree not to introduce a new commandline option.
But what do you think about using an userParam - called NEWLINE?

Best regards
Herb
Title: Re: Question to NoDups
Post by: StarGeek on July 28, 2018, 01:38:20 PM
Quote from: herb on July 28, 2018, 04:41:43 AM
Would it be helpful to modify this feature and change NEWLINE to "\n" instead of a dot?

This can be done without adding new command options.  Add the following subroutine to your .exiftool_config file, somewhere before the main definitions (i.e. before %Image::ExifTool...)
# Replace NewLines (and other characters) with their escape sequences
sub RepNL {
    ($_) = @_;
    my %replace = (
        "\n" => "\\n", # line feed 0x0A
        "\r" => "\\r", # carriage return 0x0D
        "\t" => "\\t", # tab 0x09
    );
    s/(@{[join '|', map { quotemeta($_) } keys %replace]})/$replace{$1}/g;
}


You can now use the RepNL (rename as you desire for ease of use) as an advanced formatting option. 

Examples:
C:\>exiftool -Echo "Set Description and Caption with whitespace control characters" -P -overwrite_original -E -description="NL:&#x0a; CR:&#x0d; Tab:&#x09;" -Caption-Abstract="NL:&#x0a; CR:&#x0d; Tab:&#x09;" -E y:\!temp\Test3.jpg
Set Description and Caption with whitespace control characters
    1 image files updated

C:\>exiftool -g1 -a -s -Description -Caption-Abstract y:\!temp\Test3.jpg
---- XMP-dc ----
Description                     : NL:. CR:. Tab:.
---- IPTC ----
Caption-Abstract                : NL:. CR:. Tab:.

C:\>exiftool -g1 -a -s -echo "Global Replace NLs" -api "Filter=RepNL($_)" -Description -Caption-Abstract y:\!temp\Test3.jpg
Global Replace NLs
---- XMP-dc ----
Description                     : NL:\n CR:\r Tab:\t
---- IPTC ----
Caption-Abstract                : NL:\n CR:\r Tab:\t

C:\>exiftool -g1 -a -s -echo "Using -p, selectively replace"  -p "${Caption-Abstract;RepNL($_)}" y:\!temp\Test3.jpg
Using -p, selectively replace
NL:\n CR:\r Tab:\t


Other non-printable characters can be added if you desire.
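
For readers less familiar with Perl, the idea behind RepNL can be sketched in Python: build one regular expression from the keys of a replacement table and substitute each match with its table entry. This is only an illustration of the technique; it cannot be used in an .exiftool_config file, and the name rep_nl is made up here.

```python
import re

# Control characters and the visible escape sequences that replace them.
REPLACEMENTS = {
    "\n": "\\n",  # line feed 0x0A
    "\r": "\\r",  # carriage return 0x0D
    "\t": "\\t",  # tab 0x09
}

# One alternation built from the (quoted) keys, as in the Perl version.
PATTERN = re.compile("|".join(re.escape(ch) for ch in REPLACEMENTS))

def rep_nl(value):
    """Replace each control character with its escape sequence."""
    return PATTERN.sub(lambda m: REPLACEMENTS[m.group(0)], value)

print(rep_nl("NL:\n CR:\r Tab:\t"))
```

Adding another non-printable character only requires a new entry in the table, because the pattern is generated from its keys.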
Title: Re: Question to NoDups
Post by: herb on July 29, 2018, 04:10:31 AM
Hello StarGeek, hello Phil,

@StarGeek:
Thanks for your hints and thanks for your Perl code; it could be very helpful.
But your design will only work when I ask explicitly for a tag, e.g. -Description in your example.
Or am I wrong?
Will it also work when I display all tags with -all:all because of the -api filter=... option?

Edit:
In the meantime I did some more tests, and now I also have it working with the filter defined inside the config file.


@Phil:
Please allow some additional questions to #[CSTR] feature:
(1) I thought \n is system dependent, so \n == h'0D0A on Windows systems.
     A short test showed that on Windows, too, only h'0A is used.

     Edit: Please forget (1): I mixed up NEWLINE and LINEFEED.
(2) With the #[CSTR] feature I am also thinking of Unicode characters, strings as values of list-type tags, and of course structures.
     In a short test I entered Chinese characters into IPTC and XMP list-type tags and saw no restrictions.
     But just to be sure: are there any restrictions?
     In combination with structures I can imagine problems because some characters have to be escaped.

Thanks again for your help in advance.
Best regards
Herb
Title: Re: Question to NoDups
Post by: Phil Harvey on July 30, 2018, 07:20:25 AM
Very smart StarGeek.

The #[CSTR] feature should work fine for inputting structures and Unicode characters.  The only difference is that it unescapes characters as if the line were a double-quoted Perl string (so not actually a C string, because Perl adds a few extra features).  See section 2.3.2 here (http://mkweb.bcgsc.ca/intranet/perlbook/lperl/ch02_03.htm).

- Phil
Title: Re: Question to NoDups
Post by: herb on July 30, 2018, 11:55:21 AM
Hello Phil, hello StarGeek,

thanks again for your detailed hints.

I see, I have to learn more about Perl.
But at the moment it looks like I only have to escape the backslash.

Best regards
Herb
Title: Re: Question to NoDups
Post by: Phil Harvey on July 30, 2018, 12:32:01 PM
Hi Herb,

That reminds me.  (For now) you must also escape "$" and "@" symbols.  I'll fix this in the next release.

- Phil
Title: Re: Question to NoDups
Post by: herb on July 30, 2018, 01:54:40 PM
Hello Phil,

thanks; you are really great!

Best regards
Herb
Title: Re: Question to NoDups
Post by: Phil Harvey on August 01, 2018, 07:48:37 AM
Hi Herb,

Version 11.08 is out now.  Hopefully this fixes all of the "#[CSTR]" issues.

- Phil
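
For programs that feed arguments to ExifTool through a -stay_open pipe, the writing side of #[CSTR] can be sketched as follows. This is a hypothetical helper, not part of ExifTool: it assumes version 11.08 or later (so "$" and "@" are left unescaped, per the posts above), escapes only backslash, newline, carriage return and tab, and uses photo.jpg as a placeholder file name. Actually writing the text to the pipe is left out.

```python
CSTR_PREFIX = "#[CSTR]"

def escape_cstr(text):
    """Escape characters that cannot appear literally on one argfile line."""
    # Backslash goes first so the escapes added afterwards are not doubled.
    return (text.replace("\\", "\\\\")
                .replace("\n", "\\n")
                .replace("\r", "\\r")
                .replace("\t", "\\t"))

def to_arg_line(arg):
    """Return the line to send for one argument."""
    needs_cstr = (
        arg == ""                # empty argument
        or arg.startswith("#")   # would otherwise be read as a comment
        or arg[0].isspace()      # leading white space would be lost
        or "\n" in arg
        or "\r" in arg
    )
    if needs_cstr:
        return CSTR_PREFIX + escape_cstr(arg)
    return arg

def build_command(args):
    """Join the argument lines and finish the block with -execute."""
    lines = [to_arg_line(a) for a in args]
    lines.append("-execute")
    return "\n".join(lines) + "\n"

print(build_command(["-sep", "##", "-keywords<${keywords;NoDups}", "photo.jpg"]))
```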
Title: Re: Question to NoDups
Post by: herb on August 04, 2018, 12:42:15 PM
Hello Phil, hello StarGeek,

@Phil:
Thanks for the new version of Exiftool. #[CSTR] works fine sending strings via pipe.

@Phil and StarGeek:
Thanks again for the hint to use -api "Filter=RepNL($_)" in order to get infos about newline.
I tried to use the filter option inside a .exiftool_config file. This does also modify values when I copy them e.g. from *.jpg image into a *.xmp sidecar file.

So my question is: when does this filter option come into effect?
The description in options for the PerlLibrary is not clear to me.

Thanks for your help in advance.
Best regards
Herb
Title: Re: Question to NoDups
Post by: StarGeek on August 04, 2018, 02:16:14 PM
Quote from: herb on August 04, 2018, 12:42:15 PM
This does also modify values when I copy them e.g. from *.jpg image into a *.xmp sidecar file.

Are you saying that it is replacing newlines even when you don't use the -api "Filter=... option?  It shouldn't do that.

Quote: So my question is: when does this filter option come into effect?

If you add -api "filter...", it will affect all tags you extract.  From the api options (https://exiftool.org/ExifTool.html#Options):
"Perl expression used to filter all returned tag values" (emphasis mine)

If you need to turn it off, you can do so by adding the hash character # to the end of any tag you don't want to process. So, for example, if you want to use the RepNL function on Description but also want to compare it to the original, unfiltered version, you can use:
exiftool -api "Filter=RepNL($_)" -Description -Description# File.
which would give you something like this:
C:\Programs\My_Stuff>exiftool -g1 -a -s -api "Filter=RepNL($_)" -Description -Description# y:\!temp\Test3.jpg
---- XMP-dc ----
Description                     : NL:\n CR:\r Tab:\t
Description                     : NL:. CR:. Tab:.


If you're copying tags and you only want one or two to be affected, then you can use the function in advanced formatting rather than using it in filter.  For example, here Description will be affected by RepNL, but any other tag will not.

exiftool -TagsFromFile image.jpg "-Description<${Description;RepNL($_)}" -DateTimeOriginal -OtherTag -AnotherTag image.xmp
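
The difference between a global filter and the # suffix can be pictured with a toy model. This Python sketch is not how ExifTool is implemented; the extract function, the tag dictionary and the simplified rep_nl are all invented for illustration, and the model returns raw strings rather than reproducing ExifTool's console output.

```python
def rep_nl(value):
    """Simplified stand-in for the RepNL filter (no backslash handling)."""
    return value.replace("\n", "\\n").replace("\r", "\\r").replace("\t", "\\t")

def extract(tags, requested, value_filter=None):
    """Return (name, value) pairs for the requested tags.

    A trailing "#" on a requested name means: give me this tag without
    the filter applied.
    """
    rows = []
    for name in requested:
        raw = name.endswith("#")
        key = name[:-1] if raw else name
        value = tags[key]
        if value_filter is not None and not raw:
            value = value_filter(value)
        rows.append((key, value))
    return rows

tags = {"Description": "line one\nline two", "Title": "a\tb"}

# The filter touches every requested tag except the one marked with "#".
for name, value in extract(tags, ["Description", "Description#", "Title"], rep_nl):
    print(name, repr(value))
```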
Title: Re: Question to NoDups
Post by: herb on August 04, 2018, 03:24:27 PM
Hello StarGeek,

thanks for your quick reply and thanks for the detailed explanation.
Now I understand, how filter is working.

Just to clarify:
Quote: Are you saying that it is replacing newlines even when you don't use the -api "Filter=... option?  It shouldn't do that.
Newlines were replaced only when -api "Filter=..." was used.

Best regards
Herb
Title: Re: Question to NoDups
Post by: herb on August 05, 2018, 06:34:24 AM
Hello StarGeek, hello Phil,

@StarGeek:
Just an additional short comment on sub RepNL. This function should also escape existing backslash.

@Phil and StarGeek:
When I write a multiline text into a tag, which "newline" character(s) should be used: CRLF, only LF or only CR?
Should I choose it depending on the operating system, e.g. CRLF on Windows?
Or is there a standard which defines it?

Thanks for your comments in advance
Best regards
Herb
Title: Re: Question to NoDups
Post by: StarGeek on August 05, 2018, 12:20:11 PM
Quote from: herb on August 05, 2018, 06:34:24 AM
Just an additional short comment on sub RepNL. This function should also escape existing backslash.

I don't have time to test it right now, but try adding this line
"\\" => "\\\\",
right above the one with \n.  Test it out first.
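
Why escaping the backslash matters can be shown with a small round-trip experiment. The sketch below is a Python analogue, not the Perl sub itself, and the names escape and unescape are made up. Because the substitution is a single pass over the string, each character is replaced at most once, so the position of the backslash entry in the table should not matter; with a chain of separate replacements, the backslash would have to be handled first.

```python
import re

# Table including the backslash itself.
ESCAPE = {"\\": "\\\\", "\n": "\\n", "\r": "\\r", "\t": "\\t"}
UNESCAPE = {v: k for k, v in ESCAPE.items()}

ESCAPE_RE = re.compile("|".join(re.escape(ch) for ch in ESCAPE))
UNESCAPE_RE = re.compile(r"\\[\\nrt]")  # a backslash plus one known letter

def escape(value):
    """Single-pass replacement of control characters and backslashes."""
    return ESCAPE_RE.sub(lambda m: ESCAPE[m.group(0)], value)

def unescape(value):
    """Inverse of escape()."""
    return UNESCAPE_RE.sub(lambda m: UNESCAPE[m.group(0)], value)

real_newline = "a\nb"    # contains an actual line feed
literal_text = "a\\nb"   # contains a backslash followed by the letter n

# Without the backslash entry both inputs would print identically.
print(escape(real_newline))
print(escape(literal_text))
```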

Quote: When I write a multiline text into a tag, which "newline" character(s) should be used: CRLF, only LF or only CR?
Should I choose it depending on the operating system, e.g. CRLF on Windows?
Or is there a standard which defines it?

I haven't seen any standard that defines what should be used, though I haven't gone searching either.  Myself, I only use LF, and I change CRLF to LF in any file that I acquire as part of my preprocessing.  I've never had a problem, and I don't think any modern software is going to have trouble with the difference.  Even Windows will display bare line feeds properly; try printing out a multiline tag with the -b option and see.

I will mention that Lightroom 4.4 (don't know about later versions) would do something odd when writing multiline text in the Caption-Abstract and Description fields.  It would write CR for one tag and LF for the other (can't remember which it wrote for which tag). 

It comes down to what software you use and if it has problems with one or the other.  And it can always be changed pretty easily with exiftool later on if you do find you have problems.
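
The kind of preprocessing mentioned above (turning CRLF into LF) is simple to do before the text is ever handed to exiftool. A minimal Python sketch, with normalize_newlines as a hypothetical helper name:

```python
def normalize_newlines(text, newline="\n"):
    """Convert CRLF and lone CR to LF, then to the requested line ending."""
    # CRLF must be handled before lone CR, otherwise it becomes two breaks.
    unified = text.replace("\r\n", "\n").replace("\r", "\n")
    if newline != "\n":
        unified = unified.replace("\n", newline)
    return unified

mixed = "first\r\nsecond\rthird\nfourth"
print(repr(normalize_newlines(mixed)))
print(repr(normalize_newlines(mixed, "\r\n")))
```

Normalizing to one convention up front also makes later comparisons between tags easier, for example when one program wrote CR and another wrote LF.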
Title: Re: Question to NoDups
Post by: herb on August 05, 2018, 02:22:42 PM
Hello StarGeek,

thanks again for the detailed answer to my annoying questions.
It is very helpful.

Best regards
Herb
Title: Re: Question to NoDups
Post by: StarGeek on August 05, 2018, 03:22:29 PM
Quote from: herb on August 05, 2018, 02:22:42 PM
thanks again for the detailed answer to my annoying questions.
It is very helpful.

Not annoying at all.  They get me to think about new ways of using exiftool and get my thoughts on various aspects of image metadata out there.