ExifTool Forum

ExifTool => Archives => Topic started by: Archive on May 12, 2010, 08:54:34 AM

Title: Separating languages in keywords & caption...
Post by: Archive on May 12, 2010, 08:54:34 AM
[Originally posted by pixelpicker on 2009-05-04 22:05:01-07]

Hello to all - and Special Hello to Phil.

After racking my brain without finding a solution, I am posting my question:

(I posted something related earlier, about copying the captions of different images into one new image, but I can't write a script that does the reverse.)

I have a lot of images with IPTC headline, keywords and captions in the following format:

headline: language1 blabla blabla | language1 blabla blabla

caption:  language1 blabla blabla | language1 blabla blabla

Keyword1: language1 bla1 | language1 bla2

Keyword2: language1 bla3 | language1 bla4

Keyword3: language1 bla5 | language1 bla6   etc.

These keywords look like this in some other applications (comma-separated):
language1 bla1 | language1 bla2, language1 bla3 | language1 bla4, language1 bla5 | language1 bla6  

Each image is keyworded in two languages. The two language texts in the headline and caption are separated by a " | " pipe, and the two language terms in each keyword are also separated by a pipe.

This is a nightmare, because it produces an information salad :)

I read in the ExifTool documentation on XMP that it is possible to save separate languages in XMP fields with a language code, e.g. -en for English. VERY nice! This is what I would like to do, but with my mixed-language images.

The logic would be that ExifTool takes an image and

1) looks in the above three fields for the " | " pipe,

2) then copies the part before the pipe to the related XMP-language1 field, and

3) then copies the part after the pipe to the related XMP-language2 field.

The result would be an image file with separated languages in XMP.

Could anybody give me a hint on how to create the script for this?

Many greetings from

pixelpicker
Title: Re: Separating languages in keywords & caption...
Post by: Archive on May 12, 2010, 08:54:34 AM
[Originally posted by pixelpicker on 2009-05-04 22:59:59-07]

Oops, sorry: the scheme for an image of course looks like this:

headline: language1 blabla blabla | language2 blabla blabla

caption: language1 blabla blabla | language2 blabla blabla

Keyword1: language1 bla1 | language2 bla2

Keyword2: language1 bla3 | language2 bla4

Keyword3: language1 bla5 | language2 bla6 etc.

The keywords look like this in some other applications (comma-separated): language1 bla1 | language2 bla2, language1 bla3 | language2 bla4, language1 bla5 | language2 bla6

Many greetings from

pixelpicker
Title: Re: Separating languages in keywords & caption...
Post by: Archive on May 12, 2010, 08:54:34 AM
[Originally posted by exiftool on 2009-05-05 11:33:55-07]

Unfortunately, only the caption (XMP-dc:Description) supports
alternate languages in XMP.  The keywords (XMP-dc:Subject) and
headline (XMP-photoshop:Headline) do not support alternate
languages.  (Only lang-alt type tags support alternate languages
in XMP -- see the tag name documentation for details.)

Given this limitation, do you still want to try to separate the caption
languages?

- Phil
Title: Re: Separating languages in keywords & caption...
Post by: Archive on May 12, 2010, 08:54:34 AM
[Originally posted by pixelpicker on 2009-05-05 14:31:08-07]

Hello Phil! Hope you had a wonderful vacation - welcome back :)

Yes, I have to do something to separate the language keywords.

Because it's not possible to write both languages completely into XMP, it makes no sense to use XMP. Sad.

I got halfway to a solution:

I use your user-defined tags with the config file. For the caption it looks like this:

Code:
       MyCaptionB => {
          Require => 'Caption-Abstract',
          # keep only the text before the "| " separator
          ValueConv => q{ $val =~ s/([^\|]*)\| .*/$1/ ? $val : undef },
        },

This script cuts everything after the separator |, as far as I can tell. (I say "as far as I can tell" because I found the command
Code:
s/([^\|]*)\| .*/$1/
on the net, and after some experimenting it seems to work.)

What remains is a script to cut the other way round: cut everything before the |

One could use the double-keyworded images for storage, and when a language is needed one could create a copy of the image and delete the unneeded language in this new image. The command for copying and deleting everything after the | should look something like this:

Code:
exiftool -o DIR "-caption-abstract<mycaptionb" DIR

Do you have any idea how to cut the other way round?

Many greetings

pixelpicker
Title: Re: Separating languages in keywords & caption...
Post by: Archive on May 12, 2010, 08:54:34 AM
[Originally posted by exiftool on 2009-05-05 15:10:17-07]

I would suggest pulling out the languages by name,
maybe something like this:

Code:
       CaptionLanguage1 => {
            Require => 'Caption-Abstract',
            ValueConv => '$val =~ /(^|\|)\s*language1\s+(.*?)\s*(\||$)/si ? $2 : undef',
        },
        CaptionLanguage2 => {
            Require => 'Caption-Abstract',
            ValueConv => '$val =~ /(^|\|)\s*language2\s+(.*?)\s*(\||$)/si ? $2 : undef',
        },

Here, the regular expression matches "language1" or "language2" (case insensitive),
then takes all text following this up to the "|" symbol or the end of the string.

- Phil
Title: Re: Separating languages in keywords & caption...
Post by: Archive on May 12, 2010, 08:54:34 AM
[Originally posted by exiftool on 2009-05-05 15:17:22-07]

I should mention that this will be more complex for the
Keywords since they are a list-type tag.  In this case,
you may have to loop through elements in the array:

Code:
       KeywordsLanguage1 => {
            Require => 'Keywords',
            ValueConv => q{
                my @vals = ref $val ? @$val : ($val);
                foreach $val (@vals) {
                    $val =~ /(^|\|)\s*language1\s+(.*?)\s*(\||$)/ and $val = $2;
                }
                return \@vals;
            },
        },

- Phil
Title: Re: Separating languages in keywords & caption...
Post by: Archive on May 12, 2010, 08:54:34 AM
[Originally posted by exiftool on 2009-05-05 15:27:29-07]

Here is maybe a better way to do the keywords.  The above example
returned the entire keyword string if the language didn't exist,
while this version returns nothing:

Code:
       KeywordsLanguage1 => {
            Require => 'Keywords',
            ValueConv => q{
                my @vals;
                foreach (ref $val eq 'ARRAY' ? @$val : $val) {
                    push @vals, $2 if /(^|\|)\s*language1\s+(.*?)\s*(\||$)/;
                }
                return @vals ? \@vals : undef;
            },
        },

- Phil
Title: Re: Separating languages in keywords & caption...
Post by: Archive on May 12, 2010, 08:54:34 AM
[Originally posted by exiftool on 2009-05-05 15:31:08-07]

Ooops.  Forgot to qualify the regular expression with "si" in the last
2 examples to allow matching of newlines in the text with case
insensitivity.

- Phil
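
The keyword loop with the "si" qualifiers added can be tried outside ExifTool. The following is only a sketch: keywords_for_language is a hypothetical helper name (not part of ExifTool), the language label is passed in as a parameter, and the sample keywords are made up.

```perl
use strict;
use warnings;

# Hypothetical stand-alone version of the ValueConv body above. $val may be
# a single string or a reference to a list of keywords, as in ExifTool.
sub keywords_for_language {
    my ($val, $lang) = @_;
    my @vals;
    foreach (ref $val eq 'ARRAY' ? @$val : $val) {
        # "s" lets "." match newlines, "i" ignores case in the label;
        # \Q...\E quotes any regex metacharacters in the label
        push @vals, $2 if /(^|\|)\s*\Q$lang\E\s+(.*?)\s*(\||$)/si;
    }
    return @vals ? \@vals : undef;
}

# Made-up sample data: two labelled keywords and one without labels
my @keywords = ('language1 bla1 | language2 bla2',
                'Language1 bla3 | Language2 bla4',
                'no label here');
my $found = keywords_for_language(\@keywords, 'language2');
print join(', ', @$found), "\n" if $found;
```

With the "i" qualifier the second keyword matches despite the capital "L"; the keyword without a label is skipped.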
Title: Re: Separating languages in keywords & caption...
Post by: Archive on May 12, 2010, 08:54:34 AM
[Originally posted by pixelpicker on 2009-05-05 21:14:45-07]



WOW PHIL! Thanks for your quick and elegant solution - like always :)

The idea of distinguishing the languages in the tag names too is very helpful.

But I didn't understand correctly how to use the code. I must have made a mistake, because it doesn't do what it should here.

If I understood properly, your code for CaptionLanguage1 and CaptionLanguage2 cuts everything after the | ? Or goes to the end when there is no pipe. But what does the
Code:
.../(^|\|)\s*language1\...
"language1" in this part do? Is it a variable?

What I did was put your CaptionLanguage1 code in the config and rename the necessary parts to "english" to extract the second language.

But ExifTool says: 1 directories scanned 0 images updated :(

Any idea what's wrong?

Have a good day.

Greetings from

pixelpicker
Title: Re: Separating languages in keywords & caption...
Post by: Archive on May 12, 2010, 08:54:34 AM
[Originally posted by pixelpicker on 2009-05-05 21:22:01-07]

Sorry - I meant:

...CaptionLanguage1 and CaptionLanguage2 cuts everything before the | ?

Because this is what's needed.

Greetings from

pixelpicker
Title: Re: Separating languages in keywords & caption...
Post by: Archive on May 12, 2010, 08:54:34 AM
[Originally posted by exiftool on 2009-05-06 11:31:11-07]

The expression for CaptionLanguage1 looks for the string "language1",
and takes the text after this (up to the "|" or the end of the string, whichever
comes first).  You need to change "language1" in this expression to be the
name of the actual language you want.

Code:
> exiftool a.jpg -caption-abstract -captionlanguage1 -captionlanguage2
Caption-Abstract                : language1 bla1 | language2 bla2
Caption Language 1              : bla1
Caption Language 2              : bla2

- Phil
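
To see what the expression does without touching any image, it can be run in a small stand-alone script. This is only a sketch: caption_for_language is a hypothetical helper name (not part of ExifTool), and the label is passed as a parameter instead of being written into the pattern.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of ExifTool): return the text that follows
# the label $lang, up to the next "|" or the end of the string.
sub caption_for_language {
    my ($caption, $lang) = @_;
    # \Q...\E quotes any regex metacharacters in the label
    return $caption =~ /(^|\|)\s*\Q$lang\E\s+(.*?)\s*(\||$)/si ? $2 : undef;
}

my $caption = 'language1 bla1 | language2 bla2';
for my $lang (qw(language1 language2 english)) {
    my $text = caption_for_language($caption, $lang);
    printf "%-10s => %s\n", $lang, defined $text ? $text : '(no match)';
}
```

The label "english" finds nothing here because that word does not occur in the caption, so the tag would be undefined and nothing would be written.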
Title: Re: Separating languages in keywords & caption...
Post by: Archive on May 12, 2010, 08:54:34 AM
[Originally posted by pixelpicker on 2009-05-06 13:58:03-07]

OK - there was a misunderstanding. I didn't say clearly how the metadata is stored.

It's like this:

Caption-Abstract : Auto | car

Caption Language 1 : Auto

Caption Language 2 : car

There's no "English" or other language name in front.

I tried to modify your script. What I found out up to now:

Code:
        CaptionEnglish => {
           Require => 'Caption-Abstract',
           ValueConv => '$val =~ /(^|\|)\s*\s+(.*?)\s*(\||$)/si ? $2 : undef',
         },

This produces "car". Very good, nearly done :)

But:

Code:
       CaptionDeutsch => {
          Require => 'Caption-Abstract',
          ValueConv => '$val =~ /(^|\|)\s*|\s+(.*?)\s*(\||$)/si ? $2 : undef',
        },

produces nothing :(

Could you help me one more time, please?

Many greetings from

pixelpicker
Title: Re: Separating languages in keywords & caption...
Post by: Archive on May 12, 2010, 08:54:34 AM
[Originally posted by exiftool on 2009-05-06 15:10:26-07]

Sorry, yes, I misunderstood.  So you do simply want to extract all
text before and after the "|":

Code:
       CaptionLanguage1 => {
            Require => 'Caption-Abstract',
            ValueConv => '$val =~ /^(.*?)\s*\|/s ? $1 : undef',
        },
        CaptionLanguage2 => {
            Require => 'Caption-Abstract',
            ValueConv => '$val =~ /\|\s*(.*?)$/s ? $1 : undef',
        },

 
and works like this:

Code:
> exiftool a.jpg -caption-abstract -captionlanguage1 -captionlanguage2
Caption-Abstract                : language1 bla1 | language2 bla2
Caption Language 1              : language1 bla1
Caption Language 2              : language2 bla2

- Phil
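
The two expressions can be checked in a stand-alone script, together with the reason the earlier CaptionDeutsch attempt produced nothing. This is a sketch only: split_caption is a hypothetical helper name, not an ExifTool function.

```perl
use strict;
use warnings;

# Hypothetical helper: split a caption at the first "|" using the two
# expressions above. Returns an empty list if there is no "|".
sub split_caption {
    my ($val) = @_;
    my $first  = $val =~ /^(.*?)\s*\|/s ? $1 : undef;
    my $second = $val =~ /\|\s*(.*?)$/s ? $1 : undef;
    return defined $first ? ($first, $second) : ();
}

my ($de, $en) = split_caption('Auto | car');
print "Deutsch: $de\nEnglish: $en\n";

# In the CaptionDeutsch attempt the "|" after \s* was not escaped, so the
# pattern became an alternation whose left branch matches the empty string
# at the start; the match succeeds but group 2 is never set.
my $ok = 'Auto | car' =~ /(^|\|)\s*|\s+(.*?)\s*(\||$)/si;
print "match: ", ($ok ? 'yes' : 'no'),
      ", \$2 defined: ", (defined $2 ? 'yes' : 'no'), "\n";
```

If a caption contains more than one "|", the first part stops at the first pipe and the second part keeps everything after it.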
Title: Re: Separating languages in keywords & caption...
Post by: Archive on May 12, 2010, 08:54:34 AM
[Originally posted by pixelpicker on 2009-05-06 17:35:36-07]



************* Yes - this is the right thing!! *************

************* Works like a machine!! *************

************* Thank you very much Phil. *************

For the keywords I took your code like this:

Code:
       KeywordsDeutsch => {
          Require => 'Keywords',
          ValueConv => q{
            my @list = ref $val ? @$val : ($val);
            my $changed;
            # keep only the text before the "| " separator in each keyword
            s/([^\|]*)\| .*/$1/ and $changed = 1 foreach @list;
            return $changed ? \@list : undef;
          },
        },
        KeywordsEnglish => {
          Require => 'Keywords',
          ValueConv => q{
            my @vals;
            foreach (ref $val eq 'ARRAY' ? @$val : $val) {
              # takes the text after the "|" (given no leading whitespace)
              push @vals, $2 if /(^|\|)\s*\s+(.*?)\s*(\||$)/;
            }
            return @vals ? \@vals : undef;
          },
        },

No idea how it works in detail, but it does exactly what it should :))

Hmm - but I have another question before I look at it in more detail:

Wouldn't it be possible with ExifTool to create custom IPTC tags (caption, keywords, headline) for a second language? Of course this wouldn't be a standard, but it could help.

Have a good evening.

Greetings from

pixelpicker
Title: Re: Separating languages in keywords & caption...
Post by: Archive on May 12, 2010, 08:54:34 AM
[Originally posted by exiftool on 2009-05-06 17:49:38-07]

Great.

Sure, you can create custom IPTC tags if you want.  Of course,
nothing but exiftool could ever read them... :)

If you want to define custom tags with future compatibility
in mind, XMP is the better choice.

- Phil
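
As a rough idea of what custom XMP tags could look like in the config file, here is a sketch modelled on the user-defined XMP examples in the sample config shipped with ExifTool. The namespace prefix "pix", its URI, and the tag names HeadlineAlt and KeywordsEnglish are all made-up placeholders, not existing tags.

```perl
# Hypothetical namespace prefix "pix" and URI; replace with your own.
%Image::ExifTool::UserDefined = (
    # register the new namespace with the main XMP table
    'Image::ExifTool::XMP::Main' => {
        pix => {   # must be the same as the NAMESPACE prefix below
            SubDirectory => {
                TagTable => 'Image::ExifTool::UserDefined::pix',
            },
        },
    },
);

%Image::ExifTool::UserDefined::pix = (
    GROUPS    => { 0 => 'XMP', 1 => 'XMP-pix', 2 => 'Image' },
    NAMESPACE => { 'pix' => 'http://ns.example.com/pix/1.0/' },
    WRITABLE  => 'string',   # default to string-type tags
    # lang-alt tags accept a language code, e.g. XMP-pix:HeadlineAlt-de
    HeadlineAlt     => { Writable => 'lang-alt' },
    # plain list of keywords for the second language
    KeywordsEnglish => { List => 'Bag' },
);

1;  #end
```

A config file assigns %Image::ExifTool::UserDefined only once, so in a config that already defines Composite tags the 'Image::ExifTool::XMP::Main' entry would have to be added to the existing hash rather than pasted as a second assignment.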
Title: Re: Separating languages in keywords & caption...
Post by: Archive on May 12, 2010, 08:54:34 AM
[Originally posted by pixelpicker on 2009-05-06 18:59:29-07]

As long as there is exiftool - who cares? :)

But I will look deeper into this XMP thing at some point.

For now, thank you, and have a good time.

Greetings from

pixelpicker :)