Separating languages in keywords & caption...

Started by Archive, May 12, 2010, 08:54:34 AM


Archive

[Originally posted by pixelpicker on 2009-05-04 22:05:01-07]

Hello to all - and Special Hello to Phil.

After racking my brain without finding a solution, I am posting my question:

(I posted something related here: copying the captions of different images into one new image, but I can't create a script that does this process in reverse.)

I have a lot of images with IPTC headline, keywords and captions in the following format:

headline: language1 blabla blabla | language1 blabla blabla

caption:  language1 blabla blabla | language1 blabla blabla

Keyword1: language1 bla1 | language1 bla2

Keyword2: language1 bla3 | language1 bla4

Keyword3: language1 bla5 | language1 bla6   etc.

In some other applications these keywords look like this (comma separated):
language1 bla1 | language1 bla2, language1 bla3 | language1 bla4, language1 bla5 | language1 bla6  

Each image is keyworded in two languages. The two language versions of the headline and the caption are separated by a pipe (" | "), and within each keyword the two language terms are separated by a pipe as well.

This is a horror, because it produces an information salad :)

I read in the ExifTool documentation on XMP that it is possible to store separate languages in XMP fields with a language code, e.g. -en for English. VERY nice! This is what I would like to do, but with my mixed-language images.

The logic would be, that Exiftool takes an image and

1) looks in the above three fields for the " | " Pipe.

2) then copies the part before the pipe to the related XMP-language1 field and

3) then the part after the pipe to the related XMP-language2 field

The result would be an image file with separated languages in XMP.

Could anybody give me a hint on how to create the script for this?

Many greetings from

pixelpicker

Archive

[Originally posted by pixelpicker on 2009-05-04 22:59:59-07]

Oops, sorry: the scheme for an image of course looks like this:

headline: language1 blabla blabla | language2 blabla blabla

caption: language1 blabla blabla | language2 blabla blabla

Keyword1: language1 bla1 | language2 bla2

Keyword2: language1 bla3 | language2 bla4

Keyword3: language1 bla5 | language2 bla6 etc.

The keywords look like this in some other applications (comma separated): language1 bla1 | language2 bla2, language1 bla3 | language2 bla4, language1 bla5 | language2 bla6

Many greetings from

pixelpicker

Archive

[Originally posted by exiftool on 2009-05-05 11:33:55-07]

Unfortunately, only the caption (XMP-dc:Description) supports
alternate languages in XMP.  The keywords (XMP-dc:Subject) and
headline (XMP-photoshop:Headline) do not support alternate
languages.  (Only lang-alt type tags support alternate languages
in XMP -- see the tag name documentation for details.)

Given this limitation, do you still want to try to separate the caption
languages?

- Phil

Archive

[Originally posted by pixelpicker on 2009-05-05 14:31:08-07]

Hello Phil! Hope you had a wonderful vacation - welcome back :)

Yes, I have to do something to separate the language-keywords.

Because it's not possible to write both languages completely into XMP, it makes no sense to use XMP. Sad.

I have found half of a solution:

I use your user-defined tags with the config file. For the caption it looks like this:

Code:
        MyCaptionB => {
            Require => 'Caption-Abstract',
            # keep only the text before the "| " separator; undef if there is none
            ValueConv => q{ $val =~ s/([^\|]*)\| .*/$1/ ? $val : undef },
        },

As far as I can tell, this cuts off everything after the | separator. (I say "as far as I can tell" because I found the command
Code:
s/([^\|]*)\| .*/$1/
on the net, and after some experimenting it seems to work.)

What remains is a script for the reverse: cutting everything before the |

The double-keyworded images could be kept for storage, and when one language is needed, one could create a copy of the image and delete the unneeded language in the copy. The command for copying and deleting everything after the | should look something like this:

Code:
exiftool -o DIR "-caption-abstract<mycaptionb" DIR

Do you have any idea how to do the cut the other way round?

Many greetings

pixelpicker

Archive

[Originally posted by exiftool on 2009-05-05 15:10:17-07]

I would suggest pulling out the languages by name,
maybe something like this:

Code:
        CaptionLanguage1 => {
            Require => 'Caption-Abstract',
            ValueConv => '$val =~ /(^|\|)\s*language1\s+(.*?)\s*(\||$)/si ? $2 : undef',
        },
        CaptionLanguage2 => {
            Require => 'Caption-Abstract',
            ValueConv => '$val =~ /(^|\|)\s*language2\s+(.*?)\s*(\||$)/si ? $2 : undef',
        },

Here, the regular expression matches "language1" or "language2" (case insensitive),
then takes all text following this up to the "|" symbol or the end of the string.

- Phil
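
To see what this pattern extracts without involving ExifTool at all, the same regular expression can be tried in a small standalone Perl script. This is only a sketch: the helper name extract_language is hypothetical, and the sample caption is the placeholder text used in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text that follows a given language label,
# up to the next "|" or the end of the string (undef if the label is absent).
sub extract_language {
    my ($caption, $label) = @_;
    # \Q...\E quotes the label so it is matched literally
    return $caption =~ /(^|\|)\s*\Q$label\E\s+(.*?)\s*(\||$)/si ? $2 : undef;
}

my $caption = 'language1 bla1 | language2 bla2';
for my $label ('language1', 'language2', 'language3') {
    my $text = extract_language($caption, $label);
    print "$label: ", (defined $text ? $text : '(not found)'), "\n";
}
# prints:
# language1: bla1
# language2: bla2
# language3: (not found)
```

The second capture group ($2) holds the extracted text because the first group is used for the leading "start of string or pipe" alternative.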

Archive

[Originally posted by exiftool on 2009-05-05 15:17:22-07]

I should mention that this will be more complex for the
Keywords since they are a list-type tag.  In this case,
you may have to loop through elements in the array:

Code:
        KeywordsLanguage1 => {
            Require => 'Keywords',
            ValueConv => q{
                my @vals = ref $val ? @$val : ($val);
                foreach $val (@vals) {
                    $val =~ /(^|\|)\s*language1\s+(.*?)\s*(\||$)/ and $val = $2;
                }
                return \@vals;
            },
        },

- Phil

Archive

[Originally posted by exiftool on 2009-05-05 15:27:29-07]

Here is maybe a better way to do the keywords.  The above example
returned the entire keyword string if the language didn't exist,
while this version returns nothing:

Code:
        KeywordsLanguage1 => {
            Require => 'Keywords',
            ValueConv => q{
                my @vals;
                foreach (ref $val eq 'ARRAY' ? @$val : $val) {
                    push @vals, $2 if /(^|\|)\s*language1\s+(.*?)\s*(\||$)/;
                }
                return @vals ? \@vals : undef;
            },
        },

- Phil

Archive

[Originally posted by exiftool on 2009-05-05 15:31:08-07]

Ooops.  Forgot to qualify the regular expression with "si" in the last
2 examples to allow matching of newlines in the text with case
insensitivity.

- Phil
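
With that correction applied, the second keywords example might look like the sketch below. It is shown inside the %Image::ExifTool::UserDefined hash that an ExifTool config file uses for Composite tags, because the thread only shows the inner definitions. The tag name and the "language1" label are the placeholders from the posts above, not real values.

```perl
# Sketch of a config file section (for example in .ExifTool_config).
%Image::ExifTool::UserDefined = (
    'Image::ExifTool::Composite' => {
        KeywordsLanguage1 => {
            Require => 'Keywords',
            ValueConv => q{
                my @vals;
                # Keywords may arrive as a single string or as an array reference
                foreach (ref $val eq 'ARRAY' ? @$val : $val) {
                    # "s" lets "." match newlines, "i" ignores case
                    push @vals, $2 if /(^|\|)\s*language1\s+(.*?)\s*(\||$)/si;
                }
                return @vals ? \@vals : undef;
            },
        },
    },
);

1;  # the file should end by returning a true value
```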

Archive

[Originally posted by pixelpicker on 2009-05-05 21:14:45-07]



WOW PHIL! Thanks for your quick and elegant solution! - like always :)

The idea of distinguishing the languages in the tag names as well is very helpful.

But I didn't understand correctly how to use the code. I must have made a mistake, because it doesn't do what it should here.

If I understood correctly, your code for CaptionLanguage1 and CaptionLanguage2 cuts everything after the | ? Or goes to the end when there is no pipe. But what does "language1" do in this part:
Code:
.../(^|\|)\s*language1\...
Is it a variable?

What I did was put your code for CaptionLanguage1 in the config and rename the necessary parts to "english" to extract the second language.

But ExifTool says: 1 directories scanned 0 images updated :(

Any idea what's wrong?

Have a good day.

Greetings from

pixelpicker

Archive

[Originally posted by pixelpicker on 2009-05-05 21:22:01-07]

Sorry - I meant:

...CaptionLanguage1 and CaptionLanguage2 cuts everything before the | ?

Because this is what's needed.

Greetings from

pixelpicker

Archive

[Originally posted by exiftool on 2009-05-06 11:31:11-07]

The expression for CaptionLanguage1 looks for the string "language1",
and takes the text after this (up to the "|" or the end of the string, whichever
comes first).  You need to change "language1" in this expression to be the
name of the actual language you want.

Code:
> exiftool a.jpg -caption-abstract -captionlanguage1 -captionlanguage2
Caption-Abstract                : language1 bla1 | language2 bla2
Caption Language 1              : bla1
Caption Language 2              : bla2

- Phil

Archive

[Originally posted by pixelpicker on 2009-05-06 13:58:03-07]

OK - there was a misunderstanding - I didn't say clearly what the metadata looks like:

It's like this:

Caption-Abstract : Auto | car

Caption Language 1 : Auto

Caption Language 2 : car

There's no "English" or other language name in front.

I tried to modify your script. Here is what I have found out so far:

Code:
        CaptionEnglish => {
           Require => 'Caption-Abstract',
           ValueConv => '$val =~ /(^|\|)\s*\s+(.*?)\s*(\||$)/si ? $2 : undef',
         },

This creates the "car". Very good, nearly done :)

But:

Code:
       CaptionDeutsch => {
          Require => 'Caption-Abstract',
          ValueConv => '$val =~ /(^|\|)\s*|\s+(.*?)\s*(\||$)/si ? $2 : undef',
        },

creates nothing :(

Could you help me one more time, please?

Many greetings from

pixelpicker
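
A likely reason the CaptionDeutsch attempt returns nothing is the unescaped "|" in the middle of the pattern. Outside a group, a bare pipe splits the whole expression into two alternatives, and the first alternative can match the empty string at the start of the text without ever filling the second capture group. The standalone sketch below reproduces this outside ExifTool; the helper name is hypothetical.

```perl
use strict;
use warnings;

# The ValueConv expression from the CaptionDeutsch attempt, wrapped in a
# hypothetical helper so it can be run on its own.
sub caption_deutsch_attempt {
    my ($val) = @_;
    # The bare "|" after \s* makes this "(^|\|)\s*" OR "\s+(.*?)\s*(\||$)".
    # The left alternative matches at position 0, so $2 is never set.
    return $val =~ /(^|\|)\s*|\s+(.*?)\s*(\||$)/si ? $2 : undef;
}

my $result = caption_deutsch_attempt('Auto | car');
print defined $result ? "got: $result\n" : "result is undef\n";
# prints: result is undef
```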

Archive

[Originally posted by exiftool on 2009-05-06 15:10:26-07]

Sorry, yes, I misunderstood.  So you do simply want to extract all
text before and after the "|":

Code:
        CaptionLanguage1 => {
            Require => 'Caption-Abstract',
            ValueConv => '$val =~ /^(.*?)\s*\|/s ? $1 : undef',
        },
        CaptionLanguage2 => {
            Require => 'Caption-Abstract',
            ValueConv => '$val =~ /\|\s*(.*?)$/s ? $1 : undef',
        },

 
It works like this:

Code:
> exiftool a.jpg -caption-abstract -captionlanguage1 -captionlanguage2
Caption-Abstract                : language1 bla1 | language2 bla2
Caption Language 1              : language1 bla1
Caption Language 2              : language2 bla2

- Phil
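
The two patterns can also be checked on their own with the "Auto | car" caption from the earlier post. The following is a minimal standalone sketch; the helper names before_pipe and after_pipe are hypothetical, while the regular expressions are the ones from the post above.

```perl
use strict;
use warnings;

# Text before the first "|", with trailing whitespace removed (undef if no pipe).
sub before_pipe {
    my ($val) = @_;
    return $val =~ /^(.*?)\s*\|/s ? $1 : undef;
}

# Text after the first "|", with leading whitespace removed (undef if no pipe).
sub after_pipe {
    my ($val) = @_;
    return $val =~ /\|\s*(.*?)$/s ? $1 : undef;
}

for my $caption ('Auto | car', 'no separator here') {
    my $first  = before_pipe($caption);
    my $second = after_pipe($caption);
    printf "%s => [%s] [%s]\n", $caption,
        defined $first  ? $first  : 'undef',
        defined $second ? $second : 'undef';
}
# prints:
# Auto | car => [Auto] [car]
# no separator here => [undef] [undef]
```

Returning undef when there is no pipe means the Composite tag simply does not exist for captions that are not in the two-language format.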

Archive

[Originally posted by pixelpicker on 2009-05-06 17:35:36-07]



************* Yes - this is the right thing!! *************

************* Works like a machine!! *************

************* Thank you very much Phil. *************

For the keywords I took your code like this:

Code:
        KeywordsDeutsch => {
            Require => 'Keywords',
            ValueConv => q{
                my @list = ref $val ? @$val : ($val);
                my $changed;
                s/([^\|]*)\| .*/$1/ and $changed = 1 foreach @list;
                return $changed ? \@list : undef;
            },
        },
        KeywordsEnglish => {
            Require => 'Keywords',
            ValueConv => q{
                my @vals;
                foreach (ref $val eq 'ARRAY' ? @$val : $val) {
                    push @vals, $2 if /(^|\|)\s*\s+(.*?)\s*(\||$)/;
                }
                return @vals ? \@vals : undef;
            },
        },

No clue how it works in detail, but it does exactly what it should :))

Hmm - but I have another question before I look at it in more detail:

Wouldn't it be possible with ExifTool to create custom IPTC tags (caption, keywords, headline) for a second language? Of course this wouldn't be standard, but it could help.

Have a good evening.

Greetings from

pixelpicker
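
For anyone who wants to see how a list-aware keyword conversion behaves, here is a standalone sketch. It is not the code from the post above: it assumes each keyword has the form "first | second" and applies one pattern to both halves, and the helper name split_keywords is hypothetical. The handling of a single string versus an array reference follows the keyword examples earlier in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: $val is one keyword string or an array reference of
# keywords; $want_second selects the part after the pipe instead of before it.
sub split_keywords {
    my ($val, $want_second) = @_;
    my @vals;
    foreach (ref($val) eq 'ARRAY' ? @$val : $val) {
        # skip keywords that contain no pipe at all
        my ($first, $second) = /^\s*(.*?)\s*\|\s*(.*?)\s*$/s or next;
        push @vals, $want_second ? $second : $first;
    }
    return @vals ? \@vals : undef;
}

my $keywords = ['Auto | car', 'Haus | house', 'unpaired'];
print join(', ', @{ split_keywords($keywords, 0) }), "\n";  # Auto, Haus
print join(', ', @{ split_keywords($keywords, 1) }), "\n";  # car, house
```

Keywords without a pipe are dropped from both lists here; whether they should instead be kept in one of the languages is a choice the thread does not settle.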

Archive

[Originally posted by exiftool on 2009-05-06 17:49:38-07]

Great.

Sure, you can create custom IPTC tags if you want.  Of course,
nothing but exiftool could ever read them... :)

If you want to define custom tags with future compatibility
in mind, XMP is the better choice.

- Phil
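
As a rough illustration of the XMP route, the sketch below defines a custom namespace with one alternate-language (lang-alt) tag. It is modeled on the sample config file distributed with ExifTool and should be checked against that file before use. The "xxx" prefix, the namespace URI, and the tag name MyHeadline are placeholders and assumptions, not anything defined in this thread.

```perl
# Sketch only: prefix, URI and tag name are placeholders.
%Image::ExifTool::UserDefined = (
    # a new XMP namespace has to be registered in the Main XMP table
    'Image::ExifTool::XMP::Main' => {
        xxx => {  # must be the same as the NAMESPACE prefix below
            SubDirectory => {
                TagTable => 'Image::ExifTool::UserDefined::xxx',
            },
        },
    },
);

%Image::ExifTool::UserDefined::xxx = (
    GROUPS    => { 0 => 'XMP', 1 => 'XMP-xxx', 2 => 'Image' },
    NAMESPACE => { 'xxx' => 'http://ns.example.com/xxx/1.0/' },
    WRITABLE  => 'string',
    # a lang-alt tag can hold one value per language code
    MyHeadline => { Writable => 'lang-alt' },
);

1;  # the file should end by returning a true value
```

If the config file already contains a %Image::ExifTool::UserDefined hash, the XMP entry would go inside that existing hash rather than in a second assignment, since a second assignment replaces the first.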