Create keyword from the file name, all name till first underscore

Started by seraosr, November 22, 2012, 01:27:27 PM


seraosr

Hello Exif Forum,
Congratulations to the author of the tool. It's the best tool for editing metadata easily.

I'm a newbie here. I found and read some documentation, but I'm not yet able to do this myself.

Question: Every day my workflow produces 50 PDF files. These PDFs are converted to the right format with a tool called "Pitstop", and then cataloged by a DAM tool called "Canto Cumulus". The PDF files are named as follows: "name_code.pdf".

The intention is to use ExifTool to take the file name up to "_code.pdf" and write it to the 'Keyword' field.

In the end, the command script will be attached to an action folder that does this transformation and moves the files to another folder to be cataloged automatically.

Regards
Ricardo
Portugal

Phil Harvey

Hi Ricardo,

You want to manipulate the file name and write to another tag.  This can be done with a user-defined tag and a command like this:

exiftool "-keywords+=myname" FILE

where FILE is one or more file and/or directory names.

For this to work, you need to create a user-defined "MyName" tag, which can be done with this config file:

%Image::ExifTool::UserDefined = (
    'Image::ExifTool::Composite' => {
        MyName => {
            Require => 'FileName',
            # remove "_code.pdf" from end of FileName
            ValueConv => '$val =~ s/_code\.pdf//; $val',
        },
    },
);


- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

seraosr

Hi Phil,
thank you for your reply.
I have files with names like:

AEAGR3_20112889_TXT.PDF
AEAPRGRAM5_20112275_CP.PDF
EF789EP_20112125_TXTLR.PDF
GH7CAEP_20113477_TXT.PDF
IT7DP_20101838_AVAL.PDF

In the code you kindly provided, I think you assumed that '_code.pdf' is a fixed string, but it isn't; the file names vary a lot.

Thanks a lot. Congratulations on your work.

Ricardo

Phil Harvey

Hi Ricardo,

Then just use "\w+" instead of "code" to represent a string of one or more word characters.  Also, you should use /i to make the extension case-insensitive:

            ValueConv => '$val =~ s/_\w+\.pdf//i; $val',
- Phil
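
As a quick check of the pattern outside ExifTool, here is a minimal Python sketch of the same substitution applied to a few of the file names listed above. The function name `keyword_from_filename` is made up for illustration; only the regular expression comes from the config line, and Python's `re` module is assumed to behave like Perl for this simple pattern.

```python
import re

# Mirrors the Perl substitution s/_\w+\.pdf//i from the config line above.
SUFFIX = re.compile(r"_\w+\.pdf", re.IGNORECASE)


def keyword_from_filename(name: str) -> str:
    """Strip the "_something.pdf" tail; names without it are returned unchanged."""
    # count=1 mirrors Perl's s/// without /g: only the first match is replaced.
    return SUFFIX.sub("", name, count=1)


for name in ["AEAGR3_20112889_TXT.PDF", "AEAPRGRAM5_20112275_CP.PDF", "IT7DP_20101838_AVAL.PDF"]:
    print(name, "->", keyword_from_filename(name))
```

Because `\w` also matches the underscore, the match normally starts at the first underscore and runs through the extension, so only the leading part of the name is kept.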

seraosr

Hi Phil,
thanks in advance for all your help.

From our understanding of how ExifTool works, we need 2 files: a batch file with "exiftool "-keywords+=myname" C:\PDFS" and a config file with the code you sent "%Image::ExifTool::UserDefined = (
    'Image::ExifTool::Composite' => {
        MyName => {
            Require => 'FileName',
            # remove "_code.pdf" from end of FileName
         ValueConv => '$val =~ s/_\w+\.pdf//i; $val',
      },
    },
);"


We have both files in a folder on c:\ together with "exiftool.exe", but if we run the batch file, we get no result. Can you tell us what we are doing wrong?

Thanks again
Ricardo

Phil Harvey


seraosr

Hi Phil,

Yes, I did.

The problem was that I was running the CMD, EXE and config files from the Desktop.
Now I have "EXIF.cmd", "exiftool.exe" and ".ExifTool_config" in %homepath%. The destination folder is now C:\PDFS.

The CMD file now runs successfully, but the keyword added to the PDF is "Myname", the name you gave to the tag.

Can you please help?

Thanks again.
Ricardo

Phil Harvey

Oops, you're right.  I wanted to copy the MyName tag, not add the string "myname".  The command should have been:

exiftool "-keywords+<myname" FILE

(the "+" adds to the existing keywords.  Remove it if you don't want to preserve keywords that already exist in the file.)

- Phil

seraosr

:) :)

Phil, I understood everything you wrote about the + and =.

Although it's working, the keyword added is the whole file name, filename.PDF. To remove ".PDF" from the keyword, can I remove .PDF from ValueConv => '$val =~ s/_code\.pdf//; $val',

I can only test again on Monday. Have a good weekend, and thank you for all your work.

Regards
Ricardo

Phil Harvey

Hi Ricardo,

The problem is that you forgot the "/i" in the expression to make it case insensitive (your files are .PDF, and the expression has "\.pdf"):

         ValueConv => '$val =~ s/_\w+\.pdf//i; $val',

- Phil

Edit: Oops, sorry about getting your name wrong.  Fixed now.

seraosr

Hi Phil,
I've tested what you wrote and it's working, yupi! I want to thank you.

My problem now is with my real PDF files, generated by a system from 'Kodak': ExifTool is not accessing the files. Maybe there is some protection, which I will try to find out.

In the meantime, thank you again.

Regards
Ricardo

seraosr

Hi Phil
Apologies for pushing this issue.

Our PDFs are protected after going through the "Pitstop" tool, and I really need to write to the "Keywords" field in the PDF.
I've changed the command line to: exiftool "-Xmp:subject+<myname" C:\EXIF\PDFS

When I open the PDF in Adobe Reader, the keyword is written in the Keywords field, but with a semicolon, i.e. (; filename). I've attached an example. Can you please correct me?

Regards and thanks

Phil Harvey

The Adobe Reader screen shot doesn't help me.  An ExifTool output would be more useful.  But it could be there is a blank item in the XMP:Subject list.  If so, it was there before you added your "myname" item.

- Phil

seraosr

Hi, Phil
Attached is the file (CCH-SEITA_20114530_CP_72dpi.txt) exported from ExifToolGUI. Strangely, there is no mention of the (;) symbol.

Can you now help with "Create keyword from the same Keyword"?

Attached is the export file (CESPI-REC_20000000_TXT.txt) from a PDF file with a keyword already written.

Is it possible to read that keyword and write it back to the same field?

---- XMP ----
XMP Toolkit                     : Adobe XMP Core 5.2-c001 63.139439, 2010/09/27-13:37:26
Modify Date                     : 2012:12:04 17:06:10Z
Create Date                     : 2008:03:14 17:50:53
Metadata Date                   : 2012:12:04 17:06:10Z
Document ID                     : uuid:dd2f8b4e-9198-4172-9519-19fb92544a11
Instance ID                     : uuid:2e9074d2-e7ae-f74c-8cb3-56d6cd0b8970
Format                          : application/pdf
Creator                         :
Subject                         : NEW ARCHIVE: CESPI-REC_20000000_TXT.pdf
Producer                        : Creo Normalizer JTP
Keywords                        : NEW ARCHIVE: CESPI-REC_20000000_TXT.pdf

In the XMP: Subject field, I want to read the value and write back only NEW ARCHIVE: CESPI-REC. Can you please help?

Previously, to create the keyword from the file name (the whole name up to the first underscore), we had:

.ExifTool_Config

%Image::ExifTool::UserDefined = (
    'Image::ExifTool::Composite' => {
        Myname => {
            Require => 'FileName',
            ValueConv => '$val =~ s/_\w+\.pdf//i; $val',
        },
    },
);

Batch File

"exiftool "-keywords<myname" C:\EXIF\PDFS" and later "exiftool "-Xmp:subject+<myname" C:\EXIF\PDFS"

Do I have to keep the ValueConv tag?

If you can kindly help, I would appreciate it.

Regards
Thanks

Ricardo

Phil Harvey

Modifying the keywords or subject is a bit different because these are list-type tags.

%Image::ExifTool::UserDefined = (
    'Image::ExifTool::Composite' => {
        Myname2 => {
               Require => 'Keywords',
               ValueConv => q{
                   $val = $val[0] if ref $val eq 'ARRAY';
                   $val =~ s/_\w+\.pdf//i;
                   return $val;
               },
        },
    },
);


With this, you can use this command to remove the end part of the file name from the keywords:

exiftool "-keywords<keywords" FILE

I don't see where the semicolon came from in the file with the "CCH-SEITA" keywords. To make sure you don't have unwanted keywords, you could replace existing keywords with "<" instead of adding to them with "+<".

- Phil

seraosr

Hi Phil, thanks again for your kind help.

I ran the test with the code and command you sent, with no result.

I think the "XMP: Subject" field is the same as the "Keywords" field in a PDF file.

In the PDF file the keywords are duplicated, even with the "<" sign. I think that with the code you sent, ExifTool is writing to XMP Subject.

Attached are the metadata export and the PDF file, before and after executing the command you sent: exiftool "-keywords<keywords" FILE.

Thank you
Regards

Phil Harvey

Oops.  I made a mistake.  The command should have been:

exiftool "-keywords<myname2" FILE

Also, what I was trying to do wouldn't work because the file name is the 3rd keyword.

It helps a lot that you have included the original PDF.  The original PDF:Keywords are:

1) "Porto"
2) "Editora:"
3) "CESPI-REC_20000000_TXT.pdf"

If you want them changed to:

1) "Porto"
2) "Editora:"
3) "CESPI-REC"

then use this config file:

%Image::ExifTool::UserDefined = (
    'Image::ExifTool::Composite' => {
        Myname2 => {
               Require => 'Keywords',
               ValueConv => q{
                   my @list = ref $val eq 'ARRAY' ? @$val : ($val);
                   s/_\w+\.pdf//i foreach @list;
                   return \@list;
               },
        },
    },
);


- Phil
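
To see what this list handling does without running ExifTool, here is a minimal Python sketch of the same logic: treat a single value as a one-item list, strip the suffix from each item, and return the list. The name `clean_keywords` is hypothetical; the regular expression is the one from the config file.

```python
import re
from typing import List, Union

# Same pattern as s/_\w+\.pdf//i in the config file.
SUFFIX = re.compile(r"_\w+\.pdf", re.IGNORECASE)


def clean_keywords(val: Union[str, List[str]]) -> List[str]:
    """Strip the file-name tail from every keyword, whether one or many."""
    # One keyword arrives as a plain string, several as a list
    # (the Perl code makes the same distinction with "ref $val eq 'ARRAY'").
    items = list(val) if isinstance(val, list) else [val]
    return [SUFFIX.sub("", item, count=1) for item in items]


print(clean_keywords(["Porto", "Editora:", "CESPI-REC_20000000_TXT.pdf"]))
```

Items that do not contain the "_something.pdf" tail, such as "Porto", pass through unchanged.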

seraosr

No mistakes Phil. Your help is priceless.

I get this message, and the PDF isn't processed.

C:\EXIF>exiftool "-keywords+<Myname2" C:\EXIF\PDFS
Warning: No writable tags set from C:/EXIF/PDFS/CESPI-REC_20000000_TXT.pdf
    1 directories scanned
    0 image files updated
    1 image files unchanged

Is there anything I'm missing, or is the PDF now unreadable?

Regards and thanks

Phil Harvey

You should drop the "+" from the command or you will duplicate all of the keywords.

This should work provided your config file is properly activated.

What is the output of this command?:

exiftool -keywords -myname2 C:\EXIF\PDFS

If Keywords exists but Myname2 doesn't, then the config file isn't activated.  See FAQ number 11 for help with this.

- Phil
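
The check described above can also be scripted. The following is only a sketch, assuming the plain "Label : value" text output that exiftool prints by default, and assuming the composite tag is labelled "Myname 2" as in the outputs quoted in this thread. The helper names `has_tag` and `diagnose` and the sample text are made up for illustration.

```python
def has_tag(output: str, label: str) -> bool:
    """Return True if a 'Label : value' line with this label is in the output."""
    for line in output.splitlines():
        name, sep, _value = line.partition(":")
        if not sep:
            continue
        name = name.strip()
        if name.startswith("["):
            # Drop a leading group such as "[PDF]" printed by -G1.
            name = name.split("]", 1)[-1].strip()
        if name == label:
            return True
    return False


def diagnose(output: str, composite_label: str = "Myname 2") -> str:
    """Apply the rule: Keywords present but composite tag absent means no config."""
    if not has_tag(output, "Keywords"):
        return "Keywords missing from the file"
    if not has_tag(output, composite_label):
        return "config file not activated"
    return "config file activated"


# Hypothetical output in which the composite tag does not appear.
sample = (
    "======== C:/EXIF/PDFS/CESPI-REC_20000000_TXT.pdf\n"
    "Keywords                        : Porto, Editora:, CESPI-REC_20000000_TXT.pdf\n"
    "    1 directories scanned\n"
)
print(diagnose(sample))
```

This only automates reading the output; fixing an inactive config file is still a matter of following FAQ number 11.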

seraosr

Hi Phil,
I have definitely dropped the "+". My mistake: in my last post I wrote "+".
I think the keywords are duplicated because the PDF file has them written in the Keywords field and ExifTool writes to the XMP-dc keyword.

The output of this command, exiftool -keywords -myname2 C:\EXIF\PDFS, is:

C:\EXIF>exiftool -keywords -myname2 C:\EXIF\PDFS

======== C:/EXIF/PDFS/CESPI-REC_20000000_TXT.pdf
Keywords                        : Porto, Editora:, CESPI-REC_20000000_TXT.pdf
Myname 2                        : Porto, Editora:, CESPI-REC
    1 directories scanned
    1 image files read


I think the config file is activated, or if you prefer I can use "exiftool -config .ExifTool_config". I did the test with "print "LOADED!\n";" and the output is "LOADED".

Attached is my folder "EXIF". Please unzip it and paste it on "c:\". Please kindly execute the command.

Regards and Thanks
Many thanks

Phil Harvey

So your config file is activated OK.  I tried this config file with the command and your sample PDF on my Mac and it worked fine.  It should work fine on Windows too, but I don't have a Windows system that I can try this on now.

You could try adding -v3 to the command with "-keywords<myname2" to see if it gives any hint about what is going on.

- Phil

seraosr

Hi Phil,
I have a Mac as well. I followed the FAQ and configured it on the Mac, but it isn't working there either.
I think it isn't writing back to the PDF. Can you please analyze?

bash-3.2# cat ~/.ExifTool_config
%Image::ExifTool::UserDefined = (
    'Image::ExifTool::Composite' => {
        Myname2 => {
               Require => 'Keywords',
               ValueConv => q{
                   my @list = ref $val eq 'ARRAY' ? @$val : ($val);
                   s/_\w+\.pdf//i foreach @list;
                   return \@list;
               },
        },
    },
);
bash-3.2# exiftool -keywords -myname2 /Users/rsoares/Desktop/EXIF/PDFS/ 
======== /Users/rsoares/Desktop/EXIF/PDFS/CESPI-REC_20000000_TXT.pdf
Keywords                        : Porto, Editora:, CESPI-REC_20000000_TXT.pdf
Myname 2                        : Porto, Editora:, CESPI-REC
    1 directories scanned
    1 image files read
bash-3.2#


Thanks again.

Phil Harvey

Can you paste the output of this command?:

exiftool "-keywords<myname2" -v3 /Users/rsoares/Desktop/EXIF/PDFS/

- Phil

seraosr

bash-3.2# exiftool -keywords -myname2 /Users/rsoares/Desktop/EXIF/PDFS/ 
======== /Users/rsoares/Desktop/EXIF/PDFS/CESPI-REC_20000000_TXT.pdf
Keywords                        : Porto, Editora:, CESPI-REC_20000000_TXT.pdf
Myname 2                        : Porto, Editora:, CESPI-REC
    1 directories scanned
    1 image files read
bash-3.2# exiftool "-keywords<myname2" -v3 /Users/rsoares/Desktop/EXIF/PDFS/
======== /Users/rsoares/Desktop/EXIF/PDFS/CESPI-REC_20000000_TXT.pdf
Setting new values from /Users/rsoares/Desktop/EXIF/PDFS/CESPI-REC_20000000_TXT.pdf
Writing PDF:Keywords if tag exists
Writing MIE-Doc:Keywords
Writing XMP-acdsee:Keywords if tag exists
Writing IPTC:Keywords
Writing PDF:Keywords if tag exists
Writing MIE-Doc:Keywords
Writing XMP-acdsee:Keywords if tag exists
Writing IPTC:Keywords
Writing PDF:Keywords if tag exists
Writing MIE-Doc:Keywords
Writing XMP-acdsee:Keywords if tag exists
Writing IPTC:Keywords
Writing PostScript:Keywords
Writing XMP-xmp:Keywords if tag exists
Writing XMP-pdf:Keywords if tag exists
Rewriting /Users/rsoares/Desktop/EXIF/PDFS/CESPI-REC_20000000_TXT.pdf...
    - PDF:Keywords = 'Porto'
    - PDF:Keywords = 'Editora:'
    - PDF:Keywords = 'CESPI-REC_20000000_TXT.pdf'
    + PDF:Keywords = 'Porto'
    + PDF:Keywords = 'Editora:'
    + PDF:Keywords = 'CESPI-REC'
  Rewriting XMP
  Warning = [minor] Ignored empty rdf:Bag list for dc:creator
    - XMP-pdf:Keywords = 'Porto Editora: CESPI-REC_20000000_TXT.pdf'
    + XMP-pdf:Keywords = 'Porto, Editora:, CESPI-REC'
Warning: [minor] Ignored empty rdf:Bag list for dc:creator - /Users/rsoares/Desktop/EXIF/PDFS/CESPI-REC_20000000_TXT.pdf
    1 directories scanned
    1 image files updated
bash-3.2#

Phil Harvey

OK, so this seems to have worked.  (the file was updated and the XMP and PDF Keywords were written.)

Now, on the updated PDF file, what does this command give?:

exiftool -a -G1 -keywords /Users/rsoares/Desktop/EXIF/PDFS/

- Phil

seraosr

bash-3.2# exiftool -a -G1 -keywords /Users/rsoares/Desktop/EXIF/PDFS/
======== /Users/rsoares/Desktop/EXIF/PDFS/CESPI-REC_20000000_TXT.pdf
[PDF]           Keywords                        : Porto, Editora:, CESPI-REC
[XMP-pdf]       Keywords                        : Porto, Editora:, CESPI-REC
    1 directories scanned
    1 image files read
bash-3.2#


It's almost there, Phil, almost.

The keyword added is: "Porto, Editora:, CESPI-REC". So, 3 keywords.
The keyword that I need to be added is: "Porto Editora: CESPI-REC"

When the file is cataloged in our DAM system, the result is 3 categories, one for each of the 3 keywords added. The PDF is cataloged under all 3 categories, so there are 3 records.

Categories Tree:

Porto (1)
Editora (1)
CESPI-REC (1)

What we need is 1 record cataloged under the subcategory CESPI-REC of the main category "Porto Editora".

Categories Tree:

->Porto Editora
                     > CESPI-REC (1).

Phil Harvey

I don't understand.  If you just tell me the keywords in the original file and the keywords you want in the final file, then I can help.

- Phil

seraosr

Keywords in the original file: "Porto Editora: CESPI-REC_20000000_TXT.pdf" (one keyword)
Keywords that I want in the final file: "Porto Editora: CESPI-REC" (one keyword)

But I want "Porto Editora: CESPI-REC" to be one keyword only, not three.

This is what I get:
    + XMP-pdf:Keywords = 'Porto, Editora:, CESPI-REC'

This is what I want (without commas, one keyword only):
    + XMP-pdf:Keywords = 'Porto Editora: CESPI-REC'

Thanks

Phil Harvey

I see the problem.  The PDF keywords are split up into separate keywords by exiftool because these are often stored as a simple string.  So all you need to do is force ExifTool to use the XMP:Keywords instead of the PDF:Keywords in your config file by adding "XMP:" to the Require'd tag name:

%Image::ExifTool::UserDefined = (
    'Image::ExifTool::Composite' => {
        Myname2 => {
               Require => 'XMP:Keywords',
               ValueConv => q{
                   my @list = ref $val eq 'ARRAY' ? @$val : ($val);
                   s/_\w+\.pdf//i foreach @list;
                   return \@list;
               },
        },
    },
);
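
The difference between the two sources can be sketched in Python. This is only an illustration: splitting on whitespace is an assumption chosen to reproduce the three items shown in the outputs above, not a description of how ExifTool actually splits PDF keywords, and the helper names are hypothetical.

```python
import re
from typing import List

# Same pattern as s/_\w+\.pdf//i in the config file.
SUFFIX = re.compile(r"_\w+\.pdf", re.IGNORECASE)


def strip_suffix(keyword: str) -> str:
    return SUFFIX.sub("", keyword, count=1)


def from_pdf_keywords(raw: str) -> List[str]:
    # Assumption: whitespace split, only to mimic the three items seen in the thread.
    return [strip_suffix(item) for item in raw.split()]


def from_xmp_keywords(raw: str) -> List[str]:
    # The XMP value is kept as one string, so it stays one keyword.
    return [strip_suffix(raw)]


raw = "Porto Editora: CESPI-REC_20000000_TXT.pdf"
print(from_pdf_keywords(raw))  # three separate keywords
print(from_xmp_keywords(raw))  # a single keyword
```

Starting from the single XMP string is what keeps "Porto Editora: CESPI-REC" together as one keyword.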

seraosr

No. You saw the solution. Thank you very much for your help. It wasn't easy. Good to have Mac OS X (Snow Leopard).

By the way, I've tested on Windows and it doesn't work, even with these last changes.

ExifTool is very powerful; it would be amazing if the syntax were less complex.

Thank you very much for your hard work. Really!

Phil Harvey

There should be no difference in Windows as long as the config file is properly activated.

What is the result with the -v3 option when you try this in Windows?

- Phil

seraosr

Phil, as I told you before, and as you saw from the result in my previous post, the config file is properly activated.

C:\Windows\system32>exiftool "-keywords<myname2" -v3 C:\EXIF\PDFS\
======== C:/EXIF/PDFS/CESPI-REC_20000000_TXT.pdf
Setting new values from C:/EXIF/PDFS/CESPI-REC_20000000_TXT.pdf
Warning: No writable tags set from C:/EXIF/PDFS/CESPI-REC_20000000_TXT.pdf
Nothing changed in C:/EXIF/PDFS/CESPI-REC_20000000_TXT.pdf

    1 directories scanned
    0 image files updated
    1 image files unchanged


Phil Harvey

This is exactly what would happen if either the config file isn't active or XMP:Keywords doesn't exist.

What do you get with this command?:

exiftool -a -G1 -keywords -myname2 C:\EXIF\PDFS\

- Phil

seraosr

This is the result Phil.

C:\EXIF>exiftool -a -G1 -keywords -myname2 C:\EXIF\PDFS\
======== C:/EXIF/PDFS/CESPI-REC_20000000_TXT.pdf
[PDF]           Keywords                        : Porto, Editora:, CESPI-REC_20000000_TXT.pdf
[XMP-pdf]       Keywords                        : Porto Editora: CESPI-REC_20000000_TXT.pdf
    1 directories scanned
    1 image files read

Phil Harvey

OK.  XMP:Keywords exists but MyName2 doesn't, so the proper config file isn't activated.  You had this working in your reply #19, so something must have changed.

This is back to FAQ number 11 to figure out what is wrong.

- Phil
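
One way to rule out config-file location problems is to name the config file explicitly with the -config option mentioned earlier in the thread; exiftool requires -config to come before all other arguments. The sketch below only builds the argument list. The function name `build_command` and both paths are assumptions for illustration, so adjust them to wherever the files really are.

```python
from typing import List


def build_command(config_path: str, target: str) -> List[str]:
    """Build an exiftool call that names the config file explicitly."""
    # -config has to be the first option on the exiftool command line.
    return ["exiftool", "-config", config_path, "-keywords<myname2", "-v3", target]


# Hypothetical locations; not taken from a verified setup.
cmd = build_command(r"C:\EXIF\.ExifTool_config", r"C:\EXIF\PDFS")
print(cmd)
# To actually run it: import subprocess; subprocess.run(cmd, check=False)
```

Passing the arguments as a list, without a shell, also avoids the "<" being treated as a redirection.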