Win Bat Script - Filename.pdf + filename.xml into Exiftool?

Started by Stephen Marsh, October 05, 2016, 04:29:58 AM

Stephen Marsh

Let's say I have a PDF:
Filename.pdf

and I have an XML file with the same name:
Filename.xml

Both of which are in a given directory... Is there a Windows batch script or other method that can be used to have ExifTool automatically populate specific information from the XML file into the matching PDF file (xpath I guess)? I would of course have to strip out the XML tags to leave only the required value.

This would process multiple PDF/XML file pairs, with the shared filename being an unknown variable, most likely a string of digits (such as 54321.pdf + 54321.xml or 123.pdf + 123.xml). It would most likely be run from a scheduled task, or from other software that monitors a directory and then launches a script.

On the Mac I could set up variables via Automator or AppleScript for the file input and the tags; however, I am not sure where to start in Windows.

Any links or other pointers would be greatly appreciated.

Phil Harvey

Hi Stephen,

This is easy using the ExifTool -tagsFromFile feature.  The first step is to figure out what tags you want to copy and where, then the command is:

exiftool -tagsfromfile %d%f.xml "-DSTTAG1<SRCTAG1" "-DSTTAG2<SRCTAG2" ... -ext pdf DIR

See FAQ 2 for determining tag names, and the application documentation for complete details about the -tagsFromFile option.

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

Stephen Marsh

Thank you Phil,

The XML is from a different system, it contains "foreign formatted" XML data, not ExifTool formatted XML data.

So for example, if the XML database information had the author/creator as:

<CUSTNAME>Client Name Here</CUSTNAME>

Then I presume that one would need to strip out the XML tags...

However, how do we specify the XML xpath? Such as:

/ORDERS/ORDER/SPECIFICATION/CUSTNAME

Or do we just perform some sort of Python magic to do an on the fly search/replace regex of <CUSTNAME>Client Name Here</CUSTNAME> into a pseudo metadata tag that would be recognised and handled seamlessly so that the final populated data in the PDF was Client Name Here?
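For reference, the "Python magic" route can be sketched with the standard library alone. This is only an illustration of the XPath idea, assuming the XML nests exactly as in the path above with ORDERS as the root element; the sample document and the function name read_custname are made up. The replies below show that ExifTool can read the XML directly, so this is not required for the task.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; only the element names come from the post above.
SAMPLE_XML = """<ORDERS>
  <ORDER>
    <SPECIFICATION>
      <CUSTNAME>Client Name Here</CUSTNAME>
    </SPECIFICATION>
  </ORDER>
</ORDERS>"""


def read_custname(xml_text):
    """Return the text of the first ORDER/SPECIFICATION/CUSTNAME element, or None."""
    root = ET.fromstring(xml_text)  # root is the <ORDERS> element
    # ElementTree paths are relative to the root, so /ORDERS is implicit.
    return root.findtext("ORDER/SPECIFICATION/CUSTNAME")


print(read_custname(SAMPLE_XML))
```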

Phil Harvey

Read the FAQ I mentioned, and try this on your XML file.

- Phil

Stephen Marsh

Again, thank you Phil. I did read the FAQ and have read it again. I am still none the wiser. :] I know the destination tags, they are standard metadata tags, such as -author

What I am not getting is how to specify the source, as they are not tags, they are just XML entries among a whole lot of other XML entries:

<CUSTNAME>Client Name Here</CUSTNAME>

I would need to transform the XML entry above into a tag that ExifTool understands, or am I not understanding again?

Thank you for your patience Phil.

StarGeek

I'm just guessing, but try something like

exiftool -tagsfromfile source.xml "-author<CUSTNAME" target.jpg

I believe exiftool can read properly formed XML files.  Add in tags from file and tag redirection and you should be able to come up with something.

...

A quick test with a random xml file on my computer (something from Civ V) and I was able to copy a value from the xml to a jpg with a command like that above.
* Did you read FAQ #3 and use the command listed there?
* Please use the Code button for exiftool code/output.
 
* Please include your OS, Exiftool version, and type of file you're processing (MP4, JPG, etc).

Stephen Marsh

Thank you StarGeek, incremental progress...

exiftool -tagsfromfile "C:\Users\Administrator\Desktop\ExifTool XML & PDF Test\1237.xml" "-author<CUSTNAME" "C:\Users\Administrator\Desktop\ExifTool XML & PDF Test\1237.pdf"

I tried your suggestion with the following result:

Warning: [minor] Error 3 placing ::ordersOrderOrderitemsOrderitemTax in structure or list -
C:\Users\Administrator\Desktop\ExifTool XML & PDF Test\1237.xml
Warning: No writable tags set from C:\Users\Administrator\Desktop\ExifTool XML &
PDF Test\1237.xml
    0 image files updated
    1 image files unchanged


So it appears that ExifTool can't understand the source tag.

The "xpath" to the data in question should be:

/ORDERS/ORDER/SPECIFICATION/CUSTNAME

Not:

ordersOrderOrderitemsOrderitemTax

Phil Harvey

Hi Stephen,

What is the exact command that you used?

From the error message you are trying to write the XML file, so there is something wrong with your command.

Also

Quote from: Stephen Marsh on October 06, 2016, 04:37:37 PM
Again, thank you Phil. I did read the FAQ and have read it again. I am still none the wiser. :] I know the destination tags, they are standard metadata tags, such as -author

I said to run the command from FAQ 2 on your XML file (to find the names of the source tags).  I think your biggest problem now is confusing which is the source file and which is the destination.

- Phil

Stephen Marsh

Thank you for your patience Phil, it is not for nothing in the end!

OK, baby steps.

No, I was not running the command on the "foreign" XML file, I totally misunderstood and thought that this was to find the destination tag, not the source tag... So:

exiftool -s "C:/Users/Administrator/Desktop/ExifTool XML & PDF Test/1237.xml"

Resulted in "most" of the tags being output, EXCEPT the tag that I was interested in.

So I just used the tag from the same area of the XML file and changed it accordingly.

exiftool -tagsfromfile %d%f.xml "-author<OrdersOrderSpecificationCustname"  -ext pdf "C:\Users\Administrator\Desktop\ExifTool XML & PDF Test"


Which then resulted in the following output:

Warning: [minor] Error 3 placing ::ordersOrderOrderitemsOrderitemTax in structure or list -
C:/Users/Administrator/Desktop/ExifTool XML & PDF Test/1237.xml
    1 directories scanned
    1 image files updated


And yes, the PDF does indeed have the XML entry added to the Author field!

I will need to kick this round some more with deeper tests, however I am very happy with this first successful result and again I truly appreciate your perseverance.
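The source tag that worked, OrdersOrderSpecificationCustname, looks like the element names along the XML path joined together, each one capitalized. A small Python sketch of that naming pattern follows. It is inferred only from the examples in this thread and is not ExifTool's actual code; the function name guess_tag_name is made up, and real tag names should always be confirmed with exiftool -s on the XML file.

```python
def guess_tag_name(xml_path):
    """Join the element names of an XML path, capitalizing each one.

    This only mimics the pattern seen in this thread; it is a guess,
    not ExifTool's implementation.
    """
    parts = [p for p in xml_path.split("/") if p]
    return "".join(p.capitalize() for p in parts)


print(guess_tag_name("/ORDERS/ORDER/SPECIFICATION/CUSTNAME"))
```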

Phil Harvey

If you can attach the XML file to a post and tell me what information is missing from the ExifTool output, I can help figure this out.

- Phil

Stephen Marsh

Thank you Phil,

I have compared the output of the XML file using -a -g1 -s on both Mac and Win using the same current version of ExifTool and there are some differences in the output. The Win output was missing the tag for the Custname that I was looking for, while on the Mac version it was there.

I have attached an XML file and the text output from the Win and Mac in an .zip archive.

Thank you for taking a look.

I'll try to look a bit deeper into the entire workflow now that I have the basics working (again thank you), I still need to create a .bat file and fire it up via scheduled tasks.

EDIT: OK, the .bat file only required simple modification from the standard CLI input (adding .exe and extra % characters):

exiftool.exe -tagsfromfile %%d%%f.xml "-author<OrdersOrderSpecificationCustname" -ext pdf "C:\Users\Administrator\Desktop\ExifTool XML & PDF Test"
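The extra % characters are needed because cmd.exe treats % as special inside a .bat file, so a literal % has to be written as %%. A rough Python sketch of that conversion is below; the function name to_bat_line is made up, and it naively doubles every %, so it would also break intentional batch variables such as %1 or %PATH%.

```python
def to_bat_line(cli_command):
    """Double every % so a .bat file passes a literal % through to the program."""
    return cli_command.replace("%", "%%")


cli = 'exiftool -tagsfromfile %d%f.xml "-author<OrdersOrderSpecificationCustname" -ext pdf DIR'
print(to_bat_line(cli))
```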

Phil Harvey

Quote from: Stephen Marsh on October 07, 2016, 07:03:30 PM
I have compared the output of the XML file using -a -g1 -s on both Mac and Win using the same current version of ExifTool and there are some differences in the output. The Win output was missing the tag for the Custname that I was looking for, while on the Mac version it was there.

Ah.  It appears that you copied and pasted from the command window, and I think the window buffer size is too small and the top of the Windows output was lost.  Try redirecting the output to a file:

exiftool -a -g1 -s FILE > out.txt

Also, you may want to change the properties of "exiftool(-k).exe" in Windows so that it uses a command window with a larger scrollback buffer.  (I know this can be done, but I can't give any specifics right now.)

- Phil

Stephen Marsh

Yes, it was a copy and paste job! I did not know that this may be truncated in Windows, I must be spoilt by mostly using Terminal! OK, I'll use text dumps next time so as not to be caught out, thanks.

P.S. What if the source XML and the destination were in different directories? And one final twist, what if the filenames were not exactly the same, such as 1234.xml and 1234_artwork.pdf, the destination PDF would have the same prefix number with an underscore delimiter.

StarGeek

Quote from: Phil Harvey on October 08, 2016, 09:41:50 AM

Ah.  It appears that you copied and pasted from the command window, and I think the window buffer size is too small and the top of the Windows output was lost.  Try redirecting the output to a file:

To redirect directly to the clipboard, add |clip to the end of the command.  I believe the clip program is available on Vista and up, but can be downloaded and added to XP.

exiftool -a -g1 -s FILE |clip

Phil Harvey

@StarGeek: "clip". Cool.

Quote from: Stephen Marsh on October 08, 2016, 06:59:37 PM
P.S. What if the source XML and the destination were in different directories? And one final twist, what if the filenames were not exactly the same, such as 1234.xml and 1234_artwork.pdf, the destination PDF would have the same prefix number with an underscore delimiter.

If the directories are different, then replace %d in the -tagsFromFile argument with the source directory name.  If the file names are different, then replace %f with the source file name, but then you can't specify an entire DIR, and must do one file at a time (unless there is a consistent difference between the two, in which case you could do something like SRCDIR/%-.8f.xml to remove the last 8 characters from the file name).

- Phil
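To illustrate what SRCDIR/%-.8f.xml resolves to, here is a Python sketch that mimics the substitution as described above: take the PDF's base name without its extension, drop the last 8 characters, and look for that name with an .xml extension in the source directory. The function name source_xml_for and the C:\pdf and C:\xml directories are hypothetical; the code only imitates the described behaviour and is not how ExifTool is implemented.

```python
from pathlib import PureWindowsPath


def source_xml_for(pdf_path, src_dir, chars_to_drop=8):
    """Mimic SRCDIR/%-.8f.xml: drop trailing characters from the base name."""
    stem = PureWindowsPath(pdf_path).stem  # e.g. "1234_artwork"
    trimmed = stem[:-chars_to_drop] if chars_to_drop else stem
    return str(PureWindowsPath(src_dir) / (trimmed + ".xml"))


# "_artwork" is exactly 8 characters, so 1234_artwork.pdf pairs with 1234.xml.
print(source_xml_for(r"C:\pdf\1234_artwork.pdf", r"C:\xml"))
```

This only works when the suffix always has the same length, which is the "consistent difference" condition mentioned above.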

Stephen Marsh

That worked great Phil. The regular expression bit is intriguing as that offers more flexibility in automatically processing an entire directory.

If for example the only constant was the leading numbers and then the underscore and other text, how would that work?

I am new to both ExifTool and Regular Expressions, however I know enough regex to hack this together:

^\d+ or ^(\d+)

or should I be adding forward slashes around that pattern?

/^\d+/

Which should select all digits up to the underscore:

1234_some_random_file-name.pdf
9876543210_another-file name.pdf

Or should I be trying to select the opposite, all characters after the digits (while retaining the file extension)? And if so, how? (?<=^\d+).+ does not work...

I have tried to use these regexes however I must be getting the perl syntax wrong as I have an error that my regex does not exist for a -tagsfromfile operation.

So rather than your example of truncating the trailing 8 characters off the PDF filename so that it matched the source XML filename, how would one just select the variable leading digits up until the underscore (which does not rename the output file, it is just a temporary rename while processing if I understand things correctly):

1234.pdf
9876543210.pdf

(which would then match the 1234.xml and 9876543210.xml files in the other specified directory)

Phil Harvey

Sorry, but the regular expressions can not be used in the argument to -tagsFromFile.  There are just a few basic exiftool-specific formatting codes that may be used here.  So I think you'll probably have to do it one file at a time in exiftool.

- Phil
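The "one file at a time" approach could be driven by a small script. The Python sketch below only builds one exiftool argument list per PDF, pairing each file with its XML by the leading digits before the underscore; it does not run anything. The function name build_commands and the directories are hypothetical, while the -tagsfromfile option and the "-author<OrdersOrderSpecificationCustname" argument are the ones used earlier in this thread.

```python
import re
from pathlib import PureWindowsPath

# Leading digits followed by an underscore, e.g. "1234_" in 1234_artwork.pdf.
LEADING_DIGITS = re.compile(r"^(\d+)_")


def build_commands(pdf_names, pdf_dir, xml_dir):
    """Return one exiftool argument list per PDF named like 1234_anything.pdf."""
    commands = []
    for name in pdf_names:
        match = LEADING_DIGITS.match(name)
        if not match or not name.lower().endswith(".pdf"):
            continue  # skip files that don't follow the naming convention
        xml_path = PureWindowsPath(xml_dir) / (match.group(1) + ".xml")
        pdf_path = PureWindowsPath(pdf_dir) / name
        commands.append([
            "exiftool",
            "-tagsfromfile", str(xml_path),
            "-author<OrdersOrderSpecificationCustname",
            str(pdf_path),
        ])
    return commands


for cmd in build_commands(["1234_some_random_file-name.pdf"], r"C:\pdf", r"C:\xml"):
    print(cmd)
```

Each list could then be handed to subprocess.run(cmd). Passing the arguments as a list bypasses cmd.exe, so the < character needs no extra quoting.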

Stephen Marsh

That is a shame Phil about no regex with -tagsFromFile. If the goal was to automate a directory, I was thinking:

1. Preserve the current filename in metadata
"-XMP-xmpMM:PreservedFileName<${filename}" -r

2. Batch rename using regex to remove everything after the digits
9876_filenametogo.pdf to 9876.pdf so that the PDF matches the XML file naming convention 9876.xml

3. Inject the required metadata from the XML files into the PDF files

4. If necessary, revert the PDF files back to their original name
"-FileName<XMP-xmpMM:PreservedFileName" -r

The various .bat files could be triggered to sequentially run from a scheduled task a minute apart from each other for example (steps 1 and 4 being optional).

Step 2 is where I am a little unsure...

Hayo Baan

For step two, it's been a while since I renamed stuff on the Windows command-line, but wouldn't something as simple as ren *.* ????.* work?
Hayo Baan – Photography
Web: www.hayobaan.nl

Stephen Marsh

That could be an option Hayo, the leading digits may not only be 4 digits though, which is why I was looking for an equivalent of a greedy regex for \d+

1234_some-random-file_name.pdf > 1234.pdf

9876543_anotherfilename.pdf > 9876543.pdf

The one constant is that there will be any length of digits followed by an underscore separator. All I want is the variable length digits and to strip the underscore and other characters, preserving the extension. The other constant is that the "paired" XML file only has the digits such as 1234.xml or 9876543.xml

I thought that I could just use ExifTool commands (as no -tagsFromFile is being used in that step) to either rename and/or move or copy the files from DIR-A to DIR-B with the altered regex filename, without altering any metadata at that step?

Hayo Baan

OK, I see. This can indeed be done with exiftool. For instance
exiftool -filename"<${filename;s/(\d+).*(\..*)/$1$2/;}" FILES
should do the trick.
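To see what that substitution does to a name before renaming real files, the same pattern can be tried in Python. This is just an equivalent of the Perl expression s/(\d+).*(\..*)/$1$2/ for experimenting; the function name strip_after_digits is made up.

```python
import re


def strip_after_digits(filename):
    # Keep the first run of digits (group 1) and the final extension (group 2);
    # the greedy .* in between swallows the underscore and the rest of the name.
    # count=1 matches Perl's s/// without the /g flag (first match only).
    return re.sub(r"(\d+).*(\..*)", r"\1\2", filename, count=1)


for name in ["1234_some-random-file_name.pdf", "9876543_anotherfilename.pdf"]:
    print(name, "->", strip_after_digits(name))
```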

Stephen Marsh

Quote from: Hayo Baan on October 10, 2016, 07:39:40 AM
OK, I see. This can indeed be done with exiftool. For instance
exiftool -filename"<${filename;s/(\d+).*(\..*)/$1$2/;}" FILES
should do the trick.


Thank you Hayo, that worked great!