Win Bat Script - Filename.pdf + filename.xml into Exiftool?

Started by Stephen Marsh, October 05, 2016, 04:29:58 AM


Stephen Marsh

Let's say I have a PDF:
Filename.pdf

and I have an XML file with the same name:
Filename.xml

Both of which are in a given directory... Is there a Windows batch script or other method that can be used to have ExifTool automatically populate specific information from the XML file into the matching PDF file (xpath I guess)? I would of course have to strip out the XML tags to leave only the required value.

This would be processing multiple PDF/XML file pairs. Both filenames are unknown in advance, most likely a string of digits such as 54321.pdf + 54321.xml or 123.pdf + 123.xml. It would also most likely be run from a scheduled task, or from other software that monitors a directory and then launches a script.

On the Mac I could set up variables via Automator or AppleScript for the file input and the tags, however I am not sure where to start in Windows.

Any links or other pointers would be greatly appreciated.

Phil Harvey

Hi Stephen,

This is easy using the ExifTool -tagsFromFile feature.  The first step is to figure out what tags you want to copy and where, then the command is:

exiftool -tagsfromfile %d%f.xml "-DSTTAG1<SRCTAG1" "-DSTTAG2<SRCTAG2" ... -ext pdf DIR

See FAQ 2 for determining tag names, and the application documentation for complete details about the -tagsFromFile option.

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

Stephen Marsh

Thank you Phil,

The XML is from a different system, it contains "foreign formatted" XML data, not ExifTool formatted XML data.

So for example, if the XML database information had the author/creator as:

<CUSTNAME>Client Name Here</CUSTNAME>

Then I presume that one would need to strip out the XML tags...

However, how do we specify the XML xpath? Such as:

/ORDERS/ORDER/SPECIFICATION/CUSTNAME

Or do we just perform some sort of Python magic to do an on the fly search/replace regex of <CUSTNAME>Client Name Here</CUSTNAME> into a pseudo metadata tag that would be recognised and handled seamlessly so that the final populated data in the PDF was Client Name Here?
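
For readers who want to see what the xpath above would pull out before involving ExifTool at all, here is a minimal Python sketch. It is not something proposed in this thread: the sample document is hypothetical and simply assumes the nesting ORDERS > ORDER > SPECIFICATION > CUSTNAME implied by the path in the post.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample that mirrors the path /ORDERS/ORDER/SPECIFICATION/CUSTNAME.
SAMPLE_XML = """<ORDERS>
  <ORDER>
    <SPECIFICATION>
      <CUSTNAME>Client Name Here</CUSTNAME>
    </SPECIFICATION>
  </ORDER>
</ORDERS>"""


def read_custname(xml_text):
    """Return the text of ORDER/SPECIFICATION/CUSTNAME, or None if it is absent."""
    root = ET.fromstring(xml_text)  # root is the <ORDERS> element
    # ElementTree paths are relative to the root, so ORDERS itself is omitted.
    node = root.find("ORDER/SPECIFICATION/CUSTNAME")
    if node is None or not node.text:
        return None
    return node.text.strip()


print(read_custname(SAMPLE_XML))
```

The parser returns only the element text, so no separate step is needed to strip the surrounding XML tags.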

Phil Harvey

Read the FAQ I mentioned, and try the command from it on your XML file.

- Phil

Stephen Marsh

Again, thank you Phil. I did read the FAQ and have read it again. I am still none the wiser. :] I know the destination tags, they are standard metadata tags, such as -author

What I am not getting is how to specify the source, as they are not tags, they are just XML entries among a whole lot of other XML entries:

<CUSTNAME>Client Name Here</CUSTNAME>

I would need to transform the XML entry above into a tag that ExifTool understands, or am I not understanding again?

Thank you for your patience Phil.

StarGeek

I'm just guessing, but try something like

exiftool -tagsfromfile source.xml "-author<CUSTNAME" target.jpg

I believe exiftool can read properly formed XML files.  Add in tags from file and tag redirection and you should be able to come up with something.

...

A quick test with a random xml file on my computer (something from Civ V) and I was able to copy a value from the xml to a jpg with a command like that above.
"It didn't work" isn't helpful. What was the exact command used and the output.
Read FAQ #3 and use that cmd
Please use the Code button for exiftool output

Please include your OS/Exiftool version/filetype

Stephen Marsh

Thank you StarGeek, incremental progress...

exiftool -tagsfromfile "C:\Users\Administrator\Desktop\ExifTool XML & PDF Test\1237.xml" "-author<CUSTNAME" "C:\Users\Administrator\Desktop\ExifTool XML & PDF Test\1237.pdf"

I tried your suggestion with the following result:

Warning: [minor] Error 3 placing ::ordersOrderOrderitemsOrderitemTax in structure or list -
C:\Users\Administrator\Desktop\ExifTool XML & PDF Test\1237.xml
Warning: No writable tags set from C:\Users\Administrator\Desktop\ExifTool XML &
PDF Test\1237.xml
    0 image files updated
    1 image files unchanged


So it appears that ExifTool can't understand the source tag.

The "xpath" to the data in question should be:

/ORDERS/ORDER/SPECIFICATION/CUSTNAME

Not:

ordersOrderOrderitemsOrderitemTax

Phil Harvey

Hi Stephen,

What is the exact command that you used?

From the error message you are trying to write the XML file, so there is something wrong with your command.

Also

Quote from: Stephen Marsh on October 06, 2016, 04:37:37 PM
Again, thank you Phil. I did read the FAQ and have read it again. I am still none the wiser. :] I know the destination tags, they are standard metadata tags, such as -author

I said to run the command from FAQ 2 on your XML file (to find the names of the source tags).  I think your biggest problem now is confusing which is the source file and which is the destination.

- Phil

Stephen Marsh

Thank you for your patience Phil, it is not for nothing in the end!

OK, baby steps.

No, I was not running the command on the "foreign" XML file, I totally misunderstood and thought that this was to find the destination tag, not the source tag... So:

exiftool -s "C:/Users/Administrator/Desktop/ExifTool XML & PDF Test/1237.xml"

Resulted in "most" of the tags being output, EXCEPT the tag that I was interested in.

So I just used the tag from the same area of the XML file and changed it accordingly.

exiftool -tagsfromfile %d%f.xml "-author<OrdersOrderSpecificationCustname"  -ext pdf "C:\Users\Administrator\Desktop\ExifTool XML & PDF Test"


Which then resulted in the following output:

Warning: [minor] Error 3 placing ::ordersOrderOrderitemsOrderitemTax in structure or list -
C:/Users/Administrator/Desktop/ExifTool XML & PDF Test/1237.xml
    1 directories scanned
    1 image files updated


And yes, the PDF does indeed have the XML entry added to the Author field!

I will need to kick this round some more with deeper tests, however I am very happy with this first successful result and again I truly appreciate your perseverance.
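
The tag name that worked here looks like the xpath with each element name capitalised and the slashes removed. The Python sketch below only illustrates that observed pattern; the helper name is hypothetical and the rule is inferred from this one example, not taken from the ExifTool documentation. The authoritative names are whatever `exiftool -s` prints for the XML file.

```python
def exiftool_style_name(xpath):
    """Guess a flattened tag name from an XML path (pattern inferred from this thread)."""
    parts = [p for p in xpath.split("/") if p]  # drop the empty leading segment
    # Capitalise the first letter of each element name and lower-case the rest.
    return "".join(p[:1].upper() + p[1:].lower() for p in parts)


print(exiftool_style_name("/ORDERS/ORDER/SPECIFICATION/CUSTNAME"))
```

Treat the result as a starting point for searching the `-s` output rather than a guaranteed tag name.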

Phil Harvey

If you can attach the XML file to a post and tell me what information is missing from the ExifTool output, I can help figure this out.

- Phil

Stephen Marsh

Thank you Phil,

I have compared the output of the XML file using -a -g1 -s on both Mac and Win using the same current version of ExifTool and there are some differences in the output. The Win output was missing the tag for the Custname that I was looking for, while on the Mac version it was there.

I have attached an XML file and the text output from the Win and Mac in an .zip archive.

Thank you for taking a look.

I'll try to look a bit deeper into the entire workflow now that I have the basics working (again thank you), I still need to create a .bat file and fire it up via scheduled tasks.

EDIT: OK, the .bat file only required simple modification from the standard CLI input (adding .exe and extra % characters):

exiftool.exe -tagsfromfile %%d%%f.xml "-author<OrdersOrderSpecificationCustname" -ext pdf "C:\Users\Administrator\Desktop\ExifTool XML & PDF Test"
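
As a rough illustration of the two changes mentioned in the edit above, the Python sketch below builds the same argument list and renders it as a line suitable for a .bat file, doubling each % and quoting arguments that contain characters cmd.exe would otherwise interpret. The function names are hypothetical, and the quoting is deliberately simple (it does not handle arguments that themselves contain double quotes).

```python
def build_exiftool_args(directory, dst_tag="author",
                        src_tag="OrdersOrderSpecificationCustname"):
    """Return the argument list for the -tagsfromfile command used in this thread."""
    return ["exiftool", "-tagsfromfile", "%d%f.xml",
            f"-{dst_tag}<{src_tag}", "-ext", "pdf", directory]


def to_bat_line(args):
    """Render the arguments as one .bat line: quote where needed, double every %."""
    special = set(' <>&|^')  # characters cmd.exe would treat specially
    rendered = []
    for arg in args:
        if any(ch in special for ch in arg):
            arg = f'"{arg}"'  # assumption: the argument has no embedded quotes
        rendered.append(arg)
    # Inside a batch file a literal % has to be written as %%.
    return " ".join(rendered).replace("%", "%%")


args = build_exiftool_args(r"C:\Users\Administrator\Desktop\ExifTool XML & PDF Test")
print(to_bat_line(args))
```

If the list were passed straight to `subprocess.run(args)` instead of being written into a .bat file, no shell would be involved, so neither the quoting nor the % doubling should be needed.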

Phil Harvey

Quote from: Stephen Marsh on October 07, 2016, 07:03:30 PM
I have compared the output of the XML file using -a -g1 -s on both Mac and Win using the same current version of ExifTool and there are some differences in the output. The Win output was missing the tag for the Custname that I was looking for, while on the Mac version it was there.

Ah.  It appears that you copied and pasted from the command window, and I think the window buffer size is too small and the top of the Windows output was lost.  Try redirecting the output to a file:

exiftool -a -g1 -s FILE > out.txt

Also, you may want to change the properties of "exiftool(-k).exe" in Windows so that it uses a command window with a larger scrollback buffer.  (I know this can be done, but I can't give any specifics right now.)

- Phil

Stephen Marsh

Yes, it was a copy and paste job! I did not know that this may be truncated in Windows, I must be spoilt by mostly using Terminal! OK, I'll use text dumps next time so as not to be caught out, thanks.

P.S. What if the source XML and the destination were in different directories? And one final twist: what if the filenames were not exactly the same, such as 1234.xml and 1234_artwork.pdf? The destination PDF would have the same prefix number followed by an underscore delimiter.

StarGeek

Quote from: Phil Harvey on October 08, 2016, 09:41:50 AM

Ah.  It appears that you copied and pasted from the command window, and I think the window buffer size is too small and the top of the Windows output was lost.  Try redirecting the output to a file:

To redirect directly to the clipboard, add |clip to the end of the command.  I believe the clip program is available on Vista and up, but can be downloaded and added to XP.

exiftool -a -g1 -s FILE |clip

Phil Harvey

@StarGeek: "clip". Cool.

Quote from: Stephen Marsh on October 08, 2016, 06:59:37 PM
P.S. What if the source XML and the destination were in different directories? And one final twist: what if the filenames were not exactly the same, such as 1234.xml and 1234_artwork.pdf? The destination PDF would have the same prefix number followed by an underscore delimiter.

If the directories are different, then replace %d in the -tagsFromFile argument with the source directory name.  If the file names are different, then replace %f with the source file name, but then you can't specify an entire DIR, and must do one file at a time (unless there is a consistent difference between the two, in which case you could do something like SRCDIR/%-.8f.xml to remove the last 8 characters from the file name).

- Phil
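
For the case where the suffix is not always the same length, a script could work out the matching XML path per PDF and then call ExifTool one file at a time. The Python sketch below is one possible way to do the matching, not the approach described in the post above: it cuts the name at the first underscore instead of trimming a fixed 8 characters, and the function name and directory names are hypothetical.

```python
from pathlib import Path


def source_xml_for(pdf_path, xml_dir):
    """Map e.g. 1234_artwork.pdf to <xml_dir>/1234.xml using the underscore delimiter."""
    stem = Path(pdf_path).stem          # "1234_artwork"
    prefix = stem.partition("_")[0]     # everything before the first underscore
    return Path(xml_dir) / f"{prefix}.xml"


print(source_xml_for("pdfs/1234_artwork.pdf", "xml"))
```

A PDF with no underscore in its name maps to an XML file with the same base name, so the plain 1234.pdf + 1234.xml pairs from earlier in the thread still match.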