Rename PDF with Title metadata

Started by mr80h, November 03, 2012, 07:19:03 PM


mr80h

So I am a complete noob - I've barely even gotten my toes wet with the command line, so all of this is a bit overwhelming. However, I'm trying to organize a PDF library by renaming files according to their Title metadata. So far, "-filename<%f_$title.%e" DIR seems to do the trick for many of the files, but it fails on certain others. I assume this has something to do with characters in the Title field that are not acceptable in a filename. When I run the command on a directory, I get one error about a particular file with a ':' in its title, but this is not the only file that seems to fail. In fact, when I remove this file from the directory and run the command again, I get no errors, but a number of the files don't get their .pdf extensions and, what's more, show up as 0KB files.

What can I do to 1. make sure this command names the files without unacceptable characters, and 2. fix the issue with files losing all their data?

Thanks for your patience with a beginner!

Phil Harvey

I must admit, I do not understand how the standard "rename" library function in Windows can erase the contents of a file (ExifTool uses this function to rename the file).  This certainly doesn't happen in other operating systems, but this is the 2nd time I have seen someone report this so it must be happening. (Here is the first.)  It seems the problem is related to having colons in the file name.

Here is a config file that will remove the illegal characters from Title:

%Image::ExifTool::UserDefined = (
    'Image::ExifTool::Composite' => {
        MyTitle => {
            Require => 'Title',
            # translate characters which are illegal in Windows file names to underscores
            # (the four backslashes end up as one literal backslash in the tr search list)
            ValueConv => '$val =~ tr{/\\\\?*:|"<>}{_}; $val',
        },
    },
);


With this config file active, you can use MyTitle instead of Title to get a filtered version of the title string, for example "-filename<%f_$mytitle.%e" DIR.  See the sample config file for instructions on activating this file.
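
To preview what this kind of translation does to a given title, here is a minimal sketch of the same character mapping in Python. It is an illustration only and not part of ExifTool; the name `sanitize_title` is made up for this example, and the sketch assumes the backslash should be replaced too, since Windows treats it as a path separator.

```python
# Characters that Windows does not allow in file names.
# (Assumption for this sketch: the backslash is included in the list.)
ILLEGAL = '/\\?*:|"<>'

# Map every illegal character to an underscore.
_TABLE = str.maketrans(ILLEGAL, "_" * len(ILLEGAL))


def sanitize_title(title: str) -> str:
    """Return title with characters illegal in Windows file names replaced by '_'."""
    return title.translate(_TABLE)


if __name__ == "__main__":
    # A title containing a colon, quotes, a slash and a question mark.
    print(sanitize_title('Report: "Q3/Q4" results?'))
```

Spaces and other legal characters are left alone, so a title with no illegal characters comes back unchanged.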

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

Phil Harvey

I was able to reproduce this problem.  Wow.  I do NOT understand Windows at all:

C:\>dir tmp
Volume in drive C has no label.
Volume Serial Number is C8F0-D326

Directory of C:\tmp

04/11/2012  08:25 AM    <DIR>          .
04/11/2012  08:25 AM    <DIR>          ..
04/01/2006  02:02 PM             1,373 FujiFilm.jpg
25/07/2009  06:01 AM             8,907 PDF.pdf
               2 File(s)         10,280 bytes
               2 Dir(s)   6,995,259,392 bytes free

C:\>exiftool tmp -filename=00:0000 -v3
Writing File:FileName
======== tmp/FujiFilm.jpg
'tmp/FujiFilm.jpg' --> 'tmp/00:0000'
    + FileName = 'tmp/00:0000'
======== tmp/PDF.pdf
'tmp/PDF.pdf' --> 'tmp/00:0000'
    + FileName = 'tmp/00:0000'
    1 directories scanned
    2 image files updated

C:\>dir tmp
Volume in drive C has no label.
Volume Serial Number is C8F0-D326

Directory of C:\tmp

04/11/2012  08:25 AM    <DIR>          .
04/11/2012  08:25 AM    <DIR>          ..
25/07/2009  06:01 AM                 0 00
               1 File(s)              0 bytes
               2 Dir(s)   6,995,263,488 bytes free

C:\>dir tmp\00:0000
Volume in drive C has no label.
Volume Serial Number is C8F0-D326

Directory of C:\tmp

File Not Found


but if I look at the same directory from Cygwin (which I have running on my Windows PC):

> ls -l tmp
total 0
-rwxr-xr-x    1 phil     None            0 Jul 25  2009 00

> ls -l tmp/00:0000
-rwxr-xr-x    1 phil     None         8907 Jul 25  2009 tmp/00:0000


The file actually exists somewhere, but it doesn't show up in the directory listing (except from Cygwin when I specify the file name).  I can read the metadata from "tmp/00:0000" using exiftool in Cygwin (it is the PDF file -- the FujiFilm file was lost), but not in Windows.

I stepped through the code to see what was happening.  The library call to "rename" isn't the problem as I suggested before.  The rename does fail as it should, but then exiftool falls back to copying the file instead, and the "open" call to create the new file succeeds.  Also, there are no errors writing to the new file or closing it (I'm a bit anal about checking error return codes).  So I don't see any easy way to tell that something funny has happened.  It would be best if I could add a patch to exiftool to prevent this, but right now I can't see how.

So until I can figure out a better solution, my advice is:  Be careful of your file names when using the rename feature in Windows.
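
As one possible precaution, a script that does its own renaming could reject a target name before any rename or copy is attempted. The following Python sketch shows the idea; it is not a patch to ExifTool, and `is_safe_windows_name` and `careful_rename` are hypothetical helpers invented for this example. It also refuses to overwrite an existing file, since in the session above both files were sent to the same target name.

```python
import os

# Characters that Windows does not allow in file names.
ILLEGAL_IN_WINDOWS_NAMES = set('/\\?*:|"<>')


def is_safe_windows_name(name: str) -> bool:
    """True if name (a bare file name, no directory part) is non-empty
    and contains none of the illegal characters."""
    return bool(name) and not (set(name) & ILLEGAL_IN_WINDOWS_NAMES)


def careful_rename(src: str, new_name: str) -> str:
    """Rename src to new_name in the same directory.

    Raises ValueError for an unsafe name and FileExistsError if the
    target already exists, so nothing is touched in either case.
    """
    if not is_safe_windows_name(new_name):
        raise ValueError(f"unsafe file name: {new_name!r}")
    dst = os.path.join(os.path.dirname(src), new_name)
    if os.path.exists(dst):
        raise FileExistsError(dst)
    os.rename(src, dst)
    return dst
```

With this check, a call such as `careful_rename("tmp/PDF.pdf", "00:0000")` raises `ValueError` and leaves the original file in place.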

- Phil