recursive directory loops because of symlinks

Started by MichaelRath, January 06, 2011, 05:44:11 AM


MichaelRath

Dear Phil,

first of all, thanks to you and all the contributors for this absolutely wonderful program. It helps me a lot in organizing my >70000 pictures, and it is really exciting what variety of information can be found in EXIF data (e.g. camera temperature, lighting value...).

When recursively finding all pictures on all my hard disks, I ran into a problem that can be easily solved:

I use exiftool under Cygwin in Windows, and a recursive search led to directory loops because of Windows' use of symbolic links in "Documents and Settings".

I solved this by skipping symlinks in function ScanDir. Here's my version of the first few lines:

# Scan directory for image files
# Inputs: 0) ExifTool ref, 1) directory name, 2) list ref to return file names
sub ScanDir($$;$)
{
    my ($exifTool, $dir, $list) = @_;
    opendir(DIR_HANDLE, $dir) or Warn("Error opening directory $dir\n"), return;
    my @fileList = readdir(DIR_HANDLE);
    closedir(DIR_HANDLE);

    my $file;
    $dir =~ /\/$/ or $dir .= '/';
    foreach $file (@fileList) {
        my $path = "$dir$file";
        if (-d $path) {
            if (-l $path) {
                print STDERR "Ignoring link:", $path,"\n";
                next;
            }
            next if $file =~ /^\./; # ignore dirs starting with "."
            next if grep /^$file$/, @ignore;
            $recurse and ScanDir($exifTool, $path, $list);
            next;
        }


The only change is the if block starting with "if (-l $path) {".

I saved this version under a new filename ("exiftool-nosymlinks"), but it would probably be nice to have a command-line option (e.g. -dont-follow-symlinks) to switch this behaviour on and off, as it might be useful for others too...

Regards

Michael

Phil Harvey

Hi Michael,

Thanks for this report.  I hadn't considered this problem and I'll have to think about it and do some testing.  I'm thinking it may be better if I could just avoid processing the same directory twice, but I'm not sure how to make this determination.

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

MichaelRath

Dear Phil,

thanks for your fast reply. Checking whether the dir was already visited might be a very good way to prevent never-ending loops in recursion while still being able to follow symlinks (perhaps you can put all visited dirs in a hash and check for the existence of the path in it).

But it still might be useful to be able to disable the following of any symlinks, as I don't know if there is a platform-independent way in Perl to find out which physical directory you are in after accessing it via a symlink, and to use that for the check.

And Windows has a lot of links pointing to the same location:

On my 64-bit German-language system I have the following links reaching C:\Users\Public\Pictures\USA:

/cygdrive/c/Documents\ and\ Settings -> /cygdrive/c/Users
/cygdrive/c/Dokumente\ und\ Einstellungen -> /cygdrive/c/Users
/cygdrive/c/ProgramData/Documents -> /cygdrive/c/Users/Public/Documents
/cygdrive/c/ProgramData/Dokumente -> /cygdrive/c/Users/Public/Documents
/cygdrive/c/Users/Public/Documents/Eigene\ Bilder -> /cygdrive/c/Users/Public/Pictures
/cygdrive/c/Users/Public/Documents/My\ Pictures -> /cygdrive/c/Users/Public/Pictures


so that I can reach this directory via 23 different paths:

/cygdrive/c/Documents\ and\ Settings/All\ Users/Documents/Eigene\ Bilder/USA
/cygdrive/c/Documents\ and\ Settings/All\ Users/Documents/My\ Pictures/USA
/cygdrive/c/Documents\ and\ Settings/All\ Users/Dokumente/Eigene\ Bilder/USA
/cygdrive/c/Documents\ and\ Settings/All\ Users/Dokumente/My\ Pictures/USA
/cygdrive/c/Documents\ and\ Settings/Public/Documents/Eigene\ Bilder/USA
/cygdrive/c/Documents\ and\ Settings/Public/Documents/My\ Pictures/USA
/cygdrive/c/Documents\ and\ Settings/Public/Pictures/USA
/cygdrive/c/Dokumente\ und\ Einstellungen/All\ Users/Documents/Eigene\ Bilder/USA
/cygdrive/c/Dokumente\ und\ Einstellungen/All\ Users/Documents/My\ Pictures/USA
/cygdrive/c/Dokumente\ und\ Einstellungen/All\ Users/Dokumente/Eigene\ Bilder/USA
/cygdrive/c/Dokumente\ und\ Einstellungen/All\ Users/Dokumente/My\ Pictures/USA
/cygdrive/c/Dokumente\ und\ Einstellungen/Public/Documents/Eigene\ Bilder/USA
/cygdrive/c/Dokumente\ und\ Einstellungen/Public/Documents/My\ Pictures/USA
/cygdrive/c/Dokumente\ und\ Einstellungen/Public/Pictures/USA
/cygdrive/c/ProgramData/Documents/Eigene\ Bilder/USA
/cygdrive/c/ProgramData/Documents/My\ Pictures/USA
/cygdrive/c/ProgramData/Dokumente/Eigene\ Bilder/USA
/cygdrive/c/ProgramData/Dokumente/My\ Pictures/USA
/cygdrive/c/Users/All\ Users/Documents/Eigene\ Bilder/USA
/cygdrive/c/Users/All\ Users/Documents/My\ Pictures/USA
/cygdrive/c/Users/All\ Users/Dokumente/Eigene\ Bilder/USA
/cygdrive/c/Users/All\ Users/Dokumente/My\ Pictures/USA
/cygdrive/c/Users/Public/Documents/Eigene\ Bilder/USA


The default of the "find" utility is not to follow symlinks at all. Might be a good option for exiftool too...

Regards

Michael

Phil Harvey

Quote from: MichaelRath on January 07, 2011, 07:30:09 PM
(perhaps you can put all visited dirs in a hash and check for the existence of the path in it).

Yes, but how is this done?  I need to do some research.

If I walk the directory tree to "/a/b/c/b/c", how do I know that this is the same directory as "/a/b/c"?  (Using system-independent functions only -- i.e. no system calls).
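
For illustration only, here is a minimal sketch of the "visited directories in a hash" idea from the quote above. It is not what exiftool does. It assumes that Perl's built-in stat returns a meaningful device and inode number for directories, which may not hold on every platform or Perl build, so it does not answer the system-independence concern. The helper name scan_once and the temporary directory layout are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper (not part of exiftool): walk $dir recursively and
# skip any directory whose device:inode pair has already been seen.
# Assumption: stat() returns usable device/inode values on this platform.
sub scan_once {
    my ($dir, $seen, $found) = @_;
    my ($dev, $ino) = stat($dir) or return;   # stat follows symlinks
    return if $seen->{"$dev:$ino"}++;         # reached before via another path
    push @$found, $dir;
    opendir(my $dh, $dir) or return;
    my @entries = grep { !/^\.\.?$/ } readdir($dh);
    closedir($dh);
    foreach my $entry (sort @entries) {
        my $path = "$dir/$entry";
        scan_once($path, $seen, $found) if -d $path;
    }
}

# Build a small tree a/b/c in a temporary directory, then add a link
# a/b/c/b -> a/b so that ".../a/b/c/b/c" loops back onto ".../a/b/c".
my $root = tempdir(CLEANUP => 1);
mkdir "$root/a"     or die "mkdir failed: $!";
mkdir "$root/a/b"   or die "mkdir failed: $!";
mkdir "$root/a/b/c" or die "mkdir failed: $!";
# symlink() raises an exception where symlinks are unsupported, so trap it
eval { symlink("$root/a/b", "$root/a/b/c/b") };

my (%seen, @found);
scan_once($root, \%seen, \@found);
print scalar(@found), " unique directories visited\n";
```

Each physical directory is recorded once no matter how many paths lead to it, so the recursion ends when the link is reached.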

Quote
But it still might be useful to be able to disable the following of any symlinks, as I don't know if there is a platform-independent way in Perl to find out which physical directory you are in after accessing it via a symlink, and to use that for the check.

Exactly.  I should have read this before typing my response above.  Maybe this is the only reasonable alternative, but I really hate adding new options, so I avoid doing this whenever possible.

- Phil

Edit: I did a quick search, and there is a system-dependent "readlink" function built into Perl that I might be able to use.  It gives a fatal error if used on systems which don't support symlinks, but I can trap this.  I will look into this.
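
As a rough sketch of the trapping mentioned in the edit above (an assumption about how it could be done, not code from exiftool): wrapping readlink in an eval block turns the fatal error on systems without symlinks into an undefined result. The helper name link_target and the temporary directory layout are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: return the target of a symlink, or undef if $path
# is not a symlink or if readlink is not implemented on this system.
sub link_target {
    my ($path) = @_;
    # eval traps the exception raised where symlinks are unsupported
    my $target = eval { readlink($path) };
    return $target;
}

my $root = tempdir(CLEANUP => 1);
mkdir "$root/real" or die "mkdir failed: $!";
# symlink() is trapped the same way; $made_link is true only on success
my $made_link = eval { symlink("$root/real", "$root/link") };

my $plain = link_target("$root/real");   # a real directory, so undef
my $link  = $made_link ? link_target("$root/link") : undef;

print "real dir: ", (defined $plain ? $plain : "not a link"), "\n";
print "link:     ", (defined $link  ? $link  : "no symlink support"), "\n";
```

Note that readlink returns only the text stored in the link, which may be a relative path, so turning it into a canonical directory name for hashing takes further work.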

Phil Harvey

The readlink function didn't pan out -- too much work generating the actual directory names for hashing.

But I have an idea which gives you the feature you want without adding a new option.  I will add a feature which uses the existing -i (-ignore) option to disable following of symlinks:

exiftool -i SYMLINKS ...

I'm happy with this, and the default behaviour of exiftool doesn't change, which is also good for backward compatibility.

- Phil

MichaelRath

Hi Phil,

this is a good idea (no need to add another option, and backward compatibility is kept), and it is all I need. I think the probability that anyone uses "SYMLINKS" as a directory name is really low...

Thank you very much.

Regards

Michael