UTF8 Album Name Not Being Matched with Regex in FilePath

Started by Invindicator, September 06, 2017, 05:08:47 AM


Invindicator

Hi Phil and all,
As a long-time user of ExifTool for fixing, editing and maintaining my photos and videos, I'm now attempting to use it to validate file paths for my music collection (Windows 10). I'm using exiftool to get the FilePath tag of each mp3 file and match it against a regex in an -if statement, to list files whose paths are not of the format "^.+/$band/$album/$tracknumber $title.mp3$". The exact regex I'm using is a little different from the one just given: it substitutes a '_' for certain characters, as the files have been encoded to ID3v2.4 using the most recent version of iTunes.

The problem is that the -if match fails for any paths and files whose names contain non-ASCII UTF-8 characters (Ed Sheeran's ÷ (Deluxe) album and Psy's Gangnam Style (강남스타일) album are two examples). I have looked at a number of the other posts on UTF-8 values and exiftool on this forum, but I can't work out the answer myself. Does anyone have any solutions or suggestions that could be of use?
Thanks for the help.
Regards, Nick

P.s. Here is the file path and my current regex match for anyone who is interested

exiftool -filepath "01 Eraser.mp3"
File Path                       : C:\Users\Nick\Music\iTunes\iTunes Media\Music\Ed Sheeran\� (Deluxe)\01 Eraser.mp3

exiftool -if "$filepath !~ /^.+[\\\/]$band[\\\/]$album[\\\/].+$$/"

Note: at the moment I am not too worried about the actual filename; the main goal is just to get the matching of UTF-8 characters working.

P.p.s - Phil: Not sure how familiar you are with the ID3v2.4 spec but there is an unsynchronisation flag which can be set for the APIC tag. I was wondering if there was a way for exiftool to determine whether or not this flag has been set within the mp3 file or if it could possibly be added in the future  :)

StarGeek

This may require Phil's attention but it may be a while as he's currently on vacation.

I've never been able to properly deal with UTF8 and Windows myself and some quick googling on Perl, regex, and UTF8 doesn't seem to indicate an easy solution.
"It didn't work" isn't helpful. What was the exact command used and the output.
Read FAQ #3 and use that cmd
Please use the Code button for exiftool output

Please include your OS/Exiftool version/filetype

Invindicator

Thanks StarGeek. It's something that seems so simple yet ends up being so difficult hahaha. I shall eagerly and patiently wait  :)

johnrellis

You're tripping over a couple of issues:

1. The documentation for FilePath says: "Does not support Windows Unicode file names", and some experiments bear that out. But Directory does seem to support Unicode file names. For your purposes, the directory is the part of the path that should contain the artist and album.

2. Substituting $Album into a Perl regular expression will fail if the substituted string contains special pattern characters that need to be escaped.  I haven't used Perl in a decade and even then I was a novice blinded by the mysteries, but I think using \Q$Album\E solves this problem.

Here's an example that works (at least on the one directory I tried):

Y:\Music\iTunes\iTunes Music\PSY>exiftool -r . -ext m4a -if "$Directory =~ /^.*\/\Q$Album\E$$/" -directory -filename
======== ./Gangnam Style (강남스타일) - Single/01 Gangnam Style (강남스타일).m4a
Directory                       : ./Gangnam Style (강남스타일) - Single
File Name                       : 01 Gangnam Style (강남스타일).m4a
    2 directories scanned
    1 image files read
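
The effect of \Q...\E can be illustrated outside ExifTool. The following is only a minimal Python sketch of the same idea, using re.escape as the rough counterpart of Perl's quotemeta. The function name directory_matches_album and the sample album string are assumptions made for illustration (the thread does not show the actual Album tag value), and this is not how ExifTool evaluates -if internally.

```python
import re

def directory_matches_album(directory: str, album: str) -> bool:
    """Return True if the last path component of `directory` equals `album`.

    re.escape plays the role of Perl's \\Q...\\E: characters such as
    '(' and ')' in the album name are matched literally.
    """
    pattern = r"^.*/" + re.escape(album) + r"$"
    return re.search(pattern, directory) is not None

# Sample values: the directory is taken from the output above; the album
# string is an assumed tag value, not one shown in the thread.
directory = "./Gangnam Style (강남스타일) - Single"
album = "Gangnam Style (강남스타일) - Single"

# Without escaping, the parentheses form a regex group, so the literal
# "(" and ")" in the directory name are never matched.
unescaped_hit = re.search(r"^.*/" + album + r"$", directory) is not None
escaped_hit = directory_matches_album(directory, album)
```

Here unescaped_hit is False and escaped_hit is True, which mirrors why brackets in album names broke the unescaped pattern.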



Invindicator

Thanks for the tips johnrellis. I had already implemented the quotemeta characters into the expression, as there were names with brackets and the like that were previously causing issues.

The point you make about the FilePath variable, however, I did not know about, and it is very interesting to find out. Any idea as to why it does not support Windows Unicode? I am happy to give the Directory variable a try; however, from what I have seen previously it only provides a relative directory path, so I can't match a regex against the full file name. Am I wrong in assuming this, or is it possible?

One more question: does anyone know if you can get email updates whenever a thread you are following is updated? I keep forgetting to check the forum every couple of days, and it would be a good feature to have, but I can't find such a setting if one already exists :)

Thanks again, Nick

johnrellis

Quote: Any idea as to why it does not support windows unicode?
Phil would have to answer that.

Quote: it only provides a relative directory path
It provides the relative path from the current directory to the file being processed. If you invoke ExifTool with "-r . -ext m4a", from the iTunes root directory, then you'll get relative directory paths of the form:
./iTunes Music/Benny Goodman/The Platinum Collection
These paths contain both the artist and the album, which is what you indicated you needed. 

johnrellis

Quote: get email updates whenever a thread you are following is updated
At the top of the forum page, click Profile > Account Settings.  Then click Modify Profile > Notifications. Select the option "Turn notification on when you post or reply to a topic".

StarGeek

Or just turn on notification individually for any thread you want to follow.  Just click "Notify" in the upper right of the thread.



(That is, unless the permissions are incorrectly set and only Mods can see that button.)

Invindicator

Cool, thanks guys. I've got the notifications working, I believe. I will see how we go with the Directory variable and report back.

On another note, I was wondering if anyone can sort out my problematic -if statement. I want to list filenames for songs where:

  • $genre# < 0
  • $genre# > 191
  • $genre# != 255
  • $genre ne 'CR'
  • $genre ne 'RX'

So basically any files where genre is NOT a number and is equal to CR or RX, or where it is a number and is between 0 and 191 inclusive or equal to 255. Hopefully that makes sense. I've tried a couple of different statements, but none of them have worked.

Nick

Phil Harvey

I'm not clear on the logic you want.  Perhaps this?:

exiftool -filename -n -if '($genre=~/^\d+$/ and ($genre<0 or $genre>191) and $genre!=255) or ($genre!~/^\d+$/ and $genre ne "CR" and $genre ne "RX")' DIR

This should list unconverted Genre values which are numerical and <0 or >191 (not including 255), or non-numerical genres which are not "CR" or "RX".
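
For readers who find the nested and/or hard to follow, here is the same predicate written out as a small Python sketch. It is only an illustration of the logic in the command above, not something ExifTool runs; the function name is_suspect_genre is made up for this example, and it assumes the unconverted Genre value arrives as a string.

```python
import re

def is_suspect_genre(genre: str) -> bool:
    """Mirror the -if condition: True means the file would be listed."""
    if re.fullmatch(r"[0-9]+", genre):
        # Numerical genre: flag values outside 0..191, except 255.
        # (A digits-only string is never negative, so n < 0 is kept
        # only to mirror the original expression.)
        n = int(genre)
        return (n < 0 or n > 191) and n != 255
    # Non-numerical genre: flag anything other than "CR" or "RX".
    return genre not in ("CR", "RX")

# A few sample values (chosen for illustration only).
samples = ["12", "191", "192", "255", "CR", "RX", "Rock"]
flagged = [g for g in samples if is_suspect_genre(g)]
```

With these samples, only "192" and "Rock" end up in flagged.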

- Phil
...where DIR is the name of a directory/folder containing the images.  On Mac/Linux/PowerShell, use single quotes (') instead of double quotes (") around arguments containing a dollar sign ($).

Invindicator

Hi Phil thanks for the reply and sorry for the delay in responding.

I tested the statement you gave me on Windows, converting the quoting to Windows format, but I still had problems detecting some files. I was able to use it to arrive at a solution that worked, however. Here is what I ended up using:

exiftool -filename -genre -genre# -if "(($genre# < 0 or $genre# > 191) and $genre# != 255) or ($genre#=~/[A-Z]/i and ($genre# ne 'CR' and $genre# ne 'RX'))" .

As a side note, I am aware that I can use the -n option; however, this is going to be just one part of a bigger -if check, and I don't want to convert all values to their numeric equivalents :)

Also, I wanted to check with you for an update on the two original issues: the UTF-8 file names and the possibility of checking whether the unsynchronisation flag has been set.

Phil Harvey

The unsynchronization flag is only a detail of the metadata format.  Why do you want to see this information?  ExifTool already un-does the unsynchronization if necessary.

I could perhaps add this to the verbose (-v2) output.

- Phil

Invindicator

I have been encoding my songs to ID3v2.4 using mp3Tag and then putting them in my SONOS library. The files are all valid, and I have confirmed with the owner of mp3Tag that it writes the images using unsynchronisation; however, SONOS does not read the unsynchronisation, and hence no artwork is displayed. I then started converting the files' ID3 information with iTunes, again to ID3v2.4. iTunes does not use unsynchronisation, and the artwork is displayed properly within SONOS. So I wanted to use exiftool to validate the files and check whether the flag was set, to see which files had been done and which still needed doing.
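
As a stopgap outside ExifTool, the per-frame flag can be read directly from the tag. Below is a rough Python sketch, assuming the ID3v2.4 layout as I understand it from the spec: a 10-byte tag header, synchsafe sizes, and the unsynchronisation bit at 0x02 in the second frame flag byte. The function names are made up for this example, it handles ID3v2.4 only, and it does not look at the tag-level unsynchronisation flag in the tag header.

```python
from typing import List

def synchsafe(b: bytes) -> int:
    """Decode a 4-byte synchsafe integer (7 significant bits per byte)."""
    return (b[0] << 21) | (b[1] << 14) | (b[2] << 7) | b[3]

def apic_unsync_flags(data: bytes) -> List[bool]:
    """Return one bool per APIC frame: True if its unsynchronisation
    flag is set. `data` must start with an ID3v2.4 tag."""
    if len(data) < 10 or data[:3] != b"ID3" or data[3] != 4:
        raise ValueError("not an ID3v2.4 tag")
    tag_flags = data[5]
    end = min(len(data), 10 + synchsafe(data[6:10]))
    pos = 10
    if tag_flags & 0x40:
        # Extended header present; in v2.4 its size field includes itself.
        pos += synchsafe(data[pos:pos + 4])
    results = []
    while pos + 10 <= end:
        frame_id = data[pos:pos + 4]
        if frame_id[0] == 0:
            break  # reached padding
        size = synchsafe(data[pos + 4:pos + 8])
        format_flags = data[pos + 9]  # second of the two frame flag bytes
        if frame_id == b"APIC":
            results.append(bool(format_flags & 0x02))
        pos += 10 + size
    return results
```

Reading a file with open(path, "rb").read() and passing the bytes to apic_unsync_flags would then give a list such as [True] for a file with one unsynchronised cover image. Results should be spot-checked against a few files known to work and not work in SONOS before relying on this.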

Phil Harvey

I see, thanks.

If I add this to the -v2 output then you could do something like this:

exiftool -v2 DIR | grep -E '(===|APIC.*Unsync)'

This will show all file names and list unsynchronized APIC frames.

- Phil

Invindicator