Using wildcard in file path

Started by FlameDra, October 30, 2024, 01:01:24 AM


FlameDra

I recently noticed that Google Takeout has changed the naming of the JSON metadata files it creates for its images.

Previously the metadata file name was in the format:

filename.extension.json

However, now the metadata file name can be:

filename.extension.supplemental-meta.json

filename.extension.supplemental-metadata.json

filename.extension.supplemental-me.json

filename.extension.supplemental-metada.json

And probably other variants I have not found yet.

I would run a Go script in my Google Takeout folder to fix the metadata:

import (
	"fmt"
	"os"
	"os/exec"
)

func exiftoolMetadataFix(dirPath string) error {
	// Define the exiftool command and its arguments.
	// exec.Command passes each argument to exiftool as-is; no shell is involved.
	cmd := exec.Command("exiftool",
		"-d", "%s",
		"-tagsfromfile", "%d%f.%e.json",
		"-DateTimeOriginal<PhotoTakenTimeTimestamp",
		"-FileCreateDate<PhotoTakenTimeTimestamp",
		"-FileModifyDate<PhotoTakenTimeTimestamp",
		"-overwrite_original",
		"-ext", "mp4",
		"-ext", "jpg",
		"-ext", "heic",
		"-ext", "mov",
		"-ext", "jpeg",
		"-ext", "png",
		"-ext", "gif",
		"-ext", "webp",
		"-r", ".",
		"-progress",
	)

	// Set the working directory to the user-provided directory path
	cmd.Dir = dirPath

	// Stream exiftool's output and errors straight to the terminal
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr

	fmt.Println("Running exiftool to fix metadata.")

	// Execute the command and return its error, if any
	return cmd.Run()
}



However, I would like to update the following section:

"-tagsfromfile", "%d%f.%e.json"

To include the format of filename.extension.anything.json as well as the original filename.extension.json which it was previously.

I have tried:

"-tagsfromfile", "%d%f.%e.*.json",

But it does not seem to match the criteria and keeps the * in the string instead of using it as a wildcard:

Warning: Error opening file - ./2024 Colorado/PXL_20240902_140316491.jpg.*.json
Nothing changed in ./2024 Colorado/PXL_20240902_140316491.jpg
======== ./2024 Colorado/PXL_20240902_140317444.jpg [361/11477]
Warning: Error opening file - ./2024 Colorado/PXL_20240902_140317444.jpg.*.json
Nothing changed in ./2024 Colorado/PXL_20240902_140317444.jpg
======== ./2024 Colorado/PXL_20240902_141512531.jpg [362/11477]
Warning: Error opening file - ./2024 Colorado/PXL_20240902_141512531.jpg.*.json

Is there a way to create a name/path matching which satisfies both conditions?


wywh

Google Takeout truncates long filenames, so jpg filenames longer than 46 characters (extension included) fail to get a matching .json:

a2345678901234567890123456789012345678901.jpg
b23456789012345678901234567890123456789012.jpg
c234567890123456789012345678901234567890123.jpg
d2345678901234567890123456789012345678901234.jpg
e23456789012345678901234567890123456789012345.jpg
f234567890123456789012345678901234567890123456.jpg
g2345678901234567890123456789012345678901234567.jpg
h23456789012345678901234567890123456789012345678.jpg
i234567890123456789012345678901234567890123456789.jpg

...Google Takeout truncates and corrupts those to 51 characters as follows:

a2345678901234567890123456789012345678901.jpg
a2345678901234567890123456789012345678901.jpg.json
b23456789012345678901234567890123456789012.jpg
b23456789012345678901234567890123456789012.jpg.json
c234567890123456789012345678901234567890123.jpg
c234567890123456789012345678901234567890123.jp.json
d2345678901234567890123456789012345678901234.jpg
d2345678901234567890123456789012345678901234.j.json
e23456789012345678901234567890123456789012345.jpg
e23456789012345678901234567890123456789012345..json
f234567890123456789012345678901234567890123456.jpg
f234567890123456789012345678901234567890123456.json
g2345678901234567890123456789012345678901234567.jpg
g234567890123456789012345678901234567890123456.json
h2345678901234567890123456789012345678901234567.jpg
h234567890123456789012345678901234567890123456.json
i2345678901234567890123456789012345678901234567.jpg
i234567890123456789012345678901234567890123456.json

So merging fails in those unless the filenames are fixed to match:

Warning: Error opening file - ./c234567890123456789012345678901234567890123.jpg.json
Warning: Error opening file - ./d2345678901234567890123456789012345678901234.jpg.json
Warning: Error opening file - ./e23456789012345678901234567890123456789012345.jpg.json
Warning: Error opening file - ./f234567890123456789012345678901234567890123456.jpg.json
Warning: Error opening file - ./g2345678901234567890123456789012345678901234567.jpg.json
Warning: Error opening file - ./h2345678901234567890123456789012345678901234567.jpg.json
Warning: Error opening file - ./i2345678901234567890123456789012345678901234567.jpg.json

You could truncate filenames to 30 characters with something like the command below. That command needs fine-tuning because it does not handle corrupted extensions like:

.jp.json
.j.json
..json
.json

exiftool -ext jpg -ext json "-FileName=%30f.%e" .
So some regex magic is needed to fix that. I'll let others chime in for that :D

- Matti
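
The truncation pattern in the listing above can be sketched in Go. This is only a guess based on those examples: the 46-character limit (`maxJSONBase`) and the helper name `truncatedSidecarName` are assumptions, it covers only the old `filename.extension.json` naming, and it does not account for Takeout also shortening the media filename itself.

```go
package main

import "fmt"

// maxJSONBase is an assumption inferred from the listing above: the part of
// the sidecar name before ".json" appears to be cut to 46 characters.
const maxJSONBase = 46

// truncatedSidecarName guesses the JSON sidecar name for a media filename by
// keeping at most maxJSONBase characters and appending ".json".
func truncatedSidecarName(mediaName string) string {
	r := []rune(mediaName)
	if len(r) > maxJSONBase {
		r = r[:maxJSONBase]
	}
	return string(r) + ".json"
}

func main() {
	for _, name := range []string{
		"b23456789012345678901234567890123456789012.jpg",
		"c234567890123456789012345678901234567890123.jpg",
		"f234567890123456789012345678901234567890123456.jpg",
	} {
		fmt.Println(name, "->", truncatedSidecarName(name))
	}
}
```

A script could try this guessed name as a fallback when the exact `filename.extension.json` is missing.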

FlameDra

Would that mean setting it up as the following?

-tagsfromfile %d%30f.%e.json

Would it match with the correct image if the tagsfromfile file is only limited to 30 characters, or would it fill in the rest?

StarGeek

It would only use a maximum of 30 characters from the original file. It won't fill in anything else.

Also take note that Google does not remove any metadata from your files. Copying time stamps from the JSON files will overwrite the correct date/time with the incorrect date/time, as the json files have the UTC time, not the original time.

Additionally, from your example, it looks like the filenames are set to the date/time. A better option would be to copy the filename to the date/time tags as shown in FAQ #5.
"It didn't work" isn't helpful. What was the exact command used and the output.
Read FAQ #3 and use that cmd
Please use the Code button for exiftool output

Please include your OS/Exiftool version/filetype

FlameDra

When I download my files from Google Takeout, the DateTimeOriginal and FileCreateDate are always wrong and set to the timestamp of download.

As you can see from this fresh download from Google Takeout, the 'Date created' is 10/28/24 for all the files (which is when they were downloaded) and does not match the actual date the photos were taken.



Example of metadata from one of the files:

{
  "title": "PXL_20240706_195956227.jpg",
  "description": "",
  "imageViews": "5",
  "creationTime": {
    "timestamp": "1720311619",
    "formatted": "Jul 7, 2024, 12:20:19 AM UTC"
  },
  "photoTakenTime": {
    "timestamp": "1720295996",
    "formatted": "Jul 6, 2024, 7:59:56 PM UTC"
  },
...

Even though the photo was taken on Jul 7, 2024, the `Date created` here is 10/28/24.


When I import these files into Immich, they are not organized by Year/Month/Day in the Immich UI and just end up in one big lump, as Immich uses the `Date created` property for sorting.

This has been a consistent behavior with Google Takeout for me, which is why I have had to have a script to update the metadata of the image files from the json metadata files. I am not sure if it preserves the correct `Date created` property for other users of Google Takeout, but my personal experience has been the above.

FlameDra

It seems like using the 30 character truncate method is not working for me. I am running the following:

exiftool -d "%s" -tagsfromfile %d%30f.%e.json "-DateTimeOriginal<PhotoTakenTimeTimestamp" "-FileCreateDate<PhotoTakenTimeTimestamp" "-FileModifyDate<PhotoTakenTimeTimestamp" -overwrite_original -ext mp4 -ext jpg -ext heic -ext mov -ext jpeg -ext png -ext gif -ext webp -r . -progress

On this folder where I have image files along with the json metadata files:



However it does not seem to find the correct json files to match.

...
======== ./IMG_20190623_172854.jpg [24/30]
Warning: Error opening file - ./IMG_20190623_172854.jpg.json
Nothing changed in ./IMG_20190623_172854.jpg
======== ./IMG_20190623_172917.jpg [25/30]
Warning: Error opening file - ./IMG_20190623_172917.jpg.json
Nothing changed in ./IMG_20190623_172917.jpg
======== ./IMG_20191103_183121.jpg [26/30]
Warning: Error opening file - ./IMG_20191103_183121.jpg.json
Nothing changed in ./IMG_20191103_183121.jpg
======== ./IMG_20191117_110508.jpg [27/30]
Warning: Error opening file - ./IMG_20191117_110508.jpg.json
Nothing changed in ./IMG_20191117_110508.jpg
======== ./IMG_20191119_171818.jpg [28/30]
Warning: Error opening file - ./IMG_20191119_171818.jpg.json
Nothing changed in ./IMG_20191119_171818.jpg
======== ./IMG_20191119_171920.jpg [29/30]
Warning: Error opening file - ./IMG_20191119_171920.jpg.json
Nothing changed in ./IMG_20191119_171920.jpg
======== ./IMG_20191213_194801.jpg [30/30]
Warning: Error opening file - ./IMG_20191213_194801.jpg.json
Nothing changed in ./IMG_20191213_194801.jpg
    1 directories scanned
    0 image files updated
  30 image files unchanged


It keeps looking for `./IMG_20190623_172854.jpg.json` when I want it to look for `IMG_20190623_172854.jpg.supplemental-metadata.json` or any variant of `filename.extension.anything.json`.

I would like to find a way for exiftool to do that.

StarGeek

Quote from: FlameDra on October 30, 2024, 05:33:13 PM
When I download my files from Google Takeout, the DateTimeOriginal and FileCreateDate are always wrong and set to the timestamp of download.

Those are not EXIF time stamps. Those are file system time stamps. This is why I say you are writing incorrect data. You are overwriting the embedded time stamps with the ones from the JSON files. And the JSON files do not hold the correct time stamp. They hold the UTC time for when the images were taken.

You need to add the "Date Taken" column to see the EXIF time stamps.


Quote:
Example of metadata from one of the files:

{
  "title": "PXL_20240706_195956227.jpg",
  "description": "",
  "imageViews": "5",
  "creationTime": {
    "timestamp": "1720311619",
    "formatted": "Jul 7, 2024, 12:20:19 AM UTC"
  },
  "photoTakenTime": {
    "timestamp": "1720295996",
    "formatted": "Jul 6, 2024, 7:59:56 PM UTC"
  },
...

Even though the photo was taken on Jul 7, 2024, the `Date created` here is 10/28/24.

This is interesting, though Google is still technically incorrect. According to the filename, the time stamp should be 2024:07:06 19:59:56. This is the same time as what Google has, but it is labeled as UTC, which would only be correct if the time zone was +00:00 when it was taken.

In all my tests, when I uploaded a file, it adjusted the time from the time it was taken to UTC, so the "photoTakenTime" values were always off by 8 or 7 hours, as my time zone is -08:00/-07:00. Maybe they've changed.

Quote:
When I import these files into Immich, they are not organized by Year/Month/Day in the Immich UI and just end up in one big lump, as Immich uses the `Date created` property for sorting.

Try copying the time stamps from the embedded time stamps first
exiftool "-FileModifyDate<DateTimeOriginal" "-FileCreateDate<DateTimeOriginal" /path/to/files/

FlameDra

Thanks for the clarification. You are correct and the 'Date taken' field does match the timestamp of what it's supposed to be.

It seems like Immich does not use the `Date taken` field and uses one of the other date fields which are set to 10/28/2024.

I did notice that `Date taken` is set on some files, but not on all of them, for example:



From a cursory glance, it seems like the `Date taken` is only set for photos I have taken on my phone and is not set for:
1. .webp files
2. Screenshots
3. Photos saved from apps (Messenger, WhatsApp, etc.)
4. Some photos shared by others via shared albums
5. Any photos edited on the phone using Lightroom, Snapseed and exported

However, the json metadata files exist for all of them.

Since it's not all-inclusive for the types of photos I have in my gallery, I will still need a script to copy the date metadata from the JSON files to the media files.

StarGeek

Quote from: FlameDra on October 30, 2024, 08:12:16 PM
From a cursory glance, it seems like the `Date taken` is only set for photos I have taken on my phone and is not set for:
1. .webp files

I don't believe Windows 10 or earlier read any metadata that may be embedded in WebP files. I'm pretty sure I've tested this, but I can't say for certain. You'll find a similar situation with PNG files.

From what I've heard, Windows 11 now correctly reads metadata in PNG files, but I haven't heard anything about changes for WebP.

Quote:
2. Screenshots
3. Photos saved from apps (Messenger, WhatsApp, etc.)

Screenshots almost never have any metadata, but a lot of the time the filename includes the date/time. In those cases, copying directly from the filename would be best (FAQ #5):
exiftool "-AllDates<Filename" /path/to/files/

The others, especially social media, strip away all metadata for privacy. WhatsApp images include a date, but not a time. There are several posts in these forums for dealing with those, but they set a time of midnight 00:00:00. The others are ones that you would actually want to copy from the JSON files.

Quote:
4. Some photos shared by others via shared albums

I wouldn't know about them.

Quote:
Since it's not all-inclusive for the types of photos I have in my gallery, I will still need a script to copy the date metadata from the JSON files to the media files.

I would suggest adding -wm cg to your commands. This allows exiftool to create new tags, but it will not overwrite existing tags. So if there is already data there, it will not be overwritten.

FlameDra

Thanks for sharing all this info, it's really helpful. I'll update my scripts to include some of your suggestions!

A question about the `-wm cg` option. Would adding it not override the incorrect date? For example, when `Date created` is wrong but `Date taken` is correct and I want to copy the value of `Date taken` into `Date created`, will it just skip it?

Similarly for when I want to override incorrect `Date created` values with the value from the JSON (for cases where `Date taken` does not exist).

Phil Harvey

Quote from: FlameDra on October 31, 2024, 12:53:16 AM
A question about the `-wm cg` option. Would adding it not override the incorrect date?

Correct.  Existing tags will not be updated.

- Phil

FlameDra

Thanks for the info everyone. For future visitors to this thread, I am sharing the Go script I have written. It goes through each JSON metadata file, finds the corresponding media file (image/video), and updates the media file with the date metadata from the JSON file.

// Google Photos now has file names such as filename.extension.supplemental-meta*.json and other variants
// This function handles these cases by going through each JSON file
// Finding the associated media file, and then performing metadata updates
// Media files which have no matching JSON files are ignored
func exiftoolMetadataFixFileByFile(dirPath string) error {
    // Regex to get string until first extension, ie. the media extension (.jpg, .mp4, etc.)
    filenamePattern := regexp.MustCompile(`^(.*?\.\w+)\.`)

    var successCount, errorCount, skipCount int

    err := filepath.Walk(dirPath, func(path string, info os.FileInfo, err error) error {
        if err != nil {
            return err
        }

        fmt.Println("Current filepath: " + path)

        // Check if the file has a .json extension
        if !info.IsDir() && filepath.Ext(info.Name()) == ".json" {
            fmt.Println("====")
            fmt.Println("Found JSON metadata file:", path)

            // Extract the path up to the original file extension
            // This grabs the filename up to the media extension file
            // IMG_20201031_144417.jpg.supplemental-metadata.json -> IMG_20201031_144417.jpg
            match := filenamePattern.FindStringSubmatch(path)

            fmt.Printf("Filenames until first extension: %v\n", match)

            if len(match) > 1 {
                nonJSONFilePath := match[1] // match[1] is the first capture group: the path up to and including the media extension
                fmt.Println("Media filename with extension:", nonJSONFilePath)

                // Use exiftool to update metadata from JSON to the media file
                cmd := exec.Command("exiftool",
                    "-d", "%s",
                    "-tagsfromfile", path, // JSON file as the source of metadata
                    "-DateTimeOriginal<PhotoTakenTimeTimestamp",
                    "-FileCreateDate<PhotoTakenTimeTimestamp",
                    "-FileModifyDate<PhotoTakenTimeTimestamp",
                    "-overwrite_original",
                    "-ext", "mp4", "-ext", "jpg", "-ext", "heic", "-ext", "mov", "-ext", "jpeg", "-ext", "png", "-ext", "gif", "-ext", "webp",
                    nonJSONFilePath, // Target image file as the file to update
                )

                cmd.Stdout = os.Stdout
                cmd.Stderr = os.Stderr

                if err := cmd.Run(); err != nil {
                    fmt.Println("Error updating file:", nonJSONFilePath, "with JSON:", path, "-", err)
                    errorCount++
                } else {
                    fmt.Println("Updated", nonJSONFilePath, "with metadata from", path)
                    successCount++
                }
            } else {
                fmt.Println("Media file for this JSON file was not found: ", path)
                skipCount++
            }

            fmt.Println("====")
        } else {
            fmt.Println(path + " is not a JSON file, skipping!")
        }
        return nil
    })

    fmt.Printf("Success: %d\n", successCount)
    fmt.Printf("Errors: %d\n", errorCount)
    fmt.Printf("Skipped: %d\n", skipCount)

    return err
}

It's working for my use case, but if anyone wants to use it please review it and make sure it will work for your use case first.
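
One case worth reviewing is a media filename that itself contains extra dots, for example `photo.v2.jpg.json`, where the regex above stops at `photo.v2`. A possible alternative, sketched below with hypothetical names (`mediaNameFromJSON`, `isMediaExt`, `mediaExts`), is to look for the first dot-separated segment that is a known media extension. It is meant to be applied to the file name only, not the full path, and it assumes the same extension list as the `-ext` options in the script.

```go
package main

import (
	"fmt"
	"strings"
)

// mediaExts mirrors the -ext list used in the script above.
var mediaExts = []string{"mp4", "jpg", "heic", "mov", "jpeg", "png", "gif", "webp"}

// isMediaExt reports whether s is one of the known media extensions,
// ignoring case.
func isMediaExt(s string) bool {
	for _, ext := range mediaExts {
		if strings.EqualFold(s, ext) {
			return true
		}
	}
	return false
}

// mediaNameFromJSON returns the media filename embedded in a JSON sidecar
// name, e.g. "a.jpg.supplemental-metadata.json" -> "a.jpg".
func mediaNameFromJSON(jsonName string) (string, bool) {
	parts := strings.Split(jsonName, ".")
	last := len(parts) - 1
	if last < 2 || !strings.EqualFold(parts[last], "json") {
		return "", false
	}
	// Skip the first segment (the name stem) and the final "json" segment.
	for i := 1; i < last; i++ {
		if isMediaExt(parts[i]) {
			return strings.Join(parts[:i+1], "."), true
		}
	}
	return "", false
}

func main() {
	for _, name := range []string{
		"IMG_20201031_144417.jpg.supplemental-metadata.json",
		"photo.v2.jpg.json",
		"metadata.json",
	} {
		media, ok := mediaNameFromJSON(name)
		fmt.Printf("%s -> %q (found: %v)\n", name, media, ok)
	}
}
```

Sidecars with a truncated extension such as `.jp.json` are not matched by this sketch and would still need separate handling.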