Working with UTF8BOM

Started by Martin Z, October 31, 2024, 05:11:15 PM

Martin Z

I have a PowerShell script that, for a given folder...
  • Collates metadata from various sources/files
  • Arranges the data into an EXIFtool-like table (Generating a SourceFile column, using tag names as column headers, etc)
  • Saves this as a combined Metadata.csv file
  • Uses EXIFtool to write the metadata from Metadata.csv into each file

Issue 1: I have to use utf8BOM
After tearing my hair out for a bit, I found that I needed to specify the CSV encoding as "utf8BOM" [Info]. Otherwise, non-ASCII characters (e.g. emoji) got corrupted: if I encoded the CSV file as utf8 (without BOM), an image subject of "Holiday photo 🌴" would instead be saved as something like "Holiday photo ▯▯▯▯".

Issue 2: This seemed to stop EXIFtool finding files
While this sorted the data in the CSV, it seems to have had a knock-on effect on EXIFtool, which can seemingly no longer match the SourceFile column against the files in the folder.

For example...
=== FOLDER STRUCTURE [C:\Test folder] ===
File1.jpg
File2.jpg
Metadata.csv

=== CSV STRUCTURE ===
SourceFile   | CreateDate           | Title             | XPSubject | XPKeywords
./File1.jpg* | 19/01/2024  16:13:00 | Holiday photo 🌴 | Foo       | Bar
./File2.jpg* | 19/01/2024  16:14:00 | Holiday photo 🌴 | Foo       | Bar

* NB: I have tried formatting the column as both "File1.jpg" and "./File1.jpg", as well as adding "FileName" and "Directory" columns, however I still couldn't get EXIFtool to find the files

> EXIFtool -csv=Metadata.csv -d "%d/%m/%Y  %H:%M:%S" -r .
No SourceFile './File1.jpg' in imported CSV database
(full path: 'c:\test folder\file1.jpg')
No SourceFile './File2.jpg' in imported CSV database
(full path: 'c:\test folder\file2.jpg')
    1 directories scanned
    0 image files read

Any way to fix this please?
Is there a way I can fix this / enable EXIFtool to read filenames in a utf8bom-formatted CSV?

Notes
  • I'm running on Windows 11, with active code page = 65001
  • I did try to read some existing posts on utf8bom, but they went a bit over my head / most seemed to relate to specific tags/strings being utf8bom-formatted (rather than the entire CSV file)
  • Also, just to avoid getting side-tracked, I know storing emojis and other non-ASCII characters is not ideal. I did even look at removing these from the compiled data, but this ended up creating other issues, such as an all-emoji description becoming a null string. Ultimately, I don't own the source data, so I just want to capture and write the metadata as-is

StarGeek

Are you sure you're using UTF-8 BOM and not UTF-16 BOM? PowerShell forces UTF-16 BOM when redirecting output with < or >, or when using a pipe |.

Example with UTF-8 BOM, using the file unix program from MSYS2 to show the BOM type. You can also see the BOM in the output from type.
C:\>file temp.csv
temp.csv: CSV Unicode text, UTF-8 (with BOM) text

C:\>type temp.csv
�SourceFile,ExifIFD:DateTimeOriginal,ExifIFD:CreateDate,IFD0:ModifyDate
Y:/!temp/x/y/test/Holiday photo 🌴.jpeg,2024:10:31 12:00:00,2024:10:31 12:00:00,2024:10:31 12:00:00

C:\>exiftool -P -overwrite_original -csv=temp.csv "Y:\!temp\x\y\test\Holiday photo 🌴.jpeg"
    1 image files updated

C:\>exiftool -G1 -a -s -Alldates "Y:\!temp\x\y\test\Holiday photo 🌴.jpeg"
[ExifIFD]       DateTimeOriginal                : 2024:10:31 12:00:00
[ExifIFD]       CreateDate                      : 2024:10:31 12:00:00
[IFD0]          ModifyDate                      : 2024:10:31 12:00:00

This StackOverflow post talks about changing PowerShell's output to UTF-8
"It didn't work" isn't helpful. What was the exact command used and the output.
Read FAQ #3 and use that cmd
Please use the Code button for exiftool output

Please include your OS/Exiftool version/filetype

Martin Z

Thanks for getting back to me @StarGeek!

Quote from: StarGeek on October 31, 2024, 06:11:28 PM
Are you sure you're using UTF-8 BOM and not UTF-16 BOM? Powershell forces UTF-16 BOM when redirecting output with <, > or | (pipe).

Yep, I am setting utf8bom explicitly, and using PowerShell's Export-CSV cmdlet...
Export-CSV -Encoding utf8BOM

I don't have MSYS2, however I used Notepad++ to verify the file formats...
• For the PowerShell-generated CSV, format: UTF-8-BOM
• For the EXIFtool-generated CSV (as a control), format: UTF-8
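
For anyone following along without MSYS2 or Notepad++, the BOM can also be checked by reading the first few bytes of the file. The following is a minimal Python sketch, not something posted in the thread; the function names detect_bom and detect_file_bom are made up for illustration.

```python
import codecs

# Longer BOMs are checked first, because the UTF-32 LE BOM (FF FE 00 00)
# begins with the same two bytes as the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32 LE"),
    (codecs.BOM_UTF32_BE, "UTF-32 BE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16 LE"),
    (codecs.BOM_UTF16_BE, "UTF-16 BE"),
]


def detect_bom(head: bytes) -> str:
    """Describe the byte order mark (if any) at the start of `head`."""
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name + " (with BOM)"
    return "no BOM"


def detect_file_bom(path: str) -> str:
    """Read the first four bytes of a file and describe its BOM."""
    with open(path, "rb") as f:
        return detect_bom(f.read(4))


# Hypothetical usage, run from the folder that holds the CSV:
#   print(detect_file_bom("Metadata.csv"))
```

A file without a BOM is reported as "no BOM", which by itself does not say whether the content is UTF-8 or some other encoding; the sketch only distinguishes the UTF-8 BOM from the UTF-16 and UTF-32 ones.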

Thanks for the link to the PowerShell/UTF8 S/O post -- Think this is one of the key posts I used back in the day, as I actually implemented the default parameters technique it specifies 👍🏼

StarGeek

Notepad++ is what I used to change the encoding on the test file. There isn't much I can help with because I can't replicate it. UTF-8 BOM works fine for me.

Martin Z

Oh, so a UTF8BOM-encoded CSV works fine for you, in terms of reading SourceFile filenames?

Interesting?... OK, I am in the middle of something right now, but I will try and find a suitable sample file (some are massive) and upload it later tonight/tomorrow.

Cheers,
Martin

FrankB

You could give this a try: -Api WindowsWideFile=1

Martin Z

Quote from: FrankB on October 31, 2024, 07:02:46 PM
You could give this a try: -Api WindowsWideFile=1

Thanks @FrankB, that seems to have solved it!

FrankB


StarGeek

Quote from: Martin Z on October 31, 2024, 06:55:52 PM
Oh, so a UTF8BOM-encoded CSV works fine for you, in terms of reading SourceFile filenames?

Yes, UTF-8 BOM worked correctly. See the CODE section in my post. I copied the name you gave and showed what I did step by step. The only way I got the same result as you was when I switched to UTF-16 BOM.

Quote from: FrankB on October 31, 2024, 07:02:46 PM
You could give this a try:

I keep forgetting about that.