ExifTool Forum

ExifTool => The "exiftool" Application => Topic started by: Martin Z on October 31, 2024, 05:11:15 PM

Title: Working with UTF8BOM
Post by: Martin Z on October 31, 2024, 05:11:15 PM
I have a PowerShell script that, for a given folder...

Issue 1: I have to use utf8BOM
After tearing my hair out for a bit, I found that I needed to specify the CSV encoding format as "utf8BOM" [Info] (https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.utility/export-csv?view=powershell-7.4#-encoding:~:text=utf8BOM%3A%20Encodes%20in%20UTF%2D8%20format%20with%20Byte%20Order%20Mark%20(BOM)). Otherwise non-ASCII characters (e.g. emoji) would get corrupted: if I encoded the CSV file as utf8 (without BOM), an image subject that should have been recorded as "Holiday photo 🌴" would instead get saved as something like "Holiday photo ▯▯▯▯".
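
To make the difference between the two encodings concrete, here is a minimal Python sketch (Python is used only for illustration; the script in question is PowerShell). It shows that the BOM is three extra bytes at the start of the data and that the text bytes are otherwise identical. The cp1252 step is an assumption about how the corruption might arise when a reader has no BOM to go on; the thread does not confirm which program mis-decoded the file.

```python
import codecs

subject = "Holiday photo 🌴"

plain = subject.encode("utf-8")         # UTF-8 without BOM
with_bom = subject.encode("utf-8-sig")  # UTF-8 with BOM (Python's codec name)

# The BOM is three bytes (EF BB BF) in front of otherwise identical data.
assert with_bom == codecs.BOM_UTF8 + plain
print(codecs.BOM_UTF8.hex(" "))

# Assumed failure mode: a reader without a BOM hint falls back to a legacy
# code page (cp1252 here) and treats each UTF-8 byte as its own character.
emoji_garbled = "🌴".encode("utf-8").decode("cp1252")
print(len(emoji_garbled))  # 4: one emoji became four characters
```

The four garbled characters line up with the four placeholder boxes described above, but that match is circumstantial rather than proof of the cause.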

Issue 2: This seemed to stop ExifTool finding files
While this fixed the data in the CSV, it seems to have had a knock-on effect on ExifTool: it can no longer match the SourceFile column against the files in the folder.

For example...
=== FOLDER STRUCTURE [C:\Test folder] ===
File1.jpg
File2.jpg
Metadata.csv

=== CSV STRUCTURE ===
SourceFile   | CreateDate           | Title             | XPSubject | XPKeywords
./File1.jpg* | 19/01/2024  16:13:00 | Holiday photo 🌴 | Foo       | Bar
./File2.jpg* | 19/01/2024  16:14:00 | Holiday photo 🌴 | Foo       | Bar

* NB: I have tried formatting the column as both "File1.jpg" and "./File1.jpg", as well as adding "FileName" and "Directory" columns, but I still couldn't get ExifTool to find the files.

> exiftool -csv=Metadata.csv -d "%d/%m/%Y  %H:%M:%S" -r .
No SourceFile './File1.jpg' in imported CSV database
(full path: 'c:\test folder\file1.jpg')
No SourceFile './File2.jpg' in imported CSV database
(full path: 'c:\test folder\file2.jpg')
    1 directories scanned
    0 image files read
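
As a side check on the -d option used in the command above, the following Python sketch parses one CreateDate value from the CSV with the same format string (including the two spaces) and prints it in the YYYY:MM:DD HH:MM:SS form that ExifTool displays by default. The helper name to_exif_datetime is hypothetical, and Python's parser is only a stand-in: ExifTool's own handling of -d may differ in detail.

```python
from datetime import datetime

# Format string copied from the -d option above (note the two spaces).
FMT = "%d/%m/%Y  %H:%M:%S"


def to_exif_datetime(value: str, fmt: str = FMT) -> str:
    """Parse a CSV date with the -d style format and re-emit it in
    the colon-separated form ExifTool shows by default."""
    parsed = datetime.strptime(value, fmt)
    return parsed.strftime("%Y:%m:%d %H:%M:%S")


print(to_exif_datetime("19/01/2024  16:13:00"))  # 2024:01:19 16:13:00
```

If a value fails to parse here, the date column and the format string are worth checking before looking at the filename matching.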

Is there a way to fix this, i.e. to enable ExifTool to match filenames in a utf8BOM-formatted CSV?

Notes
Title: Re: Working with UTF8BOM
Post by: StarGeek on October 31, 2024, 06:11:28 PM
Are you sure you're using UTF-8 BOM and not UTF-16 BOM? PowerShell forces UTF-16 BOM when redirecting output with < or >, or when using a pipe |.

Here is an example with UTF-8 BOM, using the Unix file program from MSYS2 (https://www.msys2.org/) to show the BOM type. You can also see the BOM in the output from type.
C:\>file temp.csv
temp.csv: CSV Unicode text, UTF-8 (with BOM) text

C:\>type temp.csv
�SourceFile,ExifIFD:DateTimeOriginal,ExifIFD:CreateDate,IFD0:ModifyDate
Y:/!temp/x/y/test/Holiday photo 🌴.jpeg,2024:10:31 12:00:00,2024:10:31 12:00:00,2024:10:31 12:00:00

C:\>exiftool -P -overwrite_original -csv=temp.csv "Y:\!temp\x\y\test\Holiday photo 🌴.jpeg"
    1 image files updated

C:\>exiftool -G1 -a -s -Alldates "Y:\!temp\x\y\test\Holiday photo 🌴.jpeg"
[ExifIFD]       DateTimeOriginal                : 2024:10:31 12:00:00
[ExifIFD]       CreateDate                      : 2024:10:31 12:00:00
[IFD0]          ModifyDate                      : 2024:10:31 12:00:00

This Stack Overflow post (https://stackoverflow.com/questions/40098771/changing-powershells-default-output-encoding-to-utf-8) talks about changing PowerShell's output to UTF-8.
Title: Re: Working with UTF8BOM
Post by: Martin Z on October 31, 2024, 06:38:56 PM
Thanks for getting back to me @StarGeek!

Quote from: StarGeek on October 31, 2024, 06:11:28 PM
Are you sure you're using UTF-8 BOM and not UTF-16 BOM? PowerShell forces UTF-16 BOM when redirecting output with <, > or | (pipe).

Yep, I am setting utf8BOM explicitly, and using PowerShell's Export-CSV cmdlet...
Export-CSV -Encoding utf8BOM

I don't have MSYS2, however I used Notepad++ to verify the file formats...
• For the PowerShell-generated CSV, format: UTF-8-BOM
• For the ExifTool-generated CSV (as a control), format: UTF-8
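
For anyone without MSYS2 or Notepad++, the same check can be sketched in a few lines of Python by looking at the first bytes of the file. This is a minimal sketch rather than a full encoding detector: the function names detect_bom and detect_bom_in_file are hypothetical, and a file with no BOM is only reported as such, not identified further.

```python
import codecs

# Longer signatures first: the UTF-32 LE BOM begins with the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32 LE BOM"),
    (codecs.BOM_UTF32_BE, "UTF-32 BE BOM"),
    (codecs.BOM_UTF8, "UTF-8 BOM"),
    (codecs.BOM_UTF16_LE, "UTF-16 LE BOM"),
    (codecs.BOM_UTF16_BE, "UTF-16 BE BOM"),
]


def detect_bom(data: bytes) -> str:
    """Return a label for the byte order mark at the start of data."""
    for signature, label in _BOMS:
        if data.startswith(signature):
            return label
    return "no BOM"


def detect_bom_in_file(path: str) -> str:
    """Read just enough of the file to classify its BOM."""
    with open(path, "rb") as handle:
        return detect_bom(handle.read(4))


sample = "SourceFile,Title\r\n".encode("utf-8-sig")
print(detect_bom(sample))  # UTF-8 BOM
```

Calling detect_bom_in_file("Metadata.csv") on the PowerShell-generated file would distinguish the UTF-8 BOM case from the UTF-16 BOM case raised earlier in the thread.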

Thanks for the link to the PowerShell/UTF8 S/O post. I think this is one of the key posts I used back in the day, as I actually implemented the default parameters technique it describes 👍🏼
Title: Re: Working with UTF8BOM
Post by: StarGeek on October 31, 2024, 06:49:54 PM
Notepad++ is what I used to change the encoding on the test file. There isn't much I can help with because I can't replicate it. UTF-8 BOM works fine for me.
Title: Re: Working with UTF8BOM
Post by: Martin Z on October 31, 2024, 06:55:52 PM
Oh, so a UTF8BOM-encoded CSV works fine for you, in terms of reading SourceFile filenames?

Interesting?... OK, I am in the middle of something right now, but I will try and find a suitable sample file (some are massive) and upload it later tonight/tomorrow.

Cheers,
Martin
Title: Re: Working with UTF8BOM
Post by: FrankB on October 31, 2024, 07:02:46 PM
You could give this a try:
-Api WindowsWideFile=1 -charset filename=utf8
https://exiftool.org/forum/index.php?topic=16544.msg88936#msg88936
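
For anyone driving ExifTool from a script, here is a minimal Python sketch of how these two options might be combined with the CSV import command from the first post. The function name build_exiftool_args is hypothetical, and the ordering of the options is an assumption rather than something confirmed in this thread. Passing the arguments as a list avoids shell quoting problems with the spaces in the date format.

```python
def build_exiftool_args(csv_name: str, target: str = ".") -> list[str]:
    """Assemble an argument list for a recursive CSV import."""
    return [
        "exiftool",
        "-api", "WindowsWideFile=1",   # suggested above
        "-charset", "filename=utf8",   # suggested above
        f"-csv={csv_name}",            # CSV file to import from
        "-d", "%d/%m/%Y  %H:%M:%S",    # date format from the first post
        "-r", target,                  # recurse into the target folder
    ]


args = build_exiftool_args("Metadata.csv")
print(args)

# To actually run it (requires exiftool on PATH):
#   import subprocess
#   subprocess.run(args, check=True)
```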
Title: Re: Working with UTF8BOM
Post by: Martin Z on October 31, 2024, 07:11:22 PM
Quote from: FrankB on October 31, 2024, 07:02:46 PM
You could give this a try: -Api WindowsWideFile=1

Thanks @FrankB, that seems to have solved it!
Title: Re: Working with UTF8BOM
Post by: FrankB on October 31, 2024, 07:21:39 PM
Glad it worked!
Title: Re: Working with UTF8BOM
Post by: StarGeek on October 31, 2024, 08:58:46 PM
Quote from: Martin Z on October 31, 2024, 06:55:52 PM
Oh, so a UTF8BOM-encoded CSV works fine for you, in terms of reading SourceFile filenames?

Yes, UTF-8 BOM worked correctly. See the CODE section in my post. I copied the name you gave and showed what I did step by step. The only way I got the response you saw was by switching to UTF-16 BOM.

Quote from: FrankB on October 31, 2024, 07:02:46 PM
You could give this a try:

I keep forgetting about that.