I have a PowerShell script that, for a given folder...
- Collates metadata from various sources/files
- Arranges the data into an EXIFtool-like table (Generating a SourceFile column, using tag names as column headers, etc)
- Saves this as a combined Metadata.csv file
- Uses EXIFtool to write the metadata from Metadata.csv into each file
Issue 1: I have to use utf8BOM
After tearing my hair out for a bit, I found that I needed to specify the CSV encoding format as "utf8BOM" [Info] (https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.utility/export-csv?view=powershell-7.4#-encoding:~:text=utf8BOM%3A%20Encodes%20in%20UTF%2D8%20format%20with%20Byte%20Order%20Mark%20(BOM)), otherwise non-ASCII characters (e.g. emoji) would get corrupted: instead of an image subject being recorded in the CSV as "Holiday photo 🌴", it would get saved as something like "Holiday photo ▯▯▯▯" if I encoded the CSV file as utf8 (without BOM).
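To make the difference between the two encodings concrete, here is a minimal Python sketch (Python is used only for illustration; the thread's script is PowerShell). It shows that UTF-8 with BOM is the same byte sequence as plain UTF-8 with three extra bytes in front, and what the text looks like if a reader wrongly decodes the BOM-less bytes as Windows-1252. That fallback is an assumption about where the corruption might come from, not something confirmed in this thread.

```python
# Sketch: the same subject line as UTF-8 bytes with and without a BOM.
subject = "Holiday photo 🌴"

plain = subject.encode("utf-8")          # no BOM
with_bom = subject.encode("utf-8-sig")   # BOM (EF BB BF) followed by the same bytes

print(with_bom[:3].hex())                # hex of the three BOM bytes
print(with_bom[3:] == plain)             # the payload after the BOM is identical

# Assumption: a consumer that sees no BOM falls back to a legacy code page.
# The emoji is four bytes in UTF-8, so it turns into four unrelated characters.
mojibake = plain.decode("cp1252")
print(ascii(mojibake))                   # ascii() keeps the output console-safe
print(len(subject), len(mojibake))       # character counts before and after
```

The four replacement characters line up with the four placeholder boxes in the corrupted subject above, which is why a wrongly guessed encoding is a plausible explanation.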
Issue 2: This seemed to stop EXIFtool finding files
While this sorted the data in the CSV, it seems to have had a knock-on effect on EXIFtool, whereby it can seemingly no longer match the SourceFile column to the files in the folder.
For example...
=== FOLDER STRUCTURE [C:\Test folder] ===
File1.jpg
File2.jpg
Metadata.csv
=== CSV STRUCTURE ===
SourceFile | CreateDate | Title | XPSubject | XPKeywords
./File1.jpg* | 19/01/2024 16:13:00 | Holiday photo 🌴 | Foo | Bar
./File2.jpg* | 19/01/2024 16:14:00 | Holiday photo 🌴 | Foo | Bar
* NB: I have tried formatting the column as both "File1.jpg" and "./File1.jpg", as well as adding "FileName" and "Directory" columns, however I still couldn't get EXIFtool to find the files
> EXIFtool -csv:Metadata.csv -d "%d/%m/%Y %H:%M:%S" -r .
No SourceFile './File1.jpg' in imported CSV database
(full path: 'c:\test folder\file1.jpg')
No SourceFile './File2.jpg' in imported CSV database
(full path: 'c:\test folder\file2.jpg')
1 directories scanned
0 image files read
Any way to fix this please?
Is there a way I can fix this / enable EXIFtool to read filenames in a utf8BOM-formatted CSV?
Notes
- I'm running on Windows 11, with active code page = 65001
- I did try to read some existing posts on utf8BOM, but they went a bit over my head / most seemed to relate to specific tags/strings being utf8BOM-formatted (rather than the entire CSV file)
- Also, just to avoid getting side-tracked, I know storing emojis and other non-ASCII characters is not ideal (I did even look at removing these from the compiled data, but this ended up creating other issues, such as an all-emoji description becoming a null string, etc). Ultimately, I don't own the source data and so I just want to capture and write the metadata as-is
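One way to rule out a problem on the CSV side is to read the SourceFile column back and check each entry against the folder. The Python sketch below is a diagnostic aid only and makes no claim about the actual cause; `missing_source_files` is a hypothetical helper name, and it assumes the SourceFile values are relative to the folder being scanned, as in the example above.

```python
import csv
from pathlib import Path


def missing_source_files(csv_path, folder):
    """Return SourceFile values from the CSV that do not exist under folder."""
    folder = Path(folder)
    # "utf-8-sig" strips a leading BOM if present and also reads BOM-less UTF-8,
    # so the first header is read as "SourceFile" either way.
    with open(csv_path, newline="", encoding="utf-8-sig") as handle:
        reader = csv.DictReader(handle)
        if "SourceFile" not in (reader.fieldnames or []):
            raise ValueError(f"no SourceFile column, found: {reader.fieldnames}")
        return [
            row["SourceFile"]
            for row in reader
            if not (folder / row["SourceFile"]).is_file()
        ]
```

Calling `missing_source_files("Metadata.csv", ".")` from inside the folder should return an empty list if every row points at a real file. If it does, the CSV content itself is probably fine and the mismatch lies in how the filenames are being compared.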
Are you sure you're using UTF-8 BOM and not UTF-16 BOM? PowerShell forces UTF-16 BOM when redirecting output with < or >, or when using a pipe |.
Example with UTF-8 BOM, using the file unix program from MSYS2 (https://www.msys2.org/) to show the BOM type. You can also see the BOM in the output from type.
C:\>file temp.csv
temp.csv: CSV Unicode text, UTF-8 (with BOM) text
C:\>type temp.csv
�SourceFile,ExifIFD:DateTimeOriginal,ExifIFD:CreateDate,IFD0:ModifyDate
Y:/!temp/x/y/test/Holiday photo 🌴.jpeg,2024:10:31 12:00:00,2024:10:31 12:00:00,2024:10:31 12:00:00
C:\>exiftool -P -overwrite_original -csv=temp.csv "Y:\!temp\x\y\test\Holiday photo 🌴.jpeg"
1 image files updated
C:\>exiftool -G1 -a -s -Alldates "Y:\!temp\x\y\test\Holiday photo 🌴.jpeg"
[ExifIFD] DateTimeOriginal : 2024:10:31 12:00:00
[ExifIFD] CreateDate : 2024:10:31 12:00:00
[IFD0] ModifyDate : 2024:10:31 12:00:00
This StackOverflow post (https://stackoverflow.com/questions/40098771/changing-powershells-default-output-encoding-to-utf-8) talks about changing PowerShell's output to UTF-8
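For anyone without MSYS2's file program, the BOM check can be approximated with a few lines of Python. This is a sketch, not a replacement for file: `detect_bom` is a hypothetical helper that only inspects the first four bytes, using the BOM constants from Python's standard codecs module.

```python
import codecs

# Longest signatures first: the UTF-32 LE BOM begins with the UTF-16 LE BOM,
# so checking UTF-16 first would misreport UTF-32 files.
_BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32 LE"),
    (codecs.BOM_UTF32_BE, "UTF-32 BE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16 LE"),
    (codecs.BOM_UTF16_BE, "UTF-16 BE"),
]


def detect_bom(path):
    """Return the name of the BOM at the start of the file, or None if absent."""
    with open(path, "rb") as handle:
        head = handle.read(4)
    for signature, name in _BOMS:
        if head.startswith(signature):
            return name
    return None
```

Running `detect_bom("Metadata.csv")` on the PowerShell-generated file would distinguish a UTF-8 BOM from a UTF-16 one. A result of None means there is no BOM, which says nothing about whether the content is valid UTF-8.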
Thanks for getting back to me @StarGeek!
Quote from: StarGeek on October 31, 2024, 06:11:28 PM
Are you sure you're using UTF-8 BOM and not UTF-16 BOM? PowerShell forces UTF-16 BOM when redirecting output with <, > or | (pipe).
Yep, I am setting utf8BOM explicitly, and using PowerShell's Export-CSV cmdlet...
Export-CSV -Encoding utf8BOM
I don't have MSYS2, however I used Notepad++ to verify the file formats...
• For the PowerShell-generated CSV, format: UTF-8-BOM
• For the EXIFtool-generated CSV (as a control), format: UTF-8
Thanks for the link to the PowerShell/UTF8 S/O post. I think this is one of the key posts I used back in the day, as I actually implemented the default parameters technique it specifies 👍🏼
Notepad++ is what I used to change the encoding on the test file. There isn't much I can help with because I can't replicate it. UTF-8 BOM works fine for me.
Oh, so a UTF8BOM-encoded CSV works fine for you, in terms of reading SourceFile filenames?
Interesting?... OK, I am in the middle of something right now, but I will try and find a suitable sample file (some are massive) and upload it later tonight/tomorrow.
Cheers,
Martin
You could give this a try:
-Api WindowsWideFile=1 -charset filename=utf8
https://exiftool.org/forum/index.php?topic=16544.msg88936#msg88936
Quote from: FrankB on October 31, 2024, 07:02:46 PM
You could give this a try: -Api WindowsWideFile=1
Thanks @FrankB, that seems to have solved it!
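For reference, here is a sketch of how the suggested options might be combined with the original command, written as a Python argument list so it can be passed to a subprocess without shell quoting issues. `build_exiftool_args` is a hypothetical helper; the date format and -r come from the command earlier in the thread, -csv=FILE follows the form used in StarGeek's example, and the thread does not say whether both suggested options were needed or only WindowsWideFile.

```python
def build_exiftool_args(csv_file="Metadata.csv", target="."):
    """Build the exiftool argument list with the options suggested above."""
    return [
        "exiftool",                    # assumes exiftool is on PATH
        "-api", "WindowsWideFile=1",   # suggested API option
        "-charset", "filename=utf8",   # treat file names as UTF-8
        f"-csv={csv_file}",            # import tags from the CSV
        "-d", "%d/%m/%Y %H:%M:%S",     # date format used in the CSV
        "-r", target,                  # recurse from the target folder
    ]


if __name__ == "__main__":
    # Print rather than run, so the sketch works without exiftool installed.
    print(build_exiftool_args())
```

To actually run it, the list can be handed to `subprocess.run(build_exiftool_args(), check=True)` from inside the folder that holds Metadata.csv.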
Glad it worked!
Quote from: Martin Z on October 31, 2024, 06:55:52 PM
Oh, so a UTF8BOM-encoded CSV works fine for you, in terms of reading SourceFile filenames?
Yes, UTF-8 BOM worked correctly. See the CODE section in my post. I copied the name you gave and showed what I did step by step. The only way I got your response was when I switched to UTF-16 BOM.
Quote from: FrankB on October 31, 2024, 07:02:46 PM
You could give this a try:
I keep forgetting about that.