Extracting meta elements from HTML on the command line (ExifTool version 9.28, Windows 7):
exiftool.exe -HTML:HTML-dc:All -a -G1 -s effects-edited.htm
extracts all this (which is correct):
[HTML-dc] Relation : test_relation_1, test_relation_2, test_relation_3
[HTML-dc] Identifier : test_id_value_1
[HTML-dc] Identifier : test_id_value_2
[HTML-dc] Subject : test_subject_value_1, test_subject_value_2
[HTML-dc] Title : test_title_value_1
[HTML-dc] Title : test_title_value_2
Adding the -csv output option:
exiftool.exe -HTML:HTML-dc:All -a -G1 -s -csv effects-edited.htm
SourceFile,HTML-dc:Relation,HTML-dc:Identifier,HTML-dc:Subject,HTML-dc:Title
effects-edited.htm,"test_relation_1, test_relation_2, test_relation_3",test_id_value_2,"test_subject_value_1, test_subject_value_2",test_title_value_2
test_id_value_1 and test_title_value_1 don't appear in the CSV output. Is that right or am I missing something?
Email me the sample HTML file and I'll track this down. (philharvey66 at gmail.com)
- Phil
Thanks for the sample.
I'll fix ExifTool to tolerate leading white space in the HTML file as you suggested in your email.
The problem here is that the column headings in the CSV file must be unique. Since you are using -G1, the column headings for both copies of Identifier are identical (and likewise for Title), so only one value per heading ends up in the output. Try using -G4:1 instead. Adding the family 4 group name guarantees that all of the tags produce unique headings.
- Phil
Edit: I'll add a note to the documentation to explain this -csv feature.
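To illustrate the unique-heading point above, here is a minimal Python sketch. It is not ExifTool's code, only a model of the behaviour described in this thread: a CSV row built as a mapping from heading to value can hold one value per heading, so a repeated heading keeps only the last value. The helper names (`to_csv`, `add_copy_prefix`) and the `Copy1:` prefix are assumptions made up for the illustration; the actual headings produced by -G4:1 may look different.

```python
import csv
import io

# Tags as (heading, value) pairs, in the order they were extracted.
tags = [
    ("HTML-dc:Identifier", "test_id_value_1"),
    ("HTML-dc:Identifier", "test_id_value_2"),
    ("HTML-dc:Title", "test_title_value_1"),
    ("HTML-dc:Title", "test_title_value_2"),
]


def to_csv(source_file, pairs):
    """Build a one-row CSV in which each heading can appear only once."""
    row = {"SourceFile": source_file}
    for heading, value in pairs:
        row[heading] = value  # a repeated heading overwrites the earlier value
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(row), lineterminator="\n")
    writer.writeheader()
    writer.writerow(row)
    return buf.getvalue()


def add_copy_prefix(pairs):
    """Prefix repeated headings with a hypothetical copy label (assumption)."""
    seen = {}
    out = []
    for heading, value in pairs:
        n = seen.get(heading, 0)
        seen[heading] = n + 1
        out.append((heading if n == 0 else f"Copy{n}:{heading}", value))
    return out


collided = to_csv("effects-edited.htm", tags)
unique = to_csv("effects-edited.htm", add_copy_prefix(tags))
print(collided)
print(unique)
```

With identical headings, only test_id_value_2 and test_title_value_2 survive, which matches the CSV output reported at the top of the thread. Once each duplicate gets a distinguishing prefix, all four values get their own column.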