ExifTool Forum

ExifTool => Bug Reports / Feature Requests => Topic started by: DNichols on May 16, 2013, 07:29:50 PM

Title: Different output to CSV
Post by: DNichols on May 16, 2013, 07:29:50 PM

Extracting meta elements from an HTML file on the command line (ExifTool version 9.28, Windows 7):

exiftool.exe -HTML:HTML-dc:All -a -G1 -s effects-edited.htm

extracts all this (which is correct):

[HTML-dc]       Relation                        : test_relation_1, test_relation_2, test_relation_3
[HTML-dc]       Identifier                      : test_id_value_1
[HTML-dc]       Identifier                      : test_id_value_2
[HTML-dc]       Subject                         : test_subject_value_1, test_subject_value_2
[HTML-dc]       Title                           : test_title_value_1
[HTML-dc]       Title                           : test_title_value_2


Adding the -csv output option:

exiftool.exe -HTML:HTML-dc:All -a -G1 -s -csv effects-edited.htm

SourceFile,HTML-dc:Relation,HTML-dc:Identifier,HTML-dc:Subject,HTML-dc:Title
effects-edited.htm,"test_relation_1, test_relation_2, test_relation_3",test_id_value_2,"test_subject_value_1, test_subject_value_2",test_title_value_2



test_id_value_1 and test_title_value_1 don't appear in the CSV output. Is that right or am I missing something?
Title: Re: Different output to CSV
Post by: Phil Harvey on May 17, 2013, 07:13:52 AM
Email me the sample HTML file and I'll track this down.  (philharvey66 at gmail.com)

- Phil
Title: Re: Different output to CSV
Post by: Phil Harvey on May 19, 2013, 08:03:39 AM
Thanks for the sample.

I'll fix ExifTool to tolerate leading white space in the HTML file as you suggested in your email.

The problem here is that the column headings in the CSV file must be unique.  Since you are using -G1, both copies of Identifier (and both copies of Title) get the same column heading, so only one value of each appears in the output.  Try using -G4:1 instead.  Adding the family 4 group name guarantees that all of the tags produce unique headings.
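
To illustrate why identical headings lose values, here is a minimal Python sketch. It is not ExifTool's actual code. It assumes a CSV row is built as a mapping from column heading to value, so a repeated heading replaces the earlier value. The "Copy1:" style prefix is a stand-in for a per-instance group name and may not match the exact headings -G4:1 produces; the function names are hypothetical.

```python
# (group, tag, value) triples taken from the non-CSV output earlier in this thread.
tags = [
    ("HTML-dc", "Identifier", "test_id_value_1"),
    ("HTML-dc", "Identifier", "test_id_value_2"),
    ("HTML-dc", "Title", "test_title_value_1"),
    ("HTML-dc", "Title", "test_title_value_2"),
]


def collapse(entries):
    """Build one row keyed by heading; a repeated heading overwrites the earlier value."""
    row = {}
    for heading, value in entries:
        row[heading] = value
    return row


def g1_headings(tags):
    """Headings from the group name and tag name only, so duplicates collide."""
    return [(f"{group}:{tag}", value) for group, tag, value in tags]


def instance_headings(tags):
    """Headings with an assumed per-instance prefix, so every heading is unique."""
    counts = {}
    entries = []
    for group, tag, value in tags:
        n = counts.get((group, tag), 0)
        counts[(group, tag)] = n + 1
        # Assumption: the first copy has no prefix, later copies get Copy1, Copy2, ...
        prefix = f"Copy{n}:" if n else ""
        entries.append((f"{prefix}{group}:{tag}", value))
    return entries


g1_row = collapse(g1_headings(tags))
g4_row = collapse(instance_headings(tags))

print(len(g1_row), "columns with group-only headings")
print(len(g4_row), "columns with per-instance headings")
```

With group-only headings the four tags collapse into two columns and the first value of each is lost, which matches the CSV output reported above. With a distinguishing prefix all four values survive.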

- Phil

Edit:  I'll add a note to the documentation to explain this -csv feature.