leading spaces

Started by simonmcnair, June 26, 2023, 09:49:17 AM


simonmcnair

I'm trying to automate tag addition using ML for digiKam, but I'm having a nightmare.  No matter what I do I end up with a leading space or two.
In Python, I've tried:

ret = subprocess.check_output(['exiftool','-P','-overwrite_original', '-api', '"Filter=s/^ +//"','-TagsFromFile','@','-subject','-XMP:subject','-IPTC:Keywords','-XMP:CatalogSets','-XMP:TagsList',img_path])
but I still end up with some tags having ' this is a bed' vs 'this is a bed'

I would appreciate any help.  I tried using -sep ", " and -sep "," but tbh it doesn't seem to do what I thought it would, where it would alter the separator in the output.  #soconfused.
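One detail in the subprocess call above may be worth checking. When arguments are passed to subprocess as a Python list, no shell is involved, so the double quotes inside '"Filter=s/^ +//"' would reach exiftool as literal characters instead of being stripped. The sketch below uses only the standard library to show what a POSIX shell would hand over for the same text; `build_copy_args` is a hypothetical helper name, not something from the original script.

```python
import shlex


def build_copy_args(img_path):
    """Return an argument list for subprocess, with no shell-style quotes inside items."""
    return [
        "exiftool", "-P", "-overwrite_original",
        "-api", "Filter=s/^ +//",  # plain string, no embedded double quotes
        "-TagsFromFile", "@",
        "-subject", "-XMP:subject", "-IPTC:Keywords",
        "-XMP:CatalogSets", "-XMP:TagsList",
        img_path,
    ]


# What a POSIX shell would pass on for the quoted form typed at a prompt:
shell_view = shlex.split('-api "Filter=s/^ +//"')
print(shell_view)  # ['-api', 'Filter=s/^ +//']
```

The list from `build_copy_args(img_path)` could then be given to `subprocess.check_output` in place of the hand-written list.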

Phil Harvey

Can you provide a console log that allows us to reproduce the problem?  I get this:

> exiftool a.jpg -subject=" test1" -subject=" test2"
    1 image files updated
> exiftool b.jpg -tagsfromfile a.jpg -subject -api "Filter=s/^ +//"
    1 image files updated
> exiftool a.jpg b.jpg -subject
======== a.jpg
Subject                         :  test1,  test2
======== b.jpg
Subject                         : test1, test2
    2 image files read

(I'm copying to a different file above so I can run multiple tests.)

- Phil

StarGeek

What are you using to view the tags?  By default, exiftool will separate the tags with a comma(space) on the command line, even though they are completely separate.  Other programs often do something similar.

If you use exiftool to look at the raw XMP with
exiftool -b -xmp /path/to/files/

You can see what the exact values are
  <dc:subject>
  <rdf:Bag>
    <rdf:li>tag 1</rdf:li>
    <rdf:li>tag 2</rdf:li>
  </rdf:Bag>
  </dc:subject>
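To make stray whitespace in such a packet visible from Python, something like the following might help. It is a minimal sketch using the standard library's ElementTree; the sample packet is made up for illustration (with one leading space added on purpose), it omits the xpacket wrapper a real packet normally carries, and `subject_items` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs for RDF and Dublin Core.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Made-up sample packet; the first item has a deliberate leading space.
SAMPLE = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/">
   <dc:subject>
    <rdf:Bag>
     <rdf:li> tag 1</rdf:li>
     <rdf:li>tag 2</rdf:li>
    </rdf:Bag>
   </dc:subject>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>"""


def subject_items(xmp_text):
    """Return each dc:subject item exactly as stored, whitespace included."""
    root = ET.fromstring(xmp_text)
    return [li.text or "" for li in root.findall(".//dc:subject/rdf:Bag/rdf:li", NS)]


for item in subject_items(SAMPLE):
    print(repr(item))  # repr() makes leading spaces visible
```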
* Did you read FAQ #3 and use the command listed there?
* Please use the Code button for exiftool code/output.
 
* Please include your OS, Exiftool version, and type of file you're processing (MP4, JPG, etc).

simonmcnair

Thanks for the replies.  I will try and diagnose further.  Cheers

simonmcnair

I don't know how best to explain. I'm using digiKam to diagnose the issue, and then I tried using exiftool to get rid of the leading spaces, but despite running it multiple times, the leading space is not getting removed.

This is made twice as hard by the fact that I cannot get exiftool to output the comma separated values without it adding spaces.
I run the command
existing_tags = subprocess.check_output(['exiftool', '-XMP:Subject', '-IPTC:Keywords', '-XMP:CatalogSets', '-XMP:TagsList', img_path]).decode().strip()
and get the output

Subject                        : tag1,  tag2,  tag3,  tag4, 
Keywords                        : tag1,  tag2,  tag3,  tag4, 
Catalog Sets                    : tag1,  tag2,  tag3,  tag4, 
Tags List                      : tag1,  tag2,  tag3,  tag4, 


but the output is incorrect as I don't know how many spaces are before 'tag', and I can't massage the output without refactoring the code.

I tried working around it by just executing the remove spaces regex but the leading space(s) are not getting removed.

The somewhat messy and uncommented and poorly written python is here if you want to see

https://github.com/simonmcnair/SDTagging/blob/main/tag.py
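For the parsing difficulty described above, one option that might avoid guessing at separators is exiftool's -j (JSON) output, where list-type tags with several items come back as JSON arrays. The sketch below is an assumption-laden illustration, not the script's actual code: `normalize` and `read_list_tags` are hypothetical names, and the sample JSON is invented to mimic the shape of the output. A list tag holding a single item is returned as a plain value, so the code wraps that case.

```python
import json
import subprocess


def normalize(json_text):
    """Turn exiftool -j output for one file into {tag name: [items]}."""
    record = json.loads(json_text)[0]  # -j prints a JSON array, one object per file
    result = {}
    for name, value in record.items():
        if name == "SourceFile":
            continue
        items = value if isinstance(value, list) else [value]
        result[name] = [str(item) for item in items]  # whitespace kept as-is
    return result


def read_list_tags(img_path, tags=("XMP:Subject", "IPTC:Keywords")):
    """Run exiftool once for img_path and return its list tags as Python lists."""
    args = ["exiftool", "-j"] + [f"-{tag}" for tag in tags] + [img_path]
    return normalize(subprocess.check_output(args).decode())


# Invented sample in the shape -j produces, with leading spaces left in:
sample = '[{"SourceFile": "a.jpg", "Subject": [" test1", "  test2"], "Keywords": "solo"}]'
print(normalize(sample))
```

With the items in a real list, `[item for item in items if item != item.lstrip()]` would show exactly which tags still carry leading spaces.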

StarGeek

Is the output the same when you do it on the command line?

Can you share an example image?

Your output places a comma at the end of each line, which exiftool wouldn't do unless the comma was part of the tag or there is an empty tag at the end.

Your code seems to indicate that you will be running exiftool once per image (see Common Mistake #3).  You might instead look to using PyExifTool, which is a wrapper that keeps exiftool running in the background using the -stay_open option and will improve processing time.

simonmcnair

Quote from: StarGeek on June 27, 2023, 03:39:47 PM
Is the output the same when you do it on the command line?

Can you share an example image?

Your output places a comma at the end of each line, which exiftool wouldn't do unless the comma was part of the tag or there is an empty tag at the end.

Your code seems to indicate that you will be running exiftool once per image (see Common Mistake #3).  You might instead look to using PyExifTool, which is a wrapper that keeps exiftool running in the background using the -stay_open option and will improve processing time.

I'll have a go at refactoring it. I did try PyExifTool in previous iterations, and I honestly can't remember why I went back to doing it the old way.

Hopefully that will show me the tags more clearly, and I can work out the issue.  Cheers

simonmcnair

So the filter does work beautifully; it was just parsing the data in Python that caused me heartache.  I had faith in exiftool, but it was hard to work with spaces when I couldn't get the data out of exiftool clearly.

I've been refactoring my code to use PyExifTool. My only issue appears to be that I can't incrementally build a command line like I did before unless I use the ExifTool class rather than ExifToolHelper.

By the time I use the ExifTool class in Python, it is virtually the same as running exiftool from subprocess, perhaps, maybe, possibly.
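On that last point, a possible middle ground, assuming PyExifTool 0.5 or later: `ExifToolHelper` is a subclass of `ExifTool`, so its `execute()` method should accept an argument list built up step by step, much like the earlier subprocess call. This is a sketch under that assumption. `build_read_args` and `read_raw` are hypothetical names, and the `exiftool` import is deferred so the argument builder can be tried without the package installed.

```python
def build_read_args(img_path, tags, as_json=True):
    """Build the exiftool arguments incrementally, one decision at a time."""
    args = []
    if as_json:
        args.append("-j")
    for tag in tags:
        args.append(f"-{tag}")
    args.append(img_path)
    return args


def read_raw(img_path, tags):
    """Send the built arguments through one ExifToolHelper session and return its output."""
    import exiftool  # third-party package: pip install pyexiftool

    with exiftool.ExifToolHelper() as et:
        return et.execute(*build_read_args(img_path, tags))


print(build_read_args("a.jpg", ["XMP:Subject", "IPTC:Keywords"]))
```

Keeping a single `ExifToolHelper` open across many images, instead of opening one per call as `read_raw` does here, is what would give the -stay_open speed benefit mentioned earlier in the thread.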