Hi guys =) I have 1000 files with descriptions that contain repeated words (in XPSubject, I think), like this:
Cartoon horse on white background. Cartoon horse vector illustration. Cartoon cute horse farm animals happy mane stallion character design.
I need to delete (just cut) all the duplicates of "Cartoon" and "horse", but the first occurrence of each word must be kept. The final result I need is:
Cartoon horse on white background. vector illustration. cute farm animals happy mane stallion character design.
Any ideas?
What algorithm would you use to decide which words to remove? I could imagine a way to remove all duplicate words, but then you would have words like "the" removed too, which you may want duplicates of.
- Phil
Quote from: Phil Harvey on November 05, 2016, 10:20:33 AM
What algorithm would you use to decide which words to remove? I could imagine a way to remove all duplicate words, but then you would have words like "the" removed too, which you may want duplicates of.
- Phil
Hi, Phil. Words like "the", "is", "are" and so on don't matter.
The main goal is to remove duplicated words, keeping the first occurrence of each. First of all, I am not a programmer, I am a beginner :) So, as I understand it, I need a function like this Java one, but for ExifTool:
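import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collectors;

public class FindDuplicateWordsInText {
    // Returns the set of words that occur more than once in the text
    public static Set<String> findDuplicateWordsInText(String text) {
        String[] words = text.split(" ");
        Set<String> duplicatesRemovedSet = new HashSet<>();
        // add() returns false when the word is already in the set, i.e. it is a repeat
        Set<String> duplicatesSet = Arrays.stream(words).filter(string -> !duplicatesRemovedSet.add(string))
                .collect(Collectors.toSet());
        return duplicatesSet;
    }
}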
Another variant is to use temporary files with exported metadata, but that is still difficult for me :o
I hope ExifTool has some function for this.
And thanks for your great product, I see many guys using ExifTool 👍 :)
Another solution to the problem is to cut everything after the first ".":
Was
Cartoon horse on white background. Cartoon horse vector illustration. Cartoon cute horse farm animals happy mane stallion character design.
Need
Cartoon horse on white background
This variant is good too :D
Cutting everything from the first "." onward is easy:
exiftool "-imagedescription<${imagedescription;s/\..*//}" DIR
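For anyone who wants to see what that substitution does to a single string outside ExifTool, here is a minimal Python sketch of the same idea. The function name cut_after_first_period is made up for illustration and is not part of ExifTool; like the Perl s/\..*//, it removes the first "." and the rest of that line.

```python
import re


def cut_after_first_period(description: str) -> str:
    """Remove the first "." and everything after it on that line.

    Hypothetical helper that mirrors the Perl substitution s/\\..*//
    (first match only, "." does not match a newline).
    """
    return re.sub(r"\..*", "", description, count=1)


sample = (
    "Cartoon horse on white background. Cartoon horse vector illustration. "
    "Cartoon cute horse farm animals happy mane stallion character design."
)
print(cut_after_first_period(sample))  # Cartoon horse on white background
```

A description without any "." is returned unchanged.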
Removing duplicates is trickier:
exiftool "-imagedescription<${imagedescription;my (@a,%h);$h{lc $_} or push(@a,$_),$h{lc $_}=1 foreach split;$_=join ' ',@a}" DIR
- Phil
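For readers who don't know Perl, the duplicate-removal expression above can be sketched in Python roughly as follows. This is only an illustration of the logic (split on whitespace, compare words case-insensitively, keep the first occurrence); remove_repeated_words is a hypothetical name, not an ExifTool function.

```python
def remove_repeated_words(description: str) -> str:
    """Keep only the first occurrence of each word, ignoring case.

    Hypothetical helper sketching the logic of the Perl expression:
    `seen` plays the role of the hash %h, `kept` the role of the array @a.
    """
    seen = set()
    kept = []
    for word in description.split():  # split on runs of whitespace
        key = word.lower()            # like Perl's lc
        if key not in seen:
            seen.add(key)
            kept.append(word)
    return " ".join(kept)


sample = (
    "Cartoon horse on white background. Cartoon horse vector illustration. "
    "Cartoon cute horse farm animals happy mane stallion character design."
)
print(remove_repeated_words(sample))
# Cartoon horse on white background. vector illustration. cute farm animals happy mane stallion character design.
```

Punctuation stays attached to its word, so "horse." and "horse" count as different words here, just as they do with a plain whitespace split.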
Thank you, Phil!
The first variant works, the second does not: it reports "no file specified".
As I understand it, my is your own function? :)
exiftool "-imagedescription<${imagedescription;my (@a,%h);$h{lc $_} or push(@a,$_),$h{lc $_}=1 foreach split;$_=join ' ',@a}" DIR
Anyway, I am trying to understand it, but I am still a slowpoke 8)
Quote from: Pantik on November 05, 2016, 04:55:38 PM
As I understand it, my is your own function? :)
In this context, my declares the array @a and the hash %h. Everything from the my to the closing brace is Perl code.
Quote
second is not - no file specified
Are you sure you remembered to replace DIR with the file or directory? And did you copy the quotes correctly? This error indicates that no file to process was given on the command line.
Quote from: StarGeek on November 05, 2016, 05:35:28 PM
Are you sure you remembered to replace DIR with the file or directory? Or did you make sure to copy the quotes correctly? This error indicates that a file to process was not included.
Yes, I did. My command is:
exiftool "-XPSubject<${XPSubject; my(@a,%h); $h{lc $_} or push(@a,$_),$h{lc $_}=1 foreach split; $_=join ' ', @a}" C:\-\ -overwrite_original -r -k
The first command (removing everything after the ".") works with this same directory.
P.S. Thank you for explaining, I'll try to learn it! :D