Fuzzy matching of words with phrases in Ruby
I want to match a bunch of data against a short list of services. My data looks like this:
    {"title" : "blorb", "category" : "zurb", "description" : "massage is the manipulation of superficial and deeper layers of muscle and connective tissue using various techniques, to enhance function, aid in the healing process, decrease muscle reflex activity..." }
and this is the list I want to match it against:
    ["swedish massage", "haircut"]
Clearly "swedish massage" should be the winner, but running the benchmark below shows that "haircut" scores higher:
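    require 'amatch'
    require 'pp'

    # data: the record shown above, parsed into a Hash (e.g. with JSON.parse)
    description = data["description"]

    arr = [:levenshtein_similar, :hamming_similar, :pair_distance_similar,
           :longest_subsequence_similar, :longest_substring_similar,
           :jaro_similar, :jarowinkler_similar]

    # Score every service name against the description with every amatch measure
    arr.each do |method|
      ["swedish massage", "haircut"].each do |sh|
        pp ">>> #{sh} matched #{method}"
        pp sh.send(method, description)
      end
    end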
Result:
    ">>> swedish massage matched jaro_similar"          # 0.5246896118183247
    ">>> haircut matched jaro_similar"                  # 0.5353606789250354
    ">>> swedish massage matched jarowinkler_similar"   # 0.5246896118183247
    ">>> haircut matched jarowinkler_similar"           # 0.5353606789250354
The other similarity measures all score below 0.1.
What is a better approach to solving this problem?
Search is a constant battle between precision and recall. One thing you could try is splitting the input into words. This will result in a stronger match on "massage", but as a consequence it broadens the result set: you will also find sentences that only contain words close to "swedish". You can try to control this broadening by averaging the results over multiple words, using stop lists to avoid common words like "and", boosting matches where the tokens are found close to each other, etc., but you will never see perfect results. If you're interested in fine-tuning this, I'd recommend Elasticsearch: it is relatively easy to learn, and powerful.
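The word-splitting idea described above can be sketched roughly as follows. This is a minimal illustration under my own assumptions, not the answerer's code: the tiny stop list, the helper names (`tokens`, `service_score`) and the "best match per service word, then average" scheme are all hypothetical choices. To keep the snippet runnable without the amatch gem, a hand-rolled Levenshtein similarity (1 minus edit distance divided by the longer length) stands in for amatch's measures. Proximity boosting is not shown.

```ruby
# Hypothetical sketch: token-level fuzzy matching with a stop list and averaging.
# A pure-Ruby Levenshtein similarity stands in for the amatch gem (assumption).

# Assumed, deliberately tiny stop list; a real one would be much longer.
STOP_WORDS = %w[a an and the of in on to for is using].freeze

DESCRIPTION = "massage is the manipulation of superficial and deeper layers of " \
              "muscle and connective tissue using various techniques, to enhance " \
              "function, aid in the healing process, decrease muscle reflex activity"

# Classic dynamic-programming edit distance, keeping only one previous row.
def levenshtein(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      # deletion, insertion, substitution
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Similarity in [0, 1]: 1.0 for identical strings, lower as more edits are needed.
def similarity(a, b)
  longest = [a.length, b.length].max
  return 1.0 if longest.zero?
  1.0 - levenshtein(a, b).to_f / longest
end

# Lowercase, split on anything that is not a letter or digit, drop stop words.
def tokens(text)
  text.downcase.scan(/[a-z0-9]+/) - STOP_WORDS
end

# Hypothetical scoring: each service word takes its best match among the
# description words, and the service score is the average of those bests.
def service_score(service, description)
  desc_words = tokens(description)
  service_words = tokens(service)
  return 0.0 if desc_words.empty? || service_words.empty?
  best = service_words.map do |sw|
    desc_words.map { |dw| similarity(sw, dw) }.max
  end
  best.sum / best.length
end

scores = ["swedish massage", "haircut"].map { |s| [s, service_score(s, DESCRIPTION)] }.to_h
scores.each { |service, score| puts format("%-16s %.3f", service, score) }
```

With this scheme "swedish massage" gets a perfect 1.0 on its "massage" word, so its average cannot drop below 0.5, while "haircut" has no close word in the description and stays lower. The flip side is exactly the broadening mentioned above: any description containing just "massage" (or a word close to "swedish") now scores at least 0.5 for "swedish massage".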