Fuzzy matching of words with phrases with Ruby


I want to match a bunch of data against a short list of services.

My data looks like this:

{"title" : "blorb", "category" : "zurb", "description" : "massage manipulation of superficial , deeper layers of muscle , connective tissue using various techniques, enhance function, aid in healing process, decrease muscle reflex activity..." }

and I have to match it against:

["swedish massage", "haircut"]

Clearly "swedish massage" should be the winner, but running a benchmark shows that "haircut" scores higher:

    require 'amatch'
    require 'pp'

    # description holds the "description" string from the record above
    arr = [:levenshtein_similar, :hamming_similar, :pair_distance_similar,
           :longest_subsequence_similar, :longest_substring_similar,
           :jaro_similar, :jarowinkler_similar]

    arr.each do |method|
      ["swedish massage", "haircut"].each do |sh|
        pp ">>> #{sh} matched #{method}"
        pp sh.send(method, description)
      end
    end

Result:

    ">>> swedish massage matched jaro_similar"        # 0.5246896118183247
    ">>> haircut matched jaro_similar"                # 0.5353606789250354
    ">>> swedish massage matched jarowinkler_similar" # 0.5246896118183247
    ">>> haircut matched jarowinkler_similar"         # 0.5353606789250354

The rest of the similarity scores are below 0.1.

What would be a better approach to solving this problem?

Search is a constant battle between precision and recall. One thing to try is splitting the input into words. This will result in a stronger match on "massage", with the consequence of broadening out the result set: you will now also find sentences returned for words that are close to "swedish". You can try to control that broadening by averaging the results across multiple words, using stop lists to avoid common words, adding boosts for finding tokens close to each other, and so on, but you will never see perfect results. If you're interested in fine-tuning this, I recommend Elasticsearch; it is relatively easy to learn and powerful.
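
As a rough illustration of the word-splitting idea, here is a minimal sketch, not a tuned solution. It assumes a tiny hypothetical stop list (STOP_WORDS) and swaps in a plain-Ruby Levenshtein similarity in place of the amatch methods so that it runs without any gems. The helper names (levenshtein, similarity, tokens, phrase_score) are made up for this example. Each query word is scored against its best-matching word in the description and the per-word scores are averaged; proximity boosts are left out.

```ruby
# Hypothetical stop list; a real one would be much longer.
STOP_WORDS = %w[a an and in of the using].freeze

# Plain Levenshtein edit distance between two strings.
def levenshtein(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Similarity in 0.0..1.0, where 1.0 means the strings are identical.
def similarity(a, b)
  longest = [a.length, b.length].max
  return 1.0 if longest.zero?
  1.0 - levenshtein(a, b).to_f / longest
end

# Lowercase the text, split it into words, and drop stop words.
def tokens(text)
  text.downcase.scan(/[a-z]+/) - STOP_WORDS
end

# For each query word take its best match among the text's words,
# then average those best scores across all query words.
def phrase_score(query, text)
  words = tokens(text)
  query_words = tokens(query)
  return 0.0 if words.empty? || query_words.empty?
  best = query_words.map { |q| words.map { |w| similarity(q, w) }.max }
  best.sum / best.size
end

description = "massage manipulation of superficial , deeper layers of muscle , " \
              "connective tissue using various techniques, enhance function, " \
              "aid in healing process, decrease muscle reflex activity"

scores = ["swedish massage", "haircut"].map { |s| [s, phrase_score(s, description)] }.to_h
scores.each { |service, score| puts format("%-16s %.3f", service, score) }
```

With this scoring, the exact hit on "massage" pulls "swedish massage" ahead of "haircut", while the weak match for "swedish" drags the average down. That is the precision versus recall trade-off described above.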

