Python's way of finding similar words

  • 2020-04-02 14:34:54
  • OfStack

This example shows how Python can find similar words. Share with you for your reference. Specific analysis is as follows:

Question:

Give you a word a, and if you can get another word b by exchanging the order of the letters in the word, define b as a sibling of a. Now you are given a dictionary, the user enters a word, and you are asked to find out from the dictionary how many sibling words the word has.

The Python code is as follows:


from itertools import tee,izip
from collections import defaultdict
def pairwise(iterable):
  a, b = tee(iterable)
  for elem in b:
    break
  return izip(a, b)
buf_array=[]
buf_no={}
key_from_id=0
def add_to_buf(word):
  global key_from_id,buf_array
  if len(word)==1:
    pass
    #TODO
  for pos,pair in enumerate(pairwise(word)):
    if len(buf_array)<pos+1:
      buf_array.append(defaultdict(set))
    pos_dict=buf_array[pos]
    key=list(pair)
    key.sort()
    key="".join(key)
    if key not in buf_no:
      buf_no[key]=key_from_id
      key_from_id+=1
    key=buf_no[key]
    pos_dict[key].add(word)
def find_in_buf(word):
  global key_from_id,buf_array
  if len(word)==1:
    pass
    #TODO
  exist = []
  for pos,pair in enumerate(pairwise(word)):
    if len(buf_array)<pos+1:
      return  
    pos_dict=buf_array[pos]
    key=list(pair)
    key.sort()
    key="".join(key)
    if key not in buf_no:
      continue
    key=buf_no[key]
    if key not in pos_dict:
      continue
    exist.append(pos_dict[key])
  count_dict=defaultdict(int)
  for i_set in exist:
    for i in i_set:
      count_dict[i]+=1
  result=[]
  min_match = len(word)-3
  for k,v in count_dict.iteritems():
    if v>=min_match:
      result.append(k)
  return result
add_to_buf("1234")
add_to_buf("ABCD")
add_to_buf("CABD")
print find_in_buf("ACBD")

I hope this article has helped you with your Python programming.


Related articles: