Using Python to find the most frequent element in a sequence (with sample code)

  • 2020-06-15 09:24:58
  • OfStack

Preface

Python 2 has six built-in sequence types: list, tuple, string, Unicode string, buffer object, and xrange object. Each element in a sequence is assigned a number: its position, or index. The difference between a list and a tuple is that a list can be modified and a tuple cannot. In theory, a list can replace a tuple in almost any situation. The exception is using a tuple as a dictionary key. Dictionary keys must be hashable, and a list cannot serve as a key because it is mutable.
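A short illustration of the dictionary-key rule (Python 3; the coordinates and city name are made-up sample data):

```python
# Tuples are immutable and hashable, so they can be dictionary keys
locations = {(40.7, -74.0): 'New York'}
print(locations[(40.7, -74.0)])  # New York

# Lists are mutable and unhashable, so using one as a key raises TypeError
try:
    locations[[40.7, -74.0]] = 'New York'
except TypeError as exc:
    print('TypeError:', exc)  # TypeError: unhashable type: 'list'
```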

In statistical or analytical work, we sometimes need to find the element that appears most often in a sequence: for example, the word that occurs most frequently in a paragraph of English text, along with the number of times each word occurs. The straightforward one-pass approach is to use each element as a dictionary key and add 1 to its value every time it appears.

For example:


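morewords = ['why', 'are', 'you', 'not', 'looking', 'in', 'my', 'eyes']
word_counts = {}
for word in morewords:
    # Start a new word at 0, then add 1 for this occurrence
    word_counts[word] = word_counts.get(word, 0) + 1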
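The dictionary approach can be carried through to the final answer with a plain dict and max(). This is a minimal sketch: the sample sentence is made up, and lowercasing then splitting on whitespace is a simplifying assumption that ignores punctuation.

```python
# Hypothetical sample text; lowercasing and splitting on whitespace
# is a simplifying assumption (punctuation is not handled)
text = "the cat saw the dog and the dog saw the cat"

counts = {}
for word in text.lower().split():
    counts[word] = counts.get(word, 0) + 1

# max() with key=counts.get returns the key whose count is largest
most_frequent = max(counts, key=counts.get)
print(most_frequent, counts[most_frequent])  # the 4
```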

The collections.Counter class is designed for exactly this kind of problem, and it even has a handy most_common() method that gives you the answer directly.
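As a rough sketch of how Counter takes over the manual loop, the example below builds a Counter from morewords. Because a Counter returns 0 for missing keys, a word_counts[word] += 1 loop also works on it with no setup, and update() adds counts from another iterable. The extra words fed in are arbitrary illustration data.

```python
from collections import Counter

morewords = ['why', 'are', 'you', 'not', 'looking', 'in', 'my', 'eyes']

# Counter performs the key-and-increment loop in a single call
word_counts = Counter(morewords)
print(word_counts['eyes'])  # 1

# Counter keys default to 0, so the += 1 loop works without initialising keys
for word in ['eyes', 'eyes']:
    word_counts[word] += 1

# update() adds counts from another iterable instead of replacing them
word_counts.update(['my'])
print(word_counts.most_common(1))  # [('eyes', 3)]
```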

collections module

The collections module was introduced in Python 2.4. Besides dict, set, list, and tuple, it provides five specialized container types:

  • OrderedDict class: an ordered dictionary, a subclass of dict that remembers the order in which keys were inserted. Introduced in 2.7.
  • namedtuple() function: a factory function that creates tuple subclasses with named fields. Introduced in 2.6.
  • Counter class: counts hashable objects; a subclass of dict. Introduced in 2.7.
  • deque: a double-ended queue. Introduced in 2.4.
  • defaultdict: a dict subclass that calls a factory function to supply default values for missing keys. Introduced in 2.5.
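A quick tour of the five types, written for Python 3 as a minimal sketch; the variable names and values are arbitrary illustrations.

```python
from collections import OrderedDict, namedtuple, Counter, deque, defaultdict

# OrderedDict remembers insertion order
od = OrderedDict()
od['b'] = 1
od['a'] = 2
print(list(od))  # ['b', 'a']

# namedtuple builds a tuple subclass with named fields
Point = namedtuple('Point', ['x', 'y'])
p = Point(1, 2)
print(p.x + p.y)  # 3

# Counter counts hashable objects, here the characters of a string
print(Counter('hello')['l'])  # 2

# deque supports fast appends and pops at both ends
dq = deque([1, 2, 3])
dq.appendleft(0)
dq.pop()
print(list(dq))  # [0, 1, 2]

# defaultdict calls its factory (here list) to create missing values
groups = defaultdict(list)
groups['fruit'].append('apple')
print(groups['fruit'], groups['veg'])  # ['apple'] []
```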

Documentation: http://docs.python.org/2/library/collections.html

Counter class

The Counter class tracks how many times each value occurs. It is a container type, a subclass of dict, that stores each element as a key and its count as the value. A count can be any integer, including 0 and negative numbers. Counter is similar to the bag or multiset types found in other languages.
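The behaviour described above can be seen in a small sketch (Python 3, with made-up fruit names as data): missing elements count as zero, counts can be zero or negative, and elements() gives the multiset view.

```python
from collections import Counter

c = Counter(['apple', 'banana', 'apple'])
print(c['apple'])   # 2
print(c['cherry'])  # 0: a missing element reports zero instead of raising KeyError

# Counts are plain integers, so they can be set to zero or driven negative
c['banana'] -= 2
c['cherry'] = 0
print(c['banana'])  # -1

# elements() repeats each element by its count and skips counts below one
print(sorted(c.elements()))  # ['apple', 'apple']
```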

To illustrate, suppose you have a list of words and want to find out which word appears most frequently. Here's what you can do:


from collections import Counter

words = [
    'look', 'into', 'my', 'eyes', 'look', 'into', 'my', 'eyes',
    'the', 'eyes', 'the', 'eyes', 'the', 'eyes', 'not', 'around', 'the',
    'eyes', "don't", 'look', 'around', 'the', 'eyes', 'look', 'into',
    'my', 'eyes', "you're", 'under'
]
word_counts = Counter(words)
# The three most frequent words
top_three = word_counts.most_common(3)
print(top_three)
# Outputs [('eyes', 8), ('the', 5), ('look', 4)]
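most_common() can also be called without an argument, in which case it returns every element. A minimal sketch on a shorter, made-up word list; the tie ordering noted in the comment applies to Python 3.7 and later.

```python
from collections import Counter

word_counts = Counter(['look', 'into', 'my', 'eyes', 'look', 'into', 'eyes', 'eyes'])

# Without an argument, most_common() lists every element, highest count first;
# on Python 3.7+ equal counts keep the order in which elements were first seen
print(word_counts.most_common())
# [('eyes', 3), ('look', 2), ('into', 2), ('my', 1)]

# Individual counts are read with ordinary dict indexing
print(word_counts['look'])  # 2
```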

A more advanced feature of collections.Counter is that Counter instances can be combined with arithmetic operations such as addition and subtraction.


>>> a = Counter(words)
>>> b = Counter(morewords)
>>> a
Counter({'eyes': 8, 'the': 5, 'look': 4, 'into': 3, 'my': 3, 'around': 2,
"you're": 1, "don't": 1, 'under': 1, 'not': 1})
>>> b
Counter({'eyes': 1, 'looking': 1, 'are': 1, 'in': 1, 'not': 1, 'you': 1,
'my': 1, 'why': 1})
>>> # Combine counts
>>> c = a + b
>>> c
Counter({'eyes': 9, 'the': 5, 'look': 4, 'my': 4, 'into': 3, 'not': 2,
'around': 2, "you're": 1, "don't": 1, 'in': 1, 'why': 1,
'looking': 1, 'are': 1, 'under': 1, 'you': 1})
>>> # Subtract counts
>>> d = a - b
>>> d
Counter({'eyes': 7, 'the': 5, 'look': 4, 'into': 3, 'my': 2, 'around': 2,
"you're": 1, "don't": 1, 'under': 1})
>>>
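Beyond + and -, Counter also supports & (intersection, keeping the minimum of each count) and | (union, keeping the maximum). Note that - keeps only positive results, while the subtract() method keeps zero and negative counts. A sketch with small hand-built counters; the counts are illustrative only.

```python
from collections import Counter

a = Counter(eyes=8, the=5, look=4)
b = Counter(eyes=1, look=6, why=1)

print((a - b)['look'])  # 0: '-' drops results that are zero or negative
print(a & b)            # intersection: minimum of each count
print(a | b)            # union: maximum of each count

# subtract() works in place and keeps zero and negative counts
diff = a.copy()
diff.subtract(b)
print(diff['look'], diff['why'])  # -2 -1
```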

Reference documentation:

https://docs.python.org/3/library/collections.html
