Counting repeated lines of text in Python

  • 2020-04-02 14:21:11
  • OfStack

This example shows how to count how many times each line occurs in a text file using Python. It is shared here for reference; the implementation is as follows:

Suppose we have the following file:
2
3
1
2
We expect to get each distinct line followed by the number of times it occurs:
2, 2
3, 1
1, 1

Approach:

Use the text of each line as the key and its number of occurrences as the value, then output the entries ordered by value.
It is best to output from the largest count to the smallest. For sorting a dictionary by value, you can refer to:

In recent Python 2.7, we have the new OrderedDict type, which remembers the order in which the items were added.
>>> d = {"third": 3, "first": 1, "fourth": 4, "second": 2}
>>> for k, v in d.items():
...     print "%s: %s" % (k, v)
...
second: 2
fourth: 4
third: 3
first: 1
>>> d
{'second': 2, 'fourth': 4, 'third': 3, 'first': 1}

To make a new ordered dictionary from the original, sorting by the values:
>>> from collections import OrderedDict
>>> d_sorted_by_value = OrderedDict(sorted(d.items(), key=lambda x: x[1]))

The OrderedDict behaves like a normal dict:
>>> for k, v in d_sorted_by_value.items():
...     print "%s: %s" % (k, v)
...
first: 1
second: 2
third: 3
fourth: 4
>>> d_sorted_by_value
OrderedDict([('first', 1), ('second', 2), ('third', 3), ('fourth', 4)])
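
The approach described above, with the largest-first order the article asks for, might be sketched in Python 3 as follows. This is only an illustrative sketch, not the article's own code: the `lines` list stands in for the contents of the file, and the names `counts` and `ordered` are chosen here for the example.

```python
from collections import OrderedDict

# Sample lines, taken from the example file shown earlier (assumption:
# the file has already been read and stripped into a list).
lines = ["2", "3", "1", "2"]

# Line text is the key, number of occurrences is the value.
counts = {}
for line in lines:
    counts[line] = counts.get(line, 0) + 1

# Sort the (text, count) pairs by count, largest first, and keep that
# order in an OrderedDict.
ordered = OrderedDict(
    sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
)

for text, count in ordered.items():
    print("%s, %d" % (text, count))
```

Because `sorted` is stable, lines with the same count stay in the order in which they were first seen (on Python 3.7 and later, where plain dicts preserve insertion order).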

The code is as follows:
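#coding=utf-8
# Python 2 script: count how often each line occurs in f.txt
import operator

count_dict = {}
# Open the file in a with block so that it is closed automatically
with open("f.txt") as f:
    for line in f:
        # Remove the trailing newline and surrounding whitespace
        line = line.strip()
        # Line text is the key, number of occurrences is the value
        count_dict[line] = count_dict.get(line, 0) + 1

# Sort the (line, count) pairs by count, largest first
sorted_count_dict = sorted(count_dict.iteritems(), key=operator.itemgetter(1), reverse=True)
for item in sorted_count_dict:
    print "%s, %d" % (item[0], item[1])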
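
For readers on Python 3, where the `print` statement and `iteritems` are no longer available, a roughly equivalent version could look like the sketch below. It swaps the hand-written dictionary for `collections.Counter`, which is a different technique from the one in the article. The function name `count_lines` and the temporary demo file are assumptions made for this example.

```python
import os
import tempfile
from collections import Counter


def count_lines(path):
    """Return (line, count) pairs, most frequent first (hypothetical helper)."""
    with open(path) as f:
        counter = Counter(line.strip() for line in f)
    # most_common() with no argument returns every pair, sorted by count.
    return counter.most_common()


# Demo on a throwaway file that holds the sample data from the article.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("2\n3\n1\n2\n")

try:
    result = count_lines(tmp.name)
    for text, count in result:
        print("%s, %d" % (text, count))
finally:
    os.remove(tmp.name)
```

`Counter.most_common()` does the sort-by-value step itself, so the explicit call to `sorted` is not needed in this variant.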

Additional notes:
1. Two methods of the Python 2 dict object:

The items method returns all the dictionary entries as a list, each element of which is a (key, value) tuple.
The iteritems method does much the same thing as items, but returns an iterator object instead of a list. It exists only in Python 2.
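
In Python 3 the distinction is different: `iteritems` is gone and `items` returns a view object instead of a list. The short sketch below illustrates this under the assumption that it runs on Python 3; the variable names are chosen for the example.

```python
d = {"first": 1, "second": 2}

items_view = d.items()       # a dynamic view in Python 3, not a list
snapshot = list(d.items())   # an explicit copy, similar to Python 2's items()

d["third"] = 3               # change the dict after both were created

print(len(items_view))       # the view reflects the new entry
print(len(snapshot))         # the copy does not
print(hasattr(d, "iteritems"))
```

When a real list is needed, for example to index into the pairs, wrapping the view in `list(...)` as above is the usual approach.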

2. Python's built-in function sorted

>>> help(sorted)
Help on built-in function sorted in module __builtin__:
sorted(...)
    sorted(iterable, cmp=None, key=None, reverse=False) --> new sorted list
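
The signature above is the Python 2 one. In Python 3 the `cmp` argument was removed, and `functools.cmp_to_key` can wrap an old-style comparison function when one is still needed. The sketch below, which assumes Python 3 and uses made-up sample pairs, shows `key` and `reverse` in use alongside that wrapper.

```python
from functools import cmp_to_key
from operator import itemgetter

# (line text, count) pairs; sample values for illustration only.
pairs = [("3", 1), ("2", 2), ("1", 1)]

# key selects the count; reverse=True puts the largest count first.
# The sort is stable, so ("3", 1) stays ahead of ("1", 1).
by_count_desc = sorted(pairs, key=itemgetter(1), reverse=True)


def compare(a, b):
    """Old-style comparison: count descending, then text ascending."""
    if a[1] != b[1]:
        return b[1] - a[1]
    return (a[0] > b[0]) - (a[0] < b[0])


# cmp_to_key adapts the comparison function to the key= parameter.
by_cmp = sorted(pairs, key=cmp_to_key(compare))

print(by_count_desc)
print(by_cmp)
```

For a simple tie-break like this one, a tuple key such as `key=lambda p: (-p[1], p[0])` would give the same order as `compare` without a comparison function.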

I hope this article has helped you with your Python programming.

