pandas Method for Retrieving Duplicate Data

  • 2021-07-09 08:42:47
  • OfStack

drop_duplicates gives us a way to remove duplicate rows, but how can we find out which rows are duplicated?

Implementation steps:

1. Deduplicate the data twice with drop_duplicates: once dropping every row that has a duplicate (keep=False), saved as data1, and once keeping the first occurrence of each duplicated row (keep='first'), saved as data2;

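2. Take the difference set of data2 and data1: data2.append(data1).drop_duplicates(keep=False). Rows that occur only once appear in both data1 and data2, so they cancel out, leaving one copy of each duplicated row. In pandas 2.0 and later, where DataFrame.append has been removed, use pd.concat([data2, data1]).drop_duplicates(keep=False) instead.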
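
The two steps above can be sketched as follows. This is only a minimal illustration: the DataFrame df and its column names id and val are made up for the example, and pd.concat is used in place of DataFrame.append so that the code also runs on pandas 2.0 and later.

```python
import pandas as pd

# Hypothetical sample data: the rows (1, "a") and (2, "b") occur more than once.
df = pd.DataFrame({
    "id":  [1, 1, 2, 3, 2, 1],
    "val": ["a", "a", "b", "c", "b", "a"],
})

# Step 1: deduplicate twice.
data1 = df.drop_duplicates(keep=False)    # only rows that occur exactly once
data2 = df.drop_duplicates(keep="first")  # one copy of every distinct row

# Step 2: difference set of data2 and data1.
# Rows present in both frames now occur twice and are dropped by keep=False,
# which leaves one copy of each row that was duplicated in df.
duplicated_rows = pd.concat([data2, data1]).drop_duplicates(keep=False)

# Alternative (not part of the steps above): DataFrame.duplicated with
# keep=False marks every occurrence of a duplicated row.
all_copies = df[df.duplicated(keep=False)]

print(duplicated_rows)
print(all_copies)
```

The difference-set approach yields one representative per duplicated row, while the duplicated(keep=False) mask keeps every occurrence, so which one fits depends on whether the copies themselves are needed.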

