pandas Method for Retrieving Duplicate Data
- 2021-07-09 08:42:47
- OfStack
drop_duplicates gives us a way to deduplicate data, but how do we find out which rows are the duplicated ones?
Implementation steps:
1. Call drop_duplicates on the data twice: once removing every row that has a duplicate (keep=False), saved as data1, and once keeping a single copy of each duplicated row (keep='first'), saved as data2;
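2. Take the difference set of data2 and data1: data2.append(data1).drop_duplicates(keep=False). Every row of data1 also appears in data2, so those rows cancel out and what remains is one copy of each row that was duplicated in the original data.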
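
The two steps above can be sketched roughly as follows. The sample DataFrame and the names df and duplicated_rows are made up for illustration, and pd.concat stands in for DataFrame.append, which was removed in pandas 2.0; on older pandas versions the original append call behaves the same way.

```python
import pandas as pd

# Hypothetical sample data: rows (1, "a") and (2, "b") occur more than once.
df = pd.DataFrame({
    "id": [1, 1, 2, 3, 2, 2],
    "name": ["a", "a", "b", "c", "b", "b"],
})

# Step 1a: keep only rows that occur exactly once
# (every row that has a duplicate is dropped entirely).
data1 = df.drop_duplicates(keep=False)

# Step 1b: keep one copy of every distinct row (the first occurrence).
data2 = df.drop_duplicates(keep="first")

# Step 2: difference set data2 - data1. Rows present in both frames
# appear twice after concatenation, so keep=False removes them.
duplicated_rows = pd.concat([data2, data1]).drop_duplicates(keep=False)

print(duplicated_rows)
```

This yields one representative of each duplicated row. If every occurrence of the duplicated rows is wanted instead, df[df.duplicated(keep=False)] is probably the more direct route.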