Oracle Step Method for Finding and Deleting Duplicate Records in Tables

  • 2021-07-24 11:55:08
  • OfStack

At this time, if there is duplicate data in the temporary table, whether the primary key field businessid is duplicated or the whole row of 1 is duplicated, the error of violating the only 1 primary key constraint will be reported.

Methods: group by XX having count (*) > 1,rowid,distinct,temporary table,procedure

1. Query duplicate data in a table
a. Repeat 1 field

b. Repeating multiple fields

c. Repeat 1 full line

Create a test table:


create table cfa (businessid number,customer varchar2(50),branchcode varchar2(10),data_date varchar2(10));
insert into cfa values (1,'Albert','SCB','2011-11-11');
insert into cfa values (2,'Andy','DB','2011-11-12');
insert into cfa values (3,'Allen','HSBC','2011-11-13');
--------------- The following is duplicate data ----------------------------------------------
insert into cfa values (1,'Alex','ICBC','2011-11-14');
insert into cfa values (1,'Albert','CTBK','2011-11-15');
insert into cfa values (1,'Albert','SCB','2011-11-11');

In the case of a, only businessid is duplicated


select * from cfa where businessid in (select businessid from cfa group by businessid having count(businessid)>1);

In the case of b, businessid and name are duplicated at the same time


select * from cfa where (businessid,customer) in (select businessid,customer from cfa group by businessid,customer having count(*)>1);

For the case of c, repeat 1 full line

Refer to the method of b:


select * from cfa where (businessid,customer,branchcode,data_date) in (select * from cfa group by businessid,customer,branchcode,data_date having count(*)>1);

2. Delete duplicate data in the table
In case of a, the redundant duplicate records in the table are deleted. The duplicate records are judged according to a single field (businessid), and only the records with the smallest rowid are left

It is also possible to keep only rowid not the minimum record, and it is necessary to change min in the code to max, which will not be repeated here.


delete from cfa
where businessid in (select businessid
from cfa
group by businessid
having count(businessid) > 1)
and rowid not in (select min(rowid)
from cfa
group by businessid
having count(businessid) > 1);

Or, use the following simpler and more efficient statement


DELETE FROM cfa t
WHERE t.ROWID >
(SELECT MIN(X.ROWID) FROM cfa X WHERE X.businessid = t.businessid);

In the case of b, the redundant duplicate records (multiple fields) in the table are deleted, and only the records with the smallest rowid are left


delete from cfa
where (businessid,customer) in (select businessid,customer
from cfa
group by businessid,customer
having count(*) > 1)
and rowid not in (select min(rowid)
from cfa
group by businessid,customer
having count(*) > 1);

Or, use the following simpler and more efficient statement


DELETE FROM cfa t
WHERE t.ROWID > (SELECT MIN(X.ROWID)
FROM cfa X
WHERE X.businessid = t.businessid
and x.customer = t.customer);

c, which is simpler, using the temporary table method


create table cfabak as select distinct * from cfa;
truncate table cfa;-- If it is a production, it is best to use this table backup
Insert into cfa select * from cfabak;
commit;


Related articles: