Efficient query method for duplicate fields in MySQL large table

  • 2020-05-14 05:12:11
  • OfStack

How should the duplicate fields of MySQL large table be queried? This is a problem that many people encounter. Here is a query method for repeated fields in a large MySQL table for your reference.

There is a large table in the database where you need to look for names with duplicate records id for comparison purposes.
It's easy to just look for fields in the database that name does not duplicate


SELECT min(`id`),`name` 
FROM `table` 
GROUP BY `name`; 

But this does not give you the id value that says there are duplicate fields. (only 1 minimum id)
It is also easy to query which fields are duplicate

SELECT `name`,count(`name`) as count 
FROM `table` 
GROUP BY `name` HAVING count(`name`) >1 
ORDER BY count DESC; 

However, if you want to query the id value of the repeated field once, you must use the subquery, so the following statement is used to realize the repeated field query of the MySQL large table.

SELECT `id`,`name` 
FROM `table` 
WHERE `name` in ( 
SELECT `name` 
FROM `table` 
GROUP BY `name` HAVING count(`name`) >1 
); 

But this statement is so inefficient in mysql that it feels like mysql is not generating temporary tables for subqueries.
This is used to create a temporary table first

create table `tmptable` as ( 
SELECT `name` 
FROM `table` 
GROUP BY `name` HAVING count(`name`) >1 
); 

The query is then joined using multiple tables

SELECT a.`id`, a.`name` 
FROM `table` a, `tmptable` t 
WHERE a.`name` = t.`name`; 

The results came out quickly this time.

Repeat with distinct

SELECT distinct a.`id`, a.`name` 
FROM `table` a, `tmptable` t 
WHERE a.`name` = t.`name`; 


Related articles: