MySQL's approach to dealing with duplicate data

  • 2020-12-10 00:53:44
  • OfStack

Some MySQL tables may contain duplicate records. In some cases duplicates are acceptable, but sometimes they need to be removed.
In this section we will show you how to prevent duplicate records in a table and how to delete those that already exist.
Prevent duplicate data from appearing in the table

You can define a PRIMARY KEY or a UNIQUE index on the relevant field(s) of a MySQL table to guarantee that the data is unique.
Let's try an example: the following table has no index or primary key, so it allows duplicate records.


CREATE TABLE person_tbl
(
  first_name CHAR(20),
  last_name CHAR(20),
  sex CHAR(10)
);

If you want the combination of the first_name and last_name fields to be unique, you can define a composite primary key on both columns. Columns that are part of a primary key cannot be NULL, so they must be declared NOT NULL. As shown below:


CREATE TABLE person_tbl
(
  first_name CHAR(20) NOT NULL,
  last_name CHAR(20) NOT NULL,
  sex CHAR(10),
  PRIMARY KEY (last_name, first_name)
);

Once a primary key or unique index is in place, inserting duplicate data causes the SQL statement to fail with an error.
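For example, with the composite primary key defined above, a second insert of the same name pair is rejected with a duplicate-key error; the exact message varies by MySQL version, but it will look something like this:

mysql> INSERT INTO person_tbl (last_name, first_name)
  -> VALUES( 'Jay', 'Thomas');
Query OK, 1 row affected (0.00 sec)
mysql> INSERT INTO person_tbl (last_name, first_name)
  -> VALUES( 'Jay', 'Thomas');
ERROR 1062 (23000): Duplicate entry 'Jay-Thomas' for key 'PRIMARY'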
INSERT IGNORE INTO differs from INSERT INTO in that INSERT IGNORE skips rows that would duplicate an existing record and inserts only rows that do not. This lets you keep the data already in the table while still inserting new, non-conflicting rows.
The following example uses INSERT IGNORE INTO; it raises no error and inserts no duplicate row into the table:


mysql> INSERT IGNORE INTO person_tbl (last_name, first_name)
  -> VALUES( 'Jay', 'Thomas');
Query OK, 1 row affected (0.00 sec)
mysql> INSERT IGNORE INTO person_tbl (last_name, first_name)
  -> VALUES( 'Jay', 'Thomas');
Query OK, 0 rows affected (0.00 sec)

Once uniqueness is enforced on the record, inserting a duplicate with INSERT IGNORE returns no error, only a warning. REPLACE INTO behaves differently: if a record with the same PRIMARY KEY or UNIQUE value already exists, it is deleted first and then the new record is inserted.
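A REPLACE INTO example, continuing with person_tbl (this sketch assumes the composite primary key on last_name and first_name defined earlier; the second statement reports 2 rows affected because the existing row is deleted and the new one inserted):

mysql> REPLACE INTO person_tbl (last_name, first_name, sex)
  -> VALUES( 'Jay', 'Thomas', 'M');
Query OK, 1 row affected (0.00 sec)
mysql> REPLACE INTO person_tbl (last_name, first_name, sex)
  -> VALUES( 'Jay', 'Thomas', 'F');
Query OK, 2 rows affected (0.00 sec)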
Another way to enforce uniqueness is to add a UNIQUE index, as shown below:


CREATE TABLE person_tbl
(
  first_name CHAR(20) NOT NULL,
  last_name CHAR(20) NOT NULL,
  sex CHAR(10),
  UNIQUE (last_name, first_name)
);

Counting duplicate records

Here we will count the duplicate records of first_name and last_name in the table:


mysql> SELECT COUNT(*) as repetitions, last_name, first_name
  -> FROM person_tbl
  -> GROUP BY last_name, first_name
  -> HAVING repetitions > 1;

The above query returns the number of duplicate record groups in the person_tbl table. In general, to query for duplicate values, do the following:

  • Determine which columns may contain duplicate values.
  • List those columns in the select list, along with COUNT(*).
  • List the same columns in the GROUP BY clause.
  • Add a HAVING clause that keeps only groups whose count is greater than 1.
Filter duplicate data

If you need to read non-duplicate data, use the DISTINCT keyword in the SELECT statement to filter duplicate data.


mysql> SELECT DISTINCT last_name, first_name
  -> FROM person_tbl
  -> ORDER BY last_name;

You can also use GROUP BY to read non-duplicate data in the table:


mysql> SELECT last_name, first_name
  -> FROM person_tbl
  -> GROUP BY last_name, first_name;

Delete duplicate data

If you want to delete duplicate data from the table, you can use the following SQL statement:


mysql> CREATE TABLE tmp SELECT last_name, first_name, sex
  ->         FROM person_tbl
  ->         GROUP BY last_name, first_name;
mysql> DROP TABLE person_tbl;
mysql> ALTER TABLE tmp RENAME TO person_tbl;

You can also remove duplicates by adding an INDEX or a PRIMARY KEY to the table. The ALTER IGNORE TABLE form silently discards rows that would violate the new key (note that the IGNORE clause was removed in MySQL 5.7.4, so this works only on earlier versions). The method is as follows:


mysql> ALTER IGNORE TABLE person_tbl
  -> ADD PRIMARY KEY (last_name, first_name);
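On MySQL 5.7.4 and later, where ALTER IGNORE TABLE is no longer supported, one common alternative is a self-join DELETE. The following is a sketch under the assumption that person_tbl has no primary key yet, so a temporary auto-increment id column can be added to tell otherwise-identical rows apart:

mysql> ALTER TABLE person_tbl ADD COLUMN id INT AUTO_INCREMENT PRIMARY KEY;
mysql> DELETE t1 FROM person_tbl t1
  -> JOIN person_tbl t2
  ->   ON t1.last_name = t2.last_name
  ->  AND t1.first_name = t2.first_name
  ->  AND t1.id > t2.id;
mysql> ALTER TABLE person_tbl DROP COLUMN id;

The join matches each row against every earlier copy of the same name pair, so only the row with the smallest id in each group survives.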

That covers handling duplicate data in MySQL; we hope this material is helpful to your study.

