Easy Tutorial

MySQL Handling Duplicate Data

Some MySQL tables may contain duplicate records. In some cases duplicate data is acceptable, but at other times the duplicates must be removed.

This section shows how to prevent duplicate data from entering a table, how to count and filter duplicates, and how to delete duplicate data from a table.


Preventing Duplicate Data in Tables

Let's look at an example. The following table has no index or primary key, so it accepts any number of identical records:

CREATE TABLE person_tbl
(
    first_name CHAR(20),
    last_name CHAR(20),
    sex CHAR(10)
);
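To see this in practice, the following sketch inserts the same row twice and both inserts succeed, because nothing in the table definition forbids it. It drops and recreates person_tbl, so run it in a scratch database; the sample values are for illustration only.

```sql
-- Recreate the keyless table from above (this drops any existing person_tbl).
DROP TABLE IF EXISTS person_tbl;
CREATE TABLE person_tbl
(
    first_name CHAR(20),
    last_name CHAR(20),
    sex CHAR(10)
);

-- With no primary key or unique index, identical rows are accepted.
INSERT INTO person_tbl (last_name, first_name, sex) VALUES ('Jay', 'Thomas', 'M');
INSERT INTO person_tbl (last_name, first_name, sex) VALUES ('Jay', 'Thomas', 'M');

-- Returns 2: the table now holds two identical rows.
SELECT COUNT(*) FROM person_tbl;
```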

If you want each combination of first_name and last_name to be unique, you can define a composite primary key on the two columns. Primary key columns cannot contain NULL, so declare them NOT NULL explicitly, as shown below:

CREATE TABLE person_tbl
(
   first_name CHAR(20) NOT NULL,
   last_name CHAR(20) NOT NULL,
   sex CHAR(10),
   PRIMARY KEY (last_name, first_name)
);
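A composite primary key constrains the combination of its columns, not each column on its own. In this sketch (which drops and recreates person_tbl; the names are sample data), two people with the same last name but different first names are both accepted; only a repeat of the exact (last_name, first_name) pair would be rejected.

```sql
-- Recreate person_tbl with the composite primary key (drops any existing table).
DROP TABLE IF EXISTS person_tbl;
CREATE TABLE person_tbl
(
   first_name CHAR(20) NOT NULL,
   last_name CHAR(20) NOT NULL,
   sex CHAR(10),
   PRIMARY KEY (last_name, first_name)
);

-- Same last_name, different first_name: these are two distinct key values.
INSERT INTO person_tbl (last_name, first_name) VALUES ('Jay', 'Thomas');
INSERT INTO person_tbl (last_name, first_name) VALUES ('Jay', 'Anna');

-- Returns 2.
SELECT COUNT(*) FROM person_tbl;
```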

Once a primary key or unique index is defined, an INSERT that would store a duplicate key value fails with an error.

INSERT IGNORE INTO differs from INSERT INTO in how it treats rows that conflict with existing data. If no row with the same primary key or unique key value exists, the new row is inserted; if such a row already exists, the new row is skipped. The existing data is left unchanged, so only the missing rows are filled in.

The following example uses INSERT IGNORE INTO. The second statement tries to insert a duplicate row, but it raises no error and adds nothing to the table:

mysql> INSERT IGNORE INTO person_tbl (last_name, first_name)
    -> VALUES( 'Jay', 'Thomas');
Query OK, 1 row affected (0.00 sec)
mysql> INSERT IGNORE INTO person_tbl (last_name, first_name)
    -> VALUES( 'Jay', 'Thomas');
Query OK, 0 rows affected (0.00 sec)

When the table enforces uniqueness, INSERT IGNORE INTO reports a duplicate row as a warning instead of an error. REPLACE INTO behaves differently: if a row with the same primary key or unique key value already exists, it deletes that row first and then inserts the new one.
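The following sketch contrasts the two statements on the composite-key table defined earlier. It drops and recreates person_tbl, and the row values are sample data. SHOW WARNINGS lists the duplicate-key warning left by INSERT IGNORE; REPLACE then swaps the stored row for the new one.

```sql
-- Recreate person_tbl with the composite primary key (drops any existing table).
DROP TABLE IF EXISTS person_tbl;
CREATE TABLE person_tbl
(
   first_name CHAR(20) NOT NULL,
   last_name CHAR(20) NOT NULL,
   sex CHAR(10),
   PRIMARY KEY (last_name, first_name)
);
INSERT INTO person_tbl (last_name, first_name, sex) VALUES ('Jay', 'Thomas', 'M');

-- Duplicate key: the row is skipped and a warning is recorded instead of an error.
INSERT IGNORE INTO person_tbl (last_name, first_name, sex) VALUES ('Jay', 'Thomas', 'F');
SHOW WARNINGS;

-- REPLACE deletes the existing ('Jay', 'Thomas') row, then inserts this one.
REPLACE INTO person_tbl (last_name, first_name, sex) VALUES ('Jay', 'Thomas', 'F');

-- One row remains, and its sex column is now 'F'.
SELECT last_name, first_name, sex FROM person_tbl;
```

Because REPLACE deletes the old row before inserting, its affected-row count here is 2 (one deleted, one inserted), and any column not listed in the REPLACE statement gets its default value rather than the value from the old row.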

Another method to ensure data uniqueness is to add a UNIQUE index, as shown below:

CREATE TABLE person_tbl
(
   first_name CHAR(20) NOT NULL,
   last_name CHAR(20) NOT NULL,
   sex CHAR(10),
   UNIQUE (last_name, first_name)
);
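If the table already exists, the same constraint can be added afterwards with ALTER TABLE. In this sketch the index name uniq_person is hypothetical, and the table is dropped and recreated without keys first. On a real table, the ALTER TABLE fails if duplicate (last_name, first_name) pairs are already present, so those must be removed first.

```sql
-- Start from a version of person_tbl without keys (drops any existing table).
DROP TABLE IF EXISTS person_tbl;
CREATE TABLE person_tbl
(
    first_name CHAR(20) NOT NULL,
    last_name CHAR(20) NOT NULL,
    sex CHAR(10)
);

-- Add a composite UNIQUE index; uniq_person is a hypothetical index name.
ALTER TABLE person_tbl ADD UNIQUE INDEX uniq_person (last_name, first_name);

-- List the table's indexes to confirm that uniq_person exists.
SHOW INDEX FROM person_tbl;
```

One difference from a primary key: a UNIQUE index allows multiple NULL values in columns that can contain NULL, which is another reason to declare these columns NOT NULL.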

Counting Duplicate Data

The following query counts how many times each combination of last_name and first_name appears, and keeps only the combinations that occur more than once:

mysql> SELECT COUNT(*) as repetitions, last_name, first_name
    -> FROM person_tbl
    -> GROUP BY last_name, first_name
    -> HAVING repetitions > 1;

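This query returns each (last_name, first_name) pair that appears more than once in person_tbl, together with the number of times it appears. In general, to find duplicate values:

1. Determine which columns may contain duplicate values.
2. List those columns in the SELECT list, together with COUNT(*).
3. List the same columns in the GROUP BY clause.
4. Add a HAVING clause that keeps only the groups whose count is greater than 1.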
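The counting query reports one row per duplicated pair, not the duplicate rows themselves. A common follow-up, sketched here on sample data (the script drops and recreates person_tbl), is to join the grouped result back to the table to list every individual row that belongs to a duplicated pair.

```sql
-- Sample data in a keyless table (drops any existing person_tbl).
DROP TABLE IF EXISTS person_tbl;
CREATE TABLE person_tbl
(
    first_name CHAR(20),
    last_name CHAR(20),
    sex CHAR(10)
);
INSERT INTO person_tbl (last_name, first_name, sex) VALUES
    ('Jay', 'Thomas', 'M'),
    ('Jay', 'Thomas', 'M'),
    ('Jay', 'Thomas', 'F'),
    ('Lee', 'Anna', 'F');

-- The counting query: one row per duplicated name pair (here: Jay, Thomas, 3).
SELECT COUNT(*) AS repetitions, last_name, first_name
FROM person_tbl
GROUP BY last_name, first_name
HAVING repetitions > 1;

-- Join back to list every row that belongs to a duplicated pair.
SELECT p.last_name, p.first_name, p.sex
FROM person_tbl AS p
JOIN (SELECT last_name, first_name
      FROM person_tbl
      GROUP BY last_name, first_name
      HAVING COUNT(*) > 1) AS d
  ON p.last_name = d.last_name AND p.first_name = d.first_name;
```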


Filtering Duplicate Data

To read data without duplicates, add the DISTINCT keyword to the SELECT statement. It removes repeated rows from the result set without changing the table:

mysql> SELECT DISTINCT last_name, first_name
    -> FROM person_tbl;
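Note that DISTINCT compares the whole selected row, not just the first column. In this sketch on sample data (it drops and recreates person_tbl), adding sex to the select list turns one distinct row into two, and MySQL's multi-argument COUNT(DISTINCT ...) counts the distinct name pairs directly.

```sql
-- Sample data (drops any existing person_tbl).
DROP TABLE IF EXISTS person_tbl;
CREATE TABLE person_tbl
(
    first_name CHAR(20),
    last_name CHAR(20),
    sex CHAR(10)
);
INSERT INTO person_tbl (last_name, first_name, sex) VALUES
    ('Jay', 'Thomas', 'M'),
    ('Jay', 'Thomas', 'M'),
    ('Jay', 'Thomas', 'F');

-- One row: ('Jay', 'Thomas').
SELECT DISTINCT last_name, first_name FROM person_tbl;

-- Two rows: with sex included, ('Jay', 'Thomas', 'M') and ('Jay', 'Thomas', 'F') differ.
SELECT DISTINCT last_name, first_name, sex FROM person_tbl;

-- Count the distinct name pairs directly; returns 1.
SELECT COUNT(DISTINCT last_name, first_name) FROM person_tbl;
```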

You can also use GROUP BY to read non-duplicate data from the table:

mysql> SELECT last_name, first_name
    -> FROM person_tbl
    -> GROUP BY last_name, first_name;
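Unlike DISTINCT, GROUP BY can also report information about each group. Since MySQL 5.7.5 the ONLY_FULL_GROUP_BY SQL mode is enabled by default, so a selected column that is not in the GROUP BY list must be wrapped in an aggregate function. This sketch (sample data; it drops and recreates person_tbl) uses COUNT(*) and MIN(sex) for that purpose.

```sql
-- Sample data (drops any existing person_tbl).
DROP TABLE IF EXISTS person_tbl;
CREATE TABLE person_tbl
(
    first_name CHAR(20),
    last_name CHAR(20),
    sex CHAR(10)
);
INSERT INTO person_tbl (last_name, first_name, sex) VALUES
    ('Jay', 'Thomas', 'M'),
    ('Jay', 'Thomas', 'F'),
    ('Lee', 'Anna', 'F');

-- One row per name pair, with the number of copies and one sex value per pair.
-- sex is not in GROUP BY, so it is aggregated (MIN) to satisfy ONLY_FULL_GROUP_BY.
SELECT last_name, first_name, COUNT(*) AS copies, MIN(sex) AS sex
FROM person_tbl
GROUP BY last_name, first_name;
```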

Deleting Duplicate Data

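To delete duplicate data from a table, one approach is to copy the distinct rows into a new table, drop the original table, and rename the copy. The new table does not inherit any indexes from the original, so add a primary key or unique index afterwards if duplicates must be kept out from then on: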

mysql> CREATE TABLE tmp SELECT last_name, first_name, sex
    -> FROM person_tbl
    -> GROUP BY last_name, first_name, sex;
mysql> DROP TABLE person_tbl;
mysql> ALTER TABLE tmp RENAME TO person_tbl;
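The copy-and-rename approach rebuilds the whole table. If the table has a column that tells otherwise identical rows apart, duplicates can instead be deleted in place with a multiple-table DELETE that joins the table to itself. This sketch assumes a hypothetical AUTO_INCREMENT id column (person_tbl as defined above has none), keeps the row with the smallest id in each group, and uses MySQL's NULL-safe operator <=> so that rows with NULL in the same column also count as duplicates. It drops and recreates person_tbl.

```sql
-- A variant of person_tbl with a hypothetical AUTO_INCREMENT id column
-- (drops any existing person_tbl).
DROP TABLE IF EXISTS person_tbl;
CREATE TABLE person_tbl
(
    id INT AUTO_INCREMENT PRIMARY KEY,
    first_name CHAR(20),
    last_name CHAR(20),
    sex CHAR(10)
);
INSERT INTO person_tbl (last_name, first_name, sex) VALUES
    ('Jay', 'Thomas', 'M'),
    ('Jay', 'Thomas', 'M'),
    ('Lee', 'Anna', 'F');

-- Delete every row for which another row has the same values and a smaller id,
-- so only the first copy of each (last_name, first_name, sex) survives.
-- <=> treats two NULLs as equal, unlike =.
DELETE t1
FROM person_tbl AS t1
JOIN person_tbl AS t2
  ON  t1.last_name <=> t2.last_name
  AND t1.first_name <=> t2.first_name
  AND t1.sex <=> t2.sex
  AND t1.id > t2.id;

-- Two rows remain.
SELECT id, last_name, first_name, sex FROM person_tbl;
```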

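Alternatively, on MySQL 5.6 and earlier, you can remove duplicate records by adding a primary key (or unique index) with ALTER IGNORE TABLE: of the rows that share a key value, only one is kept and the others are deleted. The IGNORE clause of ALTER TABLE was removed in MySQL 5.7, so this statement fails on later versions: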

mysql> ALTER IGNORE TABLE person_tbl
    -> ADD PRIMARY KEY (last_name, first_name);
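On MySQL 5.7 and later, where ALTER TABLE no longer accepts IGNORE, a similar result can be reached in a few steps. This sketch (person_new and person_old are hypothetical table names; the sample data is illustrative and the script drops existing tables) builds an empty copy that has the primary key, lets INSERT IGNORE skip rows that would duplicate the key, and swaps the tables with a single RENAME TABLE statement.

```sql
-- Sample data with duplicates in a keyless person_tbl (drops existing tables).
DROP TABLE IF EXISTS person_tbl, person_new, person_old;
CREATE TABLE person_tbl
(
    first_name CHAR(20),
    last_name CHAR(20),
    sex CHAR(10)
);
INSERT INTO person_tbl (last_name, first_name, sex) VALUES
    ('Jay', 'Thomas', 'M'),
    ('Jay', 'Thomas', 'M'),
    ('Lee', 'Anna', 'F');

-- Empty copy with the same columns, plus the primary key.
CREATE TABLE person_new LIKE person_tbl;
ALTER TABLE person_new ADD PRIMARY KEY (last_name, first_name);

-- Rows whose key already exists in person_new are skipped with a warning.
INSERT IGNORE INTO person_new (last_name, first_name, sex)
SELECT last_name, first_name, sex FROM person_tbl;

-- Swap the tables in one statement, then drop the old copy.
RENAME TABLE person_tbl TO person_old, person_new TO person_tbl;
DROP TABLE person_old;
```

When duplicate rows differ in other columns (for example sex), which copy is kept depends on the order in which the SELECT returns the rows, so add an ORDER BY to the SELECT if a particular copy should win.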