Some comparison and usage explanation of distinct and group by statements in MySQL

  • 2020-12-16 06:09:33
  • OfStack

In the data table, the books used for user verification are recorded, now I want to get all the books, using DISTINCT and group by, I get the results I want, but I find the results are not arranged in the same way, distinct will be displayed in the data storage order 1, while group by will be sorted (1 is usually ASC).

DISTINCT is actually very similar to the implementation of the GROUP BY operation, except that only one record is pulled from each group after GROUP BY. Therefore, the implementation of DISTINCT is similar to that of GROUP BY. There is not much difference between the implementation of DISTINCT and GROUP BY. It can also be implemented by either loose index scan or compact index scan.

Which is more efficient, DISTINCT or GROUP BY?

The DISTINCT operation just needs to find all the different values. The GROUP BY operation also has to be prepared for other aggregation functions. From this point, the GROUP BY operation should do a little more than DISTINCT does.

But actually, GROUP BY is a little more efficient. Why? For the DISTINCT operation, it reads all the records, and GROUP BY needs to read as many records as the number of grouped groups, which is a lot less than the number of records that actually exist.

Here is a look at some of the usage shares of distinct and group by in MySQL.


CREATE TABLE `student` (          
      `name` varchar(20) NOT NULL DEFAULT '', 
      `age` int(10) DEFAULT '0'        
     ) ENGINE=InnoDB DEFAULT CHARSET=latin1

1. Test 1


select * from student;   



a  5
a  5
c  0

Use distinct to filter out both columns of records that are identical


select distinct name,age from student;

return


a  5
c  0

2. Test 2
Change the data in Table student to the following:


select * from student;

c  2
c  5


select distinct name,age from student;

Returns the following to indicate that distinct has more than one column followed by a field that is filtered only if each column has exactly the same value


c  2
c  5

3. Test 3


select * from student;

select * from student;   
0

group by is grouped in two columns simultaneously


select * from student;   
1

select * from student;   
2

group by is grouped in two columns simultaneously, followed by the conditions of having


select name,age,sum(height) as n from student group by name,age having n > 500;

return
c 2 579

4. Test 4
Tests on group by followed by limit


select songname,sengerid,count(sengerid) as n from t_song group by songname,sengerid having n > 1 ORDER BY n DESC,songid ASC limit 10;


select * from student;   
4


select songname,sengerid,count(sengerid) as n from t_song group by songname,sengerid having n > 1 ORDER BY n DESC,songid ASC limit 5;


select * from student;   
5

After the above two tests, it can be seen that if sql statement contains limit, limit is to group group by and carry out relevant calculation after the limit operation, instead of grouping the specified number of records after limit, it can be seen that the value of the data in the 1 column of n is greater than 10 for each row.

5. Test 5
The same number of records can be obtained by using the following two forms of distinct. The result is 1 sample instead of 1 sample.


select * from student;   
6

6. Test 6
field singername string,max(singername), if some singername columns are empty and some columns are not empty, max(singername) takes a non-empty value, if 1 is zxx and 1 is lady, takes zxx, and gets it smoothly according to the letter.


select feedback_id,songid,songname,max(singername),max(time) as new_time from feedback group by songid order by new_time desc;

7. Order of where,group by,order by and limit in Sql


select * from student;   
7

8. Questions on group by and count
If the sql statement contains group by, it is best not to convert count sql to select count(*) from xxx, otherwise the fields between select and from are likely to be used later, for example


select feedback_id,songid,songname,max(singername),max(time) as new_time from feedback group by songid order by new_time desc;


MySQL Query Error: SELECT COUNT(*) FROM feedback GROUP BY songid ORDER BY new_time DESC Error Info:Unknown column 'new_time' in 'order clause'


Related articles: