A method to solve the problem of fuzzy retrieval in MySQL database

  • 2020-05-06 11:45:44
  • OfStack

Under   MySQL, when conducting Chinese fuzzy retrieval, it often returns some irrelevant records. For example, when searching   "%a%"  , it may return Chinese characters, but no a characters. I have encountered similar problems before. After reading Manual of MySQL in detail, I found that there is a way to solve them conveniently and get satisfactory results.

You want to search the news library by "title". The keywords may be in English and Chinese, as shown in the SQL statement:

The following is a quote:



Code:
select   id,title,name   from   achech_com.news   where   title   like   '%a%'


Some of the title fields are identified with the "a" keyword, while others are only in Chinese, but are also returned in the search results.

The solution is to use the   BINARY   property for retrieval, such as

The following is a quote:



Code:
select   id,title,name   from   achech_com.news   where   binary   title   like   '%a%'


The result is more correct than before, but the English letters are case sensitive, so sometimes the result is different when retrieving "Achech" and "achech". Know the use   BINARY   attributes can solve this problem, in front of the look at   MySQL   support UCASE   and   CONCAT   functions, including   UCASE   is all in English to uppercase, and CONCAT function is used to connect of characters, the following is our completely resolved SQL   statement:



Code:
select   id,title,name   from   achech_com.news

where     ucase(title)   like   concat('%',ucase('a'),'%')

The retrieval step is to specify the attribute as   BINARY   in order to accurately retrieve the result. However, the content of   title     like     title may contain upper and lower case letters. Therefore,   ucase   function is used to convert all the field contents into uppercase letters, and then   like   operation is performed. While the operation of   like   USES the fuzzy method, the advantage of using   concat is that the keyword passed in can be direct, do not need to take "%" universal character, "a'" directly into your variable, in any language everything is free.   of course you could also write



Code:
select   id,title,name   from   achech_com.news   where   binary   ucase(title)   like   ucase('%a%')


The result of the retrieval is satisfactory, but the speed may be reduced by N milliseconds.

Related articles: