Method sharing using a comma delimiter in the MySQL field

  • 2020-05-13 03:33:24
  • OfStack

The number of field 1 to be split must be finite and small, and we cannot store an infinite number of characters in a single string
The table to which this field belongs and the table associated with this field, 1 must be a 1 to many relationship
For example, the following table structure represents the content and tag objects

 
mysql> SELECT * FROM content; 
+----+------+| id | tags | +----+------+| 1 | 1,2 | | 2 | 2,3 | +----+------+ 
2 rows in set (0.01 sec) 
mysql> SELECT * FROM tag; 
+----+-------+| id | name | +----+-------+| 1 | php | | 2 | mysql | | 3 | java | +----+-------+ 
3 rows in set (0.00 sec) 

These principles, I believe you are familiar with the development process. But you're probably still a little nervous about using this method to deal with real problems, because it seems like a bit of a wild horse. There is no mention of this design approach in the thick databases textbook, and the standard approach seems to be to use a relational mapping table to plug in between the two tables, even though this would involve inefficient join queries.

Every developer has struggled with standards and efficiency, but I think our efforts will make the use of this approach look more standard. Note that the usage discussed below is limited to mysql, but other databases should be portable.

Correlation search
Many developers still use the old LIKE method for correlation retrieval. For example, in the above database structure, two records in the content table have the tag 2, so how can I display the records related to the tag when I fetch record 1? In fact, this is also a basic problem that CMS needs to face, that is, the query of relevant content.

If you're a newbie, you might only think of the LIKE method, such as first fetching record 1, then splitting the tags field by comma, and finally doing a loop using LIKE to retrieve all records with 2 in the tags field in the content table, and so on
 
SELECT * FROM content WHERE tag LIKE '%2%' AND id <> 1 

But this method is too slow, not to mention the number of queries, LIKE query is a relatively slow method. And you have to deal with the trailing comma problem, which is a lot of trouble.

So let's take a moment to go through the mysql manual and see if there are any surprises. At this point, a function called FIND_IN_SET flashes gold. Let's look at the definition of this function

 
FIND_IN_SET(str,strlist) 
Returns a value in the range of 1 to N if the string str is in the string list strlist consisting of N substrings. A string list is a string composed of substrings separated by  " , "  characters. If the first argument is a constant string and the second is a column of type SET, the FIND_IN_SET() function is optimized to use bit arithmetic. Returns 0 if str is not in strlist or if strlist is the empty string. Returns NULL if either argument is NULL. This function does not work properly if the first argument contains a comma ( " , " ) character. 

Oh, PERFECT! Simply put, finding a function that looks for one string to exist in another comma-separated string is tailor-made for us. So our sql becomes
 
SELECT * FROM content WHERE FIND_IN_SET('2', tags) AND id <> 1 

As you flip through these functions, you should already be well aware that the designers of mysql favor the comma-separated method of storing fields, because there are many methods designed to deal with this problem.

That looks better. The one slice looks perfect. Is that right? Not yet. If you have more tag, you need to create multiple sql statements, and some record more tag or less, so how can you rank them by correlation?

At this point, you can pay attention to mysql's full-text search feature. You must have seen this word many times, but it is rarely used in this way. Let's look at the statement directly
 
SELECT * FROM content WHERE MATCH(tags) AGAINST('1,2') AND id <> 1 

The advantage of this statement is that you do not need to split the tags field again. So how does this query work? If you know a little bit about MATCH AGAINST, the default delimiters for full-text retrieval are punctuation and stopwords, the former of which is exactly what we need. Full-text search splits the strings in MATCH and AGAINST by comma, and then matches them.

It is important to note that sql above is only an example. If you execute it this way, you will not get any results. The reasons are as follows

You need to create an fulltext index on the tags field (if this is just a test, don't do it, indexing just improves performance and has no effect on the results) Each word separated by punctuation must be at least 3 characters long, which is the key. Our tag id is too short and will be automatically ignored. At this time, you can consider starting id from a larger value, such as 1000, so that it will be long enough. You hit stopwords, so your tags field is like 'hello,nobody', nobody is a default stop words of mysql, and it's automatically ignored. stop words is a meaningless word in English, which is not needed when searching, like auxiliary words in Chinese and so on. But it's obviously not used for search in our case, so you can disable it by adding ft_stopword_file= "in my.cnf

With the development of WEB technology, it is less and less that SQL is used for relevant search. In many cases, only a search engine is needed. But the purpose of this article is not just to discuss this approach, but to demonstrate the process of achieving this result.


Related articles: