JavaScript regular expression definition (syntax) summary

  • 2020-11-20 05:59:26
  • OfStack

This article summarizes how regular expressions are defined in JavaScript and the syntax they use. The details are as follows:

There are two ways to define a regular expression: call the RegExp() constructor directly, or write it as a literal, i.e. var re = /pattern/;

Either way the result is a RegExp object, as if RegExp() had been called
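As a minimal sketch of the two forms, the snippet below builds the same pattern both ways. The variable names and the sample strings are assumptions made for illustration only.

```javascript
// Literal form: the pattern is written between slashes.
const literalRe = /\sjavascript/;

// Constructor form: the pattern is a string, so the backslash must be doubled.
const constructedRe = new RegExp("\\sjavascript");

console.log(literalRe.source === constructedRe.source); // true
console.log(constructedRe.test("I like javascript"));   // true

// The constructor suits patterns that are only known at run time.
const suffix = "script"; // hypothetical run-time value
const dynamicRe = new RegExp("java" + suffix, "i");
console.log(dynamicRe.test("JavaScript")); // true
```

The literal is usually easier to read; the constructor is the natural choice when part of the pattern comes from a variable.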

ECMAScript 3 and ECMAScript 5 behave quite differently when the same regular expression literal is evaluated more than once, for example:


function reg(){
 var re = /\sjavascript/;
 return re;
}

Suppose the reg() function is called several times under ECMAScript 3 and under ECMAScript 5

In ECMAScript 3, every call returns the same RegExp object; in ECMAScript 5, every call returns a different RegExp object, because a new RegExp object is created each time the literal is evaluated

So in ECMAScript 3 this is a hidden hazard: if the object is modified in one place, every other place that uses it sees the change.
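The difference can be checked with a short sketch, assuming an engine that follows ECMAScript 5 or later (which covers current browsers and Node.js). The property name marker is hypothetical and only serves to show that the two objects are independent.

```javascript
function reg() {
  var re = /\sjavascript/;
  return re;
}

const firstCall = reg();
const secondCall = reg();

// Under ES5 semantics each evaluation of the literal creates a new object.
console.log(firstCall === secondCall); // false

// Modifying one object does not affect the other.
firstCall.marker = "changed";   // hypothetical property, for illustration
console.log(secondCall.marker); // undefined
```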

1. Literal characters

Most characters in a regex simply match themselves, for example

/javascript/
matches the text javascript directly

Some non-alphabetic characters are matched through escape sequences, such as:

\0 NUL character (\u0000)

\t TAB (\u0009)

\n line break (\u000A)

\v Vertical TAB (\u000B)

\f Form feed (\u000C)

\r carriage return (\u000D)

\xnn The Latin character specified by the hexadecimal number nn, for example, \x0A is equivalent to \n

\uxxxx The Unicode character specified by the hexadecimal number xxxx, for example \u0009 is equivalent to \t

\cX control character ^X, for example, \cJ is equivalent to a newline \n
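The equivalences listed above can be verified with a small sketch. The object name escapeChecks is an assumption made for this example.

```javascript
// Each entry tests one escape sequence against the character it stands for.
const escapeChecks = {
  nul: /\0/.test("\u0000"),     // \0 matches the NUL character
  hex: /\x0A/.test("\n"),       // \x0A is the same as \n
  unicode: /\u0009/.test("\t"), // \u0009 is the same as \t
  control: /\cJ/.test("\n"),    // \cJ (Ctrl-J) is the same as \n
};

console.log(escapeChecks); // every value is true
```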

In regular expressions, some punctuation marks have a special meaning and must be escaped with '\' to be matched literally:

^$.*+?=!:|\/()[]{}
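The sketch below shows what happens with and without the escape. The helper escapeRegExp is not a built-in; it is a hypothetical function written for this example, and it escapes only the characters that are special on their own.

```javascript
// Unescaped, + is a quantifier and . matches any character.
const unescapedPlus = /1+1=2/.test("1+1=2");  // false: means "one or more 1s, then 1=2"
const escapedPlus = /1\+1=2/.test("1+1=2");   // true: \+ is a literal plus sign
const unescapedDot = /a.c/.test("abc");       // true: . matches b
const escapedDot = /a\.c/.test("abc");        // false: \. only matches a real dot

// Hypothetical helper: escape special characters in arbitrary text
// so that it can be passed safely to the RegExp() constructor.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const safeRe = new RegExp(escapeRegExp("1+1=2"));
console.log(unescapedPlus, escapedPlus, unescapedDot, escapedDot); // false true true false
console.log(safeRe.test("1+1=2")); // true
```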

2. Character classes

[...] Any one character inside the square brackets

[^...] Any one character not inside the square brackets

. Any character except a line terminator

\w Any ASCII word character, equivalent to [a-zA-Z0-9_]

\W Any character that is not an ASCII word character, equivalent to [^a-zA-Z0-9_]

\s Any Unicode whitespace

\S Any character that is not Unicode whitespace; note that \w and \S are not the same

\d Any ASCII digit, equivalent to [0-9]

\D Any character other than an ASCII digit, equivalent to [^0-9]

[\b] A literal backspace (special case)
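A few of the classes above can be tried out as follows. This is only an illustrative sketch; the object name classChecks and the sample characters are assumptions.

```javascript
const classChecks = {
  inSet: /[abc]/.test("b"),           // true: b is inside the brackets
  notInSet: /[^abc]/.test("b"),       // false: b is excluded by the negated class
  dotNewline: /./.test("\n"),         // false: . does not match a line terminator
  wordUnderscore: /\w/.test("_"),     // true: _ counts as a word character
  nonAsciiLetter: /\w/.test("é"),     // false: \w covers ASCII only
  unicodeSpace: /\s/.test("\u00A0"),  // true: the no-break space is Unicode whitespace
  punctuation: /\S/.test("!") && !/\w/.test("!"), // true: ! is \S but not \w
  digit: /\d/.test("7"),              // true
  backspace: /[\b]/.test("\b"),       // true: inside brackets \b is a backspace
};

console.log(classChecks);
```

The punctuation entry illustrates why \w and \S differ: a character such as ! is not whitespace, yet it is not a word character either.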

3. Repetition (quantifiers)

? 0 or 1 time

+ 1 or more times

* 0 or more times

{n} exactly n times

{m,n} at least m times, at most n times

{n,} n or more times
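Each quantifier can be illustrated with a short check. The patterns are anchored with ^ and $ here (an assumption of this sketch) so that the whole string has to satisfy the count.

```javascript
const quantifierChecks = {
  optionalAbsent: /^colou?r$/.test("color"),   // true: u appears 0 times
  optionalPresent: /^colou?r$/.test("colour"), // true: u appears 1 time
  plusEmpty: /^a+$/.test(""),                  // false: + needs at least one a
  starEmpty: /^a*$/.test(""),                  // true: * accepts zero a's
  exactOk: /^\d{3}$/.test("123"),              // true: exactly 3 digits
  exactTooMany: /^\d{3}$/.test("1234"),        // false: 4 digits
  rangeOk: /^\d{2,4}$/.test("123"),            // true: between 2 and 4 digits
  rangeTooMany: /^\d{2,4}$/.test("12345"),     // false: 5 digits
  atLeastOk: /^\d{3,}$/.test("12345"),         // true: 3 or more digits
  atLeastTooFew: /^\d{3,}$/.test("12"),        // false: only 2 digits
};

console.log(quantifierChecks);
```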

By default, regex matching is greedy

When /a+b+/ is applied to aaabb, it does not stop at ab or aab; it matches the whole of aaabb

/a+?b+?/ matches aaab instead. Why the difference?

A: +? is a non-greedy quantifier, so b+? matches only one b. Then why does a+? still match three a's? Because regex pattern matching always finds the match at the first possible position in the string, and starting from the first a, all three a's are needed before a b can follow.
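The greedy and non-greedy results described above can be reproduced directly. The variable names are assumptions made for this sketch.

```javascript
const sample = "aaabb";

// Greedy: each quantifier takes as many characters as it can.
const greedyMatch = sample.match(/a+b+/)[0];  // "aaabb"

// Non-greedy: each quantifier takes as few characters as it can,
// but the match still starts at the first possible position (index 0).
const lazyMatch = sample.match(/a+?b+?/)[0];  // "aaab"
const lazyIndex = sample.match(/a+?b+?/).index; // 0

console.log(greedyMatch, lazyMatch, lazyIndex);
```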

4. Alternation | grouping | references

| separates alternatives; for example, /ab|cd/ matches either ab or cd. Note: alternatives are tried from left to right, so with /a|ab/, once a matches, ab is not tried, even though ab would be the longer match
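The left-to-right rule is easy to see on a two-character string. This is a minimal sketch with assumed variable names.

```javascript
// Either alternative may match.
const eitherMatch = "cd".match(/ab|cd/)[0];  // "cd"

// The first alternative that succeeds wins, even if a later one is longer.
const shortFirst = "ab".match(/a|ab/)[0];    // "a"

// Putting the longer alternative first changes the result.
const longFirst = "ab".match(/ab|a/)[0];     // "ab"

console.log(eitherMatch, shortFirst, longFirst);
```

When alternatives share a prefix, listing the longer one first is a common way to get the longer match.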

() 1. Treats several items as a single subexpression. /java(script)?/ matches both javascript and java: the part in parentheses forms a subexpression, and operators such as |, *, and ? can then be applied to it

2. Defines a subpattern within the complete pattern that can be referenced later. In /(['"])[a-z]\1/, \1 refers to the first parenthesized subexpression, (['"]), and matches the same text that the subexpression matched

Note: /['"][a-z]['"]/ means a single or double quotation mark, one lowercase letter, and another single or double quotation mark; the opening and closing marks do not have to be the same. To require them to be the same, write /(['"])[a-z]\1/

A backslash followed by a number (\1, \2, ...) refers to the parenthesized subexpression with that number, counted from the left
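The two uses of parentheses can be sketched as follows. The ^ and $ anchors are an addition for this example so that each test covers the entire string; the variable names are assumptions.

```javascript
// Grouping: ? applies to the whole group (script), not just to t.
const optionalGroup = /^java(script)?$/;
const matchesJava = optionalGroup.test("java");             // true
const matchesJavascript = optionalGroup.test("javascript"); // true
const matchesPartial = optionalGroup.test("javascr");       // false

// Without a backreference the two quotation marks may differ.
const looseQuotes = /^['"][a-z]['"]$/;
const looseMixed = looseQuotes.test("'a\"");    // true

// With \1 the closing mark must repeat whatever the group matched.
const strictQuotes = /^(['"])[a-z]\1$/;
const strictMixed = strictQuotes.test("'a\"");  // false
const strictSingle = strictQuotes.test("'a'");  // true
const strictDouble = strictQuotes.test('"a"');  // true

console.log(matchesJava, matchesJavascript, matchesPartial);
console.log(looseMixed, strictMixed, strictSingle, strictDouble);
```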

5. Specifying the match position (anchors)

^ Matches the beginning of the string; in multi-line mode, also matches the beginning of a line

$ Matches the end of the string; in multi-line mode, also matches the end of a line

\b Matches a word boundary, that is, the position between a \w character and a \W character, or between a \w character and the beginning or end of the string

\B Matches a position that is not a word boundary

(?=p) Zero-width positive lookahead assertion: requires the following characters to match p, but does not include them in the match

(?!p) Zero-width negative lookahead assertion: requires that the following characters do not match p
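The anchors and lookahead assertions above can be exercised with a few sample strings. The strings and the object name anchorChecks are assumptions chosen for this sketch.

```javascript
const anchorChecks = {
  startsWith: /^java/.test("javascript"),              // true
  endsWith: /script$/.test("javascript"),              // true
  wholeWord: /\bjava\b/.test("I like java a lot"),     // true: java stands alone
  insideWord: /\bjava\b/.test("javascript"),           // false: no boundary after java
  nonBoundary: /\Bscript/.test("javascript"),          // true: script starts mid-word
  negativeBlocked: /java(?!script)/.test("javascript"), // false: script follows java
  negativeAllowed: /java(?!script)/.test("javabeans"),  // true: script does not follow
};

// Positive lookahead: the colon must follow, but is not part of the match.
const lookaheadMatch = "javascript: the guide".match(/java(script)?(?=:)/)[0]; // "javascript"

console.log(anchorChecks, lookaheadMatch);
```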

6. Modifiers

Modifiers are written to the right of the closing / of the regular expression literal

i Performs case-insensitive matching

g Performs a global match, that is, finds all matches rather than stopping after the first one

m Multi-line mode: ^ matches the beginning of a line as well as the beginning of the string, and $ matches the end of a line as well as the end of the string. For example, /java$/m matches java\nfunc
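The effect of each modifier can be compared side by side. The sample text and variable names are assumptions made for this sketch.

```javascript
const phrase = "Java java JAVA";

// No modifier: case-sensitive, first match only.
const plainMatch = phrase.match(/java/)[0];      // "java" (found at index 5)

// i: case-insensitive, still the first match only.
const insensitiveMatch = phrase.match(/java/i)[0]; // "Java"

// g combined with i: every match, regardless of case.
const allMatches = phrase.match(/java/gi);       // ["Java", "java", "JAVA"]

// m: $ also matches at the end of each line.
const withoutMultiline = /java$/.test("java\nfunc"); // false
const withMultiline = /java$/m.test("java\nfunc");   // true

console.log(plainMatch, insensitiveMatch, allMatches, withoutMultiline, withMultiline);
```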

Note: when the regular expression is global, exec() and test() update lastIndex after every call, and the next call starts searching from lastIndex. When reusing the same regex on a new string, it is therefore best to reset lastIndex to 0 first
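The lastIndex pitfall can be reproduced with a global regex that is reused on two different strings. The strings and variable names are assumptions of this sketch.

```javascript
const globalRe = /java/g;

// First call matches at index 0 and leaves lastIndex just past the match.
const firstHit = globalRe.test("java one");  // true
const indexAfterHit = globalRe.lastIndex;    // 4

// Second call starts at index 4 of the new string, so it misses the match.
// After a failed match lastIndex goes back to 0.
const staleMiss = globalRe.test("java two"); // false
const indexAfterMiss = globalRe.lastIndex;   // 0

// Resetting lastIndex before reuse avoids the surprise.
globalRe.test("java one");                   // true, lastIndex is 4 again
globalRe.lastIndex = 0;
const afterReset = globalRe.test("java two"); // true

console.log(firstHit, indexAfterHit, staleMiss, indexAfterMiss, afterReset);
```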

I hope this article helps with your JavaScript programming.

