A brief analysis of the compilation and use of variables in the Nginx configuration file

2020-05-10 23:26:28
OfStack

The Nginx configuration file uses a tiny programming language, and many real-world Nginx configuration files are in effect little programs. Whether or not it is "Turing complete", its design, at least as far as I can see, is heavily influenced by Perl and the Bourne shell. In this respect it differs from the configuration notation of other web servers such as Apache and Lighttpd, and it is a distinctive feature of Nginx. Since it is a programming language, it naturally has variables (except, of course, in the case of oddball functional languages like Haskell).
As anyone familiar with imperative programming languages like Perl, the Bourne shell, or C/C++ knows, variables are simply containers for "values." In many programming languages, a "value" can be a number like 3.14, a string like hello world, or even a complex data structure like an array or hash table. In the Nginx configuration, however, variables can hold only one type of value, because only one type of value exists: the string.
For example, our nginx.conf file contains the following line:


set $a "hello world";

We used the standard ngx_rewrite module's set configuration directive to assign a value to the variable $a. In particular, we assigned the string hello world to it.
We see that the name of the Nginx variable is preceded by a $ sign, which the notation requires. All Nginx variables must be referenced with the $ prefix in the Nginx configuration file. This notation is similar to languages like Perl and PHP.
While a variable prefix such as $ may feel uncomfortable to orthodox Java and C# programmers, the benefit of this notation is clear: you can embed a variable directly in a string constant to construct a new string:


set $a hello;  
set $b "$a, $a";

Here we construct the value of the variable $b from the value of the existing Nginx variable $a, so after the two directives are executed in sequence, the value of $a is hello and the value of $b is hello, hello. This technique, known in the Perl world as "variable interpolation", makes a dedicated string concatenation operator unnecessary. We may as well adopt the term here.
Let's look at a more complete configuration example:


server {  
  listen 8080;  
 
  location /test {  
    set $foo hello;  
    echo "foo: $foo";  
  }  
}

This example omits the enclosing http configuration block and the events configuration block of nginx.conf. Requesting the /test interface from the command line with the curl HTTP client, we get:


$ curl 'http://localhost:8080/test'  
foo: hello

Here we use the echo configuration directive of the third-party ngx_echo module to output the value of the $foo variable as the response body of the current request.
As you can see, the arguments of the echo configuration directive also support "variable interpolation". It should be noted, however, that not all configuration directives do. Whether a directive's arguments allow "variable interpolation" depends on the module that implements the directive.
Is there any way to escape the special $ character if we want to output a string containing a dollar sign ($) directly through the echo directive? The answer is no (at least as of Nginx 1.0.10, the latest stable version at the time of writing). Fortunately, we can get around this limitation, for example by using a module configuration directive that does not support variable interpolation to construct an Nginx variable whose value is $, and then using this variable in echo. Here's an example:


geo $dollar {  
  default "$";  
}  
 
server {  
  listen 8080;  
 
  location /test {  
    echo "This is a dollar sign: $dollar";  
  }  
}

The test results are as follows:


$ curl 'http://localhost:8080/test'  
This is a dollar sign: $

Here the geo configuration directive of the standard ngx_geo module is used to assign the string "$" to the variable $dollar, so that we can simply reference $dollar wherever we need a dollar sign. The most common use of the ngx_geo module is to assign a value to an Nginx variable based on the client's IP address; here we merely borrow it to "unconditionally" assign the dollar sign to our $dollar variable.
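For comparison, the more typical use of geo, choosing a value from the client's address, might look like the following minimal sketch. The variable name $is_local and the choice of the IPv4 loopback range are assumptions made purely for illustration.

```nginx
# Hypothetical example: $is_local is an illustrative variable name.
geo $is_local {
    default      0;   # any client not matched by an entry below
    127.0.0.0/8  1;   # clients connecting from the IPv4 loopback range
}

server {
    listen 8080;

    location /test {
        echo "local client: $is_local";
    }
}
```

A request sent with curl from the same machine over 127.0.0.1 would be expected to see the value 1, and any other IPv4 client the value 0.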
In the context of variable interpolation there is also a special case: when the referenced variable name is immediately followed by a character that can itself be part of a variable name (a letter, a digit, or an underscore), we need a special notation to disambiguate, for example:


server {  
  listen 8080;  
 
  location /test {  
    set $first "hello ";  
    echo "${first}world";  
  }  
}

Here, in the argument of the echo directive, we reference the variable $first and follow it immediately with the word world. If we wrote "$firstworld" directly, Nginx's variable interpolation engine would read it as a reference to the variable $firstworld. To solve this problem, the Nginx string notation supports enclosing the variable name in curly braces after the $, as in ${first} here. The output of the example above is:


$ curl 'http://localhost:8080/test'
hello world
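
The same notation matters when the text that follows is an underscore or a digit. The sketch below is only an illustration; the variable $first_2 and its value are hypothetical and were added to make the ambiguity visible.

```nginx
location /test {
    set $first "hello ";
    set $first_2 "goodbye ";   # hypothetical second variable

    # "$first_2" is read as a reference to the variable $first_2 ...
    echo "$first_2";

    # ... while "${first}_2" is $first followed by the literal text "_2".
    echo "${first}_2";
}
```

With this configuration the first echo should print the value of $first_2, and the second should print hello followed by _2.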

The set directive (and the geo directive mentioned earlier) not only assigns a value; it also has the side effect of creating the Nginx variable, which happens automatically when the variable being assigned does not already exist. For example, in the earlier example, if the $a variable has not been created yet, the set directive automatically creates the user variable $a. If we use a variable's value without creating it, we get an error. For example,


 server {  
  listen 8080;  
  
  location /bad {  
    echo $foo;  
  }  
 }

At this point, the Nginx server will refuse to load the configuration:


[emerg] unknown "foo" variable

Yes, we can't even start the service!
Interestingly, the creation and the assignment of an Nginx variable happen at completely different times. Nginx variables can only be created when the Nginx configuration is loaded, that is, when Nginx starts. Assignment, on the other hand, happens only when a request is actually being processed. This means that using a variable that was never created causes a startup failure, and it also means that we cannot dynamically create a new Nginx variable at request time.
Once an Nginx variable is created, its name is visible throughout the entire Nginx configuration, even across the server configuration blocks of different virtual hosts. Let's look at an example:


server {
  listen 8080;

  location /foo {
    echo "foo = [$foo]";
  }

  location /bar {
    set $foo 32;
    echo "foo = [$foo]";
  }
}

Here we created the variable $foo with the set directive in location /bar, so this variable is visible throughout the configuration file, so we can reference it directly in location /foo without worrying about Nginx reporting an error.
Here is the result of using the curl tool on the command line to access both interfaces:


$ curl 'http://localhost:8080/foo'  
foo = []  
$ curl 'http://localhost:8080/bar'  
foo = [32]  
$ curl 'http://localhost:8080/foo'  
foo = []

As you can see from this example, since the set directive is used in location /bar, the assignment is only performed on requests to access /bar. When we request the /foo interface, we always get an empty $foo value, because if the user variable is output unassigned, we get an empty string.
Another important feature we can see from this example is that although the scope of an Nginx variable name is the entire configuration, each request has its own independent copy of all the variables, or more precisely of the containers in which their values are stored, and these copies do not interfere with one another. For example, after we requested the /bar interface earlier, the $foo variable was assigned the value 32, but this does not affect the value of $foo in the subsequent request to /foo (it is still empty!), because each request has its own copy of the $foo variable.
For Nginx beginners, one of the most common mistakes is to think of Nginx variables as something shared globally between requests, that is, as "global variables." In fact, the lifetime of an Nginx variable cannot cross the request boundary.

Another common misconception about Nginx variables is that the lifetime of the variable container is bound to the location configuration block. It is not. Let's look at an example involving an "internal jump":


server {  
  listen 8080;  
  location /foo {  
    set $a hello;  
    echo_exec /bar;  
  }  
  location /bar {  
    echo "a = [$a]";  
  }  
}

Here, in location /foo, we use the echo_exec configuration directive provided by the third-party module ngx_echo to initiate an "internal jump" to location /bar. An "internal jump" is a jump from one location to another inside the server while a request is being processed. This is different from the "external jump" performed with HTTP status codes 301 and 302, since the latter is carried out in cooperation with the HTTP client, and the user can see the requested URL change, for instance in the browser's address bar. An internal jump is very similar to the exec command in the Bourne shell (or Bash): it goes and never returns. Another close analogue is the goto statement in C.
Since it is an internal jump, the request being processed is still the original one; only the current location has changed, so the request still uses the original copy of the Nginx variable containers. In the example above, if we request /foo, the workflow is as follows: first, the set directive in location /foo assigns the string hello to $a; then echo_exec initiates an internal jump into location /bar, which outputs the value of $a. Because $a is still the same $a, we can expect the output to be a line containing hello. The test confirms this:


$ curl 'http://localhost:8080/foo'
a = [hello]

But if we access the /bar interface directly from the client, we get an empty value for $a, because it relies on location /foo to initialize $a. As you can see from the above example, a request uses the same copy of the Nginx variables throughout its processing, even when it passes through several different location configuration blocks. Here, for the first time, we also touch on the concept of the "internal jump". It is worth mentioning that the rewrite configuration directive of the standard ngx_rewrite module can also initiate an "internal jump". For example, the echo_exec directive in the above example can be replaced with rewrite as follows:


server {
  listen 8080;

  location /foo {
    set $a hello;
    rewrite ^ /bar;
  }

  location /bar {
    echo "a = [$a]";
  }
}

The effect is exactly the same as with echo_exec. We'll cover more uses of the rewrite directive later, such as initiating "external jumps" like 301 and 302 redirects.
As you can see from the above example, the lifetime of an Nginx variable's value container is bound to the request currently being processed, not to a location.
So far, all the variables we have touched on are Nginx variables implicitly created by the set directive. These are usually called "user-defined variables" or, more simply, "user variables." Since there are user-defined variables, there are naturally also "predefined variables", or "built-in variables", provided by the Nginx core and the various Nginx modules.
The most common use of built-in variables is to obtain information about the request or response. For example, the built-in variable $uri provided by the ngx_http_core module gives the URI of the current request (decoded and without request parameters), while $request_uri gives the original URI of the request (undecoded and with request parameters). Here's an example:


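location /test {
  echo "uri = $uri";
  echo "request_uri = $request_uri";
}

For the sake of simplicity, even the server configuration block is omitted here; as in all previous examples, we are still listening on port 8080. In this example, we output the values of $uri and $request_uri in the response body. Let's test the /test interface with different requests:


$ curl 'http://localhost:8080/test'
uri = /test
request_uri = /test
$ curl 'http://localhost:8080/test?a=3&b=4'
uri = /test
request_uri = /test?a=3&b=4
$ curl 'http://localhost:8080/test/hello%20world?a=3&b=4'
uri = /test/hello world
request_uri = /test/hello%20world?a=3&b=4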

Another particularly common built-in variable is not actually a single variable but an unbounded group of variables: all variables whose names begin with arg_, which we will refer to as the $arg_XXX variable group. One example is $arg_name, whose value is the value of the URI parameter named name in the current request, in its original, undecoded form. Let's look at a more complete example:


location /test {  
  echo "name: $arg_name";  
  echo "class: $arg_class";  
}

Then use various parameter combinations on the command line to request the /test interface:


$ curl 'http://localhost:8080/test'  
name:  
class:  
$ curl 'http://localhost:8080/test?name=Tom&class=3'  
name: Tom  
class: 3 
$ curl 'http://localhost:8080/test?name=hello%20world&class=9'  
name: hello%20world  
class: 9

In fact, $arg_name can match not only name parameters, but also NAME parameters, Name parameters, and so on:


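$ curl 'http://localhost:8080/test?NAME=Marry'
name: Marry
class:
$ curl 'http://localhost:8080/test?Name=Jimmy'
name: Jimmy
class: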

Nginx automatically adjusts the parameter names in the original request to all lowercase before matching them.
If you want to decode %XX escape sequences in a URI parameter value, you can use the set_unescape_uri configuration directive provided by the third-party ngx_set_misc module:


location /test {  
  set_unescape_uri $name $arg_name;  
  set_unescape_uri $class $arg_class;  
  echo "name: $name";  
  echo "class: $class";  
}

Now let's test again:


$ curl 'http://localhost:8080/test?name=hello%20world&class=9'  
name: hello world  
class: 9

The space was decoded!
As you can see from this example, the set_unescape_uri directive, like the set directive, also automatically creates Nginx variables. We will look at the ngx_set_misc module in more detail later.
Variables of the form $arg_XXX have infinitely many possible names, so they do not correspond to any container that holds a value. Moreover, such variables are handled specially in the Nginx core, and third-party Nginx modules cannot provide "magic" built-in variables of this kind.
There are many other built-in variable groups like $arg_XXX, such as $cookie_XXX for fetching cookie values, $http_XXX for fetching request headers, and $sent_http_XXX for fetching response headers. Rather than list them one by one here, interested readers can refer to the official documentation of the ngx_http_core module.
It should be noted that many built-in variables are read-only, such as the $uri and $request_uri we just introduced. Assigning to read-only variables should be strictly avoided because of its unintended consequences, for example:


 location /bad {  
  set $uri /blah;  
  echo $uri;  
 }

This problematic configuration causes Nginx to report a bizarre error at startup:


[emerg] the duplicate "uri" variable in ...

If you try to overwrite another read-only built-in variable, such as $arg_XXX, it may even cause the process to crash in some versions of Nginx.
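Reading the variable groups mentioned earlier is safe, and works the same way as reading $arg_XXX. The following is a minimal sketch under that assumption; the cookie name session is hypothetical and chosen only for illustration.

```nginx
location /test {
    # $http_user_agent reads the User-Agent request header: the header name
    # is lowercased and its dashes are replaced with underscores.
    echo "user agent: $http_user_agent";

    # $cookie_session reads a cookie named "session" (a hypothetical name).
    echo "session cookie: $cookie_session";
}
```

Sending a request with curl using -A 'demo' and -H 'Cookie: session=abc' would be expected to produce demo and abc respectively; when a header or cookie is absent, the corresponding variable is empty.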
Not all built-in variables are read-only, however; some do support being overwritten. One example is $args. When read, this variable returns the URL parameter string of the current request (that is, the part after the question mark in the request URL, if any), and when assigned, it modifies that parameter string directly. Let's look at an example:


location /test {  
  set $orig_args $args;  
  set $args "a=3&b=4";  
  echo "original args: $orig_args";  
  echo "args: $args";  
}

Here, we first save the original URL parameter string in the $orig_args variable, and then modify the current URL parameter string by overwriting the $args variable. Finally, we use the echo instruction to output the values of $orig_args and $args variables respectively. Let's test the /test interface this way:


$ curl 'http://localhost:8080/test'  
original args:  
args: a=3&b=4 
$ curl 'http://localhost:8080/test?a=0&b=1&c=2'  
original args: a=0&b=1&c=2 
args: a=3&b=4

In the first test, we did not provide any URL parameter string, so the value of $orig_args is output as an empty string. In both the first and the second test, whether or not we provide a URL parameter string, the parameter string is forcibly rewritten to a=3&b=4 in location /test.
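Because $args can appear on both sides of an assignment, it can also be extended rather than replaced. The following is only a sketch; the parameter debug=1 is a hypothetical value used for illustration.

```nginx
location /test {
    # Append a parameter to whatever the client sent.
    # "debug=1" is a hypothetical parameter, not part of the original example.
    set $args "${args}&debug=1";
    echo "args: $args";
}
```

A request for /test?a=0 should then report a=0&debug=1. Note that this simple form leaves a leading & when the original parameter string is empty.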
Notably, the $args variable, like $arg_XXX, does not use a value container of its own. When we read $args, Nginx executes a small piece of code that reads the data from the place in the Nginx core where the current URL parameter string is stored; when we overwrite $args, Nginx executes another small piece of code that writes to that same place. Other parts of Nginx read from that place whenever they need the current URL parameter string, so our changes to $args affect all of them. Let's look at an example:


location /test {  
  set $orig_a $arg_a;  
  set $args "a=5";  
  echo "original a: $orig_a";  
  echo "a: $arg_a";  
}

Here, we first save the value of the built-in variable $arg_a, that is, the URL parameter a of the original request, in the user variable $orig_a, and then assign to the built-in variable $args, rewriting the parameter string of the current request to a=5. Finally, we use the echo directive to output the values of $orig_a and $arg_a. Since modifying the built-in variable $args directly changes the URL parameter string of the current request, the built-in variable $arg_XXX naturally changes as well. The test results confirm this:


$ curl 'http://localhost:8080/test?a=3'  
original a: 3 
a: 5

We can see that since the URL parameter string of the original request is a=3, the initial value of $arg_a is 3; after we overwrite $args, the URL parameter string is forcibly changed to a=5, so the value of $arg_a automatically becomes 5 as well. Let's look at another example, in which modifying the $args variable affects the standard HTTP proxy module ngx_proxy:


server {  
  listen 8080;  
  location /test {  
    set $args "foo=1&bar=2";  
    proxy_pass http://127.0.0.1:8081/args;  
  }  
}  
server {  
  listen 8081;  
  location /args {  
    echo "args: $args";  
  }  
}

Here we define two virtual hosts in the http configuration block. The first virtual host listens on port 8080, and its /test interface unconditionally changes the URL parameter string of the current request to foo=1&bar=2 by overwriting the $args variable. The /test interface then sets up a reverse proxy through the proxy_pass directive of the ngx_proxy module, pointing to the HTTP service /args on port 8081 of the local machine. By default, when the ngx_proxy module forwards an HTTP request to a remote HTTP service, it also forwards the URL parameter string of the current request. The HTTP service on port 8081 is provided by the second virtual host we defined. In location /args of the second virtual host, we use the echo directive to output the URL parameter string of the current request, so that we can check which parameter string the ngx_proxy module actually forwarded from the /test interface. Let's access the /test interface of the first virtual host:


$ curl 'http://localhost:8080/test?blah=7'  
args: foo=1&bar=2

We can see that although the request itself provided the URL parameter string blah=7, the parameter string was forcibly overwritten with foo=1&bar=2 in location /test. The proxy_pass directive then forwarded our overwritten parameter string to the /args interface configured on the second virtual host, which output it. So our assignment to the $args variable also successfully affected the behavior of the ngx_proxy module.
The special code that runs when a variable is read is called a "get handler" in Nginx, and the special code that runs when a variable is overwritten is called a "set handler". Different Nginx modules may well provide different "get and set handlers" for their variables, which is what makes those variables behave so magically.
This technique is not uncommon in the computing world. For example, in object-oriented programming, the designer of a class usually does not expose the class's member variables directly to its users, but instead provides two separate methods for reading and writing each member variable. These methods are often called "accessors".