Docker: exploring namespaces in detail

  • 2020-06-23 02:28:48
  • OfStack

Docker implements resource isolation through namespaces, resource limits through cgroups, and efficient file operations through copy-on-write (CoW).

1. Namespace resource isolation

The six kinds of namespace isolation:

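Namespace   System call flag   What it isolates
UTS         CLONE_NEWUTS       Hostname and domain name
IPC         CLONE_NEWIPC       Semaphores, message queues and shared memory
PID         CLONE_NEWPID       Process IDs
Network     CLONE_NEWNET       Network devices, network stacks, ports, etc.
Mount       CLONE_NEWNS        Mount points (file systems)
User        CLONE_NEWUSER      Users and groups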

One of the main purposes of namespaces in the Linux kernel is to provide lightweight virtualization (container) services. Processes in the same namespace can see each other's changes but know nothing about processes outside it. This gives a process in a container the illusion that it is running in a separate system environment, which provides independence and isolation.

Four ways to use the namespace API

The namespace API consists of clone(), setns() and unshare(), plus some files under /proc. To specify which of the six namespaces to isolate, these calls usually take one or more of the following six flags, combined with a bitwise OR (|):

CLONE_NEWUTS,CLONE_NEWIPC,CLONE_NEWPID,CLONE_NEWNET,CLONE_NEWNS,CLONE_NEWUSER.

Creating a namespace with clone() while creating a new process

Using clone() to create a process in new namespaces is the most common approach, and the most basic way Docker uses namespaces. Its prototype is as follows:


NAME 
    clone, __clone2 - create a child process 
SYNOPSIS 
    /* Prototype for the glibc wrapper function */ 
    #include <sched.h> 
    int clone(int (*fn)(void *), void *child_stack, 
         int flags, void *arg, ... 
         /* pid_t *ptid, struct user_desc *tls, pid_t *ctid */ ); 

clone() is a more general form of the fork() system call, with the flags argument controlling which features are used. More than 20 CLONE_* flags control various aspects of the new process, such as whether it shares virtual memory with the parent process.

Viewing the /proc/[pid]/ns files

Starting with Linux kernel 3.8, the /proc/[pid]/ns directory contains one symbolic link per namespace, each pointing to that namespace's number:


$ ls -l /proc/2597/ns
total 0
lrwxrwxrwx 1 zhangxa zhangxa 0 Mar 2 06:42 cgroup -> cgroup:[4026531835]
lrwxrwxrwx 1 zhangxa zhangxa 0 Mar 2 06:42 ipc -> ipc:[4026531839]
lrwxrwxrwx 1 zhangxa zhangxa 0 Mar 2 06:42 mnt -> mnt:[4026531840]
lrwxrwxrwx 1 zhangxa zhangxa 0 Mar 2 06:42 net -> net:[4026531957]
lrwxrwxrwx 1 zhangxa zhangxa 0 Mar 2 06:42 pid -> pid:[4026531836]
lrwxrwxrwx 1 zhangxa zhangxa 0 Mar 2 06:42 user -> user:[4026531837]
lrwxrwxrwx 1 zhangxa zhangxa 0 Mar 2 06:42 uts -> uts:[4026531838]

If the links of two processes point to the same namespace number, the two processes are in the same namespace.

Another use of these symbolic links is that, as long as one of them is held open as a file descriptor, the namespace continues to exist even after every process in it has exited, and other processes can join it later. In Docker, locating and joining an existing namespace through a file descriptor is the most basic approach.

Bind-mounting one of these files to another location has the same effect (the target file must exist before the mount):


# touch uts
# mount --bind /proc/2454/ns/uts uts

Joining an existing namespace with setns()

As mentioned above, a namespace can be kept alive by a bind mount (or an open file descriptor) even after all of its processes have exited, so that another process can join it later. Docker needs this to run a new command inside an already running container with the docker exec command. With the setns() system call, a process leaves its original namespace and joins an existing one; the prototype is shown below. For a PID namespace, setns() does not move the calling process itself, only the children it creates afterwards. So, to avoid disturbing the caller and to make the newly joined PID namespace take effect, a child process is usually created with clone() after setns() to run the command, and the original process then exits.


NAME 
    setns - reassociate thread with a namespace 
SYNOPSIS 
    #define _GNU_SOURCE       /* See feature_test_macros(7) */ 
    #include <sched.h> 
    int setns(int fd, int nstype); 

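fd = open(argv[1], O_RDONLY);  /* argv[1]: namespace file, e.g. ~/uts */
setns(fd, 0);                  /* join it; 0 accepts any namespace type */
execvp(argv[2], &argv[2]);     /* run the command inside that namespace */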

Suppose the compiled program is named setns-test:
# ./setns-test ~/uts /bin/bash

The shell now runs inside the namespace it has just joined, and any command typed into it executes there.

Isolating namespaces in the original process with unshare()

unshare() is similar to clone(), except that it acts on the calling process itself and does not need to create a new process.


NAME 
    unshare - disassociate parts of the process execution context 
SYNOPSIS 
    #include <sched.h> 
    int unshare(int flags); 

The main purpose of unshare() is to achieve isolation without starting a new process: the caller leaves its original namespaces and moves into new ones, so operations that need isolation can be carried out within the original process. Linux ships with an unshare command, which is implemented with the unshare() system call. At the time this was written, Docker did not use this system call.
