Docker: exploring namespaces in detail
- 2020-06-23 02:28:48
- OfStack
Docker implements resource isolation with namespaces, resource limits with cgroups, and efficient file operations with copy-on-write (CoW).
1. Namespace resource isolation
The six namespaces and what they isolate:
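namespace | System call flag | What it isolates
UTS | CLONE_NEWUTS | Hostname and domain name
IPC | CLONE_NEWIPC | Semaphores, message queues and shared memory
PID | CLONE_NEWPID | Process IDs
Network | CLONE_NEWNET | Network devices, network stacks, ports, etc.
Mount | CLONE_NEWNS | Mount points (file systems)
User | CLONE_NEWUSER | Users and groups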
One of the main reasons the Linux kernel implements namespaces is to provide lightweight virtualization (container) services. Processes in the same namespace can see each other's changes but know nothing about processes outside it. This gives a process in a container the illusion that it is running in a separate system, which provides independence and isolation.
Four ways to use the namespace API
The namespace API consists of clone(), setns() and unshare(), plus some files under /proc. To specify which of the six namespaces to isolate, these calls usually take one or more of the following six flags, combined with bitwise OR (|):
CLONE_NEWUTS, CLONE_NEWIPC, CLONE_NEWPID, CLONE_NEWNET, CLONE_NEWNS, CLONE_NEWUSER.
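To illustrate how the flags are combined, the sketch below maps namespace names to their CLONE_NEW* constants and ORs them together. The lookup table and the function ns_flags_for are hypothetical helpers written for this example, and the short names ("uts", "mnt", and so on) are assumed to follow the /proc/[pid]/ns link names.

```c
#define _GNU_SOURCE              /* exposes the CLONE_NEW* constants */
#include <sched.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical lookup table mirroring the namespace table above. */
static const struct {
    const char *name;
    int flag;
} ns_flags[] = {
    { "uts",  CLONE_NEWUTS  },
    { "ipc",  CLONE_NEWIPC  },
    { "pid",  CLONE_NEWPID  },
    { "net",  CLONE_NEWNET  },
    { "mnt",  CLONE_NEWNS   },
    { "user", CLONE_NEWUSER },
};

#define NS_COUNT (sizeof ns_flags / sizeof ns_flags[0])

/* ORs together the flags for the given namespace names.
   Returns -1 if any name is unknown. */
int ns_flags_for(const char *const names[], size_t count)
{
    int flags = 0;
    for (size_t i = 0; i < count; i++) {
        size_t j;
        for (j = 0; j < NS_COUNT; j++) {
            if (strcmp(names[i], ns_flags[j].name) == 0) {
                flags |= ns_flags[j].flag;
                break;
            }
        }
        if (j == NS_COUNT)
            return -1;           /* unknown namespace name */
    }
    return flags;
}
```

The resulting value is what would be passed as the flags argument of clone() or unshare().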
Create a namespace with clone() while creating a new process
Using clone() to create a process in new namespaces is the most common approach, and the most basic way Docker uses namespaces. Its synopsis is as follows:
NAME
clone, __clone2 - create a child process
SYNOPSIS
/* Prototype for the glibc wrapper function */
#include <sched.h>
int clone(int (*fn)(void *), void *child_stack,
          int flags, void *arg, ...
          /* pid_t *ptid, struct user_desc *tls, pid_t *ctid */ );
clone() is a more general form of the fork() system call: its flags argument controls how much of the parent's execution context the child shares. More than 20 CLONE_* flags control different aspects of cloning, such as whether the child shares virtual memory with the parent process.
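A minimal sketch of the clone() approach, assuming Linux with glibc and a caller that has CAP_SYS_ADMIN (for example root): the child is created in a new UTS namespace (CLONE_NEWUTS) and changes its hostname there, while the parent's hostname stays unchanged. The helper names run_in_new_uts and child_fn and the 1 MiB stack size are illustrative choices, not part of any Docker API.

```c
#define _GNU_SOURCE              /* for clone() and CLONE_NEWUTS */
#include <errno.h>
#include <sched.h>
#include <signal.h>              /* SIGCHLD */
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define STACK_SIZE (1024 * 1024) /* assumed size for the child's stack */

/* Runs in the child: set a new hostname inside the new UTS namespace. */
static int child_fn(void *arg)
{
    const char *name = arg;
    char buf[256];

    if (sethostname(name, strlen(name)) == -1)
        return 1;
    if (gethostname(buf, sizeof buf) == -1 || strcmp(buf, name) != 0)
        return 2;
    return 0;                    /* becomes the child's exit status */
}

/* Returns the child's exit status, or -1 with errno set on failure. */
int run_in_new_uts(const char *hostname)
{
    char *stack = malloc(STACK_SIZE);
    if (stack == NULL)
        return -1;

    /* Stacks grow downward on most architectures, so pass the top. */
    pid_t pid = clone(child_fn, stack + STACK_SIZE,
                      CLONE_NEWUTS | SIGCHLD, (void *)hostname);
    if (pid == -1) {
        int saved = errno;
        free(stack);
        errno = saved;
        return -1;
    }

    int status;
    pid_t waited = waitpid(pid, &status, 0);
    free(stack);
    if (waited == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Without the required privilege, clone() fails (typically with EPERM) and the function returns -1. Including SIGCHLD in the flags makes the child notify the parent on exit like an ordinary fork() child, so waitpid() works as usual.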
View the /proc/[pid]/ns files
Starting with Linux kernel 3.8, each process has a /proc/[pid]/ns directory containing one symbolic link per namespace type; the number in each link target identifies the namespace:
ls -l /proc/2597/ns
total 0
lrwxrwxrwx 1 zhangxa zhangxa 0 Mar 2 06:42 cgroup -> cgroup:[4026531835]
lrwxrwxrwx 1 zhangxa zhangxa 0 Mar 2 06:42 ipc -> ipc:[4026531839]
lrwxrwxrwx 1 zhangxa zhangxa 0 Mar 2 06:42 mnt -> mnt:[4026531840]
lrwxrwxrwx 1 zhangxa zhangxa 0 Mar 2 06:42 net -> net:[4026531957]
lrwxrwxrwx 1 zhangxa zhangxa 0 Mar 2 06:42 pid -> pid:[4026531836]
lrwxrwxrwx 1 zhangxa zhangxa 0 Mar 2 06:42 user -> user:[4026531837]
lrwxrwxrwx 1 zhangxa zhangxa 0 Mar 2 06:42 uts -> uts:[4026531838]
If two processes show the same number for a given namespace type, they are in the same namespace.
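The comparison can also be done programmatically by reading the link targets with readlink(). A sketch, assuming /proc is mounted; ns_link and same_namespace are hypothetical helper names:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Reads the target of /proc/<pid>/ns/<ns>, e.g. "uts:[4026531838]".
   Returns 0 on success, -1 on error. */
int ns_link(pid_t pid, const char *ns, char *buf, size_t size)
{
    char path[64];
    snprintf(path, sizeof path, "/proc/%d/ns/%s", (int)pid, ns);
    ssize_t n = readlink(path, buf, size - 1);
    if (n == -1)
        return -1;
    buf[n] = '\0';               /* readlink() does not NUL-terminate */
    return 0;
}

/* Returns 1 if both processes are in the same namespace of type ns,
   0 if they are not, and -1 on error. */
int same_namespace(pid_t a, pid_t b, const char *ns)
{
    char la[64], lb[64];
    if (ns_link(a, ns, la, sizeof la) == -1 ||
        ns_link(b, ns, lb, sizeof lb) == -1)
        return -1;
    return strcmp(la, lb) == 0;
}
```

For example, same_namespace(getpid(), other_pid, "net") returns 1 when both processes share a network namespace. Reading another process's links requires permission to inspect that process.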
These symbolic links have another use: once a link file is opened, the namespace continues to exist for as long as the file descriptor stays open, even if every process in it has exited, and new processes can join it later. Locating and joining an existing namespace through a file descriptor is the most basic approach Docker uses.
Bind-mounting one of the files under /proc/[pid]/ns achieves the same effect; the mount point must be an existing file:
# touch ~/uts
# mount --bind /proc/2454/ns/uts ~/uts
Join an existing namespace via setns()
As mentioned above, holding a namespace file open or bind-mounting it keeps the namespace alive even after all of its processes have exited, so that another process can join it later. Docker relies on this when the docker exec command runs a new command in an already running container. The setns() system call moves the calling process from its original namespace into an existing one; its synopsis is shown below. To avoid affecting the caller, and because joining a PID namespace only takes effect for processes created afterwards, the usual pattern is to create a child process with clone() after setns() has run, let the child execute the command, and let the original process exit.
NAME
setns - reassociate thread with a namespace
SYNOPSIS
#define _GNU_SOURCE /* See feature_test_macros(7) */
#include <sched.h>
int setns(int fd, int nstype);
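fd = open(argv[1], O_RDONLY);   /* namespace file, e.g. ~/uts */
setns(fd, 0);                   /* 0: accept any namespace type */
execvp(argv[2], &argv[2]);      /* run the command in the joined namespace */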
Suppose the compiled program is named setns-test:
# ./setns-test ~/uts /bin/bash
The shell now runs inside the namespace that was joined.
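Following the pattern described above, here is a sketch that leaves the caller untouched by calling setns() in a forked child and running the command there, roughly what setns-test does but without replacing the calling process. run_in_namespace and the exit code 126 for a failed setns() are assumptions made for this example; joining most namespace types requires CAP_SYS_ADMIN.

```c
#define _GNU_SOURCE              /* for setns() */
#include <fcntl.h>
#include <sched.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define SETNS_FAILED 126         /* hypothetical exit code for a failed setns() */

/* Runs argv inside the namespace referred to by nsfile (for example a
   bind-mounted /proc/[pid]/ns/uts) without moving the caller itself.
   Returns the command's exit status, or -1 on error in the parent. */
int run_in_namespace(const char *nsfile, char *const argv[])
{
    int fd = open(nsfile, O_RDONLY | O_CLOEXEC);
    if (fd == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        /* Child: join the namespace (0 = accept any type), then exec.
           For a PID namespace only processes created after setns()
           would be inside it, so one more fork would be needed. */
        if (setns(fd, 0) == -1)
            _exit(SETNS_FAILED);
        execvp(argv[0], argv);
        _exit(127);              /* only reached if execvp() failed */
    }

    close(fd);                   /* the parent never joins the namespace */
    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because the namespace switch happens only in the child, the calling process keeps its original namespaces, which is the property the text above asks for.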
Isolate namespaces in the original process via unshare()
unshare() is similar to clone(), except that it acts on the calling process and does not need to start a new process.
NAME
unshare - disassociate parts of the process execution context
SYNOPSIS
#include <sched.h>
int unshare(int flags);
Calling unshare() provides isolation without starting a new process: in effect, the calling process steps out of its original namespace. This lets the original process perform operations that need to be isolated. Linux ships with an unshare command, which is implemented with the unshare() system call. Docker does not currently use this system call.
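A minimal sketch of unshare() at work, assuming a caller with CAP_SYS_ADMIN: the calling process moves into a new UTS namespace and changes its hostname there, without creating a child. isolate_hostname is a hypothetical helper name.

```c
#define _GNU_SOURCE              /* for unshare() and CLONE_NEWUTS */
#include <errno.h>
#include <sched.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Moves the calling process into a new UTS namespace and sets its
   hostname there. Returns 0, or -1 with errno set (EPERM without
   CAP_SYS_ADMIN). The hostname seen outside is not affected. */
int isolate_hostname(const char *name)
{
    if (unshare(CLONE_NEWUTS) == -1)
        return -1;
    return sethostname(name, strlen(name));
}
```

From the shell, the util-linux command unshare --uts /bin/bash does the equivalent. Note that unshare(CLONE_NEWPID) behaves differently: the caller stays in its old PID namespace, and only the children it creates afterwards enter the new one.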