In-depth understanding of the select module in Python

  • 2020-05-30 20:27:24
  • OfStack

Introduction

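The select module in Python is about I/O multiplexing. It provides three main methods, select, poll, and epoll, plus kqueue. select is available on every platform (Windows supports only select), poll is available on most Unix systems, epoll is Linux-only, and kqueue is available on FreeBSD and other BSD-derived systems.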
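Because the available methods depend on the platform, it can help to check at runtime before choosing one. The following is a minimal sketch; the helper name available_multiplexers is made up for this illustration and is not part of the select module.

```python
import select

# Names of the multiplexing interfaces the select module may expose.
CANDIDATES = ("select", "poll", "epoll", "kqueue")


def available_multiplexers():
    """Return the interfaces that this platform's select module provides."""
    return [name for name in CANDIDATES if hasattr(select, name)]


# The result depends on the operating system the script runs on.
print(available_multiplexers())
```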

select method

The process tells the kernel which events on which file descriptors it wants to watch (typically at most 1024 fds, the usual FD_SETSIZE limit). While none of those events has occurred, the process is blocked. It is woken up when an event occurs on one or more of the file descriptors.

When we call select():

1. Switch from user mode to kernel mode

2. Copy the fds from user space to kernel space

3. The kernel traverses all the fds to check whether the corresponding event has occurred

4. If no event has occurred, the process is blocked. When a device driver raises an interrupt or the timeout expires, the process is woken up and the fds are traversed again

5. Return the ready fds after the traversal

6. Copy the fds from kernel space back to user space

fd: file descriptor


fd_r_list, fd_w_list, fd_e_list = select.select(rlist, wlist, xlist, [timeout])

Parameters: up to 4 (the first 3 are required)

rlist: wait until ready for reading
wlist: wait until ready for writing
xlist: wait for an "exceptional condition"
timeout: timeout in seconds (optional)

Return value: 3 lists

The select method monitors file descriptors. It blocks while none of the monitored conditions is met and returns three lists once the state of a file descriptor changes:

1. When an fd in the first sequence becomes readable, it is added to fd_r_list

2. When an fd in the second sequence becomes writable, it is added to fd_w_list (a connected socket with free space in its send buffer is writable, so such fds are usually returned right away)

3. When an exceptional condition occurs on an fd in the third sequence, it is added to fd_e_list

4. When timeout is omitted, select blocks until one of the monitored fds changes

When timeout = n (a positive number of seconds), select blocks for up to n seconds. If none of the monitored fds has changed by then, it returns three empty lists. If one of them changes earlier, select returns immediately.
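The return values and the timeout behaviour can be tried without a server. The sketch below assumes socket.socketpair() is available on the platform; it uses a pair of connected sockets named a and b, which are illustration names only.

```python
import select
import socket

# socketpair() returns two connected sockets, so no server is needed.
a, b = socket.socketpair()

# Nothing has been written yet: select waits 0.1 s, then returns three empty lists.
idle = select.select([b], [], [], 0.1)

# After a write, b is readable; a is writable because its send buffer has room.
a.sendall(b"ping")
r_ready, w_ready, x_ready = select.select([b], [a], [], 0.1)

print(idle)            # ([], [], [])
print(r_ready == [b])  # True
print(w_ready == [a])  # True

data = b.recv(16)      # the 4 bytes written above
a.close()
b.close()
```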

Example: implement a concurrent server using select


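# Server
import socket
import select

s = socket.socket()
s.bind(('127.0.0.1', 8888))
s.listen(5)
r_list = [s]  # sockets to watch for readability
num = 0
try:
    while True:
        # Block for up to 10 seconds waiting for a readable socket
        rl, wl, error = select.select(r_list, [], [], 10)
        num += 1
        print('counts is %s' % num)
        print("rl's length is %s" % len(rl))
        for fd in rl:
            if fd is s:
                # The listening socket is readable: a connection is pending.
                # Do not call recv() here, it would block the whole loop
                # until this client sends data.
                conn, addr = fd.accept()
                r_list.append(conn)
            else:
                try:
                    msg = fd.recv(200)
                    if msg:
                        fd.sendall(('received on fd %s' % fd.fileno()).encode())
                except ConnectionError:
                    msg = b''
                if not msg:
                    # Empty data or a connection error: the client is gone
                    r_list.remove(fd)
                    fd.close()
finally:
    # Reached on Ctrl+C or an unexpected error
    s.close()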

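# Client
import socket

s = socket.socket()
s.connect(('127.0.0.1', 8888))
while True:
    input_msg = input('input>>>')
    if input_msg == '0':  # enter 0 to quit
        break
    if not input_msg:  # sending nothing would leave recv() below waiting
        continue
    s.sendall(input_msg.encode())
    msg = s.recv(1024)
    if not msg:  # the server closed the connection
        break
    print(msg.decode())

s.close()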

On the server side we have to keep calling select in a loop, which means:

1. When there are many file descriptors, copying them between user space and kernel space on every call is time-consuming

2. When there are many file descriptors, the kernel's traversal of all of them on every call is also costly

3. select supports at most 1024 file descriptors

poll differs little from select, so it is not covered in this article

epoll method

epoll addresses each of these weaknesses of select:

1. epoll solves the copying problem in the epoll_ctl function. An fd is copied into the kernel when it is registered on the epoll handle with epoll_ctl, instead of being copied again on every epoll_wait call. epoll guarantees that each fd is copied only once during the entire process.

2. At epoll_ctl time, epoll visits the specified fd once (this one visit is unavoidable) and attaches a callback function to it. When the device becomes ready and wakes the waiters on its wait queue, the callback is invoked and adds the ready fd to a ready list. The job of epoll_wait is then simply to check whether this ready list contains any fd.

3. epoll imposes no additional limit on the number of file descriptors; the bound is the number of files the system allows to be open


select.epoll(sizehint=-1, flags=0)
Create an epoll object.

epoll.close()
Close the control file descriptor of the epoll object.

epoll.closed
True if the epoll object is closed.

epoll.fileno()
Return the file descriptor number of the control fd.

epoll.fromfd(fd)
Create an epoll object from a given file descriptor.

epoll.register(fd[, eventmask])
Register a file descriptor, together with the events to watch on it, with the epoll object.

epoll.modify(fd, eventmask)
Modify the event mask of a registered file descriptor.

epoll.unregister(fd)
Remove a registered file descriptor from the epoll object.

epoll.poll(timeout=-1, maxevents=-1)
Wait for events; timeout is in seconds (float). Blocks until an event occurs on a registered fd, then returns a list of (fd, event) tuples in the form [(fd1, event1), (fd2, event2), ... (fdn, eventn)]
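The calls above can be exercised on a single socket. This is a sketch that assumes a Linux system; elsewhere select.epoll does not exist and the function returns None. The function name epoll_lifecycle_demo and the variable names are made up for the illustration.

```python
import select
import socket


def epoll_lifecycle_demo():
    """Register, modify and unregister one socket; return what poll() reported."""
    if not hasattr(select, "epoll"):  # epoll exists only on Linux
        return None
    a, b = socket.socketpair()
    ep = select.epoll()
    fd = b.fileno()
    try:
        ep.register(fd, select.EPOLLIN)   # watch b for readability
        before = ep.poll(0.1)             # nothing written yet: empty list
        a.sendall(b"x")
        readable = ep.poll(0.1)           # one (fd, event) tuple for b
        ep.modify(fd, select.EPOLLOUT)    # now watch b for writability only
        writable = ep.poll(0.1)           # b has send-buffer space, so it is reported
        ep.unregister(fd)
        after = ep.poll(0.1)              # nothing registered: empty list
        return fd, before, readable, writable, after
    finally:
        ep.close()
        a.close()
        b.close()


print(epoll_lifecycle_demo())
```

The fd is registered once and then polled several times, which is the pattern that spares epoll the per-call copying that select performs.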

Events:


EPOLLIN       Available for read (value 1)
EPOLLOUT      Available for write (value 4)
EPOLLPRI      Urgent data for read
EPOLLERR      Error condition happened on the associated fd (value 8)
EPOLLHUP      Hang up happened on the associated fd
EPOLLET       Set edge-triggered behavior; the default is level-triggered behavior
EPOLLONESHOT  Set one-shot behavior: after one event is pulled out, the fd is internally disabled
EPOLLRDNORM   Equivalent to EPOLLIN
EPOLLRDBAND   Priority data band can be read
EPOLLWRNORM   Equivalent to EPOLLOUT
EPOLLWRBAND   Priority data may be written
EPOLLMSG      Ignored
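The event value that epoll.poll() reports is a bit mask, so several of these flags can be set at once. The sketch below decodes a mask into flag names. The helper describe_events and the table EVENT_FLAGS are made up for the illustration, and the numeric fallbacks are the Linux values, used only on platforms where the select module lacks the constants.

```python
import select

# Flag name -> bit value. The fallback numbers are the Linux values.
EVENT_FLAGS = {
    "EPOLLIN": getattr(select, "EPOLLIN", 0x001),
    "EPOLLPRI": getattr(select, "EPOLLPRI", 0x002),
    "EPOLLOUT": getattr(select, "EPOLLOUT", 0x004),
    "EPOLLERR": getattr(select, "EPOLLERR", 0x008),
    "EPOLLHUP": getattr(select, "EPOLLHUP", 0x010),
}


def describe_events(mask):
    """Return the names of the flags that are set in an event mask."""
    return [name for name, bit in EVENT_FLAGS.items() if mask & bit]


print(describe_events(5))  # ['EPOLLIN', 'EPOLLOUT']
```

A handler would normally test individual bits, for example event & select.EPOLLIN, rather than compare the whole value for equality.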

Level trigger and edge trigger:

Level-triggered (sometimes called condition-triggered): when a read or write event occurs on a monitored file descriptor, epoll.poll() notifies the handler so that it can read or write. If you do not read or write all of the data in one go (for example because the buffer is too small), the next call to epoll.poll() reports the same file descriptor again, and it keeps doing so for as long as the data is left unhandled. If the system has a large number of ready file descriptors that you do not need to read or write, they are returned on every call, which makes it much less efficient for the handler to pick out the ready file descriptors it cares about. The advantage is clear: it is stable and reliable, because a pending event is never lost.

Edge-triggered (sometimes called state-triggered): when a read or write event occurs on a monitored file descriptor, epoll.poll() notifies the handler so that it can read or write. If you do not read or write all of the data this time (for example because the buffer is too small), the next call to epoll.poll() does not notify you again; you are notified only when a second read or write event appears on that file descriptor. This mode is more efficient than level triggering, and the system is not flooded with ready file descriptors you do not care about. The drawback is that it is unreliable under certain conditions: if the handler stops before all the data has been read, the rest goes unnoticed until a new event arrives.
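The difference can be observed by sending one message, never reading it, and polling twice. This is a sketch that assumes a Linux system (the function returns None where epoll is missing); count_notifications is a name made up for the illustration.

```python
import select
import socket


def count_notifications(edge_triggered):
    """Send one message, leave it unread, poll twice; return both result lengths."""
    if not hasattr(select, "epoll"):  # epoll exists only on Linux
        return None
    a, b = socket.socketpair()
    ep = select.epoll()
    try:
        mask = select.EPOLLIN
        if edge_triggered:
            mask |= select.EPOLLET
        ep.register(b.fileno(), mask)
        a.sendall(b"hello")
        first = ep.poll(0.5)    # both modes report the new data
        second = ep.poll(0.5)   # the data is still unread at this point
        return len(first), len(second)
    finally:
        ep.close()
        a.close()
        b.close()


print(count_notifications(edge_triggered=False))  # (1, 1) on Linux
print(count_notifications(edge_triggered=True))   # (1, 0) on Linux
```

In level-triggered mode the unread data is reported on every poll; in edge-triggered mode it is reported once, which is why edge-triggered handlers usually use non-blocking sockets and read until no data is left.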

epoll example:


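# Server (Linux only)
import socket
import select

s = socket.socket()
s.bind(('127.0.0.1', 8888))
s.listen(5)
epoll_obj = select.epoll()
epoll_obj.register(s, select.EPOLLIN)
connections = {}  # fd number -> socket object
try:
    while True:
        # Block until at least one registered fd has an event
        events = epoll_obj.poll()
        for fd, event in events:
            print(fd, event)
            if fd == s.fileno():
                # New connection: register it and wait for its data.
                # Do not call recv() here, it would block the whole loop
                # until this client sends data.
                conn, addr = s.accept()
                connections[conn.fileno()] = conn
                epoll_obj.register(conn, select.EPOLLIN)
            else:
                fd_obj = connections[fd]
                try:
                    msg = fd_obj.recv(200)
                    if msg:
                        fd_obj.sendall('ok'.encode())
                except ConnectionError:
                    msg = b''
                if not msg:
                    # Empty data or a connection error: the client is gone.
                    # Unregister before closing the socket.
                    epoll_obj.unregister(fd)
                    fd_obj.close()
                    del connections[fd]
finally:
    # Reached on Ctrl+C or an unexpected error
    s.close()
    epoll_obj.close()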

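# Client
import socket

s = socket.socket()
s.connect(('127.0.0.1', 8888))
while True:
    input_msg = input('input>>>')
    if input_msg == '0':  # enter 0 to quit
        break
    if not input_msg:  # sending nothing would leave recv() below waiting
        continue
    s.sendall(input_msg.encode())
    msg = s.recv(1024)
    if not msg:  # the server closed the connection
        break
    print(msg.decode())

s.close()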
