Do not kill Python threads by force

  • 2020-05-26 09:34:10
  • OfStack

Preface:

Do not try to kill a Python thread by force; it goes against how a service should be designed. Multithreading exists for task cooperation and concurrency, and forcibly killing a thread is very likely to cause unexpected bugs. Remember one point: a held lock is not released just because the thread that holds it exits!

We can cite two common examples:

1. Thread A acquires a lock and is then killed by force before it can call release(). The lock is never released, so every other thread that tries to acquire it blocks forever, which is a typical deadlock scenario.

2. In the common producer-consumer scenario, a consumer takes a task from the task queue but is killed before it can put the unfinished task back into the queue, which results in data loss.
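
The first example can be reproduced without killing anything. The following is a minimal sketch, assuming a plain threading.Lock; the function name worker_that_never_releases is made up for illustration, and the thread simply returns after acquire() to stand in for a thread that was cut off before release().

```python
import threading

lock = threading.Lock()

def worker_that_never_releases():
    # Hypothetical worker: stands in for a thread that was killed
    # right after acquire() and never reached release().
    lock.acquire()

t = threading.Thread(target=worker_that_never_releases)
t.start()
t.join()

# The owning thread is gone, but the lock is still held.
still_locked = lock.locked()
# Any other thread would block here; the timeout keeps the demo from hanging.
got_lock = lock.acquire(timeout=0.2)
print(still_locked, got_lock)  # prints: True False
```

A thread that exits cooperatively can hold the lock inside a with block, which releases it even when an exception unwinds the thread. A thread that is removed from the outside gets no such chance.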

Here is how Java and Python terminate threads:

Java has three ways to terminate a thread:

1. Use an exit flag so that the thread exits normally, that is, the thread terminates when its run method returns.
2. Use the stop method to terminate the thread forcibly (not recommended: stop, like suspend and resume, can have unpredictable results).
3. Interrupt the thread using the interrupt method.

Python offers two ways:

1. An exit flag
2. Using ctypes to kill the thread by force

Whether in Python or Java, the ideal way to stop a thread is to let it end itself: you hand the thread a token (a flag or an event), and the thread exits on its own once it sees it.
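
Applied to the producer-consumer case mentioned earlier, the token idea might look like the sketch below. It assumes a queue.Queue of integers and a threading.Event as the token; the consumer function and the "multiply by two" work are placeholders, not part of any real service.

```python
import queue
import threading

def consumer(tasks, stop_event, done):
    """Process tasks until asked to stop; never abandon a task halfway."""
    while not stop_event.is_set():
        try:
            # A short timeout lets the loop re-check the stop flag regularly.
            item = tasks.get(timeout=0.1)
        except queue.Empty:
            continue
        done.append(item * 2)   # placeholder for real work
        tasks.task_done()

tasks = queue.Queue()
stop_event = threading.Event()
done = []
for n in range(5):
    tasks.put(n)

t = threading.Thread(target=consumer, args=(tasks, stop_event, done))
t.start()
tasks.join()        # wait until every queued task has been processed
stop_event.set()    # hand the thread its token
t.join()
print(done, t.is_alive())  # prints: [0, 2, 4, 6, 8] False
```

Because the flag is only checked between tasks, a task that has been taken from the queue is always finished before the thread exits.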

Next we test what goes wrong when a Python thread is stopped in various ways. First we list all the threads of a process. A process is the unit of resource management and a thread is the unit of scheduling; a process must have at least one thread to be scheduled and run, and that default (main) thread has the same ID as the PID of the process.


ps -mp 31449 -o THREAD,tid
 
USER   %CPU PRI SCNT WCHAN USER SYSTEM  TID
root   0.0  -  - -     -   -   -
root   0.0 19  - poll_s  -   - 31449
root   0.0 19  - poll_s  -   - 31450
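
The same thread ids can also be read from inside the program. This is a small sketch, assuming Python 3.8 or later (where Thread.native_id exists); list_threads and the thread name "worker" are illustrative names. On Linux the native id is the TID that ps shows, and the main thread's native id equals the PID.

```python
import os
import threading

def list_threads():
    """Return (name, kernel thread id) for every live Python thread."""
    return [(t.name, t.native_id) for t in threading.enumerate()]

stop = threading.Event()
worker = threading.Thread(target=stop.wait, name="worker")
worker.start()

threads = dict(list_threads())
print("pid:", os.getpid())
print(threads)

stop.set()
worker.join()
```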

After listing the threads of the process, we learned from strace that 31450 is the thread we want to kill. But when we send kill to it, the whole process goes down. In a multithreaded environment a signal generated this way is delivered to the process as a whole. In general every thread has a chance to receive it, and the signal handler runs in the context of whichever thread receives it, so it is hard to know in advance which thread that will be. In other words, the signal is delivered to an arbitrary thread of the process.


strace -p 31450
Process 31450 attached - interrupt to quit
select(0, NULL, NULL, NULL, {0, 320326}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {1, 0})   = 0 (Timeout)
select(0, NULL, NULL, NULL, {1, 0})   = 0 (Timeout)
select(0, NULL, NULL, NULL, {1, 0})   = ? ERESTARTNOHAND (To be restarted)
--- SIGTERM (Terminated) @ 0 (0) ---
Process 31450 detached



This behavior is actually consistent with how pthread describes signal handling. Once we register a signal handler in the Python code, the callback prevents the whole process from exiting. The problem is that the handler cannot tell which thread you wanted to kill, so it cannot precisely kill one particular thread. You may send the signal to thread id 31450, but the receiver can be any thread of the process. In addition, the handler only receives the signal number and the current stack frame, neither of which says which thread was targeted.

After adding the signal handler, the process no longer exits:


select(0, NULL, NULL, NULL, {1, 0})   = 0 (Timeout)
select(0, NULL, NULL, NULL, {1, 0})   = ? ERESTARTNOHAND (To be restarted)
--- SIGTERM (Terminated) @ 0 (0) ---
rt_sigreturn(0xffffffff)        = -1 EINTR (Interrupted system call)
select(0, NULL, NULL, NULL, {1, 0})   = 0 (Timeout)
select(0, NULL, NULL, NULL, {1, 0})   = 0 (Timeout)
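
A handler of this kind could be registered as in the sketch below, which assumes a POSIX system and that the code runs in the main thread (signal.signal requires that). The names on_sigterm, handled_in and worker are made up for the example. Since the handler cannot know which thread was meant, all it does here is set a shared event that the worker checks.

```python
import os
import signal
import threading
import time

stop_event = threading.Event()
handled_in = []

def on_sigterm(signum, frame):
    # CPython runs Python-level handlers in the main thread, whichever
    # kernel thread the signal was actually delivered to.
    handled_in.append(threading.current_thread().name)
    stop_event.set()

def worker():
    while not stop_event.is_set():
        stop_event.wait(0.1)

signal.signal(signal.SIGTERM, on_sigterm)
t = threading.Thread(target=worker)
t.start()

os.kill(os.getpid(), signal.SIGTERM)  # same effect as `kill <pid>` from a shell

# Give the interpreter a moment to run the handler, then wait for the worker.
deadline = time.monotonic() + 5
while not stop_event.is_set() and time.monotonic() < deadline:
    time.sleep(0.01)
t.join(timeout=5)
print(handled_in, t.is_alive())
```

The process survives SIGTERM, and the worker stops on its own. Stopping one specific worker out of several still needs a channel that can carry more than a signal number.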

If you want to stop a thread through an external notification, you can build that on an RPC service or on some other communication channel. A signal will not do, because it cannot carry any additional information.
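
One possible shape for such a channel is sketched below, leaving the RPC transport out entirely. WorkerRegistry, spawn and request_stop are hypothetical names; request_stop is the function an RPC handler would call with the name of the worker to stop.

```python
import threading

class WorkerRegistry:
    """Maps worker names to stop events so a request can target one thread."""

    def __init__(self):
        self._lock = threading.Lock()
        self._workers = {}

    def spawn(self, name, interval=0.05):
        stop_event = threading.Event()
        thread = threading.Thread(
            target=self._run, args=(stop_event, interval), name=name)
        with self._lock:
            self._workers[name] = (thread, stop_event)
        thread.start()
        return thread

    @staticmethod
    def _run(stop_event, interval):
        while not stop_event.is_set():
            stop_event.wait(interval)   # placeholder for real work

    def request_stop(self, name, timeout=2.0):
        """Stop exactly the named worker and report whether it has exited."""
        with self._lock:
            thread, stop_event = self._workers.pop(name)
        stop_event.set()
        thread.join(timeout)
        return not thread.is_alive()

registry = WorkerRegistry()
a = registry.spawn("worker-a")
b = registry.spawn("worker-b")
stopped = registry.request_stop("worker-b")
print(stopped, a.is_alive(), b.is_alive())  # prints: True True False
registry.request_stop("worker-a")
```

Unlike a signal, the request names its target, so only that worker is asked to exit.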

Python threads are not simulated; they are real kernel threads, created through pthread. However, the upper layer of Python provides no method to stop a thread, so we have to handle that ourselves. Using an event or a custom flag is strongly recommended. If you really must kill a thread, you can call PyThreadState_SetAsyncExc through ctypes to force it to exit without affecting the rest of the running Python service.

The principle behind this function is fairly simple: it sets a marker in the Python virtual machine, and the virtual machine then raises an exception in the target thread to cancel it, with the virtual machine handling the try/catch for you. Never kill a Python thread from outside the process. You can find the thread id via ctypes, but sending kill to it takes down the whole process.

The following code is a sample of killing a thread with ctypes. It is not recommended because it is too crude.


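import ctypes

def terminate_thread(thread):
    """Ask CPython to raise SystemExit in the given threading.Thread."""
    if not thread.is_alive():  # isAlive() was removed in Python 3.9
        return

    # The thread id is an unsigned long since Python 3.7.
    tid = ctypes.c_ulong(thread.ident)
    exc = ctypes.py_object(SystemExit)
    res = ctypes.pythonapi.PyThreadState_SetAsyncExc(tid, exc)
    if res == 0:
        raise ValueError("nonexistent thread id")
    elif res > 1:
        # More than one thread state was modified: undo the request.
        ctypes.pythonapi.PyThreadState_SetAsyncExc(tid, None)
        raise SystemError("PyThreadState_SetAsyncExc failed")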
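
To see what the target thread experiences, the sketch below raises a custom exception instead of SystemExit. It assumes CPython 3.7 or later; Cancelled, async_raise and worker are names invented for this demonstration only.

```python
import ctypes
import threading
import time

class Cancelled(Exception):
    """Hypothetical exception type used only for this demonstration."""

def async_raise(thread, exc_type):
    # Schedule exc_type to be raised in the target thread (CPython only).
    tid = ctypes.c_ulong(thread.ident)
    res = ctypes.pythonapi.PyThreadState_SetAsyncExc(
        tid, ctypes.py_object(exc_type))
    if res == 0:
        raise ValueError("nonexistent thread id")
    if res > 1:
        ctypes.pythonapi.PyThreadState_SetAsyncExc(tid, None)
        raise SystemError("PyThreadState_SetAsyncExc failed")

events = []
started = threading.Event()

def worker():
    try:
        started.set()
        while True:
            time.sleep(0.01)   # short sleeps so the thread keeps returning to bytecode
    except Cancelled:
        events.append("caught")
    finally:
        events.append("cleanup")

t = threading.Thread(target=worker)
t.start()
started.wait()
time.sleep(0.05)               # let the worker settle into its loop
async_raise(t, Cancelled)
t.join(timeout=5)
print(events, t.is_alive())
```

The exception is only raised once the thread gets back to executing Python bytecode, so a thread stuck in a long blocking C call will not notice it until that call returns.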



Let's take a quick look at the source of PyThreadState_SetAsyncExc. In short, it schedules an exception to be raised in the target thread. Those interested can read the design of pystate.c in the Python source; there are also some talks about it on YouTube.


 
int
PyThreadState_SetAsyncExc(long id, PyObject *exc) {
  PyInterpreterState *interp = GET_INTERP_STATE();
  ...
  HEAD_LOCK();
  for (p = interp->tstate_head; p != NULL; p = p->next) {
    if (p->thread_id == id) {
      /* Found the thread with this id in the list. To avoid a deadlock,
         head_mutex must be released before the old value is decref'ed. */
      PyObject *old_exc = p->async_exc;
      Py_XINCREF(exc);        /* take a reference to the new exception */
      p->async_exc = exc;     /* store it as the thread's pending async exception */
      HEAD_UNLOCK();
      Py_XDECREF(old_exc);    /* drop the reference to the replaced value */
      ...
      return 1;               /* one thread state was modified */
    }
  }
  HEAD_UNLOCK();
  return 0;
}

Native POSIX pthreads can terminate a child thread from the main thread with pthread_cancel(tid). Python's thread library does not support this, on the grounds that forcibly ending a thread causes a lot of trouble and a thread should end itself. The recommended method in Python is therefore to have the child thread check a flag in its loop and have the main thread change that flag; when the child thread sees the change, it ends itself.

The logic looks like this:


import threading
import time

def consumer_threading(duration=1.0):
    t1_stop = threading.Event()
    t1 = threading.Thread(target=thread1, args=(1, t1_stop))

    t2_stop = threading.Event()
    t2 = threading.Thread(target=thread2, args=(2, t2_stop))

    t1.start()
    t2.start()

    time.sleep(duration)
    # stop thread2
    t2_stop.set()
    t2.join()

    # stop thread1 as well so that the program can exit
    t1_stop.set()
    t1.join()

def thread1(arg1, stop_event, interval=0.1):
    while not stop_event.is_set():
        # similar to time.sleep(), but returns early once the event is set
        stop_event.wait(interval)

def thread2(arg1, stop_event, interval=0.1):
    while not stop_event.is_set():
        stop_event.wait(interval)

To summarize: although we can control threads through the PyThreadState API via ctypes, interrupting a thread this crudely is not reasonable. Let the thread end itself instead! What if your thread is blocked on I/O and cannot check the event? Then the program needs to be improved: at the very least, the network I/O layer should use timeouts so that it never blocks indefinitely.
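
As a rough illustration of such a timeout, the sketch below uses socket.socketpair() as a stand-in for a real network connection. The reader function, the 0.1 second timeout and the variable names are assumptions made for the example.

```python
import socket
import threading
import time

def reader(sock, stop_event, received):
    """Read from sock until asked to stop; never block longer than the timeout."""
    sock.settimeout(0.1)             # active timeout on the blocking call
    while not stop_event.is_set():
        try:
            data = sock.recv(1024)
        except socket.timeout:       # alias of TimeoutError on Python 3.10+
            continue                 # nothing arrived; re-check the flag
        if not data:                 # peer closed the connection
            break
        received.append(data)

left, right = socket.socketpair()    # stand-in for a real connection
stop_event = threading.Event()
received = []
t = threading.Thread(target=reader, args=(right, stop_event, received))
t.start()

left.sendall(b"ping")
for _ in range(200):                 # wait (bounded) until the reader saw the data
    if received:
        break
    time.sleep(0.01)

stop_event.set()
t.join(timeout=2)
left.close()
right.close()
print(b"".join(received), t.is_alive())  # prints: b'ping' False
```

With the timeout in place, the thread checks the event at least every 0.1 seconds even when no data arrives, so it can always be stopped cooperatively.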

