Notes on debugging techniques for Linux kernel device drivers

  • 2020-12-21 18:18:49
  • OfStack


/******************
 *  Kernel debugging techniques 
 ******************/

(1) Debugging-related configuration options in the kernel source

The kernel configuration includes a number of options related to kernel debugging, all under the "Kernel hacking" menu. They include:

CONFIG_DEBUG_KERNEL

Makes the other debugging options available. It should be enabled, but on its own it does not turn on any debugging feature.

For details on the individual debugging options, see the Linux Device Drivers book or the help text in menuconfig.

(2) Controlling printk debug statements globally with a macro

By combining a macro with a flag set in the Makefile, we can define our own debugging statements in the C file and switch all of them on or off at build time.
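A minimal sketch of such a macro, under stated assumptions: the names MYDRV_DEBUG, PDEBUG, and mydrv_sum are hypothetical, and the Makefile is assumed to pass -DMYDRV_DEBUG to the compiler when debugging is wanted (for a kernel module, for instance through ccflags-y). The user-space branch is only there so the same source can be built and tried outside the kernel.

```c
#ifdef __KERNEL__
#include <linux/kernel.h>   /* printk, KERN_DEBUG */
#else
#include <stdio.h>          /* fprintf, stderr */
#endif

/* MYDRV_DEBUG and PDEBUG are hypothetical names chosen for this sketch. */
#undef PDEBUG
#ifdef MYDRV_DEBUG
#  ifdef __KERNEL__
     /* Kernel build: send the message to the kernel log at debug level. */
#    define PDEBUG(fmt, args...) printk(KERN_DEBUG "mydrv: " fmt, ## args)
#  else
     /* User-space build: send the message to stderr. */
#    define PDEBUG(fmt, args...) fprintf(stderr, "mydrv: " fmt, ## args)
#  endif
#else
   /* Debugging disabled: every PDEBUG() statement compiles to nothing. */
#  define PDEBUG(fmt, args...) do { } while (0)
#endif

/* Hypothetical helper that shows a PDEBUG() call inside ordinary code. */
static int mydrv_sum(const int *buf, int len)
{
    int i, total = 0;

    for (i = 0; i < len; i++)
        total += buf[i];
    PDEBUG("summed %d items, total = %d\n", len, total);
    return total;
}
```

With this layout, removing the flag from the Makefile silences every PDEBUG() call at once, without touching the C source.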

(3) Using strace

strace can track all system calls made by user-space programs. Useful parameters are:

-t    shows the time at which each call was made
-T    shows the time spent in each call
-e    restricts the types of system calls traced, such as "-e execve"
-f    traces all child processes
-p    traces a specific process, such as "-p 8856"
-o    writes the output to a specific file

strace is very useful for spotting subtle errors around system calls, especially in multi-process programs: the return values and process pids in its output carry a lot of useful information. For example:

$>strace -o zht.txt -f ./process_create
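The source of process_create is not shown in these notes. As a purely hypothetical stand-in, a program of that kind could be built around a helper like the one below; the name create_process and its behavior are assumptions made for illustration.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork one child that exits immediately with `code`,
 * then wait for it in the parent.
 * Returns the child's exit code, or -1 if fork/waitpid fails or the child
 * did not exit normally.
 */
static int create_process(int code)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;              /* fork failed */
    if (pid == 0)
        _exit(code);            /* child: exit at once with the given code */

    if (waitpid(pid, &status, 0) < 0)
        return -1;              /* parent: waiting for the child failed */
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A main() that simply calls create_process(7) and is run under "strace -f -o zht.txt" should leave a trace containing the process-creation call (typically reported as clone on Linux), the child's exit, and the parent's wait, each line prefixed with the pid of the process that made the call.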

(4) Using ltrace

ltrace can track all dynamic library function calls made by user-space programs. Useful parameters are:

-t    shows the time at which each call was made
-T    shows the time spent in each call
-f    traces all child processes
-p    traces a specific process
-o    writes the output to a specific file

(5) Examining oops messages

An oops is the most common way for the kernel to tell users that something bad has happened. Typically, the kernel is in an unstable state after an oops.

In some cases, an oops leads to a kernel panic and the system crashes. These cases include:

* the oops occurs in code that holds a lock
* the oops occurs during communication with a hardware device
* the oops occurs in interrupt context
* the oops occurs in the idle process (pid 0) or the init process (pid 1), because the kernel cannot work without these two processes

If an oops occurs while any other process is running, the kernel kills that process and tries to keep running. An oops can have many causes, including out-of-bounds memory accesses and illegal instructions.

The most important information in an oops is the register context and the backtrace (call trace). An oops can also be triggered deliberately, for example:


if (bad_thing)
 BUG();
/* or, equivalently: BUG_ON(bad_thing); */

More serious errors can be raised with panic(). Calling panic() not only prints the error message but also halts the entire system, so use it only in extreme cases:


if(terrible_thing)
 panic("foo is %ld!\n", foo);

In some cases, simply printing the stack trace is enough to help with debugging; dump_stack() does this:


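 if (!debug_check) {
  /* KERN_DEBUG is the printk log level for debug messages */
  printk(KERN_DEBUG "provide some info\n");
  dump_stack(); /* print the current call trace to the kernel log */
 }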
