Deadlock traps when calling shell scripts from Python and Java, in detail

2020-07-21 09:04:32
OfStack

Preface

Recently I had a requirement: periodically check whether a task's execution conditions are met and, if so, trigger a Spark job. Spark jobs are usually packaged as a Jar and then launched by a shell script that passes in the required parameters. Because the conditions involve complex logic, doing everything in shell alone would be hard to develop and test, so I looked into how to call the Spark script from Python and from Java.

The versions used are Python 3.6.4 and JDK 8.

Python

Python mainly uses the subprocess library. Python's API changes fairly often, and the run function was added in 3.5; it makes the library much easier to use and greatly reduces the chance of running into bugs.


import subprocess

subprocess.run(["ls", "-l"])
subprocess.run(["sh", "/path/to/your/script.sh", "arg1", "arg2"])
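As a minimal sketch of how run can also hand back the script's output and exit status, the example below captures stdout and stderr. To stay self-contained it launches a Python one-liner through sys.executable instead of a real shell script, and run_and_capture is a hypothetical helper name. It uses stdout=PIPE and universal_newlines=True because the capture_output and text shortcuts only arrived in Python 3.7, after the 3.6.4 used here.

```python
import subprocess
import sys


def run_and_capture(args, timeout=60):
    """Run a command and return (exit code, stdout text, stderr text)."""
    completed = subprocess.run(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode bytes to str
        timeout=timeout,          # raises subprocess.TimeoutExpired if exceeded
    )
    return completed.returncode, completed.stdout, completed.stderr


if __name__ == "__main__":
    # Stand-in for ["sh", "/path/to/your/script.sh", "arg1", "arg2"]
    child = [sys.executable, "-c",
             "import sys; print(' '.join(sys.argv[1:]))", "arg1", "arg2"]
    code, out, err = run_and_capture(child)
    print(code, out.strip())
```

Because run reads both pipes while it waits, the captured output can be arbitrarily large without blocking the child.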

Why does run reduce the chance of running into bugs?

Before run existed, we usually called one of the functions that the documentation now groups under "Older high-level API", such as call or check_call, or simply created a Popen object. By default the child's output goes to the console. If you are not familiar with the API or have not read the documentation closely, and you want to wait for the child process to finish and then get its output, you are likely to pass stdout=PIPE and call wait. When the output is large enough to fill the pipe buffer, the child process blocks until someone reads from the pipe, while the parent is blocked in wait, which creates a deadlock. I ran into this strange behavior the first time I printed Spark logs to the console. The following script reproduces it:


# a.sh
# seq is used because {0..9999} is a bash feature that plain sh (e.g. dash) does not expand
for i in $(seq 0 9999); do
  echo '***************************************************'
done


import subprocess

p = subprocess.Popen(['sh', 'a.sh'], stdout=subprocess.PIPE)
p.wait()  # deadlock: nothing reads the pipe, so the child blocks once the buffer is full

call invokes wait internally, so passing stdout=PIPE to it produces the same deadlock.
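To observe the hang without freezing a terminal, one option is to give wait a timeout. This is only a sketch under assumptions: the child is a Python one-liner standing in for a.sh, and CHILD and wait_blocks are hypothetical names. The child writes roughly 1 MB, which is more than typical OS pipe buffers hold.

```python
import subprocess
import sys

# Stand-in for a.sh: writes 20000 lines (about 1 MB) to stdout.
CHILD = "import sys\nfor _ in range(20000):\n    sys.stdout.write('*' * 51 + '\\n')\n"


def wait_blocks(timeout=2.0):
    """Return True if wait() is still blocked after `timeout` seconds."""
    p = subprocess.Popen([sys.executable, "-c", CHILD], stdout=subprocess.PIPE)
    try:
        p.wait(timeout=timeout)
        return False  # child finished: all of its output fit in the pipe buffer
    except subprocess.TimeoutExpired:
        return True   # child is stuck writing to a full pipe that nobody reads
    finally:
        p.kill()          # clean up the (possibly stuck) child
        p.stdout.close()
        p.wait()          # reap the process


if __name__ == "__main__":
    print("wait() blocked:", wait_blocks())
```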

To avoid the deadlock, you must either read the child's output yourself before calling wait, or use the recommended communicate method. communicate reads stdout and stderr concurrently while it waits for the child (with reader threads on Windows and I/O multiplexing on POSIX), so the pipe buffer never fills up. The new run function mentioned earlier calls communicate internally:


stdout, stderr = process.communicate(input, timeout=timeout)
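Both ways out can be sketched as follows. The child is again a Python one-liner standing in for a.sh, and CHILD, collect_with_communicate and stream_then_wait are hypothetical names chosen for this illustration.

```python
import subprocess
import sys

# Stand-in for a.sh: writes 20000 lines (about 1 MB) to stdout.
CHILD = "import sys\nfor _ in range(20000):\n    sys.stdout.write('*' * 51 + '\\n')\n"


def collect_with_communicate(timeout=30):
    """Let communicate() drain the pipes while waiting for the child to exit."""
    p = subprocess.Popen([sys.executable, "-c", CHILD],
                         stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    try:
        out, err = p.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        p.kill()                    # on timeout: kill, then collect what is left
        out, err = p.communicate()
    return p.returncode, out


def stream_then_wait():
    """Read the output line by line first, and only then call wait()."""
    p = subprocess.Popen([sys.executable, "-c", CHILD], stdout=subprocess.PIPE)
    lines = 0
    for _ in p.stdout:              # consuming the pipe keeps the child unblocked
        lines += 1
    p.stdout.close()
    return p.wait(), lines


if __name__ == "__main__":
    code, out = collect_with_communicate()
    print(code, len(out.splitlines()))
    print(stream_then_wait())
```

The streaming variant only pipes stdout. If stderr were piped as well and only one of the two were read in a loop, the other pipe could still fill up, which is the case communicate is designed to handle.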

Java

With Python covered, Java is much simpler.

Java generally uses Runtime.getRuntime().exec() or ProcessBuilder to call an external script:


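import java.util.Scanner;

// Option 1: Runtime.exec, reading the child's output as it arrives
Process p = Runtime.getRuntime().exec(new String[]{"ls", "-al"});
Scanner sc = new Scanner(p.getInputStream());
while (sc.hasNextLine()) {
    System.out.println(sc.nextLine());
}
// Option 2: ProcessBuilder
Process p2 = new ProcessBuilder("sh", "a.sh").start();
p2.waitFor(); // deadlock: the output is never read, so the child blocks once the buffer is full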

Note that the stream directions here are relative to the main program: getInputStream() is the output of the child process, and getOutputStream() is the input to the child process.

For the same buffer reason, a deadlock can result if waitFor is called to wait for the child process to finish while its output is not being read.
Because the Java API rarely changes, there is no new method comparable to Python's run, but the open source community offers its own solutions, such as commons exec, the approach at http://www.baeldung.com/run-shell-command-in-java, or Alvin Alexander's solution (although it is not complete).


// commons exec: getting the output is a bit more complicated than in Python
CommandLine commandLine = CommandLine.parse("sh a.sh");
  
ByteArrayOutputStream out = new ByteArrayOutputStream();
PumpStreamHandler streamHandler = new PumpStreamHandler(out);
  
Executor executor = new DefaultExecutor();
executor.setStreamHandler(streamHandler);
executor.execute(commandLine);
  
String output = new String(out.toByteArray());

But the idea is the same as in Python: keep reading the child process's output in the background (here on a separate thread) so that the buffer never fills up.
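For comparison with the Python section, the same background-reader idea can be written by hand in Python. This is a rough sketch rather than how commons exec or subprocess is actually implemented; CHILD, run_with_reader_thread and pump are hypothetical names, and the child is a Python one-liner standing in for a.sh.

```python
import subprocess
import sys
import threading

# Stand-in for a.sh: writes 20000 lines (about 1 MB) to stdout.
CHILD = "import sys\nfor _ in range(20000):\n    sys.stdout.write('*' * 51 + '\\n')\n"


def run_with_reader_thread():
    """Drain the child's stdout on a background thread while the caller waits."""
    p = subprocess.Popen([sys.executable, "-c", CHILD], stdout=subprocess.PIPE)
    chunks = []

    def pump():
        # Keep reading until the child closes its end of the pipe (EOF).
        for chunk in iter(lambda: p.stdout.read(4096), b""):
            chunks.append(chunk)

    reader = threading.Thread(target=pump, daemon=True)
    reader.start()
    code = p.wait()   # safe here: the reader thread keeps the pipe from filling up
    reader.join()     # make sure every byte has been collected
    p.stdout.close()
    return code, b"".join(chunks)


if __name__ == "__main__":
    code, data = run_with_reader_thread()
    print(code, data.count(b"\n"))
```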

Another point the two have in common is the recommendation to pass the shell command as an array or list of separate arguments, so that the system handles special characters such as spaces correctly.
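A small Python sketch of why the list form is safer: each list item reaches the child as exactly one argument, with no shell in between to split on spaces or interpret metacharacters. ECHO_ARGS and received_args are hypothetical names, and the child is a Python one-liner used in place of a real script.

```python
import subprocess
import sys

# Child that prints the repr of each argument it received, one per line.
ECHO_ARGS = "import sys\nfor a in sys.argv[1:]:\n    print(repr(a))\n"


def received_args(*args):
    """Return the arguments exactly as the child process saw them."""
    completed = subprocess.run(
        [sys.executable, "-c", ECHO_ARGS] + list(args),  # one list item per argument
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    return completed.stdout.splitlines()


if __name__ == "__main__":
    # The space and the shell metacharacters arrive untouched, as single arguments.
    print(received_args("my file.txt", "a;b", "$HOME"))
```

Had the same command been built as one string and run through a shell, the space would split the file name in two, the semicolon would start a second command, and $HOME would be expanded.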

References:

https://dcreager.net/2009/08/06/subprocess-communicate-drawbacks/
https://alvinalexander.com/java/java-exec-processbuilder-process-1
https://www.javaworld.com/article/2071275/core-java/when-runtime-exec-won-t.html

Conclusion