From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20030206

Description of problem:
I have a shell script, say A, that starts another script, say B, and then C. B is my backup script, i.e., a long-running script that runs a number of rsync processes in parallel. C shuts down all my local hosts. The problem seems to be a SIGINT handling issue in bash. If it's synchronously waiting for a foreground process to finish, it propagates the SIGINT from the child's exit status to itself, but if it's waiting for several background processes, it doesn't.

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:
1. Create a shell script B that runs, say, some hundred sleep processes in the background, then waits for them to complete, a script C that prints a message to stdout, and a script A that runs B then C (the scripts are going in an attachment)
2. Start A, and type ^C while B and the sleep processes are running

Actual Results:
Whether the message from C gets printed depends on whether B runs sleep in the foreground or just waits for the background processes to complete. Another oddity is that sleep doesn't terminate on SIGINT.

Expected Results:
A shouldn't get to the point of running C in either case, since it was already interrupted while running B.

Additional info:

A:
#! /bin/sh
B ${1+"$@"}; C

B:
#! /bin/sh
count=0
maxcount=${1-15}
while test $count -lt $maxcount; do
    sleep ${2-300} &
    count=`expr $count + 1`
done
echo started $maxcount sleep ${2-10} processes, now waiting...
# sleep ${2-300} # this line is enabled in the second run
wait
echo all sleep processes completed

C:
#! /bin/sh
echo running C
sleep 2
echo finishing C

9895 is A, 9896 is B, 9897 is background sleep started by B
[pid 9895] wait4(-1, <unfinished ...>
[pid 9896] wait4(-1, <unfinished ...>
[pid 9897] gettimeofday({1045716721, 777963}, NULL) = 0
[pid 9897] nanosleep({247, 342004000}, <unfinished ...>
[pid 9895] <... wait4 resumed> 0xbffff2d8, 0, NULL) = ? ERESTARTSYS (To be restarted)
[pid 9896] <... wait4 resumed> 0xbfffeef8, 0, NULL) = ? ERESTARTSYS (To be restarted)
[pid 9897] <... nanosleep resumed> 0) = -1 EINTR (Interrupted system call)
[pid 9895] --- SIGINT (Interrupt) @ 0 (0) ---
[pid 9896] --- SIGINT (Interrupt) @ 0 (0) ---
[pid 9897] --- SIGINT (Interrupt) @ 0 (0) ---
[pid 9895] sigreturn( <unfinished ...>
[pid 9896] rt_sigaction(SIGINT, {SIG_DFL}, <unfinished ...>
[pid 9897] gettimeofday( <unfinished ...>
[pid 9895] <... sigreturn resumed> ) = ? (mask now [CHLD RTMIN])
[pid 9896] <... rt_sigaction resumed> {0x8075db0, [], SA_RESTORER, 0x40063438}, 8) = 0
[pid 9897] <... gettimeofday resumed> {1045716725, 977581}, NULL) = 0
[pid 9895] wait4(-1, <unfinished ...>
[pid 9896] rt_sigprocmask(SIG_SETMASK, [RTMIN], <unfinished ...>
[pid 9897] nanosleep({243, 142386000}, <unfinished ...>
[pid 9896] <... rt_sigprocmask resumed> NULL, 8) = 0
[pid 9896] rt_sigprocmask(SIG_SETMASK, [RTMIN], NULL, 8) = 0
[pid 9896] exit_group(129) = ?
[pid 9895] <... wait4 resumed> [WIFEXITED(s) && WEXITSTATUS(s) == 129], 0, NULL) = 9896
[pid 9895] rt_sigprocmask(SIG_SETMASK, [RTMIN], NULL, 8) = 0
[pid 9895] --- SIGCHLD (Child exited) @ 0 (0) ---
[pid 9895] wait4(-1, 0xbfffefcc, WNOHANG, NULL) = -1 ECHILD (No child processes)
[pid 9895] sigreturn() = ? (mask now [RTMIN])
[pid 9895] rt_sigaction(SIGINT, {SIG_DFL}, {0x8075db0, [], SA_RESTORER, 0x40063438}, 8) = 0
[runs C, while sleep keeps on running]

22108 is A, 22109 is B, 22110 is background sleep started by B
[pid 22108] wait4(-1, <unfinished ...>
[pid 22109] wait4(-1, <unfinished ...>
[pid 22110] gettimeofday({1045717130, 150180}, NULL) = 0
[pid 22110] nanosleep({280, 444129000}, <unfinished ...>
[pid 22108] <... wait4 resumed> 0xbffff2d8, 0, NULL) = ? ERESTARTSYS (To be restarted)
[pid 22109] <... wait4 resumed> 0xbffff378, 0, NULL) = ? ERESTARTSYS (To be restarted)
[pid 22110] <... nanosleep resumed> 0) = -1 EINTR (Interrupted system call)
[pid 22108] --- SIGINT (Interrupt) @ 0 (0) ---
[pid 22109] --- SIGINT (Interrupt) @ 0 (0) ---
[pid 22110] --- SIGINT (Interrupt) @ 0 (0) ---
[pid 22108] sigreturn( <unfinished ...>
[pid 22109] sigreturn( <unfinished ...>
[pid 22110] gettimeofday( <unfinished ...>
[pid 22108] <... sigreturn resumed> ) = ? (mask now [CHLD RTMIN])
[pid 22109] <... sigreturn resumed> ) = ? (mask now [CHLD RTMIN])
[pid 22110] <... gettimeofday resumed> {1045717136, 886325}, NULL) = 0
[pid 22108] wait4(-1, <unfinished ...>
[pid 22109] wait4(-1, <unfinished ...>
[pid 22110] nanosleep({273, 707984000}, <unfinished ...>
[pid 22109] <... wait4 resumed> [WIFSIGNALED(s) && WTERMSIG(s) == SIGINT], 0, NULL) = 22170
[pid 22109] rt_sigaction(SIGINT, {SIG_DFL}, {0x8075db0, [], SA_RESTORER, 0x40063438}, 8) = 0
[pid 22109] rt_sigaction(SIGINT, {SIG_DFL}, {SIG_DFL}, 8) = 0
[pid 22109] getpid() = 22109
[pid 22109] kill(22109, SIGINT) = 0
[pid 22109] --- SIGINT (Interrupt) @ 0 (0) ---
[pid 22108] <... wait4 resumed> [WIFSIGNALED(s) && WTERMSIG(s) == SIGINT], 0, NULL) = 22109
[pid 22108] rt_sigaction(SIGINT, {SIG_DFL}, {0x8075db0, [], SA_RESTORER, 0x40063438}, 8) = 0
[pid 22108] rt_sigaction(SIGINT, {SIG_DFL}, {SIG_DFL}, 8) = 0
[pid 22108] getpid() = 22108
[pid 22108] kill(22108, SIGINT) = 0
[pid 22108] --- SIGINT (Interrupt) @ 0 (0) ---
[sleep keeps on running]
Looking in /proc/pid/status I observe that the sleep processes are all ignoring SIGINT. Both they and the sh processes are also ignoring SIGQUIT, which is odd. In just "sh -c 'sleep 999'", the sh process ignores SIGQUIT but the sleep process does not ignore any signals. I guess ignoring SIGQUIT itself is the shell's business, though it seems odd. An strace shows that the shell ignores SIGINT after fork and before exec'ing sleep. It also suspiciously redirects < /dev/null, which was never requested.

22747 access("/bin/sleep", X_OK) = 0
22747 rt_sigaction(SIGINT, {SIG_DFL}, {SIG_DFL}, 8) = 0
22747 rt_sigaction(SIGQUIT, {SIG_DFL}, {SIG_IGN}, 8) = 0
22747 rt_sigaction(SIGCHLD, {SIG_DFL}, {0x8076d30, [], SA_RESTORER, 0x420277e8}, 8) = 0
22747 open("/dev/null", O_RDONLY|O_LARGEFILE) = 3
22747 dup2(3, 0) = 0
22747 close(3) = 0
22747 rt_sigaction(SIGINT, {SIG_IGN}, {SIG_DFL}, 8) = 0
22747 rt_sigaction(SIGQUIT, {SIG_IGN}, {SIG_DFL}, 8) = 0
22747 execve("/bin/sleep", ["sleep", "300"], [/* 28 vars */]) = 0
In fact, all of this looks like intentionally UTTERLY WRONG code in bash. See execute_cmd.c, and its misguided functions async_redirect_stdin and setup_async_signals. bash has decided that when you say "foo &", you really meant:

    (trap '' SIGINT SIGQUIT; exec foo < /dev/null) &

Go figure. Somebody must be on crack.
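The signal half of that equivalence can be checked without strace. The following is only a minimal sketch, assuming a non-interactive sh with job control off and an external sleep that installs no SIGINT handler of its own; the variable names pid and survived are made up for the illustration.

```shell
#! /bin/sh
# Start a background job the same way script B does.
sleep 5 &
pid=$!

# Give the child time to finish its fork/exec setup.
sleep 1

# Stand-in for ^C reaching the background job.
kill -INT "$pid"
sleep 1

# kill -0 sends no signal; it only checks that the process still exists.
if kill -0 "$pid" 2>/dev/null; then
    survived=yes
else
    survived=no
fi

# SIGTERM is not ignored, so this cleans up the job.
kill -TERM "$pid" 2>/dev/null
wait "$pid" 2>/dev/null

echo "background sleep survived SIGINT: $survived"
```

If the background job really starts with SIGINT ignored, the script should report that sleep survived. Run the same commands in an interactive shell with job control and the result may differ.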
These patently stupid behaviors seem to have been specified by the 2001 version of POSIX. They have clearly gone nuts. The redirection from /dev/null is just screwy, and utterly wrong and antiuseful in an interactive shell, but now POSIX seems to specify it in a way that does not AFAICT distinguish between scripts and interactive shells. Likewise ignoring SIGQUIT and SIGINT in background jobs makes sense in interactive shells not supporting job control, but is right loopy for scripts. I have not tried to read the entire chapter for all possible context, but it too seems to be specified without regard for the distinction between sane behavior for interactive shells and for scripts.
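The /dev/null redirection can be observed in the same spirit. This is a rough sketch, assuming sh is a POSIX-style shell running non-interactively with job control off; fg_out and bg_out are illustrative names, not anything from bash or POSIX.

```shell
#! /bin/sh
# Feed one line on stdin to a non-interactive child shell.
# A foreground cat inherits the pipe as its stdin.
fg_out=$(printf 'hello\n' | sh -c 'cat')

# A background cat is expected to be started with stdin from /dev/null,
# so it should see end-of-file at once and print nothing.
bg_out=$(printf 'hello\n' | sh -c 'cat & wait')

echo "foreground cat read: '$fg_out'"
echo "background cat read: '$bg_out'"
```

Shells differ in how an explicit redirection on the background command interacts with the implicit one, so this sketch deliberately leaves that case out.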
Gee, how did this end up filed under kernel? I was sure I had changed it to bash after some investigation before posting. Obviously I was wrong... Oh, well... Sorry.
It didn't get reassigned to me, so this is the first time I've seen it (by chance). So is this behaviour POSIX-mandated?
Chet Ramey believes that bash is doing the right thing here: "Bash notes that it receives the SIGINT but waits until the background jobs have completed before acting on it. Since the background jobs ignore keyboard-generated SIGINTs, they don't exit until they're done. `wait' would return immediately if there were a trap on SIGINT, though. POSIX specifies that, too."
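The remark about `wait' and traps suggests a way to see the difference in a script. The following is a minimal sketch, not a fix endorsed by anyone in this report: it assumes SIGINT was not already ignored when the script started (a non-interactive shell cannot trap a signal that was ignored on entry), and the names job, interrupted, status and still_running are invented for the example.

```shell
#! /bin/sh
interrupted=no
# With a trap installed, wait should return as soon as SIGINT arrives.
trap 'interrupted=yes' INT

sleep 5 &
job=$!

# Stand-in for typing ^C after one second: signal the script itself.
( sleep 1; kill -INT $$ ) &

wait "$job"
status=$?

# The background sleep ignores SIGINT, so it should still be running.
if kill -0 "$job" 2>/dev/null; then
    still_running=yes
else
    still_running=no
fi

# Clean up with SIGTERM, which the job does not ignore.
kill "$job" 2>/dev/null
wait "$job" 2>/dev/null

echo "wait status: $status, trap ran: $interrupted, job still running: $still_running"
```

A script like B could use the same trap to kill its children and then exit, or reset the trap and re-send SIGINT to itself, so that A does not go on to run C.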