Description of problem:

The fork() function is incorrectly clearing the set of pending signals for the child process *after* a signal has been sent to the child process; consequently, the signal is never delivered.

Version-Release number of selected component (if applicable):
glibc 2.17

How reproducible:
Always (on my system) if the `usleep(100000)` is commented out.

Steps to Reproduce:
1. Build and execute the attached program.
2. Watch it hang.

Actual results:
The program hangs due to the following sequence of events:
1. The child process is fork()ed;
2. The SIGTERM is sent to the child process; and
3. The set of pending signals for the child process is cleared.

Expected results:
Program should immediately exit with status 0.

Additional info:
According to the POSIX standard at <http://pubs.opengroup.org/onlinepubs/9699919799/>, the following sequence of events should occur:
1. The child process is fork()ed;
2. The set of pending signals for the child process is cleared; and
3. The SIGTERM is sent to the child process.

In other words, the fork() should be atomic.

My platform is a 6-processor VMware Workstation 14 Player VM running CentOS 7.4.1708 with gcc 4.8.5 and glibc-2.17-196.el7_4.2.x86_64.
Steven, Thank you for the bug report. Could you please attach the example program?
Created attachment 1382626 [details]
Program to demonstrate the problem

Forgot to attach this. Sorry.
Created attachment 1477667 [details]
Reproducer using sigsuspend

Isn't this a bug in the implementation of sigpause? It unblocks the signal (which causes the handler to run if the signal is pending), and then suspends the calling thread.

This also affects sigsuspend, which is implemented in glibc as a direct system call, but the kernel (circa 4.18 sources) appears to have the same issue: first the signal is unblocked (via set_current_blocked), and then the waiting starts (the loop in sigsuspend).

In strace, it looks like this:

8311  clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fdc7101b790) = 8312
8311  rt_sigaction(SIGTERM, {sa_handler=SIG_DFL, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fdc70aa26e0}, {sa_handler=0x400800, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7fdc70aa26e0}, 8) = 0
8311  rt_sigprocmask(SIG_UNBLOCK, [TERM], [], 8) = 0
8311  rt_sigprocmask(SIG_SETMASK, NULL, [], 8) = 0
8311  rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
8311  kill(8312, SIGTERM) = 0
8311  wait4(-1,  <unfinished ...>
8312  --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=8311, si_uid=1000} ---
8312  rt_sigreturn({mask=[]}) = 0
8312  rt_sigsuspend(~[TERM RTMIN RT_1], 8

This race does not happen all the time; something like this is needed to reproduce it:

$ while strace -o log -f ./bz1535670-sigsuspend ; do date; done

I think we need a kernel fix first, and then we can debate whether to fix the obsolete sigpause function.
I submitted a clarification request to the Austin Group:

http://austingroupbugs.net/view.php?id=1201

I think that both kernel and glibc are conforming to the POSIX description as it is worded right now because the description contains the same race condition. It is not entirely clear to me whether that is by accident or deliberate, hence the need for clarification.
(In reply to Florian Weimer from comment #6)
> I submitted a clarification request to the Austin Group:
>
> http://austingroupbugs.net/view.php?id=1201
>
> I think that both kernel and glibc are conforming to the POSIX description
> as it is worded right now because the description contains the same race
> condition. It is not entirely clear to me if that is by accident or
> deliberate, hence the need for clarification.

I think that submitting a clarification to the Austin Group to get an official statement for POSIX is the right way forward. I agree that as of *today* the API as implemented is conforming.

Historically it has been the case that sigpause and sigsuspend do not atomically alter the signal mask and wait; therefore the parent is forced to follow a design that requires sending multiple signals until it knows the child has received at least one of those signals (this is not the same problem as the "unreliable semantics" that were present in the original System V).

The alternative design is to have the child's signal handler finish by exiting (since a sent signal is always caught at least once). This design must establish a happens-before relationship between the child exiting and the parent calling wait() to capture the child's exit status. If the happens-before relationship is not established, the parent may hang waiting for a child that is waiting for a signal.

As it stands, the test case is broken. Fixes include:

(a) Make the signal handler do all that is required and then exit.

static void
handleSigTerm(int sig)
{
  /* Do required shutdown... */
  exit (0);
}

or (b) Have the parent loop. I wrote this up quickly, just to make my point, and I haven't vetted this with anyone from my team yet. Looping involves a synchronized state between parent and child to know when the child reaches the exit() state. Transitions are achieved with signals. You may need one signal to transition, or more than one.
There is the opportunity for states to stack in the pipe, but eventually the parent will confirm the child has reached the exit state. The following example uses only pipe/write/read/close as additional functions, but you might use sigwaitinfo() in the child to do something like this safely with an event loop.

#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>
#include <sys/wait.h>
#include <stdio.h>
#include <errno.h>

int output[2];
int input[2];

static void
handleSigTerm(int sig)
{
  /* Tell parent we handled the signal. */
  write (input[1], "x", 1);
}

int
main(void)
{
  ssize_t res;
  char buf[1];

  sighold(SIGTERM);
  sigset(SIGTERM, handleSigTerm);

  pipe(input);
  pipe(output);

  pid_t forkPid = fork();
  if (forkPid == 0)
    {
      char buf[1];
      close (input[0]);
      close (output[1]);
      /* Child process. */
      sigpause(SIGTERM);
      /* Tell parent we caught the signal. */
      read (output[0], buf, 1);
      write (input[1], &buf[0], 1);
      close (output[0]);
      close (input[1]);
      exit(0);
    }

  sigset(SIGTERM, SIG_DFL);
  sigrelse(SIGTERM);

  do
    {
      kill(forkPid, SIGTERM);
      res = write (output[1], "1", 1);
      res = read (input[0], buf, 1);
      printf ("read() = %c [%zd]\n", buf[0], res);
    }
  while (buf[0] != '1');

  /* The read of '1' establishes a happens-before that ensures the child
     is going to exit. */
  wait(NULL);
  return 0;
}

Notes:
- https://stackoverflow.com/questions/40592066/sigsuspend-vs-additional-signals-delivered-during-handler-execution/40900621#40900621
(In reply to Carlos O'Donell from comment #7)
> (a) Make the signal handler do all that is required and then exit.
>
> static void
> handleSigTerm(int sig)
> {
>   /* Do required shutdown... */
>   exit (0);
> }

Note that this should be “_exit (0);”; otherwise this will flush stdio streams and run atexit handlers, which can corrupt the state of the parent process.
(In reply to Florian Weimer from comment #6)
> I think that both kernel and glibc are conforming to the POSIX description
> as it is worded right now because the description contains the same race
> condition.

Forgive me, but I don't see how this statement from the POSIX standard for fork(2):

    The set of signals pending for the child process shall be initialized to the empty set.

can be interpreted to mean that the child process is allowed to receive a signal before its fork(2) returns.

I'll be very interested in the response from the Austin Group.
(In reply to Steven Emmerson from comment #9)
> (In reply to Florian Weimer from comment #6)
> > I think that both kernel and glibc are conforming to the POSIX description
> > as it is worded right now because the description contains the same race
> > condition.
>
> Forgive me, but I don't see how this statement from the POSIX standard for
> fork(2):
>
>     The set of signals pending for the child process shall be initialized to
>     the empty set.
>
> can be interpreted to mean that the child process is allowed to receive a
> signal before its fork(2) returns.

The above simply states that the set of pending signals is *not* inherited from the parent process. In the reproducer, the parent process does not have any pending signals, so this does not apply here.
(In reply to Florian Weimer from comment #10)
> > The set of signals pending for the child process shall be initialized to
> > the empty set.
>
> The above simply states that the set of pending signals is *not* inherited
> from the parent process.

I agree that the statement means that the set of pending signals is not inherited from the parent process; however, I interpret the statement to *also* mean that the child process *can rely on the set of pending signals being completely cleared*.

> In the reproducer, the parent process does not have
> any pending signals, so this does not apply here.

I don't see how the set of pending signals in the parent process has any bearing on the set of pending signals in the child process if the child's pending signals are always cleared. In other words, the parent's pending signals are irrelevant to the child.
(In reply to Steven Emmerson from comment #11)
> (In reply to Florian Weimer from comment #10)
> > > The set of signals pending for the child process shall be initialized to
> > > the empty set.
> >
> > The above simply states that the set of pending signals is *not* inherited
> > from the parent process.
>
> I agree that the statement means that the set of pending signals is not
> inherited from the parent process; however, I interpret the statement to
> *also* mean that the child process *can rely on the set of pending signals
> being completely cleared*.

Sure, but this doesn't make a difference here. What appears to happen is that the signal is unblocked, one signal is delivered (by running the handler), and then the process waits for another signal to arrive, which never happens in the reproducer. No pending signals are involved.
(In reply to Florian Weimer from comment #12)
> Sure, but this doesn't make a difference here. What appears to happen is
> that the signal is unblocked, one signal is delivered (by running the
> handler), and then the process waits for another signal to arrive, which
> never happens in the reproducer. No pending signals are involved.

How can a signal be delivered because it was unblocked and yet not have been pending?
(In reply to Steven Emmerson from comment #13)
> (In reply to Florian Weimer from comment #12)
> > Sure, but this doesn't make a difference here. What appears to happen is
> > that the signal is unblocked, one signal is delivered (by running the
> > handler), and then the process waits for another signal to arrive, which
> > never happens in the reproducer. No pending signals are involved.
>
> How can a signal be delivered because it was unblocked and yet not have been
> pending?

If the signal is unblocked, the pending state only exists for exploratory purposes as far as I can see. The signal is delivered immediately. POSIX says this:

“During the time between the generation of a signal and its delivery or acceptance, the signal is said to be "pending". Ordinarily, this interval cannot be detected by an application. However, a signal can be "blocked" from delivery to a thread.”
(In reply to Florian Weimer from comment #14) So, in the fork(2) function, the signal was blocked, then it was delivered, and the standard says that all pending signals are cleared. And you don't see anything wrong with that. I give up.
(In reply to Steven Emmerson from comment #15)
> (In reply to Florian Weimer from comment #14)
>
> So, in the fork(2) function, the signal was blocked, then it was delivered,
> and the standard says that all pending signals are cleared.

Let me try again: The signal necessarily arrives after the fork because it is targeted at the child process. It is delivered when it is unblocked as the first phase of the sigpause call (or it arrives after the unblocking; the difference is not observable). Then the second phase of the sigpause call begins, the waiting phase. At this point, there is no pending signal, so sigpause blocks.

With the usleep left in, the signal is always sent during the second phase of the sigpause call, at which point sigpause observes that the signal was delivered and returns.

I don't think the current non-atomic behavior, as implemented for sigpause in glibc and for sigsuspend in the kernel, is useful, but the POSIX wording allows it.
(In reply to Florian Weimer from comment #16)
> The signal necessarily arrives after the fork because it is targeted at the
> child process. It is delivered when it is unblocked as the first phase of
> the sigpause call (or it arrives after the unblocking; the difference is not
> observable). Then the second phase of the sigpause call begins, the waiting
> phase. At this point, there is no pending signal, so sigpause blocks.

How do you reconcile this sequence with the standard saying, regarding fork(), "The set of signals pending for the child process shall be initialized to the empty set."?

If your sequence is valid, then the standard shouldn't have mentioned this at all, because it makes no difference if pending signals are allowed to be delivered.
(In reply to Steven Emmerson from comment #17)
> (In reply to Florian Weimer from comment #16)
> > The signal necessarily arrives after the fork because it is targeted at the
> > child process. It is delivered when it is unblocked as the first phase of
> > the sigpause call (or it arrives after the unblocking; the difference is not
> > observable). Then the second phase of the sigpause call begins, the waiting
> > phase. At this point, there is no pending signal, so sigpause blocks.
>
> How do you reconcile this sequence with the standard saying that, regarding
> fork(), "The set of signals pending for the child process shall be
> initialized to the empty set."
>
> If your sequence is valid, then the standard shouldn't have mentioned this
> at all because it makes no difference if pending signals are allowed to be
> delivered.

The quoted sentence requires that the set of pending signals is not copied from the parent process, unlike other parts of the process state. In the execution sequence, there are no pending signals right after the fork, as required. But then the parent process sends a signal, and it becomes pending in the subprocess (either before or after the unblock in sigpause). This signal is delivered because sigpause unblocks the signal number. Due to the non-atomic nature of sigpause, the waiting begins after the delivery, but then the delivered signal is gone, so sigpause never returns.
(In reply to Florian Weimer from comment #18)
> This signal is delivered because sigpause unblocks the signal
> number. Due to the non-atomic nature of sigpause, the waiting begins after
> the delivery, but then the delivered signal is gone, so that sigpause never
> returns.

Ah! So the problem is sigpause(2). I didn't realize that. My bad.

I modified the test program to use modern functions (e.g., sigsuspend(2) instead of sigpause(2)) and it worked.
(In reply to Steven Emmerson from comment #19)
> (In reply to Florian Weimer from comment #18)
> > This signal is delivered because sigpause unblocks the signal
> > number. Due to the non-atomic nature of sigpause, the waiting begins after
> > the delivery, but then the delivered signal is gone, so that sigpause never
> > returns.
>
> Ah! So the problem is sigpause(2). I didn't realize that. My bad.

Oh. Yes, it's definitely not a problem in fork.

> I modified the test program to use modern functions (e.g., sigsuspend(2)
> instead of sigpause(2)) and it worked.

Are you sure? It seemed to me that it is just harder to trigger the race.
(In reply to Florian Weimer from comment #20)
> Are you sure? It seemed to me that it is just harder to trigger the race.

Yes. I just confirmed this in a discussion with other colleagues.

In a single-threaded process, the process will enter the kernel and, in an uninterruptible fashion, it will unblock the signal and then wait. Signals in the kernel are only ever handled as the task is returning from a syscall. During the return to userspace, the kernel checks for a signal to deliver and then calls into do_signal(), which eventually calls into the per-arch signal delivery mechanism. Therefore an external process (the parent) sending a signal to the child cannot interrupt the child's unmask+suspend. Only the child itself, returning from a syscall, can be called upon to handle a signal for itself.

Therefore the kernel atomically does unmask+suspend, and so sigsuspend(), which is a direct syscall, should work to solve the problem.

I'm sorry I didn't come up with this earlier. It took a while to reach out to some kernel people I knew to confirm this.
Should we retitle this to "Clarify atomicity semantics for sigpause in glibc?" Since it seems the kernel is atomic?
(In reply to Carlos O'Donell from comment #21)
> In a single-threaded process, the process will enter the kernel and, in an
> uninterruptible fashion, it will unblock the signal and then wait.

Ah. I still see a hang, though. But the reason for that is that the original reproducer and my variant use sigset. sigset unblocks the signal, which is why it is never blocked in the subprocess, so it does not matter whether the glibc function has a race or not, because the race is inherent to the reproducer. Well, it helped us to find a problem in the sigpause implementation.

(In reply to Carlos O'Donell from comment #22)
> Should we retitle this to "Clarify atomicity semantics for sigpause in
> glibc?" Since it seems the kernel is atomic?

I think we can just remove the sigprocmask calls from the sigpause implementation and call sigsuspend directly in the sigpause implementation. I think sigsuspend already does what we are trying to achieve with sigprocmask. We can make this change upstream and close this bug as NOTABUG, I think.
(In reply to Florian Weimer from comment #20)
> Are you sure? It seemed to me that it is just harder to trigger the race.

I tested it a few thousand times and it always worked. Here's the code:

// This file tests the atomicity of sigpause(2)

#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static void handleSigTerm(int sig)
{
    puts("handleSigTerm() called"); // Commenting-out made no difference
}

#define USE_OBSOLETE_FUNCS 0

int main()
{
#if USE_OBSOLETE_FUNCS
    sighold(SIGTERM);
#else
    sigset_t mask;
    sigemptyset(&mask);
    sigaddset(&mask, SIGTERM);
    sigprocmask(SIG_BLOCK, &mask, NULL);
#endif

#if USE_OBSOLETE_FUNCS
    sigset(SIGTERM, handleSigTerm);
#else
    struct sigaction sigact = { 0 };
    sigact.sa_handler = handleSigTerm;
    sigaction(SIGTERM, &sigact, NULL);
#endif

    pid_t forkPid = fork();
    if (forkPid == 0) {
        // Child process
#if USE_OBSOLETE_FUNCS
        sigpause(SIGTERM);
#else
        sigdelset(&mask, SIGTERM);
        sigsuspend(&mask);
#endif
        exit(0);
    }

#if USE_OBSOLETE_FUNCS
    //usleep(100000); // Corrects non-atomicity of sigpause()
#endif
    kill(forkPid, SIGTERM);
    wait(NULL); // Hangs here if usleep() commented-out
    return 0;
}
(In reply to Florian Weimer from comment #23) > I think we can just remove the sigprocmask calls from the sigpause > implementation and call sigsuspend directly in the sigpause implementation. > I think sigsuspend already does what we are trying to achieve with > sigprocmask. We can make this change upstream and close this bug as > NOTABUG, I think. In the "Rationale" section of sigpause(2), the POSIX standard does say that sigpause(2) can be replaced with sigsuspend(2) -- so it would seem that the two functions should either work the same or that the glibc documentation for sigpause(2) should indicate that it's not conforming or, at least, doesn't work the same as sigsuspend(2).
(In reply to Florian Weimer from comment #23)
> (In reply to Carlos O'Donell from comment #21)
> > In a single-threaded process, the process will enter the kernel and, in an
> > uninterruptible fashion, it will unblock the signal and then wait.
>
> Ah. I still see a hang, though. But the reason for that is that the
> original reproducer and my variant use sigset. sigset unblocks the signal,
> which is why it is never blocked in the subprocess, so it does not matter if
> the glibc function has a race or not because the race is inherent to the
> reproducer. Well, it helped us to find a problem in the sigpause
> implementation.

Good catch. I missed that. Yes, sigset unblocks the signal.

> (In reply to Carlos O'Donell from comment #22)
> > Should we retitle this to "Clarify atomicity semantics for sigpause in
> > glibc?" Since it seems the kernel is atomic?
>
> I think we can just remove the sigprocmask calls from the sigpause
> implementation and call sigsuspend directly in the sigpause implementation.
> I think sigsuspend already does what we are trying to achieve with
> sigprocmask. We can make this change upstream and close this bug as
> NOTABUG, I think.

I don't know that we can. The old System V interface for sigpause() (__xpg_sigpause), which conflicts with the BSD definition of sigpause(), sets is_sig == 1 and has the semantics that it permanently (as opposed to temporarily, while you wait for the signal) changes the signal mask. The only solution would be a sigpause kernel syscall that does what the old standard requires? I'm not sure that would ever be supported by the upstream kernel.

I think the net result here is going to be:

* Document in POSIX that sigsuspend requires atomicity.
* Document in the glibc manual that sigpause does not atomically change the signal mask and then wait, and that because of that you may need to send multiple signals to the child.
I'm not happy to change the semantics of sigpause() at this point given that existing software may just be working around this issue and expect the permanent process signal mask change. Thoughts?
(In reply to Steven Emmerson from comment #25)
> (In reply to Florian Weimer from comment #23)
> > I think we can just remove the sigprocmask calls from the sigpause
> > implementation and call sigsuspend directly in the sigpause implementation.
> > I think sigsuspend already does what we are trying to achieve with
> > sigprocmask. We can make this change upstream and close this bug as
> > NOTABUG, I think.
>
> In the "Rationale" section of sigpause(2), the POSIX standard does say that
> sigpause(2) can be replaced with sigsuspend(2) -- so it would seem that the
> two functions should either work the same or that the glibc documentation
> for sigpause(2) should indicate that it's not conforming or, at least,
> doesn't work the same as sigsuspend(2).

My reading of the standards is that sigpause() unmasks the signal in question and leaves it unmasked. Contrast that with sigsuspend(), which only unmasks it for the duration of the pause and the handling of the signal, and then restores the signal mask.
I think that the issue is that POSIX changed sigpause here in Issue 5:

~~~
Issue 5

Moved from X/OPEN UNIX extension to BASE. The DESCRIPTION is updated to indicate that the sigpause() function restores the signal mask of the process to its original state before returning.
~~~

Where it moved into BASE, the text was updated to say it restores the original state before returning. The legacy glibc XPG implementation instead follows the original SUSv2 specification:

http://pubs.opengroup.org/onlinepubs/7908799/xsh/sigpause.html

So I think that you just have to be careful which version of sigpause() you request when building your code.
In summary:

- The standard isn't clear on whether sigpause and sigsuspend act atomically on the signal-mask change and suspend operations. A clarification was filed: http://austingroupbugs.net/view.php?id=1201
- The workaround is to use sigsuspend with a mask; since this is implemented in the kernel directly, it can atomically unmask the signal and wait without a race.
- The legacy XPG/SUSv2 implementation of sigpause in glibc, which takes a signal as the argument, is not atomic with respect to the unmask and suspend, so it does permit signals to arrive before the suspend, and that is apparently OK by the standards as they are written today. Workarounds include loop-and-retry patterns as explained in comment #7, or _exit from the signal handler.
- One solution is to implement sigpause in the kernel, but it's not likely there will be traction for fixing a legacy and obsolete API.

Does that clarify everything?
Red Hat Enterprise Linux 7 is entering Maintenance Phase Support 1 this year and only Urgent priority bug fixes will be considered. This issue is not urgent, so we are moving it to Red Hat Enterprise Linux 8 for further consideration.

Note that the Austin Group clarified the wording:

http://austingroupbugs.net/view.php?id=1201

It is now required that the operations are carried out atomically:

~~~
The sigsuspend( ) function shall atomically both replace the current signal mask of the calling thread with the set of signals pointed to by sigmask and suspend the thread until delivery of a signal whose action is either to execute a signal-catching function or to terminate the process.
~~~

~~~
The sigpause( ) function shall atomically both remove sig from the signal mask of the calling process and suspend the calling process until a signal is delivered whose action is either to execute a signal-catching function or to terminate the process.
~~~

Which means we should fix this upstream.
We are tracking this upstream now. I have assigned the upstream bug to myself and will try to drive this towards a conclusion in a reasonable timeframe.