Red Hat Bugzilla – Bug 120960
Helper threads not blocking (intercepting) signals.
Last modified: 2007-11-30 17:10:40 EST
Description of problem:
It seems that POSIX timers and other librt.so calls create helper threads when they are invoked. These helper threads appear to inherit the signal mask in effect at the time they are created, but do not block enough signals themselves to prevent signal handlers from running in their context. This is only speculation; here are the symptoms:
1.) My code receives a SIGIO when a piece of shared memory is altered. There is a handler for the signal that sets a flag, and the application relies on sigsuspend to wait until that handler has executed. The signals are of course blocked using sigprocmask. The process receives several signals in this fashion. When two signals arrive very close to each other, their handlers fire as expected and set their flags, but sigsuspend isn't interrupted, so we never break out to read and act on whichever flag was set.
2.) Since the signal handler was firing (observed), I decided to remove the handler and rely on SIGIO's default action (crash the app). Now, instead of setting the flag in the handler and waking up sigsuspend, I dequeue signals with sigwaitinfo, read the signal number that was dequeued, and process it. This way no signal handlers are ever executed; signals are only dequeued. Since the signals are blocked, the default action for the signal should never fire. This worked great for a while, until two signals (SIGIO and SIGUSR1) came in very close to each other. At that point the app crashed with a message telling me it had an unhandled SIGIO (basically its default action did fire, perhaps in another thread?).
3.) In a separate piece of code exhibiting the same strangeness, I put some debug output in a signal handler (same sigsuspend semantics as before) to print the thread ID of the thread the handler was running in. I noticed that some signals occasionally ran in a different tid, and when that happened, the sigsuspend in the main thread (the only thread from my perspective) didn't wake. Keep in mind I never called pthread_create, but I did call timer_create etc.
4.) If I call sigprocmask to block the signal as the very first line in main(), it seems to work fine for case 2, the sigwaitinfo case: the signals are dequeued, and the handler never fires in the other thread. This tweak doesn't help sigsuspend though; it's still flaky, and sometimes it doesn't wake after the handler tells me it fired.
5.) If I do the sigprocmask to block SIGIO after I've done something like create and run a timer, I see the problem.
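For reference, the arrangement described in items 2 and 4 can be sketched roughly as below. This is a minimal illustration and not the code from the failing application: the function names block_wait_signals, wait_one_signal, and demo_dequeue_both are hypothetical, and kill() to the process itself stands in for the real signal sources. The assumption being illustrated is that a mask installed before any timer or other librt call is inherited by whatever helper threads are created later.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <unistd.h>

/* Block SIGIO and SIGUSR1 in the calling thread. Meant to be called as the
 * first thing in main(), so threads created later inherit the mask. */
static int block_wait_signals(sigset_t *set)
{
    sigemptyset(set);
    sigaddset(set, SIGIO);
    sigaddset(set, SIGUSR1);
    return sigprocmask(SIG_BLOCK, set, NULL);
}

/* Dequeue one pending signal from `set` without running any handler.
 * Returns the signal number, or -1 on error. */
static int wait_one_signal(const sigset_t *set)
{
    siginfo_t info;
    int signo;

    do {
        signo = sigwaitinfo(set, &info);
    } while (signo == -1 && errno == EINTR);
    return signo;
}

/* Hypothetical self-check: send both signals back to back, then dequeue
 * them. Returns 1 if both were dequeued (in either order), 0 if not,
 * -1 if the mask could not be installed. */
static int demo_dequeue_both(void)
{
    sigset_t set;
    int first, second;

    if (block_wait_signals(&set) != 0)
        return -1;

    kill(getpid(), SIGIO);    /* stands in for the shared-memory notification */
    kill(getpid(), SIGUSR1);  /* stands in for the second source */

    first = wait_one_signal(&set);
    second = wait_one_signal(&set);
    return (first == SIGIO && second == SIGUSR1) ||
           (first == SIGUSR1 && second == SIGIO);
}
```

In a real program the timer_create call and similar would come only after block_wait_signals has returned; calling it later corresponds to the failing case in item 5.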
I haven't seen the librt.so code, so this is only my best guess; it might be a heck of a lot more complicated.
Also, these tests have been run primarily on SMP machines (2 processors).
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Write software that receives signals from two sources (one being an internal timer), and uses sigsuspend to wait for the handler to fire. Perhaps the handler could print out a time to the screen, and then after sigsuspend wakes, print the time again.
2. You'll notice that occasionally, the handler will fire, but not wake sigsuspend.
3. Repeat the experiment using a sigwaitinfo approach with blocked signals. Make sure the handler for the signal will crash the app to let you know it's doing the wrong thing. That handler should never fire, but it will occasionally.
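A possible starting point for the sigsuspend half of the test is sketched below. It is only an outline under stated assumptions: the names got_usr1, handler_tid, on_usr1, and wait_for_usr1 are hypothetical, a single self-sent SIGUSR1 stands in for the two real sources, and the thread ID is read with the Linux-specific syscall(SYS_gettid). A real reproducer would add a timer created with timer_create and a second signal source, then compare the recorded tid against the main thread's.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

static volatile sig_atomic_t got_usr1;     /* flag set by the handler */
static volatile sig_atomic_t handler_tid;  /* kernel tid that ran the handler */

static void on_usr1(int signo)
{
    (void)signo;
    handler_tid = (sig_atomic_t)syscall(SYS_gettid);  /* Linux-specific */
    got_usr1 = 1;
}

/* Block SIGUSR1, send it, then wait in sigsuspend until the handler has
 * set the flag. Returns 1 if the handler ran in the calling thread,
 * 0 if it ran in some other thread, -1 on setup failure. */
static int wait_for_usr1(void)
{
    struct sigaction sa;
    sigset_t block, old, wait_mask;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    if (sigprocmask(SIG_BLOCK, &block, &old) != 0)
        return -1;

    got_usr1 = 0;
    kill(getpid(), SIGUSR1);  /* stays pending while SIGUSR1 is blocked */

    /* Unblock SIGUSR1 only for the duration of each sigsuspend call. */
    wait_mask = old;
    sigdelset(&wait_mask, SIGUSR1);
    while (!got_usr1)
        sigsuspend(&wait_mask);

    sigprocmask(SIG_SETMASK, &old, NULL);
    return handler_tid == (sig_atomic_t)syscall(SYS_gettid);
}
```

In a single-threaded process this returns 1. The symptom in the report corresponds to the handler running in a different tid while the loop above stays asleep.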
Expected Results: Over time you should observe the problem in your test code.
Timers and message queue helpers should be blocking signals in FC2.
AIO helper still needs changing.
AIO helper has been changed in glibc-2.3.3-31.
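To illustrate what "blocking signals" in a helper thread means here, the usual pattern is to block every signal in the creating thread, create the helper so that it inherits the full mask, and then restore the creator's original mask. The sketch below shows that pattern only; it is not taken from glibc, and the names create_helper_blocking_signals, report_sigio_blocked, and demo_helper_mask are hypothetical.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <signal.h>
#include <stddef.h>

/* Create a thread that starts with all signals blocked, so no handler or
 * default action can run in its context. The caller's own mask is
 * restored before returning. Returns 0 or a pthread error number. */
static int create_helper_blocking_signals(pthread_t *tid,
                                          void *(*fn)(void *), void *arg)
{
    sigset_t all, old;
    int rc;

    sigfillset(&all);  /* SIGKILL and SIGSTOP cannot be blocked; the kernel ignores them here */
    rc = pthread_sigmask(SIG_SETMASK, &all, &old);
    if (rc != 0)
        return rc;

    rc = pthread_create(tid, NULL, fn, arg);  /* new thread inherits `all` */

    pthread_sigmask(SIG_SETMASK, &old, NULL);
    return rc;
}

/* Hypothetical helper body: records whether SIGIO is blocked in this thread. */
static void *report_sigio_blocked(void *arg)
{
    sigset_t cur;
    int *blocked = arg;

    pthread_sigmask(SIG_BLOCK, NULL, &cur);  /* NULL set: query the mask only */
    *blocked = sigismember(&cur, SIGIO);
    return NULL;
}

/* Returns 1 if the helper started with SIGIO blocked, 0 if not,
 * -1 if the thread could not be created. */
static int demo_helper_mask(void)
{
    pthread_t tid;
    int blocked = -1;

    if (create_helper_blocking_signals(&tid, report_sigio_blocked, &blocked) != 0)
        return -1;
    pthread_join(tid, NULL);
    return blocked;
}
```

With helpers created this way, a process-directed SIGIO can only be delivered to, or dequeued by, a thread the application itself controls.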