By "doesn't work", you mean specifically that the task gets stuck and eventually hits the local watchdog when it's run under restraint, whereas under beah the task completes in a matter of seconds.

It's an interesting example. The task appears to just run this pipeline a bunch of times:

:: [ 08:57:25 ] :: [ BEGIN ] :: Running 'tr -cd '[:lower:]' < /dev/urandom | fold -w33 | head -n1'
yuqthjhzpqapfnpsxnxvjqjqemlkqqoei
:: [ 08:57:25 ] :: [ PASS ] :: Command 'tr -cd '[:lower:]' < /dev/urandom | fold -w33 | head -n1' (Expected 0, got 0)

But when restraint runs it, it gets stuck on the very first invocation for 2 hours 4 minutes:

:: [ 08:47:53 ] :: [ BEGIN ] :: Running 'tr -cd '[:lower:]' < /dev/urandom | fold -w33 | head -n1'
zoiuzafevzskpgapbtdkyjltswqwuyaig

The ps-lwd.log file from restraint gives some clue why it's getting stuck:

[...]
0 S root 1620 1 0 80 0 - 8877 - 08:47 ? 00:00:00 make run
0 S root 1632 1620 0 80 0 - 6681 - 08:47 ? 00:00:00 /bin/bash ./runtest.sh
0 R root 1732 1632 98 80 0 - 1480 - 08:47 ? 02:37:58 tr -cd [:lower:]
0 R root 1733 1632 78 80 0 - 1475 - 08:47 ? 02:04:50 fold -w33

From the TIME column you can see that the tr process has been burning close to 100% CPU for about 2 hours 40 minutes, presumably reading nonstop from /dev/urandom and feeding it to fold.

But importantly, the final head process from the pipeline is missing from the ps output. That means head did exit straight away after reading one line, as it should. Then the earlier processes in the pipeline (tr and fold) *should* have received SIGPIPE or EPIPE and also exited. Clearly that is what happens under beah, and it is what happens if you run the same pipeline on your workstation. But under restraint, the tr and fold processes don't realise the end of the pipe has gone away, and so they sit there endlessly writing random bytes to nowhere.

There must be something unusual in the way restraint is executing 'make run' that is causing the usual SIGPIPE/EPIPE mechanism not to work properly.
This is caused by glib2. When a GSocket is created, the SIGPIPE signal is ignored.

https://developer.gnome.org/gio/stable/GSocket.html
> Note that creating a GSocket causes the signal SIGPIPE to be ignored for the remainder of the program

The fix here is to set SIGPIPE back to its default action in the forked child process.
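To illustrate the mechanism, here is a minimal sketch in Python (not restraint's actual C code). It forks a child while SIGPIPE is ignored in the parent, which is the state a process is in after GLib creates a GSocket, and has the child write to a pipe that has no reader. The function name `write_to_closed_pipe` and the exit statuses 3 and 4 are made up for this example.

```python
import os
import signal


def write_to_closed_pipe(reset_sigpipe: bool) -> str:
    """Fork a child that writes to a pipe with no reader; report how it ended."""
    # Mimic the harness: SIGPIPE is ignored in the parent before forking.
    # (CPython already ignores SIGPIPE at startup; this makes it explicit.)
    signal.signal(signal.SIGPIPE, signal.SIG_IGN)

    pid = os.fork()
    if pid == 0:  # child: always leave via os._exit, never return
        try:
            if reset_sigpipe:
                # The proposed fix: back to the default action in the child.
                signal.signal(signal.SIGPIPE, signal.SIG_DFL)
            read_end, write_end = os.pipe()
            os.close(read_end)         # nobody will ever read from this pipe
            os.write(write_end, b"x")  # default action kills the child here
            os._exit(0)
        except BrokenPipeError:
            os._exit(3)                # SIGPIPE ignored: write failed with EPIPE
        except BaseException:
            os._exit(4)                # anything unexpected

    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        return "killed by " + signal.Signals(os.WTERMSIG(status)).name
    return "exited with status %d" % os.WEXITSTATUS(status)


if __name__ == "__main__":
    print("inherited SIG_IGN:", write_to_closed_pipe(reset_sigpipe=False))
    print("reset to SIG_DFL: ", write_to_closed_pipe(reset_sigpipe=True))
```

With the inherited disposition the child only sees an EPIPE error from write(), so a program that never checks its write errors keeps running. With the default action restored, the kernel terminates the writer as soon as it writes to the broken pipe.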
Nice catch. I'm surprised that the GLib stuff we are using to execute subprocesses is not smart enough to reset signal handlers for us, since it is GLib messing with the signal handlers in the first place. Are there any other process characteristics we need to worry about resetting in the subprocesses? Do we have CLOEXEC set on all file descriptors? Is there anything else inherited that might be a problem? (I didn't even know that the signal handler table is inherited so I would never have even guessed this.)
(In reply to Dan Callaghan from comment #3)
> Nice catch. I'm surprised that the GLib stuff we are using to execute
> subprocesses is not smart enough to reset signal handlers for us.

Restraint doesn't actually use any of glib's process spawning routines. Restraint does an old-fashioned fork() and exec().

> Are there any other process characteristics we need to worry about resetting
> in the subprocesses?

It looks like there are. Restraint sets signal handlers for SIGINT and SIGTERM. We should set those back to default.

> Do we have CLOEXEC set on all file descriptors? Is there anything else inherited that might be a problem?

CLOEXEC is set in one place, it seems. I'd have to check in a little more detail to make sure that covers everything.
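For the CLOEXEC question, a check along these lines could show which descriptors would leak into a task. This is only a sketch in Python, not restraint code: the helper names `inheritable_fds` and `set_cloexec` and the scan limit of 1024 are assumptions made for illustration.

```python
import fcntl
import os


def inheritable_fds(max_fd: int = 1024) -> list:
    """Return open descriptors above stderr that would survive an exec()."""
    leaked = []
    for fd in range(3, max_fd):
        try:
            flags = fcntl.fcntl(fd, fcntl.F_GETFD)
        except OSError:
            continue  # descriptor is not open
        if not flags & fcntl.FD_CLOEXEC:
            leaked.append(fd)
    return leaked


def set_cloexec(fd: int) -> None:
    """Mark a descriptor close-on-exec, keeping its other descriptor flags."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFD)
    fcntl.fcntl(fd, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)


if __name__ == "__main__":
    read_end, write_end = os.pipe()        # Python creates these close-on-exec
    os.set_inheritable(write_end, True)    # simulate a descriptor that leaks
    print("before:", inheritable_fds())
    set_cloexec(write_end)
    print("after: ", inheritable_fds())
    os.close(read_end)
    os.close(write_end)
```

Running a scan like this just before the exec would list anything that was opened without the flag.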
(In reply to Matt Tyson from comment #4)
> (In reply to Dan Callaghan from comment #3)
> > Nice catch. I'm surprised that the GLib stuff we are using to execute
> > subprocesses is not smart enough to reset signal handlers for us.
>
> Restraint doesn't actually use any of glib's process spawning routines.
> Restraint does an old fashioned fork() and exec().

Should we perhaps?
(In reply to Roman Joost from comment #5)
> > Restraint doesn't actually use any of glib's process spawning routines.
> > Restraint does an old fashioned fork() and exec().
>
> Should we perhaps?

https://developer.gnome.org/glib/stable/glib-Spawning-Processes.html

glib's process spawning routines seem to have some limitations on what can be done in the child process config routine. We do some setup between the fork and exec. I'd have to check to see if we can use the glib routines or not, and if they even solve the problem of SIGPIPE being changed.
Ah right, I think I remember now... we needed some custom code to handle setting up a pty for the task subprocess.
I've added a routine that sets every signal handler back to its default value. This should prevent tests exhibiting strange behaviour if they trip signals that restraint has handlers for.
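A routine of that kind could look roughly like the following. This is a sketch in Python rather than the C code in the patch; `reset_signals_to_default` and `spawn` are hypothetical names. Note that across exec() a disposition of "ignore" is preserved while caught signals revert to the default action, so the ignored case (SIGPIPE here) is the one that actually reaches the task.

```python
import os
import signal
import sys


def reset_signals_to_default() -> None:
    """Set every signal that can be changed back to its default action."""
    for sig in signal.valid_signals():
        if sig in (signal.SIGKILL, signal.SIGSTOP):
            continue  # these two can never be caught or ignored
        try:
            signal.signal(sig, signal.SIG_DFL)
        except (OSError, ValueError):
            pass  # skip anything the platform refuses to change


def spawn(argv, reset: bool = True) -> int:
    """fork() and exec() argv, optionally resetting signals; return wait status."""
    pid = os.fork()
    if pid == 0:  # child
        try:
            if reset:
                reset_signals_to_default()
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if the exec itself failed
    _, status = os.waitpid(pid, 0)
    return status


if __name__ == "__main__":
    # The child sends itself SIGTERM, which only kills it if SIGTERM is not ignored.
    child = [sys.executable, "-c",
             "import os, signal; os.kill(os.getpid(), signal.SIGTERM)"]
    signal.signal(signal.SIGTERM, signal.SIG_IGN)  # parent ignores SIGTERM
    for reset in (False, True):
        status = spawn(child, reset=reset)
        print("reset=%s -> terminated by signal: %s"
              % (reset, os.WIFSIGNALED(status)))
    signal.signal(signal.SIGTERM, signal.SIG_DFL)
```

Without the reset the child silently survives its own SIGTERM, and with it the signal terminates the child as a test author would expect.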
This patch is available in builds > restraint-0.1.35-1.git.4.*, which are now available in beaker-devel.
This patch seems to have fixed the aforementioned bug. I re-ran the task mentioned in comment #1 and it worked as expected. The test can be seen here:
https://beaker-devel.app.eng.bos.redhat.com/jobs/14740

The test was also run with an older version of restraint, and it failed as it did for omejzlik. That test can be seen here:
https://beaker-devel.app.eng.bos.redhat.com/jobs/14729
(In reply to Jacob McKenzie from comment #10)
> This patch seems to have fixed the aforementioned bug.

The bug seems to be fixed by mtyson's patch...
I have run into the same issue with this command:

(while :; do echo ""; done ) | sensors-detect

Further debugging has shown that Bash under the restraint harness ignores the SIGPIPE signal:

trap -- '' SIGPIPE

I have tried to work around the issue by canceling the trap

trap - SIGPIPE; (while :; do echo ""; done ) | sensors-detect

but it didn't help. I will try https://beaker-devel.app.eng.bos.redhat.com to verify the fix.

Thanks
Jirka
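A likely reason why the `trap - SIGPIPE` workaround has no effect: POSIX specifies that signals which were ignored on entry to a non-interactive shell cannot be trapped or reset from inside that shell. The sketch below (Python, with a made-up helper name `run_shell` and a throwaway one-line script) starts `sh` once with SIGPIPE ignored and once with the default action, and lets the script send SIGPIPE to itself after trying to reset the trap. It assumes a POSIX `sh` is on the PATH.

```python
import signal
import subprocess

# The shell tries to reset SIGPIPE, then signals itself.
SCRIPT = "trap - PIPE; kill -s PIPE $$; echo survived"


def run_shell(sigpipe_ignored: bool) -> subprocess.CompletedProcess:
    """Run SCRIPT in sh with SIGPIPE either ignored or default on entry."""
    # CPython ignores SIGPIPE itself; made explicit so the intent is clear.
    signal.signal(signal.SIGPIPE, signal.SIG_IGN)
    return subprocess.run(
        ["sh", "-c", SCRIPT],
        # restore_signals=True resets SIGPIPE to SIG_DFL before the exec;
        # False lets the child inherit the ignored disposition.
        restore_signals=not sigpipe_ignored,
        stdout=subprocess.PIPE,
        text=True,
    )


if __name__ == "__main__":
    for ignored in (True, False):
        result = run_shell(ignored)
        print("ignored on entry=%s returncode=%d stdout=%r"
              % (ignored, result.returncode, result.stdout))
```

When the shell starts with SIGPIPE ignored, the script runs to completion and prints "survived" despite the `trap -`. When it starts with the default action, the shell is killed by its own SIGPIPE. So the disposition has to be fixed by whoever starts the shell, which is what the restraint patch does.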
The fix works for me:-) https://beaker-devel.app.eng.bos.redhat.com/recipes/25253#task163205
*** Bug 1623391 has been marked as a duplicate of this bug. ***
Restraint 0.1.36 was released on 24 August.