Hello,

I would like to use bkr job-watch like this:

stdbuf -o0 -e0 bkr job-watch J:1679860 | grep -m 1 total
--> Queued: 1 [total: 1]

=> At this point I expect bkr job-watch to terminate, but it continues to run (you have to choose a Beaker job which is running/queued to observe this bug).

The grep command terminates as soon as it finds the word "total". When bkr job-watch then tries to write to the pipe it will get a SIGPIPE signal and it should terminate, because it is writing to a pipe with no one left to read it. From the documentation:

===========================================================================
The SIGPIPE signal is sent to a process when it attempts to write to a pipe without a process connected to the other end.
===========================================================================

It appears to be a general problem with Python 2. Please see http://www.chiark.greenend.org.uk/~cjwatson/blog/python-sigpipe.html for details and for a workaround.

Could you please fix bkr job-watch to terminate when it gets a SIGPIPE signal?

Thanks a lot
Jirka
You can use pkill -e -SIGPIPE bkr to explicitly test how bkr job-watch handles the SIGPIPE signal.
Yeah. By default Python ignores SIGPIPE [1]. I think the command just needs to catch socket errors in this case.

[1] https://docs.python.org/2/library/signal.html
(In reply to Jiri Hladky from comment #0)
> I would like to use bkr job-watch like this:
>
> stdbuf -o0 -e0 bkr job-watch J:1679860 | grep -m 1 total
> --> Queued: 1 [total: 1]
> => At this point I expect bkr job-watch to terminate but it continues to run
> (you have to choose Beaker job which is running/queued to observe this bug).

I don't think this will ever work the way you want. The problem is that SIGPIPE is not actually sent when the second half of the pipe goes away. It's only sent when the first half tries to write() to it. That's the same write() which also returns EPIPE and causes an IOError in Python.

https://stackoverflow.com/a/9337925/120202

Incidentally bkr already doesn't have any proper handling for the Broken pipe exceptions, which is why you see this behaviour when the EPIPE happens:

$ stdbuf -o0 -e0 bkr job-watch J:13001 | grep -m 1 Watching
Watching tasks (this may be safely interrupted)...
Traceback (most recent call last):
  File "src/bkr/client/main.py", line 135, in <module>
    sys.exit(main())
  File "src/bkr/client/main.py", line 106, in main
    return cmd.run(*cmd_args, **cmd_opts.__dict__)
  File "/home/dcallagh/work/beaker/Client/src/bkr/client/commands/cmd_job_watch.py", line 93, in run
    sys.exit(watch_tasks(self.hub, args))
  File "/home/dcallagh/work/beaker/Client/src/bkr/client/task_watcher.py", line 43, in watch_tasks
    changed |= watcher.update(task)
  File "/home/dcallagh/work/beaker/Client/src/bkr/client/task_watcher.py", line 109, in update
    print "%s: %s" % (task, task.display_state(task.task_info))
IOError: [Errno 32] Broken pipe

But in your scenario, the reason bkr job-watch keeps sitting there waiting is not because it's ignoring SIGPIPE. It's because it doesn't write anything after that line.

To confirm, I tried out the trivial patch below. When I pipe various bkr commands to true and grep -m1, I get a clean exit instead of a Broken pipe traceback (which is nice) but I *don't* get the early exit behaviour you are expecting.
--- a/Client/src/bkr/client/main.py
+++ b/Client/src/bkr/client/main.py
@@ -8,6 +8,7 @@
 import os
 import sys
+import signal
 import pkg_resources
 import logging
 from optparse import Option, IndentedHelpFormatter, SUPPRESS_HELP
@@ -70,7 +71,11 @@ def warn_on_version_mismatch(response):
             'but server version is %s\n' % (__version__, server_version))


+def sigpipe_handler(signum, frame):
+    sys.exit(0)
+
 def main():
+    signal.signal(signal.SIGPIPE, sigpipe_handler)
     log_to_stream(sys.stderr, level=logging.WARNING)
     global conf
Hi Dan,

thanks a lot for looking into it. You are right, I cannot accomplish what I want with SIGPIPE. The solution is to use named pipes and monitor the reader process (grep) -- see below.

I think that it would still be nice to trap SIGPIPE to get a clean exit instead of a Broken pipe traceback.

Thanks!
Jirka

Solution using named pipes:
========================================================
#!/bin/bash

pipe=/tmp/testpipe

trap "rm -f $pipe" INT TERM EXIT

if [[ ! -p $pipe ]]; then
    mkfifo $pipe
fi

grep -m 1 total <$pipe &
PID_READER=$!

#stdbuf -oL python -c "import time; print 'total'; time.sleep(10); print 'b'" >$pipe &
stdbuf -o0 -e0 bkr job-watch J:2122678 >$pipe &

while true; do
    if ps -p $PID_READER > /dev/null ; then
        echo "Reader is running"
        sleep 2
    else
        echo "Reader has stopped"
        exit
    fi
done
========================================================

./named_pipes.sh
Reader is running
Reader is running
--> Running: 1 [total: 1]
Reader has stopped
Okay, makes sense. Updating this bz to be just about exiting cleanly in the EPIPE case.
This is the naive approach: https://gerrit.beaker-project.org/5915 although now that I think about it, it introduces some weird edge cases. For example, bkr job-submit will submit multiple jobs and if any of them fails, it prints the exception to stderr, followed by "Submitted jobs..." to stdout, and then exits with status 1. But with this patch, if the write to stdout gets EPIPE, the exit status will be 0 instead. So that is not ideal. Not sure how likely that is to be a real issue though.
Hi Dan,

thanks for working on it!

Regarding the exit status: the default exit status is 141 (128+13; SIGPIPE has value 13).

$ seq 1 10000 | head -1
1
$ echo ${PIPESTATUS[@]}
141 0

So I think that

+def sigpipe_handler(signum, frame):
+    sys.exit(0)

is not right. What about using SIG_DFL?

================================================
from signal import signal, SIGPIPE, SIG_DFL
signal(SIGPIPE, SIG_DFL)
================================================

$ cat /tmp/a.py
from signal import signal, SIGPIPE, SIG_DFL
signal(SIGPIPE, SIG_DFL)
while True:
    print 'a'

$ python /tmp/a.py | head -1
$ echo ${PIPESTATUS[@]}
141 0

Jirka
I think the problem with the default signal handler (the whole reason Python disables it in favour of handling EPIPE) is that it skips normal Python-level stack unwinding and exception handlers etc. It's the same reason Python handles SIGINT by converting it to KeyboardInterrupt.

$ cat a.py
import os, sys
from signal import signal, SIGPIPE, SIG_DFL
signal(SIGPIPE, SIG_DFL)
f = open('dontleakme', 'w')
try:
    print 'a'
    print 'a'
    print 'a'
finally:
    f.close()
    os.unlink('dontleakme')

$ PYTHONUNBUFFERED=1 python a.py
a
a
a
$ ls dontleakme
ls: cannot access 'dontleakme': No such file or directory
$ PYTHONUNBUFFERED=1 python a.py | true
$ ls dontleakme
dontleakme

There is *probably* no important error handling being done anywhere inside the bkr client (just closing open sockets and so forth -- which is not important) but I wouldn't want to just assume that always holds true.
Hi Dan,

yes, this is a valid point. So I think the commit you have posted is probably the best way to handle it:

def output(message):
    """
    Prints a line to stdout. Handles EPIPE by exiting.
    """
    try:
        sys.stdout.write(message + '\n')
    except IOError as e:
        if e.errno == errno.EPIPE:
            sys.exit(0)
        else:
            raise

I would just propose to use exit status "141".

Thanks
Jirka
Hmm yeah that makes sense. I mean, exiting 141 (= -SIGPIPE) when we weren't really killed by SIGPIPE is a bit unusual. But seems like there is no harm in it. And it sidesteps my concerns above about masking other errors, in corner cases like submitting multiple jobs where one failed. Because the exit status will always be non-zero anyway.
Well, it's not a perfect solution but I think that if we catch the errno.EPIPE exception it's reasonable to set exit status to 141.
If we're going to exit non-zero, it makes the patch simpler too. In the original version, I was careful to only handle EPIPE when writing to stdout (as opposed to, some other socket or file). Since we wouldn't want to mask some kind of network error. But if we are exiting non-zero anyway, then we can just handle the exception at the topmost level as we do for some other exception types already. https://gerrit.beaker-project.org/5940
Great! Sounds good to me!
This bug fix is included in beaker-client-25.0-0.git.202.2f767547f for testing available here: https://beaker-project.org/nightlies/develop/
I can reproduce the fix.

With Beaker client 24.5:

stdbuf -o0 -e0 bkr job-watch J:109 | grep -m1 total
--> New: 1 [total: 1]
Traceback (most recent call last):
[...]
IOError: [Errno 32] Broken pipe

exit code: 0 as above.

With a nightly Beaker client however (203.08fe7cd7a) I get:

stdbuf -o0 -e0 bkr job-watch J:107 | grep -m1 total
--> Installing: 1 [total: 1]

and the exit code is 141:

write(1, "J:107 : Installing -> Cancelled", 31) = -1 EPIPE (Broken pipe)
--- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=16548, si_uid=20845} ---
rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x33bb60f7e0}, {0x33bbb033f0, [], SA_RESTORER, 0x33bb60f7e0}, 8) = 0
exit_group(141) = ?
+++ exited with 141 +++

The only tricky thing here is that it always has to happen in unbuffered mode, either by using stdbuf or setting PYTHONUNBUFFERED=1. If the user uses buffered mode by default, which I suspect everyone does, we get a different result:

bkr job-watch J:106 | grep total
close failed in file object destructor:
Error in sys.excepthook:

Original exception was:
Terminated

and an strace shows:

write(1, "Watching tasks (this may be safe"..., 212) = -1 EPIPE (Broken pipe)
--- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=16442, si_uid=20845} ---
rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x33bb60f7e0}, {0x33bbb033f0, [], SA_RESTORER, 0x33bb60f7e0}, 8) = 0
write(2, "close failed in file object dest"..., 40) = 40
write(2, "Error in sys.excepthook:\n", 25) = 25
write(2, "\nOriginal exception was:\n", 25) = 25
exit_group(1) = ?
+++ exited with 1 +++
Note to self: remember to set -o pipefail in bash, if using the reproducer with grep -m1. I spent the last 10 minutes wondering why I was always getting exit status 0. It was because grep exits 0 in that case, and the exit status of bkr is ignored.
https://gerrit.beaker-project.org/5994 ... although it feels a bit like we are fighting a losing battle. Python really doesn't want to play nice like a traditional text-handling UNIX command. Ideally some library like Click would be handling all this mess for us.
Verified.

Using beaker-client 24.5-1.fc26 and running:

stdbuf -o0 -e0 bkr job-watch J:13987 | grep -m1 total

I get an error as follows:

IOError: [Errno 32] Broken pipe

Using beaker-client 25.0-0.git.216.e147ce0c2.fc26 (ensuring I have done a `set -o pipefail`), I get an exit status of 141.
Beaker 25.0 has been released. Release notes are available upstream: https://beaker-project.org/docs/whats-new/release-25.html