Hello,

I would like to use bkr job-watch like this:

stdbuf -o0 -e0 bkr job-watch J:1679860 | grep -m 1 total
--> Queued: 1 [total: 1]

=> At this point I expect bkr job-watch to terminate, but it continues to run (you have to choose a Beaker job which is running/queued to observe this bug).

The grep command terminates as soon as it finds the word "total". When bkr job-watch then tries to write to the pipe it will get a SIGPIPE signal and it should terminate, because it is writing to a pipe with no one left to read it. From the documentation:

===========================================================================
The SIGPIPE signal is sent to a process when it attempts to write to a pipe without a process connected to the other end.
===========================================================================

It appears to be a general problem with Python 2. Please see http://www.chiark.greenend.org.uk/~cjwatson/blog/python-sigpipe.html for details and for a workaround.

Could you please fix bkr job-watch to terminate when it gets a SIGPIPE signal?

Thanks a lot
Jirka
You can use pkill -e -SIGPIPE bkr to explicitly test how bkr job-watch handles the SIGPIPE signal.
Yeah. By default Python ignores SIGPIPE [1]. I think the command just needs to catch socket errors in this case.

[1] https://docs.python.org/2/library/signal.html
(In reply to Jiri Hladky from comment #0)
> I would like to use bkr job-watch like this:
>
> stdbuf -o0 -e0 bkr job-watch J:1679860 | grep -m 1 total
> --> Queued: 1 [total: 1]
> => At this point I expect bkr job-watch to terminate but it continues to run
> (you have to choose Beaker job which is running/queued to observe this bug).

I don't think this will ever work the way you want. The problem is that SIGPIPE is not actually sent when the second half of the pipe goes away. It's only sent when the first half tries to write() to it. That's the same write() which also returns EPIPE and causes an IOError in Python.

https://stackoverflow.com/a/9337925/120202

Incidentally bkr already doesn't have any proper handling for the Broken pipe exceptions, which is why you see this behaviour when the EPIPE happens:

$ stdbuf -o0 -e0 bkr job-watch J:13001 | grep -m 1 Watching
Watching tasks (this may be safely interrupted)...
Traceback (most recent call last):
  File "src/bkr/client/main.py", line 135, in <module>
    sys.exit(main())
  File "src/bkr/client/main.py", line 106, in main
    return cmd.run(*cmd_args, **cmd_opts.__dict__)
  File "/home/dcallagh/work/beaker/Client/src/bkr/client/commands/cmd_job_watch.py", line 93, in run
    sys.exit(watch_tasks(self.hub, args))
  File "/home/dcallagh/work/beaker/Client/src/bkr/client/task_watcher.py", line 43, in watch_tasks
    changed |= watcher.update(task)
  File "/home/dcallagh/work/beaker/Client/src/bkr/client/task_watcher.py", line 109, in update
    print "%s: %s" % (task, task.display_state(task.task_info))
IOError: [Errno 32] Broken pipe

But in your scenario, the reason bkr job-watch keeps sitting there waiting is not because it's ignoring SIGPIPE. It's because it doesn't write anything after that line.

To confirm, I tried out the trivial patch below. When I pipe various bkr commands to true and grep -m1, I get a clean exit instead of a Broken pipe traceback (which is nice) but I *don't* get the early exit behaviour you are expecting.
--- a/Client/src/bkr/client/main.py
+++ b/Client/src/bkr/client/main.py
@@ -8,6 +8,7 @@
 import os
 import sys
+import signal
 import pkg_resources
 import logging
 from optparse import Option, IndentedHelpFormatter, SUPPRESS_HELP
@@ -70,7 +71,11 @@ def warn_on_version_mismatch(response):
             'but server version is %s\n' % (__version__, server_version))


+def sigpipe_handler(signum, frame):
+    sys.exit(0)
+
 def main():
+    signal.signal(signal.SIGPIPE, sigpipe_handler)
     log_to_stream(sys.stderr, level=logging.WARNING)
     global conf
Hi Dan,

thanks a lot for looking into it. You are right, I cannot accomplish what I want with SIGPIPE. The solution is to use named pipes and monitor the reader process (grep) -- see below.

I think that it would still be nice to trap SIGPIPE to get a clean exit instead of a Broken pipe traceback.

Thanks!
Jirka

Solution using named pipes:
========================================================
#!/bin/bash

pipe=/tmp/testpipe

trap "rm -f $pipe" INT TERM EXIT

if [[ ! -p $pipe ]]; then
    mkfifo $pipe
fi

grep -m 1 total <$pipe &
PID_READER=$!

#stdbuf -oL python -c "import time; print 'total'; time.sleep(10); print 'b'" >$pipe &
stdbuf -o0 -e0 bkr job-watch J:2122678 >$pipe &

while true; do
    if ps -p $PID_READER > /dev/null ; then
        echo "Reader is running"
        sleep 2
    else
        echo "Reader has stopped"
        exit
    fi
done
========================================================

./named_pipes.sh
Reader is running
Reader is running
--> Running: 1 [total: 1]
Reader has stopped
Okay, makes sense. Updating this bz to be just about exiting cleanly in the EPIPE case.
This is the naive approach: https://gerrit.beaker-project.org/5915 although now that I think about it, it introduces some weird edge cases. For example, bkr job-submit will submit multiple jobs and if any of them fails, it prints the exception to stderr, followed by "Submitted jobs..." to stdout, and then exits with status 1. But with this patch, if the write to stdout gets EPIPE, the exit status will be 0 instead. So that is not ideal. Not sure how likely that is to be a real issue though.
Hi Dan,

thanks for working on it!

Regarding the exit status: the default exit status is 141 (128+13; SIGPIPE has value 13).

$ seq 1 10000 | head -1
1
$ echo ${PIPESTATUS[@]}
141 0

So I think that

+def sigpipe_handler(signum, frame):
+    sys.exit(0)

is not right. What about using SIG_DFL?

================================================
from signal import signal, SIGPIPE, SIG_DFL
signal(SIGPIPE, SIG_DFL)
================================================

$ cat /tmp/a.py
from signal import signal, SIGPIPE, SIG_DFL
signal(SIGPIPE, SIG_DFL)
while True:
    print 'a'

$ python /tmp/a.py | head -1
$ echo ${PIPESTATUS[@]}
141 0

Jirka
I think the problem with the default signal handler (the whole reason Python disables it in favour of handling EPIPE) is that it skips normal Python-level stack unwinding and exception handlers etc. It's the same reason Python handles SIGINT by converting it to KeyboardInterrupt.

$ cat a.py
import os, sys
from signal import signal, SIGPIPE, SIG_DFL
signal(SIGPIPE, SIG_DFL)
f = open('dontleakme', 'w')
try:
    print 'a'
    print 'a'
    print 'a'
finally:
    f.close()
    os.unlink('dontleakme')

$ PYTHONUNBUFFERED=1 python a.py
a
a
a
$ ls dontleakme
ls: cannot access 'dontleakme': No such file or directory
$ PYTHONUNBUFFERED=1 python a.py | true
$ ls dontleakme
dontleakme

There is *probably* no important error handling being done anywhere inside the bkr client (just closing open sockets and so forth -- which is not important) but I wouldn't want to just assume that always holds true.
Hi Dan,

yes, this is a valid point. So I think the commit you have posted is probably the best way to handle it:

def output(message):
    """
    Prints a line to stdout. Handles EPIPE by exiting.
    """
    try:
        sys.stdout.write(message + '\n')
    except IOError as e:
        if e.errno == errno.EPIPE:
            sys.exit(0)
        else:
            raise

I would just propose to use exit status "141".

Thanks
Jirka
Hmm yeah that makes sense. I mean, exiting 141 (= -SIGPIPE) when we weren't really killed by SIGPIPE is a bit unusual. But seems like there is no harm in it. And it sidesteps my concerns above about masking other errors, in corner cases like submitting multiple jobs where one failed. Because the exit status will always be non-zero anyway.
Well, it's not a perfect solution but I think that if we catch the errno.EPIPE exception it's reasonable to set exit status to 141.
If we're going to exit non-zero, it makes the patch simpler too. In the original version, I was careful to only handle EPIPE when writing to stdout (as opposed to, some other socket or file). Since we wouldn't want to mask some kind of network error. But if we are exiting non-zero anyway, then we can just handle the exception at the topmost level as we do for some other exception types already. https://gerrit.beaker-project.org/5940
Great! Sounds good to me!
This bug fix is included in beaker-client-25.0-0.git.202.2f767547f for testing available here: https://beaker-project.org/nightlies/develop/
I can reproduce the fix.

With Beaker client 24.5:

stdbuf -o0 -e0 bkr job-watch J:109 | grep -m1 total
--> New: 1 [total: 1]
Traceback (most recent call last):
[...]
IOError: [Errno 32] Broken pipe

exit code: 0 as above.

With a nightly Beaker client however (203.08fe7cd7a) I get:

stdbuf -o0 -e0 bkr job-watch J:107 | grep -m1 total
--> Installing: 1 [total: 1]

and the exit code is 141:

write(1, "J:107 : Installing -> Cancelled", 31) = -1 EPIPE (Broken pipe)
--- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=16548, si_uid=20845} ---
rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x33bb60f7e0}, {0x33bbb033f0, [], SA_RESTORER, 0x33bb60f7e0}, 8) = 0
exit_group(141) = ?
+++ exited with 141 +++

The only tricky thing here is that it always has to happen in unbuffered mode, either by using stdbuf or setting PYTHONUNBUFFERED=1. If the user uses buffered mode by default, which I suspect everyone does, we get a different result:

bkr job-watch J:106 | grep total
close failed in file object destructor:
Error in sys.excepthook:

Original exception was:
Terminated

and an strace shows:

write(1, "Watching tasks (this may be safe"..., 212) = -1 EPIPE (Broken pipe)
--- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=16442, si_uid=20845} ---
rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x33bb60f7e0}, {0x33bbb033f0, [], SA_RESTORER, 0x33bb60f7e0}, 8) = 0
write(2, "close failed in file object dest"..., 40) = 40
write(2, "Error in sys.excepthook:\n", 25) = 25
write(2, "\nOriginal exception was:\n", 25) = 25
exit_group(1) = ?
+++ exited with 1 +++
Note to self: remember to set -o pipefail in bash, if using the reproducer with grep -m1. I spent the last 10 minutes wondering why I was always getting exit status 0. It was because grep exits 0 in that case, and the exit status of bkr is ignored.
https://gerrit.beaker-project.org/5994 ... although it feels a bit like we are fighting a losing battle. Python really doesn't want to play nice like a traditional text-handling UNIX command. Ideally some library like Click would be handling all this mess for us.
Verified.

Using beaker-client 24.5-1.fc26 and running:

stdbuf -o0 -e0 bkr job-watch J:13987 | grep -m1 total

I get an error as follows:

IOError: [Errno 32] Broken pipe

Using beaker-client 25.0-0.git.216.e147ce0c2.fc26 (ensuring I have done a `set -o pipefail`), I get an exit status of 141.
Beaker 25.0 has been released. Release notes are available upstream: https://beaker-project.org/docs/whats-new/release-25.html