Bug 1122463 - docker attach exits with 2/148/150 when killed in a loop with safe signals
Summary: docker attach exits with 2/148/150 when killed in a loop with safe signals
Keywords:
Status: CLOSED CANTFIX
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: docker
Version: 7.0
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Matthew Heon
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2014-07-23 10:27 UTC by Lukas Doktor
Modified: 2019-03-06 01:03 UTC (History)
CC: 4 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-08-21 03:28:55 UTC




Links
System ID Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 1473513 None None None Never

Description Lukas Doktor 2014-07-23 10:27:14 UTC
Description of problem:
When you kill the container with various safe signals in a loop, in parallel, the attach process (or `docker run -i`) exits with exit code 2 (or 148, or 150). The container remains running. I suspect the client is not able to resend the signals and block the incoming ones at the same time.
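For reference, exit codes above 128 conventionally encode "terminated by signal (code - 128)", so 148 and 150 point at specific signals; this is a quick check of that arithmetic (the signal-name mapping assumes x86_64 Linux numbering):

```shell
# 128 + N means "killed by signal N" by shell convention
echo $((148 - 128))   # 20 -> SIGTSTP on x86_64 Linux
echo $((150 - 128))   # 22 -> SIGTTOU on x86_64 Linux
# Exit code 2 is below 128, so it is likely a plain error exit, not a signal death
```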

Version-Release number of selected component (if applicable):
docker-io-1.0.0-7.fc20.x86_64

How reproducible:
Always (with different exit codes)

Steps to Reproduce:
1. TERM1: docker run -i -t fedora bash -c 'for NUM in `seq 1 64`; do trap "echo Received $NUM, ignoring..." $NUM; done; while :; do sleep 1; done'
2. TERM2: touch /var/tmp/docker_kill_stress
3. TERM2: for AAA in `seq 1 31`; do [ $AAA -eq 9 ] || [ $AAA -eq 17 ] || [ $AAA -eq 19 ] && continue; { while [ -e /var/tmp/docker_kill_stress ]; do kill -$AAA $CONTAINER_PROCESS_PID > /dev/null || echo "Sender $AAA failed"; done } & done
4. wait for failure
5. TERM2: rm /var/tmp/docker_kill_stress
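The stress loop in step 3 can be written more readably as below; this is only a sketch, and `CONTAINER_PROCESS_PID` is assumed to be resolved beforehand (e.g. via `docker inspect --format '{{.State.Pid}}' <container>`, container name elided here):

```shell
# Emit the signals 1-31 that are safe to stress with, skipping the ones
# that cannot be usefully trapped/proxied: 9 (SIGKILL), 17 (SIGCHLD), 19 (SIGSTOP)
safe_signals() {
  for SIG in $(seq 1 31); do
    case "$SIG" in
      9|17|19) continue ;;
    esac
    echo "$SIG"
  done
}

# Then one background sender per signal, as in step 3 (sketch, not run here):
# for SIG in $(safe_signals); do
#   { while [ -e /var/tmp/docker_kill_stress ]; do
#       kill -"$SIG" "$CONTAINER_PROCESS_PID" >/dev/null || echo "Sender $SIG failed"
#     done; } &
# done
```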

Actual results:
in TERM1 the attach process exits; verify with `echo $?` that it did not exit cleanly.

Expected results:
The attached process should survive and print the list of received signals.

Additional info:
There is a similar test which uses `docker kill` instead of host-side `kill`; that one works fine.

Comment 2 Matthew Heon 2014-07-24 14:26:10 UTC
I can reproduce this locally on docker-1.1.1-2 and a build of upstream's git master.

Given that 'docker kill' works fine, this seems to be a --sig-proxy issue. It is expected that, in scenarios with a very large number of signals, some will be lost, but an outright crash is definitely a bug. I'll investigate further.

Comment 3 Matthew Heon 2014-07-24 15:13:04 UTC
After further testing, I cannot reproduce this on docker-1.1.1-3 on RHEL7. 

My earlier reproduction was actually due to the -t flag in your reproduction command. We do not at present support signal proxying with -t (the patch for this is still waiting to be accepted upstream), so the signals were acting directly on the Docker client instead of being proxied.

After removing -t, I cannot reproduce a client crash from a large number of signals. I assume that docker-1.1.1-3, which contains the signal buffering patch, fixed the issue.

Comment 4 Lukas Doktor 2014-08-11 08:14:33 UTC
OK, I'm sorry about the -t flag. You are right that this failure won't occur without it.

Anyway, with the same reproducer the results are still not as good as expected:
1) execute the reproducer
2) wait a while (10s on smp4 with docker-1.1.2-9.el7.x86_64) and notice, that no new signals are printed out
3) stop the parallel stresser
4) `kill -2 $PID` => the signal is not listed
5) `docker kill -s 2` => the signal arrives and is listed in the output
6) press Ctrl+C => no signal is received or listed in the output

(you can use any signal to test the failure)
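One way to tell whether the proxied signals are being blocked by the client rather than silently dropped is to inspect the client's signal masks in procfs; this is a sketch using the current shell (`self`), and substituting the `docker attach` client's PID is left as an assumption:

```shell
# SigBlk = blocked, SigIgn = ignored, SigCgt = caught; each is a 64-bit hex mask.
# Replace "self" with the docker client's PID to see what it has blocked after the stress.
grep -E '^Sig(Blk|Ign|Cgt):' /proc/self/status
```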

Comment 5 Daniel Walsh 2014-09-12 19:19:27 UTC
Could you try with docker-1.2?

Comment 6 Lukas Doktor 2014-09-29 08:39:16 UTC
I tried that on docker-1.2.0-19.el7.x86_64 with the same results. The `docker attach` stops receiving signals after a while. When I send them using `docker kill` they arrive and are shown in the `docker attach` process correctly.

