Bug 1122463

Summary: docker attach exits with 2/148/150 when killed in a loop with safe signals
Product: Red Hat Enterprise Linux 7
Component: docker
Version: 7.0
Reporter: Lukáš Doktor <ldoktor>
Assignee: Matthew Heon <mheon>
QA Contact: Virtualization Bugs <virt-bugs>
CC: cevich, dwalsh, jamills, ssekidde
Status: CLOSED CANTFIX
Severity: unspecified
Priority: low
Target Milestone: rc
Keywords: Extras
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Type: Bug
Last Closed: 2015-08-21 03:28:55 UTC

Description Lukáš Doktor 2014-07-23 10:27:14 UTC
Description of problem:
When you kill the container with various safe signals from parallel loops, the attach process (or `docker run -i`) exits with exit code 2 (or 148 or 150) while the container keeps running. If those codes follow the usual 128+n convention, 148 and 150 would correspond to SIGTSTP (20) and SIGTTOU (22). I guess it is not able to re-send the signals and block the incoming ones at the same time.

Version-Release number of selected component (if applicable):
docker-io-1.0.0-7.fc20.x86_64

How reproducible:
Always (with different exit codes)

Steps to Reproduce:
1. TERM1: docker run -i -t fedora bash -c 'for NUM in `seq 1 64`; do trap "echo Received $NUM, ignoring..." $NUM; done; while :; do sleep 1; done'
2. TERM2: touch /var/tmp/docker_kill_stress
3. TERM2: for AAA in `seq 1 31`; do [ $AAA -eq 9 ] || [ $AAA -eq 17 ] || [ $AAA -eq 19 ] && continue; { while [ -e /var/tmp/docker_kill_stress ]; do kill -$AAA $CONTAINER_PROCESS_PID > /dev/null || echo "Sender $AAA failed"; done } & done
   (where $CONTAINER_PROCESS_PID is the host-side PID of the container's bash process; signals 9/SIGKILL, 17/SIGCHLD and 19/SIGSTOP are skipped)
4. wait for failure
5. TERM2: rm /var/tmp/docker_kill_stress

Actual results:
In TERM1 the process exits; check with `echo $?` that it did not exit cleanly (it returns 2, 148 or 150).

Expected results:
The attached process should survive and print the list of received signals.

Additional info:
There is a similar test which uses `docker kill`, and that one works fine.

Comment 2 Matthew Heon 2014-07-24 14:26:10 UTC
I can reproduce this locally on docker-1.1.1-2 and a build of upstream's git master.

Given that 'docker kill' works fine, this seems to be a --sig-proxy issue. It's expected that, in scenarios with a very large number of signals, some will be lost, but an outright crash is definitely a bug. I'll investigate further.

Comment 3 Matthew Heon 2014-07-24 15:13:04 UTC
After further testing, I cannot reproduce this on docker-1.1.1-3 on RHEL7. 

My earlier reproduction was actually due to the -t flag in your command. We do not at present support signal proxying with -t (the patch for this is still waiting to be accepted upstream), so the signals were acting directly on the Docker client and not being proxied.

After removing -t, I cannot reproduce a client crash from a large number of signals - I'm assuming that docker-1.1.1-3, which includes the signal buffering patch, fixed the issue.

Comment 4 Lukáš Doktor 2014-08-11 08:14:33 UTC
OK, I'm sorry about the -t flag. You are right that this failure won't occur without it.

Anyway, with the same reproducer the results are still not as good as expected:
1) execute the reproducer
2) wait a while (10 s on smp4 with docker-1.1.2-9.el7.x86_64) and notice that no new signals are printed out
3) stop the parallel stresser
4) `kill -2 $PID` => the signal is not listed
5) `docker kill -s 2` => the signal arrives and is listed in the output
6) use Ctrl+C => no signal is received nor listed in the output

(you can use any signal to test the failure)

Comment 5 Daniel Walsh 2014-09-12 19:19:27 UTC
Could you try with docker-1.2?

Comment 6 Lukáš Doktor 2014-09-29 08:39:16 UTC
I tried that on docker-1.2.0-19.el7.x86_64 with the same results. `docker attach` stops receiving signals after a while. When I send them using `docker kill`, they arrive and are shown in the `docker attach` process correctly.