Bug 1096269

Summary: lost signals when sending lots of signals using --sig-proxy to docker
Product: Red Hat Enterprise Linux 7 Reporter: Lukáš Doktor <ldoktor>
Component: docker Assignee: Matthew Heon <mheon>
Status: CLOSED ERRATA QA Contact: Virtualization Bugs <virt-bugs>
Severity: high Docs Contact:
Priority: unspecified    
Version: 7.0 CC: admiller, bsarathy, cevich, dwalsh, golang-updates, jkeck, jrieden, mattdm, mgoldman, ohadlevy, vbatts
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: docker-1.0 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1087700 Environment: autotest-docker:docker_cli/kill
Last Closed: 2014-09-18 20:45:24 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1087697, 1087700    
Bug Blocks: 1109938    

Description Lukáš Doktor 2014-05-09 14:32:47 UTC
+++ This bug was initially created as a clone of Bug #1087700 +++

Description of problem:
When I send many signals in quick succession to a running docker client started with --sig-proxy (real signals sent with kill, not `docker kill`), most of them are lost.

Version-Release number of selected component (if applicable):
docker-0.10.0-8.el7.x86_64
docker-io-0.9.1-1.fc21.x86_64


How reproducible:
always

Steps to Reproduce:
1. /usr/bin/docker -D run --tty=false --rm -i --name test_eoly localhost:5000/ldoktor/fedora:latest bash -c 'for NUM in `seq 1 64`; do trap "echo Received $NUM, ignoring..." $NUM; done; while :; do sleep 1; done'
2. ps ax | grep docker (note the PID of the docker client process started in step 1)
3. With $PID set to that PID: for AAA in `seq 1 32`; do [ $AAA -ne 9 ] && [ $AAA -ne 20 ] && [ $AAA -ne 19 ] && kill -s $AAA $PID; done
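
Steps 2 and 3 can be combined into a small helper. This is a minimal sketch and not part of the original report: the function names `signals_to_send` and `send_all` are made up for illustration, and the `pgrep` pattern in the usage comment is only an assumption about how the client process shows up in the process list.

```shell
# Print the signal numbers used in step 3: 1..32 minus
# SIGKILL (9), SIGSTOP (19) and SIGTSTP (20).
signals_to_send() {
    local sig
    for sig in $(seq 1 32); do
        case "$sig" in
            9|19|20) continue ;;
        esac
        echo "$sig"
    done
}

# Send every signal from signals_to_send to the given PID.
# The optional second argument is a delay in seconds between signals;
# the default of 0 sends them as one burst, as in step 3.
send_all() {
    local pid=$1 delay=${2:-0} sig
    for sig in $(signals_to_send); do
        if [ "$delay" != 0 ]; then
            sleep "$delay"
        fi
        kill -s "$sig" "$pid" || echo "kill -s $sig $pid failed" >&2
    done
}

# Hypothetical usage (the pgrep pattern is an assumption):
#   send_all "$(pgrep -of 'docker.*test_eoly')"
```

Passing a delay as the second argument (for example `send_all "$PID" 1`) spaces the signals out instead of sending them back to back.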

Actual results:
The docker client prints only:
Received 1, ignoring...
Received 2, ignoring...


Expected results:
A `Received $NUM, ignoring...` message is printed for every signal sent (order doesn't matter).

Additional info:
Signals 9 (SIGKILL), 19 (SIGSTOP) and 20 (SIGTSTP) are skipped because they are a bit too special.

--- Additional comment from Lukas Doktor on 2014-05-05 04:10:09 EDT ---

The same results with upstream docker dc9c28f/0.10.0:

Output:
Received 1, ignoring...
[debug] stdcopy.go:111 framesize: 24
Received 2, ignoring...

Daemon output:
2014/05/05 10:08:45 POST /v1.10/containers/b01a849cb45ebe94c3a61fa021a5464186345d5b159faee4ea9d5da39fb36de5/kill?signal=HUP
[/home/medic/Work/Projekty/Docker/root|fa3816b6] +job kill(b01a849cb45ebe94c3a61fa021a5464186345d5b159faee4ea9d5da39fb36de5, HUP)
[/home/medic/Work/Projekty/Docker/root|fa3816b6] -job kill(b01a849cb45ebe94c3a61fa021a5464186345d5b159faee4ea9d5da39fb36de5, HUP) = OK (0)
2014/05/05 10:08:45 POST /v1.10/containers/b01a849cb45ebe94c3a61fa021a5464186345d5b159faee4ea9d5da39fb36de5/kill?signal=INT
[/home/medic/Work/Projekty/Docker/root|fa3816b6] +job kill(b01a849cb45ebe94c3a61fa021a5464186345d5b159faee4ea9d5da39fb36de5, INT)
[/home/medic/Work/Projekty/Docker/root|fa3816b6] -job kill(b01a849cb45ebe94c3a61fa021a5464186345d5b159faee4ea9d5da39fb36de5, INT) = OK (0)

Comment 3 Daniel Walsh 2014-05-19 20:01:52 UTC
I added a 1-second sleep before sending each signal and got:

# /usr/bin/docker run --sig-proxy --rm --tty=false -i fedora bash -c 'for NUM in `seq 1 64`; do trap "echo Received $NUM, ignoring..." $NUM; done; while :; do sleep 1; done'
Received 1, ignoring...
Received 2, ignoring...
Received 3, ignoring...
Received 4, ignoring...
Received 5, ignoring...
Received 6, ignoring...
Received 7, ignoring...
Received 8, ignoring...
Received 10, ignoring...
Received 11, ignoring...
Received 12, ignoring...
Received 13, ignoring...
Received 14, ignoring...
Received 15, ignoring...
Received 16, ignoring...
Received 21, ignoring...
Received 22, ignoring...
Received 23, ignoring...
Received 24, ignoring...
Received 25, ignoring...
Received 26, ignoring...
Received 28, ignoring...
Received 29, ignoring...
Received 30, ignoring...
Received 31, ignoring...

The signals were sent with:
for AAA in `seq 1 32`; do [ $AAA -ne 9 ] && [ $AAA -ne 20 ] && [ $AAA -ne 19 ] && sleep 1 && echo $AAA && kill -s $AAA 2041; done

Comment 4 Daniel Walsh 2014-05-19 20:03:41 UTC
These are missing

#define	SIGCHLD		17	/* Child status has changed (POSIX).  */
#define	SIGCONT		18	/* Continue (POSIX).  */

Comment 5 Daniel Walsh 2014-05-19 20:05:46 UTC
#define	SIGPROF		27	/* Profiling alarm clock (4.2 BSD).  */
Also missing.
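
Finding the gaps by eye is error-prone, so the comparison can be scripted. The following is a sketch under assumptions, not something used in this report: `missing_signals` is a hypothetical helper that reads captured client output on stdin and lists every expected signal number with no matching `Received N, ignoring...` line. It checks 1 to 31 only, since no run in this report shows such a line for 32.

```shell
# Read captured output on stdin and print each expected signal number
# (1..31 minus 9, 19 and 20) that has no "Received N, ignoring..." line.
# Other lines, such as "[debug] ..." output, are ignored.
missing_signals() {
    local received sig
    received=$(sed -n 's/^Received \([0-9][0-9]*\), ignoring\.\.\.$/\1/p')
    for sig in $(seq 1 31); do
        case "$sig" in
            9|19|20) continue ;;
        esac
        # grep -x matches whole lines only, so 1 does not match 11.
        grep -qx "$sig" <<<"$received" || echo "$sig"
    done
}

# Hypothetical usage, with the client output saved to a file:
#   missing_signals < docker-output.txt
```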

Comment 6 Daniel Walsh 2014-05-19 20:12:27 UTC
bash -c 'for NUM in `seq 1 64`; do trap "echo Received $NUM, ignoring..." $NUM; done; while :; do sleep 1; done'
Received 1, ignoring...
Received 2, ignoring...
Received 3, ignoring...
Received 4, ignoring...
Received 5, ignoring...
Received 6, ignoring...
Received 7, ignoring...
Received 8, ignoring...
Received 10, ignoring...
Received 11, ignoring...
Received 12, ignoring...
Received 13, ignoring...
Received 14, ignoring...
Received 15, ignoring...
Received 16, ignoring...
Received 18, ignoring...
Received 21, ignoring...
Received 22, ignoring...
Received 23, ignoring...
Received 24, ignoring...
Received 25, ignoring...
Received 26, ignoring...
Received 27, ignoring...
Received 28, ignoring...
Received 29, ignoring...
Received 30, ignoring...
Received 31, ignoring...
Unknown signal 32

Running the same test directly against bash shows only signal 17 missing.

I have no idea why 18 and 27 don't show up through docker.

Comment 7 Matthew Heon 2014-06-18 15:21:54 UTC
I've identified the root cause. Docker stores incoming signals in a buffer before forwarding them to the container (https://github.com/dotcloud/docker/blob/master/api/client/commands.go#L538). In current versions of Docker this buffer has size 1, so signals that arrive nearly simultaneously overwrite one another. I've submitted a pull request to increase the size of the buffer (https://github.com/dotcloud/docker/pull/6508).

Comment 8 Daniel Walsh 2014-06-23 12:39:11 UTC
Is this fixed in docker-1.0 for RHEL7?

Comment 9 Matthew Heon 2014-06-23 12:40:05 UTC
No, patch is not in docker-1.0

Comment 10 Daniel Walsh 2014-06-23 19:26:11 UTC
OK, let's get it in.

Comment 11 Matthew Heon 2014-06-25 14:12:51 UTC
Patch is in our builds of docker-1.0

Comment 12 Lukáš Doktor 2014-07-18 07:59:31 UTC
I'm sorry to report that the problem persists on docker-1.0.0-10.el7.x86_64:

output without additional sleep:
[debug] stdcopy.go:111 framesize: 48
Received 1, ignoring...
Received 8, ignoring...

output with sleep 0.1:
[debug] stdcopy.go:111 framesize: 24
Received 1, ignoring...
[debug] stdcopy.go:111 framesize: 218
Received 3, ignoring...
Received 4, ignoring...
Received 5, ignoring...
Received 6, ignoring...
Received 7, ignoring...
Received 8, ignoring...
Received 10, ignoring...
Received 11, ignoring...
Received 2, ignoring...
[debug] stdcopy.go:111 framesize: 125
Received 12, ignoring...
Received 13, ignoring...
Received 14, ignoring...
Received 15, ignoring...
Received 16, ignoring...
[debug] stdcopy.go:111 framesize: 225
Received 21, ignoring...
Received 22, ignoring...
Received 23, ignoring...
Received 24, ignoring...
Received 25, ignoring...
Received 26, ignoring...
Received 28, ignoring...
Received 29, ignoring...
Received 30, ignoring...
[debug] stdcopy.go:111 framesize: 25
Received 31, ignoring...

Which signals are received differs between runs, but the number received (without the additional sleep) is between 1 and 4.

Comment 14 Lukáš Doktor 2014-07-21 07:22:53 UTC
OK, today I tried docker-1.1.1-1.el7.x86_64 and it works perfectly. Thanks.

Comment 16 errata-xmlrpc 2014-09-18 20:45:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-1266.html