Bug 1300076 - Docker command crashes after journald is stopped
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: docker
Version: 7.2
Hardware: Unspecified
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Daniel Walsh
atomic-bugs@redhat.com
Yoana Ruseva
: Extras
Depends On:
Blocks: 1186913 1203710
Reported: 2016-01-19 17:06 EST by Jhon Honce
Modified: 2016-06-15 09:19 EDT (History)
24 users

See Also:
Fixed In Version: docker-1.9.1-17.el7
Doc Type: Bug Fix
Doc Text:
Cause: Docker constantly writes to stdout and stderr inside of its unit file. systemd-journald writes this stdout and stderr content to the journal. If the journal is restarted, the docker daemon receives a SIGPIPE. Consequence: When the docker daemon receives a SIGPIPE, it shuts down, stopping all containers. Fix: A helper program has been added that captures STDOUT and STDERR from docker and forwards the content to the journal. This helper program handles the SIGPIPE better than the docker daemon. Result: If systemd-journald is restarted, the docker daemon no longer shuts down, and containers keep running.
Story Points: ---
Clone Of:
: 1318361
Environment:
Last Closed: 2016-03-31 19:23:18 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


External Trackers
Tracker: Red Hat Product Errata
ID: RHBA-2016:0536
Priority: normal
Status: SHIPPED_LIVE
Summary: docker bug fix and enhancement update
Last Updated: 2016-03-31 23:19:56 EDT

Description Jhon Honce 2016-01-19 17:06:19 EST
Description of problem:
docker commands crash after journald is stopped.

Version-Release number of selected component (if applicable):
Docker 1.8.2-10
Docker 1.9.1-11

How reproducible:
always

Steps to Reproduce:
1. systemctl stop systemd-journald
2. systemctl stop systemd-journald.socket
3. Repeat docker ps until the crash is seen

Actual results:
After some (yet to be determined) time
# docker ps
Cannot connect to the Docker daemon. Is the docker daemon running on this host?

Expected results:
command runs without error after journald is stopped

Additional info:
Comment 3 Alex Jia 2016-01-19 23:00:48 EST
(In reply to Jhon Honce from comment #0)

> How reproducible:
> always
> 
> Steps to Reproduce:
> 1. systemctl stop systemd-journald
> 2. systemctl stop systemd-journald.socket
> 3. Repeat, docker ps until crash is seen

It is very weird; I can't reproduce it with the above steps on docker-1.9.1-11.el7.x86_64. The following are some details.

# for i in {1..1000}; do systemctl stop systemd-journald; systemctl stop systemd-journald.socket; docker ps; done

And after waiting for some time and running docker ps again, the docker daemon is still running without crashing.

Jhon, any important steps I missed?
Comment 4 Alex Jia 2016-01-20 02:47:08 EST
(In reply to Alex Jia from comment #3)
> (In reply to Jhon Honce from comment #0)

> And wait for some time, to run docker ps again, the docker daemon is running
> w/o crashing.
> 
> Jhon, any important steps I missed?

Well, sometimes I can reproduce the bug after running the for loop in a shell and waiting for several minutes. I will try to find a simple approach to reproduce it every time.
Comment 5 Daniel Walsh 2016-01-20 09:21:22 EST
Is there any reason for this being a real-world problem?  I am figuring something is being written to syslog and failing because the process listening on /dev/log goes away.

But is this something worth even looking to fix?
Comment 6 Daniel Walsh 2016-01-21 09:26:49 EST
Alex, can you try with a process constantly pinging docker at the same time?  I believe the issue must be related to docker communicating with the journal, and I believe the only way this happens is via syslog.

Keeping the docker daemon active while doing your systemctl restart of journald could cause it.

Do you have any logs from docker when it crashes?
Comment 7 Alex Jia 2016-01-22 03:25:20 EST
(In reply to Daniel Walsh from comment #6)
> alex, can you try with a process constantly pinging docker at the same time.
> I believe the issue must be related to docker communicating with the
> journal, and I believe the only way this happens in via syslog.  
> 

I can always reproduce the bug with the following steps:
[cloud-user@atomic-00 ~]$ sudo systemctl stop systemd-journald.socket
[cloud-user@atomic-00 ~]$ for i in {1..200}; do sudo docker ps & done

> Making the docker daemon active, and doing your systemctl restart journald
> could cause it.
> 
> Do you have any logs from docker when it crashes?

But I can't find any useful information from systemctl status docker.
Comment 8 Daniel Walsh 2016-01-22 08:52:31 EST
Do you have journald set up as your default logger?

systemctl status docker | grep journald
           └─1038 /usr/bin/docker daemon --selinux-enabled --exec-opt native.cgroupdriver=systemd --log-driver=journald --storage-opt dm.basesize=
Comment 9 Jhon Honce 2016-01-22 11:56:12 EST
Setting the following

# /etc/systemd/system/systemd-journald.service.d/local.conf
[Service]
WatchdogSec=10min

could provide a workaround for docker being signaled during log rollovers on slow storage devices.  The 10min value would be tuned for the size of the logs vs. the speed of the storage devices.

This does nothing for the root issue of the docker daemon crashing when journald is shut down or crashes.
Comment 10 Jhon Honce 2016-01-22 12:41:56 EST
Yes, I added the following to my setup:

# grep OPTIONS /etc/sysconfig/docker
OPTIONS='--selinux-enabled --log-driver=journald'
Comment 15 Daniel Walsh 2016-01-28 18:00:13 EST
This looks like you are not using the new journald backend for docker.  I guess for some reason systemd is killing the docker daemon?  I added the systemd maintainers to the list to see if they know why this is happening.  If the journald service is killed, does systemd take down individual services?
Comment 16 Daniel Walsh 2016-01-29 12:46:07 EST
Looks like we are hitting this problem

https://bugs.freedesktop.org/show_bug.cgi?id=84923
Comment 31 Daniel Walsh 2016-02-08 08:55:50 EST
So, bottom line: are we blaming this on journald or on golang?  If journald is restarted, is systemd supposed to reconnect the stdout/stderr sockets of services in unit files, or does all of the output just go to /dev/null with the services ignoring sigpipe?

If we can get golang to ignore sigpipe, is that good enough?
Comment 32 Colin Walters 2016-02-08 09:57:25 EST
golang shouldn't rely on SIGPIPE I would say...except then I found:

https://github.com/golang/go/blob/master/src/os/signal/doc.go

Basically it's documented that writing to fds 0, 1, and 2 may raise SIGPIPE.  I guess it's trying to be an in-between of default SIGPIPE (broken for pretty much any nontrivial program) and masked SIGPIPE (your app needs to properly propagate errors immediately to exit).

And it looks like the journal is FD 1.  So...since we're not using the docker daemon in a Unix shell command pipeline, let's change Docker to just entirely mask SIGPIPE.
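
A minimal sketch of that masking approach, assuming it is done early in the daemon's startup (illustrative only, not the change docker actually shipped; as comment 33 below notes, the Go runtime's private sigpipe() path bypasses signal.Notify(), so whether masking from within Go is enough depends on the Go version):

package main

import (
    "os/signal"
    "syscall"
)

func init() {
    // Ask the Go runtime to ignore SIGPIPE (signal.Ignore is available in
    // Go 1.5 and later) so that a vanished reader on stdout/stderr surfaces
    // as EPIPE write errors instead of a fatal signal.
    signal.Ignore(syscall.SIGPIPE)
}

func main() {
    // ... daemon startup would continue here ...
}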
Comment 33 Jhon Honce 2016-02-08 11:19:41 EST
From go-1.5.1/src/os/file_unix.go:66, Go is hard-coded to count up to 10 EPIPEs from an fd and then die via SIGPIPE.  Note the use of the private sigpipe() function, which bypasses the signal.Notify() mechanism.

func epipecheck(file *File, e error) {
    if e == syscall.EPIPE {
        // After 10 consecutive EPIPE errors on this file, raise SIGPIPE.
        if atomic.AddInt32(&file.nepipe, 1) >= 10 {
            sigpipe()
        }
    } else {
        // Any non-EPIPE result resets the counter.
        atomic.StoreInt32(&file.nepipe, 0)
    }
}
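
For illustration, here is a hypothetical standalone program (not from the docker source) that exercises this path. Run it as, say, ./sigpipe-demo | true: once the reader has exited, every write to stdout returns EPIPE, and with the Go 1.5 behavior quoted above the runtime raises a fatal SIGPIPE after roughly ten failed writes.

package main

import (
    "fmt"
    "os"
    "time"
)

func main() {
    for i := 1; ; i++ {
        // Writes to a stdout whose reader has gone away fail with EPIPE;
        // epipecheck() above counts them and calls sigpipe() at ten.
        _, err := fmt.Fprintf(os.Stdout, "write %d\n", i)
        fmt.Fprintf(os.Stderr, "attempt %d: err = %v\n", i, err)
        time.Sleep(100 * time.Millisecond)
    }
}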
Comment 34 Daniel Walsh 2016-02-08 13:03:32 EST
Vincent, any thoughts on this?  We should probably bring this up for discussion with docker upstream, to tell them about this problem.
Comment 35 Daniel Walsh 2016-02-08 13:04:50 EST
Or should we change docker to write all its messages to a log driver rather than to stdout/stderr (not sure if it writes to stderr)?
Comment 36 Daniel Walsh 2016-02-08 13:06:46 EST
Moving this bug to golang, since this does not seem easily fixable given this sigpipe handling.  I have no idea why they are doing this; it would be nice to know some history here.
Comment 37 Daniel Walsh 2016-02-08 14:38:01 EST
Docker has had this issue a couple of times.

Interesting links.

https://github.com/docker/docker/issues/19728
https://github.com/docker/docker/issues/7087
https://github.com/golang/go/issues/11845
Comment 43 Nalin Dahyabhai 2016-02-12 11:59:50 EST
(In reply to Daniel Walsh from comment #35)
> Or should we change docker to write all its messages to a log driver rather
> than to stdout/stderr (not sure if it writes to stderr)?

By default, messages are logged to stderr.
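
The fix that eventually shipped (see the Doc Text above) takes a different route: a helper process captures the daemon's stdout/stderr and forwards it to the journal, absorbing the SIGPIPE itself. A rough sketch of such a forwarder, assuming the go-systemd journal bindings (github.com/coreos/go-systemd/journal) rather than whatever the packaged helper actually uses:

package main

import (
    "bufio"
    "os"
    "os/signal"
    "syscall"

    "github.com/coreos/go-systemd/journal"
)

func main() {
    // Keep the helper alive across journald restarts; losing a log line is
    // preferable to taking down the daemon whose output it forwards.
    signal.Ignore(syscall.SIGPIPE)

    scanner := bufio.NewScanner(os.Stdin)
    for scanner.Scan() {
        // Forward each line of the daemon's output to the journal, dropping
        // send errors instead of propagating them.
        _ = journal.Print(journal.PriInfo, "%s", scanner.Text())
    }
}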
Comment 51 Luwen Su 2016-03-18 03:12:05 EDT
In docker-1.9.1-23.el7.x86_64 and golang-1.4.2-9.el7.x86_64, after killing systemd-journald the docker daemon is still alive.

Moving to VERIFIED.
Comment 53 errata-xmlrpc 2016-03-31 19:23:18 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0536.html
Comment 54 Jan Chaloupka 2016-06-15 09:19:38 EDT
*** Bug 1300056 has been marked as a duplicate of this bug. ***
