Bug 1300076 - Docker command crashes after journald is stopped
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: docker
Version: 7.2
Hardware: Unspecified
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Daniel Walsh
atomic-bugs@redhat.com
Yoana Ruseva
: Extras
Depends On:
Blocks: 1186913 1203710
Reported: 2016-01-19 17:06 EST by Jhon Honce
Modified: 2016-06-15 09:19 EDT (History)
24 users

See Also:
Fixed In Version: docker-1.9.1-17.el7
Doc Type: Bug Fix
Doc Text:
Cause: Docker constantly writes to stdout and stderr inside of its unit file. systemd-journald writes this stdout and stderr content to the journal. If the journal is restarted, the docker daemon receives a SIGPIPE. Consequence: When the docker daemon receives a SIGPIPE, it shuts down, stopping all containers. Fix: A helper program has been added that captures STDOUT and STDERR from docker and forwards the content to the journal. This helper program handles the SIGPIPE better than the docker daemon. Result: If systemd-journald is restarted, the docker daemon no longer shuts down, and containers keep running.
Story Points: ---
Clone Of:
: 1318361
Environment:
Last Closed: 2016-03-31 19:23:18 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


External Trackers
Tracker: Red Hat Product Errata
ID: RHBA-2016:0536
Priority: normal
Status: SHIPPED_LIVE
Summary: docker bug fix and enhancement update
Last Updated: 2016-03-31 23:19:56 EDT

Description Jhon Honce 2016-01-19 17:06:19 EST
Description of problem:
docker commands crash after journald is stopped.

Version-Release number of selected component (if applicable):
Docker 1.8.2-10
Docker 1.9.1-11

How reproducible:
always

Steps to Reproduce:
1. systemctl stop systemd-journald
2. systemctl stop systemd-journald.socket
3. Repeat docker ps until the crash is seen

Actual results:
After some (yet to be determined) time
# docker ps
Cannot connect to the Docker daemon. Is the docker daemon running on this host?

Expected results:
command runs without error after journald is stopped

Additional info:
Comment 3 Alex Jia 2016-01-19 23:00:48 EST
(In reply to Jhon Honce from comment #0)

> How reproducible:
> always
> 
> Steps to Reproduce:
> 1. systemctl stop systemd-journald
> 2. systemctl stop systemd-journald.socket
> 3. Repeat, docker ps until crash is seen

It is very weird; I can't reproduce it with the above steps on docker-1.9.1-11.el7.x86_64. The following are some details.

# for i in {1..1000}; do systemctl stop systemd-journald; systemctl stop systemd-journald.socket; docker ps; done

And after waiting for some time and running docker ps again, the docker daemon is still running without crashing.

Jhon, any important steps I missed?
Comment 4 Alex Jia 2016-01-20 02:47:08 EST
(In reply to Alex Jia from comment #3)
> (In reply to Jhon Honce from comment #0)

> And wait for some time, to run docker ps again, the docker daemon is running
> w/o crashing.
> 
> Jhon, any important steps I missed?

Well, sometimes I can reproduce the bug after running the for loop in a shell and waiting for several minutes. I will try to find a simple approach to reproduce it every time.
Comment 5 Daniel Walsh 2016-01-20 09:21:22 EST
Is there any reason for this being a real-world problem?  I am figuring something is being written to syslog and failing because the process listening on /dev/log goes away.

But is this something worth even looking to fix?
Comment 6 Daniel Walsh 2016-01-21 09:26:49 EST
Alex, can you try with a process constantly pinging docker at the same time?  I believe the issue must be related to docker communicating with the journal, and I believe the only way this happens is via syslog.

Keeping the docker daemon active while doing your systemctl restart of journald could cause it.

Do you have any logs from docker when it crashes?
Comment 7 Alex Jia 2016-01-22 03:25:20 EST
(In reply to Daniel Walsh from comment #6)
> alex, can you try with a process constantly pinging docker at the same time.
> I believe the issue must be related to docker communicating with the
> journal, and I believe the only way this happens in via syslog.  
> 

I can always reproduce the bug with the following steps:
[cloud-user@atomic-00 ~]$ sudo systemctl stop systemd-journald.socket
[cloud-user@atomic-00 ~]$ for i in {1..200}; do sudo docker ps & done

> Making the docker daemon active, and doing your systemctl restart journald
> could cause it.
> 
> Do you have any logs from docker when it crashes?

But I can't find any useful information from systemctl status docker.
Comment 8 Daniel Walsh 2016-01-22 08:52:31 EST
Do you have journald set up as your default logger?

systemctl status docker | grep journald
           └─1038 /usr/bin/docker daemon --selinux-enabled --exec-opt native.cgroupdriver=systemd --log-driver=journald --storage-opt dm.basesize=
Comment 9 Jhon Honce 2016-01-22 11:56:12 EST
Setting the following

# /etc/systemd/system/systemd-journald.service.d/local.conf
[Service]
WatchdogSec=10min

could provide a workaround for docker being signaled during log rollovers on slow storage devices.  The 10min value would be tuned for the size of the logs vs. the speed of the storage devices.

This does nothing for the root issue of the docker daemon crashing when journald is shut down or crashes.
Comment 10 Jhon Honce 2016-01-22 12:41:56 EST
Yes, I added the following to my setup:

# grep OPTIONS /etc/sysconfig/docker
OPTIONS='--selinux-enabled --log-driver=journald'
Comment 15 Daniel Walsh 2016-01-28 18:00:13 EST
This looks like you are not using the new journald backend for docker.  I guess for some reason systemd is killing the docker daemon?  I added the systemd maintainers to the list to see if they know why this is happening.  If the journald service is killed, does systemd take down individual services?
Comment 16 Daniel Walsh 2016-01-29 12:46:07 EST
Looks like we are hitting this problem

https://bugs.freedesktop.org/show_bug.cgi?id=84923
Comment 31 Daniel Walsh 2016-02-08 08:55:50 EST
So, bottom line: are we blaming this on journald or on golang?  If journald is restarted, is systemd supposed to reconnect the stdout/stderr sockets of services in unit files, or does all of the output just go to /dev/null with the services ignoring sigpipe?

If we can get golang to ignore sigpipe, is that good enough?
Comment 32 Colin Walters 2016-02-08 09:57:25 EST
golang shouldn't rely on SIGPIPE I would say...except then I found:

https://github.com/golang/go/blob/master/src/os/signal/doc.go

Basically it's documented that writing to fds 0, 1, and 2 may raise SIGPIPE.  I guess it's trying to be an in-between of default SIGPIPE (broken for pretty much any nontrivial program) and masked SIGPIPE (your app needs to properly propagate errors immediately to exit).

And it looks like the journal is FD 1.  So...since we're not using the docker daemon in a Unix shell command pipeline, let's change Docker to just entirely mask SIGPIPE.
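
A minimal sketch of that masking approach, assuming it is done early in the daemon's startup (illustrative only, not the change docker actually shipped; as comment 33 below notes, the Go runtime's private sigpipe() path bypasses signal.Notify(), so whether masking from within Go is enough depends on the Go version):

package main

import (
    "os/signal"
    "syscall"
)

func init() {
    // Ask the Go runtime to ignore SIGPIPE (signal.Ignore is available in
    // Go 1.5 and later) so that a vanished reader on stdout/stderr surfaces
    // as EPIPE write errors instead of a fatal signal.
    signal.Ignore(syscall.SIGPIPE)
}

func main() {
    // ... daemon startup would continue here ...
}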
Comment 33 Jhon Honce 2016-02-08 11:19:41 EST
From go-1.5.1/src/os/file_unix.go:66, Go is hard-coded to count up to 10 EPIPEs from an fd and then die via SIGPIPE.  Note the use of the private sigpipe() function, which bypasses the signal.Notify() mechanism.

func epipecheck(file *File, e error) {
    if e == syscall.EPIPE {
        // After 10 consecutive EPIPE errors on this file, raise SIGPIPE.
        if atomic.AddInt32(&file.nepipe, 1) >= 10 {
            sigpipe()
        }
    } else {
        // Any non-EPIPE result resets the counter.
        atomic.StoreInt32(&file.nepipe, 0)
    }
}
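
For illustration, here is a hypothetical standalone program (not from the docker source) that exercises this path. Run it as, say, ./sigpipe-demo | true: once the reader has exited, every write to stdout returns EPIPE, and with the Go 1.5 behavior quoted above the runtime raises a fatal SIGPIPE after roughly ten failed writes.

package main

import (
    "fmt"
    "os"
    "time"
)

func main() {
    for i := 1; ; i++ {
        // Writes to a stdout whose reader has gone away fail with EPIPE;
        // epipecheck() above counts them and calls sigpipe() at ten.
        _, err := fmt.Fprintf(os.Stdout, "write %d\n", i)
        fmt.Fprintf(os.Stderr, "attempt %d: err = %v\n", i, err)
        time.Sleep(100 * time.Millisecond)
    }
}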
Comment 34 Daniel Walsh 2016-02-08 13:03:32 EST
Vincent, any thoughts on this?  We should probably bring this up for discussion with docker upstream, to tell them about this problem.
Comment 35 Daniel Walsh 2016-02-08 13:04:50 EST
Or should we change docker to write all its messages to a log driver rather than to stdout/stderr (not sure if it writes to stderr)?
Comment 36 Daniel Walsh 2016-02-08 13:06:46 EST
Moving this bug to golang, since this does not seem easily fixable given this sigpipe handling.  I have no idea why they are doing this; it would be nice to know some history here.
Comment 37 Daniel Walsh 2016-02-08 14:38:01 EST
Docker has had this issue a couple of times.

Interesting links.

https://github.com/docker/docker/issues/19728
https://github.com/docker/docker/issues/7087
https://github.com/golang/go/issues/11845
Comment 43 Nalin Dahyabhai 2016-02-12 11:59:50 EST
(In reply to Daniel Walsh from comment #35)
> Or should we change docker to write all its messages to a log driver rather
> than to stdout/stderr (not sure if it writes to stderr)?

By default, messages are logged to stderr.
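
The fix that eventually shipped (see the Doc Text above) takes a different route: a helper process captures the daemon's stdout/stderr and forwards it to the journal, absorbing the SIGPIPE itself. A rough sketch of such a forwarder, assuming the go-systemd journal bindings (github.com/coreos/go-systemd/journal) rather than whatever the packaged helper actually uses:

package main

import (
    "bufio"
    "os"
    "os/signal"
    "syscall"

    "github.com/coreos/go-systemd/journal"
)

func main() {
    // Keep the helper alive across journald restarts; losing a log line is
    // preferable to taking down the daemon whose output it forwards.
    signal.Ignore(syscall.SIGPIPE)

    scanner := bufio.NewScanner(os.Stdin)
    for scanner.Scan() {
        // Forward each line of the daemon's output to the journal, dropping
        // send errors instead of propagating them.
        _ = journal.Print(journal.PriInfo, "%s", scanner.Text())
    }
}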
Comment 51 Luwen Su 2016-03-18 03:12:05 EDT
In docker-1.9.1-23.el7.x86_64 and golang-1.4.2-9.el7.x86_64, after killing systemd-journald the docker daemon is still alive.

Moving to VERIFIED.
Comment 53 errata-xmlrpc 2016-03-31 19:23:18 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0536.html
Comment 54 Jan Chaloupka 2016-06-15 09:19:38 EDT
*** Bug 1300056 has been marked as a duplicate of this bug. ***
