Bug 1378929

Summary: [3.3.0.32] Restarting systemd-journald causes master controllers to die
Product: OpenShift Container Platform
Reporter: Mike Fiedler <mifiedle>
Component: Installer
Assignee: Devan Goodwin <dgoodwin>
Status: CLOSED ERRATA
QA Contact: Mike Fiedler <mifiedle>
Severity: medium
Priority: unspecified
Version: 3.3.0
CC: aos-bugs, bleanhar, decarr, dgoodwin, erich, gpei, jdetiber, jeder, jhonce, jialiu, jokerman, knakayam, lpoetter, mifiedle, mmccomas, msekleta, rhowe, tkimura, tstclair, wmeng
Target Milestone: ---   
Target Release: 3.3.1   
Hardware: x86_64   
OS: Linux   
Last Closed: 2016-11-15 19:09:34 UTC
Type: Bug
Attachments:
Logs from fresh recreate at Fri Sep 23 10:25:29 EDT 2016

Description Mike Fiedler 2016-09-23 14:17:08 UTC
Description of problem:

On 3.3.0.32 on a multi-master cluster, restarting systemd-journald on all nodes causes the master controllers to die.    The systemd units stop and the processes exit.   Logs show no panics or errors and there are no cores on the system.

We were restarting systemd-journald to pick up a fix for:  https://access.redhat.com/solutions/2117311

Version-Release number of selected component (if applicable):  3.3.0.32


How reproducible: always


Config:

All on AWS us-west-2b
3 combined master/etcd
1 master load balancer
2 infra nodes (router/registry)
5 application nodes

Steps to Reproduce:
1. Verify cluster is running 
2. In a loop, ssh to all nodes and systemctl restart systemd-journald
3. Watch master controller processes on all masters

Actual results:

All master controller processes stop

Expected results:

Cluster carries on normally

Additional info:

Contact @mffiedler for cluster access

Comment 1 Mike Fiedler 2016-09-23 14:31:26 UTC
Created attachment 1204192 [details]
Logs from fresh recreate at Fri Sep 23 10:25:29 EDT 2016

Added logs from all masters for fresh recreate.   Restart occurred at Fri Sep 23 10:25:29 EDT 2016



There are several recreates in there.   The first encounter was Sep 22 around 19:30.   At Sep 22 19:37 you can see the cluster go into a really bad state as all the lists started failing and nodes realized the cluster was hosed.

Comment 2 Mike Fiedler 2016-09-23 14:34:05 UTC
systemd was updated earlier.   The problem occurs on every restart of systemd, not just for the original update.

Comment 3 Mike Fiedler 2016-09-23 15:01:02 UTC
root@ip-172-31-51-247: ~ # systemctl status atomic-openshift-master-controllers
● atomic-openshift-master-controllers.service - Atomic OpenShift Master Controllers
   Loaded: loaded (/usr/lib/systemd/system/atomic-openshift-master-controllers.service; enabled; vendor preset: disabled)
   Active: inactive (dead) since Fri 2016-09-23 10:56:21 EDT; 26s ago
     Docs: https://github.com/openshift/origin
  Process: 94375 ExecStart=/usr/bin/openshift start master controllers --config=${CONFIG_FILE} $OPTIONS (code=killed, signal=PIPE)
 Main PID: 94375 (code=killed, signal=PIPE)

Comment 4 Mike Fiedler 2016-09-23 15:17:07 UTC
Systemd is 219-30.el7 

To be clear,  even though systemd-journald was restarted everywhere, this problem can be reproduced by just restarting it on the combined master/etcd.

Comment 5 Timothy St. Clair 2016-09-23 16:42:48 UTC
So the logs don't show the window around @10:56 for a Controllers event? And no OOM-ing either. It looks like the api-server is still executing after the event.

$ grep Started messages.master1 | grep systemd
Sep 23 10:20:26 ip-172-31-51-245 systemd: Started Atomic OpenShift Master Controllers.
Sep 23 10:25:28 ip-172-31-51-245 systemd: Started Session 224 of user root.
Sep 23 10:25:28 ip-172-31-51-245 systemd: Started Flush Journal to Persistent Storage.
Sep 23 10:26:28 ip-172-31-51-245 systemd: Started Atomic OpenShift Node.
Sep 23 10:26:51 ip-172-31-51-245 systemd: Started Session 225 of user root.

$ grep systemd messages.master1 | grep -i Openshift
Sep 23 10:20:26 ip-172-31-51-245 systemd: Starting Atomic OpenShift Master Controllers...
Sep 23 10:20:26 ip-172-31-51-245 systemd: Started Atomic OpenShift Master Controllers.
Sep 23 10:26:27 ip-172-31-51-245 systemd: atomic-openshift-node.service holdoff time over, scheduling restart.
Sep 23 10:26:27 ip-172-31-51-245 systemd: Starting Atomic OpenShift Node...
Sep 23 10:26:28 ip-172-31-51-245 atomic-openshift-node: I0923 10:26:28.451318   87612 factory.go:54] Registering systemd factory
Sep 23 10:26:28 ip-172-31-51-245 atomic-openshift-node: I0923 10:26:28.453859   87612 oomparser.go:185] oomparser using systemd
Sep 23 10:26:28 ip-172-31-51-245 systemd: Started Atomic OpenShift Node.

Comment 6 Seth Jennings 2016-09-23 17:06:54 UTC
I was able to recreate this once with a "systemctl daemon-reexec". This is what the %post scriptlet in the systemd RPM does.

Let me first say that I don't think this is new.

https://bugzilla.redhat.com/show_bug.cgi?id=1317031#c20

I think that systemd upgrades have, at least in recent history, always taken out the master-controller.

It appears that the master-controller has a pipe open and a daemon-reexec or systemd-journald restart breaks it. At some point in the future, the master-controller tries to access the pipe and gets a SIGPIPE.
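
One way to see what the controller's stdout/stderr are attached to (a quick sketch, assuming a single matching PID; not from the attached logs):

  pid=$(pgrep -f 'openshift start master controllers')
  ls -l /proc/$pid/fd/1 /proc/$pid/fd/2
  # Both lead back to journald; per the analysis above, once journald restarts
  # without the fd store, a later write on the broken stream delivers SIGPIPE.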

The service unit does have:

Restart=on-failure
RestartSec=5s

However, SIGPIPE is considered a "clean" exit signal, so the service is not restarted (search for SIGPIPE here):
https://www.freedesktop.org/software/systemd/man/systemd.service.html

So the process is not restarted by systemd.
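
For illustration only (the drop-in path and file name below are made up, and this is not the change committed at the time of this comment), an override that forces a restart even on exits systemd treats as clean, such as SIGPIPE, would look roughly like:

  # /etc/systemd/system/atomic-openshift-master-controllers.service.d/restart.conf (hypothetical path)
  [Service]
  # Restart=always also covers signals on systemd's "clean" list (e.g. SIGPIPE),
  # which Restart=on-failure does not.
  Restart=always
  RestartSec=5s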

systemd v219 and later has a mechanism to maintain file descriptors across daemon restart (thanks to Derek for finding this):
https://github.com/systemd/systemd/blob/master/NEWS#L2016

However, the service file must set FileDescriptorStoreMax to use it.

While it is included in the journald service file upstream for v219:
https://github.com/systemd/systemd/blob/v219/units/systemd-journald.service.in

It is not in the RHEL 7.2 service file; it is patched out in our rpm:

SOURCES]$ grep FileDescriptorStoreMax *
0125-Revert-journald-allow-restarting-journald-without-lo.patch:-FileDescriptorStoreMax=1024

Including that setting is likely the fix for this.
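
As a sketch of what re-enabling it on a host could look like (the drop-in path is an assumption, not part of the shipped packages; the value matches the upstream v219 unit):

  # /etc/systemd/system/systemd-journald.service.d/fdstore.conf (hypothetical drop-in)
  [Service]
  # Let journald hand its open stream fds to systemd across a restart,
  # so clients' log streams are not broken.
  FileDescriptorStoreMax=1024

followed by a systemctl daemon-reload and a restart of systemd-journald.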

Comment 7 Mike Fiedler 2016-09-23 17:18:56 UTC
Installed 3.2 on an identical cluster config and ran the same scenario.   The problem was not reproducible.

Comment 8 Derek Carr 2016-09-23 18:40:38 UTC
Mike has since confirmed that it was reproduced on 3.2 using an identical cluster configuration, but it took time for the problem to manifest.

I am lowering the priority and moving this to UPCOMING_RELEASE.

The proper fix is to leverage the FileDescriptorStoreMax feature introduced in systemd 219 but currently disabled in the RHEL package. I am moving this bug to the productization team to evaluate the risk of enabling the systemd feature so this problem does not occur.

Comment 9 Derek Carr 2016-09-23 18:43:55 UTC
See https://bugzilla.redhat.com/show_bug.cgi?id=1378929#c6 on the host change needed to prevent this from occurring.

Comment 10 Mike Fiedler 2016-09-23 19:00:40 UTC
Confirming comment 9.   The cluster mentioned in comment 7 did reproduce this issue when it was put under load during the restart of systemd.

Comment 12 Kenjiro Nakayama 2016-10-03 06:02:38 UTC
Can you increase the severity to high? It is common for customers to tune /etc/systemd/journald.conf and restart the journald service in a production environment. (One customer actually hit this issue.)

Also, I would appreciate it if you could add this issue to the known-issues list in the documentation: https://docs.openshift.com/container-platform/3.3/release_notes/ocp_3_3_release_notes.html#ocp-33-known-issues

Comment 15 Scott Dodson 2016-10-04 15:06:39 UTC
*** Bug 1380570 has been marked as a duplicate of this bug. ***

Comment 27 Gaoyun Pei 2016-11-03 10:08:18 UTC
Set up a multi-master OCP 3.3 cluster with openshift-ansible-3.3.44-1.git.0.48e34f6.el7.noarch.rpm and checked the atomic-openshift-master-controllers.service configuration file; the service is using Restart=always now.

Ran a script to restart systemd-journald continuously on the combined master/etcd nodes; after a while, atomic-openshift-master-controllers.service on all masters was still running well.
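
(The exact script is not attached; a minimal loop along these lines, with an arbitrary interval, would match the description:)

  while true; do
      systemctl restart systemd-journald
      sleep 60   # arbitrary interval, not from the original script
  done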

[root@ip-172-18-15-196 ~]# systemctl status atomic-openshift-master-controllers
● atomic-openshift-master-controllers.service - Atomic OpenShift Master Controllers
   Loaded: loaded (/usr/lib/systemd/system/atomic-openshift-master-controllers.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2016-11-03 05:24:12 EDT; 30min ago
     Docs: https://github.com/openshift/origin
 Main PID: 41694 (openshift)
   Memory: 50.3M
   CGroup: /system.slice/atomic-openshift-master-controllers.service
           └─41694 /usr/bin/openshift start master controllers --config=/etc/origin/master/master-config.yaml --loglevel=5 --listen=https://0.0.0.0:8444

Comment 28 Takayoshi Kimura 2016-11-04 00:58:02 UTC
I think we need to verify the status of atomic-openshift-master-api as well, because in the duplicate bug BZ1380570 and in a customer support case, the -api service also stopped unexpectedly due to this issue.

Comment 29 Gaoyun Pei 2016-11-04 06:35:02 UTC
Also verified that the atomic-openshift-master-api service was running on all masters. When systemd-journald was restarted on a master, atomic-openshift-master-api was also restarted; after that, the service kept running.

@Mike, so is this working well for you now?

Comment 30 Devan Goodwin 2016-11-04 11:48:11 UTC
The api service has Restart=always for containerized deployments, but *not* for rpm deployments. I suspect this means it could happen there as well. This one is a bit hard to reproduce reliably, as I think it requires log messages to be sent at the right times.

I have applied the same fix for API service in this PR:

https://github.com/openshift/openshift-ansible/pull/2718

Comment 34 Mike Fiedler 2016-11-07 13:08:41 UTC
Verified on container install (3.3.1.4) that restarting systemd-journald caused both master-api and master-controllers containers to exit and both were immediately restarted.

Comment 35 Mike Fiedler 2016-11-08 03:02:34 UTC
Still have to check master-api on rpm install.   Moving back to ON_QA temporarily.

Comment 36 Johnny Liu 2016-11-08 03:45:24 UTC
BTW, do we also need this fix for 3.4?

Comment 37 Mike Fiedler 2016-11-08 08:27:21 UTC
On 3.3.1.4, verified 

rpm:  atomic-openshift-master-controllers and atomic-openshift-master-api are restarted by systemd.

containerized:  atomic-openshift-master-controllers and atomic-openshift-master-api are restarted by systemd.

Commands do fail during the restart interval, but the services restart.

Re: comment 36, the change seems to be on openshift-ansible master. Is there anywhere it is missing?

Comment 38 Johnny Liu 2016-11-08 10:25:05 UTC
Re: comment 36, I was using an older playbook to install my environment. Sorry, my fault; please ignore it.



According to comment 37, moving this bug to VERIFIED.

Comment 39 Kenjiro Nakayama 2016-11-11 08:23:12 UTC
Since this fix is delivered via openshift-ansible, current users will not get it automatically; they need to make the change manually themselves, don't they?
On top of that, even with this fix, the OpenShift master services will still restart whenever systemd-journald is restarted.
I still think this issue should be listed as a known issue; is that not necessary?
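
For existing installs, the manual change would be along the lines of the following sketch (unit names per this bug; the drop-in approach itself is my assumption, not official documentation):

  # On each rpm-installed master; containerized api units already carry Restart=always per comment 30.
  mkdir -p /etc/systemd/system/atomic-openshift-master-controllers.service.d
  cat > /etc/systemd/system/atomic-openshift-master-controllers.service.d/override.conf <<'EOF'
  [Service]
  Restart=always
  EOF
  systemctl daemon-reload
  # Repeat for atomic-openshift-master-api on rpm installs.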

Comment 40 Devan Goodwin 2016-11-14 15:41:33 UTC
It's probably a good idea to recommend it; however, journald restarts should be relatively rare (I think?), and personally I'd be rebooting that host after one.

Comment 41 Kenjiro Nakayama 2016-11-14 15:52:27 UTC
OK, I see. I may be worrying too much. If the Engineering team says so, it is no problem for me.

Comment 43 errata-xmlrpc 2016-11-15 19:09:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2016:2778