Description of problem:
Out of memory event killing the ovs-vswitchd process causes atomic-openshift-node to enter a restart loop.

Version-Release number of selected component (if applicable): 3.1.1.6

How reproducible: 100%

Steps to Reproduce:
[root@ose ~]# ps aux | grep ovs-vswitchd
root      2405  0.0  0.0  49148   796 ?        S<s  17:59   0:00 ovs-vswitchd: monitoring pid 2406 (healthy)
root      2406  0.2  0.9 270996 35324 ?        S<Ll 17:59   0:00 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --mlockall --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach --monitor
[root@ose ~]# skill 15 2405
[root@ose ~]# skill 15 2406
[root@ose ~]# systemctl restart atomic-openshift-node

Actual results:
Node stuck in a restart loop.

Expected results:
When the atomic-openshift-node service restarts, it also kicks off a restart of openvswitch and the node starts up fine.

Additional info:
To remedy this, the openvswitch.service must be restarted first, before restarting the atomic-openshift-node service:
# systemctl restart openvswitch
# systemctl restart atomic-openshift-node

Log messages:
kernel: Out of memory: Kill process 1087 (ovs-vswitchd) score 2 or sacrifice child
atomic-openshift-node[82445]: I0427 14:24:37.047317   82445 common.go:236] Waiting for SDN pod network to be ready...
systemd[1]: atomic-openshift-node.service start operation timed out. Terminating.
ovs-vsctl[82510]: ovs|00002|fatal_signal|WARN|terminating with signal 15 (Terminated)
atomic-openshift-node[82445]: F0427 14:24:40.329274   82445 node.go:175] SDN Node failed: Failed to start plugin: /usr/bin/ovs-vsctl failed: '/usr/bin/ovs-vsctl --if-exists del-br br0 -- add-br br0 -- set Bridge br0 fail-mode=secure protocols=OpenFlow13': signal: terminated
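The manual reproduction above can be wrapped in a small script for repeated testing. This is only a sketch: it assumes SIGTERM is an acceptable stand-in for the OOM kill (as the reporter's skill 15 does), and the destructive lines are left commented so it is safe to dry-run on any host.

```shell
# Sketch: mimic the OOM killer by SIGTERM'ing the ovs-vswitchd monitor and
# worker processes, then restarting the node service. Dry-run by default.
pids=$(pgrep -x ovs-vswitchd || true)
echo "would terminate: ${pids:-<none found>}"
# kill -TERM $pids                            # uncomment on a disposable test node
# systemctl restart atomic-openshift-node     # expected to hang until the fix lands
```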
The fundamental problem seems to be that the openvswitch-nonetwork.service systemd service doesn't know how to monitor the state of the processes it kicks off. So when atomic-openshift-node is restarted, the dependency on openvswitch (which openvswitch-nonetwork is PartOf) appears to be satisfied, so it doesn't restart it.

The longer-term fix would be to fix the OVS service so it can monitor the state better, and I have kicked off a conversation about that. The short-term fix would be to make openvswitch-nonetwork.service immune to the OOM killer with a drop-in file (man systemd.unit):

$ mkdir /etc/systemd/system/openvswitch-nonetwork.service.d
$ cat > /etc/systemd/system/openvswitch-nonetwork.service.d/01-avoid-oom.conf <<EOF
# Avoid the OOM killer for us and our children
[Service]
OOMScoreAdjust=-1000
EOF
# systemctl daemon-reload
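The drop-in can be exercised in a scratch directory before touching /etc. A minimal sketch, assuming only standard coreutils; the mktemp path is a stand-in for the real target, /etc/systemd/system/openvswitch-nonetwork.service.d/:

```shell
# Sketch: write the OOM drop-in to a scratch directory and sanity-check it.
# On a real node the directory would be
# /etc/systemd/system/openvswitch-nonetwork.service.d/ and a
# `systemctl daemon-reload` would follow.
dropin_dir=$(mktemp -d)/openvswitch-nonetwork.service.d
mkdir -p "$dropin_dir"
cat > "$dropin_dir/01-avoid-oom.conf" <<'EOF'
# Avoid the OOM killer for us and our children
[Service]
OOMScoreAdjust=-1000
EOF
grep -q '^OOMScoreAdjust=-1000$' "$dropin_dir/01-avoid-oom.conf" && echo "drop-in OK"
```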
The systemd integration will be fixed upstream, but most probably can't be backported to OVS 2.4 or 2.5. It is not clear from the bz which OVS version is running; I suppose it is OVS 2.4, and we have fixes for the memory leaks available. Could you test it? http://download.eng.bos.redhat.com/brewroot/packages/openvswitch/2.4.1/1.git20160727.el7_2/
Also, please clarify whether the issue is related only to systemd or to the memory leak as well.
Re-opening this against OpenShift installer so that we can put the OOM score rule in place for OVS.
*** Bug 1379439 has been marked as a duplicate of this bug. ***
I deployed the above override to /etc/systemd/system/openvswitch.service.d/01-avoid-oom.conf, ran a daemon-reload, skill'ed the two OVS processes, and restarted atomic-openshift-node; we're still stuck in a loop:

Nov 02 08:48:17 m1.aos.example.com atomic-openshift-node[24220]: I1102 08:48:17.785889   24332 kubelet.go:2240] skipping pod synchronization - [SDN pod network is not ready]

Trying the rpm now, but I don't think the fix is working for containerized environments.
Thinking about it a little more, I think I misunderstood what the fix will do: by skill'ing the processes I'm still simulating the OOM kill, so the node service still gets stuck in the loop. I will proceed with deploying the OOM systemd override for now; please let us know if anything changes or when this can be removed.
https://github.com/openshift/openshift-ansible/pull/2700
Failed to verify with version openshift-ansible-3.4.25-1.git.0.eb2f314. Per Comment 13, the code has been merged and can be found in the rpm. The steps to reproduce are the same as the reporter's, and I get the same problem as Comment 11: it's still stuck in a loop.

[root@ocp ~]# ps aux | grep ovs-vswitchd
root     75069  0.0  0.0  46980   792 ?        S<s  05:03   0:00 ovs-vswitchd: monitoring pid 75070 (healthy)
root     75070  0.1  0.9 268744 35248 ?        S<Ll 05:03   0:00 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --mlockall --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach --monitor
[root@ocp ~]# skill 15 75069
[root@ocp ~]# skill 15 75070
[root@ocp ~]# systemctl restart atomic-openshift-node
Job for atomic-openshift-node.service failed because a timeout was exceeded. See "systemctl status atomic-openshift-node.service" and "journalctl -xe" for details.
[root@ocp ~]# journalctl -xe -u atomic-openshift-node
...
Nov 15 05:40:15 ocp.example.com atomic-openshift-node[78281]: I1115 05:40:15.105018   78281 kubelet.go:2237] skipping pod synchronization - [SDN pod network is not ready]
...
Wenkai Shi, please see comment #13; I believe we both had the same misunderstanding. Using skill forces the problem to happen again, so this fix will not help against that. The fix will, however, hopefully prevent the problem (which skill merely simulates) from occurring in the first place. If we can't use skill to reproduce, though, I am not sure how you would simulate the OOM.
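One way to steer OOM victim selection on a disposable test box, rather than skill'ing the daemon directly, is to mark a throwaway process as the killer's preferred target and then apply memory pressure. A hedged sketch; the stress step and the pressure needed are environment-dependent and are left as a comment:

```shell
# Sketch: mark a throwaway process as the OOM killer's preferred victim.
# oom_score_adj ranges from -1000 (never kill) to 1000 (kill first);
# raising it needs no privilege, lowering it needs CAP_SYS_RESOURCE.
sleep 300 &
victim=$!
echo 1000 > /proc/$victim/oom_score_adj
adj=$(cat /proc/$victim/oom_score_adj)
echo "victim $victim oom_score_adj=$adj"
# ...then generate memory pressure (e.g. with stress) and watch dmesg;
# the kernel should pick the marked process instead of ovs-vswitchd.
kill $victim
```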
Could not reproduce the OOM despite repeated memory-hungry actions. I used stress to start enough processes to consume all free memory, and the ovs-vswitchd process still worked fine; the kernel killed the stress processes instead. The PR has been merged.
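For what it's worth, the override can also be verified without forcing an actual OOM by reading the process's oom_score_adj directly. A sketch; the pgrep lookup assumes ovs-vswitchd is running, and falls back to the current shell so the snippet runs anywhere:

```shell
# Sketch: confirm a process is protected from the OOM killer by reading
# /proc/<pid>/oom_score_adj; -1000 means it will never be chosen.
pid=$(pgrep -xo ovs-vswitchd || echo $$)   # fall back to this shell for demo
adj=$(cat /proc/$pid/oom_score_adj)
echo "pid $pid has oom_score_adj=$adj"
```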
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:0066