Description of problem:
Out of memory event killing the ovs-vswitchd process causes atomic-openshift-node to enter a restart loop.

Version-Release number of selected component (if applicable): 3.1.1.6

How reproducible: 100%

Steps to Reproduce:
[root@ose ~]# ps aux | grep ovs-vswitchd
root      2405  0.0  0.0  49148   796 ?        S<s  17:59   0:00 ovs-vswitchd: monitoring pid 2406 (healthy)
root      2406  0.2  0.9 270996 35324 ?        S<Ll 17:59   0:00 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --mlockall --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach --monitor
[root@ose ~]# skill 15 2405
[root@ose ~]# skill 15 2406
[root@ose ~]# systemctl restart atomic-openshift-node

Actual results:
Node stuck in a restart loop.

Expected results:
When the atomic-openshift-node service restarts, it also kicks off a restart of openvswitch and the node starts up fine.

Additional info:
To remedy this, the openvswitch.service must be restarted first, before restarting the atomic-openshift-node service:
# systemctl restart openvswitch
# systemctl restart atomic-openshift-node

Log messages:
kernel: Out of memory: Kill process 1087 (ovs-vswitchd) score 2 or sacrifice child
atomic-openshift-node[82445]: I0427 14:24:37.047317   82445 common.go:236] Waiting for SDN pod network to be ready...
systemd[1]: atomic-openshift-node.service start operation timed out. Terminating.
ovs-vsctl[82510]: ovs|00002|fatal_signal|WARN|terminating with signal 15 (Terminated)
atomic-openshift-node[82445]: F0427 14:24:40.329274   82445 node.go:175] SDN Node failed: Failed to start plugin: /usr/bin/ovs-vsctl failed: '/usr/bin/ovs-vsctl --if-exists del-br br0 -- add-br br0 -- set Bridge br0 fail-mode=secure protocols=OpenFlow13': signal: terminated
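The manual reproduction above can be wrapped in a small script for repeated testing. This is only a sketch: it assumes SIGTERM is an acceptable stand-in for the OOM kill (as the reporter's skill 15 does), and the destructive lines are left commented so it is safe to dry-run on any host.

```shell
# Sketch: mimic the OOM killer by SIGTERM'ing the ovs-vswitchd monitor and
# worker processes, then restarting the node service. Dry-run by default.
pids=$(pgrep -x ovs-vswitchd || true)
echo "would terminate: ${pids:-<none found>}"
# kill -TERM $pids                            # uncomment on a disposable test node
# systemctl restart atomic-openshift-node     # expected to hang until the fix lands
```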
The fundamental problem seems to be that the openvswitch-nonetwork.service systemd service doesn't know how to monitor the state of the processes it kicks off. So when atomic-openshift-node is restarted, the dependency on openvswitch (which openvswitch-nonetwork is PartOf) appears to be satisfied, so it doesn't restart it.

The longer-term fix would be to fix the OVS service so it can monitor the state better, and I have kicked off a conversation about that. The short-term fix would be to make openvswitch-nonetwork.service immune to the OOM killer with a drop-in file (man systemd.unit):

$ mkdir /etc/systemd/system/openvswitch-nonetwork.service.d
$ cat > /etc/systemd/system/openvswitch-nonetwork.service.d/01-avoid-oom.conf <<EOF
# Avoid the OOM killer for us and our children
[Service]
OOMScoreAdjust=-1000
EOF
# systemctl daemon-reload
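The drop-in can be exercised in a scratch directory before touching /etc. A minimal sketch, assuming only standard coreutils; the mktemp path is a stand-in for the real target, /etc/systemd/system/openvswitch-nonetwork.service.d/:

```shell
# Sketch: write the OOM drop-in to a scratch directory and sanity-check it.
# On a real node the directory would be
# /etc/systemd/system/openvswitch-nonetwork.service.d/ and a
# `systemctl daemon-reload` would follow.
dropin_dir=$(mktemp -d)/openvswitch-nonetwork.service.d
mkdir -p "$dropin_dir"
cat > "$dropin_dir/01-avoid-oom.conf" <<'EOF'
# Avoid the OOM killer for us and our children
[Service]
OOMScoreAdjust=-1000
EOF
grep -q '^OOMScoreAdjust=-1000$' "$dropin_dir/01-avoid-oom.conf" && echo "drop-in OK"
```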
The systemd integration will be fixed upstream, but most probably can't be backported to OVS 2.4 or 2.5. It is not clear from the bz which OVS version is running; I suppose it is OVS 2.4, and we have fixes for the memory leaks available. Could you test it? http://download.eng.bos.redhat.com/brewroot/packages/openvswitch/2.4.1/1.git20160727.el7_2/
Also, please clarify whether the issue is related only to systemd or to the memory leak as well.
Re-opening this against OpenShift installer so that we can put the OOM score rule in place for OVS.
*** Bug 1379439 has been marked as a duplicate of this bug. ***
I deployed the above override to /etc/systemd/system/openvswitch.service.d/01-avoid-oom.conf, ran a daemon-reload, skill'ed the two OVS processes, and restarted atomic-openshift-node; we're still stuck in a loop:

Nov 02 08:48:17 m1.aos.example.com atomic-openshift-node[24220]: I1102 08:48:17.785889   24332 kubelet.go:2240] skipping pod synchronization - [SDN pod network is not ready]

Trying the rpm now, but I don't think the fix is working for containerized environments.
Thinking about it a little more, I think I misunderstood what the fix will do: by skill'ing the processes I'm still simulating the OOM kill, so the node service still gets stuck in the loop. I will proceed with deploying the OOM systemd override for now; please let us know if anything changes or when this can be removed.
https://github.com/openshift/openshift-ansible/pull/2700
Failed to verify with version openshift-ansible-3.4.25-1.git.0.eb2f314. Per Comment 13, the code has been merged and can be found in the rpm. The steps to reproduce are the same as the reporter's, and I get the same problem as Comment 11: it's still stuck in a loop.

[root@ocp ~]# ps aux | grep ovs-vswitchd
root     75069  0.0  0.0  46980   792 ?        S<s  05:03   0:00 ovs-vswitchd: monitoring pid 75070 (healthy)
root     75070  0.1  0.9 268744 35248 ?        S<Ll 05:03   0:00 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --mlockall --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach --monitor
[root@ocp ~]# skill 15 75069
[root@ocp ~]# skill 15 75070
[root@ocp ~]# systemctl restart atomic-openshift-node
Job for atomic-openshift-node.service failed because a timeout was exceeded. See "systemctl status atomic-openshift-node.service" and "journalctl -xe" for details.
[root@ocp ~]# journalctl -xe -u atomic-openshift-node
...
Nov 15 05:40:15 ocp.example.com atomic-openshift-node[78281]: I1115 05:40:15.105018   78281 kubelet.go:2237] skipping pod synchronization - [SDN pod network is not ready]
...
Wenkai Shi, please see comment #13; I believe we both had the same misunderstanding. Using skill forces the problem to happen again, so this fix will not help against that. The fix will, however, hopefully prevent the problem (which skill merely simulates) from occurring in the first place. If we can't use skill to reproduce, though, I am not sure how you would simulate the OOM.
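One way to steer OOM victim selection on a disposable test box, rather than skill'ing the daemon directly, is to mark a throwaway process as the killer's preferred target and then apply memory pressure. A hedged sketch; the stress step and the pressure needed are environment-dependent and are left as a comment:

```shell
# Sketch: mark a throwaway process as the OOM killer's preferred victim.
# oom_score_adj ranges from -1000 (never kill) to 1000 (kill first);
# raising it needs no privilege, lowering it needs CAP_SYS_RESOURCE.
sleep 300 &
victim=$!
echo 1000 > /proc/$victim/oom_score_adj
adj=$(cat /proc/$victim/oom_score_adj)
echo "victim $victim oom_score_adj=$adj"
# ...then generate memory pressure (e.g. with stress) and watch dmesg;
# the kernel should pick the marked process instead of ovs-vswitchd.
kill $victim
```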
Could not reproduce the OOM despite repeated memory-hungry actions. I used stress to start enough processes to consume all free memory, and the ovs-vswitchd process still worked fine; the kernel killed the stress processes instead. The PR has been merged.
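For what it's worth, the override can also be verified without forcing an actual OOM by reading the process's oom_score_adj directly. A sketch; the pgrep lookup assumes ovs-vswitchd is running, and falls back to the current shell so the snippet runs anywhere:

```shell
# Sketch: confirm a process is protected from the OOM killer by reading
# /proc/<pid>/oom_score_adj; -1000 means it will never be chosen.
pid=$(pgrep -xo ovs-vswitchd || echo $$)   # fall back to this shell for demo
adj=$(cat /proc/$pid/oom_score_adj)
echo "pid $pid has oom_score_adj=$adj"
```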
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:0066