Bug 1923753

Summary: Increase initialDelaySeconds for ovs-daemons container in the ovs-node daemonset for upgrade scenarios
Product: OpenShift Container Platform
Reporter: Sai Sindhur Malleni <smalleni>
Component: Networking
Networking sub component: ovn-kubernetes
Assignee: Aniket Bhat <anbhat>
QA Contact: Anurag saxena <anusaxen>
Status: CLOSED ERRATA
Severity: high
Priority: unspecified
CC: akaris, anbhat, dblack, dcritch, dwilson, eminguez, mmethot, zzhao
Version: 4.5
Keywords: Upgrades
Target Milestone: ---
Target Release: 4.7.0
Hardware: x86_64
OS: Unspecified
Clones: 1924136 (view as bug list)
Last Closed: 2021-02-24 15:57:46 UTC
Type: Bug
Bug Blocks: 1924136    

Description Sai Sindhur Malleni 2021-02-01 19:07:58 UTC
Description of problem:

Currently, during an upgrade from 4.4.26 to 4.5.12, the upgrade gets stuck on the roll-out of the ovs/ovn daemonsets. On one particular worker node, the OVS pod is stuck in a crash loop and, as a result, ovnkube-node is unable to start either.

Increasing initialDelaySeconds lets the ovs container come up correctly, and the upgrade proceeds after that.
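
For reference, a minimal sketch of how to check the currently configured delay, assuming the default openshift-ovn-kubernetes namespace and that the delay in question sits on the ovs-daemons container's readiness/liveness probe (the summary names the container, not the probe). Note the cluster-network-operator manages the ovs-node daemonset, so a manual patch would normally be reconciled away; the actual fix has to land in the operator's manifests.

# Inspect the current initialDelaySeconds on the ovs-daemons container (check both probes)
oc -n openshift-ovn-kubernetes get ds ovs-node \
  -o jsonpath='{.spec.template.spec.containers[?(@.name=="ovs-daemons")].readinessProbe.initialDelaySeconds}{"\n"}'
oc -n openshift-ovn-kubernetes get ds ovs-node \
  -o jsonpath='{.spec.template.spec.containers[?(@.name=="ovs-daemons")].livenessProbe.initialDelaySeconds}{"\n"}'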

Version-Release number of selected component (if applicable):
4.5.12

How reproducible:
Very likely on worker nodes in clusters with a large number of nodes

Steps to Reproduce:
1. Deploy a large environment with OVNKubernetes
2. Perform an upgrade from 4.4.26 to 4.5.12

Actual results:
Upgrade is stuck on the network operator due to the ovs daemonset rollout
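
A minimal sketch of how to confirm it is the ovs rollout holding the network operator back (assuming the openshift-ovn-kubernetes namespace used by OVNKubernetes clusters):

# Operator status and the daemonsets that have stopped progressing
oc get clusteroperator network
oc -n openshift-ovn-kubernetes get daemonset ovs-node ovnkube-node
oc -n openshift-ovn-kubernetes get pods -o wide | grep -E 'ovs-node|ovnkube-node'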

Expected results:
The upgrade should proceed and the ovs container should come up correctly without getting stuck in CrashLoopBackOff

Additional info:
Logs from ovnkube-node
2021-02-01T15:46:28Z|08085|rconn|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:46:36Z|12358|rconn(ovn_pinctrl0)|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:46:36Z|08086|rconn|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:46:44Z|12359|rconn(ovn_pinctrl0)|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:46:44Z|08087|rconn|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:46:52Z|12360|rconn(ovn_pinctrl0)|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:46:52Z|08088|rconn|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:47:00Z|12361|rconn(ovn_pinctrl0)|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:47:00Z|08089|rconn|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:47:08Z|12362|rconn(ovn_pinctrl0)|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:47:08Z|08090|rconn|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:47:16Z|12363|rconn(ovn_pinctrl0)|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:47:16Z|08091|rconn|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:47:24Z|12364|rconn(ovn_pinctrl0)|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:47:24Z|08092|rconn|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:47:32Z|12365|rconn(ovn_pinctrl0)|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:47:32Z|08093|rconn|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:47:40Z|12366|rconn(ovn_pinctrl0)|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:47:40Z|08094|rconn|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:47:48Z|12367|rconn(ovn_pinctrl0)|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:47:48Z|08095|rconn|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:47:48Z|08096|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connected
2021-02-01T15:47:49Z|08097|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connection closed by peer
2021-02-01T15:47:56Z|12368|rconn(ovn_pinctrl0)|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:47:56Z|08098|rconn|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:48:04Z|12369|rconn(ovn_pinctrl0)|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:48:04Z|08099|rconn|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:48:12Z|12370|rconn(ovn_pinctrl0)|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:48:12Z|08100|rconn|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:48:20Z|12371|rconn(ovn_pinctrl0)|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:48:20Z|08101|rconn|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:48:28Z|12372|rconn(ovn_pinctrl0)|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:48:28Z|08102|rconn|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:48:36Z|12373|rconn(ovn_pinctrl0)|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:48:36Z|08103|rconn|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:48:44Z|12374|rconn(ovn_pinctrl0)|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:48:44Z|08104|rconn|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:48:52Z|12375|rconn(ovn_pinctrl0)|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refu
============================================================================
Logs from ovs pod
[kni@e16-h18-b03-fc640 ~]$ oc get pods -o wide | grep -i crash
ovnkube-node-kjgsx     1/2     CrashLoopBackOff   147        12h   192.168.220.44    worker031-fc640   <none>           <none>
ovs-node-9mj8m         0/1     CrashLoopBackOff   241        12h   192.168.220.44    worker031-fc640   <none>           <none>
[kni@e16-h18-b03-fc640 ~]$ oc logs ovs-node-9mj8m
Starting ovsdb-server.
Configuring Open vSwitch system IDs.
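
The ovs log stops right after the system-ID step, which is consistent with the container being restarted by its probe before OVS finishes starting. A minimal sketch of how to confirm that from the pod's events (assuming the openshift-ovn-kubernetes namespace; the pod name is the one from the listing above):

# Look for probe-failure / kill events on the crashlooping ovs pod
oc -n openshift-ovn-kubernetes describe pod ovs-node-9mj8m | grep -iA3 probe
oc -n openshift-ovn-kubernetes get events --field-selector involvedObject.name=ovs-node-9mj8m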

Comment 4 Aniket Bhat 2021-02-03 15:40:35 UTC
*** Bug 1921561 has been marked as a duplicate of this bug. ***

Comment 11 errata-xmlrpc 2021-02-24 15:57:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

Comment 12 Red Hat Bugzilla 2023-09-15 01:00:10 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days.