Bug 1923753 - Increase initialDelaySeconds for ovs-daemons container in the ovs-node daemonset for upgrade scenarios [NEEDINFO]
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.5
Hardware: x86_64
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.7.0
Assignee: Aniket Bhat
QA Contact: Anurag saxena
URL:
Whiteboard:
Duplicates: 1921561 (view as bug list)
Depends On:
Blocks: 1924136
 
Reported: 2021-02-01 19:07 UTC by Sai Sindhur Malleni
Modified: 2021-06-18 07:24 UTC (History)
CC List: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1924136 (view as bug list)
Environment:
Last Closed: 2021-02-24 15:57:46 UTC
Target Upstream Version:
smalleni: needinfo? (anbhat)




Links
System ID Private Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 5786581 0 None None None 2021-02-08 20:34:29 UTC
Red Hat Product Errata RHSA-2020:5633 0 None None None 2021-02-24 15:58:06 UTC

Description Sai Sindhur Malleni 2021-02-01 19:07:58 UTC
Description of problem:

Currently, during an upgrade from 4.4.26 to 4.5.12, the upgrade is stuck on the roll-out of the ovs/ovn daemonsets. On one particular worker node the OVS pod is stuck in CrashLoopBackOff, and as a result ovnkube-node is unable to start either.

Increasing initialDelaySeconds allows the ovs container to come up correctly, and the upgrade proceeds after that.
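
For reference, a minimal sketch of how the delay could be bumped by hand while testing; the namespace, daemonset, and container names follow the standard OVN-Kubernetes layout, while the container index, the choice of readiness probe, and the 60-second value are only illustrative assumptions (not the shipped fix), and the cluster-network-operator will revert any manual patch on its next reconciliation:

  # Illustrative only: bump initialDelaySeconds on the ovs-daemons readiness probe.
  # The container index (0) and the value (60) are assumptions, not the actual fix;
  # the same approach applies if the delay lives on the liveness probe instead.
  oc -n openshift-ovn-kubernetes patch daemonset ovs-node --type=json \
    -p '[{"op": "replace", "path": "/spec/template/spec/containers/0/readinessProbe/initialDelaySeconds", "value": 60}]'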

Version-Release number of selected component (if applicable):
4.5.12

How reproducible:
Very likely on worker nodes in clusters with a large number of nodes.

Steps to Reproduce:
1. Deploy a large environment with OVNKubernetes
2. Perform an upgrade from 4.4.26 to 4.5.12
3.

Actual results:
Upgrade is stuck on the network operator due to the ovs daemonset rollout.

Expected results:
The upgrade should proceed and the ovs container should come up correctly without getting stuck in CrashLoopBackOff.

Additional info:
Logs from ovnkube-node
2021-02-01T15:46:28Z|08085|rconn|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:46:36Z|12358|rconn(ovn_pinctrl0)|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:46:36Z|08086|rconn|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:46:44Z|12359|rconn(ovn_pinctrl0)|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:46:44Z|08087|rconn|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:46:52Z|12360|rconn(ovn_pinctrl0)|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:46:52Z|08088|rconn|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:47:00Z|12361|rconn(ovn_pinctrl0)|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:47:00Z|08089|rconn|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:47:08Z|12362|rconn(ovn_pinctrl0)|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:47:08Z|08090|rconn|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:47:16Z|12363|rconn(ovn_pinctrl0)|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:47:16Z|08091|rconn|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:47:24Z|12364|rconn(ovn_pinctrl0)|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:47:24Z|08092|rconn|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:47:32Z|12365|rconn(ovn_pinctrl0)|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:47:32Z|08093|rconn|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:47:40Z|12366|rconn(ovn_pinctrl0)|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:47:40Z|08094|rconn|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:47:48Z|12367|rconn(ovn_pinctrl0)|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:47:48Z|08095|rconn|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:47:48Z|08096|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connected
2021-02-01T15:47:49Z|08097|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connection closed by peer
2021-02-01T15:47:56Z|12368|rconn(ovn_pinctrl0)|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:47:56Z|08098|rconn|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:48:04Z|12369|rconn(ovn_pinctrl0)|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:48:04Z|08099|rconn|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:48:12Z|12370|rconn(ovn_pinctrl0)|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:48:12Z|08100|rconn|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:48:20Z|12371|rconn(ovn_pinctrl0)|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:48:20Z|08101|rconn|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:48:28Z|12372|rconn(ovn_pinctrl0)|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:48:28Z|08102|rconn|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:48:36Z|12373|rconn(ovn_pinctrl0)|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:48:36Z|08103|rconn|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:48:44Z|12374|rconn(ovn_pinctrl0)|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:48:44Z|08104|rconn|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:48:52Z|12375|rconn(ovn_pinctrl0)|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
============================================================================
Logs from ovs pod
[kni@e16-h18-b03-fc640 ~]$ oc get pods -o wide | grep -i crash
ovnkube-node-kjgsx     1/2     CrashLoopBackOff   147        12h   192.168.220.44    worker031-fc640   <none>           <none>
ovs-node-9mj8m         0/1     CrashLoopBackOff   241        12h   192.168.220.44    worker031-fc640   <none>           <none>
[kni@e16-h18-b03-fc640 ~]$ oc logs ovs-node-9mj8m
Starting ovsdb-server.
Configuring Open vSwitch system IDs.
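
To check which delay the rollout is currently using on a cluster, something along these lines should work; the jsonpath filter on the container name is illustrative:

  # Show the probe settings of the ovs-daemons container in the ovs-node daemonset.
  oc -n openshift-ovn-kubernetes get daemonset ovs-node \
    -o jsonpath='{.spec.template.spec.containers[?(@.name=="ovs-daemons")].readinessProbe}'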

Comment 4 Aniket Bhat 2021-02-03 15:40:35 UTC
*** Bug 1921561 has been marked as a duplicate of this bug. ***

Comment 11 errata-xmlrpc 2021-02-24 15:57:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

