Bug 1923753 - Increase initialDelaySeconds for ovs-daemons container in the ovs-node daemonset for upgrade scenarios
Summary: Increase initialDelaySeconds for ovs-daemons container in the ovs-node daemonset for upgrade scenarios
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.5
Hardware: x86_64
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.7.0
Assignee: Aniket Bhat
QA Contact: Anurag saxena
URL:
Whiteboard:
Duplicates: 1921561
Depends On:
Blocks: 1924136
 
Reported: 2021-02-01 19:07 UTC by Sai Sindhur Malleni
Modified: 2024-06-14 00:06 UTC (History)
8 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1924136
Environment:
Last Closed: 2021-02-24 15:57:46 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 5786581 0 None None None 2021-02-08 20:34:29 UTC
Red Hat Product Errata RHSA-2020:5633 0 None None None 2021-02-24 15:58:06 UTC

Description Sai Sindhur Malleni 2021-02-01 19:07:58 UTC
Description of problem:

During an upgrade from 4.4.26 to 4.5.12, the upgrade is stuck on the roll-out of the ovs/ovn daemonsets. On one particular worker node, the OVS pod is stuck in a crash loop, and as a result ovnkube-node is unable to start either.

Increasing initialDelaySeconds lets the ovs container come up correctly, and the upgrade proceeds after that.
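
A minimal sketch of how one might check and temporarily bump that delay with oc is below; the probe type (readinessProbe), the container index, and the 30-second value are assumptions, and since the daemonset is managed by the cluster-network-operator a manual patch like this would be reverted, so it is only useful to test the theory:

# Check the current delay on the ovs-daemons container (assumes the delay is on the readiness probe)
oc -n openshift-ovn-kubernetes get daemonset ovs-node \
  -o jsonpath='{.spec.template.spec.containers[?(@.name=="ovs-daemons")].readinessProbe.initialDelaySeconds}'

# Temporarily raise the delay to 30s (illustrative value; container index 0 is an assumption)
oc -n openshift-ovn-kubernetes patch daemonset ovs-node --type=json \
  -p '[{"op":"replace","path":"/spec/template/spec/containers/0/readinessProbe/initialDelaySeconds","value":30}]'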

Version-Release number of selected component (if applicable):
4.5.12

How reproducible:
Very likely on worker nodes in clusters with a large number of nodes

Steps to Reproduce:
1. Deploy a large environment with OVNKubernetes
2. Perform an upgrade from 4.4.26 to 4.5.12

Actual results:
Upgrade is stuck on the network operator due to the ovs daemonset rollout
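
A few suggested commands (not output captured from this cluster) to confirm that state:

oc get clusterversion                                  # upgrade hangs partway through the 4.5.12 rollout
oc get clusteroperator network                         # network operator stays Progressing/Degraded
oc -n openshift-ovn-kubernetes get daemonset ovs-node  # READY lags DESIRED while the rollout is stalled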

Expected results:
The upgrade should proceed and the ovs container should come up correctly without getting stuck in CrashLoopBackOff

Additional info:
Logs from ovnkube-node
2021-02-01T15:46:28Z|08085|rconn|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:46:36Z|12358|rconn(ovn_pinctrl0)|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:46:36Z|08086|rconn|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:46:44Z|12359|rconn(ovn_pinctrl0)|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:46:44Z|08087|rconn|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:46:52Z|12360|rconn(ovn_pinctrl0)|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:46:52Z|08088|rconn|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:47:00Z|12361|rconn(ovn_pinctrl0)|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:47:00Z|08089|rconn|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:47:08Z|12362|rconn(ovn_pinctrl0)|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:47:08Z|08090|rconn|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:47:16Z|12363|rconn(ovn_pinctrl0)|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:47:16Z|08091|rconn|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:47:24Z|12364|rconn(ovn_pinctrl0)|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:47:24Z|08092|rconn|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:47:32Z|12365|rconn(ovn_pinctrl0)|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:47:32Z|08093|rconn|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:47:40Z|12366|rconn(ovn_pinctrl0)|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:47:40Z|08094|rconn|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:47:48Z|12367|rconn(ovn_pinctrl0)|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:47:48Z|08095|rconn|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:47:48Z|08096|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connected
2021-02-01T15:47:49Z|08097|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connection closed by peer
2021-02-01T15:47:56Z|12368|rconn(ovn_pinctrl0)|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:47:56Z|08098|rconn|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:48:04Z|12369|rconn(ovn_pinctrl0)|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:48:04Z|08099|rconn|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:48:12Z|12370|rconn(ovn_pinctrl0)|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:48:12Z|08100|rconn|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:48:20Z|12371|rconn(ovn_pinctrl0)|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:48:20Z|08101|rconn|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:48:28Z|12372|rconn(ovn_pinctrl0)|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:48:28Z|08102|rconn|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:48:36Z|12373|rconn(ovn_pinctrl0)|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:48:36Z|08103|rconn|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:48:44Z|12374|rconn(ovn_pinctrl0)|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:48:44Z|08104|rconn|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refused)
2021-02-01T15:48:52Z|12375|rconn(ovn_pinctrl0)|WARN|unix:/var/run/openvswitch/br-int.mgmt: connection failed (Connection refu
============================================================================
Logs from ovs pod
[kni@e16-h18-b03-fc640 ~]$ oc get pods -o wide | grep -i crash
ovnkube-node-kjgsx     1/2     CrashLoopBackOff   147        12h   192.168.220.44    worker031-fc640   <none>           <none>
ovs-node-9mj8m         0/1     CrashLoopBackOff   241        12h   192.168.220.44    worker031-fc640   <none>           <none>
[kni@e16-h18-b03-fc640 ~]$ oc logs ovs-node-9mj8m
Starting ovsdb-server.
Configuring Open vSwitch system IDs.
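
The log stops there, presumably because the container exits or is restarted before OVS finishes starting. A few suggested follow-up checks (not captured from this cluster; the pod name comes from the output above):

oc describe pod ovs-node-9mj8m        # Events section shows probe failures and restart reasons
oc logs --previous ovs-node-9mj8m     # logs from the previous (terminated) container instance
oc get events --field-selector involvedObject.name=ovs-node-9mj8m   # Unhealthy/Killing/BackOff events from the kubelet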

Comment 4 Aniket Bhat 2021-02-03 15:40:35 UTC
*** Bug 1921561 has been marked as a duplicate of this bug. ***

Comment 11 errata-xmlrpc 2021-02-24 15:57:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

Comment 12 Red Hat Bugzilla 2023-09-15 01:00:10 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days

