Bug 1860522 - OVN-kubernetes --- servers get stuck after reboot on ovnkube-node pods
Summary: OVN-kubernetes --- servers get stuck after reboot on ovnkube-node pods
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.6.0
Assignee: Dumitru Ceara
QA Contact: Ross Brattain
URL:
Whiteboard:
Duplicates: 1861087
Depends On: 1867183 1867185 1878099
Blocks:
Reported: 2020-07-24 22:07 UTC by Andreas Karis
Modified: 2020-12-08 01:56 UTC
CC List: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1867183
Environment:
Last Closed: 2020-10-27 16:17:14 UTC
Target Upstream Version:



Links
Github: openshift ovn-kubernetes pull 240 (status: closed), "[wip] Dockerfile: bump to ovn2.13-20.06.2-1.el7fdp and openvswitch2.13-2.13.0-46.el7fdp", last updated 2020-12-21 11:33:23 UTC
Red Hat Product Errata: RHBA-2020:4196, last updated 2020-10-27 16:20:16 UTC

Description Andreas Karis 2020-07-24 22:07:10 UTC
Description of problem:
OVN-kubernetes --- servers get stuck after reboot on ovnkube-node pods
The customer can reproduce this by rebooting the nodes on their 4.4.11 cluster.

Comment 11 Dan Williams 2020-07-29 18:31:43 UTC
Just a note: if we ever see anything about transaction failures or database inconsistency in the northd logs or anywhere else, we need to collect *all* the master DBs.

Comment 16 Dan Williams 2020-07-29 21:33:51 UTC
@Andreas, is the cluster still 4.4.11?

For those playing along at home, 4.4.11 has:

ovn2.13.x86_64 0:2.13.0-31.el7fdp
openvswitch2.13.x86_64 0:2.13.0-29.el7fdp

Comment 27 Dan Williams 2020-08-07 19:35:03 UTC
*** Bug 1861087 has been marked as a duplicate of this bug. ***

Comment 30 Ben Bennett 2020-08-24 14:49:16 UTC
Reopening so we can use this bug to update the OVS version and pick up the fix.

Comment 31 Dan Williams 2020-09-08 20:16:11 UTC
OCP 4.6 now uses RHEL 8 content, and openvswitch2.13-2.13.0-52.el8fdp is the latest version available in the OCP repos, so OCP 4.6 already has this fix.

We do *not* have this fix in earlier OCP versions yet, but getting it there is a simple matter of agreeing as a team that we are comfortable tagging the given OVS versions into OCP 4.4 and 4.5.

In any case, we'll get the fix when FDP 20.G ships at the end of September.

Comment 33 Ross Brattain 2020-09-14 14:10:18 UTC
Tested on 4.6.0-0.ci-2020-09-13-124145 with openvswitch2.13-2.13.0-52.el8fdp.x86_64

Rebooting a master succeeded; the cluster recovered and is healthy, with no "violations" in the ovnkube-master logs.

Blocked, waiting for the correct RPM versions to land in nightly builds.

Comment 34 Ross Brattain 2020-09-14 21:40:52 UTC
Verified on 4.6.0-0.nightly-2020-09-12-230035

Comment 36 errata-xmlrpc 2020-10-27 16:17:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

Comment 37 errata-xmlrpc 2020-10-27 16:20:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

