Description of problem:
OVN-Kubernetes: servers get stuck on the ovnkube-node pods after a reboot. The customer can reproduce this by rebooting the nodes of their 4.4.11 cluster.
Just a note: if we ever see anything about transaction failures or database inconsistency in the northd logs or anywhere else, we need to collect *all* the master DBs.
@Andreas, is the cluster still on 4.4.11? For those playing along at home, 4.4.11 has:
  ovn2.13.x86_64 0:2.13.0-31.el7fdp
  openvswitch2.13.x86_64 0:2.13.0-29.el7fdp
*** Bug 1861087 has been marked as a duplicate of this bug. ***
Reopening so we can use this bug to update the OVS version and pick up the fix.
OCP 4.6 now uses RHEL 8 content, and openvswitch2.13-2.13.0-52.el8fdp is the latest version available in the OCP repos, so OCP 4.6 already has this fix. Earlier OCP versions do *not* have it yet, but that is simply a matter of agreeing as a team that we are comfortable tagging the given OVS versions into OCP 4.4 and 4.5. Either way, we'll get the fix when FDP 20.G ships at the end of September.
Tested on 4.6.0-0.ci-2020-09-13-124145 with openvswitch2.13-2.13.0-52.el8fdp.x86_64. Rebooting a master succeeded; the cluster recovered and is healthy, with no "violations" in the ovnkube-master logs. Blocked waiting on the correct RPM versions in the nightly builds.
Verified on 4.6.0-0.nightly-2020-09-12-230035
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4196