Bug 1944180

Summary: OVN-Kube Master does not release election lock on shutdown
Product: OpenShift Container Platform
Component: Networking
Sub Component: ovn-kubernetes
Reporter: Tim Rozet <trozet>
Assignee: Dan Williams <dcbw>
QA Contact: Ross Brattain <rbrattai>
Status: CLOSED ERRATA
Severity: high
Priority: high
CC: rbrattai
Version: 4.6
Target Release: 4.8.0
Hardware: Unspecified
OS: Unspecified
Type: Bug
Clones: 1951064
Bug Blocks: 1943566, 1951064
Last Closed: 2021-07-27 22:56:24 UTC

Description Tim Rozet 2021-03-29 13:43:03 UTC
Description of problem:
We have seen in CI that when the active master goes down, the same master always comes back as leader. In a recent CI run, the master was down for around 2 minutes:

https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-gcp-ovn-upgrade/1375019219732664320

No other master took over leadership while the previously active leader was down. dcbw found that this is because we do not release the election lock when the ovnkube master exits.
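
With client-go leader election, the usual way to get release-on-shutdown behavior is to set ReleaseOnCancel in the LeaderElectionConfig so the elector gives up the lease when its context is cancelled. A minimal sketch, assuming a client-go LeaseLock; the namespace, lease name, timings, and the runLeaderElection function are illustrative, not taken from the ovn-kubernetes source:

package main

import (
	"context"
	"os"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
	"k8s.io/klog/v2"
)

// runLeaderElection blocks, running leader election until ctx is cancelled.
// id is this instance's identity (typically the pod or host name).
func runLeaderElection(ctx context.Context, client kubernetes.Interface, id string) {
	// Illustrative lock object; the real lock type, namespace, and name
	// used by ovnkube-master may differ.
	lock := &resourcelock.LeaseLock{
		LeaseMeta: metav1.ObjectMeta{
			Namespace: "openshift-ovn-kubernetes",
			Name:      "ovn-kubernetes-master",
		},
		Client: client.CoordinationV1(),
		LockConfig: resourcelock.ResourceLockConfig{
			Identity: id,
		},
	}

	leaderelection.RunOrDie(ctx, leaderelection.LeaderElectionConfig{
		Lock: lock,
		// Without ReleaseOnCancel, a terminating master holds the lease
		// until LeaseDuration expires, so no other master can take over
		// for up to that long -- the behavior described in this bug.
		ReleaseOnCancel: true,
		LeaseDuration:   60 * time.Second,
		RenewDeadline:   30 * time.Second,
		RetryPeriod:     5 * time.Second,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: func(ctx context.Context) {
				// Start the master controllers here.
			},
			OnStoppedLeading: func() {
				klog.Info("No longer leader; exiting")
				os.Exit(0)
			},
		},
	})
}

Without ReleaseOnCancel, a dead leader's lease simply times out after LeaseDuration, which is consistent with the multi-minute leadership gap seen in the CI run above.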

Comment 2 Ross Brattain 2021-04-09 03:52:11 UTC
Verified on 4.8.0-0.nightly-2021-04-08-005413

The leader changes after ovnkube-master is killed:

I0409 02:27:42.941540       1 ovnkube.go:121] Received signal terminated. Shutting down
I0409 02:27:42.941694       1 services_controller.go:164] Shutting down controller ovn-lb-controller
I0409 02:27:42.941794       1 reflector.go:225] Stopping reflector *v1.Service (0s) from k8s.io/client-go/informers/factory.go:134
I0409 02:27:42.941875       1 reflector.go:225] Stopping reflector *v1.Pod (0s) from k8s.io/client-go/informers/factory.go:134
I0409 02:27:42.941944       1 reflector.go:225] Stopping reflector *v1.EgressIP (0s) from github.com/openshift/ovn-kubernetes/go-controller/pkg/crd/egressip/v1/apis/informers/externalversions/factory.go:117
I0409 02:27:42.942193       1 reflector.go:225] Stopping reflector *v1beta1.CustomResourceDefinition (0s) from k8s.io/apiextensions-apiserver/pkg/client/informers/externalversions/factory.go:117
I0409 02:27:42.942240       1 reflector.go:225] Stopping reflector *v1.NetworkPolicy (0s) from k8s.io/client-go/informers/factory.go:134
I0409 02:27:42.942287       1 reflector.go:225] Stopping reflector *v1.Node (0s) from k8s.io/client-go/informers/factory.go:134
I0409 02:27:42.942326       1 reflector.go:225] Stopping reflector *v1.Service (0s) from k8s.io/client-go/informers/factory.go:134
I0409 02:27:42.942358       1 reflector.go:225] Stopping reflector *v1.Endpoints (0s) from k8s.io/client-go/informers/factory.go:134
I0409 02:27:42.942406       1 reflector.go:225] Stopping reflector *v1.Namespace (0s) from k8s.io/client-go/informers/factory.go:134
I0409 02:27:42.942445       1 reflector.go:225] Stopping reflector *v1beta1.EndpointSlice (0s) from k8s.io/client-go/informers/factory.go:134
I0409 02:27:42.959646       1 master.go:106] No longer leader; exiting
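
The "Received signal terminated" line above is where the release path has to start: the signal handler must cancel the context the leader elector runs under. A minimal sketch of that wiring, reusing the hypothetical runLeaderElection from the earlier sketch:

package main

import (
	"context"
	"os"
	"os/signal"
	"syscall"

	"k8s.io/klog/v2"
)

// setupSignalContext returns a context that is cancelled on SIGTERM/SIGINT.
// Cancelling this context makes a LeaderElector configured with
// ReleaseOnCancel: true release its lease before the process exits.
func setupSignalContext() context.Context {
	ctx, cancel := context.WithCancel(context.Background())
	ch := make(chan os.Signal, 1)
	signal.Notify(ch, syscall.SIGTERM, syscall.SIGINT)
	go func() {
		sig := <-ch
		klog.Infof("Received signal %s. Shutting down", sig)
		cancel()
	}()
	return ctx
}

Wired together as runLeaderElection(setupSignalContext(), client, id), a SIGTERM cancels the election context and releases the lease, so another master can take over immediately instead of waiting for the lease to expire.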

Comment 5 errata-xmlrpc 2021-07-27 22:56:24 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438