Bug 1944180 - OVN-Kube Master does not release election lock on shutdown
Summary: OVN-Kube Master does not release election lock on shutdown
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.8.0
Assignee: Dan Williams
QA Contact: Ross Brattain
URL:
Whiteboard:
Depends On:
Blocks: 1943566 1951064
 
Reported: 2021-03-29 13:43 UTC by Tim Rozet
Modified: 2021-07-27 22:56 UTC
CC List: 1 user

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Cloned As: 1951064
Environment:
Last Closed: 2021-07-27 22:56:24 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github openshift ovn-kubernetes | pull 480 | 0 | None | closed | Bug 1944180: 3-30-21 merge | 2021-05-17 17:51:45 UTC
Github ovn-org ovn-kubernetes | pull 2140 | 0 | None | closed | master: cancel leader election on exit | 2021-03-29 13:44:25 UTC
Red Hat Product Errata | RHSA-2021:2438 | 0 | None | None | None | 2021-07-27 22:56:50 UTC

Description Tim Rozet 2021-03-29 13:43:03 UTC
Description of problem:
We have seen in CI that when the leading master goes down, the same master always comes back as leader. In a recent CI run, that master was down for around 2 minutes:

https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-gcp-ovn-upgrade/1375019219732664320

No other master took over leadership while the previous leader was down. dcbw found that this is because we do not release the lock when the ovnkube master exits.

Comment 2 Ross Brattain 2021-04-09 03:52:11 UTC
Verified on 4.8.0-0.nightly-2021-04-08-005413

The leader changes after the ovnkube-master is killed.

I0409 02:27:42.941540       1 ovnkube.go:121] Received signal terminated. Shutting down
I0409 02:27:42.941694       1 services_controller.go:164] Shutting down controller ovn-lb-controller
I0409 02:27:42.941794       1 reflector.go:225] Stopping reflector *v1.Service (0s) from k8s.io/client-go/informers/factory.go:134
I0409 02:27:42.941875       1 reflector.go:225] Stopping reflector *v1.Pod (0s) from k8s.io/client-go/informers/factory.go:134
I0409 02:27:42.941944       1 reflector.go:225] Stopping reflector *v1.EgressIP (0s) from github.com/openshift/ovn-kubernetes/go-controller/pkg/crd/egressip/v1/apis/informers/externalversions/factory.go:117
I0409 02:27:42.942193       1 reflector.go:225] Stopping reflector *v1beta1.CustomResourceDefinition (0s) from k8s.io/apiextensions-apiserver/pkg/client/informers/externalversions/factory.go:117
I0409 02:27:42.942240       1 reflector.go:225] Stopping reflector *v1.NetworkPolicy (0s) from k8s.io/client-go/informers/factory.go:134
I0409 02:27:42.942287       1 reflector.go:225] Stopping reflector *v1.Node (0s) from k8s.io/client-go/informers/factory.go:134
I0409 02:27:42.942326       1 reflector.go:225] Stopping reflector *v1.Service (0s) from k8s.io/client-go/informers/factory.go:134
I0409 02:27:42.942358       1 reflector.go:225] Stopping reflector *v1.Endpoints (0s) from k8s.io/client-go/informers/factory.go:134
I0409 02:27:42.942406       1 reflector.go:225] Stopping reflector *v1.Namespace (0s) from k8s.io/client-go/informers/factory.go:134
I0409 02:27:42.942445       1 reflector.go:225] Stopping reflector *v1beta1.EndpointSlice (0s) from k8s.io/client-go/informers/factory.go:134
I0409 02:27:42.959646       1 master.go:106] No longer leader; exiting

Comment 5 errata-xmlrpc 2021-07-27 22:56:24 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

