Bug 2048563 - Leader election conventions for cluster topology
Summary: Leader election conventions for cluster topology
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: OLM
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.11.0
Assignee: Ehila
QA Contact: Jian Zhang
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-01-31 14:16 UTC by Ehila
Modified: 2022-08-10 10:46 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The package server was not topology aware when defining its leader election lease duration, renewal deadline, and retry period.
Consequence: The package server created unnecessary strain on topologies with limited resources, such as single-node environments.
Fix: Introduced a leaderElection package that is topology aware, reducing strain on clusters with limited resources.
Result: The package server is topology aware and sets a reasonable lease duration, renewal deadline, and retry period for the topology.
Clone Of:
Environment:
Last Closed: 2022-08-10 10:45:53 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Github openshift operator-framework-olm pull 228 0 None open Bug 2048563: feat added leader election conventions 2022-01-31 14:38:28 UTC
Red Hat Product Errata RHSA-2022:5069 0 None None None 2022-08-10 10:46:22 UTC

Description Ehila 2022-01-31 14:16:04 UTC
Description of problem:

Updated leader election to follow the OpenShift conventions and to be cluster topology aware. Taken in aggregate with the same change in other operators, this should provide a slight performance improvement on SNO (single-node OpenShift) clusters and on resource-limited SNO DU clusters.

Conventions defined:

https://github.com/openshift/enhancements/blob/master/CONVENTIONS.md#high-availability
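As a rough illustration of what "topology aware" means here, the sketch below picks leader election timings from the control plane topology string. It is not the code from the linked pull request: `Timings` and `timingsFor` are hypothetical names, and only the 270s lease duration for single-node clusters is confirmed by this bug. The other numbers (240s/60s for single node, 137s/107s/26s otherwise) are assumptions recalled from the conventions document and should be checked against it.

```go
package main

import (
	"fmt"
	"time"
)

// Timings groups the three leader election settings named in this bug.
type Timings struct {
	LeaseDuration time.Duration
	RenewDeadline time.Duration
	RetryPeriod   time.Duration
}

// timingsFor is a hypothetical helper, not the OLM implementation.
// topology is the value of .status.controlPlaneTopology on the
// "cluster" Infrastructure object, e.g. "SingleReplica".
func timingsFor(topology string) Timings {
	if topology == "SingleReplica" {
		// 270s matches the lease observed in this bug; the renew
		// deadline and retry period are assumed values.
		return Timings{
			LeaseDuration: 270 * time.Second,
			RenewDeadline: 240 * time.Second,
			RetryPeriod:   60 * time.Second,
		}
	}
	// Assumed defaults for highly available control planes.
	return Timings{
		LeaseDuration: 137 * time.Second,
		RenewDeadline: 107 * time.Second,
		RetryPeriod:   26 * time.Second,
	}
}

func main() {
	for _, topo := range []string{"SingleReplica", "HighlyAvailable"} {
		t := timingsFor(topo)
		fmt.Printf("%-16s lease=%v renew=%v retry=%v\n",
			topo, t.LeaseDuration, t.RenewDeadline, t.RetryPeriod)
	}
}
```

Longer lease and retry intervals on a single node mean fewer writes to the lock object, which is where the reduced strain comes from; with only one replica there is no standby waiting to take over, so slower failover costs nothing.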

Comment 4 Jian Zhang 2022-02-24 08:32:00 UTC
1. Create an SNO cluster from a build that includes the fix PR.

[cloud-user@preserve-olm-env jian]$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-02-23-185405   True        False         4h4m    Cluster version is 4.11.0-0.nightly-2022-02-23-185405

[cloud-user@preserve-olm-env jian]$ oc get infrastructure cluster -o=jsonpath={.status.controlPlaneTopology}
SingleReplica

[cloud-user@preserve-olm-env jian]$ oc get node
NAME                                         STATUS   ROLES           AGE     VERSION
ip-10-0-153-136.us-east-2.compute.internal   Ready    master,worker   4h24m   v1.23.3+fe7796f

[cloud-user@preserve-olm-env jian]$ oc exec catalog-operator-7f65bd4697-7swnp -- olm --version
OLM version: 0.19.0
git commit: 6858269bdc4b31466ff5eca7d6287fe387077fa7


126 2022-02-24T08:18:06.820Z        INFO    controllers.packageserver       currently topology mode {"csv": "openshift-operator-lifecycle-manager/packageserver", "highly available": false}

2. Check that `leaseDurationSeconds` changed to 270s.
[cloud-user@preserve-olm-env jian]$ oc get cm packageserver-controller-lock -o yaml
apiVersion: v1
kind: ConfigMap
metadata:
  annotations:
    control-plane.alpha.kubernetes.io/leader: '{"holderIdentity":"package-server-manager-587c7d499f-5v4f8_ebadc75b-bc25-49d9-9090-79c11135cf75","leaseDurationSeconds":270,"acquireTime":"2022-02-24T03:54:37Z","renewTime":"2022-02-24T08:14:33Z","leaderTransitions":0}'
  creationTimestamp: "2022-02-24T03:54:37Z"
  name: packageserver-controller-lock
  namespace: openshift-operator-lifecycle-manager
  resourceVersion: "75149"
  uid: 7870db39-b92a-4faa-93e7-c83b66d9f877
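The same check can be scripted. The following is a minimal sketch that decodes the JSON stored in the `control-plane.alpha.kubernetes.io/leader` annotation shown above and compares `leaseDurationSeconds` with the expected value. `leaderRecord` and `parseLeaderAnnotation` are hypothetical names introduced for illustration; the struct only mirrors the fields visible in the output above and is not the Kubernetes client type.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// leaderRecord mirrors the JSON fields visible in the annotation value.
type leaderRecord struct {
	HolderIdentity       string `json:"holderIdentity"`
	LeaseDurationSeconds int    `json:"leaseDurationSeconds"`
	AcquireTime          string `json:"acquireTime"`
	RenewTime            string `json:"renewTime"`
	LeaderTransitions    int    `json:"leaderTransitions"`
}

// parseLeaderAnnotation decodes the raw annotation string into a record.
func parseLeaderAnnotation(raw string) (leaderRecord, error) {
	var rec leaderRecord
	err := json.Unmarshal([]byte(raw), &rec)
	return rec, err
}

func main() {
	// Annotation value copied from the ConfigMap output in this comment.
	raw := `{"holderIdentity":"package-server-manager-587c7d499f-5v4f8_ebadc75b-bc25-49d9-9090-79c11135cf75","leaseDurationSeconds":270,"acquireTime":"2022-02-24T03:54:37Z","renewTime":"2022-02-24T08:14:33Z","leaderTransitions":0}`

	rec, err := parseLeaderAnnotation(raw)
	if err != nil {
		panic(err)
	}
	const wantSeconds = 270 // expected lease duration on an SNO cluster
	fmt.Printf("lease=%ds matches=%t\n",
		rec.LeaseDurationSeconds, rec.LeaseDurationSeconds == wantSeconds)
}
```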

3. Check that the package server still works.
[cloud-user@preserve-olm-env jian]$ oc get packagemanifest
NAME                                                CATALOG               AGE
ibm-security-verify-operator                        Certified Operators   4h30m
openshift-qiskit-operator                           Community Operators   4h30m
...

LGTM, verified.

Comment 6 errata-xmlrpc 2022-08-10 10:45:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069

