Bug 1849874
| Summary: | When CLO pod Evicted, the new CLO pod couldn't become leader. | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Qiaoling Tang <qitang> |
| Component: | Logging | Assignee: | Periklis Tsirakidis <periklis> |
| Status: | CLOSED ERRATA | QA Contact: | Qiaoling Tang <qitang> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.5 | CC: | aos-bugs, jcantril, periklis |
| Target Milestone: | --- | | |
| Target Release: | 4.6.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-10-27 16:08:40 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Qiaoling Tang
2020-06-23 05:44:16 UTC
After the Evicted pod is deleted, the new pod becomes leader.

This is a known issue in the Operator SDK's implementation of leader-for-life with the most current fix being in the latest version v0.18.1. CLO vendors operator-sdk v0.8.2.

Upstream:
Evicted pod do not release controller ConfigMap lock Bug 1749620
https://github.com/operator-framework/operator-sdk/issues/1874

Fix:
leader election bugfix: Delete evicted leader pods
https://github.com/operator-framework/operator-sdk/pull/2210

Fix for the fix:
leader: get most recent lock owner when attempting a claim
https://github.com/operator-framework/operator-sdk/pull/3059

[operator-sdk] Evicted pod do not release controller ConfigMap lock
https://bugzilla.redhat.com/show_bug.cgi?id=1749620

Note that the same issue affects the EO, which vendors the same version of operator-sdk.

Turns out the Operator SDK update is already underway in the upstream:
Migrate operator-sdk and deps to v0.18.1
https://github.com/openshift/cluster-logging-operator/pull/576

(In reply to Sergey Yedrikov from comment #2)
> Note that the same issue affects the EO, which vendors the same version of
> operator-sdk.
EO is upgraded to operator-sdk v0.18.1 already by:
https://github.com/openshift/elasticsearch-operator/pull/291

I am putting this to ON_QA because tech debt PRs are merged.

Verified with quay.io/openshift/origin-elasticsearch-operator@sha256:4843a5191961655e7913c0d55efd225ce502ad7e4b8f6ba5fe58cb995492317c and quay.io/openshift/origin-cluster-logging-operator@sha256:8a63a377f26afe46786b5f3cb94b908dcae071001f9e0ade2d82aa318a4f081a.

The operator-sdk version is v0.18.1.

The new pod could become leader:

$ oc logs -n openshift-operators-redhat elasticsearch-operator-85f596849b-7t5x8
{"level":"info","ts":1593311872.4003904,"logger":"cmd","msg":"Operator Version: 0.0.1"}
{"level":"info","ts":1593311872.4004252,"logger":"cmd","msg":"Go Version: go1.13.8"}
{"level":"info","ts":1593311872.40043,"logger":"cmd","msg":"Go OS/Arch: linux/amd64"}
{"level":"info","ts":1593311872.4004338,"logger":"cmd","msg":"Version of operator-sdk: v0.18.1"}
{"level":"info","ts":1593311872.4008894,"logger":"leader","msg":"Trying to become the leader."}
I0628 02:37:53.451944 1 request.go:621] Throttling request took 1.035717719s, request: GET:https://172.30.0.1:443/apis/operator.openshift.io/v1?timeout=32s
{"level":"info","ts":1593311874.6751797,"logger":"leader","msg":"Found existing lock","LockOwner":"elasticsearch-operator-85f596849b-vn2wv"}
{"level":"info","ts":1593311874.693283,"logger":"leader","msg":"Operator pod with leader lock has been evicted.","leader":"elasticsearch-operator-85f596849b-vn2wv"}
{"level":"info","ts":1593311874.693376,"logger":"leader","msg":"Deleting evicted leader."}
{"level":"info","ts":1593311875.8036397,"logger":"leader","msg":"Became the leader."}

$ oc logs cluster-logging-operator-58b6875d75-6jk99
{"level":"info","ts":1593311339.1577475,"logger":"cmd","msg":"Operator Version: 0.0.1"}
{"level":"info","ts":1593311339.1577837,"logger":"cmd","msg":"Go Version: go1.13.8"}
{"level":"info","ts":1593311339.15779,"logger":"cmd","msg":"Go OS/Arch: linux/amd64"}
{"level":"info","ts":1593311339.1577952,"logger":"cmd","msg":"Version of operator-sdk: v0.18.1"}
{"level":"info","ts":1593311339.1585007,"logger":"leader","msg":"Trying to become the leader."}
I0628 02:29:00.210226 1 request.go:621] Throttling request took 1.034629822s, request: GET:https://172.30.0.1:443/apis/logging.openshift.io/v1alpha1?timeout=32s
{"level":"info","ts":1593311341.4317126,"logger":"leader","msg":"Found existing lock with my name. I was likely restarted."}
{"level":"info","ts":1593311341.4317534,"logger":"leader","msg":"Continuing as the leader."}

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196
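For readers unfamiliar with leader-for-life, the behaviour discussed in this report can be modelled roughly as follows. This is a simplified sketch in Python, not the operator-sdk code (which is written in Go and retries in a loop): the `Pod` class, `is_evicted`, and `try_become_leader` are hypothetical names, and a plain dict stands in for the cluster. The log messages are the ones shown in the verification output above, except the "waiting" message, which is illustrative. The sketch assumes an evicted pod is one in phase "Failed" with reason "Evicted", and that the lock is only released once its owner pod is gone.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Pod:
    """Hypothetical, minimal stand-in for a Kubernetes pod."""
    name: str
    phase: str = "Running"  # "Failed" for a pod that was evicted
    reason: str = ""        # "Evicted" when the kubelet evicted the pod


def is_evicted(pod: Pod) -> bool:
    # Assumption: eviction shows up as phase "Failed" plus reason "Evicted".
    return pod.phase == "Failed" and pod.reason == "Evicted"


def try_become_leader(
    me: str, lock_owner: Optional[str], pods: Dict[str, Pod]
) -> Tuple[Optional[str], List[str]]:
    """Make one claim attempt; return (lock owner afterwards, log messages)."""
    msgs = ["Trying to become the leader."]
    if lock_owner is None:
        # No lock exists yet, so the claim succeeds immediately.
        msgs.append("Became the leader.")
        return me, msgs
    if lock_owner == me:
        msgs.append("Found existing lock with my name. I was likely restarted.")
        msgs.append("Continuing as the leader.")
        return me, msgs

    msgs.append("Found existing lock")
    owner = pods.get(lock_owner)
    if owner is not None and is_evicted(owner):
        # The fix: an evicted leader never goes away on its own, so delete it
        # to release the lock instead of waiting forever.
        msgs.append("Operator pod with leader lock has been evicted.")
        msgs.append("Deleting evicted leader.")
        del pods[lock_owner]
        owner = None
    if owner is None:
        # Owner pod is gone, so the lock is treated as released.
        msgs.append("Became the leader.")
        return me, msgs

    # Illustrative message: a healthy leader still holds the lock.
    msgs.append("Not the leader. Waiting.")
    return lock_owner, msgs


if __name__ == "__main__":
    # Pod names taken from the elasticsearch-operator log above.
    old = "elasticsearch-operator-85f596849b-vn2wv"
    new = "elasticsearch-operator-85f596849b-7t5x8"
    cluster = {old: Pod(old, "Failed", "Evicted"), new: Pod(new)}
    _, messages = try_become_leader(new, old, cluster)
    print("\n".join(messages))
```

Without the eviction branch, the new pod would fall through to the waiting case on every attempt, which matches the symptom in the summary: the evicted pod lingers, the lock stays held, and the replacement never becomes leader until someone deletes the evicted pod by hand.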