Bug 1905299
Summary: | OLM fails to update operator | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Alexey Kazakov <alkazako> |
Component: | OLM | Assignee: | Vu Dinh <vdinh> |
OLM sub component: | OLM | QA Contact: | kuiwang |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | medium | ||
Priority: | high | CC: | krizza, mjobanek, pneedle, sgutz, vdinh |
Version: | 4.5 | ||
Target Milestone: | --- | ||
Target Release: | 4.7.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: |
Cause: Previously, Operator updates could result in Operator pods being deployed before a new service account was created.
Consequence: The pod could be deployed with the existing service account and would then fail to start because of insufficient permissions.
Fix: A check has been added to verify that a new service account exists before the cluster service version (CSV) is moved from a `Pending` to `Installing` state.
Result: If a new service account does not exist, the CSV remains in a `Pending` state, which prevents the deployment from being updated.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2021-02-24 15:40:33 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1907586 |
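The Doc Text above describes the fix as a gate on the `Pending` to `Installing` transition. A minimal sketch of that gate, assuming the CSV's required service account names and the set of service accounts already present in the namespace are known, might look like the following. The `nextPhase` function, its inputs, and the service account names are hypothetical illustrations; the actual OLM change may be structured differently.

```go
package main

import (
	"fmt"
	"sort"
)

// Phase is a simplified stand-in for a ClusterServiceVersion phase (hypothetical type).
type Phase string

const (
	PhasePending    Phase = "Pending"
	PhaseInstalling Phase = "Installing"
)

// nextPhase (hypothetical) decides whether a CSV in Pending may advance to Installing.
// requiredSAs: service accounts the CSV's deployments will run as (assumed input).
// existingSAs: service accounts currently present in the namespace (assumed input).
// It returns the next phase plus a sorted list of any missing service accounts.
func nextPhase(current Phase, requiredSAs []string, existingSAs map[string]bool) (Phase, []string) {
	// Only the Pending -> Installing transition is gated here.
	if current != PhasePending {
		return current, nil
	}
	var missing []string
	for _, sa := range requiredSAs {
		if !existingSAs[sa] {
			missing = append(missing, sa)
		}
	}
	if len(missing) > 0 {
		// Stay in Pending so the deployment is not updated yet.
		sort.Strings(missing)
		return PhasePending, missing
	}
	return PhaseInstalling, nil
}

func main() {
	required := []string{"host-operator", "host-operator-v2"} // hypothetical names
	existing := map[string]bool{"default": true, "host-operator": true}

	phase, missing := nextPhase(PhasePending, required, existing)
	fmt.Println(phase, missing) // Pending [host-operator-v2]

	// Once the new service account has been created, the CSV may proceed.
	existing["host-operator-v2"] = true
	phase, missing = nextPhase(PhasePending, required, existing)
	fmt.Println(phase, missing) // Installing []
}
```

Keeping the CSV in `Pending` until every required service account exists avoids the race reported below, in which the new pod started under the old service account before the new one was created.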
Description
Alexey Kazakov
2020-12-08 01:32:14 UTC
This is quite a strange scenario. It seems that, for some reason, the new SA is created after the new deployment pod has already started spinning up. If you delete the failed pod, the ReplicaSet spins up a new pod and it succeeds.

Hi @vdinh, thanks a lot for looking at this issue. Is there any progress on it? As I'm sure you understand, we cannot delete the failed pod on every operator update. This issue is becoming critical: it also affects our production OSD cluster (OpenShift version 4.5.16).

@vdinh, could you please give me an update on this?

Since this is holding up a release and is going to impact revenue unless we get it resolved, can we get a bit more insight into the problem and when it is going to be fixed? If we're not asking the right question, please let us know what the right question is.

Verified on 4.7. LGTM.

```
[root@preserve-olm-env 1905299]# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2020-12-14-165231   True        False         93m     Cluster version is 4.7.0-0.nightly-2020-12-14-165231
[root@preserve-olm-env 1905299]# oc get pod -n openshift-operator-lifecycle-manager
NAME                                READY   STATUS    RESTARTS   AGE
catalog-operator-66cf979978-k58km   1/1     Running   0          89m
olm-operator-55d756959-v9vzn        1/1     Running   0          89m
packageserver-597d7f4fb-jckjw       1/1     Running   0          89m
packageserver-597d7f4fb-kgtsj       1/1     Running   0          90m
[root@preserve-olm-env 1905299]# oc exec catalog-operator-66cf979978-k58km -n openshift-operator-lifecycle-manager -- olm --version
OLM version: 0.17.0
git commit: 4b66803055a8ab611447c33ed86e755ad39cb313
[root@preserve-olm-env 1905299]#
[root@preserve-olm-env 1905299]# cat og-single.yaml
kind: OperatorGroup
apiVersion: operators.coreos.com/v1
metadata:
  name: og-single1
  namespace: default
spec:
  targetNamespaces:
  - default
[root@preserve-olm-env 1905299]# oc apply -f og-single.yaml
operatorgroup.operators.coreos.com/og-single1 created
[root@preserve-olm-env 1905299]# cat catsrc.yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  labels:
    opsrc-provider: codeready-toolchain
  name: hosted-toolchain-operators
  namespace: default
spec:
  sourceType: grpc
  image: quay.io/codeready-toolchain/hosted-toolchain-index:latest
  displayName: Hosted Toolchain Operators
  updateStrategy:
    registryPoll:
      interval: 5m
[root@preserve-olm-env 1905299]# oc apply -f catsrc.yaml
catalogsource.operators.coreos.com/hosted-toolchain-operators created
[root@preserve-olm-env 1905299]#
[root@preserve-olm-env 1905299]# cat sub1.yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: host-operator
  namespace: default
spec:
  channel: staging
  installPlanApproval: Automatic
  name: toolchain-host-operator
  source: hosted-toolchain-operators
  sourceNamespace: default
  startingCSV: toolchain-host-operator.v0.0.302-134-commit-3f1ed73-e1d3119
[root@preserve-olm-env 1905299]# oc apply -f sub1.yaml
subscription.operators.coreos.com/host-operator created
[root@preserve-olm-env 1905299]#
[root@preserve-olm-env 1905299]# oc get sub
NAME            PACKAGE                   SOURCE                       CHANNEL
host-operator   toolchain-host-operator   hosted-toolchain-operators   staging
[root@preserve-olm-env 1905299]# oc get ip
NAME            CSV                                                           APPROVAL    APPROVED
install-cttrg   toolchain-host-operator.v0.0.302-134-commit-3f1ed73-e1d3119   Automatic   true
install-hlgmw   toolchain-host-operator.v0.0.303-134-commit-a512840-e1d3119   Automatic   true
install-q8m5s   toolchain-host-operator.v0.0.304-134-commit-7723fcf-e1d3119   Automatic   true
[root@preserve-olm-env 1905299]# oc get csv
NAME                                                          DISPLAY                   VERSION                              REPLACES                                                      PHASE
toolchain-host-operator.v0.0.303-134-commit-a512840-e1d3119   Toolchain Host Operator   0.0.303-134-commit-a512840-e1d3119   toolchain-host-operator.v0.0.302-134-commit-3f1ed73-e1d3119   Replacing
toolchain-host-operator.v0.0.304-134-commit-7723fcf-e1d3119   Toolchain Host Operator   0.0.304-134-commit-7723fcf-e1d3119   toolchain-host-operator.v0.0.303-134-commit-a512840-e1d3119   Installing
[root@preserve-olm-env 1905299]# oc get ip
NAME            CSV                                                           APPROVAL    APPROVED
install-262fz   toolchain-host-operator.v0.0.306-135-commit-c3ceb05-f0f86eb   Automatic   true
install-6g84v   toolchain-host-operator.v0.0.308-136-commit-ab38d4a-386dc5d   Automatic   true
install-jh5pk   toolchain-host-operator.v0.0.305-135-commit-aca313a-f0f86eb   Automatic   true
install-pvzwp   toolchain-host-operator.v0.0.307-136-commit-74f7fad-386dc5d   Automatic   true
install-xvvlg   toolchain-host-operator.v0.0.306-136-commit-c3ceb05-386dc5d   Automatic   true
[root@preserve-olm-env 1905299]# oc get csv
NAME                                                          DISPLAY                   VERSION                              REPLACES                                                      PHASE
toolchain-host-operator.v0.0.307-136-commit-74f7fad-386dc5d   Toolchain Host Operator   0.0.307-136-commit-74f7fad-386dc5d   toolchain-host-operator.v0.0.306-136-commit-c3ceb05-386dc5d   Replacing
toolchain-host-operator.v0.0.308-136-commit-ab38d4a-386dc5d   Toolchain Host Operator   0.0.308-136-commit-ab38d4a-386dc5d   toolchain-host-operator.v0.0.307-136-commit-74f7fad-386dc5d   Installing
[root@preserve-olm-env 1905299]# oc get ip
NAME            CSV                                                           APPROVAL    APPROVED
install-6h4vl   toolchain-host-operator.v0.0.314-140-commit-5c442dc-633c7ba   Automatic   true
install-j2hxx   toolchain-host-operator.v0.0.313-140-commit-a1632a7-633c7ba   Automatic   true
install-lrmvt   toolchain-host-operator.v0.0.316-140-commit-05b62d3-633c7ba   Automatic   true
install-mfn4x   toolchain-host-operator.v0.0.315-140-commit-8e834dc-633c7ba   Automatic   true
install-r5c24   toolchain-host-operator.v0.0.316-141-commit-05b62d3-a2ed2a7   Automatic   true
[root@preserve-olm-env 1905299]# oc get csv
NAME                                                          DISPLAY                   VERSION                              REPLACES                                                      PHASE
toolchain-host-operator.v0.0.316-141-commit-05b62d3-a2ed2a7   Toolchain Host Operator   0.0.316-141-commit-05b62d3-a2ed2a7   toolchain-host-operator.v0.0.316-140-commit-05b62d3-633c7ba   Succeeded
```

Any chance to backport this to 4.6?

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633