Bug 1905299
| Summary: | OLM fails to update operator | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Alexey Kazakov <alkazako> |
| Component: | OLM | Assignee: | Vu Dinh <vdinh> |
| OLM sub component: | OLM | QA Contact: | kuiwang |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | medium | ||
| Priority: | high | CC: | krizza, mjobanek, pneedle, sgutz, vdinh |
| Version: | 4.5 | ||
| Target Milestone: | --- | ||
| Target Release: | 4.7.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | Cause: Previously, Operator updates could result in Operator pods being deployed before a new service account was created. Consequence: The pod could be deployed by using the existing service account and would fail to start with insufficient permissions. Fix: A check has been added to verify that a new service account exists before the cluster service version (CSV) is moved from the `Pending` state to the `Installing` state. Result: If a new service account does not exist, the CSV remains in the `Pending` state, which prevents the deployment from being updated. | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-02-24 15:40:33 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 1907586 | ||
|
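The check described in the Doc Text can be illustrated with a minimal sketch. This is not OLM's actual code (OLM is written in Go); it is a simplified Python model, and the names `Phase`, `next_csv_phase`, and the service account names in the usage note are hypothetical. It only shows the gating idea: a CSV in `Pending` moves to `Installing` only once every service account it requires exists.

```python
from enum import Enum


class Phase(Enum):
    """Only the two CSV phases relevant to this bug are modeled."""
    PENDING = "Pending"
    INSTALLING = "Installing"


def next_csv_phase(current, required_service_accounts, existing_service_accounts):
    """Return the phase a CSV should have after one (hypothetical) sync pass.

    A Pending CSV advances to Installing only when every service account it
    requires already exists. Otherwise it stays Pending, so the deployment
    is not updated yet and no pod starts with the old service account.
    """
    if current is not Phase.PENDING:
        # This sketch models only the Pending gate; other phases pass through.
        return current
    missing = set(required_service_accounts) - set(existing_service_accounts)
    return Phase.PENDING if missing else Phase.INSTALLING
```

For example, calling `next_csv_phase(Phase.PENDING, {"new-sa"}, {"old-sa"})` keeps the CSV in `Pending` because `new-sa` has not been created yet; once `new-sa` is present in the existing set, the same call returns `Phase.INSTALLING`.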
Description
Alexey Kazakov
2020-12-08 01:32:14 UTC
This is quite a strange scenario. It seems that, for some reason, the new SA is being created after the new deployment pod is already spinning up. If you delete the failed pod, the ReplicaSet will spin up a new pod and it will succeed.

Hi @vdinh, thanks a lot for looking at this issue. Is there any progress on it? I guess you understand that we cannot delete the failed pod for every operator update. This issue is becoming critical: it also affects our production OSD cluster (OpenShift version 4.5.16).

@vdinh, could you please give me an update on this?

Since this is holding up a release and is going to impact revenue unless we get it resolved, can we get a bit more insight into the problem and when it is going to be fixed? If we're not asking the right question, then please let us know what the right question is.

Verified on 4.7. LGTM.
--
[root@preserve-olm-env 1905299]# oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.7.0-0.nightly-2020-12-14-165231 True False 93m Cluster version is 4.7.0-0.nightly-2020-12-14-165231
[root@preserve-olm-env 1905299]# oc get pod -n openshift-operator-lifecycle-manager
NAME READY STATUS RESTARTS AGE
catalog-operator-66cf979978-k58km 1/1 Running 0 89m
olm-operator-55d756959-v9vzn 1/1 Running 0 89m
packageserver-597d7f4fb-jckjw 1/1 Running 0 89m
packageserver-597d7f4fb-kgtsj 1/1 Running 0 90m
[root@preserve-olm-env 1905299]# oc exec catalog-operator-66cf979978-k58km -n openshift-operator-lifecycle-manager -- olm --version
OLM version: 0.17.0
git commit: 4b66803055a8ab611447c33ed86e755ad39cb313
[root@preserve-olm-env 1905299]#
[root@preserve-olm-env 1905299]# cat og-single.yaml
kind: OperatorGroup
apiVersion: operators.coreos.com/v1
metadata:
  name: og-single1
  namespace: default
spec:
  targetNamespaces:
  - default
[root@preserve-olm-env 1905299]# oc apply -f og-single.yaml
operatorgroup.operators.coreos.com/og-single1 created
[root@preserve-olm-env 1905299]# cat catsrc.yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  labels:
    opsrc-provider: codeready-toolchain
  name: hosted-toolchain-operators
  namespace: default
spec:
  sourceType: grpc
  image: quay.io/codeready-toolchain/hosted-toolchain-index:latest
  displayName: Hosted Toolchain Operators
  updateStrategy:
    registryPoll:
      interval: 5m
[root@preserve-olm-env 1905299]# oc apply -f catsrc.yaml
catalogsource.operators.coreos.com/hosted-toolchain-operators created
[root@preserve-olm-env 1905299]#
[root@preserve-olm-env 1905299]# cat sub1.yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: host-operator
  namespace: default
spec:
  channel: staging
  installPlanApproval: Automatic
  name: toolchain-host-operator
  source: hosted-toolchain-operators
  sourceNamespace: default
  startingCSV: toolchain-host-operator.v0.0.302-134-commit-3f1ed73-e1d3119
[root@preserve-olm-env 1905299]# oc apply -f sub1.yaml
subscription.operators.coreos.com/host-operator created
[root@preserve-olm-env 1905299]#
[root@preserve-olm-env 1905299]# oc get sub
NAME PACKAGE SOURCE CHANNEL
host-operator toolchain-host-operator hosted-toolchain-operators staging
[root@preserve-olm-env 1905299]# oc get ip
NAME CSV APPROVAL APPROVED
install-cttrg toolchain-host-operator.v0.0.302-134-commit-3f1ed73-e1d3119 Automatic true
install-hlgmw toolchain-host-operator.v0.0.303-134-commit-a512840-e1d3119 Automatic true
install-q8m5s toolchain-host-operator.v0.0.304-134-commit-7723fcf-e1d3119 Automatic true
[root@preserve-olm-env 1905299]# oc get csv
NAME DISPLAY VERSION REPLACES PHASE
toolchain-host-operator.v0.0.303-134-commit-a512840-e1d3119 Toolchain Host Operator 0.0.303-134-commit-a512840-e1d3119 toolchain-host-operator.v0.0.302-134-commit-3f1ed73-e1d3119 Replacing
toolchain-host-operator.v0.0.304-134-commit-7723fcf-e1d3119 Toolchain Host Operator 0.0.304-134-commit-7723fcf-e1d3119 toolchain-host-operator.v0.0.303-134-commit-a512840-e1d3119 Installing
[root@preserve-olm-env 1905299]# oc get ip
NAME CSV APPROVAL APPROVED
install-262fz toolchain-host-operator.v0.0.306-135-commit-c3ceb05-f0f86eb Automatic true
install-6g84v toolchain-host-operator.v0.0.308-136-commit-ab38d4a-386dc5d Automatic true
install-jh5pk toolchain-host-operator.v0.0.305-135-commit-aca313a-f0f86eb Automatic true
install-pvzwp toolchain-host-operator.v0.0.307-136-commit-74f7fad-386dc5d Automatic true
install-xvvlg toolchain-host-operator.v0.0.306-136-commit-c3ceb05-386dc5d Automatic true
[root@preserve-olm-env 1905299]# oc get csv
NAME DISPLAY VERSION REPLACES PHASE
toolchain-host-operator.v0.0.307-136-commit-74f7fad-386dc5d Toolchain Host Operator 0.0.307-136-commit-74f7fad-386dc5d toolchain-host-operator.v0.0.306-136-commit-c3ceb05-386dc5d Replacing
toolchain-host-operator.v0.0.308-136-commit-ab38d4a-386dc5d Toolchain Host Operator 0.0.308-136-commit-ab38d4a-386dc5d toolchain-host-operator.v0.0.307-136-commit-74f7fad-386dc5d Installing
[root@preserve-olm-env 1905299]# oc get ip
NAME CSV APPROVAL APPROVED
install-6h4vl toolchain-host-operator.v0.0.314-140-commit-5c442dc-633c7ba Automatic true
install-j2hxx toolchain-host-operator.v0.0.313-140-commit-a1632a7-633c7ba Automatic true
install-lrmvt toolchain-host-operator.v0.0.316-140-commit-05b62d3-633c7ba Automatic true
install-mfn4x toolchain-host-operator.v0.0.315-140-commit-8e834dc-633c7ba Automatic true
install-r5c24 toolchain-host-operator.v0.0.316-141-commit-05b62d3-a2ed2a7 Automatic true
[root@preserve-olm-env 1905299]# oc get csv
NAME DISPLAY VERSION REPLACES PHASE
toolchain-host-operator.v0.0.316-141-commit-05b62d3-a2ed2a7 Toolchain Host Operator 0.0.316-141-commit-05b62d3-a2ed2a7 toolchain-host-operator.v0.0.316-140-commit-05b62d3-633c7ba Succeeded
--
Any chance to backport it to 4.6?

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633