Bug 1857877
Summary: | Operator upgrades can delete existing CSV before completion | |||
---|---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Nick Hale <nhale> | |
Component: | OLM | Assignee: | Vu Dinh <vdinh> | |
OLM sub component: | OLM | QA Contact: | Jian Zhang <jiazha> | |
Status: | CLOSED ERRATA | Docs Contact: | ||
Severity: | urgent | |||
Priority: | urgent | CC: | aivaras.laimikis, alkazako, assingh, dageoffr, ecordell, kaczynsk, kangell, krizza, openshift-bugzilla-robot | |
Version: | 4.4 | |||
Target Milestone: | --- | |||
Target Release: | 4.7.0 | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | Doc Type: | Bug Fix | ||
Doc Text: |
Cause: OLM deletes the existing CSV before the operator upgrade has completed.
Consequence: The new CSV is stuck in the Pending state.
Fix: OLM now checks the ServiceAccount's ownership to ensure that the ServiceAccount was created for the new CSV before transitioning the new CSV to the Succeeded state.
Result: The existing CSV is not deleted until the new CSV correctly reaches the Succeeded state.
|
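The Fix and Result lines describe two gates. The Go sketch below models them with simplified types. `OwnerReference`, `ServiceAccount`, `CSV`, `ownedBy`, `saRequirementMet` and `canDeleteReplaced` are hypothetical names for illustration and are not OLM's API.

- The new CSV's ServiceAccount requirement counts as met only when the ServiceAccount carries an ownerReference to that CSV.
- The replaced CSV may be deleted only after the new CSV reaches Succeeded.

The sketch takes a strict reading of the Doc Text. The real OLM check may treat ServiceAccounts that have no ownerReferences differently.

```go
package main

import "fmt"

// Hypothetical minimal stand-ins for the Kubernetes/OLM objects involved;
// they are NOT the real k8s.io/api or OLM API types.
type OwnerReference struct {
	Kind string
	Name string
	UID  string
}

type ServiceAccount struct {
	Name            string
	OwnerReferences []OwnerReference
}

type CSV struct {
	Name  string
	UID   string
	Phase string // e.g. "Pending", "Installing", "Replacing", "Succeeded"
}

// ownedBy reports whether sa carries an ownerReference to csv. Matching on
// UID as well as name is an assumption of this sketch.
func ownedBy(sa ServiceAccount, csv CSV) bool {
	for _, ref := range sa.OwnerReferences {
		if ref.Kind == "ClusterServiceVersion" && ref.Name == csv.Name && ref.UID == csv.UID {
			return true
		}
	}
	return false
}

// saRequirementMet gates the new CSV's move to Succeeded: the ServiceAccount
// must exist AND be owned by this CSV, not just by the CSV it replaces.
func saRequirementMet(sa *ServiceAccount, csv CSV) bool {
	return sa != nil && ownedBy(*sa, csv)
}

// canDeleteReplaced allows the old CSV to be deleted only once the new CSV
// has reached Succeeded.
func canDeleteReplaced(newCSV CSV) bool {
	return newCSV.Phase == "Succeeded"
}

func main() {
	oldCSV := CSV{Name: "etcdoperator.v0.9.2", UID: "6f4527b0-e200-49be-ab5d-7c3c387bc441", Phase: "Replacing"}
	// The UID of the new CSV is a hypothetical placeholder.
	newCSV := CSV{Name: "etcdoperator.v0.9.4", UID: "hypothetical-new-uid", Phase: "Pending"}

	// The ServiceAccount left over from 0.9.2, owned only by the old CSV.
	sa := &ServiceAccount{
		Name:            "etcd-operator",
		OwnerReferences: []OwnerReference{{Kind: "ClusterServiceVersion", Name: oldCSV.Name, UID: oldCSV.UID}},
	}

	fmt.Println("SA requirement met for new CSV:", saRequirementMet(sa, newCSV))
	fmt.Println("old CSV may be deleted:", canDeleteReplaced(newCSV))
}
```

With the ownership shown in the verification output, both lines print `false`. The new CSV stays Pending and the old CSV is kept.

The ownership check matters because Kubernetes garbage collection removes a dependent object once all of its owners are deleted. A ServiceAccount whose only owner is the old CSV therefore disappears with it. That is consistent with the missing `etcd-operator` ServiceAccount in the reproduction.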
Story Points: | --- | |
Clone Of: | ||||
Clones: | 1947946 | Environment: | ||
Last Closed: | 2021-02-24 15:13:58 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1904583, 1947946 |
Description
Nick Hale
2020-07-16 17:28:32 UTC
I can reproduce this on cluster version 4.7.0-0.nightly-2020-12-04-013308.

1. Create the index image for etcd version 0.9.2.
[root@preserve-olm-env etcd]# opm alpha bundle build -c alpha -e alpha -d ./0.9.2/ -o -b docker -p etcd -t quay.io/olmqe/etcd-bundle:0.9.2-sa
...
[root@preserve-olm-env etcd]# docker push quay.io/olmqe/etcd-bundle:0.9.2-sa
The push refers to repository [quay.io/olmqe/etcd-bundle]
1f7e5652ecb7: Pushed
f9cde18c30f6: Pushed
0.9.2-sa: digest: sha256:5aedf81994df417ea9a051738d499e7bd66b9faf1bf74be528d92b8a35fbae20 size: 732
[root@preserve-olm-env etcd]# opm index add -b quay.io/olmqe/etcd-bundle:0.9.2-sa -t quay.io/olmqe/etcd-index:0.9.2-sa
INFO[0000] building the index bundles="[quay.io/olmqe/etcd-bundle:0.9.2-sa]"
[root@preserve-olm-env etcd]# docker push quay.io/olmqe/etcd-index:0.9.2-sa
The push refers to repository [quay.io/olmqe/etcd-index]
...

2. Modify the etcdcluster CRD in the etcd 0.9.4 bundle to add an invalid OpenAPI schema: https://github.com/jianzhangbjz/community-operators/tree/bug-1857877/community-operators/etcd/0.9.4
1) Create a bundle image.
[root@preserve-olm-env etcd]# opm alpha bundle build -c alpha -e alpha -d ./0.9.4/ -o -b docker -p etcd -t quay.io/olmqe/etcd-bundle:0.9.4-sa
...
2) Add the bundle image to the 0.9.2 index image and generate a new index image: quay.io/olmqe/etcd-index:0.9.4-sa
[root@preserve-olm-env etcd]# opm index add -f quay.io/olmqe/etcd-index:0.9.2-sa --mode semver -c docker -b quay.io/olmqe/etcd-bundle:0.9.4-sa -t quay.io/olmqe/etcd-index:0.9.4-sa
INFO[0000] building the index bundles="[quay.io/olmqe/etcd-bundle:0.9.4-sa]"
...

3. Consume this index image on the cluster.
[root@preserve-olm-env etcd]# cat /data/cs-etcd.yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: etcd-test
  namespace: openshift-marketplace
spec:
  displayName: Jian Test
  publisher: Jian
  sourceType: grpc
  image: quay.io/olmqe/etcd-index:0.9.4-sa
  updateStrategy:
    registryPoll:
      interval: 10m
[root@preserve-olm-env etcd]# oc get catalogsource -n openshift-marketplace
NAME        DISPLAY     TYPE   PUBLISHER   AGE
...
etcd-test   Jian Test   grpc   Jian        94m

4. Subscribe to the etcd operator with manual approval.
[root@preserve-olm-env etcd]# cat /data/og.yaml
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: test-og
  namespace: default
spec:
  targetNamespaces:
  - default
[root@preserve-olm-env etcd]# cat /data/sub-0.9.2.yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: etcd-sub
  namespace: default
spec:
  installPlanApproval: Manual
  channel: alpha
  name: etcd
  source: etcd-test
  sourceNamespace: openshift-marketplace
  startingCSV: etcdoperator.v0.9.2
[root@preserve-olm-env etcd]# oc get sub -n default
NAME       PACKAGE   SOURCE      CHANNEL
etcd-sub   etcd      etcd-test   alpha
[root@preserve-olm-env etcd]# oc get ip -n default
NAME            CSV                   APPROVAL   APPROVED
install-mc4cw   etcdoperator.v0.9.2   Manual     false
[root@preserve-olm-env etcd]# oc get ip
NAME            CSV                   APPROVAL   APPROVED
install-jfnrv   etcdoperator.v0.9.4   Manual     false
install-mc4cw   etcdoperator.v0.9.2   Manual     true
[root@preserve-olm-env etcd]# oc get csv
NAME                  DISPLAY   VERSION   REPLACES   PHASE
etcdoperator.v0.9.2   etcd      0.9.2                Succeeded

5. Approve the 0.9.4 InstallPlan.
[root@preserve-olm-env etcd]# oc get ip
NAME            CSV                   APPROVAL   APPROVED
install-jfnrv   etcdoperator.v0.9.4   Manual     true
install-mc4cw   etcdoperator.v0.9.2   Manual     true
[root@preserve-olm-env etcd]# oc get csv
NAME                  DISPLAY   VERSION   REPLACES              PHASE
etcdoperator.v0.9.2   etcd      0.9.2                           Replacing
etcdoperator.v0.9.4   etcd      0.9.4     etcdoperator.v0.9.2   Installing
[root@preserve-olm-env etcd]# oc get csv
NAME                  DISPLAY   VERSION   REPLACES              PHASE
etcdoperator.v0.9.4   etcd      0.9.4     etcdoperator.v0.9.2   Pending

The etcd-operator ServiceAccount was deleted.
[root@preserve-olm-env etcd]# oc get sa
NAME       SECRETS   AGE
builder    2         4h59m
default    2         5h11m
deployer   2         4h59m
[root@preserve-olm-env etcd]# oc get csv etcdoperator.v0.9.4 -o yaml
apiVersion: operators.coreos.com/v1alpha1
kind: ClusterServiceVersion
...
  phase: Pending
  reason: RequirementsNotMet
  requirementStatus:
  ...
  - group: ""
    kind: ServiceAccount
    message: Service account does not exist
    name: etcd-operator
    status: NotPresent
    version: v1

Test it on a cluster that contains the fix PR:
Cluster version is 4.7.0-0.nightly-2020-12-07-232943
[root@preserve-olm-env data]# oc -n openshift-operator-lifecycle-manager exec catalog-operator-8649b7f8d5-f4lhq -- olm --version
OLM version: 0.17.0
git commit: 4ee4e876522c4d1b97e59d96588b2468149673eb

Rerun steps 3, 4, and 5 above.
[root@preserve-olm-env data]# oc get sa
NAME            SECRETS   AGE
builder         2         31m
default         2         45m
deployer        2         31m
etcd-operator   2         2m14s
[root@preserve-olm-env data]# oc get csv
NAME                  DISPLAY   VERSION   REPLACES              PHASE
etcdoperator.v0.9.2   etcd      0.9.2                           Replacing
etcdoperator.v0.9.4   etcd      0.9.4     etcdoperator.v0.9.2   Pending
...

The SA still exists and its owner is the v0.9.2 CSV.
[root@preserve-olm-env data]# oc get sa etcd-operator -o yaml
apiVersion: v1
imagePullSecrets:
...
  ownerReferences:
  - apiVersion: operators.coreos.com/v1alpha1
    blockOwnerDeletion: false
    controller: false
    kind: ClusterServiceVersion
    name: etcdoperator.v0.9.2
    uid: 6f4527b0-e200-49be-ab5d-7c3c387bc441

The error message is now "Service account is not owned by this ClusterServiceVersion". LGTM.
[root@preserve-olm-env data]# oc get csv etcdoperator.v0.9.4 -o yaml
apiVersion: operators.coreos.com/v1alpha1
kind: ClusterServiceVersion
metadata:
  annotations:
...
  - group: ""
    kind: ServiceAccount
    message: Service account is not owned by this ClusterServiceVersion
    name: etcd-operator
    status: PresentNotSatisfied
    version: v1

Verified.

*** Bug 1907586 has been marked as a duplicate of this bug. ***

*** Bug 1904585 has been marked as a duplicate of this bug. ***

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633
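When verifying a fix like this, the quickest signal is the `status.requirementStatus` list of the replacing CSV. The Go program below is a hedged helper sketch for that check.

- It reads a CSV as JSON on stdin, for example `oc get csv etcdoperator.v0.9.4 -o json | go run unmet.go`. The file name `unmet.go` is hypothetical.
- It prints each requirement whose status is not `Present`.
- The field names (`group`, `version`, `kind`, `name`, `status`, `message`) come from the output above.
- Treating `Present` as the only satisfied status is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
)

// requirement mirrors one entry of status.requirementStatus as shown by
// `oc get csv <name> -o yaml` in this bug.
type requirement struct {
	Group   string `json:"group"`
	Version string `json:"version"`
	Kind    string `json:"kind"`
	Name    string `json:"name"`
	Status  string `json:"status"`
	Message string `json:"message"`
}

// csvDoc decodes only the part of a ClusterServiceVersion this check needs.
type csvDoc struct {
	Status struct {
		RequirementStatus []requirement `json:"requirementStatus"`
	} `json:"status"`
}

// unmetRequirements returns every requirement whose status is not "Present".
// Treating "Present" as the only satisfied value is an assumption.
func unmetRequirements(data []byte) ([]requirement, error) {
	var csv csvDoc
	if err := json.Unmarshal(data, &csv); err != nil {
		return nil, err
	}
	var unmet []requirement
	for _, r := range csv.Status.RequirementStatus {
		if r.Status != "Present" {
			unmet = append(unmet, r)
		}
	}
	return unmet, nil
}

func main() {
	data, err := io.ReadAll(os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	unmet, err := unmetRequirements(data)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// One line per unmet requirement, e.g. kind/version name: status (message).
	for _, r := range unmet {
		fmt.Printf("%s/%s %s: %s (%s)\n", r.Kind, r.Version, r.Name, r.Status, r.Message)
	}
}
```

On the fixed cluster, the `etcd-operator` ServiceAccount entry would be reported with `PresentNotSatisfied`. On the unfixed cluster, the same entry would be reported with `NotPresent`.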