Bug 1723818
Summary: | OLM upgrade failure from 4.1 to 4.2 due to packageserver csv OwnerConflict | | |
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Russell Bryant <rbryant> |
Component: | OLM | Assignee: | Evan Cordell <ecordell> |
OLM sub component: | OLM | QA Contact: | Cuiping HUO <chuo> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | urgent | | |
Priority: | urgent | CC: | akashem, bandrade, bparees, ccoleman, chezhang, dhellmann, jfan, jiazha, markmc, mstaeble, scolange |
Version: | 4.1.z | Keywords: | Regression |
Target Milestone: | --- | | |
Target Release: | 4.2.0 | | |
Hardware: | Unspecified | | |
OS: | Unspecified | | |
Whiteboard: | | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2019-10-16 06:32:26 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Bug Depends On: | 1733015 | | |
Bug Blocks: | | | |
Description
Russell Bryant
2019-06-25 13:34:43 UTC
Likely related to a PR merged yesterday, https://github.com/operator-framework/operator-lifecycle-manager/pull/863 , where the operator-lifecycle-manager-packageserver ClusterOperator was added.

Investigated the issue further; here are the findings.

packageserver fails to deploy since it can't adopt ownership of the `APIService` object. The status of the new version of the csv (packageserver.v0.10.1) reflects this.

    apiVersion: operators.coreos.com/v1alpha1
    kind: ClusterServiceVersion
    metadata:
      annotations:
        olm.operatorGroup: olm-operators
        olm.operatorNamespace: openshift-operator-lifecycle-manager
        olm.targetNamespaces: openshift-operator-lifecycle-manager
      creationTimestamp: 2019-06-25T02:05:19Z
      generation: 16
      labels:
        olm.api.4bca9f23e412d79d: provided
        olm.clusteroperator.name: operator-lifecycle-manager-packageserver
      name: packageserver.v0.10.1
      namespace: openshift-operator-lifecycle-manager
    status:
      certsLastUpdated: null
      certsRotateAt: null
      message: unable to adopt APIService
      phase: Failed
      conditions:
      - lastTransitionTime: 2019-06-25T02:05:19Z
        lastUpdateTime: 2019-06-25T02:05:19Z
        message: requirements not yet checked
        phase: Pending
        reason: RequirementsUnknown
      - lastTransitionTime: 2019-06-25T02:05:19Z
        lastUpdateTime: 2019-06-25T02:05:19Z
        message: unable to adopt APIService
        phase: Failed
        reason: OwnerConflict
      lastTransitionTime: 2019-06-25T02:05:19Z
      lastUpdateTime: 2019-06-25T02:05:19Z
      message: unable to adopt APIService
      phase: Failed
      reason: OwnerConflict

The APIService object has the following labels:

    "labels": {
        "olm.owner": "packageserver.v0.9.0",
        "olm.owner.kind": "ClusterServiceVersion",
        "olm.owner.namespace": "openshift-operator-lifecycle-manager"
    },

The csv name changed between 4.1 and 4.2, from 'packageserver.v0.9.0' to 'packageserver.v0.10.1'. This name mismatch is causing olm to throw an 'OwnerConflict' error.
https://github.com/operator-framework/operator-lifecycle-manager/blob/master/pkg/controller/operators/olm/operator.go#L1411-L1413

Next steps:
- Reproduce by adding an e2e test that simulates this upgrade scenario where the name of the csv changes.
- Make changes to 'apiServiceOwnerConflicts' to make sure we can adopt an APIService when the current owner csv has been removed or is being replaced.

*** Bug 1724801 has been marked as a duplicate of this bug. ***

Setting priority appropriately; all upgrade jobs are blocked by this. If there's no easy fix (a few hours), let's revert the breaking change and make sure we test upgrades before we re-introduce it.

Did the upgrade job pass when this PR merged? I would have expected it to fail, unless we delivered setting the value and then using it in two separate PRs.

All 4.1 to 4.2 upgrade jobs, that is.

It looks like this is still failing after https://github.com/operator-framework/operator-lifecycle-manager/pull/925#issuecomment-506928237 merged:
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.1-to-4.2/131

This PR ( https://github.com/operator-framework/operator-lifecycle-manager/pull/937 ) should fix this issue. It's merged and included in this release:
https://openshift-release.svc.ci.openshift.org/releasestream/4.2.0-0.ci/release/4.2.0-0.ci-2019-06-30-145631

This failed again:
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.1-to-4.2/189

Moving this back to New to ensure it gets attention. If you feel it's a new issue (I imagine it is), please open a new BZ and move this back to Modified.

CSV status:

    lastTransitionTime: 2019-07-18T11:17:18Z
    lastUpdateTime: 2019-07-18T11:17:18Z
    message: unable to adopt APIService
    phase: Failed
    reason: OwnerConflict

I think it's a regression: packageserver fails to install since it can't adopt ownership of the APIService object.
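The adoption rule proposed in the next steps above (allow a csv to adopt an APIService when the current owner csv has been removed or is being replaced) can be sketched roughly as follows. This is a minimal illustration, not the actual OLM code: the `csvRef` type, the `canAdopt` function, and the `existing` map are hypothetical names introduced here, and only the three `olm.owner*` label keys are taken from this report.

```go
package main

import "fmt"

// Label keys as they appear on the APIService quoted in this report.
const (
	ownerKey          = "olm.owner"
	ownerKindKey      = "olm.owner.kind"
	ownerNamespaceKey = "olm.owner.namespace"
)

// csvRef is a hypothetical, simplified stand-in for a ClusterServiceVersion.
type csvRef struct {
	Name      string
	Namespace string
	Replaces  string // name of the csv this one replaces, empty if none
}

// canAdopt reports whether csv may take ownership of an object carrying the
// given labels. existing holds the names of csvs currently present in
// csv.Namespace. This is an assumed rule set, not the OLM implementation.
func canAdopt(labels map[string]string, csv csvRef, existing map[string]bool) bool {
	owner, owned := labels[ownerKey]
	if !owned {
		return true // nobody owns the object yet
	}
	if labels[ownerKindKey] != "ClusterServiceVersion" ||
		labels[ownerNamespaceKey] != csv.Namespace {
		return false // owned by something outside this csv's namespace or kind
	}
	switch {
	case owner == csv.Name:
		return true // already ours
	case owner == csv.Replaces:
		return true // current owner is being replaced by this csv
	case !existing[owner]:
		return true // current owner csv has been removed
	}
	return false // a different, live csv owns it: OwnerConflict
}

func main() {
	ns := "openshift-operator-lifecycle-manager"
	labels := map[string]string{
		ownerKey:          "packageserver.v0.9.0",
		ownerKindKey:      "ClusterServiceVersion",
		ownerNamespaceKey: ns,
	}
	newCSV := csvRef{Name: "packageserver.v0.10.1", Namespace: ns}

	// Old owner still present and not replaced: conflict.
	fmt.Println(canAdopt(labels, newCSV, map[string]bool{"packageserver.v0.9.0": true}))
	// Old owner removed: adoption allowed.
	fmt.Println(canAdopt(labels, newCSV, map[string]bool{}))
}
```

Matching on the namespace label as well as the name keeps a csv from adopting an object owned by a same-named csv in another namespace.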
We hit this issue again; see bug 1731123 for more details. I'm changing the status back to ASSIGNED since the earlier fix PR didn't work. I also changed the version to 4.1.z since this is a 4.1.z upgrade issue. Correct me if I'm wrong.

*** Bug 1731123 has been marked as a duplicate of this bug. ***

https://github.com/operator-framework/operator-lifecycle-manager/pull/957 should fix this issue. The code meant to prevent this issue wasn't looking at the right namespace because of incorrect syntax. I will follow up with an e2e test to verify this in the future.

Verification is blocked because the 4.1 to 4.2 upgrade fails, as https://bugzilla.redhat.com/show_bug.cgi?id=1733015 shows. Changing Target Release to 4.2 since this issue occurred in the upgrade from 4.1 to 4.2. QE will double-check once the blocking issue is fixed.

Verified.
OLM version: 0.11.0
git commit: d2209c409b35f1db4669c474044decc6995f624d

    $ oc get clusteroperator | grep package
    operator-lifecycle-manager-packageserver   4.2.0-0.nightly-2019-07-30-073644   True   False   False   13h

    $ oc get csv
    NAME            DISPLAY          VERSION   REPLACES   PHASE
    packageserver   Package Server   0.11.0               Succeeded

    $ oc get csv packageserver -o yaml
    apiVersion: operators.coreos.com/v1alpha1
    kind: ClusterServiceVersion
    metadata:
      annotations:
        olm.operatorGroup: olm-operators
        olm.operatorNamespace: openshift-operator-lifecycle-manager
        olm.targetNamespaces: openshift-operator-lifecycle-manager
      creationTimestamp: "2019-07-30T09:21:35Z"
      generation: 328
      labels:
        olm.api.4bca9f23e412d79d: provided
        olm.clusteroperator.name: operator-lifecycle-manager-packageserver
        olm.version: 0.11.0
      name: packageserver
      namespace: openshift-operator-lifecycle-manager
      resourceVersion: "1326273"
      selfLink: /apis/operators.coreos.com/v1alpha1/namespaces/openshift-operator-lifecycle-manager/clusterserviceversions/packageserver
      uid: 6d3684c3-b2ab-11e9-838b-0050568b69c6
    status:
      certsLastUpdated: "2019-07-30T20:30:46Z"
      certsRotateAt: "2021-07-28T20:30:43Z"
      conditions:
      - lastTransitionTime: "2019-07-30T09:22:33Z"
        lastUpdateTime: "2019-07-30T09:22:33Z"
        message: requirements not yet checked
        phase: Pending
        reason: RequirementsUnknown
      - lastTransitionTime: "2019-07-30T09:22:38Z"
        lastUpdateTime: "2019-07-30T09:22:38Z"
        message: all requirements found, attempting install
        phase: InstallReady
        reason: AllRequirementsMet
      - lastTransitionTime: "2019-07-30T09:22:59Z"
        lastUpdateTime: "2019-07-30T09:22:59Z"
        message: waiting for install components to report healthy
        phase: Installing
        reason: InstallSucceeded
      - lastTransitionTime: "2019-07-30T09:22:59Z"
        lastUpdateTime: "2019-07-30T09:23:07Z"
        message: APIServices not installed
        phase: Installing
        reason: InstallWaiting
      - lastTransitionTime: "2019-07-30T09:23:32Z"
        lastUpdateTime: "2019-07-30T09:23:32Z"
        message: install strategy completed with no errors
        phase: Succeeded
        reason: InstallSucceeded
      - lastTransitionTime: "2019-07-30T20:29:48Z"
        lastUpdateTime: "2019-07-30T20:29:48Z"
        message: APIServices not installed
        phase: Failed
        reason: ComponentUnhealthy
      - lastTransitionTime: "2019-07-30T20:30:43Z"
        lastUpdateTime: "2019-07-30T20:30:43Z"
        message: APIServices not installed
        phase: Pending
        reason: NeedsReinstall
      - lastTransitionTime: "2019-07-30T20:30:43Z"
        lastUpdateTime: "2019-07-30T20:30:43Z"
        message: all requirements found, attempting install
        phase: InstallReady
        reason: AllRequirementsMet
      - lastTransitionTime: "2019-07-30T20:30:43Z"
        lastUpdateTime: "2019-07-30T20:30:43Z"
        message: waiting for install components to report healthy
        phase: Installing
        reason: InstallSucceeded
      - lastTransitionTime: "2019-07-30T20:30:43Z"
        lastUpdateTime: "2019-07-30T20:30:48Z"
        message: APIServices not installed
        phase: Installing
        reason: InstallWaiting
      - lastTransitionTime: "2019-07-30T20:31:11Z"
        lastUpdateTime: "2019-07-30T20:31:11Z"
        message: install strategy completed with no errors
        phase: Succeeded
        reason: InstallSucceeded
      lastTransitionTime: "2019-07-30T20:31:11Z"
      lastUpdateTime: "2019-07-30T20:31:11Z"
      message: install strategy completed with no errors
      phase: Succeeded
      reason: InstallSucceeded
      requirementStatus:
      - group: operators.coreos.com
        kind: ClusterServiceVersion
        message: CSV minKubeVersion (1.11.0) less than server version (v1.14.0+1682e38)
        name: packageserver
        status: Present
        version: v1alpha1
      - group: apiregistration.k8s.io
        kind: APIService
        message: ""
        name: v1.packages.operators.coreos.com
        status: DeploymentFound
        version: v1
      - dependents:
        - group: rbac.authorization.k8s.io
          kind: PolicyRule
          message: cluster rule:{"verbs":["create","get"],"apiGroups":["authorization.k8s.io"],"resources":["subjectaccessreviews"]}
          status: Satisfied
          version: v1beta1
        - group: rbac.authorization.k8s.io
          kind: PolicyRule
          message: cluster rule:{"verbs":["get","list","watch"],"apiGroups":[""],"resources":["configmaps"]}
          status: Satisfied
          version: v1beta1
        - group: rbac.authorization.k8s.io
          kind: PolicyRule
          message: cluster rule:{"verbs":["get","list","watch"],"apiGroups":["operators.coreos.com"],"resources":["catalogsources"]}
          status: Satisfied
          version: v1beta1
        - group: rbac.authorization.k8s.io
          kind: PolicyRule
          message: cluster rule:{"verbs":["get","list"],"apiGroups":["packages.operators.coreos.com"],"resources":["packagemanifests"]}
          status: Satisfied
          version: v1beta1
        group: ""
        kind: ServiceAccount
        message: ""
        name: olm-operator-serviceaccount
        status: Present
        version: v1

    $ oc get clusterrole packageserver.v0.9.0-nl2jh -o yaml
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      creationTimestamp: "2019-07-30T01:29:19Z"
      labels:
        olm.owner: packageserver.v0.9.0
        olm.owner.kind: ClusterServiceVersion
        olm.owner.namespace: openshift-operator-lifecycle-manager
      name: packageserver.v0.9.0-nl2jh
      resourceVersion: "6509"
      selfLink: /apis/rbac.authorization.k8s.io/v1/clusterroles/packageserver.v0.9.0-nl2jh
      uid: 73f91eaf-b269-11e9-8c50-0050568b2d02
    rules:
    - apiGroups:
      - authorization.k8s.io
      resources:
      - subjectaccessreviews
      verbs:
      - create
      - get
    - apiGroups:
      - ""
      resources:
      - configmaps
      verbs:
      - get
      - list
      - watch
    - apiGroups:
      - operators.coreos.com
      resources:
      - catalogsources
      verbs:
      - get
      - list
      - watch
    - apiGroups:
      - packages.operators.coreos.com
      resources:
      - packagemanifests
      verbs:
      - get
      - list

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2019:2922