Description of problem:

This problem was observed in the e2e-aws-upgrade-4.1-to-4.2 CI job:
https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.1-to-4.2/119

The top-level failure message was:

Jun 25 03:09:26.477: INFO: Unexpected error occurred: Cluster did not complete upgrade: timed out waiting for the condition: Cluster operator operator-lifecycle-manager-packageserver is still updating

I looked in the OLM operator log:
https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.1-to-4.2/119/artifacts/e2e-aws-upgrade/pods/openshift-operator-lifecycle-manager_olm-operator-6fb8dc66f8-p6ssk_olm-operator.log

and found the following error message repeated several times:

time="2019-06-25T03:10:44Z" level=info msg="error updating ClusterServiceVersion status: Operation cannot be fulfilled on clusterserviceversions.operators.coreos.com \"packageserver.v0.9.0\": the object has been modified; please apply your changes to the latest version and try again" csv=packageserver.v0.9.0 id=a9I6B namespace=openshift-operator-lifecycle-manager phase=Installing
E0625 03:10:44.262560       1 queueinformer_operator.go:274] sync {"update" "openshift-operator-lifecycle-manager/packageserver.v0.9.0"} failed: error updating ClusterServiceVersion status: Operation cannot be fulfilled on clusterserviceversions.operators.coreos.com "packageserver.v0.9.0": the object has been modified; please apply your changes to the latest version and try again

Version-Release number of selected component (if applicable):
operator-lifecycle-manager commit ID: 586ffaf57b5da9cc2301b01e2ea10ce6117928c9
operator-lifecycle-manager commit ID for the upgrade: 6bf64d01349f8ca67749cb8849edfeebe39b475f
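For context, the "object has been modified" message is the API server rejecting a status update made against a stale resourceVersion (optimistic concurrency); a controller normally resolves it by re-reading the latest object and retrying, as client-go's retry.RetryOnConflict does. A minimal self-contained sketch of that pattern (the function names and integer resourceVersion here are illustrative, not OLM's actual code):

```go
package main

import (
	"errors"
	"fmt"
)

// errConflict stands in for the API server's 409 Conflict
// ("the object has been modified") response.
var errConflict = errors.New("the object has been modified; please apply your changes to the latest version and try again")

// updateStatus retries a status update, re-reading the latest
// resourceVersion before each attempt, the way client-go's
// retry.RetryOnConflict does against a real cluster.
func updateStatus(maxRetries int, fetchLatest func() int, update func(resourceVersion int) error) error {
	var err error
	for i := 0; i < maxRetries; i++ {
		rv := fetchLatest() // refresh to the latest version before writing
		if err = update(rv); err == nil || !errors.Is(err, errConflict) {
			return err // success, or a non-conflict error we should surface
		}
	}
	return fmt.Errorf("retries exhausted: %w", err)
}

func main() {
	attempts := 0
	err := updateStatus(5,
		func() int { return 3 }, // pretend 3 is the latest resourceVersion
		func(rv int) error {
			attempts++
			if attempts == 1 {
				return errConflict // first write raced with another writer
			}
			return nil
		})
	fmt.Println(err, attempts) // <nil> 2
}
```

A transient burst of these conflicts during an upgrade is normal; it only indicates a problem when, as here, the CSV never leaves the Installing phase.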
Likely related to a PR merged yesterday - https://github.com/operator-framework/operator-lifecycle-manager/pull/863 where the operator-lifecycle-manager-packageserver ClusterOperator was added
Investigated the issue further; here are the findings.

packageserver fails to deploy since it can't adopt ownership of the `APIService` object. The status of the new version of the CSV (packageserver.v0.10.1) reflects this:

apiVersion: operators.coreos.com/v1alpha1
kind: ClusterServiceVersion
metadata:
  annotations:
    olm.operatorGroup: olm-operators
    olm.operatorNamespace: openshift-operator-lifecycle-manager
    olm.targetNamespaces: openshift-operator-lifecycle-manager
  creationTimestamp: 2019-06-25T02:05:19Z
  generation: 16
  labels:
    olm.api.4bca9f23e412d79d: provided
    olm.clusteroperator.name: operator-lifecycle-manager-packageserver
  name: packageserver.v0.10.1
  namespace: openshift-operator-lifecycle-manager
status:
  certsLastUpdated: null
  certsRotateAt: null
  message: unable to adopt APIService
  phase: Failed
  conditions:
  - lastTransitionTime: 2019-06-25T02:05:19Z
    lastUpdateTime: 2019-06-25T02:05:19Z
    message: requirements not yet checked
    phase: Pending
    reason: RequirementsUnknown
  - lastTransitionTime: 2019-06-25T02:05:19Z
    lastUpdateTime: 2019-06-25T02:05:19Z
    message: unable to adopt APIService
    phase: Failed
    reason: OwnerConflict
  lastTransitionTime: 2019-06-25T02:05:19Z
  lastUpdateTime: 2019-06-25T02:05:19Z
  message: unable to adopt APIService
  phase: Failed
  reason: OwnerConflict

The APIService object has the following labels:

"labels": {
  "olm.owner": "packageserver.v0.9.0",
  "olm.owner.kind": "ClusterServiceVersion",
  "olm.owner.namespace": "openshift-operator-lifecycle-manager"
},

The CSV name changed between 4.1 and 4.2, from 'packageserver.v0.9.0' to 'packageserver.v0.10.1', and this name mismatch causes OLM to throw an 'OwnerConflict' error:
https://github.com/operator-framework/operator-lifecycle-manager/blob/master/pkg/controller/operators/olm/operator.go#L1411-L1413
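In other words, OLM decides ownership by comparing the APIService's olm.owner labels against the incoming CSV's identity. A minimal sketch of that comparison under the failing 4.1-to-4.2 condition (the struct and function names are illustrative; see apiServiceOwnerConflicts in operator.go for the real logic):

```go
package main

import "fmt"

// apiService models just the ownership labels OLM stamps onto the
// APIService object it manages.
type apiService struct {
	labels map[string]string
}

// ownerConflict mirrors the pre-fix behavior: the APIService may only be
// touched by the exact CSV named in its olm.owner label, so a renamed CSV
// (v0.9.0 -> v0.10.1) is treated as a foreign owner.
func ownerConflict(svc apiService, csvName, csvNamespace string) bool {
	return svc.labels["olm.owner"] != csvName ||
		svc.labels["olm.owner.namespace"] != csvNamespace
}

func main() {
	svc := apiService{labels: map[string]string{
		"olm.owner":           "packageserver.v0.9.0",
		"olm.owner.kind":      "ClusterServiceVersion",
		"olm.owner.namespace": "openshift-operator-lifecycle-manager",
	}}
	// The upgraded CSV has a new name, so adoption fails with OwnerConflict.
	fmt.Println(ownerConflict(svc, "packageserver.v0.10.1", "openshift-operator-lifecycle-manager")) // true
}
```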
Next steps:
- Reproduce by adding an e2e test that simulates this upgrade scenario where the name of the CSV changes.
- Make changes to 'apiServiceOwnerConflicts' to make sure we can adopt an APIService when the current owner CSV has been removed or is being replaced.
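The second step amounts to relaxing the ownership check: adoption should succeed not only when the names match, but also when the labeled owner is gone or sits in the new CSV's replaces chain. A hedged sketch of that relaxed check (types and names are illustrative, not the actual patch):

```go
package main

import "fmt"

// csv carries the fields relevant to adoption: its name and the name of
// the CSV it replaces (the upgrade chain).
type csv struct {
	name     string
	replaces string
}

// canAdopt sketches the proposed relaxation of apiServiceOwnerConflicts:
// a new CSV may adopt the APIService when it already owns it, when the
// labeled owner no longer exists in the namespace, or when the labeled
// owner is the CSV being replaced by the upgrade.
func canAdopt(ownerLabel string, incoming csv, liveCSVs map[string]bool) bool {
	switch {
	case ownerLabel == incoming.name: // already ours
		return true
	case !liveCSVs[ownerLabel]: // owner CSV has been removed
		return true
	case ownerLabel == incoming.replaces: // owner is being replaced
		return true
	}
	return false
}

func main() {
	existing := map[string]bool{"packageserver.v0.9.0": true}
	upgraded := csv{name: "packageserver.v0.10.1", replaces: "packageserver.v0.9.0"}
	fmt.Println(canAdopt("packageserver.v0.9.0", upgraded, existing)) // true: old owner is being replaced
}
```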
*** Bug 1724801 has been marked as a duplicate of this bug. ***
Setting priority appropriately; all upgrade jobs are blocked by this. If there's no easy fix (a few hours), let's revert the breaking change and make sure we test upgrades before we re-introduce it. Did the upgrade job pass when this PR merged? I would have expected it to fail, unless we delivered setting the value and then using it in two separate PRs.
All 4.1 to 4.2 upgrade jobs, that is
Still failing after https://github.com/operator-framework/operator-lifecycle-manager/pull/925#issuecomment-506928237 merged it looks like: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.1-to-4.2/131
This PR ( https://github.com/operator-framework/operator-lifecycle-manager/pull/937 ) should fix this issue. It's merged and included in this release - https://openshift-release.svc.ci.openshift.org/releasestream/4.2.0-0.ci/release/4.2.0-0.ci-2019-06-30-145631.
This failed again: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.1-to-4.2/189
And https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.1-to-4.2/190
Moving this back to NEW to ensure it gets attention. If you feel it's a new issue (I imagine it is), please open a new BZ and move this back to MODIFIED.
CSV status:

  lastTransitionTime: 2019-07-18T11:17:18Z
  lastUpdateTime: 2019-07-18T11:17:18Z
  message: unable to adopt APIService
  phase: Failed
  reason: OwnerConflict

I think it's a regression: packageserver fails to install since it can't adopt ownership of the APIService object.
We hit this issue again; see bug 1731123 for more details. I'd like to change the status back to ASSIGNED, since a fix PR was merged before but it didn't work. I've also changed the version to 4.1.z since this is a 4.1.z upgrade issue. Correct me if I'm wrong.
*** Bug 1731123 has been marked as a duplicate of this bug. ***
https://github.com/operator-framework/operator-lifecycle-manager/pull/957 should fix this issue. The code meant to prevent it wasn't looking at the right namespace because of a syntax error. I will follow up with an e2e test to verify this in the future.
Verification is blocked because the 4.1-to-4.2 upgrade fails, as https://bugzilla.redhat.com/show_bug.cgi?id=1733015 shows.
Changing Target Release to 4.2 since this issue occurred in upgrades from 4.1 to 4.2. QE will double-check once the blocking issue is fixed.
Verified.

OLM version: 0.11.0
git commit: d2209c409b35f1db4669c474044decc6995f624d

$ oc get clusteroperator | grep package
operator-lifecycle-manager-packageserver   4.2.0-0.nightly-2019-07-30-073644   True   False   False   13h

$ oc get csv
NAME            DISPLAY          VERSION   REPLACES   PHASE
packageserver   Package Server   0.11.0               Succeeded

$ oc get csv packageserver -o yaml
apiVersion: operators.coreos.com/v1alpha1
kind: ClusterServiceVersion
metadata:
  annotations:
    olm.operatorGroup: olm-operators
    olm.operatorNamespace: openshift-operator-lifecycle-manager
    olm.targetNamespaces: openshift-operator-lifecycle-manager
  creationTimestamp: "2019-07-30T09:21:35Z"
  generation: 328
  labels:
    olm.api.4bca9f23e412d79d: provided
    olm.clusteroperator.name: operator-lifecycle-manager-packageserver
    olm.version: 0.11.0
  name: packageserver
  namespace: openshift-operator-lifecycle-manager
  resourceVersion: "1326273"
  selfLink: /apis/operators.coreos.com/v1alpha1/namespaces/openshift-operator-lifecycle-manager/clusterserviceversions/packageserver
  uid: 6d3684c3-b2ab-11e9-838b-0050568b69c6
status:
  certsLastUpdated: "2019-07-30T20:30:46Z"
  certsRotateAt: "2021-07-28T20:30:43Z"
  conditions:
  - lastTransitionTime: "2019-07-30T09:22:33Z"
    lastUpdateTime: "2019-07-30T09:22:33Z"
    message: requirements not yet checked
    phase: Pending
    reason: RequirementsUnknown
  - lastTransitionTime: "2019-07-30T09:22:38Z"
    lastUpdateTime: "2019-07-30T09:22:38Z"
    message: all requirements found, attempting install
    phase: InstallReady
    reason: AllRequirementsMet
  - lastTransitionTime: "2019-07-30T09:22:59Z"
    lastUpdateTime: "2019-07-30T09:22:59Z"
    message: waiting for install components to report healthy
    phase: Installing
    reason: InstallSucceeded
  - lastTransitionTime: "2019-07-30T09:22:59Z"
    lastUpdateTime: "2019-07-30T09:23:07Z"
    message: APIServices not installed
    phase: Installing
    reason: InstallWaiting
  - lastTransitionTime: "2019-07-30T09:23:32Z"
    lastUpdateTime: "2019-07-30T09:23:32Z"
    message: install strategy completed with no errors
    phase: Succeeded
    reason: InstallSucceeded
  - lastTransitionTime: "2019-07-30T20:29:48Z"
    lastUpdateTime: "2019-07-30T20:29:48Z"
    message: APIServices not installed
    phase: Failed
    reason: ComponentUnhealthy
  - lastTransitionTime: "2019-07-30T20:30:43Z"
    lastUpdateTime: "2019-07-30T20:30:43Z"
    message: APIServices not installed
    phase: Pending
    reason: NeedsReinstall
  - lastTransitionTime: "2019-07-30T20:30:43Z"
    lastUpdateTime: "2019-07-30T20:30:43Z"
    message: all requirements found, attempting install
    phase: InstallReady
    reason: AllRequirementsMet
  - lastTransitionTime: "2019-07-30T20:30:43Z"
    lastUpdateTime: "2019-07-30T20:30:43Z"
    message: waiting for install components to report healthy
    phase: Installing
    reason: InstallSucceeded
  - lastTransitionTime: "2019-07-30T20:30:43Z"
    lastUpdateTime: "2019-07-30T20:30:48Z"
    message: APIServices not installed
    phase: Installing
    reason: InstallWaiting
  - lastTransitionTime: "2019-07-30T20:31:11Z"
    lastUpdateTime: "2019-07-30T20:31:11Z"
    message: install strategy completed with no errors
    phase: Succeeded
    reason: InstallSucceeded
  lastTransitionTime: "2019-07-30T20:31:11Z"
  lastUpdateTime: "2019-07-30T20:31:11Z"
  message: install strategy completed with no errors
  phase: Succeeded
  reason: InstallSucceeded
  requirementStatus:
  - group: operators.coreos.com
    kind: ClusterServiceVersion
    message: CSV minKubeVersion (1.11.0) less than server version (v1.14.0+1682e38)
    name: packageserver
    status: Present
    version: v1alpha1
  - group: apiregistration.k8s.io
    kind: APIService
    message: ""
    name: v1.packages.operators.coreos.com
    status: DeploymentFound
    version: v1
  - dependents:
    - group: rbac.authorization.k8s.io
      kind: PolicyRule
      message: cluster rule:{"verbs":["create","get"],"apiGroups":["authorization.k8s.io"],"resources":["subjectaccessreviews"]}
      status: Satisfied
      version: v1beta1
    - group: rbac.authorization.k8s.io
      kind: PolicyRule
      message: cluster rule:{"verbs":["get","list","watch"],"apiGroups":[""],"resources":["configmaps"]}
      status: Satisfied
      version: v1beta1
    - group: rbac.authorization.k8s.io
      kind: PolicyRule
      message: cluster rule:{"verbs":["get","list","watch"],"apiGroups":["operators.coreos.com"],"resources":["catalogsources"]}
      status: Satisfied
      version: v1beta1
    - group: rbac.authorization.k8s.io
      kind: PolicyRule
      message: cluster rule:{"verbs":["get","list"],"apiGroups":["packages.operators.coreos.com"],"resources":["packagemanifests"]}
      status: Satisfied
      version: v1beta1
    group: ""
    kind: ServiceAccount
    message: ""
    name: olm-operator-serviceaccount
    status: Present
    version: v1

$ oc get clusterrole packageserver.v0.9.0-nl2jh -o yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  creationTimestamp: "2019-07-30T01:29:19Z"
  labels:
    olm.owner: packageserver.v0.9.0
    olm.owner.kind: ClusterServiceVersion
    olm.owner.namespace: openshift-operator-lifecycle-manager
  name: packageserver.v0.9.0-nl2jh
  resourceVersion: "6509"
  selfLink: /apis/rbac.authorization.k8s.io/v1/clusterroles/packageserver.v0.9.0-nl2jh
  uid: 73f91eaf-b269-11e9-8c50-0050568b2d02
rules:
- apiGroups:
  - authorization.k8s.io
  resources:
  - subjectaccessreviews
  verbs:
  - create
  - get
- apiGroups:
  - ""
  resources:
  - configmaps
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - operators.coreos.com
  resources:
  - catalogsources
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - packages.operators.coreos.com
  resources:
  - packagemanifests
  verbs:
  - get
  - list
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2922