User-Agent: Mozilla/5.0 (X11; Fedora; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36
Build Identifier:

The initial install of an operator in a namespace succeeds. However, if the same operator is then installed in another namespace, the install plan can fail with a "resource modified" error on the operator's CRDs. Possibly some event updates the CRD during the install, causing the install plan to fail.

Reproducible: Sometimes

Steps to Reproduce:
1. Install an operator into a namespace (e.g. 3scale from OperatorHub). This installs the operator's CRDs onto the cluster.
2. Run a script that constantly updates one of the installed operator's CRDs, to mimic some event updating the CRD:
```
# Add and remove a label in a tight loop so the CRD's resourceVersion keeps changing
while true; do
  oc patch crd apimanagers.apps.3scale.net --type=json -p='[{"op" : "add", "path" : "/metadata/labels/test", "value": "test"}]'
  oc patch crd apimanagers.apps.3scale.net --type=json -p='[{"op" : "remove", "path" : "/metadata/labels/test"}]'
done
```
3. Install the same operator into a different namespace.
4. Inspect for a failed install plan.
5. If there is no failed install plan, uninstall and reinstall the operator until a failed install plan occurs.

Actual Results:
The install plan sometimes fails because the resource was modified.

Expected Results:
The install plan should retry the install if the resource is stale.

This was reproduced on some RHOAM installs on OSD when installing user SSO, and a similar error was observed for the OpenShift route monitor operator on the same cluster.
https://issues.redhat.com/browse/MGDAPI-1098
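The expected result above amounts to retrying on a stale-resource conflict. As a rough illustration only (this is not OLM's code, which is written in Go), the pattern can be sketched in shell. The helper name `retry_on_conflict` and the `RETRY_DELAY` variable are hypothetical; the matched string is the text the Kubernetes API server returns on an optimistic-concurrency conflict.

```shell
# Hypothetical helper, not part of oc or OLM.
# Usage: retry_on_conflict MAX_ATTEMPTS COMMAND [ARGS...]
# Runs COMMAND and retries it only when its stderr contains the
# Kubernetes conflict message; any other failure is returned at once.
retry_on_conflict() {
  local max_attempts=$1
  shift
  local attempt=1 status err_file
  err_file=$(mktemp)
  while :; do
    # Capture the command's exit status without tripping `set -e`.
    "$@" 2>"$err_file" && status=0 || status=$?
    if [ "$status" -eq 0 ]; then
      rm -f "$err_file"
      return 0
    fi
    # Pass the captured error text through to the caller's stderr.
    cat "$err_file" >&2
    # Stop on a non-conflict error or when attempts are exhausted.
    if ! grep -q "the object has been modified" "$err_file" ||
       [ "$attempt" -ge "$max_attempts" ]; then
      rm -f "$err_file"
      return "$status"
    fi
    attempt=$((attempt + 1))
    # RETRY_DELAY is an assumed tuning knob (seconds between attempts).
    sleep "${RETRY_DELAY:-1}"
  done
}
```

For the retry to help, the command passed in has to re-read the object on every attempt, for example a function that fetches the CRD, modifies it, and writes it back. Re-sending a request that carries the old resourceVersion would simply conflict again.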
*** Bug 1919454 has been marked as a duplicate of this bug. ***
This looks like either a race condition or error handling that should be improved in OLM. For now, I'm marking this with the UpcomingSprint label, and it will be investigated in a future sprint.
*** Bug 1925113 has been marked as a duplicate of this bug. ***
Given that this has been triaged, we are just waiting on this to be prioritized. Marking as reviewed in sprint.
*** Bug 1975353 has been marked as a duplicate of this bug. ***
What is the status of this issue? This looks like a pretty bad bug, and it causes serious issues in Dev Sandbox for OpenShift clusters :( Our operators often fail to update because of this bug. Is there anything we can do to help prioritize this issue?
OLM version: 0.18.3
git commit: cf7140bf3c404454892c9c972b0d9e839a46f619
OCP: 4.9.0-0.nightly-2021-08-02-044755

1. Install operator into a namespace (e.g. 3scale from OperatorHub). This will install CRDs onto the cluster

oc get csv -n test-1
NAME                  DISPLAY   VERSION   REPLACES   PHASE
etcdoperator.v0.9.4   etcd      0.9.4                Succeeded

2. Run some script that constantly updates the CRDs of the installed operator to mimic some event updating the CRD
```
while true; do
  oc patch crd etcdclusters.etcd.database.coreos.com --type=json -p='[{"op" : "add", "path" : "/metadata/labels/test", "value": "test"}]'
  oc patch crd etcdclusters.etcd.database.coreos.com --type=json -p='[{"op" : "remove", "path" : "/metadata/labels/test"}]'
done
customresourcedefinition.apiextensions.k8s.io/etcdclusters.etcd.database.coreos.com patched
customresourcedefinition.apiextensions.k8s.io/etcdclusters.etcd.database.coreos.com patched
customresourcedefinition.apiextensions.k8s.io/etcdclusters.etcd.database.coreos.com patched
customresourcedefinition.apiextensions.k8s.io/etcdclusters.etcd.database.coreos.com patched
customresourcedefinition.apiextensions.k8s.io/etcdclusters.etcd.database.coreos.com patched
customresourcedefinition.apiextensions.k8s.io/etcdclusters.etcd.database.coreos.com patched
customresourcedefinition.apiextensions.k8s.io/etcdclusters.etcd.database.coreos.com patched
customresourcedefinition.apiextensions.k8s.io/etcdclusters.etcd.database.coreos.com patched
customresourcedefinition.apiextensions.k8s.io/etcdclusters.etcd.database.coreos.com patched
customresourcedefinition.apiextensions.k8s.io/etcdclusters.etcd.database.coreos.com patched
The request is invalid
customresourcedefinition.apiextensions.k8s.io/etcdclusters.etcd.database.coreos.com patched
customresourcedefinition.apiextensions.k8s.io/etcdclusters.etcd.database.coreos.com patched
customresourcedefinition.apiextensions.k8s.io/etcdclusters.etcd.database.coreos.com patched
customresourcedefinition.apiextensions.k8s.io/etcdclusters.etcd.database.coreos.com patched
```
The requests fail when an Operator is being installed.

3. Install the same operator into a different namespace

oc get csv --all-namespaces
NAMESPACE                              NAME                  DISPLAY          VERSION   REPLACES   PHASE
openshift-operator-lifecycle-manager   packageserver         Package Server   0.18.3               Succeeded
test-1                                 etcdoperator.v0.9.4   etcd             0.9.4                Succeeded
test-2                                 etcdoperator.v0.9.4   etcd             0.9.4                Succeeded
test-3                                 etcdoperator.v0.9.4   etcd             0.9.4                Succeeded

4. Inspect for failed install plan

oc get ip --all-namespaces
NAMESPACE   NAME            CSV                   APPROVAL    APPROVED
test-1      install-bxcm2   etcdoperator.v0.9.4   Automatic   true
test-2      install-p6qjp   etcdoperator.v0.9.4   Automatic   true
test-3      install-qmv2h   etcdoperator.v0.9.4   Automatic   true
test-4      install-4z7qj   etcdoperator.v0.9.4   Automatic   true

LGTM, marking as VERIFIED
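The default `oc get ip` columns in step 4 show approval but not the install plan phase. A minimal sketch of a phase check, assuming the rows arrive on stdin as "NAMESPACE NAME PHASE" (the helper name `failed_install_plans` is hypothetical, and the `oc` invocation in the comment was not run as part of this report):

```shell
# Hypothetical helper: read "NAMESPACE NAME PHASE" rows on stdin and
# print namespace/name for every install plan whose phase is Failed.
failed_install_plans() {
  awk '$3 == "Failed" { print $1 "/" $2 }'
}

# Assumed usage against a live cluster:
#   oc get installplan --all-namespaces --no-headers \
#     -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,PHASE:.status.phase \
#     | failed_install_plans
```

Empty output means no install plan is in the Failed phase.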
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:3759