Bug 1975353

Summary: Operator update gets stuck on a conflict error
Product: OpenShift Container Platform
Reporter: Matous Jobanek <mjobanek>
Component: OLM
Assignee: Kevin Rizza <krizza>
OLM sub component: OLM
QA Contact: Jian Zhang <jiazha>
Status: CLOSED DUPLICATE
Docs Contact:
Severity: urgent
Priority: urgent
CC: alkazako, bluddy
Version: 4.7
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-06-23 16:10:59 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Matous Jobanek 2021-06-23 13:38:01 UTC
Description of problem:

When we update our operator, the update sometimes gets stuck with a conflict error:


 message: 'error updating CRD: toolchainconfigs.toolchain.dev.openshift.com: Operation
      cannot be fulfilled on customresourcedefinitions.apiextensions.k8s.io "toolchainconfigs.toolchain.dev.openshift.com":
      the object has been modified; please apply your changes to the latest version
      and try again'


The failed InstallPlan: https://gist.github.com/MatousJobanek/ea9fe074f8e54f56aa07cdcbdfb4e1b0


How reproducible:

It happened in our OSD cluster after creating these resources:


apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  labels:
    opsrc-provider: codeready-toolchain
  name: dev-sandbox-host
  namespace: toolchain-host-operator
spec:
  displayName: Dev Sandbox Operators
  image: quay.io/codeready-toolchain/hosted-toolchain-index:latest
  sourceType: grpc
  updateStrategy:
    registryPoll:
      interval: 5m0s


apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: dev-sandbox-host
  namespace: toolchain-host-operator
spec:
  targetNamespaces:
  - toolchain-host-operator



apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: dev-sandbox-host
  namespace: toolchain-host-operator
spec:
  channel: staging
  installPlanApproval: Automatic
  name: toolchain-host-operator
  source: dev-sandbox-host
  sourceNamespace: toolchain-host-operator
  startingCSV: toolchain-host-operator.v0.0.421-176-commit-b046d74-a52ab6f

After a few updates, we hit the error.



Actual results:

The operator update gets stuck.

Expected results:

OLM should retry and continue updating the operator.
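
For context, the "object has been modified" message is what the API server returns when an update is based on a stale resourceVersion. The retry behaviour expected here can be sketched roughly as below. This is a minimal Python simulation under stated assumptions, not OLM's actual code: FakeStore, ConflictError, update_with_retry, and the "versions" field are hypothetical names made up for illustration.

```python
import copy


class ConflictError(Exception):
    """Stand-in for the API server's 409 Conflict response (hypothetical)."""


class FakeStore:
    """Hypothetical in-memory store that mimics resourceVersion checks."""

    def __init__(self, obj):
        self._obj = dict(obj, resource_version=1)

    def get(self):
        # Return a copy so callers cannot mutate the stored object directly.
        return copy.deepcopy(self._obj)

    def update(self, obj):
        # Reject writes that are based on a stale read.
        if obj["resource_version"] != self._obj["resource_version"]:
            raise ConflictError("the object has been modified")
        stored = copy.deepcopy(obj)
        stored["resource_version"] += 1
        self._obj = stored
        return copy.deepcopy(stored)


def update_with_retry(store, mutate, max_attempts=5):
    """Re-read the latest object and reapply `mutate` after each conflict."""
    for _ in range(max_attempts):
        latest = store.get()
        mutate(latest)
        try:
            return store.update(latest)
        except ConflictError:
            continue  # another writer got in first; fetch again and retry
    raise ConflictError(f"still conflicting after {max_attempts} attempts")
```

The point of the sketch is that the change is reapplied to a freshly read object on every attempt, so a single concurrent write delays the update instead of failing it permanently.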

Comment 3 Ben Luddy 2021-06-23 16:10:59 UTC

*** This bug has been marked as a duplicate of bug 1923111 ***

Comment 4 Matous Jobanek 2021-06-24 07:04:39 UTC
We faced the same problem again; this is the third time in the last few days.
This makes our platform unstable and may have a significant impact on the end users and their data. I'm marking the issue as "urgent".

Is there any way to get rid of the error other than reinstalling the operator over and over again?

Comment 5 Matous Jobanek 2021-06-24 07:24:44 UTC
@bluddy You actually closed this bug, so raising the severity here doesn't make any sense, I guess :-/ Could you please raise the priority/severity on #1923111?