Bug 1982046 - CVO gets stuck on resource deletion progress after re-creating the deleted resource
Summary: CVO gets stuck on resource deletion progress after re-creating the deleted re...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cluster Version Operator
Version: 4.9
Hardware: Unspecified
OS: Unspecified
medium
medium
Target Milestone: ---
: 4.9.0
Assignee: Jack Ottofaro
QA Contact: Yang Yang
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2021-07-14 04:40 UTC by Yang Yang
Modified: 2021-10-18 17:53 UTC (History)
2 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-10-18 17:39:52 UTC
Target Upstream Version:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-version-operator pull 629 0 None open Bug 1982046: Check for deletes in progress when not in UpdateMode 2021-07-14 18:55:03 UTC
Red Hat Product Errata RHSA-2021:3759 0 None None None 2021-10-18 17:53:09 UTC

Description Yang Yang 2021-07-14 04:40:06 UTC
Description of problem:
In the middle of cluster upgrade, `oc get is hello-openshift -n openshift` tells the resource Not Found, but CVO does not complete the resource removal. At the moment, re-creating the resource results in upgradeable=false and resource removal is never completed.

Version-Release number of the following components:

4.9.0-0.nightly-2021-07-12-095728

How reproducible:
1/1

Steps to Reproduce:

1. Install a cluster with 4.7

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.19    True        False         28m     Cluster version is 4.7.19

$ oc get is hello-openshift -n openshift
NAME              IMAGE REPOSITORY                                                             TAGS     UPDATED
hello-openshift   image-registry.openshift-image-registry.svc:5000/openshift/hello-openshift   latest   37 minutes ago

Imagestream hello-openshift is present

2. Upgrade it to 4.8, then to 4.9

$ oc adm upgrade --to-image=quay.io/openshift-release-dev/ocp-release@sha256:af238ba15d5dcb839e4f0adff5e87d36f4bae096d31f0f7192c726e3a35c203b --allow-explicit-upgrade 

$ oc adm upgrade --to-image=registry.ci.openshift.org/ocp/release@sha256:caf080df4b0a0f21c76a6f5385b6c4434090dc98444eddcdc6a2ee76e310b95b --allow-explicit-upgrade --force

Watching the imagestream hello-openshift, CVO deletes the imagestream hello-openshift as expected:

$ oc get is hello-openshift -n openshift
Error from server (NotFound): imagestreams.image.openshift.io "hello-openshift" not found

Then re-create the imagestream hello-openshift

$ cat hello-openshift.yaml 
apiVersion: image.openshift.io/v1
kind: ImageStream
metadata:
  namespace: openshift
  name: hello-openshift
  annotations:
    include.release.openshift.io/ibm-cloud-managed: "true"
    include.release.openshift.io/self-managed-high-availability: "true"
    include.release.openshift.io/single-node-developer: "true"
    include.release.openshift.io/single-node-production-edge: "true"
spec:
  lookupPolicy:
    local: true
  tags:
    - name: latest
      importPolicy:
        scheduled: true
      from:
        kind: DockerImage
        name: quay.io/openshift/origin-hello-openshift:latest


$ oc create -f hello-openshift.yaml 
imagestream.image.openshift.io/hello-openshift created

$ oc get clusterversion
NAME      VERSION      AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-rc.3   True        True          34m     Working towards 4.9.0-0.nightly-2021-07-12-095728: 530 of 680 done (77% complete)

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-0.nightly-2021-07-12-095728   True        False         5m29s   Cluster version is 4.9.0-0.nightly-2021-07-12-095728

$ oc get is hello-openshift -n openshift
Error from server (NotFound): imagestreams.image.openshift.io "hello-openshift" not found

The recreated imagestream hello-openshift is deleted.

Actual results:
The re-created resource is deleted eventually. But CVO gets stuck on Resource deletions in progress and blocks the minor upgrade.

$ oc logs cluster-version-operator-58b44d8dd5-mkntw -n openshift-cluster-version | grep hello | grep -v 'sync\|throll'

I0714 01:58:49.720112       1 upgradeable.go:229] Resource deletions in progress; resources=imagestream "openshift/hello-openshift"
I0714 02:02:02.279562       1 upgradeable.go:229] Resource deletions in progress; resources=imagestream "openshift/hello-openshift"
I0714 02:05:14.838945       1 upgradeable.go:229] Resource deletions in progress; resources=imagestream "openshift/hello-openshift"
I0714 02:08:27.399272       1 upgradeable.go:229] Resource deletions in progress; resources=imagestream "openshift/hello-openshift"
I0714 02:11:39.959080       1 upgradeable.go:229] Resource deletions in progress; resources=imagestream "openshift/hello-openshift"
I0714 02:14:52.522081       1 upgradeable.go:229] Resource deletions in progress; resources=imagestream "openshift/hello-openshift"
I0714 02:18:05.095094       1 upgradeable.go:229] Resource deletions in progress; resources=imagestream "openshift/hello-openshift"
I0714 02:21:17.655458       1 upgradeable.go:229] Resource deletions in progress; resources=imagestream "openshift/hello-openshift"

$ oc get clusterversion -ojson | jq -r .items[].status
{
  "availableUpdates": null,
  "conditions": [
    {
      "lastTransitionTime": "2021-07-12T08:53:40Z",
      "message": "Done applying 4.9.0-0.nightly-2021-07-12-095728",
      "status": "True",
      "type": "Available"
    },
    {
      "lastTransitionTime": "2021-07-13T10:08:51Z",
      "status": "False",
      "type": "Failing"
    },
    {
      "lastTransitionTime": "2021-07-13T10:15:40Z",
      "message": "Cluster version is 4.9.0-0.nightly-2021-07-12-095728",
      "status": "False",
      "type": "Progressing"
    },
    {
      "lastTransitionTime": "2021-07-12T09:33:46Z",
      "message": "Unable to retrieve available updates: currently reconciling cluster version 4.9.0-0.nightly-2021-07-12-095728 not found in the \"stable-4.7\" channel",
      "reason": "VersionNotFound",
      "status": "False",
      "type": "RetrievedUpdates"
    },
    {
      "lastTransitionTime": "2021-07-13T10:18:29Z",
      "message": "Cluster minor level upgrades are not allowed while resource deletions are in progress; resources=imagestream \"openshift/hello-openshift\"",
      "reason": "ResourceDeletesInProgress",
      "status": "False",
      "type": "Upgradeable"
    }
  ],
  "desired": {
    "image": "registry.ci.openshift.org/ocp/release@sha256:caf080df4b0a0f21c76a6f5385b6c4434090dc98444eddcdc6a2ee76e310b95b",
    "version": "4.9.0-0.nightly-2021-07-12-095728"
  },
  "history": [
    {
      "completionTime": "2021-07-13T10:15:40Z",
      "image": "registry.ci.openshift.org/ocp/release@sha256:caf080df4b0a0f21c76a6f5385b6c4434090dc98444eddcdc6a2ee76e310b95b",
      "startedTime": "2021-07-13T08:01:52Z",
      "state": "Completed",
      "verified": true,
      "version": "4.9.0-0.nightly-2021-07-12-095728"
    },
    {
      "completionTime": "2021-07-13T07:55:30Z",
      "image": "quay.io/openshift-release-dev/ocp-release@sha256:af238ba15d5dcb839e4f0adff5e87d36f4bae096d31f0f7192c726e3a35c203b",
      "startedTime": "2021-07-13T07:54:48Z",
      "state": "Completed",
      "verified": false,
      "version": "4.8.0-rc.3"
    },
    {
      "completionTime": "2021-07-13T07:54:48Z",
      "image": "registry.ci.openshift.org/ocp/release@sha256:caf080df4b0a0f21c76a6f5385b6c4434090dc98444eddcdc6a2ee76e310b95b",
      "startedTime": "2021-07-13T07:45:05Z",
      "state": "Partial",
      "verified": false,
      "version": ""
    },
    {
      "completionTime": "2021-07-13T06:42:38Z",
      "image": "quay.io/openshift-release-dev/ocp-release@sha256:af238ba15d5dcb839e4f0adff5e87d36f4bae096d31f0f7192c726e3a35c203b",
      "startedTime": "2021-07-13T06:41:59Z",
      "state": "Completed",
      "verified": false,
      "version": "4.8.0-rc.3"
    },
    {
      "completionTime": "2021-07-13T06:41:59Z",
      "image": "registry.ci.openshift.org/ocp/release@sha256:4e6a1b011084ef13cc344af0fb219cc30855f8fb36797581373e8ef1fc0f0e4e",
      "startedTime": "2021-07-13T02:06:06Z",
      "state": "Partial",
      "verified": false,
      "version": ""
    },
    {
      "completionTime": "2021-07-12T10:42:22Z",
      "image": "quay.io/openshift-release-dev/ocp-release@sha256:af238ba15d5dcb839e4f0adff5e87d36f4bae096d31f0f7192c726e3a35c203b",
      "startedTime": "2021-07-12T09:33:01Z",
      "state": "Completed",
      "verified": true,
      "version": "4.8.0-rc.3"
    },
    {
      "completionTime": "2021-07-12T08:53:40Z",
      "image": "quay.io/openshift-release-dev/ocp-release@sha256:eafdac268e1f65053de423ba4a028e8de5133ab78e7954d76ed838bcf5f4f666",
      "startedTime": "2021-07-12T08:28:22Z",
      "state": "Completed",
      "verified": false,
      "version": "4.7.19"
    }
  ],
  "observedGeneration": 7,
  "versionHash": "5PaaCfUW7gM="
}


Expected results:
Either keep the resource or remove it completely.

Additional info:

Comment 1 Yang Yang 2021-07-14 04:50:02 UTC
CVO logs are available online https://drive.google.com/file/d/1H8kq1H903_DEZB2XCUaLaF-1Cv72vLXO/view?usp=sharing.

Comment 2 Jack Ottofaro 2021-07-14 14:10:52 UTC
From the logs provided I see the delete being requested:

I0713 08:34:15.088642       1 helper.go:65] Delete requested for imagestream "openshift/hello-openshift".

Assuming this is the one that results after you have manually re-created the imagestream.

The last delete related log I see is:

I0714 02:34:07.896946       1 upgradeable.go:229] Resource deletions in progress; resources=imagestream "openshift/hello-openshift"

So never see the "delete completed" log probably because we are probably no longer in UpdatingMode when the delete finally completes and therefore the delete logic, including checking for delete completion, never runs. That's a bug as it will never clear the Upgradeable=false. Good catch. Thanks!

Comment 4 Yang Yang 2021-07-28 02:02:46 UTC
Verifying with 4.9.0-0.nightly-2021-07-26-220837

1. Regression testing on installation passed, imagestream hello-openshift is not created.

2. The issue described in description is fixed.

$ oc adm upgrade --to-image=registry.ci.openshift.org/ocp/release@sha256:174648ea613381880c5f75c63e5dac78e330473ad93a530e434866c49e690a56 --allow-explicit-upgrade --force

$ oc get is hello-openshift -n openshift
Error from server (NotFound): imagestreams.image.openshift.io "hello-openshift" not found

$ oc create -f /tmp/hello-openshift.yaml -n openshift
imagestream.image.openshift.io/hello-openshift created

$ oc get clusterversion
NAME      VERSION      AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-rc.3   True        True          34m     Working towards 4.9.0-0.nightly-2021-07-26-220837: 561 of 716 done (78% complete)

$ oc get is hello-openshift -n openshift
NAME              IMAGE REPOSITORY                                                             TAGS     UPDATED
hello-openshift   image-registry.openshift-image-registry.svc:5000/openshift/hello-openshift   latest   54 seconds ago

$ oc get is hello-openshift -n openshift
Error from server (NotFound): imagestreams.image.openshift.io "hello-openshift" not found

$ oc get clusterversion
NAME      VERSION      AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-rc.3   True        True          56m     Working towards 4.9.0-0.nightly-2021-07-26-220837: 606 of 716 done (84% complete)

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-0.nightly-2021-07-26-220837   True        False         2m19s   Cluster version is 4.9.0-0.nightly-2021-07-26-220837


$ grep hello-openshift * | grep -v 'throt\|sync'
2021-07-27T12:06:44+0000-cluster-version-operator-686d9dffd9-ssl8b.log:I0727 12:21:16.283680       1 helper.go:65] Delete requested for imagestream "openshift/hello-openshift".
2021-07-27T12:06:44+0000-cluster-version-operator-686d9dffd9-ssl8b.log:I0727 12:22:48.736950       1 upgradeable.go:229] Resource deletions in progress; resources=ConsoleQuickStart "ocs-install-tour",imagestream "openshift/hello-openshift"
2021-07-27T12:06:44+0000-cluster-version-operator-686d9dffd9-ssl8b.log:I0727 12:26:01.297677       1 upgradeable.go:229] Resource deletions in progress; resources=imagestream "openshift/hello-openshift",ConsoleQuickStart "ocs-install-tour"
2021-07-27T12:06:44+0000-cluster-version-operator-686d9dffd9-ssl8b.log:I0727 12:28:26.508633       1 helper.go:116] Delete of imagestream "openshift/hello-openshift" has already been requested.
2021-07-27T12:06:44+0000-cluster-version-operator-686d9dffd9-ssl8b.log:I0727 12:29:13.856989       1 upgradeable.go:229] Resource deletions in progress; resources=imagestream "openshift/hello-openshift"
2021-07-27T12:06:44+0000-cluster-version-operator-686d9dffd9-ssl8b.log:I0727 12:32:26.416845       1 upgradeable.go:229] Resource deletions in progress; resources=imagestream "openshift/hello-openshift"
2021-07-27T12:06:44+0000-cluster-version-operator-686d9dffd9-ssl8b.log:I0727 12:35:38.976630       1 upgradeable.go:229] Resource deletions in progress; resources=imagestream "openshift/hello-openshift"
2021-07-27T12:06:44+0000-cluster-version-operator-686d9dffd9-ssl8b.log:I0727 12:37:16.506803       1 helper.go:116] Delete of imagestream "openshift/hello-openshift" has already been requested.
2021-07-27T12:06:44+0000-cluster-version-operator-686d9dffd9-ssl8b.log:I0727 12:38:51.536174       1 upgradeable.go:229] Resource deletions in progress; resources=imagestream "openshift/hello-openshift"
2021-07-27T12:06:44+0000-cluster-version-operator-686d9dffd9-ssl8b.log:I0727 12:42:04.096542       1 upgradeable.go:229] Resource deletions in progress; resources=imagestream "openshift/hello-openshift"
2021-07-27T12:44:19+0000-cluster-version-operator-686d9dffd9-bq6xg.log:I0727 12:45:23.599269       1 helper.go:65] Delete requested for imagestream "openshift/hello-openshift".
2021-07-27T12:44:19+0000-cluster-version-operator-686d9dffd9-bq6xg.log:I0727 12:48:03.866345       1 upgradeable.go:229] Resource deletions in progress; resources=imagestream "openshift/hello-openshift"
2021-07-27T13:00:47+0000-cluster-version-operator-686d9dffd9-snxxc.log:W0727 13:04:15.126164       1 helper.go:97] imagestream "openshift/hello-openshift" has already been removed.

$ oc get clusterversion -ojson | jq -r '.items[].status|select(.type == "Upgradeable")'
null


CVO logs "imagestream "openshift/hello-openshift" has already been removed" and upgradeable is not blocked by imagestream "openshift/hello-openshift.

Comment 5 Yang Yang 2021-07-28 07:22:15 UTC
Moving it to verified state based on comment#4.

Comment 8 errata-xmlrpc 2021-10-18 17:39:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759

Comment 9 errata-xmlrpc 2021-10-18 17:52:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759


Note You need to log in before you can comment on or make changes to this bug.