Bug 1703158 - CVO takes more than 2 min to ack upgrade request
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cluster Version Operator
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.1.0
Assignee: Abhinav Dahiya
QA Contact: Gaoyun Pei
URL:
Whiteboard:
Duplicates: 1703140
Depends On:
Blocks:
 
Reported: 2019-04-25 16:32 UTC by Clayton Coleman
Modified: 2019-06-04 10:48 UTC
CC List: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-06-04 10:48:02 UTC
Target Upstream Version:




Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2019:0758 None None None 2019-06-04 10:48:12 UTC

Description Clayton Coleman 2019-04-25 16:32:01 UTC
https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/openshift_cluster-kube-apiserver-operator/314/pull-ci-openshift-cluster-kube-apiserver-operator-master-e2e-aws-upgrade/20

The CVO timed out after 2 minutes without updating observedGeneration, which implies that setting desiredUpdate did not correctly propagate to the sync worker and cancel the current rollout. Changes to the desired update should propagate immediately.

I temporarily bumped the timeout to 5 min in origin, but this is a serious bug that needs to be investigated and probably fixed before GA.
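The expected behaviour described above — a change to desiredUpdate immediately cancels the in-flight rollout and is acknowledged by bumping observedGeneration — can be sketched as a toy model (illustrative Python, not the actual Go implementation in cluster-version-operator; all names here are hypothetical):

```python
class SyncWorker:
    """Toy model of the CVO sync loop: a spec change must cancel the
    in-flight rollout and be acknowledged via observedGeneration."""

    def __init__(self):
        self.current_payload = None
        self.observed_generation = 0

    def notify(self, spec_generation, desired_image):
        # A new desiredUpdate must propagate immediately: cancel the
        # current rollout and start reconciling the new payload.
        if desired_image != self.current_payload:
            self.cancel_current_rollout()
            self.current_payload = desired_image
        # Acknowledge the generation of the spec the worker now acts on.
        self.observed_generation = spec_generation

    def cancel_current_rollout(self):
        # In the real CVO this would cancel the context of the payload
        # apply that is currently in flight.
        pass


worker = SyncWorker()
worker.notify(spec_generation=1, desired_image="release@sha256:aaa")
worker.notify(spec_generation=2, desired_image="release@sha256:bbb")
print(worker.observed_generation)  # 2: the second request is acked
```

The bug reported here corresponds to the notify step never reaching the worker (or never updating observed_generation), so the status keeps reporting the old generation.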

Comment 1 Abhinav Dahiya 2019-04-25 16:44:21 UTC
Looking at the logs
https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/openshift_cluster-kube-apiserver-operator/314/pull-ci-openshift-cluster-kube-apiserver-operator-master-e2e-aws-upgrade/20?log#log
```
Apr 25 15:37:14.264: INFO: Starting upgrade to version= image=registry.svc.ci.openshift.org/ci-op-c60fjs69/release@sha256:2a927947eac3e08e6b154d84b2b0f678b38087dbed6e8138cf7e492fbd8e9573
Apr 25 15:39:14.362: INFO: Current cluster version:
{
  "metadata": {
    "name": "version",
    "selfLink": "/apis/config.openshift.io/v1/clusterversions/version",
    "uid": "b6a1e5a3-676d-11e9-977b-12496c4a6d96",
    "resourceVersion": "17542",
    "generation": 2,
    "creationTimestamp": "2019-04-25T15:20:52Z"
  },
  "spec": {
    "clusterID": "34d74856-f667-4766-a862-1b6a0dc86d4e",
    "desiredUpdate": {
      "version": "",
      "image": "registry.svc.ci.openshift.org/ci-op-c60fjs69/release@sha256:2a927947eac3e08e6b154d84b2b0f678b38087dbed6e8138cf7e492fbd8e9573",
      "force": true
    },
    "upstream": "https://api.openshift.com/api/upgrades_info/v1/graph",
    "channel": "stable-4.0"
  },
  "status": {
    "desired": {
      "version": "",
      "image": "registry.svc.ci.openshift.org/ci-op-c60fjs69/release@sha256:2a927947eac3e08e6b154d84b2b0f678b38087dbed6e8138cf7e492fbd8e9573",
      "force": false
    },
    "history": [
      {
        "state": "Partial",
        "startedTime": "2019-04-25T15:37:14Z",
        "completionTime": null,
        "version": "",
        "image": "registry.svc.ci.openshift.org/ci-op-c60fjs69/release@sha256:2a927947eac3e08e6b154d84b2b0f678b38087dbed6e8138cf7e492fbd8e9573",
        "verified": false
      },
      {
        "state": "Completed",
        "startedTime": "2019-04-25T15:21:08Z",
        "completionTime": "2019-04-25T15:37:14Z",
        "version": "0.0.1-2019-04-25-150512",
        "image": "registry.svc.ci.openshift.org/ci-op-c60fjs69/release@sha256:2ad201f2bdba0ca66750d43afe7bdbefa236ebf70fc0a659d7d9490be2ece946",
        "verified": false
      }
    ],
    "observedGeneration": 0,
    "versionHash": "3f2ucK9TMPg=",
    "conditions": [
      {
        "type": "Available",
        "status": "True",
        "lastTransitionTime": "2019-04-25T15:34:47Z",
        "message": "Done applying 0.0.1-2019-04-25-150512"
      },
      {
        "type": "Failing",
        "status": "False",
        "lastTransitionTime": "2019-04-25T15:25:40Z"
      },
      {
        "type": "Progressing",
        "status": "True",
        "lastTransitionTime": "2019-04-25T15:37:14Z",
        "reason": "DownloadingUpdate",
        "message": "Working towards registry.svc.ci.openshift.org/ci-op-c60fjs69/release@sha256:2a927947eac3e08e6b154d84b2b0f678b38087dbed6e8138cf7e492fbd8e9573: downloading update"
      },
      {
        "type": "RetrievedUpdates",
        "status": "False",
        "lastTransitionTime": "2019-04-25T15:21:08Z",
        "reason": "RemoteFailed",
        "message": "Unable to retrieve available updates: currently installed version 0.0.1-2019-04-25-150512 not found in the \"stable-4.0\" channel"
      }
    ],
    "availableUpdates": null
  }
}
```

The CVO did update .status.desired to the new payload, although .status.observedGeneration is still 0.
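The "ack" condition the test is waiting on can be expressed as a small check against the object above: the upgrade request counts as acknowledged once status.observedGeneration catches up with metadata.generation (illustrative Python sketch, not CVO code):

```python
import json

# Trimmed-down version of the ClusterVersion object shown above.
cv = json.loads("""
{
  "metadata": {"generation": 2},
  "status": {"observedGeneration": 0}
}
""")


def acked(cluster_version):
    # The CVO has acknowledged the spec change once the generation it
    # reports having observed catches up with metadata.generation.
    return (cluster_version["status"]["observedGeneration"]
            >= cluster_version["metadata"]["generation"])


print(acked(cv))  # False: generation 2 was never observed
```

With the status above (observedGeneration 0 against generation 2), this check stays false, which is why the test times out.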

Comment 2 Clayton Coleman 2019-04-25 16:46:06 UTC
This is 20% of upgrade CI failures.

Comment 3 W. Trevor King 2019-04-25 17:32:27 UTC
Mitigated via [1] while we wait for a fix.

[1]: https://github.com/openshift/origin/pull/22670

Comment 4 Brenton Leanhardt 2019-04-25 17:34:09 UTC
*** Bug 1703140 has been marked as a duplicate of this bug. ***

Comment 5 Clayton Coleman 2019-04-29 01:04:08 UTC
Fixed in https://github.com/openshift/cluster-version-operator/pull/176

Comment 6 Gaoyun Pei 2019-05-08 11:38:56 UTC
After checking the error log in https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_cluster-kube-apiserver-operator/314/pull-ci-openshift-cluster-kube-apiserver-operator-master-e2e-aws-upgrade/20/build-log.txt
it seems the error was thrown by the e2e testing framework. I confirmed in some subsequent e2e-aws-upgrade CI runs that no such error appeared again, for example in:
https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/openshift_cluster-version-operator/183/pull-ci-openshift-cluster-version-operator-master-e2e-aws-upgrade/53
https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/openshift_cluster-version-operator/182/pull-ci-openshift-cluster-version-operator-master-e2e-aws-upgrade/59

So I'm moving this bug to verified since the proposed PR has already merged. Feel free to leave comments here if there's a better way for QE to verify this issue, thanks.

Comment 8 errata-xmlrpc 2019-06-04 10:48:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758

