Bug 1690162 - Cluster did not acknowledge request to upgrade:
Summary: Cluster did not acknowledge request to upgrade:
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
: 4.1.0
Assignee: Sergiusz Urbaniak
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2019-03-18 23:05 UTC by Ben Parees
Modified: 2019-10-31 23:24 UTC (History)
10 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-06-04 10:46:02 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2019:0758 0 None None None 2019-06-04 10:46:10 UTC

Description Ben Parees 2019-03-18 23:05:57 UTC
Description of problem:
fail [github.com/openshift/origin/test/e2e/upgrade/upgrade.go:81]: during upgrade
Expected error:
    <*errors.errorString | 0xc421f86580>: {
        s: "Cluster did not acknowledge request to upgrade: timed out waiting for the condition",
    }
    Cluster did not acknowledge request to upgrade: timed out waiting for the condition
not to have occurred


https://openshift-gce-devel.appspot.com/build/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.0/234/

Version-Release number of the following components:
4.1

How reproducible:
Appear to be a flake.

Comment 2 Abhinav Dahiya 2019-03-19 00:15:50 UTC
From

`clsuterversion.yaml`

```
{
            "apiVersion": "config.openshift.io/v1",
            "kind": "ClusterVersion",
            "metadata": {
                "creationTimestamp": "2019-03-17T04:27:04Z",
                "generation": 2,
                "name": "version",
                "resourceVersion": "65775",
                "selfLink": "/apis/config.openshift.io/v1/clusterversions/version",
                "uid": "ea967f21-486c-11e9-bd56-120db6c725be"
            },
            "spec": {
                "channel": "stable-4.0",
                "clusterID": "1323fa57-4285-43eb-a338-c36bfb4b0a07",
                "desiredUpdate": {
                    "image": "registry.svc.ci.openshift.org/ocp/release:4.0.0-0.ci-2019-03-17-041405",
                    "version": ""
                },
                "upstream": "https://api.openshift.com/api/upgrades_info/v1/graph"
            },
            "status": {
                "availableUpdates": null,
                "conditions": [
                    {
                        "lastTransitionTime": "2019-03-17T04:40:43Z",
                        "message": "Done applying 4.0.0-0.ci-2019-03-17-010422",
                        "status": "True",
                        "type": "Available"
                    },
                    {
                        "lastTransitionTime": "2019-03-17T05:31:14Z",
                        "status": "False",
                        "type": "Failing"
                    },
                    {
                        "lastTransitionTime": "2019-03-17T04:43:16Z",
                        "message": "Working towards 4.0.0-0.ci-2019-03-17-041405: 90% complete",
                        "status": "True",
                        "type": "Progressing"
                    },
                    {
                        "lastTransitionTime": "2019-03-17T04:27:18Z",
                        "message": "Unable to retrieve available updates: unknown version 4.0.0-0.ci-2019-03-17-041405",
                        "reason": "RemoteFailed",
                        "status": "False",
                        "type": "RetrievedUpdates"
                    }
                ],
                "desired": {
                    "image": "registry.svc.ci.openshift.org/ocp/release:4.0.0-0.ci-2019-03-17-041405",
                    "version": "4.0.0-0.ci-2019-03-17-041405"
                },
                "history": [
                    {
                        "completionTime": null,
                        "image": "registry.svc.ci.openshift.org/ocp/release:4.0.0-0.ci-2019-03-17-041405",
                        "startedTime": "2019-03-17T05:34:01Z",
                        "state": "Partial",
                        "version": "4.0.0-0.ci-2019-03-17-041405"
                    },
                    {
                        "completionTime": "2019-03-17T05:34:01Z",
                        "image": "registry.svc.ci.openshift.org/ocp/release:4.0.0-0.ci-2019-03-17-041405",
                        "startedTime": "2019-03-17T04:44:17Z",
                        "state": "Partial",
                        "version": "4.0.0-0.ci-2019-03-17-041405"
                    },
                    {
                        "completionTime": "2019-03-17T04:44:17Z",
                        "image": "registry.svc.ci.openshift.org/ocp/release:4.0.0-0.ci-2019-03-17-041405",
                        "startedTime": "2019-03-17T04:43:16Z",
                        "state": "Partial",
                        "version": "4.0.0-0.ci-2019-03-17-041405"
                    },
                    {
                        "completionTime": "2019-03-17T04:43:16Z",
                        "image": "registry.svc.ci.openshift.org/ocp/release@sha256:9972758232e864ad3f4756127ff4401ed1e313f4ae956f42e44354c83fa4d23b",
                        "startedTime": "2019-03-17T04:27:18Z",
                        "state": "Completed",
                        "version": "4.0.0-0.ci-2019-03-17-010422"
                    }
                ],
                "observedGeneration": 2,
                "versionHash": "vjzTSO3pwl0="
            }
        }
```

The desired version is `4.0.0-0.ci-2019-03-17-041405`

The only clusteroperator that is not at `4.0.0-0.ci-2019-03-17-041405` is monitoring

```
        {
            "apiVersion": "config.openshift.io/v1",
            "kind": "ClusterOperator",
            "metadata": {
                "creationTimestamp": "2019-03-17T04:33:34Z",
                "generation": 1,
                "name": "monitoring",
                "resourceVersion": "56407",
                "selfLink": "/apis/config.openshift.io/v1/clusteroperators/monitoring",
                "uid": "d37049c6-486d-11e9-8028-0a28398410f8"
            },
            "spec": {},
            "status": {
                "conditions": [
                    {
                        "lastTransitionTime": "2019-03-17T05:22:26Z",
                        "message": "Successfully rolled out the stack.",
                        "status": "True",
                        "type": "Available"
                    },
                    {
                        "lastTransitionTime": "2019-03-17T05:22:26Z",
                        "status": "False",
                        "type": "Progressing"
                    },
                    {
                        "lastTransitionTime": "2019-03-17T05:22:26Z",
                        "status": "False",
                        "type": "Failing"
                    }
                ],
                "extension": null,
                "relatedObjects": null,
                "versions": [
                    {
                        "name": "operator",
                        "version": "4.0.0-0.ci-2019-03-17-010422"
                    }
                ]
            }
        },
```

Comment 3 Abhinav Dahiya 2019-03-19 00:16:24 UTC
From https://openshift-gce-devel.appspot.com/build/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.0/242/

`clsuterversion.yaml`

```
{
            "apiVersion": "config.openshift.io/v1",
            "kind": "ClusterVersion",
            "metadata": {
                "creationTimestamp": "2019-03-17T04:27:04Z",
                "generation": 2,
                "name": "version",
                "resourceVersion": "65775",
                "selfLink": "/apis/config.openshift.io/v1/clusterversions/version",
                "uid": "ea967f21-486c-11e9-bd56-120db6c725be"
            },
            "spec": {
                "channel": "stable-4.0",
                "clusterID": "1323fa57-4285-43eb-a338-c36bfb4b0a07",
                "desiredUpdate": {
                    "image": "registry.svc.ci.openshift.org/ocp/release:4.0.0-0.ci-2019-03-17-041405",
                    "version": ""
                },
                "upstream": "https://api.openshift.com/api/upgrades_info/v1/graph"
            },
            "status": {
                "availableUpdates": null,
                "conditions": [
                    {
                        "lastTransitionTime": "2019-03-17T04:40:43Z",
                        "message": "Done applying 4.0.0-0.ci-2019-03-17-010422",
                        "status": "True",
                        "type": "Available"
                    },
                    {
                        "lastTransitionTime": "2019-03-17T05:31:14Z",
                        "status": "False",
                        "type": "Failing"
                    },
                    {
                        "lastTransitionTime": "2019-03-17T04:43:16Z",
                        "message": "Working towards 4.0.0-0.ci-2019-03-17-041405: 90% complete",
                        "status": "True",
                        "type": "Progressing"
                    },
                    {
                        "lastTransitionTime": "2019-03-17T04:27:18Z",
                        "message": "Unable to retrieve available updates: unknown version 4.0.0-0.ci-2019-03-17-041405",
                        "reason": "RemoteFailed",
                        "status": "False",
                        "type": "RetrievedUpdates"
                    }
                ],
                "desired": {
                    "image": "registry.svc.ci.openshift.org/ocp/release:4.0.0-0.ci-2019-03-17-041405",
                    "version": "4.0.0-0.ci-2019-03-17-041405"
                },
                "history": [
                    {
                        "completionTime": null,
                        "image": "registry.svc.ci.openshift.org/ocp/release:4.0.0-0.ci-2019-03-17-041405",
                        "startedTime": "2019-03-17T05:34:01Z",
                        "state": "Partial",
                        "version": "4.0.0-0.ci-2019-03-17-041405"
                    },
                    {
                        "completionTime": "2019-03-17T05:34:01Z",
                        "image": "registry.svc.ci.openshift.org/ocp/release:4.0.0-0.ci-2019-03-17-041405",
                        "startedTime": "2019-03-17T04:44:17Z",
                        "state": "Partial",
                        "version": "4.0.0-0.ci-2019-03-17-041405"
                    },
                    {
                        "completionTime": "2019-03-17T04:44:17Z",
                        "image": "registry.svc.ci.openshift.org/ocp/release:4.0.0-0.ci-2019-03-17-041405",
                        "startedTime": "2019-03-17T04:43:16Z",
                        "state": "Partial",
                        "version": "4.0.0-0.ci-2019-03-17-041405"
                    },
                    {
                        "completionTime": "2019-03-17T04:43:16Z",
                        "image": "registry.svc.ci.openshift.org/ocp/release@sha256:9972758232e864ad3f4756127ff4401ed1e313f4ae956f42e44354c83fa4d23b",
                        "startedTime": "2019-03-17T04:27:18Z",
                        "state": "Completed",
                        "version": "4.0.0-0.ci-2019-03-17-010422"
                    }
                ],
                "observedGeneration": 2,
                "versionHash": "vjzTSO3pwl0="
            }
        }
```

The desired version is `4.0.0-0.ci-2019-03-17-041405`

The only clusteroperator that is not at `4.0.0-0.ci-2019-03-17-041405` is monitoring

```
        {
            "apiVersion": "config.openshift.io/v1",
            "kind": "ClusterOperator",
            "metadata": {
                "creationTimestamp": "2019-03-17T04:33:34Z",
                "generation": 1,
                "name": "monitoring",
                "resourceVersion": "56407",
                "selfLink": "/apis/config.openshift.io/v1/clusteroperators/monitoring",
                "uid": "d37049c6-486d-11e9-8028-0a28398410f8"
            },
            "spec": {},
            "status": {
                "conditions": [
                    {
                        "lastTransitionTime": "2019-03-17T05:22:26Z",
                        "message": "Successfully rolled out the stack.",
                        "status": "True",
                        "type": "Available"
                    },
                    {
                        "lastTransitionTime": "2019-03-17T05:22:26Z",
                        "status": "False",
                        "type": "Progressing"
                    },
                    {
                        "lastTransitionTime": "2019-03-17T05:22:26Z",
                        "status": "False",
                        "type": "Failing"
                    }
                ],
                "extension": null,
                "relatedObjects": null,
                "versions": [
                    {
                        "name": "operator",
                        "version": "4.0.0-0.ci-2019-03-17-010422"
                    }
                ]
            }
        },
```

Comment 4 Abhinav Dahiya 2019-03-19 00:30:18 UTC
Also from CVO logs at https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.0/242/artifacts/e2e-aws-upgrade/pods/openshift-cluster-version_cluster-version-operator-565d946558-9ztzw_cluster-version-operator.log.gz

```
I0317 05:37:29.940297       1 sync_worker.go:462] Running sync for clusteroperator "monitoring" (197 of 303)
...
E0317 05:38:29.940627       1 task.go:58] error running apply for clusteroperator "monitoring" (197 of 303): Cluster operator monitoring is still updating
```

CVO was ONLY waiting for monitoring towards the end around `05:38` But the upgrade test had defined test failed at `05:37` looking at https://openshift-gce-devel.appspot.com/build/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.0/242/#operator-run-template-e2e-aws-upgrade---e2e-aws-upgrade-container-test

Also looking at the `clusterversion.yaml` `.status.history` the upgrade was started at 04:43 and still after around 50 mins upgrade had not finished.
```
                    {
                        "completionTime": null,
                        "image": "registry.svc.ci.openshift.org/ocp/release:4.0.0-0.ci-2019-03-17-041405",
                        "startedTime": "2019-03-17T05:34:01Z",
                        "state": "Partial",
                        "version": "4.0.0-0.ci-2019-03-17-041405"
                    },
                    {
                        "completionTime": "2019-03-17T05:34:01Z",
                        "image": "registry.svc.ci.openshift.org/ocp/release:4.0.0-0.ci-2019-03-17-041405",
                        "startedTime": "2019-03-17T04:44:17Z",
                        "state": "Partial",
                        "version": "4.0.0-0.ci-2019-03-17-041405"
                    },
                    {
                        "completionTime": "2019-03-17T04:44:17Z",
                        "image": "registry.svc.ci.openshift.org/ocp/release:4.0.0-0.ci-2019-03-17-041405",
                        "startedTime": "2019-03-17T04:43:16Z",
                        "state": "Partial",
                        "version": "4.0.0-0.ci-2019-03-17-041405"
                    },
                    {
                        "completionTime": "2019-03-17T04:43:16Z",
                        "image": "registry.svc.ci.openshift.org/ocp/release@sha256:9972758232e864ad3f4756127ff4401ed1e313f4ae956f42e44354c83fa4d23b",
                        "startedTime": "2019-03-17T04:27:18Z",
                        "state": "Completed",
                        "version": "4.0.0-0.ci-2019-03-17-010422"
                    }
```

from an upgrade test that was green the history there shows upgrade finished in around similar time 35 mins for 4.0.0-0.ci-2019-03-18-223512

```
                    {
                        "completionTime": "2019-03-18T23:40:40Z",
                        "image": "registry.svc.ci.openshift.org/ocp/release:4.0.0-0.ci-2019-03-18-223512",
                        "startedTime": "2019-03-18T23:25:10Z",
                        "state": "Completed",
                        "version": "4.0.0-0.ci-2019-03-18-223512"
                    },
                    {
                        "completionTime": "2019-03-18T23:25:10Z",
                        "image": "registry.svc.ci.openshift.org/ocp/release:4.0.0-0.ci-2019-03-18-223512",
                        "startedTime": "2019-03-18T23:07:10Z",
                        "state": "Partial",
                        "version": "4.0.0-0.ci-2019-03-18-223512"
                    },
                    {
                        "completionTime": "2019-03-18T23:07:10Z",
                        "image": "registry.svc.ci.openshift.org/ocp/release:4.0.0-0.ci-2019-03-18-223512",
                        "startedTime": "2019-03-18T23:06:24Z",
                        "state": "Partial",
                        "version": "4.0.0-0.ci-2019-03-18-223512"
                    },
                    {
                        "completionTime": "2019-03-18T23:06:24Z",
                        "image": "registry.svc.ci.openshift.org/ocp/release@sha256:0dfafc0de33a116643c8f89552ed3b82110ef56bf5671f12a16df72bede9e973",
                        "startedTime": "2019-03-18T22:52:31Z",
                        "state": "Completed",
                        "version": "4.0.0-0.ci-2019-03-18-202114"
                    }
```

Comment 5 Ben Parees 2019-03-19 00:54:01 UTC
> CVO was ONLY waiting for monitoring towards the end around `05:38` But the upgrade test had defined test failed at `05:37` 

Abhinav it sounds like you're saying that basically this wasn't a monitoring upgrade failure, it was a case where the entire upgrade took longer than the test was willing to wait, right?  i.e. something else caused the upgrade to take a long time (50+ mins instead of 35), earlier in the process.

Comment 6 W. Trevor King 2019-03-19 08:06:32 UTC
$ curl -s https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.0/242/artifacts/e2e-
aws-upgrade/pods/openshift-cluster-version_cluster-version-operator-565d946558-9ztzw_cluster-version-operator.log.gz | gunzip | head -n1
I0317 05:34:00.593130       1 start.go:23] ClusterVersionOperator v3.11.0-226-ga573b856-dirty

But https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.0/242/artifacts/e2e-aws-upgrade/pods/openshift-cluster-version_cluster-version-operator-565d946558-9ztzw_cluster-version-operator_previous.log.gz is empty?  Are there logs from the partial deploys started at 2019-03-17T04:43:16Z and 2019-03-17T04:44:17Z?  What causes a partial entry in the history?  Why is the current CVO pod so young?

Comment 10 W. Trevor King 2019-03-22 05:58:30 UTC
This error has not occurred in the 861 *-e2e-aws* CI runs since 2019-03-19T12:28Z (and possibly earlier).  Maybe some subsequent pull fixed it?  Or maybe it's just low-probability.

Comment 16 Junqi Zhao 2019-04-04 02:55:19 UTC
no such issue with our upgrade 
eg: upgrade OCP from 4.0.0-0.9 -> 4.0.0-0.10

Comment 18 errata-xmlrpc 2019-06-04 10:46:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758


Note You need to log in before you can comment on or make changes to this bug.