Bug 1763822 - Canceling the task graph partway through should be an error even if no tasks fail
Summary: Canceling the task graph partway through should be an error even if no tasks fail
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cluster Version Operator
Version: 4.2.z
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 4.2.z
Assignee: W. Trevor King
QA Contact: liujia
URL:
Whiteboard:
Depends On: 1763821
Blocks: 1763823
Reported: 2019-10-21 16:57 UTC by W. Trevor King
Modified: 2019-11-13 18:56 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1763821
Environment:
Last Closed: 2019-11-13 18:56:07 UTC
Target Upstream Version:



Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-version-operator pull 262 0 'None' closed Bug 1763822: A timeout in applying an upgrade should not result in the CVO thinking that it is reconciling the payload 2020-05-28 05:56:43 UTC
Red Hat Product Errata RHBA-2019:3303 0 None None None 2019-11-13 18:56:13 UTC

Description W. Trevor King 2019-10-21 16:57:09 UTC
+++ This bug was initially created as a clone of Bug #1763821 +++

From [1]:

2019-10-21T10:34:30.63940461Z I1021 10:34:30.639073       1 start.go:19] ClusterVersionOperator v1.0.0-106-g0725bd53-dirty
...
2019-10-21T10:34:31.132673574Z I1021 10:34:31.132635       1 sync_worker.go:453] Running sync quay.io/runcom/origin-release:v4.2-1196 (force=true) on generation 2 in state Updating at attempt 0
...
2019-10-21T10:40:16.168632703Z I1021 10:40:16.168604       1 sync_worker.go:579] Running sync for customresourcedefinition "baremetalhosts.metal3.io" (101 of 432)
2019-10-21T10:40:16.18425522Z I1021 10:40:16.184220       1 task_graph.go:583] Canceled worker 0
2019-10-21T10:40:16.184381244Z I1021 10:40:16.184360       1 task_graph.go:583] Canceled worker 3
...
2019-10-21T10:40:16.21772875Z I1021 10:40:16.217715       1 task_graph.go:603] Workers finished
2019-10-21T10:40:16.217777479Z I1021 10:40:16.217759       1 task_graph.go:611] Result of work: []
2019-10-21T10:40:16.217864206Z I1021 10:40:16.217846       1 task_graph.go:539] Stopped graph walker due to cancel
...
2019-10-21T10:43:08.743798997Z I1021 10:43:08.743740       1 sync_worker.go:453] Running sync quay.io/runcom/origin-release:v4.2-1196 (force=true) on generation 2 in state Reconciling at attempt 0
...

Here the CVO cancels some workers, sees that there are no errors, and decides "upgrade complete" despite never having attempted to push the bulk of its manifests (the sync was canceled at manifest 101 of 432). With this commit, the result of work will include several worker-canceled errors, and we'll take another upgrade round instead of declaring success and moving into reconciling.

[1]: https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.1/754/artifacts/e2e-aws-upgrade/must-gather/registry-svc-ci-openshift-org-origin-4-1-sha256-f8c863ea08d64eea7b3a9ffbbde9c01ca90501afe6c0707e9c35f0ed7e92a9df/namespaces/openshift-cluster-version/pods/cluster-version-operator-5f5d465967-t57b2/cluster-version-operator/cluster-version-operator/logs/current.log

Comment 1 W. Trevor King 2019-10-22 20:36:01 UTC
Manually linking to the PR because Jessica manually overrode the Bugzilla labels [1], so the Bugzilla bot won't link the PR for us.

[1]: https://github.com/openshift/cluster-version-operator/pull/262#event-2734698440

Comment 3 liujia 2019-10-25 03:14:13 UTC
I tried upgrading from 4.2.1 to 4.3.0-0.nightly-2019-10-24-203507, and it failed.
I checked our CI test results on https://openshift-release.svc.ci.openshift.org; no 4.2.1 to 4.3 upgrade path is available yet.
So the regression test for this bug is blocked.

Comment 4 liujia 2019-10-25 03:15:44 UTC
(In reply to liujia from comment #3)
> I tried upgrade from 4.2.1 to 4.3.0-0.nightly-2019-10-24-203507, it failed.
> Checked our ci test result on
> https://openshift-release.svc.ci.openshift.org, there is still not available
> for 4.2.1 to 4.3 upgrade path. 
> So this bug's regression test is blocked.

Please ignore comment #3 on this bug; it was pasted here by mistake.

Comment 5 liujia 2019-10-29 05:51:32 UTC
Checked several recent CI test runs (941-945) on https://openshift-gce-devel.appspot.com/builds/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.1 and did not find the unexpected log sequence from the description.

Following https://bugzilla.redhat.com/show_bug.cgi?id=1763821#c2, ran a regression test against the 4.1 to 4.2 upgrade path.

# ./oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.1.21    True        False         5m18s   Cluster version is 4.1.21

# ./oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.2.0-0.nightly-2019-10-28-140411   True        False         55m     Cluster version is 4.2.0-0.nightly-2019-10-28-140411

# oc get clusterversion -o json|jq .items[0].status.history
[
  {
    "completionTime": "2019-10-29T04:51:03Z",
    "image": "registry.svc.ci.openshift.org/ocp/release:4.2.0-0.nightly-2019-10-28-140411",
    "startedTime": "2019-10-29T03:57:50Z",
    "state": "Completed",
    "verified": false,
    "version": "4.2.0-0.nightly-2019-10-28-140411"
  },
  {
    "completionTime": "2019-10-29T03:52:20Z",
    "image": "registry.svc.ci.openshift.org/ocp/release@sha256:a68066e534c41010b3750f18d620abede491965d5b0e860f5717b626cde08e5b",
    "startedTime": "2019-10-29T03:38:20Z",
    "state": "Completed",
    "verified": false,
    "version": "4.1.21"
  }
]
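
The manual check of the history above could also be scripted. The following is a rough sketch, assuming the JSON array printed by the jq command is available as bytes; historyEntry and incompleteVersions are hypothetical names, and only the field names are taken from the output shown.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// historyEntry mirrors the fields visible in the status.history output above.
type historyEntry struct {
	State          string `json:"state"`
	Version        string `json:"version"`
	StartedTime    string `json:"startedTime"`
	CompletionTime string `json:"completionTime"`
}

// incompleteVersions (hypothetical helper) returns the versions of all
// history entries whose state is anything other than "Completed".
func incompleteVersions(raw []byte) ([]string, error) {
	var history []historyEntry
	if err := json.Unmarshal(raw, &history); err != nil {
		return nil, fmt.Errorf("parsing history: %w", err)
	}
	var pending []string
	for _, entry := range history {
		if entry.State != "Completed" {
			pending = append(pending, entry.Version)
		}
	}
	return pending, nil
}

func main() {
	// Trimmed copy of the history shown above.
	raw := []byte(`[
	  {"state": "Completed", "version": "4.2.0-0.nightly-2019-10-28-140411"},
	  {"state": "Completed", "version": "4.1.21"}
	]`)
	pending, err := incompleteVersions(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("entries not yet completed: %d\n", len(pending))
}
```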

Comment 7 errata-xmlrpc 2019-11-13 18:56:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:3303

