Bug 1690816

Summary: Progress percentage calculation should be always increasing during upgrading
Product: OpenShift Container Platform
Reporter: weiwei jiang <wjiang>
Component: Cluster Version Operator
Assignee: Abhinav Dahiya <adahiya>
Status: CLOSED WONTFIX
QA Contact: liujia <jiajliu>
Severity: medium
Docs Contact:
Priority: medium
Version: 4.1.0
CC: aos-bugs, bleanhar, jokerman, jupierce, lxia, mmccomas, weliang, wking
Target Milestone: ---   
Target Release: 4.2.0   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2019-07-03 01:04:01 UTC
Type: Bug

Description weiwei jiang 2019-03-20 09:56:49 UTC
Description of problem:
During an upgrade, the progress percentage reported in clusterversion does not increase monotonically: it rose from 33% to 46% and then dropped to 2%.
[root@preserved-bind-and-bastion ~]# oc get clusterversion 
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-19-220030   True        True          8m51s   Working towards 4.0.0-0.nightly-2019-03-19-220030: 33% complete
[root@preserved-bind-and-bastion ~]# oc get clusterversion 
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-19-220030   True        True          12m     Working towards 4.0.0-0.nightly-2019-03-19-220030: 46% complete
[root@preserved-bind-and-bastion ~]# oc get clusterversion 
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-19-220030   True        True          15m     Working towards 4.0.0-0.nightly-2019-03-19-220030: 2% complete
[root@preserved-bind-and-bastion ~]# oc get clusterversion 
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-19-220030   True        True          16m     Working towards 4.0.0-0.nightly-2019-03-19-220030: 2% complete
[root@preserved-bind-and-bastion ~]# oc get clusterversion 
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-19-220030   True        True          16m     Working towards 4.0.0-0.nightly-2019-03-19-220030: 2% complete


Version-Release number of the following components:
Upgrade from 4.0.0-0.nightly-2019-03-19-004004 to 4.0.0-0.nightly-2019-03-19-220030

How reproducible:
Always

Steps to Reproduce:
1. Install version 4.0.0-0.nightly-2019-03-19-004004
2. Upgrade it to 4.0.0-0.nightly-2019-03-19-220030
3. Check the progress percentage repeatedly with `oc get clusterversion` while the upgrade runs

Actual results:
The progress percentage does not increase monotonically; it dropped from 46% to 2% partway through the upgrade.

Expected results:
The progress percentage never decreases during an upgrade.


Comment 1 weiwei jiang 2019-03-22 02:13:28 UTC
I rechecked this. Does this mean that two rounds of upgrade happen here, so the percentage is recalculated?

version   4.0.0-0.nightly-2019-03-22-002648   True        True          17m     Working towards 4.0.0-0.nightly-2019-03-22-002648: 33% complete
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-22-002648   True        True          17m     Working towards 4.0.0-0.nightly-2019-03-22-002648: 33% complete
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-22-002648   True        True          17m     Working towards 4.0.0-0.nightly-2019-03-22-002648: 33% complete
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-22-002648   True        True          17m     Working towards 4.0.0-0.nightly-2019-03-22-002648: 33% complete
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-22-002648   True        True          17m     Working towards 4.0.0-0.nightly-2019-03-22-002648: 33% complete
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-22-002648   True        True          17m     Working towards 4.0.0-0.nightly-2019-03-22-002648: 33% complete
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-22-002648   True        True          17m     Unable to apply 4.0.0-0.nightly-2019-03-22-002648: the cluster operator machine-config has not yet successfully rolled out
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-22-002648   True        True          17m     Unable to apply 4.0.0-0.nightly-2019-03-22-002648: the cluster operator machine-config has not yet successfully rolled out
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-22-002648   True        True          17m     Unable to apply 4.0.0-0.nightly-2019-03-22-002648: the cluster operator machine-config has not yet successfully rolled out
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-22-002648   True        True          18m     Unable to apply 4.0.0-0.nightly-2019-03-22-002648: the cluster operator machine-config has not yet successfully rolled out
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-22-002648   True        True          18m     Unable to apply 4.0.0-0.nightly-2019-03-22-002648: the cluster operator machine-config has not yet successfully rolled out
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-22-002648   True        True          18m     Unable to apply 4.0.0-0.nightly-2019-03-22-002648: the cluster operator machine-config has not yet successfully rolled out
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-22-002648   True        True          18m     Unable to apply 4.0.0-0.nightly-2019-03-22-002648: the cluster operator machine-config has not yet successfully rolled out
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-22-002648   True        True          18m     Unable to apply 4.0.0-0.nightly-2019-03-22-002648: the cluster operator machine-config has not yet successfully rolled out
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-22-002648   True        True          18m     Unable to apply 4.0.0-0.nightly-2019-03-22-002648: the cluster operator machine-config has not yet successfully rolled out
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-22-002648   True        True          18m     Unable to apply 4.0.0-0.nightly-2019-03-22-002648: the cluster operator machine-config has not yet successfully rolled out
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-22-002648   True        True          18m     Unable to apply 4.0.0-0.nightly-2019-03-22-002648: the cluster operator machine-config has not yet successfully rolled out
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-22-002648   True        True          19m     Unable to apply 4.0.0-0.nightly-2019-03-22-002648: the cluster operator machine-config has not yet successfully rolled out
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-22-002648   True        True          19m     Unable to apply 4.0.0-0.nightly-2019-03-22-002648: the cluster operator machine-config has not yet successfully rolled out
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-22-002648   True        True          19m     Unable to apply 4.0.0-0.nightly-2019-03-22-002648: the cluster operator machine-config has not yet successfully rolled out
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-22-002648   True        True          19m     Unable to apply 4.0.0-0.nightly-2019-03-22-002648: the cluster operator machine-config has not yet successfully rolled out
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-22-002648   True        True          19m     Unable to apply 4.0.0-0.nightly-2019-03-22-002648: the cluster operator machine-config has not yet successfully rolled out
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-22-002648   True        True          19m     Unable to apply 4.0.0-0.nightly-2019-03-22-002648: the cluster operator machine-config has not yet successfully rolled out
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-22-002648   True        True          19m     Unable to apply 4.0.0-0.nightly-2019-03-22-002648: the cluster operator machine-config has not yet successfully rolled out
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-22-002648   True        True          19m     Unable to apply 4.0.0-0.nightly-2019-03-22-002648: the cluster operator machine-config has not yet successfully rolled out
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-22-002648   True        True          19m     Unable to apply 4.0.0-0.nightly-2019-03-22-002648: the cluster operator machine-config has not yet successfully rolled out
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-22-002648   True        True          20m     Unable to apply 4.0.0-0.nightly-2019-03-22-002648: the cluster operator machine-config has not yet successfully rolled out
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-22-002648   True        True          20m     Unable to apply 4.0.0-0.nightly-2019-03-22-002648: the cluster operator machine-config has not yet successfully rolled out
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-22-002648   True        True          20m     Unable to apply 4.0.0-0.nightly-2019-03-22-002648: the cluster operator machine-config has not yet successfully rolled out
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-22-002648   True        True          20m     Unable to apply 4.0.0-0.nightly-2019-03-22-002648: the cluster operator machine-config has not yet successfully rolled out
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-22-002648   True        True          20m     Unable to apply 4.0.0-0.nightly-2019-03-22-002648: the cluster operator machine-config has not yet successfully rolled out
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-22-002648   True        True          20m     Unable to apply 4.0.0-0.nightly-2019-03-22-002648: the cluster operator machine-config has not yet successfully rolled out
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-22-002648   True        True          20m     Unable to apply 4.0.0-0.nightly-2019-03-22-002648: the cluster operator machine-config has not yet successfully rolled out
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-22-002648   True        True          20m     Unable to apply 4.0.0-0.nightly-2019-03-22-002648: the cluster operator machine-config has not yet successfully rolled out
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-22-002648   True        True          20m     Unable to apply 4.0.0-0.nightly-2019-03-22-002648: the cluster operator machine-config has not yet successfully rolled out
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-22-002648   True        True          21m     Unable to apply 4.0.0-0.nightly-2019-03-22-002648: the cluster operator machine-config has not yet successfully rolled out
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-22-002648   True        True          21m     Unable to apply 4.0.0-0.nightly-2019-03-22-002648: the cluster operator machine-config has not yet successfully rolled out
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-22-002648   True        True          21m     Unable to apply 4.0.0-0.nightly-2019-03-22-002648: the cluster operator machine-config has not yet successfully rolled out
The connection to the server api.wjiang-ocp.qe.devcluster.openshift.com:6443 was refused - did you specify the right host or port?
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-22-002648   True        True          21m     Unable to apply 4.0.0-0.nightly-2019-03-22-002648: the cluster operator machine-config has not yet successfully rolled out
Error from server (NotFound): Unable to list {"config.openshift.io" "v1" "clusterversions"}: the server could not find the requested resource (get clusterversions.config.openshift.io)
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-22-002648   True        True          21m     Unable to apply 4.0.0-0.nightly-2019-03-22-002648: the cluster operator machine-config has not yet successfully rolled out
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-22-002648   True        True          21m     Unable to apply 4.0.0-0.nightly-2019-03-22-002648: the cluster operator machine-config has not yet successfully rolled out
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-22-002648   True        True          22m     Unable to apply 4.0.0-0.nightly-2019-03-22-002648: the cluster operator machine-config has not yet successfully rolled out
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-22-002648   True        True          22m     Unable to apply 4.0.0-0.nightly-2019-03-22-002648: the cluster operator machine-config has not yet successfully rolled out
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-22-002648   True        True          22m     Unable to apply 4.0.0-0.nightly-2019-03-22-002648: the cluster operator machine-config has not yet successfully rolled out
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-22-002648   True        True          22m     Unable to apply 4.0.0-0.nightly-2019-03-22-002648: the cluster operator machine-config has not yet successfully rolled out
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-22-002648   True        True          22m     Unable to apply 4.0.0-0.nightly-2019-03-22-002648: the cluster operator machine-config has not yet successfully rolled out
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-22-002648   True        True          22m     Unable to apply 4.0.0-0.nightly-2019-03-22-002648: the cluster operator machine-config has not yet successfully rolled out
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-22-002648   True        True          22m     Unable to apply 4.0.0-0.nightly-2019-03-22-002648: the cluster operator machine-config has not yet successfully rolled out
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-22-002648   True        True          22m     Unable to apply 4.0.0-0.nightly-2019-03-22-002648: the cluster operator machine-config has not yet successfully rolled out
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-22-002648   True        True          22m     Unable to apply 4.0.0-0.nightly-2019-03-22-002648: the cluster operator machine-config has not yet successfully rolled out
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-22-002648   True        True          22m     Unable to apply 4.0.0-0.nightly-2019-03-22-002648: the cluster operator machine-config has not yet successfully rolled out
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-22-002648   True        True          23m     Unable to apply 4.0.0-0.nightly-2019-03-22-002648: the cluster operator machine-config has not yet successfully rolled out
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-22-002648   True        True          23m     Unable to apply 4.0.0-0.nightly-2019-03-22-002648: the cluster operator machine-config has not yet successfully rolled out
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-22-002648   True        True          23m     Unable to apply 4.0.0-0.nightly-2019-03-22-002648: the cluster operator machine-config has not yet successfully rolled out
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-22-002648   True        True          23m     Unable to apply 4.0.0-0.nightly-2019-03-22-002648: the cluster operator machine-config has not yet successfully rolled out
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-22-002648   True        True          23m     Unable to apply 4.0.0-0.nightly-2019-03-22-002648: the cluster operator machine-config has not yet successfully rolled out
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-22-002648   True        True          23m     Unable to apply 4.0.0-0.nightly-2019-03-22-002648: the cluster operator machine-config has not yet successfully rolled out
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-22-002648   True        True          23m     Unable to apply 4.0.0-0.nightly-2019-03-22-002648: the cluster operator machine-config has not yet successfully rolled out
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-22-002648   True        True          23m     Unable to apply 4.0.0-0.nightly-2019-03-22-002648: the cluster operator machine-config has not yet successfully rolled out
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-22-002648   True        True          23m     Unable to apply 4.0.0-0.nightly-2019-03-22-002648: the cluster operator machine-config has not yet successfully rolled out
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-22-002648   True        True          24m     Unable to apply 4.0.0-0.nightly-2019-03-22-002648: the cluster operator machine-config has not yet successfully rolled out
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-22-002648   True        True          24m     Working towards 4.0.0-0.nightly-2019-03-22-002648: 2% complete
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-03-22-002648   True        True          24m     Working towards 4.0.0-0.nightly-2019-03-22-002648: 2% complete

Comment 2 W. Trevor King 2019-03-28 23:00:37 UTC
This is a cluster-version-operator issue, not an installer issue, so I'm reassigning to the Upgrade component.

Comment 3 W. Trevor King 2019-04-01 19:44:35 UTC
Colin points out that OS upgrades [1] can lead to node restarts, which could force cluster-version operator moves (as the node it was running on is drained for the upgrade), which will result in a new sync cycle (as the new CVO pod comes up somewhere else).  So if we want something closer to monotonic progress reports, we'd need each CVO sync cycle to take a look at the existing object state and silently fast-forward through tasks which were already in sync.

[1]: https://github.com/openshift/machine-config-operator/blob/2b28eb287e5bc7b654680a6e85f767ff05604371/docs/OSUpgrades.md
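
The fast-forward idea described above can be illustrated with a small sketch. This is not the cluster-version operator's code: `Task`, `SyncWorker`, and the `stop_after` parameter are hypothetical names, and comparing a stored string stands in for whatever check would decide that an object is already in sync. The point is only that a fresh sync cycle which skips in-sync tasks silently resumes reporting where the previous cycle stopped.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Task:
    """Hypothetical stand-in for one manifest applied during an upgrade."""
    name: str
    desired: str


@dataclass
class SyncWorker:
    """Hypothetical sync loop; one instance models one operator pod."""
    tasks: List[Task]
    cluster: Dict[str, str]  # observed state, shared across pod restarts
    reports: List[int] = field(default_factory=list)

    def percent(self, done: int) -> int:
        return 100 * done // len(self.tasks)

    def sync(self, stop_after: Optional[int] = None) -> None:
        done = 0
        applied = 0
        for task in self.tasks:
            if self.cluster.get(task.name) == task.desired:
                done += 1  # already in sync: fast-forward without reporting
                continue
            if stop_after is not None and applied >= stop_after:
                return  # simulate the pod being evicted by a node drain
            self.cluster[task.name] = task.desired
            applied += 1
            done += 1
            self.reports.append(self.percent(done))


tasks = [Task(f"task-{i}", "new") for i in range(4)]
cluster: Dict[str, str] = {}

first = SyncWorker(tasks, cluster)
first.sync(stop_after=2)   # evicted after applying two tasks

second = SyncWorker(tasks, cluster)
second.sync()              # new pod: skips the two tasks already in sync

print(first.reports + second.reports)
```

Under these assumptions the second worker never reports a value below the first worker's last report, because it counts the in-sync tasks before it publishes anything.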

Comment 4 Abhinav Dahiya 2019-06-14 19:46:27 UTC
*** Bug 1720735 has been marked as a duplicate of this bug. ***

Comment 5 Abhinav Dahiya 2019-06-24 18:47:47 UTC
*** Bug 1723540 has been marked as a duplicate of this bug. ***

Comment 6 Weibin Liang 2019-06-24 20:34:45 UTC
Bug 1723540 also reports two uninformative errors during the upgrade:

[root@dhcp-41-193 ~]# oc get clusterversion
Error from server (NotFound): Unable to list "config.openshift.io/v1, Resource=clusterversions": the server could not find the requested resource (get clusterversions.config.openshift.io)
[root@dhcp-41-193 ~]# oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.1.2     True        True          7m27s   Working towards 4.1.3: 67% complete
[root@dhcp-41-193 ~]# oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.1.2     True        True          8m11s   Unable to apply 4.1.3: an unknown error has occurred

Comment 7 Abhinav Dahiya 2019-07-03 01:04:01 UTC
(In reply to W. Trevor King from comment #3)
> Colin points out that OS upgrades [1] can lead to node restarts, which could
> force cluster-version operator moves (as the node it was running on is
> drained for the upgrade), which will result in a new sync cycle (as the new
> CVO pod comes up somewhere else).  So if we want something closer to
> monotonic progress reports, we'd need each CVO sync cycle to take a look at
> the existing object state and silently fast-forward through tasks which were
> already in sync.
> 
> [1]: https://github.com/openshift/machine-config-operator/blob/2b28eb287e5bc7b654680a6e85f767ff05604371/docs/OSUpgrades.md

The CVO upgrade progress percentage is informational, and it eventually increases. We have no plans to guarantee that it never resets.
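
Since the reported percentage can reset, a consumer that wants a non-decreasing display could track a high-water mark on its own side. The sketch below is an assumption-laden illustration, not part of any OpenShift tooling: `parse_percent` and `HighWaterMark` are hypothetical names, and the regular expression is based only on the STATUS strings quoted in this report, which may differ in other releases.

```python
import re
from typing import Optional

# Pattern taken from the STATUS column shown in this bug; treat it as an assumption.
_PERCENT = re.compile(r"Working towards \S+: (\d+)% complete")


def parse_percent(status: str) -> Optional[int]:
    """Return the percentage in a STATUS message, or None if there is none."""
    match = _PERCENT.search(status)
    return int(match.group(1)) if match else None


class HighWaterMark:
    """Remember the highest percentage seen so far for one upgrade."""

    def __init__(self) -> None:
        self.best = 0

    def update(self, status: str) -> int:
        pct = parse_percent(status)
        if pct is not None and pct > self.best:
            self.best = pct
        return self.best


target = "4.0.0-0.nightly-2019-03-19-220030"
statuses = [
    f"Working towards {target}: 33% complete",
    f"Working towards {target}: 46% complete",
    f"Unable to apply {target}: the cluster operator machine-config "
    "has not yet successfully rolled out",
    f"Working towards {target}: 2% complete",
]

tracker = HighWaterMark()
shown = [tracker.update(s) for s in statuses]
print(shown)
```

A tracker like this hides genuine restarts of the sync cycle, so it would need to be reset whenever the target version changes.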

Comment 8 W. Trevor King 2020-06-24 16:07:58 UTC
I'm hoping to address the % resets as part of a general reroll of % handling in bug 1768255.