Bug 1694219

Summary: cluster upgrade was reported as canceled
Product: OpenShift Container Platform
Reporter: Ben Parees <bparees>
Component: Installer
Installer sub component: openshift-installer
Assignee: Abhinav Dahiya <adahiya>
QA Contact: Johnny Liu <jialiu>
Status: CLOSED ERRATA
Severity: high
Priority: high
CC: bleanhar, jialiu, sponnaga
Version: 4.1.0
Keywords: NeedsTestCase
Target Milestone: ---
Target Release: 4.1.0
Hardware: Unspecified
OS: Unspecified
Last Closed: 2019-06-04 10:46:40 UTC
Type: Bug

Description Ben Parees 2019-03-29 19:49:08 UTC
Description of problem:
Mar 29 16:10:13.719: INFO: cluster upgrade is failing: update was cancelled at 150/315

https://openshift-gce-devel.appspot.com/build/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.0/745

It's not clear what would have canceled the update; I don't think the upgrade test/job does that.
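
For anyone triaging a report like this, the ClusterVersion conditions usually record what failed or cancelled the update. A minimal way to dump them (assuming cluster-admin access; the object is named "version" by default):

# oc get clusterversion version -o yaml
# oc get clusterversion version -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.message}{"\n"}{end}'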

Comment 2 Scott Dodson 2019-04-12 13:04:23 UTC
Better error messages should now be presented to the user; the user will need to investigate the failing operators.
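
The usual starting point for that investigation (stock oc commands; <name> stands for whichever operator reports Degraded or Progressing):

# oc get clusteroperators
# oc describe clusteroperator <name>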

Comment 4 Johnny Liu 2019-05-08 11:18:20 UTC
Verified this bug by upgrading from 4.1.0-0.nightly-2019-05-07-233329 to 4.1.0-0.nightly-2019-05-08-012425; result: PASS.


1. Before triggering the upgrade from 4.1.0-0.nightly-2019-05-07-233329 to 4.1.0-0.nightly-2019-05-08-012425, remove the cluster tag from your cluster's hosted zone and delete the *.apps DNS record (a sketch of the cleanup commands follows the tag listing below).
# aws route53 list-tags-for-resource --resource-type hostedzone --resource-id  ZM5GW91LZO60L
{
    "ResourceTagSet": {
        "ResourceType": "hostedzone", 
        "ResourceId": "ZM5GW91LZO60L", 
        "Tags": [
            {
                "Value": "2019-05-08T09:45:02.332342+00:00", 
                "Key": "openshift_creationDate"
            }, 
            {
                "Value": "owned", 
                "Key": "kubernetes.io/cluster/jialiu-upi2-2bt5d"
            }, 
            {
                "Value": "2019-05-10T09:45:02.332342+00:00", 
                "Key": "openshift_expirationDate"
            }
        ]
    }
}
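
For reference, a sketch of that cleanup with the aws CLI. The zone ID and tag key come from the listing above; a DELETE change batch must repeat the record set exactly as list-resource-record-sets reports it, so the {...} below is a placeholder to fill in:

# aws route53 change-tags-for-resource --resource-type hostedzone --resource-id ZM5GW91LZO60L --remove-tag-keys kubernetes.io/cluster/jialiu-upi2-2bt5d
# aws route53 list-resource-record-sets --hosted-zone-id ZM5GW91LZO60L
# aws route53 change-resource-record-sets --hosted-zone-id ZM5GW91LZO60L --change-batch '{"Changes":[{"Action":"DELETE","ResourceRecordSet":{...}}]}'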

2. Trigger the upgrade.
3. Watch the clusterversion output.
[root@preserve-jialiu-ansible 20190508]# oc adm upgrade --to-image=registry.svc.ci.openshift.org/ocp/release:4.1.0-0.nightly-2019-05-08-012425  --force
Updating to release image registry.svc.ci.openshift.org/ocp/release:4.1.0-0.nightly-2019-05-08-012425

[root@preserve-jialiu-ansible 20190508]# oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version             True        True          5s      Working towards registry.svc.ci.openshift.org/ocp/release:4.1.0-0.nightly-2019-05-08-012425: downloading update

[root@preserve-jialiu-ansible 20190508]# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.1.0-0.nightly-2019-05-08-012425   True        True          17s     Working towards 4.1.0-0.nightly-2019-05-08-012425: 1% complete

[root@preserve-jialiu-ansible 20190508]# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.1.0-0.nightly-2019-05-08-012425   True        True          7m29s   Unable to apply 4.1.0-0.nightly-2019-05-08-012425: an unknown error has occurred

[root@preserve-jialiu-ansible 20190508]# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.1.0-0.nightly-2019-05-08-012425   True        True          14m     Working towards 4.1.0-0.nightly-2019-05-08-012425: 83% complete, waiting on authentication, openshift-controller-manager

[root@preserve-jialiu-ansible 20190508]# oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version             True        True          19m     Working towards registry.svc.ci.openshift.org/ocp/release:4.1.0-0.nightly-2019-05-08-012425: downloading update

[root@preserve-jialiu-ansible 20190508]# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.1.0-0.nightly-2019-05-08-012425   True        True          22m     Working towards 4.1.0-0.nightly-2019-05-08-012425: 83% complete, waiting on authentication
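
Rather than rerunning oc get by hand, one convenient way to poll both objects during the upgrade (assuming watch is available on the host; a shell loop works equally well):

# watch -n 5 'oc get clusterversion; echo; oc get clusteroperators'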

[root@preserve-jialiu-ansible 20190508]# oc describe clusteroperator authentication
<--snip-->
Status:
  Conditions:
    Last Transition Time:  2019-05-08T10:55:32Z
    Message:               Degraded: error checking current version: unable to check route health: failed to GET route: dial tcp: lookup oauth-openshift.apps.jialiu-upi2.qe1.devcluster.openshift.com on 172.30.0.10:53: no such host
    Reason:                DegradedOperatorSyncLoopError
    Status:                True
    Type:                  Degraded
    Last Transition Time:  2019-05-08T10:35:02Z
    Reason:                AsExpected
    Status:                False
    Type:                  Progressing
    Last Transition Time:  2019-05-08T10:00:30Z
    Reason:                AsExpected
    Status:                True
    Type:                  Available
    Last Transition Time:  2019-05-08T09:43:47Z
    Reason:                NoData
    Status:                Unknown
    Type:                  Upgradeable
<--snip-->
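
The same Degraded message can also be pulled non-interactively, which is handy for scripting this check (stock oc jsonpath; only the operator name is assumed):

# oc get clusteroperator authentication -o jsonpath='{.status.conditions[?(@.type=="Degraded")].message}'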

Comment 6 errata-xmlrpc 2019-06-04 10:46:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758