Bug 1694219

Summary: cluster upgrade was reported as canceled
Product: OpenShift Container Platform
Reporter: Ben Parees <bparees>
Component: Installer
Installer sub component: openshift-installer
Assignee: Abhinav Dahiya <adahiya>
QA Contact: Johnny Liu <jialiu>
Status: CLOSED ERRATA
Severity: high
Priority: high
CC: bleanhar, jialiu, sponnaga
Version: 4.1.0
Keywords: NeedsTestCase
Target Milestone: ---
Target Release: 4.1.0
Hardware: Unspecified
OS: Unspecified
Last Closed: 2019-06-04 10:46:40 UTC
Type: Bug

Description Ben Parees 2019-03-29 19:49:08 UTC
Description of problem:
Mar 29 16:10:13.719: INFO: cluster upgrade is failing: update was cancelled at 150/315

https://openshift-gce-devel.appspot.com/build/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.0/745

It's not clear what would have canceled the update; I don't think the upgrade test/job does that.
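
For anyone triaging a report like this, the ClusterVersion conditions usually record what failed or cancelled the update. A minimal way to dump them (assuming cluster-admin access; the object is named "version" by default):

# oc get clusterversion version -o yaml
# oc get clusterversion version -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.status}{"\t"}{.message}{"\n"}{end}'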

Comment 2 Scott Dodson 2019-04-12 13:04:23 UTC
Better error messages should now be presented to the user; the user will need to investigate the failing operators.
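
The usual starting point for that investigation (stock oc commands; <name> stands for whichever operator reports Degraded or Progressing):

# oc get clusteroperators
# oc describe clusteroperator <name>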

Comment 4 Johnny Liu 2019-05-08 11:18:20 UTC
Verified this bug by upgrading from 4.1.0-0.nightly-2019-05-07-233329 to 4.1.0-0.nightly-2019-05-08-012425; result: PASS.


1. Before triggering the upgrade from 4.1.0-0.nightly-2019-05-07-233329 to 4.1.0-0.nightly-2019-05-08-012425, remove the cluster tag from your cluster's hosted zone and delete the *.apps DNS record (a sketch of the cleanup commands follows the tag listing below).
# aws route53 list-tags-for-resource --resource-type hostedzone --resource-id  ZM5GW91LZO60L
{
    "ResourceTagSet": {
        "ResourceType": "hostedzone", 
        "ResourceId": "ZM5GW91LZO60L", 
        "Tags": [
            {
                "Value": "2019-05-08T09:45:02.332342+00:00", 
                "Key": "openshift_creationDate"
            }, 
            {
                "Value": "owned", 
                "Key": "kubernetes.io/cluster/jialiu-upi2-2bt5d"
            }, 
            {
                "Value": "2019-05-10T09:45:02.332342+00:00", 
                "Key": "openshift_expirationDate"
            }
        ]
    }
}
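
For reference, a sketch of that cleanup with the aws CLI. The zone ID and tag key come from the listing above; a DELETE change batch must repeat the record set exactly as list-resource-record-sets reports it, so the {...} below is a placeholder to fill in:

# aws route53 change-tags-for-resource --resource-type hostedzone --resource-id ZM5GW91LZO60L --remove-tag-keys kubernetes.io/cluster/jialiu-upi2-2bt5d
# aws route53 list-resource-record-sets --hosted-zone-id ZM5GW91LZO60L
# aws route53 change-resource-record-sets --hosted-zone-id ZM5GW91LZO60L --change-batch '{"Changes":[{"Action":"DELETE","ResourceRecordSet":{...}}]}'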

2. Trigger the upgrade.
3. Watch the clusterversion output.
[root@preserve-jialiu-ansible 20190508]# oc adm upgrade --to-image=registry.svc.ci.openshift.org/ocp/release:4.1.0-0.nightly-2019-05-08-012425  --force
Updating to release image registry.svc.ci.openshift.org/ocp/release:4.1.0-0.nightly-2019-05-08-012425

[root@preserve-jialiu-ansible 20190508]# oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version             True        True          5s      Working towards registry.svc.ci.openshift.org/ocp/release:4.1.0-0.nightly-2019-05-08-012425: downloading update

[root@preserve-jialiu-ansible 20190508]# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.1.0-0.nightly-2019-05-08-012425   True        True          17s     Working towards 4.1.0-0.nightly-2019-05-08-012425: 1% complete

[root@preserve-jialiu-ansible 20190508]# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.1.0-0.nightly-2019-05-08-012425   True        True          7m29s   Unable to apply 4.1.0-0.nightly-2019-05-08-012425: an unknown error has occurred

[root@preserve-jialiu-ansible 20190508]# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.1.0-0.nightly-2019-05-08-012425   True        True          14m     Working towards 4.1.0-0.nightly-2019-05-08-012425: 83% complete, waiting on authentication, openshift-controller-manager

[root@preserve-jialiu-ansible 20190508]# oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version             True        True          19m     Working towards registry.svc.ci.openshift.org/ocp/release:4.1.0-0.nightly-2019-05-08-012425: downloading update

[root@preserve-jialiu-ansible 20190508]# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.1.0-0.nightly-2019-05-08-012425   True        True          22m     Working towards 4.1.0-0.nightly-2019-05-08-012425: 83% complete, waiting on authentication
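
Rather than rerunning oc get by hand, one convenient way to poll both objects during the upgrade (assuming watch is available on the host; a shell loop works equally well):

# watch -n 5 'oc get clusterversion; echo; oc get clusteroperators'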

[root@preserve-jialiu-ansible 20190508]# oc describe clusteroperator authentication
<--snip-->
Status:
  Conditions:
    Last Transition Time:  2019-05-08T10:55:32Z
    Message:               Degraded: error checking current version: unable to check route health: failed to GET route: dial tcp: lookup oauth-openshift.apps.jialiu-upi2.qe1.devcluster.openshift.com on 172.30.0.10:53: no such host
    Reason:                DegradedOperatorSyncLoopError
    Status:                True
    Type:                  Degraded
    Last Transition Time:  2019-05-08T10:35:02Z
    Reason:                AsExpected
    Status:                False
    Type:                  Progressing
    Last Transition Time:  2019-05-08T10:00:30Z
    Reason:                AsExpected
    Status:                True
    Type:                  Available
    Last Transition Time:  2019-05-08T09:43:47Z
    Reason:                NoData
    Status:                Unknown
    Type:                  Upgradeable
<--snip-->
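
The same Degraded message can also be pulled non-interactively, which is handy for scripting this check (stock oc jsonpath; only the operator name is assumed):

# oc get clusteroperator authentication -o jsonpath='{.status.conditions[?(@.type=="Degraded")].message}'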

Comment 6 errata-xmlrpc 2019-06-04 10:46:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758