Bug 1689024

Summary: Upgrade does not complete due to network being unable to roll out changes
Product: OpenShift Container Platform Reporter: Clayton Coleman <ccoleman>
Component: Networking Assignee: Casey Callendrello <cdc>
Status: CLOSED ERRATA QA Contact: Meng Bo <bmeng>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: 4.1.0 CC: aos-bugs, bbennett, bmeng, wsun
Target Milestone: ---   
Target Release: 4.1.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-06-04 10:45:52 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Clayton Coleman 2019-03-15 00:22:50 UTC
Upgrade of https://openshift-release.svc.ci.openshift.org/releasestream/4.0.0-0.ci/release/4.0.0-0.ci-2019-03-14-221441?from=4.0.0-0.ci-2019-03-14-182738 wedges while upgrading, possibly due to a down node or other issue.

https://openshift-gce-devel.appspot.com/build/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.0/167

Mar 14 23:26:07.448: INFO: cluster upgrade is failing: Cluster operator network has not yet reported success

        {
            "apiVersion": "config.openshift.io/v1",
            "kind": "ClusterOperator",
            "metadata": {
                "creationTimestamp": "2019-03-14T22:27:56Z",
                "generation": 1,
                "name": "network",
                "resourceVersion": "35912",
                "selfLink": "/apis/config.openshift.io/v1/clusteroperators/network",
                "uid": "6a48e5db-46a8-11e9-b0a8-12211b99c2ac"
            },
            "spec": {},
            "status": {
                "conditions": [
                    {
                        "lastTransitionTime": "2019-03-14T22:28:06Z",
                        "status": "False",
                        "type": "Failing"
                    },
                    {
                        "lastTransitionTime": "2019-03-14T22:59:57Z",
                        "message": "DaemonSet \"openshift-sdn/sdn-controller\" is not available (awaiting 1 nodes)",
                        "reason": "Deploying",
                        "status": "True",
                        "type": "Progressing"
                    },
                    {
                        "lastTransitionTime": "2019-03-14T22:59:57Z",
                        "message": "DaemonSet \"openshift-sdn/sdn-controller\" is not available (awaiting 1 nodes)",
                        "reason": "Deploying",
                        "status": "False",
                        "type": "Available"
                    }
                ],
                "extension": null,
                "versions": [
                    {
                        "name": "operator",
                        "version": "4.0.0-0.ci-2019-03-14-221441"
                    }
                ]
            }
        },
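
For readers triaging a similar dump, here is a minimal Python sketch of how the conditions above could be checked. It is not the upgrade test's actual logic: the success criteria (Available=True, Progressing=False, Failing=False) are an assumption, and the `unmet_conditions` helper and `SAMPLE` dict are hypothetical names, with `SAMPLE` trimmed from the status shown above.

```python
# Assumed success criteria; the report does not state the exact check used
# by the upgrade test.
EXPECTED = {"Available": "True", "Progressing": "False", "Failing": "False"}


def unmet_conditions(cluster_operator):
    """Return one human-readable line per condition that is not as expected."""
    conditions = cluster_operator.get("status", {}).get("conditions", [])
    by_type = {c["type"]: c for c in conditions}
    problems = []
    for cond_type, want in EXPECTED.items():
        cond = by_type.get(cond_type)
        if cond is None:
            problems.append(f"{cond_type}: condition missing")
        elif cond["status"] != want:
            detail = cond.get("message", "no message")
            problems.append(f"{cond_type}={cond['status']} (want {want}): {detail}")
    return problems


# Trimmed copy of the ClusterOperator status from the description.
SAMPLE = {
    "kind": "ClusterOperator",
    "metadata": {"name": "network"},
    "status": {
        "conditions": [
            {"status": "False", "type": "Failing"},
            {
                "message": 'DaemonSet "openshift-sdn/sdn-controller" is not available (awaiting 1 nodes)',
                "reason": "Deploying",
                "status": "True",
                "type": "Progressing",
            },
            {
                "message": 'DaemonSet "openshift-sdn/sdn-controller" is not available (awaiting 1 nodes)',
                "reason": "Deploying",
                "status": "False",
                "type": "Available",
            },
        ]
    },
}

for line in unmet_conditions(SAMPLE):
    print(line)
```

Under these assumed criteria, the status above is flagged on both Available and Progressing, each carrying the same DaemonSet message.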

Comment 1 Casey Callendrello 2019-03-15 13:21:48 UTC
We tweaked some of the availability logic in https://github.com/openshift/cluster-network-operator/pull/121, which merged about 15 hours ago. Can you re-run the test?

Comment 3 Wei Sun 2019-04-10 02:52:50 UTC
Please check whether this can be verified. Thanks!

Comment 4 Meng Bo 2019-04-11 08:45:01 UTC
Tried upgrading from build 4.0.0-0.nightly-2019-04-05-165550 to 4.0.0-0.nightly-2019-04-10-182914.
The network operator was upgraded successfully.


2019/04/11 06:41:57 Updated ClusterOperator with status:
conditions:
- lastTransitionTime: "2019-04-11T05:23:07Z"
  status: "False"
  type: Failing
- lastTransitionTime: "2019-04-11T06:34:21Z"
  message: DaemonSet "openshift-multus/multus" is not available (awaiting 1 nodes)
  reason: Deploying
  status: "True"
  type: Progressing
- lastTransitionTime: "2019-04-11T05:23:32Z"
  status: "True"
  type: Available
extension: null
versions:
- name: operator
  version: 4.0.0-0.nightly-2019-04-05-165550
2019/04/11 06:41:57 Updated ClusterOperator with status:
conditions:
- lastTransitionTime: "2019-04-11T05:23:07Z"
  status: "False"
  type: Failing
- lastTransitionTime: "2019-04-11T06:41:57Z"
  status: "False"
  type: Progressing
- lastTransitionTime: "2019-04-11T05:23:32Z"
  status: "True"
  type: Available
extension: null
versions:
- name: operator
  version: 4.0.0-0.nightly-2019-04-10-182914
2019/04/11 06:43:38 Reconciling Network.config.openshift.io cluster
2019/04/11 06:43:38 Updated ClusterOperator with status:
conditions:
- lastTransitionTime: "2019-04-11T05:23:07Z"
  status: "False"
  type: Failing
- lastTransitionTime: "2019-04-11T06:41:57Z"
  status: "False"
  type: Progressing
- lastTransitionTime: "2019-04-11T05:23:32Z"
  status: "True"
  type: Available
extension: null
versions:
- name: operator
  version: 4.0.0-0.nightly-2019-04-10-182914
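
The transition in the log above can be made explicit by comparing two successive status snapshots. The following is only an illustrative Python sketch: `changed_conditions` is a hypothetical helper, and the two lists are hand-transcribed from the first two "Updated ClusterOperator" entries rather than parsed from the log.

```python
def changed_conditions(before, after):
    """Map condition type -> (old status, new status) where the status differs."""
    old = {c["type"]: c["status"] for c in before}
    new = {c["type"]: c["status"] for c in after}
    return {
        cond_type: (old.get(cond_type), new.get(cond_type))
        for cond_type in sorted(old.keys() | new.keys())
        if old.get(cond_type) != new.get(cond_type)
    }


# First snapshot: operator version 4.0.0-0.nightly-2019-04-05-165550.
BEFORE = [
    {"type": "Failing", "status": "False"},
    {"type": "Progressing", "status": "True"},
    {"type": "Available", "status": "True"},
]

# Second snapshot: operator version 4.0.0-0.nightly-2019-04-10-182914.
AFTER = [
    {"type": "Failing", "status": "False"},
    {"type": "Progressing", "status": "False"},
    {"type": "Available", "status": "True"},
]

print(changed_conditions(BEFORE, AFTER))
```

Only Progressing changes between the two snapshots, and Available stays True throughout, which matches the successful rollout reported in this comment.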

Comment 6 errata-xmlrpc 2019-06-04 10:45:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758