Bug 1689024

Summary:	Upgrade does not complete due to network being unable to roll out changes
Product:	OpenShift Container Platform	Reporter:	Clayton Coleman <ccoleman>
Component:	Networking	Assignee:	Casey Callendrello <cdc>
Status:	CLOSED ERRATA	QA Contact:	Meng Bo <bmeng>
Severity:	urgent	Docs Contact:
Priority:	unspecified
Version:	4.1.0	CC:	aos-bugs, bbennett, bmeng, wsun
Target Milestone:	---
Target Release:	4.1.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2019-06-04 10:45:52 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Clayton Coleman 2019-03-15 00:22:50 UTC

Upgrade of https://openshift-release.svc.ci.openshift.org/releasestream/4.0.0-0.ci/release/4.0.0-0.ci-2019-03-14-221441?from=4.0.0-0.ci-2019-03-14-182738 wedges while upgrading, possibly due to a down node or other issue.

https://openshift-gce-devel.appspot.com/build/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.0/167

Mar 14 23:26:07.448: INFO: cluster upgrade is failing: Cluster operator network has not yet reported success

        {
            "apiVersion": "config.openshift.io/v1",
            "kind": "ClusterOperator",
            "metadata": {
                "creationTimestamp": "2019-03-14T22:27:56Z",
                "generation": 1,
                "name": "network",
                "resourceVersion": "35912",
                "selfLink": "/apis/config.openshift.io/v1/clusteroperators/network",
                "uid": "6a48e5db-46a8-11e9-b0a8-12211b99c2ac"
            },
            "spec": {},
            "status": {
                "conditions": [
                    {
                        "lastTransitionTime": "2019-03-14T22:28:06Z",
                        "status": "False",
                        "type": "Failing"
                    },
                    {
                        "lastTransitionTime": "2019-03-14T22:59:57Z",
                        "message": "DaemonSet \"openshift-sdn/sdn-controller\" is not available (awaiting 1 nodes)",
                        "reason": "Deploying",
                        "status": "True",
                        "type": "Progressing"
                    },
                    {
                        "lastTransitionTime": "2019-03-14T22:59:57Z",
                        "message": "DaemonSet \"openshift-sdn/sdn-controller\" is not available (awaiting 1 nodes)",
                        "reason": "Deploying",
                        "status": "False",
                        "type": "Available"
                    }
                ],
                "extension": null,
                "versions": [
                    {
                        "name": "operator",
                        "version": "4.0.0-0.ci-2019-03-14-221441"
                    }
                ]
            }
        }
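The upgrade test keeps waiting because the `network` ClusterOperator has not settled. Here is a minimal sketch, assuming the status JSON above has been captured as a string. It lists which conditions are still blocking and why. The helper name `blocking_conditions` is hypothetical. The "settled" values (Failing=False, Progressing=False, Available=True) are also an assumption, not taken from the upgrade test's own code.

```python
import json

# Values each condition type is assumed to hold once the operator has settled.
# This mapping is an illustrative assumption, not taken from the test harness.
SETTLED = {"Failing": "False", "Progressing": "False", "Available": "True"}


def blocking_conditions(cluster_operator_json: str) -> list:
    """Return (type, status, message) for every condition not in its settled state."""
    obj = json.loads(cluster_operator_json)
    blocking = []
    for cond in obj.get("status", {}).get("conditions", []):
        expected = SETTLED.get(cond["type"])
        if expected is not None and cond["status"] != expected:
            blocking.append((cond["type"], cond["status"], cond.get("message", "")))
    return blocking


# Trimmed copy of the status reported by the failing run above.
STUCK = """
{"kind": "ClusterOperator", "metadata": {"name": "network"},
 "status": {"conditions": [
   {"type": "Failing", "status": "False"},
   {"type": "Progressing", "status": "True",
    "message": "DaemonSet \\"openshift-sdn/sdn-controller\\" is not available (awaiting 1 nodes)"},
   {"type": "Available", "status": "False",
    "message": "DaemonSet \\"openshift-sdn/sdn-controller\\" is not available (awaiting 1 nodes)"}]}}
"""

for ctype, status, message in blocking_conditions(STUCK):
    print(f"{ctype}={status}: {message}")
```

Both Progressing=True and Available=False point at the same DaemonSet. That single unready node is enough to hold the whole upgrade.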

Comment 1 Casey Callendrello 2019-03-15 13:21:48 UTC

We tweaked some of the availability logic in https://github.com/openshift/cluster-network-operator/pull/121, which merged about 15 hours ago. Can you re-run the test?
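The message `DaemonSet ... is not available (awaiting 1 nodes)` describes a DaemonSet whose pods are not yet ready on every node it targets. The sketch below is a hypothetical reconstruction of that kind of availability check. It uses the standard Kubernetes `DaemonSetStatus` fields `observedGeneration`, `desiredNumberScheduled`, `updatedNumberScheduled` and `numberUnavailable`, written here in snake_case. It is not the code from the pull request above, because this report does not describe that change. The function name, the extra messages and the example numbers are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DaemonSetStatus:
    # Subset of the Kubernetes DaemonSetStatus fields, in snake_case.
    observed_generation: int
    desired_number_scheduled: int
    updated_number_scheduled: int
    number_unavailable: int


def rollout_message(namespace: str, name: str, generation: int,
                    status: DaemonSetStatus) -> Optional[str]:
    """Return a 'still deploying' message, or None once the rollout looks done.

    Hypothetical logic: a DaemonSet counts as rolled out when the controller
    has observed the latest spec, every desired pod runs the updated template,
    and no pods are unavailable.
    """
    ref = f'DaemonSet "{namespace}/{name}"'
    if status.observed_generation < generation:
        return f"{ref} update is being processed"
    if status.number_unavailable > 0:
        return f"{ref} is not available (awaiting {status.number_unavailable} nodes)"
    if status.updated_number_scheduled < status.desired_number_scheduled:
        pending = status.desired_number_scheduled - status.updated_number_scheduled
        return f"{ref} is rolling out ({pending} nodes not yet updated)"
    return None


# Hypothetical numbers: three targeted nodes, one pod still unavailable.
stuck = DaemonSetStatus(observed_generation=2, desired_number_scheduled=3,
                        updated_number_scheduled=3, number_unavailable=1)
print(rollout_message("openshift-sdn", "sdn-controller", 2, stuck))
# DaemonSet "openshift-sdn/sdn-controller" is not available (awaiting 1 nodes)
```

Under a rule like this, one node that never becomes ready (for example a down node, as suspected in the description) keeps the operator in Progressing indefinitely. How strictly to treat such nodes is exactly the kind of availability decision the comment refers to.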

Comment 3 Wei Sun 2019-04-10 02:52:50 UTC

Please check whether this can be verified. Thanks!

Comment 4 Meng Bo 2019-04-11 08:45:01 UTC

Tried upgrading from build 4.0.0-0.nightly-2019-04-05-165550 to 4.0.0-0.nightly-2019-04-10-182914.
The network operator was upgraded successfully.

2019/04/11 06:41:57 Updated ClusterOperator with status:
conditions:
- lastTransitionTime: "2019-04-11T05:23:07Z"
  status: "False"
  type: Failing
- lastTransitionTime: "2019-04-11T06:34:21Z"
  message: DaemonSet "openshift-multus/multus" is not available (awaiting 1 nodes)
  reason: Deploying
  status: "True"
  type: Progressing
- lastTransitionTime: "2019-04-11T05:23:32Z"
  status: "True"
  type: Available
extension: null
versions:
- name: operator
  version: 4.0.0-0.nightly-2019-04-05-165550
2019/04/11 06:41:57 Updated ClusterOperator with status:
conditions:
- lastTransitionTime: "2019-04-11T05:23:07Z"
  status: "False"
  type: Failing
- lastTransitionTime: "2019-04-11T06:41:57Z"
  status: "False"
  type: Progressing
- lastTransitionTime: "2019-04-11T05:23:32Z"
  status: "True"
  type: Available
extension: null
versions:
- name: operator
  version: 4.0.0-0.nightly-2019-04-10-182914
2019/04/11 06:43:38 Reconciling Network.config.openshift.io cluster
2019/04/11 06:43:38 Updated ClusterOperator with status:
conditions:
- lastTransitionTime: "2019-04-11T05:23:07Z"
  status: "False"
  type: Failing
- lastTransitionTime: "2019-04-11T06:41:57Z"
  status: "False"
  type: Progressing
- lastTransitionTime: "2019-04-11T05:23:32Z"
  status: "True"
  type: Available
extension: null
versions:
- name: operator
  version: 4.0.0-0.nightly-2019-04-10-182914
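In the log above, the first snapshot still reports the old operator version, with Progressing=True waiting on `openshift-multus/multus`. The next snapshot reports the target version with Progressing=False. Here is a minimal sketch that finds the first snapshot in which the upgrade to a given version has completed. It assumes the snapshots have already been parsed into dictionaries (the YAML parsing step is not shown). The helper names, such as `first_completed`, are hypothetical.

```python
from typing import Optional


def condition_status(snapshot: dict, ctype: str) -> Optional[str]:
    """Return the status string of condition `ctype`, or None if absent."""
    for cond in snapshot.get("conditions", []):
        if cond.get("type") == ctype:
            return cond.get("status")
    return None


def operator_version(snapshot: dict) -> Optional[str]:
    """Return the version reported under the name "operator", if any."""
    for v in snapshot.get("versions", []):
        if v.get("name") == "operator":
            return v.get("version")
    return None


def first_completed(snapshots: list, target: str) -> Optional[int]:
    """Hypothetical helper: index of the first snapshot reporting `target` with
    Available=True, Progressing=False and Failing=False, or None if none does."""
    for i, snap in enumerate(snapshots):
        if (operator_version(snap) == target
                and condition_status(snap, "Available") == "True"
                and condition_status(snap, "Progressing") == "False"
                and condition_status(snap, "Failing") == "False"):
            return i
    return None


OLD, NEW = "4.0.0-0.nightly-2019-04-05-165550", "4.0.0-0.nightly-2019-04-10-182914"


def snap(progressing: str, version: str) -> dict:
    # Condensed form of the status blocks logged at 06:41:57 and 06:43:38.
    return {
        "conditions": [
            {"type": "Failing", "status": "False"},
            {"type": "Progressing", "status": progressing},
            {"type": "Available", "status": "True"},
        ],
        "versions": [{"name": "operator", "version": version}],
    }


log = [snap("True", OLD), snap("False", NEW), snap("False", NEW)]
print(first_completed(log, NEW))  # 1
```

Checking the reported version together with the conditions matters here. The first snapshot already has Available=True, but it still describes the old release.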

Comment 6 errata-xmlrpc 2019-06-04 10:45:52 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758