1687973 – UPGRADE network operator reports unavailable during upgrade

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Networking
Sub Component:
Version:	4.1.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	urgent
Severity:	urgent
Target Milestone:	---
Target Release:	4.1.0
Assignee:	Dan Winship
QA Contact:	Meng Bo
Docs Contact:
URL:
Whiteboard:	beta3blocker
Depends On:
Blocks:

Reported:	2019-03-12 19:31 UTC by Derek Carr
Modified:	2019-06-04 10:45 UTC
CC List:	4 users
Fixed In Version:
Doc Type:	No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed:	2019-06-04 10:45:33 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift cluster-network-operator pull 121	0	None	None	None	2019-03-13 20:19:34 UTC
Red Hat Product Errata	RHBA-2019:0758	0	None	None	None	2019-06-04 10:45:41 UTC

Description Derek Carr 2019-03-12 19:31:18 UTC

Description of problem:

Installed 4.0.0-0.alpha-2019-03-12-143341 
Upgraded to 4.0.0-0.alpha-2019-03-12-153711

During the cluster upgrade, the network operator reported unavailable. It appeared to do so when a master machine was rebooted. Opening this bug to determine whether Available=false should occur during a machine reboot. The network did upgrade, but Available toggled.

Expected results:
Available should not go false during an upgrade.

Comment 1 Derek Carr 2019-03-13 03:07:24 UTC

This is the next reason we will fail upgrade tests, as we try to ensure no operator goes unavailable during upgrades.

https://github.com/openshift/cluster-network-operator/blob/f4ef74c2d9179c7ccfecafb846f3fc800de01223/pkg/controller/statusmanager/status_manager.go#L239

This looks like problematic logic. As I understand the flow, if the network operator is progressing to a new version it reports unavailable, even though the network is obviously available during the rollout across release versions.
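
To make the concern concrete, here is a minimal sketch of one way the conditions could be derived so that Available does not drop while a rollout is in progress. It is not the operator's actual code: the rolloutStatus and conditions types and the deriveConditions function are hypothetical, with field names chosen to resemble the counters a DaemonSet reports, and the "stay available if previously available" rule is an assumption about the desired behavior.

```go
package main

// rolloutStatus is a hypothetical, standalone stand-in for the counters a
// DaemonSet reports during a rollout. It is not a type from the operator.
type rolloutStatus struct {
	DesiredNumberScheduled int32
	UpdatedNumberScheduled int32
	NumberAvailable        int32
	NumberUnavailable      int32
}

// conditions is a hypothetical pair of operator status conditions.
type conditions struct {
	Available   bool
	Progressing bool
}

// deriveConditions reports Progressing while the rollout is incomplete.
// Assumption: once the operator has been Available, an in-progress rollout
// alone does not flip Available back to false.
func deriveConditions(s rolloutStatus, wasAvailable bool) conditions {
	progressing := s.UpdatedNumberScheduled < s.DesiredNumberScheduled ||
		s.NumberUnavailable > 0
	available := !progressing || wasAvailable
	return conditions{Available: available, Progressing: progressing}
}
```

Under this sketch, a cluster that was already available reports Available=true and Progressing=true partway through an upgrade, while a first-time install still waits for the rollout to finish before reporting Available=true.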

Comment 2 Derek Carr 2019-03-13 03:10:13 UTC

See sample upgrade job runs:
https://deck-ci.svc.ci.openshift.org/?job=release-openshift-origin-installer-e2e-aws-upgrade-4.0

https://deck-ci.svc.ci.openshift.org/log?job=release-openshift-origin-installer-e2e-aws-upgrade-4.0&id=97

"version changed Failing to True: ClusterOperatorNotAvailable: Cluster operator network is still updating"
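
The kind of check that trips on this can be sketched as follows. This is only an illustration of the idea "no operator goes unavailable during an upgrade", not the upgrade test's real implementation: the observation type and the checkAlwaysAvailable function are hypothetical names, and the observations are assumed to be samples of an operator's Available condition taken over the course of the upgrade.

```go
package main

import "fmt"

// observation is a hypothetical sample of one operator's Available
// condition taken at some point during an upgrade.
type observation struct {
	Operator  string
	Available bool
}

// checkAlwaysAvailable returns an error for the first sample in which an
// operator reported Available=false, and nil if every sample was available.
func checkAlwaysAvailable(obs []observation) error {
	for i, o := range obs {
		if !o.Available {
			return fmt.Errorf("observation %d: cluster operator %s reported Available=False", i, o.Operator)
		}
	}
	return nil
}
```

With a check of this shape, a single sample of Available=false from the network operator is enough to fail the run, even if the operator recovers afterwards.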

Comment 3 Casey Callendrello 2019-03-13 12:13:03 UTC

To danw.

So, we should be setting "available=true progressing=true"? What is the expected state as the daemonset updates roll out?

Comment 4 zhaozhanqi 2019-03-19 10:40:21 UTC

Verified this bug when upgrading from 4.0.0-0.nightly-2019-03-15-063749 to 4.0.0-0.nightly-2019-03-18-200009.

AVAILABLE remained 'True' during the upgrade.

Comment 6 errata-xmlrpc 2019-06-04 10:45:33 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758
