1687881 – UPGRADE Automated upgrade tests have never passed

Bug 1687881 - UPGRADE Automated upgrade tests have never passed

Summary: UPGRADE Automated upgrade tests have never passed

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Cluster Version Operator
Sub Component:
Version:	4.1.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	urgent
Severity:	urgent
Target Milestone:	---
Target Release:	4.1.0
Assignee:	Clayton Coleman
QA Contact:	liujia
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	1683648 (view as bug list)
Depends On:
Blocks:	1664187
TreeView+	depends on / blocked

Reported:	2019-03-12 14:41 UTC by Clayton Coleman
Modified:	2019-06-04 10:45 UTC (History)
CC List:	12 users (show)
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2019-06-04 10:45:33 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2019:0758	0	None	None	None	2019-06-04 10:45:41 UTC

Description Clayton Coleman 2019-03-12 14:41:31 UTC

The new automated upgrade tests are failing due to what appears to be a certificate rotation / network connectivity issue.

https://openshift-gce-devel.appspot.com/build/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.0/12

During upgrade a number of issues crop up, but one of the root issues is that etcd appears to be unreachable after upgrade.

2019-03-11 12:22:59.676142 I | embed: rejected connection from "127.0.0.1:52746" (error "tls: failed to verify client's certificate: x509: certificate specifies an incompatible key usage", ServerName "")
WARNING: 2019/03/11 12:22:59 Failed to dial 0.0.0.0:2379: connection error: desc = "transport: authentication handshake failed: remote error: tls: bad certificate"; please retry.

Until e2e upgrade jobs have passed more than once this issue will remain open and top priority.

Comment 1 Dan Winship 2019-03-13 00:09:06 UTC

I'm looking at a missing OVS flow problem right now... I'll either reassign this bug to myself or else file a new bug blocking this one once I figure out if that's the entire problem

Comment 2 Dan Winship 2019-03-13 02:28:48 UTC

Clayton, would it be possible to make e2e-aws-upgrade grab a set of logs from the cluster immediately before kicking off the upgrade? Eg, in https://gcsweb-ci.svc.ci.openshift.org/gcs/origin-ci-test/pr-logs/pull/22302/pull-ci-openshift-origin-master-e2e-aws-upgrade/2/, the cluster-network-operator log starts at 01:28:59, but the cluster was clearly running before that (CNO's first update to its operator status marks it as "Available: True").

Also, in this upgrade, it appears that none of the SDN pods were restarted (and, possibly as a result of that, the test passed). What exactly does the upgrade test do? It seems like it ought to fake an update of every image...

Comment 3 Clayton Coleman 2019-03-14 18:04:03 UTC

Fixed by https://github.com/openshift/origin/pull/22302

https://openshift-release.svc.ci.openshift.org/releasestream/4.0.0-0.ci/release/4.0.0-0.ci-2019-03-14-150906
https://openshift-gce-devel.appspot.com/build/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.0/152

Comment 4 Clayton Coleman 2019-03-14 18:04:39 UTC

The bug you hit with e2e-aws-upgrade the PR job was fixed.  Will follow up with other bugs.

Comment 5 Anurag saxena 2019-03-14 20:28:39 UTC

Hi Clayton, 

Is it expected during the upgrade that the "oc get clusterversion" VERSION should report the version being upgraded? or i beleive it should show the old version until the new version is upgraded successfully 

Exisiting version on cluster
$ oc get clusteroperators.config.openshift.io | grep "NAME\|network"
NAME                                  VERSION                             AVAILABLE   PROGRESSING   FAILING   SINCE
network                               4.0.0-0.nightly-2019-03-13-233958   True        False         False     15m

After oc adm upgrade,

# oc get clusterversion
NAME      VERSION                        AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.ci-2019-03-14-150906   True        True          6m6s    Working towards 4.0.0-0.ci-2019-03-14-150906: 9% complete


//Anurag

Comment 6 W. Trevor King 2019-03-15 23:48:29 UTC

*** Bug 1683648 has been marked as a duplicate of this bug. ***

Comment 8 errata-xmlrpc 2019-06-04 10:45:33 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758

Note You need to log in before you can comment on or make changes to this bug.