The new automated upgrade tests are failing due to what appears to be a certificate rotation / network connectivity issue.
A number of issues crop up during the upgrade, but one root cause is that etcd appears to be unreachable afterwards:
2019-03-11 12:22:59.676142 I | embed: rejected connection from "127.0.0.1:52746" (error "tls: failed to verify client's certificate: x509: certificate specifies an incompatible key usage", ServerName "")
WARNING: 2019/03/11 12:22:59 Failed to dial 0.0.0.0:2379: connection error: desc = "transport: authentication handshake failed: remote error: tls: bad certificate"; please retry.
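The "incompatible key usage" error above usually means the certificate presented to etcd lacks the clientAuth extended key usage. As a quick sanity check, openssl can print a cert's EKU directly; the sketch below generates a throwaway cert with only serverAuth to show what a mismatched EKU looks like (the /tmp paths and the cert itself are illustrative, not the cluster's real rotated certs):

```shell
# Generate a throwaway key + self-signed cert that carries ONLY the
# serverAuth extended key usage (requires OpenSSL 1.1.1+ for -addext).
openssl req -x509 -newkey rsa:2048 -nodes -keyout /tmp/key.pem \
  -out /tmp/cert.pem -days 1 -subj "/CN=test" \
  -addext "extendedKeyUsage=serverAuth"

# Print the Extended Key Usage. A cert used for client authentication must
# list "TLS Web Client Authentication"; if it only lists the server usage,
# etcd rejects it with "certificate specifies an incompatible key usage".
openssl x509 -in /tmp/cert.pem -noout -ext extendedKeyUsage
```

Running the same `openssl x509` inspection against the actual client certs on a broken master should show whether rotation produced certs with the wrong EKU.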
Until the e2e upgrade jobs have passed more than once, this issue will remain open and top priority.
I'm looking at a missing OVS flow problem right now. I'll either reassign this bug to myself or file a new bug blocking this one, once I figure out whether that's the entire problem.
Clayton, would it be possible to make e2e-aws-upgrade grab a set of logs from the cluster immediately before kicking off the upgrade? E.g., in https://gcsweb-ci.svc.ci.openshift.org/gcs/origin-ci-test/pr-logs/pull/22302/pull-ci-openshift-origin-master-e2e-aws-upgrade/2/, the cluster-network-operator log starts at 01:28:59, but the cluster was clearly running before that (the CNO's first update to its operator status marks it as "Available: True").
Also, in this upgrade, it appears that none of the SDN pods were restarted (and, possibly as a result of that, the test passed). What exactly does the upgrade test do? It seems like it ought to fake an update of every image...
Fixed by https://github.com/openshift/origin/pull/22302
The bug you hit with the e2e-aws-upgrade PR job was fixed. Will follow up on the other bugs.
Is it expected that during the upgrade, the "oc get clusterversion" VERSION column reports the version being upgraded to? I believe it should show the old version until the new version is upgraded successfully.
Existing version on cluster:
$ oc get clusteroperators.config.openshift.io | grep "NAME\|network"
NAME      VERSION                             AVAILABLE   PROGRESSING   FAILING   SINCE
network   4.0.0-0.nightly-2019-03-13-233958   True        False         False     15m
After running "oc adm upgrade":
# oc get clusterversion
NAME      VERSION                        AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.ci-2019-03-14-150906   True        True          6m6s    Working towards 4.0.0-0.ci-2019-03-14-150906: 9% complete
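For context on why the VERSION column can show the new number mid-upgrade: the ClusterVersion status tracks both a desired version and a history of completed updates. The sketch below uses a hand-written sample JSON (values invented to mirror the output above, not pulled from a real cluster) to show the two fields `oc get clusterversion` is summarizing:

```shell
# Sample ClusterVersion status, mid-upgrade: status.desired.version is
# already the target, while the most recent Completed entry in
# status.history is the version the cluster was actually running.
cat > /tmp/cv.json <<'EOF'
{
  "status": {
    "desired": { "version": "4.0.0-0.ci-2019-03-14-150906" },
    "history": [
      { "state": "Partial",   "version": "4.0.0-0.ci-2019-03-14-150906" },
      { "state": "Completed", "version": "4.0.0-0.nightly-2019-03-13-233958" }
    ]
  }
}
EOF

# Target of the in-progress upgrade:
jq -r '.status.desired.version' /tmp/cv.json

# Last version that completed successfully (the "old" version):
jq -r '[.status.history[] | select(.state=="Completed")][0].version' /tmp/cv.json
```

On a live cluster the same fields can be read with `oc get clusterversion version -o json`; judging by the output above, the VERSION column in this build follows the desired version rather than the last completed one.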
*** Bug 1683648 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.