Bug 1819492
Summary: | invalid apiserver certificates causing large blocks of test failures on vsphere | |||
---|---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Ben Parees <bparees> | |
Component: | Installer | Assignee: | Joseph Callen <jcallen> | |
Installer sub component: | openshift-installer | QA Contact: | jima | |
Status: | CLOSED ERRATA | Docs Contact: | ||
Severity: | high | |||
Priority: | high | CC: | aos-bugs, dphillip, jcallen, jima, kgarriso, mfojtik, sdodson | |
Version: | 4.4 | |||
Target Milestone: | --- | |||
Target Release: | 4.5.0 | |||
Hardware: | Unspecified | |||
OS: | Unspecified | |||
Whiteboard: | ||||
Fixed In Version: | Doc Type: | If docs needed, set a value | ||
Doc Text: | Story Points: | --- | ||
Clone Of: | ||||
: | 1824991 | Environment: | ||
Last Closed: | 2020-07-13 17:24:27 UTC | Type: | Bug | |
Regression: | --- | Mount Type: | --- | |
Documentation: | --- | CRM: | ||
Verified Versions: | Category: | --- | ||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
Cloudforms Team: | --- | Target Upstream Version: | ||
Embargoed: | ||||
Bug Depends On: | ||||
Bug Blocks: | 1824991 |
Description
Ben Parees
2020-04-01 01:25:02 UTC
This looks like some components connect to the API with an unknown DNS name. The API server uses the normal SNI mechanism to select the right cert.

api.ci-op-4byd7z0v-3858a.origin-ci-int-aws.dev.rhcloud.com, not api.ci-op-7zg3gn6s-e99c3.origin-ci-int-aws.dev.rhcloud.com

This means the internal LB name changes during the execution. This is worrisome and most probably a UPI platform issue.

For tracking, we are seeing the x509 error in the following runs:

Job url: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-vsphere-upi-4.4/1357
Number of test failures: 412

Job url: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-vsphere-upi-4.4/1362
Number of test failures: 397

Job url: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-vsphere-upi-4.4/1358
Number of test failures: 380

Job url: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-vsphere-upi-4.4/1347
Number of test failures: 345

Job url: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-vsphere-upi-4.4/1367 <--- today
Number of test failures: 304

This is a mixup in CI jobs and not a bug customers would be exposed to.

I am actively working on this problem and will update when I have something to report.

I think we were mostly just focused on the "certificate is valid for api.ci-op-4byd7z0v-3858a.origin-ci-int-aws.dev.rhcloud.com, not api.ci-op-7zg3gn6s-e99c3.origin-ci-int-aws.dev.rhcloud.com" error, but you're welcome to fix all the CI testing defects you wish.

There's no chance that the pod-Service test can be tied back to replicating this change https://github.com/openshift/machine-config-operator/pull/1628 which has now been applied to both ovirt and OSP? See the linked bugs.

> Updates to UPI complete currently getting PRs ready.
Can you link the PRs in this bug?

The flaky tests you're seeing are known to be flaky everywhere, so if you've resolved the cert issue, I say we merge your changes.
https://github.com/openshift/installer/pull/3429

This PR to update the metal Terraform will also need to be merged: https://github.com/openshift/installer/pull/3235#issuecomment-611627886

Job template changes: https://github.com/openshift/release/pull/8259

@joseph, just to confirm, are those PRs the only changes needed to close this BZ?

Created a clone for this to be backported to 4.4.z (does not have to be 4.4.0) so we can get our CI cleaned up.

The issue is rarely reproduced on the QE CI job; we only hit it once on an OCP 4.5 nightly build and could not reproduce it again. I just checked the DEV CI job, and it seems the issue still happens after the code was merged. The most recent 4.5 nightly build that reproduced the issue is https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-vsphere-upi-4.5/873. Can you confirm whether the issue is fixed?

Hi Jinyun, see my comments:
https://bugzilla.redhat.com/show_bug.cgi?id=1824991#c4
https://bugzilla.redhat.com/show_bug.cgi?id=1824991#c6

Thanks for the info, Joseph. I checked the past week of runs: the issue has not recurred since build #873, so the issue is fixed on 4.5.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409
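For anyone triaging similar failures, the hostname mismatch quoted in the description can be pulled apart mechanically to see which CI cluster a certificate belongs to. The Python sketch below is illustrative only and is not part of the fix. It assumes the error text follows the "certificate is valid for A, not B" wording quoted in this bug and that API hostnames have the form api.<cluster>.<base domain>; the helper names `parse_mismatch` and `cluster_id` are hypothetical.

```python
import re

# Assumed message shape, taken from the error quoted in this bug:
#   "certificate is valid for <name>[, <name>...], not <requested name>"
MISMATCH = re.compile(
    r"certificate is valid for (?P<valid>.+?), not (?P<requested>\S+)"
)


def parse_mismatch(message):
    """Return (valid_names, requested_name), or None if the text does not match."""
    m = MISMATCH.search(message)
    if m is None:
        return None
    valid = [name.strip() for name in m.group("valid").split(",")]
    # Drop a trailing quote or period picked up from surrounding log text.
    requested = m.group("requested").rstrip('".')
    return valid, requested


def cluster_id(api_hostname):
    """Return the cluster label from api.<cluster>.<base domain>, else None."""
    labels = api_hostname.split(".")
    if len(labels) > 2 and labels[0] == "api":
        return labels[1]
    return None


if __name__ == "__main__":
    msg = (
        "x509: certificate is valid for "
        "api.ci-op-4byd7z0v-3858a.origin-ci-int-aws.dev.rhcloud.com, not "
        "api.ci-op-7zg3gn6s-e99c3.origin-ci-int-aws.dev.rhcloud.com"
    )
    valid, requested = parse_mismatch(msg)
    print("certificate issued for cluster:", cluster_id(valid[0]))
    print("client connected to cluster:  ", cluster_id(requested))
```

Run against the message quoted in this bug, the two cluster labels differ (ci-op-4byd7z0v-3858a versus ci-op-7zg3gn6s-e99c3), which is consistent with the mixup between CI jobs described above.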