Bug 2054426
Summary: | ip-reconciler still fails during initial cluster installs | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | David Eads <deads> |
Component: | Networking | Assignee: | Douglas Smith <dosmith> |
Networking sub component: | multus | QA Contact: | Weibin Liang <weliang> |
Status: | CLOSED DEFERRED | Docs Contact: | |
Severity: | medium | ||
Priority: | high | CC: | bparees, dgoodwin, wking |
Version: | 4.10 | Keywords: | Reopened |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2023-03-09 01:12:58 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
David Eads
2022-02-14 23:00:32 UTC
Thanks David. I've posted a couple of PRs for a change that introduces a set of known errors for the ip-reconciler: if an error matches one of them, the ip-reconciler exits zero. It's a trade-off between correctness testing and visibility into other issues that happen to match a known error. But given the headaches it's been causing us, having the ip-reconciler ignore some errors is the direction I'd tip the scales.

https://github.com/openshift/whereabouts-cni/pull/84
https://github.com/openshift/whereabouts-cni/pull/85

I'm looking to get a review from my team tomorrow morning, but checking whether those PRs improve CI would also be good feedback. There was also an additional report at https://bugzilla.redhat.com/show_bug.cgi?id=2050409, which is where I have the PRs posted.

For tracking the current hit rate: https://search.ci.openshift.org/?search=alert+KubeJobFailed+fired.*ip-reconciler&maxAge=48h&context=1&type=bug%2Bjunit&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

I have filed a PR to skip the test for now: https://github.com/openshift/origin/pull/26842. Once it merges, we'll need a new CI search query to track how often this occurs. A Jira issue has been filed to make this its own test in the future.

Please reopen or create another BZ if we're still seeing CI results for this problem.

This is causing a failed 4.8 payload acceptance due to our 4.7->4.8 upgrade jobs (which likely means any fix that's been applied needs to get into 4.7 to fully avoid this). See: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-upgrade-from-stable-4.7-e2e-gcp-upgrade/1592129606809292800

If you want to close this and create a new Jira bug for tracking the resolution in 4.7/4.8, that's OK with me.

OpenShift has moved to Jira for its defect tracking! This bug can now be found in the OCPBUGS project in Jira: https://issues.redhat.com/browse/OCPBUGS-9119