Bug 1838007
| Summary: | Networking issue during OSD service outage 2020-05-19 | |||
|---|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Alexander Constantinescu <aconstan> | |
| Component: | Networking | Assignee: | Alexander Constantinescu <aconstan> | |
| Networking sub component: | openshift-sdn | QA Contact: | zhaozhanqi <zzhao> | |
| Status: | CLOSED DEFERRED | Docs Contact: | ||
| Severity: | urgent | |||
| Priority: | urgent | CC: | aaleman, aarapov, agarcial, akonarde, aos-bugs, apahim, asegundo, bbennett, cattias, cblecker, cdc, dcbw, dhansen, jaharrin, jbeakley, jchevret, jeder, kbsingh, lmohanty, markmc, marobrie, mcambria, nmalik, pbergene, scuppett, sdodson, tparikh, trankin, tsmetana, vrutkovs, wking, yanyang, yufchang | |
| Version: | 4.3.0 | Keywords: | ServiceDeliveryBlocker, Upgrades | |
| Target Milestone: | --- | |||
| Target Release: | 4.6.0 | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | | Doc Type: | If docs needed, set a value | |
| Doc Text: | | Story Points: | --- | |
| Clone Of: | ||||
| : | 1851182 | Environment: | |
| Last Closed: | 2020-08-04 08:28:53 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | 1851182 | |||
| Bug Blocks: | ||||
Comment 2
Lalatendu Mohanty
2020-05-21 14:59:26 UTC
Once the problem here is understood, please take a moment to answer the questions in Comment #2. We'd like to understand as soon as possible whether or not the product has regressed.

This could be a red herring, but https://github.com/openshift/machine-config-operator/pull/1668 was included in 4.3. It changed the default to NSS_SDB_USE_CACHE=no. See also https://bugzilla.redhat.com/show_bug.cgi?id=1820507#c9. It would be worth retrying the getent/GET test case with NSS_SDB_USE_CACHE to rule out that change as a suspect.

(In reply to Mark McLoughlin from comment #29)
> This could be a red herring, but
> https://github.com/openshift/machine-config-operator/pull/1668 was included
> in 4.3

Correction: it was included in 4.3.19.
https://openshift-release.svc.ci.openshift.org/releasestream/4-stable/release/4.3.19?from=4.3.18

> machine-config-operator:
> * Bug 1822269: Add new crio.conf field to the template #1668

Still investigating. Moving this to 4.6; a backport to 4.5 will follow once it is fixed.

@aleksander can you grab full flow dumps from good and bad nodes, and also try an ofproto/trace for traffic going to the bad pod?

Yeah, if syncNamespaceFlows() fails it really should retry.

I am closing this issue. Several openshift-sdn networking issues have been fixed and backported to 4.4 (the 4.3 backport will happen this week):

- https://bugzilla.redhat.com/show_bug.cgi?id=1855118
- https://bugzilla.redhat.com/show_bug.cgi?id=1853193
- https://bugzilla.redhat.com/show_bug.cgi?id=1857738

Moreover, the effort to investigate the Quay outage has stopped.