Bug 1546170
Summary: | [3.7] missing node-to-node OVS flows | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Dan Winship <danw> |
Component: | Networking | Assignee: | Ben Bennett <bbennett> |
Status: | CLOSED ERRATA | QA Contact: | Meng Bo <bmeng> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | ||
Version: | 3.7.1 | CC: | aos-bugs, bbennett, bmeng, eparis, tkimura |
Target Milestone: | --- | ||
Target Release: | 3.7.z | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: |
Cause: In some (as-yet-undetermined) circumstances, nodes were apparently receiving a duplicate out-of-order HostSubnet "deleted" event from the master.
Consequence: When processing the duplicate event, the node could end up deleting OVS flows corresponding to an active node, causing pods on the two nodes to be unable to communicate with each other. (This was most noticeable when it happened to a node hosting the registry.)
Fix: The HostSubnet event-processing code will now notice that the event is a duplicate and ignore it.
Result: OVS flows are not deleted, and pods can communicate.
|
Story Points: | --- |
Clone Of: | 1546169 | Environment: | |
Last Closed: | 2018-04-05 09:38:31 UTC | Type: | Bug |
Bug Depends On: | 1544903, 1546169, 1547599 | ||
Bug Blocks: |
Comment 1
Dan Winship
2018-02-16 13:46:15 UTC
Tested on 3.7.38, there is no replay of hostsubnet delete.

(In reply to Meng Bo from comment #3)
> Tested on 3.7.38, there is no replay of hostsubnet delete.

That's not what the patch fixes. The patch attempts to make it so that if a "replayed hostsubnet delete" occurs, we do the right thing. But we don't know how to actually cause the "replayed hostsubnet delete" (or even if that really is the right description of what's occurring), so the fix can't really be QA'ed at this point (other than to make sure that it doesn't break anything else). The real test will be when we get this fix deployed to Online, and we see if the node-to-node routing problem goes away.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0636