Bug 1546170

Summary: [3.7] missing node-to-node OVS flows
Product: OpenShift Container Platform Reporter: Dan Winship <danw>
Component: Networking Assignee: Ben Bennett <bbennett>
Status: CLOSED ERRATA QA Contact: Meng Bo <bmeng>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 3.7.1 CC: aos-bugs, bbennett, bmeng, eparis, tkimura
Target Milestone: ---   
Target Release: 3.7.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: In some (as-yet-undetermined) circumstances, nodes were apparently receiving a duplicate out-of-order HostSubnet "deleted" event from the master.
Consequence: When processing the duplicate event, the node could end up deleting OVS flows corresponding to an active node, causing pods on the two nodes to be unable to communicate with each other. (This was most noticeable when it happened to a node hosting the registry.)
Fix: The HostSubnet event-processing code will now notice that the event is a duplicate and ignore it.
Result: OVS flows are not deleted, and pods can communicate.
Story Points: ---
Clone Of: 1546169 Environment:
Last Closed: 2018-04-05 09:38:31 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1544903, 1546169, 1547599    
Bug Blocks:    

Comment 1 Dan Winship 2018-02-16 13:46:15 UTC
https://github.com/openshift/ose/pull/1073

Comment 3 Meng Bo 2018-03-12 10:34:08 UTC
Tested on 3.7.38, there is no replay of hostsubnet delete.

Comment 4 Dan Winship 2018-03-12 14:13:14 UTC
(In reply to Meng Bo from comment #3)
> Tested on 3.7.38, there is no replay of hostsubnet delete.

That's not what the patch fixes. The patch tries to ensure that if a "replayed hostsubnet delete" occurs, we do the right thing. But we don't know how to actually cause the "replayed hostsubnet delete" (or even whether that really is the right description of what's occurring), so the fix can't really be QA'ed at this point (other than to make sure that it doesn't break anything else).

The real test will be when we get this fix deployed to Online, and we see if the node-to-node routing problem goes away.

Comment 8 errata-xmlrpc 2018-04-05 09:38:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0636