Bug 1539187

Summary: Node startup should flush stale ovs rules when hostsubnetlength changes on restart
Product: OpenShift Container Platform Reporter: Robert Bost <rbost>
Component: Networking Assignee: Jacob Tanenbaum <jtanenba>
Status: CLOSED ERRATA QA Contact: Meng Bo <bmeng>
Severity: medium Docs Contact:
Priority: medium    
Version: 3.7.0 CC: aos-bugs, bbennett, bpritche, hongli, mchappel, rbost, rhowe, zzhao
Target Milestone: ---   
Target Release: 3.9.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-03-28 14:23:11 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Robert Bost 2018-01-26 20:54:46 UTC
Description of problem:
Customer report:

"Upon upgrading to OpenShift 3.7, our pod IP network became unavailable across nodes. This was debugged to the point that OpenShift was handing out colliding hostsubnet values. For example, some hosts may have been given a 10.1.5.0/24 while others already had the 10.1.4.0/23 range (these two subnets collide)."

OpenShift should not allow two hostsubnet ranges to collide. 
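For reference, the collision in the customer's example can be checked with a few lines of Python. This is only an illustrative sketch using the standard-library ipaddress module; the helper name subnets_collide is made up for this comment and is not part of OpenShift.

```python
import ipaddress


def subnets_collide(a: str, b: str) -> bool:
    """Return True if the two CIDR ranges share at least one address."""
    return ipaddress.ip_network(a).overlaps(ipaddress.ip_network(b))


# The two ranges from the customer report: 10.1.4.0/23 spans
# 10.1.4.0 through 10.1.5.255, so it fully contains 10.1.5.0/24.
print(subnets_collide("10.1.5.0/24", "10.1.4.0/23"))

# A neighbouring /23 that starts at 10.1.6.0 does not touch 10.1.5.0/24.
print(subnets_collide("10.1.5.0/24", "10.1.6.0/23"))
```

Mixing prefix lengths is what makes this possible: a /24 allocated under one hostsubnetlength setting can fall inside a /23 allocated under another.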


Version-Release number of selected component (if applicable): 3.7


Expected results:
"I expect to see Openshift not give colliding subnet values if the master services can be configured in a way to hand out different subnet lengths."

Comment 1 Jacob Tanenbaum 2018-01-31 20:24:45 UTC
Could you post the master-config.yaml file?

Comment 9 Jacob Tanenbaum 2018-02-02 21:24:26 UTC
We want to allow the master to change the network if something gets messed up. That change is not currently reflected in the node SDN setup rules, and it should be.

Comment 10 openshift-github-bot 2018-02-24 12:25:03 UTC
Commit pushed to master at https://github.com/openshift/origin

https://github.com/openshift/origin/commit/ffc83819c44440e4e1b30aa34a2ce41e3aab8e75
Correctly flush stale ovs rules on Node startup

Currently, when OpenShift creates a new OVS bridge, it does so using

ovs-vsctl --if-exists del-br br0 -- add-br br0 -- set Bridge br0 fail-mode=secure protocols=OpenFlow13

which, while it does delete the bridge, does not clear the flows attached to it. Splitting bridge creation into two steps, deleting the old bridge and then creating the new one, correctly deletes any stale OVS flows.
Bug 1539187
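
The two-step sequence described in the commit message can be sketched as below. This is not the origin code (the actual fix is in Go); it is a rough Python illustration, and the function names bridge_recreate_commands and recreate_bridge are hypothetical. The ovs-vsctl arguments are copied from the commands quoted in this bug.

```python
import subprocess
from typing import Callable, List, Sequence


def bridge_recreate_commands(bridge: str = "br0") -> List[List[str]]:
    """Build the two separate ovs-vsctl invocations: delete, then create."""
    return [
        # Step 1: remove the old bridge on its own, if it exists.
        ["ovs-vsctl", "--if-exists", "del-br", bridge],
        # Step 2: create the bridge fresh and configure it.
        ["ovs-vsctl", "add-br", bridge, "--", "set", "Bridge", bridge,
         "fail-mode=secure", "protocols=OpenFlow13"],
    ]


def recreate_bridge(
    bridge: str = "br0",
    run: Callable[[Sequence[str]], object] = subprocess.check_call,
) -> None:
    """Run each command as its own ovs-vsctl call (assumes ovs-vsctl is on PATH)."""
    for argv in bridge_recreate_commands(bridge):
        run(argv)
```

Passing a different run callable (for example, one that only records the argument lists) allows the sequence to be inspected on a machine without Open vSwitch installed.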

Comment 12 Hongan Li 2018-03-05 08:02:07 UTC
Verified in OpenShift v3.9.2. On node startup, OVS setup now deletes br0 and then creates a new bridge in a separate step, as shown below.

I0305 07:44:52.501637   14512 ovs.go:145] Executing: ovs-vsctl --if-exists del-br br0
I0305 07:44:52.577332   14512 ovs.go:145] Executing: ovs-vsctl add-br br0 -- set Bridge br0 fail-mode=secure protocols=OpenFlow13
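
If it helps anyone repeating this verification, the ordering can be checked mechanically from the node log. The snippet below is a small sketch that assumes the "Executing: " log format shown above; the helper names are hypothetical.

```python
from typing import List


def executed_commands(log_text: str) -> List[str]:
    """Extract the command portion of every 'Executing: ' log line."""
    marker = "Executing: "
    return [
        line.split(marker, 1)[1].strip()
        for line in log_text.splitlines()
        if marker in line
    ]


def deletes_before_create(log_text: str, bridge: str = "br0") -> bool:
    """True if a standalone del-br is logged before a standalone add-br."""
    cmds = executed_commands(log_text)
    delete_cmd = f"ovs-vsctl --if-exists del-br {bridge}"
    create_prefix = f"ovs-vsctl add-br {bridge}"
    deletes = [i for i, c in enumerate(cmds) if c == delete_cmd]
    creates = [i for i, c in enumerate(cmds) if c.startswith(create_prefix)]
    return bool(deletes) and bool(creates) and min(deletes) < min(creates)


sample = """\
I0305 07:44:52.501637   14512 ovs.go:145] Executing: ovs-vsctl --if-exists del-br br0
I0305 07:44:52.577332   14512 ovs.go:145] Executing: ovs-vsctl add-br br0 -- set Bridge br0 fail-mode=secure protocols=OpenFlow13
"""
print(deletes_before_create(sample))
```

A log containing only the old combined command (del-br and add-br in one ovs-vsctl call) would not match either pattern, so the check would report False for a node still running the pre-fix behaviour.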

Comment 15 errata-xmlrpc 2018-03-28 14:23:11 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0489