Bug 1389451 - While upgrading from 3.2.0 to 3.2.1.17 (3.2 latest) ovs flows are added correctly but are missed out while upgrading from 3.2.1.17 (3.2 latest) to 3.3 (3.3.0.35)
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 3.1.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Dan Winship
QA Contact: Meng Bo
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-10-27 15:23 UTC by Miheer Salunke
Modified: 2016-11-11 13:12 UTC
CC List: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-11-11 13:12:06 UTC
Target Upstream Version:



Description Miheer Salunke 2016-10-27 15:23:46 UTC
Description of problem:


When upgrading from 3.2.0 to 3.2.1.17 (3.2 latest), the OVS flows for every hostsubnet are added correctly to each node's flow tables, but when upgrading from 3.2.1.17 (3.2 latest) to 3.3 (3.3.0.35), the flows for some hostsubnets are randomly missing from the nodes' OVS flow tables.

According to this comment:
https://access.redhat.com/support/cases/#/case/01697073?commentId=a0aA000000HzifqIAB
the registry is located at 10.1.1.2 on node 542.
Node 538 can reach the registry; its OVS knows the subnet 10.1.1.0/24.
Node 540 cannot reach the registry because its OVS does not know the subnet 10.1.1.0/24, so it cannot forward the network frame to host 542 over VXLAN.
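A node in the state described above can be spotted by comparing its OVS flow dump with the cluster's hostsubnets. The following is a minimal Python sketch, not an OpenShift tool: it assumes the dump comes from something like "ovs-ofctl -O OpenFlow13 dump-flows br0" and that the flow for a remote hostsubnet matches on "nw_dst=<subnet CIDR>". The exact tables and match fields used by the OpenShift SDN plugin may differ between versions, and the function name, sample flow lines, and node IP below are hypothetical.

```python
import re


def missing_subnet_flows(flow_dump, subnets):
    """Return the hostsubnet CIDRs that no flow in flow_dump matches with nw_dst.

    ASSUMPTION: a node can reach a remote hostsubnet only if some flow matches
    nw_dst=<that CIDR>; real OpenShift SDN flow layouts may differ.
    """
    # Collect every nw_dst=<address or CIDR> match field present in the dump.
    present = set(re.findall(r"nw_dst=([0-9./]+)", flow_dump))
    # Keep the input order so the report lines up with the hostsubnet list.
    return [cidr for cidr in subnets if cidr not in present]


# Hypothetical excerpt of a flow dump from node 540: it has a flow for
# 10.1.2.0/24 but none for 10.1.1.0/24 (node 542's subnet).
SAMPLE_DUMP = """\
 cookie=0x0, table=8, priority=100,ip,nw_dst=10.1.2.0/24 actions=set_field:192.168.0.3->tun_dst,output:1
 cookie=0x0, table=8, priority=0 actions=drop
"""

if __name__ == "__main__":
    print(missing_subnet_flows(SAMPLE_DUMP, ["10.1.1.0/24", "10.1.2.0/24"]))
    # prints ['10.1.1.0/24']
```

The list of subnets to check could be taken from "oc get hostsubnets -o jsonpath='{.items[*].subnet}'" and the check run once per node.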

We deleted and then re-added node 542, which fixed the issue: the missing rules are now present on node 540.
This is the only workaround we have found.

"oc get hostsubnets" has returned a hostsubnet for every node from the beginning.


So yesterday we restored a 3.2.0 backup, and everything went well.
Then we upgraded to 3.2.1.17 (3.2 latest); everything went well too.
Then we upgraded to 3.3 (3.3.0.35) and hit the issue again.

The workaround (deleting and re-adding a node) is fine for a test environment but not acceptable for production ones, where no or minimal downtime is required.


PS:
On one node we also tried the following:
systemctl stop atomic-openshift-node
ovs-vsctl del-br br0
systemctl start atomic-openshift-node

But it had no effect.

Version-Release number of selected component (if applicable):
3.3.0.35

How reproducible:
Always on customer side

Steps to Reproduce:
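1. Start from a 3.2.0 cluster (the customer restored one from a backup).
2. Upgrade to 3.2.1.17 (3.2 latest); the OVS flows for all hostsubnets are present on every node.
3. Upgrade to 3.3 (3.3.0.35).
4. Compare each node's OVS flow table with "oc get hostsubnets"; flows for some hostsubnets are missing on some nodes.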

Actual results:
When upgrading from 3.2.0 to 3.2.1.17 (3.2 latest), the OVS flows for every hostsubnet are added correctly to each node's flow tables, but when upgrading from 3.2.1.17 (3.2 latest) to 3.3 (3.3.0.35), the flows for some hostsubnets are randomly missing from the nodes' OVS flow tables.

Expected results:
Upgrading from 3.2.1.17 (3.2 latest) to 3.3 (3.3.0.35) should not randomly drop the flows for any hostsubnet from the nodes' OVS flow tables.

Additional info:

Comment 18 Ben Bennett 2016-11-11 13:12:06 UTC
The resolution was that the customer had duplicated the UUID of a hostsubnet when creating hostsubnets manually, and the duplicate caused the missing flows.
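Assuming the duplicated UUID mentioned above is the hostsubnet objects' metadata.uid field (the comment does not say which field was duplicated), one way to look for the same condition is to group the output of "oc get hostsubnets -o json" by that field. This is a minimal Python sketch, not a supported tool; the function name, node names, and uid values in the example are hypothetical.

```python
import json
from collections import defaultdict


def duplicate_uids(hostsubnets_json):
    """Return {uid: [hostsubnet names]} for every metadata.uid shared by more than one hostsubnet.

    ASSUMPTION: the "UUID" from the bug resolution is metadata.uid.
    """
    items = json.loads(hostsubnets_json).get("items", [])
    names_by_uid = defaultdict(list)
    for item in items:
        meta = item.get("metadata", {})
        uid = meta.get("uid")
        if uid:  # skip entries without a uid rather than grouping them together
            names_by_uid[uid].append(meta.get("name", "<unnamed>"))
    # Only report uids that appear on more than one hostsubnet.
    return {uid: names for uid, names in names_by_uid.items() if len(names) > 1}


# Hypothetical, trimmed "oc get hostsubnets -o json" output in which two
# hostsubnets were created by hand with the same uid.
SAMPLE = json.dumps({
    "kind": "List",
    "items": [
        {"metadata": {"name": "node-538", "uid": "1111"}, "subnet": "10.1.0.0/24"},
        {"metadata": {"name": "node-540", "uid": "2222"}, "subnet": "10.1.2.0/24"},
        {"metadata": {"name": "node-542", "uid": "2222"}, "subnet": "10.1.1.0/24"},
    ],
})

if __name__ == "__main__":
    print(duplicate_uids(SAMPLE))
    # prints {'2222': ['node-540', 'node-542']}
```

An empty result only means no metadata.uid is shared; it does not rule out other hand-edited fields.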

