Bug 1548080

Summary: [3.8] [egressIP] Adding the removed egressIP back to the netnamespace does not make it work again
Product: OpenShift Container Platform
Reporter: Dan Winship <danw>
Component: Networking
Assignee: Dan Winship <danw>
Status: CLOSED DUPLICATE
QA Contact: Meng Bo <bmeng>
Severity: medium
Docs Contact:
Priority: medium
Version: 3.8.0
CC: aos-bugs, bbennett, bmeng, xtian
Target Milestone: ---
Target Release: 3.8.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Bugs in the per-project static IP code.
Consequence: If you removed the static IP from a project and then re-added it, it would not always work correctly.
Fix: Fixed the bugs.
Result: Removing and re-adding static egress IPs now works.
Story Points: ---
Clone Of: 1547899
Environment:
Last Closed: 2018-03-08 06:15:37 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1547899, 1548081
Bug Blocks:

Description Dan Winship 2018-02-22 16:25:47 UTC
+++ This bug was initially created as a clone of Bug #1547899 +++

Description of problem:
After adding an egressIP to a netnamespace, it works as expected. After removing the egressIP from the netnamespace and then adding it back, it no longer works.


Version-Release number of selected component (if applicable):
v3.9.0-0.47.0

How reproducible:
always

Steps to Reproduce:
1. Set up a multi-node environment

2. Create a project

3. Add the egressIP to one of the nodes
# oc patch hostsubnet ose-node1.bmeng.local -p '{"egressIPs":["10.66.140.200"]}'

4. Add the egressIP to the netnamespace of the project above
# oc patch netnamespace a1b1 -p '{"egressIPs":["10.66.140.200"]}'

5. Remove the egressIP from the netnamespace
# oc patch netnamespace a1b1 -p '{"egressIPs":[]}'

6. Access an external host from the pods

7. Add the egressIP back to the netnamespace
# oc patch netnamespace a1b1 -p '{"egressIPs":["10.66.140.200"]}'

8. Try to access an external host from the pods again
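Steps 3 to 7 can be scripted with the same `oc patch` calls used above. This is a minimal sketch, not part of the original report: the variables NODE, NETNS and EGRESS_IP default to this report's values and are assumptions to override for another cluster, and the helper names egress_patch, check_step and reproduce are hypothetical. Steps 6 and 8 (testing external access from a pod) are left manual.

```shell
#!/usr/bin/env bash
# Sketch of reproducer steps 3-7 using the same `oc patch` calls as this report.
# NODE, NETNS and EGRESS_IP default to this report's values (assumption:
# override them for your own cluster). Steps 6 and 8 stay manual.
set -euo pipefail

NODE="${NODE:-ose-node1.bmeng.local}"
NETNS="${NETNS:-a1b1}"
EGRESS_IP="${EGRESS_IP:-10.66.140.200}"

# egress_patch [IP...]: build the patch body; no arguments gives an empty list.
egress_patch() {
  local ips="" ip
  for ip in "$@"; do
    ips+="${ips:+,}\"$ip\""
  done
  printf '{"egressIPs":[%s]}\n' "$ips"
}

# check_step LABEL: hypothetical pause so the tester can try external
# access from a pod by hand before continuing.
check_step() {
  echo "== $1: test external access from a pod in $NETNS"
  read -r -p "Press Enter to continue... " _ || true
}

# reproduce: run the patch sequence from the steps above, in order.
reproduce() {
  oc patch hostsubnet "$NODE" -p "$(egress_patch "$EGRESS_IP")"     # step 3
  oc patch netnamespace "$NETNS" -p "$(egress_patch "$EGRESS_IP")"  # step 4
  oc patch netnamespace "$NETNS" -p "$(egress_patch)"               # step 5
  check_step "step 6, expect the node IP"
  oc patch netnamespace "$NETNS" -p "$(egress_patch "$EGRESS_IP")"  # step 7
  check_step "step 8, expect $EGRESS_IP"
}
```

Sourcing the file only defines the functions; calling `reproduce` runs the sequence.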


Actual results:
6. The pods can reach external hosts, using the node's real IP as the source address.
8. The pods lose external connectivity.

Expected results:
8. The pods should use the egressIP for external access, as they did after step 4.

Additional info:
> After step 4, the OpenFlow rule for the project (reg0=0x9abae2) is added:
table=90, priority=0 actions=drop
table=100, priority=100,reg0=0x70d9b7 actions=drop
table=100, priority=100,reg0=0xc24e62 actions=drop
table=100, priority=100,ip,reg0=0x9abae2 actions=set_field:3e:e4:30:38:21:29->eth_dst,set_field:0x9abae2->pkt_mark,goto_table:101
table=100, priority=0 actions=goto_table:101
table=101, priority=51,tcp,nw_dst=10.66.141.128,tp_dst=53 actions=output:2
table=101, priority=51,udp,nw_dst=10.66.141.128,tp_dst=53 actions=output:2
table=101, priority=0 actions=output:2

> After step 5, the OpenFlow rule "reg0=0x9abae2 actions=drop" is added in place of the egress rule:
table=90, priority=0 actions=drop
table=100, priority=100,reg0=0x70d9b7 actions=drop
table=100, priority=100,reg0=0xc24e62 actions=drop
table=100, priority=100,reg0=0x9abae2 actions=drop
table=100, priority=0 actions=goto_table:101
table=101, priority=51,tcp,nw_dst=10.66.141.128,tp_dst=53 actions=output:2
table=101, priority=51,udp,nw_dst=10.66.141.128,tp_dst=53 actions=output:2
table=101, priority=0 actions=output:2

> After step 7, the OpenFlow rules do not change:
table=90, priority=0 actions=drop
table=100, priority=100,reg0=0x70d9b7 actions=drop
table=100, priority=100,reg0=0xc24e62 actions=drop
table=100, priority=100,reg0=0x9abae2 actions=drop
table=100, priority=0 actions=goto_table:101
table=101, priority=51,tcp,nw_dst=10.66.141.128,tp_dst=53 actions=output:2
table=101, priority=51,udp,nw_dst=10.66.141.128,tp_dst=53 actions=output:2
table=101, priority=0 actions=output:2
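The three dumps differ only in the table=100 rule for reg0=0x9abae2. A minimal sketch for checking that rule, assuming `ovs-ofctl -O OpenFlow13 dump-flows br0` output on stdin formatted like the dumps above (reg0 printed as lowercase 0x-hex); the function name classify_vnid is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: report what table=100 does for one project, given
# `ovs-ofctl -O OpenFlow13 dump-flows br0` output on stdin.
# Assumes flows print reg0 as lowercase 0x-hex, as in this report.
set -euo pipefail

# classify_vnid NETID: NETID is decimal, as in `oc get netnamespace`.
# Prints "egress" (pkt_mark is set), "drop", "other", or "none".
classify_vnid() {
  local reg0
  reg0="$(printf 'reg0=0x%x' "$1")"
  awk -v m="$reg0" '
    /table=100,/ && $0 ~ ("[ ,]" m "[ ,]") {
      found = 1
      if ($0 ~ /actions=drop/)  print "drop"
      else if ($0 ~ /pkt_mark/) print "egress"
      else                      print "other"
    }
    END { if (!found) print "none" }
  '
}
```

Against the dumps above, NETID 10140386 classifies as "egress" after step 4 and as "drop" after steps 5 and 7, which is the stuck state this bug describes.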


[root@ose-master ~]# oc get hostsubnet 
NAME                    HOST                    HOST IP         SUBNET          EGRESS IPS
ose-node1.bmeng.local   ose-node1.bmeng.local   10.66.141.128   10.129.0.0/23   [10.66.140.200]
ose-node2.bmeng.local   ose-node2.bmeng.local   10.66.140.15    10.128.0.0/23   []
[root@ose-master ~]# oc get netnamespace 
NAME              NETID      EGRESS IPS
a1b1              10140386   [10.66.140.200]
default           0          []
kube-public       8658771    []
kube-system       8407144    []
openshift         15027445   []
openshift-infra   13902263   []
openshift-node    15559391   []

# echo "obase=16 ; 10140386" | bc
9ABAE2
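The same NETID-to-reg0 mapping can be done without bc. A minimal sketch using only awk and bash's printf; it assumes the `oc get netnamespace` column layout shown above, and the function names are hypothetical. Note that ovs-ofctl prints the value in lowercase with a 0x prefix (0x9abae2), while bc prints uppercase without one.

```shell
#!/usr/bin/env bash
# Sketch: map a project's NETID (decimal, from `oc get netnamespace`) to the
# reg0 value that appears in OVS flows, and back. No bc needed.
set -euo pipefail

# netid_from_table NAME: read `oc get netnamespace` output on stdin and
# print the NETID column for project NAME (header row skipped).
netid_from_table() {
  awk -v ns="$1" 'NR > 1 && $1 == ns { print $2 }'
}

# netid_to_reg0 NETID: lowercase hex with 0x prefix, as ovs-ofctl prints it.
netid_to_reg0() {
  printf '0x%x\n' "$1"
}

# reg0_to_netid HEX: back to the decimal NETID (expects the 0x prefix).
reg0_to_netid() {
  printf '%d\n' "$1"
}
```

For example, piping the table above into `netid_from_table a1b1` prints 10140386, and `netid_to_reg0 10140386` prints 0x9abae2.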

--- Additional comment from Meng Bo on 2018-02-22 03:58:30 EST ---

Related node log when adding the egressIP back:

Feb 22 16:57:23 ose-node1.bmeng.local atomic-openshift-node[31074]: I0222 16:57:23.246703   31074 ovs.go:143] Executing: ovs-ofctl -O OpenFlow13 del-flows br0 table=100, reg0=10140386
Feb 22 16:57:23 ose-node1.bmeng.local atomic-openshift-node[31074]: I0222 16:57:23.251585   31074 ovs.go:143] Executing: ovs-ofctl -O OpenFlow13 add-flow br0 table=100, priority=100, reg0=10140386, actions=drop
Feb 22 16:57:25 ose-node1.bmeng.local atomic-openshift-node[31074]: I0222 16:57:25.066447   31074 ovs.go:143] Executing: ovs-ofctl -O OpenFlow13 dump-flows br0
Feb 22 16:57:25 ose-node1.bmeng.local atomic-openshift-node[31074]: I0222 16:57:25.216166   31074 ovs.go:143] Executing: ovs-ofctl -O OpenFlow13 dump-flows br0 table=253
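In this log, reg0 appears in decimal (10140386), whereas dump-flows prints it in hex (0x9abae2). Filtering the log by the decimal NETID shows that, when the egressIP was re-added, the node deleted the project's table=100 flow and re-added a drop flow, which matches the unchanged dump after step 7. A minimal sketch of that filter, assuming log lines formatted as above; the function name flow_ops is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list the ovs-ofctl commands the node logged for one NETID.
# Input: node log lines on stdin, in the format shown in this report.
set -euo pipefail

# flow_ops NETID: the node logs reg0 in decimal, unlike dump-flows (hex).
# The trailing ([^0-9]|$) keeps 10140386 from matching longer IDs.
flow_ops() {
  awk -v netid="$1" '
    index($0, "Executing: ovs-ofctl") && $0 ~ ("reg0=" netid "([^0-9]|$)") {
      sub(/.*Executing: /, "")   # keep only the executed command
      print
    }
  '
}
```

Assuming the node runs as the atomic-openshift-node systemd unit (as the log tag suggests), the input could come from `journalctl -u atomic-openshift-node`.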

Comment 1 Dan Winship 2018-02-23 16:45:42 UTC
https://github.com/openshift/ose/pull/1080