Bug 1547899 - [egressIP] Add the removed egressIP back to the netnamespace will not make it work again
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 3.9.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.9.0
Assignee: Dan Winship
QA Contact: Meng Bo
URL:
Whiteboard:
Duplicates: 1548080
Depends On:
Blocks: 1548080 1548081
Reported: 2018-02-22 08:53 UTC by Meng Bo
Modified: 2018-12-13 19:27 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Bugs in the per-project static IP code.
Consequence: If you removed the static IP from a project and then re-added it, it would not always work correctly.
Fix: Fixed the bugs.
Result: Removing and re-adding static egress IPs now works.
Clone Of:
Clones: 1548080 1548081
Environment:
Last Closed: 2018-12-13 19:26:59 UTC
Target Upstream Version:
Embargoed:


Links
System                  ID              Private  Priority  Status  Summary  Last Updated
Origin (Github)         18808           0        None      None    None     2018-03-06 18:56:41 UTC
Red Hat Product Errata  RHBA-2018:3748  0        None      None    None     2018-12-13 19:27:10 UTC

Description Meng Bo 2018-02-22 08:53:41 UTC
Description of problem:
After an egressIP is added to a netnamespace, it works as expected. After the egressIP is removed from the netnamespace and then added back, it no longer works.


Version-Release number of selected component (if applicable):
v3.9.0-0.47.0

How reproducible:
always

Steps to Reproduce:
1. Set up a multi-node environment

2. Create a project

3. Add the egressIP to any one of the nodes
# oc patch hostsubnet ose-node1.bmeng.local -p '{"egressIPs":["10.66.140.200"]}'

4. Add the egressIP to the netnamespace of the project above
# oc patch netnamespace a1b1 -p '{"egressIPs":["10.66.140.200"]}'

5. Remove the egressIP from the netnamespace
# oc patch netnamespace a1b1 -p '{"egressIPs":[]}'

6. Access an external address from the pods

7. Add the egressIP back to the netnamespace
# oc patch netnamespace a1b1 -p '{"egressIPs":["10.66.140.200"]}'

8. Try to access an external address from the pods again
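
Note: steps 4, 5 and 7 differ only in the egressIPs list sent in the patch. A minimal shell sketch of a helper that builds that patch body follows. The function name egress_patch is hypothetical (it is not part of oc), and the oc invocations appear only as comments.

```shell
# Hypothetical helper (not part of oc): build the JSON patch body used in
# steps 3-7 from zero or more egress IPs passed as arguments.
egress_patch() {
    list=""
    for ip in "$@"; do
        # Quote each address and separate entries with commas.
        list="${list:+$list,}\"$ip\""
    done
    printf '{"egressIPs":[%s]}\n' "$list"
}

# Usage, assuming the project name a1b1 from the steps above:
#   oc patch netnamespace a1b1 -p "$(egress_patch 10.66.140.200)"   # steps 4 and 7
#   oc patch netnamespace a1b1 -p "$(egress_patch)"                 # step 5
```

Called with no arguments, the helper emits the empty list used in step 5 to remove the egressIP.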


Actual results:
6. The pods can reach external addresses, using the node's real IP as the source.
8. The pods lose external connectivity.

Expected results:
8. The pods should use the egressIP for external access again.

Additional info:
> After step 4, the OpenFlow rule for the netnamespace (reg0=0x9abae2) is added
table=90, priority=0 actions=drop
table=100, priority=100,reg0=0x70d9b7 actions=drop
table=100, priority=100,reg0=0xc24e62 actions=drop
table=100, priority=100,ip,reg0=0x9abae2 actions=set_field:3e:e4:30:38:21:29->eth_dst,set_field:0x9abae2->pkt_mark,goto_table:101
table=100, priority=0 actions=goto_table:101
table=101, priority=51,tcp,nw_dst=10.66.141.128,tp_dst=53 actions=output:2
table=101, priority=51,udp,nw_dst=10.66.141.128,tp_dst=53 actions=output:2
table=101, priority=0 actions=output:2

> After step 5, the OpenFlow rule "reg0=0x9abae2 actions=drop" is added in its place
table=90, priority=0 actions=drop
table=100, priority=100,reg0=0x70d9b7 actions=drop
table=100, priority=100,reg0=0xc24e62 actions=drop
table=100, priority=100,reg0=0x9abae2 actions=drop
table=100, priority=0 actions=goto_table:101
table=101, priority=51,tcp,nw_dst=10.66.141.128,tp_dst=53 actions=output:2
table=101, priority=51,udp,nw_dst=10.66.141.128,tp_dst=53 actions=output:2
table=101, priority=0 actions=output:2

> After step 7, the OpenFlow rules do not change (the drop rule remains)
table=90, priority=0 actions=drop
table=100, priority=100,reg0=0x70d9b7 actions=drop
table=100, priority=100,reg0=0xc24e62 actions=drop
table=100, priority=100,reg0=0x9abae2 actions=drop
table=100, priority=0 actions=goto_table:101
table=101, priority=51,tcp,nw_dst=10.66.141.128,tp_dst=53 actions=output:2
table=101, priority=51,udp,nw_dst=10.66.141.128,tp_dst=53 actions=output:2
table=101, priority=0 actions=output:2
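
Note: the three dumps above differ only in the table=100 rule for reg0=0x9abae2. A minimal shell sketch for checking that rule in a flow dump follows. The function name egress_rule_state is hypothetical, and the matching assumes flow lines shaped like the ones shown in this report.

```shell
# Hypothetical helper: read flow lines on stdin and report how table=100
# handles the given reg0 value: "egress", "drop", or "none".
egress_rule_state() {
    awk -v reg="reg0=$1" '
        # Consider only table=100 lines that match on exactly this reg0 value.
        index($0, "table=100,") && index($0, reg " ") {
            if (index($0, "set_field:"))        state = "egress"
            else if (index($0, "actions=drop")) state = "drop"
        }
        END { print (state ? state : "none") }
    '
}

# Usage on a node, with the dump command that appears in the node log:
#   ovs-ofctl -O OpenFlow13 dump-flows br0 | egress_rule_state 0x9abae2
```

With the dumps in this report, the helper would report "egress" after step 4 and "drop" after steps 5 and 7.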


[root@ose-master ~]# oc get hostsubnet 
NAME                    HOST                    HOST IP         SUBNET          EGRESS IPS
ose-node1.bmeng.local   ose-node1.bmeng.local   10.66.141.128   10.129.0.0/23   [10.66.140.200]
ose-node2.bmeng.local   ose-node2.bmeng.local   10.66.140.15    10.128.0.0/23   []
[root@ose-master ~]# oc get netnamespace 
NAME              NETID      EGRESS IPS
a1b1              10140386   [10.66.140.200]
default           0          []
kube-public       8658771    []
kube-system       8407144    []
openshift         15027445   []
openshift-infra   13902263   []
openshift-node    15559391   []

# echo "obase=16 ; 10140386" | bc
9ABAE2
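
Note: the bc conversion above maps the NETID of a1b1 to the reg0 value in the flow dumps. A minimal shell sketch that does the same with printf and yields the lowercase 0x form used in the dumps follows. The function name netid_to_reg0 is hypothetical.

```shell
# Convert a decimal NETID (as printed by `oc get netnamespace`) to the
# lowercase hex form that appears as reg0 in the flow dumps.
netid_to_reg0() {
    printf '0x%x\n' "$1"
}

netid_to_reg0 10140386   # prints 0x9abae2
```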

Comment 1 Meng Bo 2018-02-22 08:58:30 UTC
Related node log when adding the egressIP back:

Feb 22 16:57:23 ose-node1.bmeng.local atomic-openshift-node[31074]: I0222 16:57:23.246703   31074 ovs.go:143] Executing: ovs-ofctl -O OpenFlow13 del-flows br0 table=100, reg0=10140386
Feb 22 16:57:23 ose-node1.bmeng.local atomic-openshift-node[31074]: I0222 16:57:23.251585   31074 ovs.go:143] Executing: ovs-ofctl -O OpenFlow13 add-flow br0 table=100, priority=100, reg0=10140386, actions=drop
Feb 22 16:57:25 ose-node1.bmeng.local atomic-openshift-node[31074]: I0222 16:57:25.066447   31074 ovs.go:143] Executing: ovs-ofctl -O OpenFlow13 dump-flows br0
Feb 22 16:57:25 ose-node1.bmeng.local atomic-openshift-node[31074]: I0222 16:57:25.216166   31074 ovs.go:143] Executing: ovs-ofctl -O OpenFlow13 dump-flows br0 table=253
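
Note: in the log above, the node re-adds only a drop rule, and it writes reg0 as the decimal NETID rather than the hex form seen in the flow dumps. A minimal shell sketch for pulling just the executed ovs-ofctl commands out of such log lines follows. The function name executed_ovs_cmds is hypothetical, and the journalctl unit name in the usage comment is an assumption based on the process name in the log.

```shell
# Hypothetical helper: keep only the command text that follows
# "Executing: " in node log lines read from stdin.
executed_ovs_cmds() {
    sed -n 's/.*\] Executing: //p'
}

# Usage (unit name assumed from the process name in the log):
#   journalctl -u atomic-openshift-node | executed_ovs_cmds
```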

Comment 2 Dan Winship 2018-02-22 16:27:34 UTC
https://github.com/openshift/origin/pull/18720

Comment 4 Meng Bo 2018-03-06 07:49:26 UTC
Tested on v3.9.2-1; the problem still occurs.

After the egressIP is added back, pods on the egress node work correctly with the egress IP, but pods on nodes other than the egress node still lose egress access.

> OpenFlow rules on the egress node:
table=100, priority=100,reg0=0x392368 actions=drop
table=100, priority=100,ip,reg0=0x83e9a4 actions=set_field:f6:bc:c3:46:8a:c0->eth_dst,set_field:0x83e9a4->pkt_mark,goto_table:101
table=100, priority=0 actions=goto_table:101
table=101, priority=51,tcp,nw_dst=10.1.1.3,tp_dst=53 actions=output:2
table=101, priority=51,udp,nw_dst=10.1.1.3,tp_dst=53 actions=output:2
table=101, priority=0 actions=output:2


> OpenFlow rules on the other node:
table=100, priority=100,reg0=0x392368 actions=drop
table=100, priority=100,reg0=0x83e9a4 actions=drop
table=100, priority=0 actions=goto_table:101
table=101, priority=51,tcp,nw_dst=10.1.1.4,tp_dst=53 actions=output:2
table=101, priority=51,udp,nw_dst=10.1.1.4,tp_dst=53 actions=output:2
table=101, priority=0 actions=output:2


> # oc get netnamespace u1p1
NAME      NETID     EGRESS IPS
u1p1      8645028   [10.1.1.100]

Comment 5 Dan Winship 2018-03-06 13:46:48 UTC
This will be fixed by https://github.com/openshift/origin/pull/18808 / bug 1551028

Comment 6 Meng Bo 2018-03-08 06:15:37 UTC
*** Bug 1548080 has been marked as a duplicate of this bug. ***

Comment 8 Ben Bennett 2018-04-13 14:15:37 UTC
https://github.com/openshift/origin/pull/18861

Comment 9 Meng Bo 2018-05-02 08:22:53 UTC
Tested on v3.9.27
Issue has been fixed.

Comment 12 errata-xmlrpc 2018-12-13 19:26:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3748

