Description of problem:
Adding an egressIP to a netnamespace works as expected. After the egressIP is removed from the netnamespace and added back, it no longer works.

Version-Release number of selected component (if applicable):
v3.9.0-0.47.0

How reproducible:
Always

Steps to Reproduce:
1. Set up a multi-node environment.
2. Create a project.
3. Add the egressIP to any one of the nodes:
   # oc patch hostsubnet ose-node1.bmeng.local -p '{"egressIPs":["10.66.140.200"]}'
4. Add the egressIP to the netnamespace of the project created above:
   # oc patch netnamespace a1b1 -p '{"egressIPs":["10.66.140.200"]}'
5. Remove the egressIP from the netnamespace:
   # oc patch netnamespace a1b1 -p '{"egressIPs":[]}'
6. Access an outside host from the pods.
7. Add the egressIP back to the netnamespace:
   # oc patch netnamespace a1b1 -p '{"egressIPs":["10.66.140.200"]}'
8. Try to access an outside host from the pods again.

Actual results:
6. The pods can access the outside using the node's real IP.
8. The pods lose their outside connection.

Expected results:
8. The pods should use the egressIP for outside access again.
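The remove-and-re-add sequence in steps 4, 5 and 7 can be scripted for repeated runs. The following is only a sketch: the function names egress_patch and toggle_egress_ip are hypothetical helpers, not part of oc, and the script assumes a logged-in oc client with permission to patch netnamespaces.

```shell
# Hypothetical helper: build the JSON patch body used in steps 3-7.
# With no arguments it yields an empty list, which clears the egress IPs.
egress_patch() {
    list=""
    for addr in "$@"; do
        list="${list:+$list,}\"$addr\""
    done
    printf '{"egressIPs":[%s]}\n' "$list"
}

# Hypothetical helper: run steps 4, 5 and 7 for one namespace.
# Needs a working oc client; nothing is executed until this is called.
toggle_egress_ip() {
    ns=$1
    ip=$2
    oc patch netnamespace "$ns" -p "$(egress_patch "$ip")"   # step 4: add
    oc patch netnamespace "$ns" -p "$(egress_patch)"         # step 5: remove
    oc patch netnamespace "$ns" -p "$(egress_patch "$ip")"   # step 7: add back
}
```

For the environment in this report the call would be: toggle_egress_ip a1b1 10.66.140.200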
Additional info:

> After step 4, the openflow rule will be added
table=90, priority=0 actions=drop
table=100, priority=100,reg0=0x70d9b7 actions=drop
table=100, priority=100,reg0=0xc24e62 actions=drop
table=100, priority=100,ip,reg0=0x9abae2 actions=set_field:3e:e4:30:38:21:29->eth_dst,set_field:0x9abae2->pkt_mark,goto_table:101
table=100, priority=0 actions=goto_table:101
table=101, priority=51,tcp,nw_dst=10.66.141.128,tp_dst=53 actions=output:2
table=101, priority=51,udp,nw_dst=10.66.141.128,tp_dst=53 actions=output:2
table=101, priority=0 actions=output:2

> After step 5, the openflow rule "reg0=0x9abae2 actions=drop" will be added
table=90, priority=0 actions=drop
table=100, priority=100,reg0=0x70d9b7 actions=drop
table=100, priority=100,reg0=0xc24e62 actions=drop
table=100, priority=100,reg0=0x9abae2 actions=drop
table=100, priority=0 actions=goto_table:101
table=101, priority=51,tcp,nw_dst=10.66.141.128,tp_dst=53 actions=output:2
table=101, priority=51,udp,nw_dst=10.66.141.128,tp_dst=53 actions=output:2
table=101, priority=0 actions=output:2

> After step 7, the openflow rule will not change
table=90, priority=0 actions=drop
table=100, priority=100,reg0=0x70d9b7 actions=drop
table=100, priority=100,reg0=0xc24e62 actions=drop
table=100, priority=100,reg0=0x9abae2 actions=drop
table=100, priority=0 actions=goto_table:101
table=101, priority=51,tcp,nw_dst=10.66.141.128,tp_dst=53 actions=output:2
table=101, priority=51,udp,nw_dst=10.66.141.128,tp_dst=53 actions=output:2
table=101, priority=0 actions=output:2

[root@ose-master ~]# oc get hostsubnet
NAME                    HOST                    HOST IP         SUBNET          EGRESS IPS
ose-node1.bmeng.local   ose-node1.bmeng.local   10.66.141.128   10.129.0.0/23   [10.66.140.200]
ose-node2.bmeng.local   ose-node2.bmeng.local   10.66.140.15    10.128.0.0/23   []

[root@ose-master ~]# oc get netnamespace
NAME              NETID      EGRESS IPS
a1b1              10140386   [10.66.140.200]
default           0          []
kube-public       8658771    []
kube-system       8407144    []
openshift         15027445   []
openshift-infra   13902263   []
openshift-node    15559391   []
# echo "obase=16 ; 10140386" | bc
9ABAE2
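The conversion above ties the NETID column of "oc get netnamespace" to the reg0 value in the table=100 flows. A small sketch that prints the value in the same lowercase 0x form the flow dumps use, so it can be pasted straight into a grep (netid_to_reg0 is a hypothetical helper name):

```shell
# Hypothetical helper: convert a decimal NetID to the hex form
# that appears as reg0=... in the flow dumps.
netid_to_reg0() {
    printf '0x%x\n' "$1"
}

netid_to_reg0 10140386   # prints 0x9abae2
```

Typical use on a node would be something like: ovs-ofctl -O OpenFlow13 dump-flows br0 table=100 | grep "reg0=$(netid_to_reg0 10140386)"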
Related node log when adding the egressIP back:
Feb 22 16:57:23 ose-node1.bmeng.local atomic-openshift-node[31074]: I0222 16:57:23.246703 31074 ovs.go:143] Executing: ovs-ofctl -O OpenFlow13 del-flows br0 table=100, reg0=10140386
Feb 22 16:57:23 ose-node1.bmeng.local atomic-openshift-node[31074]: I0222 16:57:23.251585 31074 ovs.go:143] Executing: ovs-ofctl -O OpenFlow13 add-flow br0 table=100, priority=100, reg0=10140386, actions=drop
Feb 22 16:57:25 ose-node1.bmeng.local atomic-openshift-node[31074]: I0222 16:57:25.066447 31074 ovs.go:143] Executing: ovs-ofctl -O OpenFlow13 dump-flows br0
Feb 22 16:57:25 ose-node1.bmeng.local atomic-openshift-node[31074]: I0222 16:57:25.216166 31074 ovs.go:143] Executing: ovs-ofctl -O OpenFlow13 dump-flows br0 table=253
https://github.com/openshift/origin/pull/18720
Tested on v3.9.2-1; the problem is still present.

After the egressIP is added back, pods on the egress node work correctly with the egress IP, but pods on any node other than the egress node still lose egress access.

> Openflow rules on the egress node:
table=100, priority=100,reg0=0x392368 actions=drop
table=100, priority=100,ip,reg0=0x83e9a4 actions=set_field:f6:bc:c3:46:8a:c0->eth_dst,set_field:0x83e9a4->pkt_mark,goto_table:101
table=100, priority=0 actions=goto_table:101
table=101, priority=51,tcp,nw_dst=10.1.1.3,tp_dst=53 actions=output:2
table=101, priority=51,udp,nw_dst=10.1.1.3,tp_dst=53 actions=output:2
table=101, priority=0 actions=output:2

> Openflow rules on the other node:
table=100, priority=100,reg0=0x392368 actions=drop
table=100, priority=100,reg0=0x83e9a4 actions=drop
table=100, priority=0 actions=goto_table:101
table=101, priority=51,tcp,nw_dst=10.1.1.4,tp_dst=53 actions=output:2
table=101, priority=51,udp,nw_dst=10.1.1.4,tp_dst=53 actions=output:2
table=101, priority=0 actions=output:2

# oc get netnamespace u1p1
NAME   NETID     EGRESS IPS
u1p1   8645028   [10.1.1.100]
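To compare nodes quickly, the table=100 entry for one NetID can be classified from the dump-flows text. This is a rough sketch, not something the node software provides: egress_flow_state is a hypothetical helper, and it assumes one flow per line on stdin in the format shown in this report.

```shell
# Hypothetical helper: read `ovs-ofctl dump-flows` text on stdin and
# report the table=100 state for the NetID given as $1:
#   egress - flow sets pkt_mark and continues to table 101
#   drop   - flow drops the namespace's egress traffic
#   none   - no table=100 flow matches this NetID
egress_flow_state() {
    reg=$(printf 'reg0=0x%x' "$1")
    awk -v reg="$reg" '
        index($0, "table=100") && $0 ~ (reg "[ ,]") {
            if (index($0, "actions=drop")) state = "drop"
            else if (index($0, "pkt_mark")) state = "egress"
        }
        END { print (state ? state : "none") }
    '
}
```

Run on each node as: ovs-ofctl -O OpenFlow13 dump-flows br0 | egress_flow_state 8645028. With the dumps in this comment, the egress node would report "egress" and the other node "drop".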
This will be fixed by https://github.com/openshift/origin/pull/18808 / bug 1551028
*** Bug 1548080 has been marked as a duplicate of this bug. ***
https://github.com/openshift/origin/pull/18861
Tested on v3.9.27. The issue has been fixed.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:3748