Bug 1272295

Summary: Should clean up the orphan openflow rules after no traffic for hard_timeout
Product: OKD
Component: Networking
Version: 3.x
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: medium
Reporter: Meng Bo <bmeng>
Assignee: Dan Winship <danw>
QA Contact: Meng Bo <bmeng>
CC: aos-bugs, danw, kzhang, rkhan, yadu
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Type: Bug
Last Closed: 2015-11-23 21:13:29 UTC
Attachments:
openflow-dump
node_log_for_legacy_service

Description Meng Bo 2015-10-16 02:14:33 UTC
Description of problem:
Per the SDN setup script, hard_timeout=900 is set in table 0.

# grep hard_timeout ./Godeps/_workspace/src/github.com/openshift/openshift-sdn/pkg/ovssubnet/controller/multitenant/bin/openshift-sdn-multitenant-setup.sh
    ovs-ofctl -O OpenFlow13 add-flow br0 "table=0, actions=learn(table=8, priority=200, hard_timeout=900, NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[], load:NXM_NX_TUN_IPV4_SRC[]->NXM_NX_TUN_IPV4_DST[], output:NXM_OF_IN_PORT[]), goto_table:1"

But the openflow rules for services and pods are not deleted automatically, even long after the pods/services themselves have been deleted.

Version-Release number of selected component (if applicable):
openshift v1.0.6-622-g47d1103
kubernetes v1.1.0-alpha.1-653-g86b4e77
openshift-sdn same as openshift

How reproducible:
always

Steps to Reproduce:
1. Setup multi-node env with multi-tenant networking plugin
2. Create some pods and services in the env
3. Delete all the created pods and services
4. Check the openflow on the node

Actual results:
The rules for the deleted pods/services will not be deleted automatically.

Expected results:
The rules should be cleaned up after 900s with no traffic.

Additional info:
Here is the openflow dump from one of the nodes:

# ovs-ofctl dump-flows br0 -O OpenFlow13
OFPST_FLOW reply (OF1.3) (xid=0x2):
 cookie=0x0, duration=90044.992s, table=0, n_packets=3371, n_bytes=602672, actions=learn(table=8,hard_timeout=900,priority=200,NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:NXM_NX_TUN_IPV4_SRC[]->NXM_NX_TUN_IPV4_DST[],output:NXM_OF_IN_PORT[]),goto_table:1
 cookie=0x0, duration=90044.985s, table=1, n_packets=868, n_bytes=135149, actions=goto_table:3
 cookie=0x0, duration=90044.986s, table=1, n_packets=33, n_bytes=2802, in_port=3 actions=goto_table:5
 cookie=0x0, duration=90044.989s, table=1, n_packets=792, n_bytes=55440, in_port=1 actions=goto_table:2
 cookie=0x0, duration=90044.988s, table=1, n_packets=657, n_bytes=366399, in_port=2 actions=goto_table:5
 cookie=0x0, duration=90044.990s, table=1, n_packets=1021, n_bytes=42882, arp actions=goto_table:8
 cookie=0x0, duration=90044.981s, table=2, n_packets=792, n_bytes=55440, tun_id=0 actions=goto_table:5
 cookie=0x0, duration=90044.982s, table=2, n_packets=0, n_bytes=0, priority=200,ip,nw_dst=10.1.0.1 actions=output:2
 cookie=0x0, duration=90044.980s, table=2, n_packets=0, n_bytes=0, priority=100,ip,nw_dst=10.1.0.0/24 actions=move:NXM_NX_TUN_ID[0..31]->NXM_NX_REG0[],goto_table:6
 cookie=0x0, duration=90044.984s, table=2, n_packets=0, n_bytes=0, arp actions=goto_table:8
 cookie=0x7, duration=6113.620s, table=3, n_packets=0, n_bytes=0, priority=100,ip,in_port=7,nw_src=10.1.0.5 actions=load:0xc->NXM_NX_REG0[],goto_table:4
 cookie=0x9, duration=6208.706s, table=3, n_packets=0, n_bytes=0, priority=100,ip,in_port=9,nw_src=10.1.0.7 actions=load:0xf->NXM_NX_REG0[],goto_table:4
 cookie=0x6, duration=87827.417s, table=3, n_packets=595, n_bytes=44030, priority=100,ip,in_port=6,nw_src=10.1.0.4 actions=load:0xc->NXM_NX_REG0[],goto_table:4
 cookie=0x8, duration=6220.681s, table=3, n_packets=0, n_bytes=0, priority=100,ip,in_port=8,nw_src=10.1.0.6 actions=load:0xc->NXM_NX_REG0[],goto_table:4
 cookie=0x4, duration=90030.166s, table=3, n_packets=177, n_bytes=62101, priority=100,ip,in_port=4,nw_src=10.1.0.2 actions=load:0->NXM_NX_REG0[],goto_table:4
 cookie=0x5, duration=90029.787s, table=3, n_packets=60, n_bytes=26186, priority=100,ip,in_port=5,nw_src=10.1.0.3 actions=load:0->NXM_NX_REG0[],goto_table:4
 cookie=0x0, duration=90044.977s, table=4, n_packets=832, n_bytes=132317, priority=0 actions=goto_table:5
 cookie=0x0, duration=87812.103s, table=4, n_packets=0, n_bytes=0, priority=200,tcp,reg0=0xd,nw_dst=172.30.93.213,tp_dst=27017 actions=output:2
 cookie=0x0, duration=6113.601s, table=4, n_packets=0, n_bytes=0, priority=200,tcp,reg0=0xc,nw_dst=172.30.73.136,tp_dst=27017 actions=output:2
 cookie=0x0, duration=6230.917s, table=4, n_packets=0, n_bytes=0, priority=200,tcp,reg0=0xd,nw_dst=172.30.73.136,tp_dst=27017 actions=output:2
 cookie=0x0, duration=87830.497s, table=4, n_packets=0, n_bytes=0, priority=200,tcp,reg0=0xc,nw_dst=172.30.216.225,tp_dst=27017 actions=output:2
 cookie=0x0, duration=6113.585s, table=4, n_packets=0, n_bytes=0, priority=200,tcp,reg0=0xc,nw_dst=172.30.46.104,tp_dst=27017 actions=output:2
 cookie=0x0, duration=6230.117s, table=4, n_packets=0, n_bytes=0, priority=200,tcp,reg0=0xe,nw_dst=172.30.46.104,tp_dst=27017 actions=output:2
 cookie=0x0, duration=16207.129s, table=4, n_packets=0, n_bytes=0, priority=200,tcp,reg0=0xc,nw_dst=172.30.121.35,tp_dst=10086 actions=output:2
 cookie=0x0, duration=6231.698s, table=4, n_packets=0, n_bytes=0, priority=200,tcp,reg0=0xc,nw_dst=172.30.98.208,tp_dst=27017 actions=output:2
 cookie=0x0, duration=6229.243s, table=4, n_packets=0, n_bytes=0, priority=200,tcp,reg0=0xf,nw_dst=172.30.167.163,tp_dst=27017 actions=output:2
 cookie=0x0, duration=16207.143s, table=4, n_packets=0, n_bytes=0, priority=200,tcp,nw_dst=172.30.154.115,tp_dst=5000 actions=output:2
 cookie=0x0, duration=4696.702s, table=4, n_packets=0, n_bytes=0, priority=200,tcp,nw_dst=172.30.0.1,tp_dst=443 actions=output:2
 cookie=0x0, duration=16207.132s, table=4, n_packets=0, n_bytes=0, priority=200,tcp,nw_dst=172.30.247.54,tp_dst=80 actions=output:2
 cookie=0x0, duration=90044.978s, table=4, n_packets=0, n_bytes=0, priority=100,ip,nw_dst=172.30.0.0/16 actions=drop
 cookie=0x0, duration=90044.975s, table=5, n_packets=199, n_bytes=14726, priority=200,ip,nw_dst=10.1.0.1 actions=output:2
 cookie=0x0, duration=90044.973s, table=5, n_packets=396, n_bytes=29304, priority=100,ip,nw_dst=10.1.0.0/16 actions=goto_table:7
 cookie=0x0, duration=90044.971s, table=5, n_packets=237, n_bytes=88287, priority=0,ip actions=output:2
 cookie=0x0, duration=90044.974s, table=5, n_packets=1441, n_bytes=421191, priority=150,ip,nw_dst=10.1.0.0/24 actions=goto_table:6
 cookie=0x0, duration=90044.970s, table=6, n_packets=1441, n_bytes=421191, priority=200,ip,reg0=0 actions=goto_table:8
 cookie=0x4, duration=90030.165s, table=6, n_packets=0, n_bytes=0, priority=150,ip,nw_dst=10.1.0.2 actions=output:4
 cookie=0x5, duration=90029.786s, table=6, n_packets=0, n_bytes=0, priority=150,ip,nw_dst=10.1.0.3 actions=output:5
 cookie=0x8, duration=6220.679s, table=6, n_packets=0, n_bytes=0, priority=100,ip,reg0=0xc,nw_dst=10.1.0.6 actions=output:8
 cookie=0x7, duration=6113.617s, table=6, n_packets=0, n_bytes=0, priority=100,ip,reg0=0xc,nw_dst=10.1.0.5 actions=output:7
 cookie=0x6, duration=87827.415s, table=6, n_packets=0, n_bytes=0, priority=100,ip,reg0=0xc,nw_dst=10.1.0.4 actions=output:6
 cookie=0x9, duration=6208.704s, table=6, n_packets=0, n_bytes=0, priority=100,ip,reg0=0xf,nw_dst=10.1.0.7 actions=output:9
 cookie=0xa42803d, duration=16208.214s, table=7, n_packets=198, n_bytes=14652, priority=100,ip,nw_dst=10.1.1.0/24 actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:10.66.128.61->tun_dst,output:1
 cookie=0xa428039, duration=16208.207s, table=7, n_packets=198, n_bytes=14652, priority=100,ip,nw_dst=10.1.2.0/24 actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:10.66.128.57->tun_dst,output:1
 cookie=0xa428039, duration=16208.204s, table=8, n_packets=837, n_bytes=35154, priority=100,arp,arp_tpa=10.1.2.0/24 actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:10.66.128.57->tun_dst,output:1
 cookie=0xa42803d, duration=16208.209s, table=8, n_packets=0, n_bytes=0, priority=100,arp,arp_tpa=10.1.1.0/24 actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:10.66.128.61->tun_dst,output:1
 cookie=0x0, duration=90044.969s, table=8, n_packets=16, n_bytes=672, priority=0,arp actions=FLOOD

Comment 1 Dan Winship 2015-10-16 14:06:18 UTC
(In reply to Meng Bo from comment #0)
> Per the SDN setup script, hard_timeout=900 is set in table 0.
> 
> # grep hard_timeout
> ./Godeps/_workspace/src/github.com/openshift/openshift-sdn/pkg/ovssubnet/
> controller/multitenant/bin/openshift-sdn-multitenant-setup.sh
>     ovs-ofctl -O OpenFlow13 add-flow br0 "table=0, actions=learn(table=8,
> priority=200, hard_timeout=900, NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],
> load:NXM_NX_TUN_IPV4_SRC[]->NXM_NX_TUN_IPV4_DST[], output:NXM_OF_IN_PORT[]),
> goto_table:1"

That hard_timeout applies only to rules added by that learn() action (e.g., rules with table=8 and priority=200). There are no such rules in your dump-flows output, so the timeout is working.
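
A quick way to double-check (just a sketch, nothing OpenShift-specific) is to dump that table directly and look for expiring entries:

# ovs-ofctl -O OpenFlow13 dump-flows br0 table=8 | grep hard_timeout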

However, the rules in table 3 and table 6 ought to have been deleted with their associated pods, so if the pods have been deleted then there's some bug here... Can you attach your "journalctl -u openshift-node.service" output? That should show the output from the relevant "ovs-ofctl del-flows" commands...


The priority=200 rules in table 4 are associated with services, not pods, and so they should still be there if you still have services defined (even if there aren't any pods implementing those services). (There should probably be a test that the service rules do get cleaned up properly when services go away.)
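
A quick cross-check (just a sketch): list the priority=200 rules in table 4 and compare their nw_dst values with the cluster IPs of the services that still exist; any nw_dst without a matching service is a leftover rule.

# ovs-ofctl -O OpenFlow13 dump-flows br0 table=4 | grep priority=200
# oc get services --all-namespaces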

All of the rules in table 7 and the two priority=100 rules in table 8 are associated with nodes, not pods, so they should still be there as long as the nodes are still in service. (There should probably be a test that these get cleaned up correctly too.)

All of the remaining rules are permanent.

Comment 2 Meng Bo 2015-10-19 07:22:26 UTC
http://pastebin.test.redhat.com/320760

Here is the openshift-node log when creating/deleting pods/services.

I have highlighted the lines which look strange to me.

Thanks.

Comment 3 Dan Winship 2015-10-19 14:19:59 UTC
> Oct 19 14:52:11 node1.bmeng.local openshift-node[2328]: W1019 14:52:11.174872    2328 manager.go:1748] Hairpin setup failed for pod "hello-pod_u1p1": open /sys/devices/virtual/net/veth4ede697/brport/hairpin_mode: no such file or directory

That's harmless and has been fixed in upstream kubernetes. I'll submit a PR to pull in that fix.

> Oct 19 15:03:51 node1.bmeng.local openshift-node[2328]: E1019 15:03:51.759344    2328 manager.go:1204] Failed tearing down the infra container: exit status 1

That's the error, but I guess we don't log the output of that script by default; can you add "--loglevel=5" to the openshift command line in openshift-node.service and try again?
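
For reference, a rough sketch of bumping the log level, assuming the node's options live in a sysconfig file (the path and variable are install-dependent, so treat this as hypothetical and adjust):

# sed -i 's/--loglevel=[0-9]*/--loglevel=5/' /etc/sysconfig/openshift-node   # hypothetical path
# systemctl restart openshift-node.service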

Comment 4 Meng Bo 2015-10-20 06:00:12 UTC
http://pastebin.test.redhat.com/321035

Logs for deleting pod with loglevel=5

Comment 5 Meng Bo 2015-10-21 05:02:24 UTC
Found the PR in openshift-sdn repo

https://github.com/openshift/openshift-sdn/pull/186

Comment 6 Meng Bo 2015-10-21 05:19:48 UTC
@Dan

Is the error at line 44 of this log (http://pastebin.test.redhat.com/320760) also harmless? It seems it failed to delete an openflow rule.

Comment 7 Dan Winship 2015-10-21 14:57:40 UTC
ah, you're right. (I only tested deleting pods, not deleting services)

Comment 8 Meng Bo 2015-10-22 04:39:19 UTC
I have checked the issue after syncing the latest openshift-sdn into origin.

The service openflow rules can be deleted now, but the pods' rules still cannot be cleaned up.
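
The leftover pod rules are the per-pod entries in tables 3 and 6 (see comment 1), e.g.:

# ovs-ofctl -O OpenFlow13 dump-flows br0 table=3
# ovs-ofctl -O OpenFlow13 dump-flows br0 table=6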

node logs as below: 

Oct 22 12:35:21 node1.bmeng.local ovs-vsctl[6385]: ovs|00001|vsctl|INFO|Called as ovs-vsctl --if-exists del-port vethc02aba4
Oct 22 12:35:21 node1.bmeng.local openshift-node[6298]: I1022 12:35:21.781796    6298 plugins.go:60] TearDownPod 'multitenant' network plugin output: + lockwrap Teardown
Oct 22 12:35:21 node1.bmeng.local openshift-node[6298]: + flock 200
Oct 22 12:35:21 node1.bmeng.local openshift-node[6298]: + Teardown
Oct 22 12:35:21 node1.bmeng.local openshift-node[6298]: + get_ipaddr_pid_veth
Oct 22 12:35:21 node1.bmeng.local openshift-node[6298]: ++ docker inspect --format '{{.HostConfig.NetworkMode}}' 4cf8ecc974ce0b1d224b1bcafd6d101d5695383d916430bc91b1e326bdfd0c5a
Oct 22 12:35:21 node1.bmeng.local openshift-node[6298]: + network_mode=bridge
Oct 22 12:35:21 node1.bmeng.local openshift-node[6298]: + '[' bridge == host ']'
Oct 22 12:35:21 node1.bmeng.local openshift-node[6298]: + [[ bridge =~ container:.* ]]
Oct 22 12:35:21 node1.bmeng.local openshift-node[6298]: ++ docker inspect --format '{{.NetworkSettings.IPAddress}}' 4cf8ecc974ce0b1d224b1bcafd6d101d5695383d916430bc91b1e326bdfd0c5a
Oct 22 12:35:21 node1.bmeng.local openshift-node[6298]: + ipaddr=10.1.0.6
Oct 22 12:35:21 node1.bmeng.local openshift-node[6298]: ++ docker inspect --format '{{.State.Pid}}' 4cf8ecc974ce0b1d224b1bcafd6d101d5695383d916430bc91b1e326bdfd0c5a
Oct 22 12:35:21 node1.bmeng.local openshift-node[6298]: + pid=6215
Oct 22 12:35:21 node1.bmeng.local openshift-node[6298]: ++ get_veth_host 6215
Oct 22 12:35:21 node1.bmeng.local openshift-node[6298]: ++ local pid=6215
Oct 22 12:35:21 node1.bmeng.local openshift-node[6298]: +++ nsenter -n -t 6215 -- ethtool -S eth0
Oct 22 12:35:21 node1.bmeng.local openshift-node[6298]: +++ sed -n -e 's/.*peer_ifindex: //p'
Oct 22 12:35:21 node1.bmeng.local openshift-node[6298]: ++ local veth_ifindex=21
Oct 22 12:35:21 node1.bmeng.local openshift-node[6298]: ++ ip link show
Oct 22 12:35:21 node1.bmeng.local openshift-node[6298]: ++ sed -ne 's/^21: \([^:@]*\).*/\1/p'
Oct 22 12:35:21 node1.bmeng.local openshift-node[6298]: + veth_host=vethc02aba4
Oct 22 12:35:21 node1.bmeng.local openshift-node[6298]: + del_ovs_port
Oct 22 12:35:21 node1.bmeng.local openshift-node[6298]: + ovs-vsctl --if-exists del-port vethc02aba4
Oct 22 12:35:21 node1.bmeng.local openshift-node[6298]: + del_ovs_flows
Oct 22 12:35:21 node1.bmeng.local openshift-node[6298]: ++ ovs-ofctl -O OpenFlow13 dump-ports-desc br0
Oct 22 12:35:21 node1.bmeng.local openshift-node[6298]: ++ grep vethc02aba4
Oct 22 12:35:21 node1.bmeng.local openshift-node[6298]: ++ tr -d ' '
Oct 22 12:35:21 node1.bmeng.local openshift-node[6298]: ++ cut -d '(' -f 1
Oct 22 12:35:21 node1.bmeng.local openshift-node[6298]: + ovs_port=
Oct 22 12:35:21 node1.bmeng.local openshift-node[6298]: + ovs-ofctl -O OpenFlow13 del-flows br0 table=3,cookie=0x/0xffffffff
Oct 22 12:35:21 node1.bmeng.local openshift-node[6298]: ovs-ofctl: invalid numeric format 0x
Oct 22 12:35:21 node1.bmeng.local openshift-node[6298]: , exit status 1
Oct 22 12:35:21 node1.bmeng.local openshift-node[6298]: E1022 12:35:21.781830    6298 manager.go:1336] Failed tearing down the infra container: exit status 1
Oct 22 12:35:21 node1.bmeng.local openshift-node[6298]: I1022 12:35:21.786531    6298 manager.go:1413] Killing container "4cf8ecc974ce0b1d224b1bcafd6d101d5695383d916430bc91b1e326bdfd0c5a u1p1/hello-pod" with 30 second grace period
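
From the trace, ovs-vsctl del-port has already removed the veth from br0 by the time del_ovs_flows runs, so the dump-ports-desc lookup finds nothing, ovs_port ends up empty, and the del-flows match is built with a malformed cookie (cookie=0x/0xffffffff) — which is why the pod's table 3 rule never gets removed. One way to avoid depending on the port number would be to delete the pod's flows by its IP address, which the script already has in $ipaddr. Just a sketch, not necessarily what the actual fix does:

del_ovs_flows() {
    # delete the pod's per-pod rules by pod IP rather than by ovs-port cookie,
    # since the ovs port may already be gone at teardown time
    ovs-ofctl -O OpenFlow13 del-flows br0 "table=3,ip,nw_src=${ipaddr}"
    ovs-ofctl -O OpenFlow13 del-flows br0 "table=6,ip,nw_dst=${ipaddr}"
}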

Comment 9 Meng Bo 2015-10-22 04:43:23 UTC
BTW, I found that only the openflow rules in the user's namespaces get cleaned up once the service/pod is deleted by the user.


None of the rules in the default namespace are cleaned up.

You can see there are two rules each pointing to port 5000 and port 80, since I deleted and recreated both the docker-registry and the router once.

 cookie=0x0, duration=4845.973s, table=4, n_packets=0, n_bytes=0, priority=200,tcp,nw_dst=172.30.206.20,tp_dst=5000 actions=output:2
 cookie=0x0, duration=5984.690s, table=4, n_packets=0, n_bytes=0, priority=200,tcp,nw_dst=172.30.132.224,tp_dst=5000 actions=output:2
 cookie=0x0, duration=159.736s, table=4, n_packets=0, n_bytes=0, priority=200,tcp,nw_dst=172.30.0.1,tp_dst=443 actions=output:2
 cookie=0x0, duration=4845.963s, table=4, n_packets=0, n_bytes=0, priority=200,tcp,nw_dst=172.30.183.25,tp_dst=80 actions=output:2
 cookie=0x0, duration=5984.679s, table=4, n_packets=0, n_bytes=0, priority=200,tcp,nw_dst=172.30.87.133,tp_dst=80 actions=output:2

Comment 10 Dan Winship 2015-10-22 14:49:33 UTC
(In reply to Meng Bo from comment #9)
> BTW, I found that only the openflow rules in the user's namespaces get cleaned
> up once the service/pod is deleted by the user.
> 
> None of the rules in the default namespace are cleaned up.

I'm not able to reproduce this. Can you give me an exact sequence of commands?

Comment 11 Meng Bo 2015-10-23 03:02:10 UTC
(In reply to Dan Winship from comment #10)
> (In reply to Meng Bo from comment #9)
> > BTW, I found that only the openflow rules in the user's namespaces get cleaned
> > up once the service/pod is deleted by the user.
> > 
> > None of the rules in the default namespace are cleaned up.
> 
> I'm not able to reproduce this. Can you give me an exact sequence of
> commands?

OK, I cannot reproduce it either... maybe some of the rules were created before I upgraded the binary.

Comment 12 Meng Bo 2015-10-23 06:43:13 UTC
Checked with the latest origin code and openshift-sdn at commit 1e4edc9abb6bb8ac7e5cd946ddec4c10cc714d67.

The openflow rules are now deleted after the pod is deleted.

Will verify the bug once the changes are merged into origin.

Comment 13 Meng Bo 2015-10-26 03:21:57 UTC
Created attachment 1086359 [details]
openflow-dump

@danw

It seems there is still some problem with removing the openflow rules for deleted services.

I do not have exact reproduction steps; I just used the script below to create/delete resources in my env over the weekend.

There are 180 rules for deleted services on one of my nodes, and 106 rules on another node.
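
A rough way to count them (just a sketch) is to count the priority=200 entries left in table 4, less the handful that belong to services which still exist:

# ovs-ofctl -O OpenFlow13 dump-flows br0 table=4 | grep -c priority=200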

script:
===========
for i in 1 2 3 4; do oc create -f https://raw.githubusercontent.com/openshift-qe/v3-testfiles/master/networking/list_for_pods.json -n u1p$i ; done
sleep 5
oadm pod-network join-projects --to u1p1 u1p2 u1p3
sleep 5
for i in 1 2 3 4; do oc delete rc --all -n u1p$i ; done
for i in 1 2 3 4; do oc delete svc --all -n u1p$i ; done
sleep 5
===========

Openflow dump attached.

Comment 14 Dan Winship 2015-10-26 20:40:18 UTC
(In reply to Meng Bo from comment #13)
> oadm pod-network join-projects --to u1p1 u1p2 u1p3

do you still run into the problem if you don't do any pod-network reorganizing?

> Openflow dump attached.

Can you attach the journal output? It logs every service rule it deletes, along with the output from the ovs-ofctl command if it fails. (Look for "Output of deleting", but note that the log message may continue on the following lines.)
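
For example, something like the following should pull out the relevant entries (the -A3 catches messages that continue on the following lines):

# journalctl -u openshift-node.service | grep -A3 "Output of deleting"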

Comment 15 Meng Bo 2015-10-27 10:04:55 UTC
Created attachment 1086790 [details]
node_log_for_legacy_service

Hi Dan,

I can reproduce the problem without the `oadm pod-network join-projects` step, but I still don't know the exact steps to reproduce it.

The attached log does not seem to contain everything it should; some of the messages never made it into the journal. The rules for the service with IP 172.30.231.118 failed to be deleted from OVS, but there is no "Output of deleting" log entry for that IP.
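
The stale rule can still be seen in the dump, e.g.:

# ovs-ofctl -O OpenFlow13 dump-flows br0 table=4 | grep 172.30.231.118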

I think the issue in comment#13 is a race-condition-related problem.
What do you think about focusing here only on the basic functionality of deleting openflow rules? I will file the race issue as a separate bug.

Comment 16 Dan Winship 2015-10-27 14:17:24 UTC
(In reply to Meng Bo from comment #15)
> The attached log does not seem to contain everything it should; some of the
> messages never made it into the journal. The rules for the service with IP
> 172.30.231.118 failed to be deleted from OVS, but there is no "Output of
> deleting" log entry for that IP.

OK, so for some reason we're not even calling DelServiceOFRules() for that deletion...

> I think the issue in comment#13 is a race-condition-related problem.
> What do you think about focusing here only on the basic functionality of
> deleting openflow rules? I will file the race issue as a separate bug.

Sounds good.

Comment 17 Dan Winship 2015-10-27 18:29:18 UTC
other than the apparent race condition with deleting services, this should now be fixed in origin master.

Comment 18 Meng Bo 2015-10-28 02:44:04 UTC
The issue has been fixed in openshift v1.0.6-997-gff3b522.

Moving the bug to verified.