Bug 1453190
| Summary: | Isolate the network still can be accessed for the project which already make network global | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | zhaozhanqi <zzhao> |
| Component: | Networking | Assignee: | Ravi Sankar <rpenta> |
| Status: | CLOSED ERRATA | QA Contact: | Meng Bo <bmeng> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 3.6.0 | CC: | aos-bugs, bbennett, dcbw, eparis, xtian |
| Target Milestone: | --- | Keywords: | Reopened |
| Target Release: | 3.7.0 | ||
| Hardware: | All | ||
| OS: | All | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: |
Cause: The pod update operation failed after the k8s rebase.
Consequence: None of the oadm pod-network operations (join-projects, isolate-projects, etc.) that depend on the pod network update operation worked.
Fix: Fix pod update by fetching the needed info from the k8s CRI directly.
Result: The pod update operation is fixed and the oadm pod-network operations work as expected.
|
Story Points: | --- |
| Clone Of: | Environment: | ||
| Last Closed: | 2017-11-28 21:55:46 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
|
Description
zhaozhanqi
2017-05-22 11:13:43 UTC
more info:

# oc get pod -o wide -n z1
NAME READY STATUS RESTARTS AGE IP NODE
test-rc-472j3 1/1 Running 0 23m 10.128.0.13 host-8-175-238.host.centralci.eng.rdu2.redhat.com
test-rc-bkzwr 1/1 Running 0 23m 10.129.0.27 host-8-175-18.host.centralci.eng.rdu2.redhat.com

[root@host-8-175-238 ~]# oc get pod -o wide -n z2
NAME READY STATUS RESTARTS AGE IP NODE
test-rc-lmlwm 1/1 Running 0 24m 10.129.0.26 host-8-175-18.host.centralci.eng.rdu2.redhat.com
test-rc-qd6hk 1/1 Running 0 24m 10.128.0.12 host-8-175-238.host.centralci.eng.rdu2.redhat.com

# ovs-ofctl -O openflow13 dump-flows br0
OFPST_FLOW reply (OF1.3) (xid=0x2):
cookie=0x0, duration=16322.096s, table=0, n_packets=0, n_bytes=0, priority=250,ip,in_port=2,nw_dst=224.0.0.0/4 actions=drop
cookie=0x0, duration=16322.124s, table=0, n_packets=5, n_bytes=210, priority=200,arp,in_port=1,arp_spa=10.128.0.0/14,arp_tpa=10.128.0.0/23 actions=move:NXM_NX_TUN_ID[0..31]->NXM_NX_REG0[],goto_table:10
cookie=0x0, duration=16322.111s, table=0, n_packets=6, n_bytes=588, priority=200,ip,in_port=1,nw_src=10.128.0.0/14,nw_dst=10.128.0.0/23 actions=move:NXM_NX_TUN_ID[0..31]->NXM_NX_REG0[],goto_table:10
cookie=0x0, duration=16322.106s, table=0, n_packets=0, n_bytes=0, priority=200,ip,in_port=1,nw_src=10.128.0.0/14,nw_dst=224.0.0.0/4 actions=move:NXM_NX_TUN_ID[0..31]->NXM_NX_REG0[],goto_table:10
cookie=0x0, duration=16322.092s, table=0, n_packets=11, n_bytes=462, priority=200,arp,in_port=2,arp_spa=10.128.0.1,arp_tpa=10.128.0.0/14 actions=goto_table:30
cookie=0x0, duration=16322.089s, table=0, n_packets=20, n_bytes=1786, priority=200,ip,in_port=2 actions=goto_table:30
cookie=0x0, duration=16322.099s, table=0, n_packets=0, n_bytes=0, priority=150,in_port=1 actions=drop
cookie=0x0, duration=16322.085s, table=0, n_packets=8, n_bytes=648, priority=150,in_port=2 actions=drop
cookie=0x0, duration=16322.081s, table=0, n_packets=44, n_bytes=1848, priority=100,arp actions=goto_table:20
cookie=0x0, duration=16322.076s, table=0, n_packets=143, n_bytes=12668, priority=100,ip actions=goto_table:20
cookie=0x0, duration=16322.072s, table=0, n_packets=84, n_bytes=6696, priority=0 actions=drop
cookie=0x0, duration=16321.953s, table=10, n_packets=11, n_bytes=798, priority=100,tun_src=10.8.175.18 actions=goto_table:30
cookie=0x0, duration=16322.066s, table=10, n_packets=0, n_bytes=0, priority=0 actions=drop
cookie=0x0, duration=6092.935s, table=20, n_packets=2, n_bytes=84, priority=100,arp,in_port=6,arp_spa=10.128.0.5,arp_sha=fe:95:a3:dc:9c:5a actions=load:0xb6162b->NXM_NX_REG0[],goto_table:21
cookie=0x0, duration=1564.390s, table=20, n_packets=6, n_bytes=252, priority=100,arp,in_port=13,arp_spa=10.128.0.12,arp_sha=de:47:de:7e:52:a0 actions=load:0->NXM_NX_REG0[],goto_table:21
cookie=0x0, duration=1556.284s, table=20, n_packets=8, n_bytes=336, priority=100,arp,in_port=14,arp_spa=10.128.0.13,arp_sha=76:2e:4a:7a:81:71 actions=load:0xff6aac->NXM_NX_REG0[],goto_table:21
cookie=0x0, duration=6092.929s, table=20, n_packets=7, n_bytes=607, priority=100,ip,in_port=6,nw_src=10.128.0.5 actions=load:0xb6162b->NXM_NX_REG0[],goto_table:21
cookie=0x0, duration=1564.386s, table=20, n_packets=26, n_bytes=2632, priority=100,ip,in_port=13,nw_src=10.128.0.12 actions=load:0->NXM_NX_REG0[],goto_table:21
cookie=0x0, duration=1556.280s, table=20, n_packets=40, n_bytes=3296, priority=100,ip,in_port=14,nw_src=10.128.0.13 actions=load:0xff6aac->NXM_NX_REG0[],goto_table:21
cookie=0x0, duration=16322.062s, table=20, n_packets=0, n_bytes=0, priority=0 actions=drop
cookie=0x0, duration=16322.059s, table=21, n_packets=187, n_bytes=14516, priority=0 actions=goto_table:30
cookie=0x0, duration=16322.054s, table=30, n_packets=11, n_bytes=462, priority=300,arp,arp_tpa=10.128.0.1 actions=output:2
cookie=0x0, duration=16322.040s, table=30, n_packets=8, n_bytes=812, priority=300,ip,nw_dst=10.128.0.1 actions=output:2
cookie=0x0, duration=16322.051s, table=30, n_packets=44, n_bytes=1848, priority=200,arp,arp_tpa=10.128.0.0/23 actions=goto_table:40
cookie=0x0, duration=16322.032s, table=30, n_packets=130, n_bytes=11665, priority=200,ip,nw_dst=10.128.0.0/23 actions=goto_table:70
cookie=0x0, duration=16322.047s, table=30, n_packets=5, n_bytes=210, priority=100,arp,arp_tpa=10.128.0.0/14 actions=goto_table:50
cookie=0x0, duration=16322.027s, table=30, n_packets=4, n_bytes=392, priority=100,ip,nw_dst=10.128.0.0/14 actions=goto_table:90
cookie=0x0, duration=16322.036s, table=30, n_packets=20, n_bytes=1566, priority=100,ip,nw_dst=172.30.0.0/16 actions=goto_table:60
cookie=0x0, duration=16322.024s, table=30, n_packets=0, n_bytes=0, priority=50,ip,in_port=1,nw_dst=224.0.0.0/4 actions=goto_table:120
cookie=0x0, duration=16322.020s, table=30, n_packets=0, n_bytes=0, priority=25,ip,nw_dst=224.0.0.0/4 actions=goto_table:110
cookie=0x0, duration=16322.017s, table=30, n_packets=7, n_bytes=607, priority=0,ip actions=goto_table:100
cookie=0x0, duration=16322.014s, table=30, n_packets=0, n_bytes=0, priority=0,arp actions=drop
cookie=0x0, duration=6092.924s, table=40, n_packets=2, n_bytes=84, priority=100,arp,arp_tpa=10.128.0.5 actions=output:6
cookie=0x0, duration=1564.382s, table=40, n_packets=6, n_bytes=252, priority=100,arp,arp_tpa=10.128.0.12 actions=output:13
cookie=0x0, duration=1556.277s, table=40, n_packets=8, n_bytes=336, priority=100,arp,arp_tpa=10.128.0.13 actions=output:14
cookie=0x0, duration=16322.011s, table=40, n_packets=0, n_bytes=0, priority=0 actions=drop
cookie=0x0, duration=16321.944s, table=50, n_packets=5, n_bytes=210, priority=100,arp,arp_tpa=10.129.0.0/23 actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:10.8.175.18->tun_dst,output:1
cookie=0x0, duration=16322.008s, table=50, n_packets=0, n_bytes=0, priority=0 actions=drop
cookie=0x0, duration=16322.005s, table=60, n_packets=12, n_bytes=974, priority=200,reg0=0 actions=output:2
cookie=0x0, duration=16321.935s, table=60, n_packets=0, n_bytes=0, priority=100,ip,nw_dst=172.30.0.1,nw_frag=later actions=load:0->NXM_NX_REG1[],load:0x2->NXM_NX_REG2[],goto_table:80
cookie=0x0, duration=16102.179s, table=60, n_packets=0, n_bytes=0, priority=100,ip,nw_dst=172.30.35.20,nw_frag=later actions=load:0->NXM_NX_REG1[],load:0x2->NXM_NX_REG2[],goto_table:80
cookie=0x0, duration=16061.011s, table=60, n_packets=0, n_bytes=0, priority=100,ip,nw_dst=172.30.128.182,nw_frag=later actions=load:0->NXM_NX_REG1[],load:0x2->NXM_NX_REG2[],goto_table:80
cookie=0x0, duration=16032.719s, table=60, n_packets=0, n_bytes=0, priority=100,ip,nw_dst=172.30.226.249,nw_frag=later actions=load:0->NXM_NX_REG1[],load:0x2->NXM_NX_REG2[],goto_table:80
cookie=0x0, duration=15973.833s, table=60, n_packets=0, n_bytes=0, priority=100,ip,nw_dst=172.30.4.16,nw_frag=later actions=load:0xa66fde->NXM_NX_REG1[],load:0x2->NXM_NX_REG2[],goto_table:80
cookie=0x0, duration=15973.723s, table=60, n_packets=0, n_bytes=0, priority=100,ip,nw_dst=172.30.121.81,nw_frag=later actions=load:0xa66fde->NXM_NX_REG1[],load:0x2->NXM_NX_REG2[],goto_table:80
cookie=0x0, duration=1556.498s, table=60, n_packets=0, n_bytes=0, priority=100,ip,nw_dst=172.30.19.253,nw_frag=later actions=load:0xff6aac->NXM_NX_REG1[],load:0x2->NXM_NX_REG2[],goto_table:80
cookie=0x0, duration=1403.432s, table=60, n_packets=0, n_bytes=0, priority=100,ip,nw_dst=172.30.84.242,nw_frag=later actions=load:0x752461->NXM_NX_REG1[],load:0x2->NXM_NX_REG2[],goto_table:80
cookie=0x0, duration=16321.930s, table=60, n_packets=0, n_bytes=0, priority=100,tcp,nw_dst=172.30.0.1,tp_dst=443 actions=load:0->NXM_NX_REG1[],load:0x2->NXM_NX_REG2[],goto_table:80
cookie=0x0, duration=16321.926s, table=60, n_packets=0, n_bytes=0, priority=100,udp,nw_dst=172.30.0.1,tp_dst=53 actions=load:0->NXM_NX_REG1[],load:0x2->NXM_NX_REG2[],goto_table:80
cookie=0x0, duration=16321.923s, table=60, n_packets=0, n_bytes=0, priority=100,tcp,nw_dst=172.30.0.1,tp_dst=53 actions=load:0->NXM_NX_REG1[],load:0x2->NXM_NX_REG2[],goto_table:80
cookie=0x0, duration=16102.172s, table=60, n_packets=0, n_bytes=0, priority=100,tcp,nw_dst=172.30.35.20,tp_dst=80 actions=load:0->NXM_NX_REG1[],load:0x2->NXM_NX_REG2[],goto_table:80
cookie=0x0, duration=16102.159s, table=60, n_packets=0, n_bytes=0, priority=100,tcp,nw_dst=172.30.35.20,tp_dst=443 actions=load:0->NXM_NX_REG1[],load:0x2->NXM_NX_REG2[],goto_table:80
cookie=0x0, duration=16102.145s, table=60, n_packets=0, n_bytes=0, priority=100,tcp,nw_dst=172.30.35.20,tp_dst=1935 actions=load:0->NXM_NX_REG1[],load:0x2->NXM_NX_REG2[],goto_table:80
cookie=0x0, duration=16102.139s, table=60, n_packets=0, n_bytes=0, priority=100,tcp,nw_dst=172.30.35.20,tp_dst=1936 actions=load:0->NXM_NX_REG1[],load:0x2->NXM_NX_REG2[],goto_table:80
cookie=0x0, duration=16060.986s, table=60, n_packets=0, n_bytes=0, priority=100,tcp,nw_dst=172.30.128.182,tp_dst=5000 actions=load:0->NXM_NX_REG1[],load:0x2->NXM_NX_REG2[],goto_table:80
cookie=0x0, duration=16032.711s, table=60, n_packets=0, n_bytes=0, priority=100,tcp,nw_dst=172.30.226.249,tp_dst=9000 actions=load:0->NXM_NX_REG1[],load:0x2->NXM_NX_REG2[],goto_table:80
cookie=0x0, duration=15973.826s, table=60, n_packets=0, n_bytes=0, priority=100,tcp,nw_dst=172.30.4.16,tp_dst=8080 actions=load:0xa66fde->NXM_NX_REG1[],load:0x2->NXM_NX_REG2[],goto_table:80
cookie=0x0, duration=15973.700s, table=60, n_packets=0, n_bytes=0, priority=100,tcp,nw_dst=172.30.121.81,tp_dst=3306 actions=load:0xa66fde->NXM_NX_REG1[],load:0x2->NXM_NX_REG2[],goto_table:80
cookie=0x0, duration=1556.488s, table=60, n_packets=0, n_bytes=0, priority=100,tcp,nw_dst=172.30.19.253,tp_dst=27017 actions=load:0xff6aac->NXM_NX_REG1[],load:0x2->NXM_NX_REG2[],goto_table:80
cookie=0x0, duration=1403.428s, table=60, n_packets=0, n_bytes=0, priority=100,tcp,nw_dst=172.30.84.242,tp_dst=27017 actions=load:0x752461->NXM_NX_REG1[],load:0x2->NXM_NX_REG2[],goto_table:80
cookie=0x0, duration=16322.003s, table=60, n_packets=0, n_bytes=0, priority=0 actions=drop
cookie=0x0, duration=6092.919s, table=70, n_packets=0, n_bytes=0, priority=100,ip,nw_dst=10.128.0.5 actions=load:0xb6162b->NXM_NX_REG1[],load:0x6->NXM_NX_REG2[],goto_table:80
cookie=0x0, duration=1564.379s, table=70, n_packets=38, n_bytes=3100, priority=100,ip,nw_dst=10.128.0.12 actions=load:0->NXM_NX_REG1[],load:0xd->NXM_NX_REG2[],goto_table:80
cookie=0x0, duration=1556.273s, table=70, n_packets=28, n_bytes=2828, priority=100,ip,nw_dst=10.128.0.13 actions=load:0xff6aac->NXM_NX_REG1[],load:0xe->NXM_NX_REG2[],goto_table:80
cookie=0x0, duration=16322s, table=70, n_packets=0, n_bytes=0, priority=0 actions=drop
cookie=0x0, duration=16321.997s, table=80, n_packets=12, n_bytes=974, priority=300,ip,nw_src=10.128.0.1 actions=output:NXM_NX_REG2[]
cookie=0x0, duration=16321.953s, table=80, n_packets=52, n_bytes=5264, priority=200,reg0=0 actions=output:NXM_NX_REG2[]
cookie=0x0, duration=16321.948s, table=80, n_packets=62, n_bytes=5035, priority=200,reg1=0 actions=output:NXM_NX_REG2[]
cookie=0x0, duration=15973.809s, table=80, n_packets=0, n_bytes=0, priority=100,reg0=0xa66fde,reg1=0xa66fde actions=output:NXM_NX_REG2[]
cookie=0x0, duration=6092.907s, table=80, n_packets=0, n_bytes=0, priority=100,reg0=0xb6162b,reg1=0xb6162b actions=output:NXM_NX_REG2[]
cookie=0x0, duration=5232.620s, table=80, n_packets=0, n_bytes=0, priority=100,reg0=0x8a081,reg1=0x8a081 actions=output:NXM_NX_REG2[]
cookie=0x0, duration=1556.484s, table=80, n_packets=0, n_bytes=0, priority=100,reg0=0xff6aac,reg1=0xff6aac actions=output:NXM_NX_REG2[]
cookie=0x0, duration=1403.419s, table=80, n_packets=0, n_bytes=0, priority=100,reg0=0x752461,reg1=0x752461 actions=output:NXM_NX_REG2[]
cookie=0x0, duration=16321.993s, table=80, n_packets=10, n_bytes=788, priority=0 actions=drop
cookie=0x0, duration=16321.940s, table=90, n_packets=4, n_bytes=392, priority=100,ip,nw_dst=10.129.0.0/23 actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:10.8.175.18->tun_dst,output:1
cookie=0x0, duration=16321.990s, table=90, n_packets=0, n_bytes=0, priority=0 actions=drop
cookie=0x0, duration=571.527s, table=100, n_packets=0, n_bytes=0, priority=2,ip,reg0=0xb6162b,nw_dst=103.235.46.39 actions=output:2
cookie=0x0, duration=571.523s, table=100, n_packets=0, n_bytes=0, priority=1,ip,reg0=0xb6162b actions=drop
cookie=0x0, duration=16321.987s, table=100, n_packets=0, n_bytes=0, priority=0 actions=output:2
cookie=0x0, duration=16321.984s, table=110, n_packets=0, n_bytes=0, priority=0 actions=drop
cookie=0x0, duration=571.550s, table=111, n_packets=0, n_bytes=0, priority=100 actions=move:NXM_NX_REG0[]->NXM_NX_TUN_ID[0..31],set_field:10.8.175.18->tun_dst,output:1,goto_table:120
cookie=0x0, duration=16321.978s, table=120, n_packets=0, n_bytes=0, priority=0 actions=drop
cookie=0x0, duration=16321.975s, table=253, n_packets=0, n_bytes=0, actions=note:01.03.00.00.00.00

[root@host-8-175-238 ~]# ovs-ofctl -O openflow13 dump-flows br0 | grep 10.128.0.12
cookie=0x0, duration=1611.183s, table=20, n_packets=6, n_bytes=252, priority=100,arp,in_port=13,arp_spa=10.128.0.12,arp_sha=de:47:de:7e:52:a0 actions=load:0->NXM_NX_REG0[],goto_table:21
cookie=0x0, duration=1611.179s, table=20, n_packets=26, n_bytes=2632, priority=100,ip,in_port=13,nw_src=10.128.0.12 actions=load:0->NXM_NX_REG0[],goto_table:21
cookie=0x0, duration=1611.175s, table=40, n_packets=6, n_bytes=252, priority=100,arp,arp_tpa=10.128.0.12 actions=output:13
cookie=0x0, duration=1611.172s, table=70, n_packets=38, n_bytes=3100, priority=100,ip,nw_dst=10.128.0.12 actions=load:0->NXM_NX_REG1[],load:0xd->NXM_NX_REG2[],goto_table:80

This issue is not specific to pod-network isolate-projects. The pod network Join/Isolate/MakeGlobal operations are all affected, as is anything else that involves updating the netID for a pod. The issue was introduced a couple of weeks ago when we bumped kubernetes to 1.6.1 (https://github.com/openshift/origin/commit/e8d67d3ea07a726c1bbd34d5c0a86872c54567de).

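For readers comparing netIDs in the dump above, the following is a minimal Go sketch that pulls the pod IP and the value loaded into REG0 out of a table=20 IP flow line. The function name `podVNID` and the regular expression are illustrative assumptions, not part of openshift/origin; the sketch only assumes the `ovs-ofctl dump-flows` line format shown in this report.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// vnidRe captures the pod source IP and the value loaded into REG0
// from a table=20 IP flow line as printed in the dump above.
// (Hypothetical helper, written only for this illustration.)
var vnidRe = regexp.MustCompile(
	`table=20,.*nw_src=([0-9.]+) actions=load:(0x[0-9a-f]+|[0-9]+)->NXM_NX_REG0\[\]`)

// podVNID returns the pod IP and the netID (VNID) tagged onto its traffic.
// ok is false when the line is not a table=20 IP flow of the expected shape.
func podVNID(flow string) (ip string, vnid uint32, ok bool) {
	m := vnidRe.FindStringSubmatch(flow)
	if m == nil {
		return "", 0, false
	}
	// Base 0 lets ParseUint accept both "0" and "0xff6aac".
	v, err := strconv.ParseUint(m[2], 0, 32)
	if err != nil {
		return "", 0, false
	}
	return m[1], uint32(v), true
}

func main() {
	// Two flow lines copied from the dump in this report.
	flows := []string{
		"cookie=0x0, duration=1564.386s, table=20, n_packets=26, n_bytes=2632, priority=100,ip,in_port=13,nw_src=10.128.0.12 actions=load:0->NXM_NX_REG0[],goto_table:21",
		"cookie=0x0, duration=1556.280s, table=20, n_packets=40, n_bytes=3296, priority=100,ip,in_port=14,nw_src=10.128.0.13 actions=load:0xff6aac->NXM_NX_REG0[],goto_table:21",
	}
	for _, f := range flows {
		if ip, vnid, ok := podVNID(f); ok {
			fmt.Printf("pod %s -> VNID %#x\n", ip, vnid)
		}
	}
}
```

Run against the two lines above, this reports VNID 0 for 10.128.0.12 and 0xff6aac for 10.128.0.13, which matches what the grep output shows by eye.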
Problem Control Flow:
- Issue NetNamespace update through pod-network join-projects/isolate-projects/make-projects-global
- NetNamespace resource gets updated with new netID
- On nodes, watchNetNamespaces gets this event and calls UpdateNetNamespace() -> updatePodNetwork [multitenant.go] -> UpdatePod() -> handleCNIRequest() [node.go] -> podManager.update() -> getContainerNetnsPath() [pod_linux.go]
  Fails at: runtime, ok := m.host.GetRuntime().(*dockertools.DockerManager)
- In kube 1.6.1, kubecfg.EnableCRI is set to true by default and that changes the kubelet runtime to use KubeGenericRuntimeManager() instead of DockerManager() [kubelet.go]
- Typecasting runtime to generic manager instead of docker manager will not help as GetNetNS() is not supported in kube manager [kuberuntime_manager.go]

There's a trello card for fixing update btw: https://trello.com/c/IP58IhfF/504-fix-pod-update-after-kube-1-6-rebase

*** This bug has been marked as a duplicate of bug 1453113 ***

Hi Ravi,
I checked bug 1453113 and, from its content at least, I do not think this is a duplicate, since this bug affects all the networking Join/Isolate/MakeGlobal functions. So I'd like to reopen this bug and track it here so that it does not get missed. Thanks.

Should be fixed by https://github.com/openshift/origin/pull/14446

It seems this PR did not fix this issue. See the comment in the card: https://trello.com/c/IP58IhfF/504-fix-pod-update-after-kube-16-rebase

Tested it again today and this time UpdatePod failed with error 'failed to find pod details from OVS flows'. This is an issue in https://github.com/openshift/origin/pull/14446

Problem summary: When we add table=20 OVS flows with a note generated from ContainerID, we are actually using the pod SandboxID (origin-pod or pause container ID), which is good as it holds the network namespace. But when the pod Update op is performed, we search for flows that match the pod ContainerID, not the SandboxID. So no OVS flows match and the pod update op fails.

Potential fix: Pass SandboxID for the update op. If this requires docker inspect or something similar, maybe we can use the pod UID as the flow note, which should be unique?

@Dan, do you foresee any similar issues with the pod delete/teardown operation?

Commit pushed to master at https://github.com/openshift/origin

https://github.com/openshift/origin/commit/3ef342d42a30f4b8f4cb55dc67b3e93be9daecd0
Bug 1453190 - Fix pod update operation

Use pod sandbox ID to update the pod as opposed to pod container ID. OVS flow note identified by sandbox ID is desired as network namespace is held by the pod sandbox and pod could have many containers and single container ID may not represent all the pod ovs flows.

Since we can't use kubelet 'Host' (explanation refer commit: f1118459), we use runtime shim endpoint to connect to runtime service using gRPC. This is the same mechanism used by kubelet(GenericKubeletRuntimeManager) to talk to runtime service(docker/rkt).

Verified this bug on v3.6.133. When the netnamespace is changed, the related pod OpenFlow rules are also updated.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3188 |