Bug 1454948
| Summary: | pod-to-pod connectivity lost after rescaling with ovs-multitenant | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Ruben Romero Montes <rromerom> |
| Component: | Networking | Assignee: | Ben Bennett <bbennett> |
| Status: | CLOSED ERRATA | QA Contact: | Meng Bo <bmeng> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | | |
| Version: | 3.5.0 | CC: | akaiser, aos-bugs, bbennett, danw, dcbw, eparis, gsapienz, javier.ramirez, jkaur, mark.vinkx, mifiedle, misalunk, nbhatt, pdwyer, rhowe, sjr, smunilla, tcarlin, tibrahim, tmanor, vcorrea, wabouham, weliang |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | aos-scalability-36 | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | Cause: VNID allow rules were being removed from OVS before they were actually unused; container startup errors could cause the VNID reference tracking to get out of sync. | | |
| | Consequence: The rules that allowed communication for a namespace were removed too early, so pods from that namespace still running on the node could not communicate with one another. | | |
| | Fix: The tracking was reworked to avoid the edge cases around pod creation and deletion failures. | | |
| | Result: VNID tracking no longer fails, so traffic flows. | | |
| Story Points: | --- | | |
| Clone Of: | | | |
| : | 1462338 (view as bug list) | Environment: | |
| Last Closed: | 2017-08-10 05:25:32 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1267746, 1462338 | | |
| Attachments: | | | |
Description - Ruben Romero Montes - 2017-05-23 20:49:43 UTC
As I said in the problem description, I managed to reproduce it the first time but not the second. The "connection timeout" in step 6 was taken from the initial reproducer with the "dancer" and "nodejs" applications, as was the output of the iptables, tcpdump, ip neigh and ip route commands.

Created attachment 1283466 [details]
iptables 10.254.185.49
Created attachment 1283467 [details]
iptables 10.254.250.55
Created attachment 1283468 [details]
oadm diagnostics
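The exact commands behind the attachments above are not recorded in this report; something along the following lines would gather comparable per-node data (iptables rules, routing and neighbor tables, a short VXLAN capture, and "oadm diagnostics"). This is only a sketch: the interface name and output paths are placeholders, and it assumes it is run as root on the node itself.

```bash
#!/bin/bash
# Sketch only: gather per-node networking state similar to the attachments
# referenced above. CAPTURE_IF is a placeholder for the node's uplink interface.
CAPTURE_IF="eth0"
OUT_DIR="/tmp/sdn-diag-$(hostname)"
mkdir -p "$OUT_DIR"

iptables-save > "$OUT_DIR/iptables.txt"
ip route      > "$OUT_DIR/ip-route.txt"
ip neigh      > "$OUT_DIR/ip-neigh.txt"

# OpenShift SDN traffic between nodes is VXLAN-encapsulated on UDP 4789;
# capture 30 seconds of it while reproducing the failing curl.
timeout 30 tcpdump -i "$CAPTURE_IF" -nn -w "$OUT_DIR/vxlan.pcap" udp port 4789 || true

# Cluster-level health check, as in the "oadm diagnostics" attachment.
oadm diagnostics > "$OUT_DIR/oadm-diagnostics.txt" 2>&1 || true
```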
Can reproduce this issue in my env. Below are the steps I used:

oc create -f https://raw.githubusercontent.com/weliang1/Openshift_Networking/master/OCP/deployment-with-pod.yaml
oc create -f https://raw.githubusercontent.com/openshift-qe/v3-testfiles/3b3859001d64e0a1aba78ff20646a2fc29078bf3/deployment/deployment-with-service.yaml

for pod in `oc get po | grep Running | grep hello-pod | awk '{print$1}'`; do for service in `oc get po -o wide | grep Running | grep openshift | awk '{print$6}'`; do echo "$pod to $service"; oc exec $pod -- curl -ILs http://$service:8080 ; done ; done

oc scale dc/hello-pod --replicas=5
oc scale dc/hello-openshift --replicas=5

for pod in `oc get po | grep Running | grep hello-pod | awk '{print$1}'`; do for service in `oc get po -o wide | grep Running | grep openshift | awk '{print$6}'`; do echo "$pod to $service"; oc exec $pod -- curl -ILs http://$service:8080 ; done ; done

oc scale dc/hello-pod --replicas=1
oc scale dc/hello-openshift --replicas=1

for pod in `oc get po | grep Running | grep hello-pod | awk '{print$1}'`; do for service in `oc get po -o wide | grep Running | grep openshift | awk '{print$6}'`; do echo "$pod to $service"; oc exec $pod -- curl -ILs http://$service:8080 ; done ; done

oc scale dc/hello-pod --replicas=5
oc scale dc/hello-openshift --replicas=5

for pod in `oc get po | grep Running | grep hello-pod | awk '{print$1}'`; do for service in `oc get po -o wide | grep Running | grep openshift | awk '{print$6}'`; do echo "$pod to $service"; oc exec $pod -- curl -ILs http://$service:8080 ; done ; done

oc rollout latest hello-openshift
oc rollout latest hello-pod

for pod in `oc get po | grep Running | grep hello-pod | awk '{print$1}'`; do for service in `oc get po -o wide | grep Running | grep openshift | awk '{print$6}'`; do echo "$pod to $service"; oc exec $pod -- curl -ILs http://$service:8080 ; done ; done

Even when I scale to a higher number, as below, I still cannot see the issue:

oc scale dc/hello-pod --replicas=20
oc scale dc/hello-openshift --replicas=20
oc rollout latest hello-openshift
oc rollout latest hello-pod

> Can reproduce this issue in my env: ...
> Even I scale high number as below still can not see the issue:

Did you mean to say "CAN'T reproduce this" in the first sentence?

Yes, I meant to say I CAN'T reproduce it in my env.

I can reproduce this pod connectivity issue in my env after running a checking script instead of testing manually. Reproduce steps:

oc create -f https://raw.githubusercontent.com/weliang1/Openshift_Networking/master/OCP/deployment-with-pod.yaml
oc create -f https://raw.githubusercontent.com/weliang1/Openshift_Networking/master/OCP/test.yaml
sleep 10
oc scale dc/hello-pod --replicas=5
oc scale dc/hello-openshift --replicas=5
sleep 20

for pod in `oc get po | grep Running | grep hello-pod | awk '{print$1}'`; do for service in `oc get po -o wide | grep Running | grep openshift | awk '{print$6}'`; do echo "$pod to $service"; oc exec $pod -- curl -ILs http://$service:8080 ; done ; done

while true
do
  for pod in `oc get po | grep Running | grep hello-pod | awk '{print$1}'`; do for service in `oc get po -o wide | grep Running | grep openshift | awk '{print$6}'`; do echo "$pod to $service"; oc exec $pod -- curl -ILs http://$service:8080 ; done ; done
  oc rollout latest hello-openshift
  oc rollout latest hello-pod
  sleep 35
done

My testing env: AWS, multitenant plugin, containerized, one master, two nodes. So far I cannot reproduce this issue when I use a NON-containerized env.
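The repeated one-line check from the reproduction steps above can be wrapped in a small script for repeated runs. This is only a convenience sketch derived from those commands; the pod name filters (hello-pod, openshift), the port 8080, and the 5-second timeout are taken from or added to that reproducer and would need adjusting for other workloads.

```bash
#!/bin/bash
# Sketch of a reusable pod-to-pod connectivity check, based on the one-line
# loop in the reproduction steps above. Assumes the "hello-pod" and
# "hello-openshift" deployments from the reproducer and HTTP on port 8080.
set -u

check_connectivity() {
    local failures=0
    # Source pods: every Running hello-pod pod.
    for pod in $(oc get po | grep Running | grep hello-pod | awk '{print $1}'); do
        # Target IPs: every Running hello-openshift pod IP (column 6 of -o wide).
        for ip in $(oc get po -o wide | grep Running | grep openshift | awk '{print $6}'); do
            if oc exec "$pod" -- curl -ILs --max-time 5 "http://$ip:8080" > /dev/null; then
                echo "OK   $pod -> $ip"
            else
                echo "FAIL $pod -> $ip"
                failures=$((failures + 1))
            fi
        done
    done
    return $failures
}

# Run the check once and report overall status.
if check_connectivity; then
    echo "all pod-to-pod checks passed"
else
    echo "some pod-to-pod checks failed"
fi
```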
Please note that the original issue was reported against a setup that is _not_ containerized. So, whatever the race condition is, it may happen more often when containerized, but that's not the root cause of the problem.

Created attachment 1286179 [details]
ovs-dump-pew05
Created attachment 1286180 [details]
ovs-dump-azuur05
I have attached the output of the following command, run on the two nodes:

# ovs-ofctl -O OpenFlow13 dump-flows br0

Source pod/node: uzl-rhel-apache-ipam-115-d58qf   2/2   Running   0   20m   10.1.17.177   osclu1-azuur-05.uz.kuleuven.ac.be
Target pod/node: uzl-rhel-perl-ipam-102-hbwhb     1/1   Running   0   20m   10.1.11.151   osclu1-pew-05.uz.kuleuven.ac.be

Node IPs:
osclu1-pew-05.uz.kuleuven.ac.be   = 10.254.185.49
osclu1-azuur-05.uz.kuleuven.ac.be = 10.254.250.55

As expected, pod-to-pod connectivity fails, but source-node-to-pod and target-node-to-pod connectivity works. Note that the connectivity is affected in both directions.

Based on those traces, the OVS state is wrong in the same way it was in Weibin's case. The VNID for the project that the pods are in is 0x39d500, and that VNID does not exist in table 80 of the ovs-dump-azuur05 dump.

*** Bug 1452225 has been marked as a duplicate of this bug. ***

Tested and verified with the "atomic-openshift-3.6.96-1.git.0.381dd63.el7" image.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1716
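As a footnote to the table 80 diagnosis above (VNID 0x39d500 missing from table 80 of br0), that check can be scripted roughly as follows. This is a hedged sketch, not an official diagnostic: it assumes the ovs-multitenant plugin, that it runs on the node itself with access to br0, and that the NETID column of `oc get netnamespaces` is the project's VNID.

```bash
#!/bin/bash
# Rough sketch: check whether a project's VNID still has flows in OVS table 80
# on this node (the rules whose premature removal is described in this bug).
# Assumptions: ovs-multitenant plugin, run on the node, "oc" logged in as a
# user allowed to read netnamespaces, and br0 as the SDN bridge.
set -eu

PROJECT="${1:?usage: $0 <project>}"

# Look up the project's VNID (NETID column of "oc get netnamespaces") and
# convert it to the hex form used in the OVS flow dumps, e.g. 0x39d500.
netid=$(oc get netnamespaces | awk -v p="$PROJECT" '$1 == p {print $2}')
if [ -z "$netid" ]; then
    echo "no netnamespace found for project $PROJECT" >&2
    exit 2
fi
hex_vnid=$(printf '0x%x' "$netid")
echo "project $PROJECT has VNID $netid ($hex_vnid)"

# Table 80 holds the multitenant isolation rules; per the Doc Text above, if the
# VNID's rules are gone while pods of the project still run on this node, those
# pods cannot talk to each other.
if ovs-ofctl -O OpenFlow13 dump-flows br0 table=80 | grep -q "$hex_vnid"; then
    echo "table 80 contains flows for $hex_vnid: OK"
else
    echo "table 80 has NO flows for $hex_vnid: pods of $PROJECT on this node are isolated from each other"
fi
```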