Bug 1454948 - pod-to-pod connectivity lost after rescaling with ovs-multitenant
Summary: pod-to-pod connectivity lost after rescaling with ovs-multitenant
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 3.5.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Ben Bennett
QA Contact: Meng Bo
URL:
Whiteboard: aos-scalability-36
Duplicates: 1452225 (view as bug list)
Depends On:
Blocks: 1267746 1462338
 
Reported: 2017-05-23 20:49 UTC by Ruben Romero Montes
Modified: 2017-09-04 06:56 UTC
CC List: 23 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: VNID allow rules were being removed while they were still in use; when containers hit startup errors, the per-namespace tracking could get out of sync. Consequence: The rules that allowed communication for a namespace were removed too early, so pods from that namespace still running on the node could no longer communicate with one another. Fix: The tracking was changed so that the edge cases around pod creation/deletion failures are avoided. Result: The VNID tracking no longer fails, so traffic flows.
Clone Of:
Clones: 1462338 (view as bug list)
Environment:
Last Closed: 2017-08-10 05:25:32 UTC
Target Upstream Version:
Embargoed:


Attachments
iptables 10.254.185.49 (120.05 KB, text/plain), 2017-05-30 13:23 UTC, Ruben Romero Montes
iptables 10.254.250.55 (120.05 KB, text/plain), 2017-05-30 13:24 UTC, Ruben Romero Montes
oadm diagnostics (3.51 MB, application/octet-stream), 2017-05-30 13:25 UTC, Ruben Romero Montes
ovs-dump-pew05 (45.81 KB, text/plain), 2017-06-08 15:07 UTC, Ruben Romero Montes
ovs-dump-azuur05 (47.33 KB, text/plain), 2017-06-08 15:08 UTC, Ruben Romero Montes


Links
Origin (Github) 14092, 2017-06-02 18:32:11 UTC
Origin (Github) 14560, 2017-06-12 14:38:15 UTC
Red Hat Knowledge Base (Solution) 3077711, 2017-06-14 16:37:48 UTC
Red Hat Product Errata RHEA-2017:1716 (SHIPPED_LIVE): Red Hat OpenShift Container Platform 3.6 RPM Release Advisory, 2017-08-10 09:02:50 UTC

Description Ruben Romero Montes 2017-05-23 20:49:43 UTC
Description of problem:
Two applications (A and B) run in the same namespace. After scaling down B, pods from application A no longer have connectivity to B's pods. However, connectivity to the service endpoint still works, as does connectivity from outside the pods.

Version-Release number of selected component (if applicable):
openshift v3.5.5.15
kubernetes v1.5.2+43a9be4
networkPlugin: ovs-multitenant

How reproducible:
Sometimes

Steps to Reproduce:
1. oc new-project pod2pod
2. Create example Perl applications "panda" and "koala"
 $ oc new-app perl~https://github.com/openshift/dancer-ex.git --name=panda
 $ oc new-app perl~https://github.com/openshift/dancer-ex.git --name=koala
3. Label nodes to ensure each app is deployed on a different node
 $ oc label node infra-0.rromerorhsso.quicklab.pnq2.cee.redhat.com app=panda
 $ oc label node node-0.rromerorhsso.quicklab.pnq2.cee.redhat.com app=koala
4. Patch both deployment configs
 $ oc patch dc panda -p '{"spec": {"template": {"spec": {"nodeSelector": {"app": "panda"}}}}}'
 $ oc patch dc koala -p '{"spec": {"template": {"spec": {"nodeSelector": {"app": "koala"}}}}}'
5. Scale up koala and test connectivity
 $ oc scale dc/koala --replicas=3
 $ for panda in `oc get po | grep Running | grep panda  | awk '{print$1}'`; do for koala in `oc get po -o wide | grep Running | grep koala | awk '{print$6}'`; do echo "$panda to $koala"; oc exec $panda -- curl -ILs http://$koala:8080 ; done ; done
panda-2-6292g to 10.130.0.47
HTTP/1.1 200 OK
Date: Tue, 23 May 2017 20:25:51 GMT
Server: Apache/2.4.18 (Red Hat) mod_perl/2.0.9 Perl/v5.24.0
Content-Length: 42494
Content-Type: text/html; charset=UTF-8

panda-2-6292g to 10.130.0.50
HTTP/1.1 200 OK
Date: Tue, 23 May 2017 20:25:51 GMT
Server: Apache/2.4.18 (Red Hat) mod_perl/2.0.9 Perl/v5.24.0
Content-Length: 42494
Content-Type: text/html; charset=UTF-8

panda-2-6292g to 10.130.0.49
HTTP/1.1 200 OK
Date: Tue, 23 May 2017 20:25:52 GMT
Server: Apache/2.4.18 (Red Hat) mod_perl/2.0.9 Perl/v5.24.0
Content-Length: 42494
Content-Type: text/html; charset=UTF-8

6. Scale koala down to 1 replica and repeat the connectivity check (a consolidated check script is sketched after these steps)
 $ oc scale dc/koala --replicas=1
 $ for panda in `oc get po | grep Running | grep panda  | awk '{print$1}'`; do for koala in `oc get po -o wide | grep Running | grep koala | awk '{print$6}'`; do echo "$panda to $koala"; oc exec $panda -- curl -ILs http://$koala:8080 ; done ; done
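
For convenience, the connectivity check used in steps 5 and 6 can be wrapped in a small script. This is only a sketch; it assumes the same project, pod name prefixes ("panda"/"koala") and port 8080 as the reproducer above, and the --max-time flag is an addition to keep failed connections from hanging:

  for panda in $(oc get po | grep Running | grep panda | awk '{print $1}'); do
    for koala_ip in $(oc get po -o wide | grep Running | grep koala | awk '{print $6}'); do
      # Expect "HTTP/1.1 200 OK" in the response headers; anything else counts as a failure.
      if oc exec "$panda" -- curl -ILs --max-time 5 "http://$koala_ip:8080" | grep -q '200 OK'; then
        echo "OK   $panda -> $koala_ip"
      else
        echo "FAIL $panda -> $koala_ip"
      fi
    done
  done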

Actual results:
sh-4.2$ curl -IL http://10.129.0.40:8080
curl: (7) Failed connect to 10.129.0.40:8080; Connection timed out

Expected results:
sh-4.2$ curl -IL http://10.129.0.40:8080
HTTP/1.1 200 OK

Additional info:
I tried to replicate the problem in the same cluster and I couldn't. Moreover, after those attempts the initial project started working again, and I have not been able to reproduce the issue since.
Here is all the information I could retrieve.

[root@infra-0 ~]# docker inspect -f '{{.State.Pid}}' d493cfd23b27
122423
[root@infra-0 ~]# nsenter -n -t 122423
[root@infra-0 ~]# iptables-save > nodejs.iptables
[root@infra-0 ~]# tcpdump -i any -w nodejs.pcap
[root@infra-0 ~]# ip neigh 
10.130.0.41 dev eth0 lladdr 7e:24:b3:77:cc:47 STALE
10.130.0.42 dev eth0 lladdr e6:98:2f:43:5e:bf STALE
10.130.0.38 dev eth0 lladdr 0a:85:3c:d2:52:8e STALE
10.129.0.1 dev eth0 lladdr 42:4a:16:78:ed:aa REACHABLE
10.130.0.39 dev eth0 lladdr 8e:1c:95:72:6d:56 STALE
10.130.0.40 dev eth0 lladdr d2:e7:78:9e:8c:20 STALE
[root@infra-0 ~]# ip route
default via 10.129.0.1 dev eth0 
10.128.0.0/14 dev eth0 
10.129.0.0/23 dev eth0  proto kernel  scope link  src 10.129.0.40 
224.0.0.0/4 dev eth0 

[root@node-0 ~]# docker inspect -f '{{.State.Pid}}' 6c02fd0db3d4 
4605
[root@node-0 ~]# nsenter -n -t 4605
[root@node-0 ~]# iptables-save > dancer.iptables
[root@node-0 ~]# tcpdump -i any -w dancer.pcap
[root@node-0 ~]# ip neigh
10.130.0.42 dev eth0 lladdr e6:98:2f:43:5e:bf STALE
10.129.0.43 dev eth0 lladdr ae:2c:78:8c:c3:8a STALE
10.129.0.44 dev eth0 lladdr 0e:45:54:ab:19:e7 STALE
10.129.0.41 dev eth0 lladdr 42:4c:90:91:32:01 STALE
10.130.0.1 dev eth0 lladdr fa:ea:35:8d:8e:ba STALE
10.129.0.42 dev eth0 lladdr 56:1d:83:69:00:84 STALE
10.129.0.40 dev eth0 lladdr 4a:3d:9c:2f:60:a7 STALE
10.129.0.1 dev eth0 lladdr 42:4a:16:78:ed:aa STALE
[root@node-0 ~]# ip route
default via 10.130.0.1 dev eth0 
10.128.0.0/14 dev eth0 
10.130.0.0/23 dev eth0  proto kernel  scope link  src 10.130.0.38 
224.0.0.0/4 dev eth0
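
For reference, the capture procedure above can be condensed into the following sketch. The container ID is a placeholder taken from 'docker ps' on the node hosting the pod, and the output filenames are arbitrary:

  CID=<container-id>
  PID=$(docker inspect -f '{{.State.Pid}}' "$CID")
  nsenter -n -t "$PID" iptables-save > pod.iptables            # iptables state inside the pod netns
  nsenter -n -t "$PID" ip neigh show > pod.neigh               # ARP cache inside the pod netns
  nsenter -n -t "$PID" ip route show > pod.route               # routing table inside the pod netns
  nsenter -n -t "$PID" timeout 30 tcpdump -i any -w pod.pcap   # 30-second packet capture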

Comment 4 Ruben Romero Montes 2017-05-29 12:36:54 UTC
As I said in the problem description, I managed to reproduce it the first time but not the second.
The "connection timeout" in step 6 was taken from the initial reproducer with the "dancer" and "nodejs" applications, and so is the output of the iptables, tcpdump, ip neigh and ip route commands.

Comment 5 Ruben Romero Montes 2017-05-30 13:23:25 UTC
Created attachment 1283466 [details]
iptables 10.254.185.49

Comment 6 Ruben Romero Montes 2017-05-30 13:24:02 UTC
Created attachment 1283467 [details]
iptables 10.254.250.55

Comment 7 Ruben Romero Montes 2017-05-30 13:25:08 UTC
Created attachment 1283468 [details]
oadm diagnostics

Comment 8 Weibin Liang 2017-05-30 20:43:10 UTC
Can reproduce this issue in my env:

Below are the steps I used:

oc create -f https://raw.githubusercontent.com/weliang1/Openshift_Networking/master/OCP/deployment-with-pod.yaml

oc create -f https://raw.githubusercontent.com/openshift-qe/v3-testfiles/3b3859001d64e0a1aba78ff20646a2fc29078bf3/deployment/deployment-with-service.yaml

for pod in `oc get po | grep Running | grep hello-pod  | awk '{print$1}'`; do for service in `oc get po -o wide | grep Running | grep openshift | awk '{print$6}'`; do echo "$pod to $service"; oc exec $pod -- curl -ILs http://$service:8080 ; done ; done

oc scale dc/hello-pod --replicas=5
oc scale dc/hello-openshift --replicas=5

for pod in `oc get po | grep Running | grep hello-pod  | awk '{print$1}'`; do for service in `oc get po -o wide | grep Running | grep openshift | awk '{print$6}'`; do echo "$pod to $service"; oc exec $pod -- curl -ILs http://$service:8080 ; done ; done

oc scale dc/hello-pod --replicas=1
oc scale dc/hello-openshift --replicas=1

for pod in `oc get po | grep Running | grep hello-pod  | awk '{print$1}'`; do for service in `oc get po -o wide | grep Running | grep openshift | awk '{print$6}'`; do echo "$pod to $service"; oc exec $pod -- curl -ILs http://$service:8080 ; done ; done

oc scale dc/hello-pod --replicas=5
oc scale dc/hello-openshift --replicas=5

for pod in `oc get po | grep Running | grep hello-pod  | awk '{print$1}'`; do for service in `oc get po -o wide | grep Running | grep openshift | awk '{print$6}'`; do echo "$pod to $service"; oc exec $pod -- curl -ILs http://$service:8080 ; done ; done

oc rollout latest hello-openshift
oc rollout latest hello-pod

for pod in `oc get po | grep Running | grep hello-pod  | awk '{print$1}'`; do for service in `oc get po -o wide | grep Running | grep openshift | awk '{print$6}'`; do echo "$pod to $service"; oc exec $pod -- curl -ILs http://$service:8080 ; done ; done


Even when I scale to a high replica count as below, I still cannot see the issue:
oc scale dc/hello-pod --replicas=20
oc scale dc/hello-openshift --replicas=20
oc rollout latest hello-openshift
oc rollout latest hello-pod

Comment 9 Dan Winship 2017-05-30 20:54:27 UTC
> Can reproduce this issue in my env:
...
> Even when I scale to a high replica count as below, I still cannot see the issue:

Did you mean to say "CAN'T reproduce this" in the first sentence?

Comment 10 Weibin Liang 2017-05-31 12:17:44 UTC
Yes, I want to say I CAN'T reproduce it in my env.

Comment 12 Weibin Liang 2017-06-02 18:34:10 UTC
I reproduced this pod connectivity issue in my env after running a checking script instead of testing manually.

Comment 13 Weibin Liang 2017-06-02 20:09:17 UTC
Reproduce Steps:

oc create -f https://raw.githubusercontent.com/weliang1/Openshift_Networking/master/OCP/deployment-with-pod.yaml
oc create -f https://raw.githubusercontent.com/weliang1/Openshift_Networking/master/OCP/test.yaml
sleep 10

oc scale dc/hello-pod --replicas=5
oc scale dc/hello-openshift --replicas=5

sleep 20
for pod in `oc get po | grep Running | grep hello-pod  | awk '{print$1}'`; do for service in `oc get po -o wide | grep Running | grep openshift | awk '{print$6}'`; do echo "$pod to $service"; oc exec $pod -- curl -ILs http://$service:8080 ; done ; done

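# Loop forever: re-check pod-to-pod connectivity, then trigger new rollouts of both DCs and wait for them to settle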
while true
do
for pod in `oc get po | grep Running | grep hello-pod  | awk '{print$1}'`; do for service in `oc get po -o wide | grep Running | grep openshift | awk '{print$6}'`; do echo "$pod to $service"; oc exec $pod -- curl -ILs http://$service:8080 ; done ; done
oc rollout latest hello-openshift
oc rollout latest hello-pod
sleep 35
done

Comment 15 Weibin Liang 2017-06-07 18:46:44 UTC
My testing env: AWS, multitenant plugin, containerized, one master, two nodes.

So far I cannot reproduce this issue when I use a non-containerized env.

Comment 16 Ben Bennett 2017-06-08 14:01:40 UTC
Please note that the original issue was reported against a setup that is _not_ containerized.  So, whatever the race condition is, it may happen more often when containerized, but that's not the root cause of the problem.

Comment 17 Ruben Romero Montes 2017-06-08 15:07:30 UTC
Created attachment 1286179 [details]
ovs-dump-pew05

Comment 18 Ruben Romero Montes 2017-06-08 15:08:47 UTC
Created attachment 1286180 [details]
ovs-dump-azuur05

Comment 19 Ruben Romero Montes 2017-06-08 15:13:05 UTC
I have attached the output of the following command run on the two nodes:
  # ovs-ofctl -O OpenFlow13 dump-flows br0

Source pod/node
uzl-rhel-apache-ipam-115-d58qf   2/2       Running     0          20m       10.1.17.177   osclu1-azuur-05.uz.kuleuven.ac.be

Target pod/node
uzl-rhel-perl-ipam-102-hbwhb     1/1       Running     0          20m       10.1.11.151   osclu1-pew-05.uz.kuleuven.ac.be

Node IPs
osclu1-pew-05.uz.kuleuven.ac.be= 10.254.185.49
osclu1-azuur-05.uz.kuleuven.ac.be=10.254.250.55

As expected, pod-to-pod connectivity fails, but source-node-to-pod and target-node-to-pod connectivity works.

* Note that connectivity is affected in both directions.

Comment 20 Ben Bennett 2017-06-09 13:16:39 UTC
Based on those traces, the OVS flows are wrong in the same way they were in Weibin's case.

The VNID for the project that the pods are in is 0x39d500, and that does not exist in table 80 in the ovs-dump-azuur05 dump.
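
For anyone checking for the same state, the sketch below shows one way to verify whether a project's VNID allow rules are present on a node. The project name and NETID are placeholders; 0x39d500 is the value from this case:

  # On the master: look up the project's NETID (shown in decimal) and convert it to hex.
  oc get netnamespaces | grep <project-name>
  printf '0x%x\n' <NETID>
  # On the affected node: a healthy node has rules matching that VNID in OVS table 80.
  ovs-ofctl -O OpenFlow13 dump-flows br0 table=80 | grep 0x39d500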

Comment 23 Eric Paris 2017-06-14 21:46:50 UTC
https://github.com/openshift/origin/pull/14560

Comment 25 Ben Bennett 2017-06-20 18:42:43 UTC
*** Bug 1452225 has been marked as a duplicate of this bug. ***

Comment 26 Weibin Liang 2017-06-27 14:43:03 UTC
Tested and verified in "atomic-openshift-3.6.96-1.git.0.381dd63.el7" image

Comment 35 errata-xmlrpc 2017-08-10 05:25:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1716

