Bug 1659864 - Pod IP route is incorrect: missing default route [NEEDINFO]
Summary: Pod IP route is incorrect: missing default route
Keywords:
Status: CLOSED DUPLICATE of bug 1654044
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 3.11.0
Hardware: Unspecified
OS: Unspecified
Severity: high
Priority: high
Target Milestone: ---
Target Release: 3.11.z
Assignee: Phil Cameron
QA Contact: Meng Bo
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-12-17 04:35 UTC by Wang Haoran
Modified: 2019-04-15 16:32 UTC (History)
12 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-04-15 16:32:56 UTC
Target Upstream Version:
haowang: needinfo? (bmeng)


Attachments (Terms of Use)
sdn pod logs on the problem node (814.17 KB, text/plain)
2018-12-17 04:35 UTC, Wang Haoran
iptables on the problem node (66.70 KB, text/plain)
2018-12-17 04:37 UTC, Wang Haoran
openflow rules on the node (14.75 KB, text/plain)
2018-12-17 04:38 UTC, Wang Haoran

Description Wang Haoran 2018-12-17 04:35:40 UTC
Created attachment 1514964 [details]
sdn pod logs on the problem node

Description of problem:

We have pods whose in-pod routing table is incorrect, which causes network problems.

Version-Release number of selected component (if applicable):

oc v3.11.43
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO

network plugin: redhat/openshift-ovs-networkpolicy

Problem pod: there is no default route
sh-4.2$ ip route
10.123.172.0/23 dev eth0 proto kernel scope link src 10.123.172.12 

Good pod:
sh-4.2$ ip route
default via 10.123.168.1 dev eth0 
10.123.128.0/17 dev eth0 
10.123.168.0/23 dev eth0 proto kernel scope link src 10.123.168.19 
224.0.0.0/4 dev eth0 
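
A quick triage helper (my sketch, not from the report): since the symptom is simply a missing `default` entry, a pod's `ip route` output can be piped through a small check. The pod name and `oc` access in the usage comment are assumptions based on this report.

```shell
# Sketch: succeed iff an `ip route` dump on stdin contains a default
# route. Useful for scanning many pods for this symptom.
has_default_route() {
  grep -q '^default '   # reads stdin; matches lines like "default via ..."
}

# Illustrative usage against the problem pod named in this report:
#   oc exec oso-rhel7-zagg-web-1-whnsg -n default -- ip route \
#     | has_default_route || echo "missing default route"
```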

How reproducible:
sometimes

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Wang Haoran 2018-12-17 04:37:37 UTC
Created attachment 1514965 [details]
iptables on the problem node

Comment 2 Wang Haoran 2018-12-17 04:38:03 UTC
Created attachment 1514966 [details]
openflow rules on the node

Comment 4 Casey Callendrello 2018-12-17 18:58:50 UTC
Wait, that's strange - the problem pod seems to be in the wrong CIDR? Did you maybe change plugins?

Comment 5 Wang Haoran 2018-12-18 00:13:36 UTC
(In reply to Casey Callendrello from comment #4)
> Wait, that's strange - the problem pod seems to be in the wrong cidr? Did
> you maybe change plugins?

We didn't change the plugin, and the CIDR is correct; the two pods I listed are on different nodes.


#oc get pods -n default -o wide |grep zagg
oso-rhel7-zagg-web-1-tbz4p   1/1       Running   1          3d        10.123.168.19   ip-172-31-50-89.ec2.internal    <none>
oso-rhel7-zagg-web-1-whnsg   1/1       Running   1          3d        10.123.172.12   ip-172-31-48-241.ec2.internal   <none>

#oc get hostsubnet ip-172-31-48-241.ec2.internal ip-172-31-50-89.ec2.internal
NAME                            HOST                            HOST IP         SUBNET            EGRESS CIDRS   EGRESS IPS
ip-172-31-48-241.ec2.internal   ip-172-31-48-241.ec2.internal   172.31.48.241   10.123.172.0/23   []             []
NAME                            HOST                            HOST IP         SUBNET            EGRESS CIDRS   EGRESS IPS
ip-172-31-50-89.ec2.internal    ip-172-31-50-89.ec2.internal    172.31.50.89    10.123.168.0/23   []             []
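
As a sanity check of the point above (each pod's IP falls inside its own node's hostsubnet, so the CIDR assignments themselves are consistent), here is a small hedged bash helper for testing CIDR membership; the addresses in the example come from the output above, and the helper itself is not part of the report.

```shell
# Sketch: pure-bash CIDR membership test. Assumes dotted-quad IPs with
# no leading zeros (leading zeros would be parsed as octal by bash).
ip2int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

in_cidr() {
  local ip=$1 net=${2%/*} bits=${2#*/}
  local mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  [ $(( $(ip2int "$ip") & mask )) -eq $(( $(ip2int "$net") & mask )) ]
}

# e.g. in_cidr 10.123.172.12 10.123.172.0/23  -> exit status 0 (inside)
```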

Comment 6 Casey Callendrello 2019-01-03 22:37:39 UTC
Hi there,
Would it be possible to get:

- kubelet logs
- sdn logs
- and an "ip route" from inside the problem pod

All on the same node, and all around the time of problem pod creation?

Comment 12 wangzhida 2019-03-11 01:22:48 UTC
Any update?

Comment 30 Ryan Howe 2019-04-15 15:42:01 UTC
This is the same issue as the 3.10 bug 1654044, but this one is open for 3.11.

This should be resolved by: 
https://github.com/openshift/openshift-ansible/pull/11409

The fix is not yet available in the latest OpenShift Ansible package, openshift-ansible-3.11.98-1.git.0.3cfa7c3.el7.noarch.rpm.

Workaround:
On nodes run the following:
~~~
echo -e "r /etc/cni/net.d/80-openshift-network.conf\nr /etc/origin/openvswitch/conf.db"  > /usr/lib/tmpfiles.d/cleanup-cni.conf
~~~
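
For context (my reading of the workaround, not stated in the report): each `r <path>` line in a tmpfiles.d file asks systemd-tmpfiles(8) to remove that path at boot, so a reboot discards the stale CNI config and OVS database. A minimal sketch of the file the command above produces, written to a scratch path instead of /usr/lib/tmpfiles.d:

```shell
# Sketch: reproduce the workaround's tmpfiles.d content in a local file.
# The real workaround writes this to /usr/lib/tmpfiles.d/cleanup-cni.conf.
CONF=./cleanup-cni.conf
printf 'r %s\n' \
  /etc/cni/net.d/80-openshift-network.conf \
  /etc/origin/openvswitch/conf.db > "$CONF"
cat "$CONF"   # two "r <path>" removal entries
```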

Comment 32 Stephen Cuppett 2019-04-15 16:32:56 UTC
Marking as a duplicate so QE can verify based on the above comments.

*** This bug has been marked as a duplicate of bug 1654044 ***

