Created attachment 1514964 [details]
sdn pod logs on the problem node

Description of problem:
We have pods that do not have the correct IP routes inside the pod, which causes network problems.

Version-Release number of selected component (if applicable):
oc v3.11.43
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO
network plugin: redhat/openshift-ovs-networkpolicy

Problem pod: there is no default route
sh-4.2$ ip route
10.123.172.0/23 dev eth0 proto kernel scope link src 10.123.172.12

Good pod:
sh-4.2$ ip route
default via 10.123.168.1 dev eth0
10.123.128.0/17 dev eth0
10.123.168.0/23 dev eth0 proto kernel scope link src 10.123.168.19
224.0.0.0/4 dev eth0

How reproducible:
sometimes

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
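As a quick triage aid (not part of the original report), the broken state can be detected by checking the pod's routing table for a default route. This is a minimal sketch; the `has_default_route` helper name is my own, and it simply filters the output of `ip route` as shown in the good/problem pod examples above:

```shell
#!/bin/sh
# Hypothetical helper: reads an `ip route` dump on stdin and succeeds
# only if a default route is present. Usage on a live cluster would be
# something like:
#   oc exec <pod> -- ip route | has_default_route || echo "pod is broken"
has_default_route() {
    # A healthy pod's table starts with a line like:
    #   default via 10.123.168.1 dev eth0
    # The problem pod only has the link-scope subnet route.
    grep -q '^default ' 
}
```

The check deliberately matches only lines beginning with `default `, so link-scope routes such as `10.123.172.0/23 dev eth0 ...` (the only route in the problem pod) do not count.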
Created attachment 1514965 [details] iptables on the problem node
Created attachment 1514966 [details] openflow rules on the node
Wait, that's strange - the problem pod seems to be in the wrong cidr? Did you maybe change plugins?
(In reply to Casey Callendrello from comment #4)
> Wait, that's strange - the problem pod seems to be in the wrong cidr? Did
> you maybe change plugins?

We didn't change the plugin, and the CIDR is correct; the two pods I listed are on different nodes.

# oc get pods -n default -o wide | grep zagg
oso-rhel7-zagg-web-1-tbz4p   1/1   Running   1   3d   10.123.168.19   ip-172-31-50-89.ec2.internal    <none>
oso-rhel7-zagg-web-1-whnsg   1/1   Running   1   3d   10.123.172.12   ip-172-31-48-241.ec2.internal   <none>

# oc get hostsubnet ip-172-31-48-241.ec2.internal ip-172-31-50-89.ec2.internal
NAME                            HOST                            HOST IP         SUBNET            EGRESS CIDRS   EGRESS IPS
ip-172-31-48-241.ec2.internal   ip-172-31-48-241.ec2.internal   172.31.48.241   10.123.172.0/23   []             []
NAME                            HOST                            HOST IP         SUBNET            EGRESS CIDRS   EGRESS IPS
ip-172-31-50-89.ec2.internal    ip-172-31-50-89.ec2.internal    172.31.50.89    10.123.168.0/23   []             []
Hi there,

Would it be possible to get:
- kubelet logs
- sdn logs
- an "ip route" from inside the problem pod

All on the same node, and all from around the time of the problem pod's creation?
Any update?
This bug is the same as the 3.10 bug 1654044, but this one is open for 3.11.

This should be resolved by:
https://github.com/openshift/openshift-ansible/pull/11409

It is currently not available in the latest OpenShift Ansible package, openshift-ansible-3.11.98-1.git.0.3cfa7c3.el7.noarch.rpm.

Workaround: on the nodes, run the following:
~~~
echo -e "r /etc/cni/net.d/80-openshift-network.conf\nr /etc/origin/openvswitch/conf.db" > /usr/lib/tmpfiles.d/cleanup-cni.conf
~~~
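For reference, the workaround above writes a systemd tmpfiles.d(5) config whose `r` lines remove the stale CNI config and Open vSwitch database at boot, so the SDN pod regenerates them. The sketch below is an equivalent, `printf`-based version of that one-liner (avoiding the bash-specific `echo -e`), parameterized on a root prefix so it can be tried outside a real node; the `ROOT` variable is my addition, and on an actual node it would be `/`:

```shell
#!/bin/sh
# Sketch of the workaround, assuming ROOT=/ on a real node.
ROOT="${ROOT:-/tmp/cleanup-cni-demo}"
mkdir -p "$ROOT/usr/lib/tmpfiles.d"

# Each "r <path>" line tells systemd-tmpfiles to remove that file at boot:
# the stale CNI network config and the stale OVS database.
printf 'r /etc/cni/net.d/80-openshift-network.conf\nr /etc/origin/openvswitch/conf.db\n' \
    > "$ROOT/usr/lib/tmpfiles.d/cleanup-cni.conf"
```

After a reboot, systemd-tmpfiles processes the file and both stale artifacts are gone before the node services start.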
Marking duplicate so QE can verify based on above comments. *** This bug has been marked as a duplicate of bug 1654044 ***