Bug 1576798 - request to kubernetes api doesn't work on bond interfaces
Summary: request to kubernetes api doesn't work on bond interfaces
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 3.9.0
Hardware: Unspecified
OS: Unspecified
Severity: high
Priority: urgent
Target Milestone: ---
Target Release: 3.10.0
Assignee: Ben Bennett
QA Contact: Meng Bo
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-05-10 12:12 UTC by Vladislav Walek
Modified: 2018-05-16 18:07 UTC (History)
7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-05-16 18:07:59 UTC
Target Upstream Version:


Attachments

Description Vladislav Walek 2018-05-10 12:12:19 UTC
Description of problem:

The customer has a bond interface built from two network interfaces; the bond carries a single IP address (XXXX/24).
The kubernetes API at https://172.30.0.1 is unreachable from within pods (so the router and registry cannot be deployed); requests fail with a timeout:

error: couldn't get deployment router: Get https://172.30.0.1:443/api/v1/namespaces/default/replicationcontrollers/router: dial tcp 172.30.0.1:443: i/o timeout

Also reproducible against the master API directly (there is only one master in the cluster).
The error occurs in deploy pods on every node (tested on the master and one node).
The API request works when made directly from the node (i.e. not via the tun0 interface).
Endpoints are fine; both the k8s API and the master API work from the node directly.
Ping between the tun0 interfaces does not work.
tcpdump shows the connection leaving tun0, but nothing appears on the bond interface (only ARP requests).
The environment is disconnected from the internet.
There is no firewall between the nodes.
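One way to narrow this down is to ask the kernel which interface it would use for the service IP, and to capture on both interfaces while reproducing the timeout. A diagnostic sketch, assuming the service IP 172.30.0.1 from the error above and the tun0/bond0 interface names from this report:

```shell
# Ask the kernel which route/interface it would choose for the service IP;
# on a healthy OpenShift SDN node this should resolve via tun0, not bond0.
ip route get 172.30.0.1

# Capture on both interfaces while reproducing the timeout; in this report
# traffic appeared on tun0 but never on bond0 (only ARP requests):
#   tcpdump -ni tun0  host 172.30.0.1
#   tcpdump -ni bond0 port 443
```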

Version-Release number of selected component (if applicable):
OpenShift Container Platform 3.9 (latest)
Containerized
OVS multitenant/subnet (tested with both)

How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:
I will attach the logs in a private comment.
Possibly a wrong bond configuration.

Comment 4 Ben Bennett 2018-05-11 14:21:23 UTC
Weibin: can you try to reproduce this please?

Comment 9 Ben Bennett 2018-05-14 17:48:35 UTC
What are the endpoints for the kubernetes service:
  oc get ep -n default kubernetes

And can they curl one of the endpoints directly?

e.g.:
 $ oc get ep -n default kubernetes
NAME         ENDPOINTS                                         AGE
kubernetes   172.17.0.2:8053,172.17.0.2:8443,172.17.0.2:8053   2h

 $ curl -k https://172.17.0.2:8443
...

Comment 11 Weibin Liang 2018-05-15 14:36:24 UTC
(In reply to Ben Bennett from comment #4)
> Weibin: can you try to reproduce this please?

Ben,

Neither the Beijing nor the Westford OpenShift networking QE teams have the hardware and setup
to reproduce the above issue.

Comment 12 Martin Eggen 2018-05-16 08:26:17 UTC
The linked support case has been resolved by the customer.

Their hosts had been configured with routing entries in a separate routing table (table 1) in addition to the main table. The extra routes caused some traffic to be routed through the bond0 interface rather than tun0.

These routes are not shown by the normal "ip route list" used, for example, by our sdn-debug script.

Once this was discovered, the extra routing configuration was removed from their hosts, and deployment using the bond interface now runs as expected.

