1576798 – request to kubernetes api doesn't work on bond interfaces

Summary: request to kubernetes api doesn't work on bond interfaces

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Networking
Sub Component:
Version:	3.9.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	urgent
Target Milestone:	---
Target Release:	3.10.0
Assignee:	Ben Bennett
QA Contact:	Meng Bo
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2018-05-10 12:12 UTC by Vladislav Walek
Modified:	2018-05-16 18:07 UTC
CC List:	7 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2018-05-16 18:07:59 UTC
Target Upstream Version:
Embargoed:

Description Vladislav Walek 2018-05-10 12:12:19 UTC

Description of problem:

The customer has a bond interface built from 2 network interfaces; the bond interface has one IP address (XXXX/24).
The Kubernetes API at https://172.30.0.1 is not reachable from within a pod (so the router and registry can't be deployed); requests time out:

error: couldn't get deployment router: Get https://172.30.0.1:443/api/v1/namespaces/default/replicationcontrollers/router: dial tcp 172.30.0.1:443: i/o timeout

Also reproducible against the master API directly (only 1 master in the cluster).
The error occurs in every deploy pod on every node tested (the master and one node).
The API request works directly from the node (not using the tun0 interface).
Endpoints are OK - the k8s API and master API work from the node directly.
Ping between the tun0 interfaces doesn't work.
tcpdump shows the connection coming from tun0, but nothing appears on the bond interface (only ARP requests).
The environment is disconnected from the internet.
There is no firewall between the nodes.
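
One way to check which interface the kernel would choose for this traffic is `ip route get`. The sketch below is illustrative only: `route_dev` is a hypothetical helper name, and it assumes the usual iproute2 output format, in which the interface name follows the word `dev`.

```shell
#!/bin/sh
# route_dev: read `ip route get <address>` output on stdin and print the
# name of the interface the kernel selected (the word following "dev").
# Hypothetical helper, not part of OpenShift or iproute2.
route_dev() {
  awk '{
    for (i = 1; i < NF; i++)
      if ($i == "dev") { print $(i + 1); exit }
  }'
}

# Usage on a node (service IP taken from the error message above):
#   ip route get 172.30.0.1 | route_dev
```

If the printed interface is the bond rather than tun0, the node's routing configuration is a likely place to look next.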

Version-Release number of selected component (if applicable):
OpenShift Container Platform 3.9 (latest)
Containerized
OVS multitenant/subnet (tested with both)

Additional info:
I will attach the logs in a private comment.
This could possibly be caused by a wrong bond configuration.

Comment 4 Ben Bennett 2018-05-11 14:21:23 UTC

Weibin: can you try to reproduce this please?

Comment 9 Ben Bennett 2018-05-14 17:48:35 UTC

What are the endpoints for the kubernetes service:
  oc get ep -n default kubernetes

And can they curl one of the endpoints directly?

e.g.:
 $ oc get ep -n default kubernetes
NAME         ENDPOINTS                                         AGE
kubernetes   172.17.0.2:8053,172.17.0.2:8443,172.17.0.2:8053   2h

 $ curl -k https://172.17.0.2:8443
...
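
The check above can be repeated for every endpoint with a small loop. This is a rough sketch, assuming the two-line table layout shown in the example output; `list_endpoints` is a hypothetical helper name, and the 5-second timeout is an arbitrary choice.

```shell
#!/bin/sh
# list_endpoints: read `oc get ep` table output on stdin and print each
# unique host:port from the ENDPOINTS column on its own line.
# Hypothetical helper; assumes a header line followed by data rows.
list_endpoints() {
  awk 'NR > 1 {
    n = split($2, eps, ",")
    for (i = 1; i <= n; i++) print eps[i]
  }' | sort -u
}

# Usage from a node or from inside a pod:
#   oc get ep -n default kubernetes | list_endpoints | while read -r ep; do
#     if curl -k -s -o /dev/null --connect-timeout 5 "https://$ep"; then
#       echo "$ep reachable"
#     else
#       echo "$ep FAILED"
#     fi
#   done
```

Not every listed port necessarily serves HTTPS, so a failure on one port is only meaningful alongside the result for the API port.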

Comment 11 Weibin Liang 2018-05-15 14:36:24 UTC

(In reply to Ben Bennett from comment #4)
> Weibin: can you try to reproduce this please?

Ben,

Neither the Beijing nor the Westford OpenShift networking QE team has the hardware and
setup to reproduce the above issue.

Comment 12 Martin Eggen 2018-05-16 08:26:17 UTC

The linked support case has been resolved by the customer. 

Their hosts had been configured with routing entries in a separate routing table (1) in addition to the main table. The extra routes caused some traffic to be routed through the bond0 interface rather than tun0.

These routes were not shown by the normal "ip route list" used, for example, by our sdn-debug script.

Once this was discovered, the extra routing configuration was removed from their hosts, and deployment using the bond interface now runs as expected.
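
For anyone hitting something similar, routes outside the main table can be found by starting from the policy routing rules. A minimal sketch, assuming iproute2's usual `ip rule list` output format; `extra_tables` is a hypothetical helper name and is not part of the sdn-debug script.

```shell
#!/bin/sh
# extra_tables: read `ip rule list` output on stdin and print every routing
# table referenced by a rule, other than the built-in local, main and
# default tables. Hypothetical helper.
extra_tables() {
  awk '{
    for (i = 1; i < NF; i++)
      if ($i == "lookup" && $(i + 1) != "local" &&
          $(i + 1) != "main" && $(i + 1) != "default")
        print $(i + 1)
  }' | sort -u
}

# Usage on a node:
#   ip rule list | extra_tables      # prints e.g. "1" if table 1 is in use
#   ip route show table 1            # routes in that specific table
#   ip route show table all          # routes in every table at once
```

An empty result from `extra_tables` suggests that only the built-in tables are referenced by rules on that host.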
