Bug 1719653
Summary: Network mode Multitenant - apiserver can not connect to etcd because of netnamespaces
Product: OpenShift Container Platform
Reporter: Robert Bohne <rbohne>
Component: Networking
Assignee: Ricardo Carrillo Cruz <ricarril>
Status: CLOSED ERRATA
QA Contact: zhaozhanqi <zzhao>
Severity: medium
Docs Contact:
Priority: medium
Version: 4.1.0
CC: aos-bugs, danw, ricarril, toshio.oya, weliang
Target Milestone: ---
Target Release: 4.2.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The etcd namespace was not put on netid 1.
Consequence: API server components were not able to connect to etcd.
Fix: Put the etcd namespace on netid 1.
Result: The API server and etcd can communicate successfully.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-10-16 06:31:56 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1726679
Attachments:
Description
Robert Bohne
2019-06-12 09:41:13 UTC
> $ curl -k -I --connect-timeout 1 https://etcd.openshift-etcd.svc:2379/
> curl: (28) Resolving timed out after 1510 milliseconds
That's a connecting-to-your-DNS-server problem, not a connecting-to-etcd problem.
Weibin: do we have QE tests for Multitenant? I thought we were testing that this worked...
And DNS won't come up until after the control-plane pivot. The control plane connects to the host IPs directly. Do you have logs from any control plane components that indicate the issue?

cc'ing ricky, who added some multitenant CI.

(In reply to Dan Winship from comment #1)
> > $ curl -k -I --connect-timeout 1 https://etcd.openshift-etcd.svc:2379/
> > curl: (28) Resolving timed out after 1510 milliseconds
>
> That's a connecting-to-your-DNS-server problem, not a connecting-to-etcd
> problem.
>
> Weibin: do we have QE tests for Multitenant? I thought we were testing that
> this worked...

Hi Dan, we have test cases for multitenant and subnet plugin installation. As I remember, I was told that those two plugins were almost deprecated, since NetworkPolicy can cover both of them. So the multitenant and subnet entries were removed from the test matrix too.

Confirmed with Zhanqi that QE executed SDN automation testing using Multitenant at the beginning of v4.0 testing, then dropped that testing due to some miscommunication. From now on, QE will continue to run automation regression testing for NetworkPolicy, Multitenant and Subnet.

In cluster-network-03-config.yml, configure the network mode to be Subnet, Multitenant or NetworkPolicy.

Tested in 4.1.0-0.ci-2019-07-01-170207:
Installation passed when using: Subnet or NetworkPolicy
Installation failed when using: Multitenant

The failed test log is attached.

Created attachment 1586436 [details]
Testing log
Ricardo, can you take a look? The most likely change is that we just need to add a NetNamespace for openshift-etcd in 004-multitenant.yaml.

So, yeah, the issue is the netid. I installed a cluster with multitenant and got the apiservers crashlooping due to an inability to connect to etcd. After I edited the etcd netnamespace's netid to 1 and killed the apiservers, they were redeployed and started fine. Will push a patch to add the netnamespace for etcd on netid 1.

I pushed https://github.com/openshift/cluster-network-operator/pull/224 for master. Will create a backport for 4.1.

Verified this bug on 4.2.0-0.nightly-2019-07-10-062553

[root@preserve-zzhao 0710]# oc get netnamespaces | grep -E '(openshift-apiserver|openshift-etcd) '
openshift-apiserver   1
openshift-etcd        1

[root@preserve-zzhao 0710]# oc get clusternetwork
NAME      CLUSTER NETWORK   SERVICE NETWORK   PLUGIN NAME
default   10.128.0.0/14     172.30.0.0/16     redhat/openshift-ovs-multitenant

[root@preserve-zzhao 0710]# oc get pod -n openshift-apiserver
NAME              READY   STATUS    RESTARTS   AGE
apiserver-8mcfm   1/1     Running   0          16h
apiserver-jqnkm   1/1     Running   0          16h
apiserver-wff76   1/1     Running   0          16h

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922
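
The verification in this bug (checking that openshift-apiserver and openshift-etcd both sit on netid 1) could also be scripted. The following is only a minimal sketch, not part of the fix: it assumes the text output of `oc get netnamespaces` has the namespace name and the netid as its first two whitespace-separated columns, and the function names `parse_netids` and `share_netid`, as well as the `example-project` row in the sample, are made up for illustration.

```python
def parse_netids(oc_output):
    """Map namespace name -> netid from `oc get netnamespaces` text output.

    Assumption: the first two columns are NAME and NETID.
    """
    netids = {}
    for line in oc_output.splitlines():
        fields = line.split()
        # Skip blank lines and the header row, whose second field is not a number.
        if len(fields) < 2 or not fields[1].isdigit():
            continue
        netids[fields[0]] = int(fields[1])
    return netids


def share_netid(netids, namespaces, expected=1):
    """Return True if every listed namespace exists and has the expected netid."""
    return all(netids.get(ns) == expected for ns in namespaces)


# Sample input: the two openshift-* rows mirror the verification above;
# the example-project row is invented for illustration.
sample = """NAME                  NETID
example-project       8345211
openshift-apiserver   1
openshift-etcd        1
"""

if __name__ == "__main__":
    ids = parse_netids(sample)
    print(share_netid(ids, ["openshift-apiserver", "openshift-etcd"]))
```

On a live cluster the same functions could be fed the captured output of `oc get netnamespaces`; a namespace that is missing or on a different netid makes `share_netid` return False.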