1659376 – [Next_gen_installer]apiserver pod cannot be running in multitenant plugin mode

Summary: [Next_gen_installer]apiserver pod cannot be running in multitenant plugin mode

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Networking
Sub Component:
Version:	4.1.0
Hardware:	All
OS:	All
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	4.1.0
Assignee:	Casey Callendrello
QA Contact:	zhaozhanqi
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2018-12-14 08:19 UTC by zhaozhanqi
Modified:	2019-06-04 10:41 UTC
CC List:	6 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2019-06-04 10:41:16 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2019:0758	0	None	None	None	2019-06-04 10:41:21 UTC

Description zhaozhanqi 2018-12-14 08:19:33 UTC

Description of problem:
Set up the environment in multitenant mode. The apiserver pod cannot run because it cannot access the etcd service. The reason is that the two are in different namespaces and their NetNamespace netid is not 0.
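
The isolation rule behind this failure can be sketched as follows. This is a minimal model, assuming the usual multitenant behaviour: traffic is allowed only between namespaces that share a netid, and netid 0 is global. The function name and the netid values are hypothetical and are not taken from the cluster in this report.

```python
GLOBAL_NETID = 0  # netid 0 is treated as global in multitenant mode


def can_communicate(src_netid: int, dst_netid: int) -> bool:
    """Return True if the multitenant isolation rule would allow the traffic."""
    return src_netid == dst_netid or GLOBAL_NETID in (src_netid, dst_netid)


# Hypothetical netids; real values are assigned by the cluster.
netids = {"openshift-apiserver": 1234567, "kube-system": 7654321}

allowed = can_communicate(netids["openshift-apiserver"], netids["kube-system"])
print(allowed)  # prints False: the apiserver pod cannot reach etcd
```

With two distinct non-zero netids the check fails, which matches the dial timeout to etcd.kube-system.svc:2379 seen in the log.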

Version-Release number of selected component (if applicable):
oc v4.0.0-alpha.0+793dcb0-773
kubernetes v1.11.0+793dcb0
features: Basic-Auth GSSAPI Kerberos SPNEGO

How reproducible:
always

Steps to Reproduce:
1. Set up the environment in multitenant mode by changing the mode to 'Multitenant' in the file cluster-network-02-config.yml
2. Check the apiserver pod

Actual results:

The apiserver pod cannot run.

 oc logs apiserver-q99fx -n openshift-apiserver
I1214 08:02:27.474616       1 plugins.go:158] Loaded 1 mutating admission controller(s) successfully in the following order: NamespaceLifecycle.
I1214 08:02:27.474876       1 plugins.go:158] Loaded 1 mutating admission controller(s) successfully in the following order: openshift.io/BuildConfigSecretInjector.
I1214 08:02:27.474889       1 plugins.go:161] Loaded 1 validating admission controller(s) successfully in the following order: openshift.io/BuildConfigSecretInjector.
I1214 08:02:27.475145       1 plugins.go:161] Loaded 1 validating admission controller(s) successfully in the following order: BuildByStrategy.
I1214 08:02:27.475320       1 plugins.go:158] Loaded 1 mutating admission controller(s) successfully in the following order: openshift.io/ImageLimitRange.
I1214 08:02:27.475335       1 plugins.go:161] Loaded 1 validating admission controller(s) successfully in the following order: openshift.io/ImageLimitRange.
I1214 08:02:27.475478       1 plugins.go:161] Loaded 1 validating admission controller(s) successfully in the following order: OwnerReferencesPermissionEnforcement.
I1214 08:02:27.475796       1 plugins.go:158] Loaded 1 mutating admission controller(s) successfully in the following order: MutatingAdmissionWebhook.
I1214 08:02:27.476063       1 plugins.go:161] Loaded 1 validating admission controller(s) successfully in the following order: ValidatingAdmissionWebhook.
I1214 08:02:27.476389       1 plugins.go:161] Loaded 1 validating admission controller(s) successfully in the following order: ResourceQuota.
I1214 08:02:27.476683       1 plugins.go:161] Loaded 1 validating admission controller(s) successfully in the following order: openshift.io/ClusterResourceQuota.
I1214 08:02:27.483191       1 balancer_conn_wrappers.go:190] ccBalancerWrapper: updating state and picker called by balancer: IDLE, 0xc421bd4660
I1214 08:02:27.483225       1 resolver_conn_wrapper.go:64] dialing to target with scheme: ""
I1214 08:02:27.483237       1 resolver_conn_wrapper.go:70] could not get resolver for scheme: ""
I1214 08:02:27.483296       1 balancer_v1_wrapper.go:91] balancerWrapper: is pickfirst: false
I1214 08:02:27.483323       1 balancer_v1_wrapper.go:116] balancerWrapper: got update addr from Notify: [{etcd.kube-system.svc:2379 <nil>}]
I1214 08:02:27.483344       1 balancer_conn_wrappers.go:168] ccBalancerWrapper: new subconn: [{etcd.kube-system.svc:2379 0  <nil>}]
I1214 08:02:27.483411       1 balancer_v1_wrapper.go:224] balancerWrapper: handle subconn state change: 0xc420b85c60, CONNECTING
I1214 08:02:27.483470       1 balancer_conn_wrappers.go:190] ccBalancerWrapper: updating state and picker called by balancer: CONNECTING, 0xc421bd4660
W1214 08:02:37.483489       1 clientconn.go:944] grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: Error while dialing dial tcp: operation was canceled"; Reconnecting to {etcd.kube-system.svc:2379 0  <nil>}
W1214 08:02:37.483652       1 clientconn.go:696] Failed to dial etcd.kube-system.svc:2379: grpc: the connection is closing; please retry.
F1214 08:02:37.483468       1 storage_decorator.go:57] Unable to create storage backend: config (&{etcd3 openshift.io [https://etcd.kube-system.svc:2379] /var/run/secrets/etcd-client/tls.key /var/run/secrets/etcd-client/tls.crt /var/run/configmaps/etcd-serving-ca/ca-bundle.crt true true 0 {0xc4211a7a70 0xc4211a7b00} <nil> 5m0s 1m0s}), err (context deadline exceeded)

Expected results:

The apiserver pod runs.

Additional info:

Comment 1 Casey Callendrello 2018-12-17 18:33:15 UTC

Good catch!

Looks like we might need to set up some default joins. This code will need to be in either the operator or the controller.
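
As a rough illustration of what a "join" means here: this is a toy model, assuming the semantics of `oc adm pod-network join-projects` (the joined namespaces take the target's netid) and `oc adm pod-network make-projects-global` (netid set to 0). The helper names and netid values are hypothetical, and this is not the code from the eventual fix.

```python
from typing import Dict


def join_projects(netids: Dict[str, int], target: str, *projects: str) -> None:
    """Model a join: each listed namespace takes the target's netid."""
    for project in projects:
        netids[project] = netids[target]


def make_global(netids: Dict[str, int], *projects: str) -> None:
    """Model making namespaces global: netid 0 can reach every namespace."""
    for project in projects:
        netids[project] = 0


# Hypothetical starting state: two isolated namespaces.
netids = {"openshift-apiserver": 1234567, "kube-system": 7654321}

# Option A: join the apiserver namespace to kube-system.
joined = dict(netids)
join_projects(joined, "kube-system", "openshift-apiserver")

# Option B: make kube-system global instead.
global_state = dict(netids)
make_global(global_state, "kube-system")

print(joined)
print(global_state)
```

Either option removes the netid mismatch that isolates the apiserver pod from etcd. Which namespaces should be joined or made global by default is the open question in this comment.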

Jacob, can you take a look at this? Dan Winship can help figure out how to fix it.

Comment 2 Casey Callendrello 2019-01-03 22:30:53 UTC

Created SDN-284 to track this.

Comment 5 Ben Bennett 2019-01-11 16:55:11 UTC

This is not a beta blocker since the default network plugin for 4.0 is the network policy one, and that one works.  The guidance is "if the problem does not get exposed by the high-touch beta lab manual (https://docs.google.com/document/d/1wcOglOXsfHjcXearZhxnWnyeCJ_y1c3qC4_ZE18b00c/edit#) then it is a bug, but not a beta blocker".

Comment 6 Casey Callendrello 2019-01-30 15:56:34 UTC

This is fixed in https://github.com/openshift/cluster-network-operator/pull/82

Comment 7 Casey Callendrello 2019-02-18 13:45:12 UTC

PR is merged, ready for QA.

Comment 8 zhaozhanqi 2019-02-19 11:04:38 UTC

This issue has been fixed in 4.0.0-0.nightly-2019-02-18-224151.
This bug can be verified.

Comment 9 zhaozhanqi 2019-02-19 11:05:56 UTC

Tested this issue on 4.0.0-0.nightly-2019-02-18-224151; it has been fixed.

This bug can be verified.

Comment 11 zhaozhanqi 2019-02-21 09:55:54 UTC

Verified this bug according to comment 8.

Comment 14 errata-xmlrpc 2019-06-04 10:41:16 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758
