Description of problem:
In an environment set up in multitenant mode, the apiserver pod cannot run because it cannot reach the etcd service. The reason is that the two are in different namespaces and the netnamespace ID is not 0.

Version-Release number of selected component (if applicable):
oc v4.0.0-alpha.0+793dcb0-773
kubernetes v1.11.0+793dcb0
features: Basic-Auth GSSAPI Kerberos SPNEGO

How reproducible:
Always

Steps to Reproduce:
1. Set up the environment in multitenant mode by editing the mode to 'Multitenant' in the file cluster-network-02-config.yml.
2. Check the apiserver pod.
3.

Actual results:
The apiserver pod cannot run.

oc logs apiserver-q99fx -n openshift-apiserver
I1214 08:02:27.474616 1 plugins.go:158] Loaded 1 mutating admission controller(s) successfully in the following order: NamespaceLifecycle.
I1214 08:02:27.474876 1 plugins.go:158] Loaded 1 mutating admission controller(s) successfully in the following order: openshift.io/BuildConfigSecretInjector.
I1214 08:02:27.474889 1 plugins.go:161] Loaded 1 validating admission controller(s) successfully in the following order: openshift.io/BuildConfigSecretInjector.
I1214 08:02:27.475145 1 plugins.go:161] Loaded 1 validating admission controller(s) successfully in the following order: BuildByStrategy.
I1214 08:02:27.475320 1 plugins.go:158] Loaded 1 mutating admission controller(s) successfully in the following order: openshift.io/ImageLimitRange.
I1214 08:02:27.475335 1 plugins.go:161] Loaded 1 validating admission controller(s) successfully in the following order: openshift.io/ImageLimitRange.
I1214 08:02:27.475478 1 plugins.go:161] Loaded 1 validating admission controller(s) successfully in the following order: OwnerReferencesPermissionEnforcement.
I1214 08:02:27.475796 1 plugins.go:158] Loaded 1 mutating admission controller(s) successfully in the following order: MutatingAdmissionWebhook.
I1214 08:02:27.476063 1 plugins.go:161] Loaded 1 validating admission controller(s) successfully in the following order: ValidatingAdmissionWebhook.
I1214 08:02:27.476389 1 plugins.go:161] Loaded 1 validating admission controller(s) successfully in the following order: ResourceQuota.
I1214 08:02:27.476683 1 plugins.go:161] Loaded 1 validating admission controller(s) successfully in the following order: openshift.io/ClusterResourceQuota.
I1214 08:02:27.483191 1 balancer_conn_wrappers.go:190] ccBalancerWrapper: updating state and picker called by balancer: IDLE, 0xc421bd4660
I1214 08:02:27.483225 1 resolver_conn_wrapper.go:64] dialing to target with scheme: ""
I1214 08:02:27.483237 1 resolver_conn_wrapper.go:70] could not get resolver for scheme: ""
I1214 08:02:27.483296 1 balancer_v1_wrapper.go:91] balancerWrapper: is pickfirst: false
I1214 08:02:27.483323 1 balancer_v1_wrapper.go:116] balancerWrapper: got update addr from Notify: [{etcd.kube-system.svc:2379 <nil>}]
I1214 08:02:27.483344 1 balancer_conn_wrappers.go:168] ccBalancerWrapper: new subconn: [{etcd.kube-system.svc:2379 0 <nil>}]
I1214 08:02:27.483411 1 balancer_v1_wrapper.go:224] balancerWrapper: handle subconn state change: 0xc420b85c60, CONNECTING
I1214 08:02:27.483470 1 balancer_conn_wrappers.go:190] ccBalancerWrapper: updating state and picker called by balancer: CONNECTING, 0xc421bd4660
W1214 08:02:37.483489 1 clientconn.go:944] grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: Error while dialing dial tcp: operation was canceled"; Reconnecting to {etcd.kube-system.svc:2379 0 <nil>}
W1214 08:02:37.483652 1 clientconn.go:696] Failed to dial etcd.kube-system.svc:2379: grpc: the connection is closing; please retry.
F1214 08:02:37.483468 1 storage_decorator.go:57] Unable to create storage backend: config (&{etcd3 openshift.io [https://etcd.kube-system.svc:2379] /var/run/secrets/etcd-client/tls.key /var/run/secrets/etcd-client/tls.crt /var/run/configmaps/etcd-serving-ca/ca-bundle.crt true true 0 {0xc4211a7a70 0xc4211a7b00} <nil> 5m0s 1m0s}), err (context deadline exceeded)

Expected results:
The apiserver pod can run.

Additional info:
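For readers unfamiliar with the multitenant plugin, the isolation rule behind this failure can be modeled in a few lines. This is a minimal sketch, assuming the usual multitenant behaviour in which pods can communicate only when their namespaces share a NetID or when either NetID is 0 (global). The function name and the NetID values are hypothetical and are not taken from the cluster in this report.

```python
GLOBAL_NETID = 0  # assumption: NetID 0 marks a namespace as global


def can_communicate(src_netid: int, dst_netid: int) -> bool:
    """Toy model of the multitenant isolation rule (not actual SDN code)."""
    same_tenant = src_netid == dst_netid
    either_global = GLOBAL_NETID in (src_netid, dst_netid)
    return same_tenant or either_global


# Hypothetical NetIDs for the two namespaces named in the logs above.
netids = {"openshift-apiserver": 1001, "kube-system": 2002}

# Different, non-zero NetIDs: the apiserver cannot reach etcd.
blocked = not can_communicate(netids["openshift-apiserver"], netids["kube-system"])
```

Under this model, the dial to etcd.kube-system.svc:2379 would time out exactly as in the log, because neither namespace is global and the two NetIDs differ.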
Good catch! Looks like we might need to set up some default joins. This code will need to be in either the operator or the controller. Jacob, can you take a look at this? Dan Winship can help figure out how to fix it.
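One way to picture what "default joins" could mean here: joining a namespace to another gives both the same NetID, and making a namespace global gives it NetID 0. The sketch below is a toy model under those assumptions. The helper names and NetID values are hypothetical, and this is not the code that ended up in the operator or the controller.

```python
from typing import Dict


def reachable(netids: Dict[str, int], a: str, b: str) -> bool:
    """Assumed rule: same NetID, or either side is global (NetID 0)."""
    return netids[a] == netids[b] or 0 in (netids[a], netids[b])


def join(netids: Dict[str, int], namespace: str, to: str) -> None:
    """Model a join: 'namespace' takes over the NetID of 'to'."""
    netids[namespace] = netids[to]


def make_global(netids: Dict[str, int], namespace: str) -> None:
    """Model making a namespace global by assigning NetID 0."""
    netids[namespace] = 0


# Hypothetical starting state: isolated namespaces.
netids = {"openshift-apiserver": 1001, "kube-system": 2002}
before = reachable(netids, "openshift-apiserver", "kube-system")

# Option A: join the apiserver namespace to kube-system.
joined = dict(netids)
join(joined, "openshift-apiserver", to="kube-system")
after_join = reachable(joined, "openshift-apiserver", "kube-system")

# Option B: make kube-system global instead.
globalized = dict(netids)
make_global(globalized, "kube-system")
after_global = reachable(globalized, "openshift-apiserver", "kube-system")
```

Either option restores connectivity in this model. Which one a real default should use is a design decision for the fix, not something this sketch settles.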
Created SDN-284 to track this.
This is not a beta blocker since the default network plugin for 4.0 is the network policy one, and that one works. The guidance is "if the problem does not get exposed by the high-touch beta lab manual (https://docs.google.com/document/d/1wcOglOXsfHjcXearZhxnWnyeCJ_y1c3qC4_ZE18b00c/edit#) then it is a bug, but not a beta blocker".
This is fixed in https://github.com/openshift/cluster-network-operator/pull/82
PR is merged, ready for QA.
This issue has been fixed in 4.0.0-0.nightly-2019-02-18-224151. This bug can be verified.
Tested this issue on 4.0.0-0.nightly-2019-02-18-224151; it has been fixed. This bug can be verified.
Verified this bug according to comment #8.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0758