Bug 1679511

Summary: The apiserver pods of Service Catalog crashed in the cluster with Multitenant
Product: OpenShift Container Platform
Reporter: Jian Zhang <jiazha>
Component: Networking
Assignee: Casey Callendrello <cdc>
Status: CLOSED ERRATA
QA Contact: Jian Zhang <jiazha>
Severity: urgent
Priority: high
Version: 4.1.0
Target Release: 4.1.0
Target Milestone: ---
Keywords: Reopened
CC: aos-bugs, chezhang, danw, dyan, jfan, jiazha, sponnaga, zitang
Hardware: Unspecified
OS: Unspecified
Type: Bug
Last Closed: 2019-06-04 10:44:14 UTC

Comment 4 Jay Boyd 2019-02-21 21:51:53 UTC
Please reopen if you can reproduce

Comment 5 Jian Zhang 2019-02-22 04:24:59 UTC
Jay,

Thanks for your information! 

> I suspect a networking issue.

Yes, you're right. This cluster uses the Multitenant SDN plugin.

The apiserver needs to access the etcd running in the kube-system namespace.
But the NETID of kube-system is 1, while the NETID of kube-service-catalog is 8747814,
so the apiserver pods cannot reach etcd.

$ curl -k https://etcd.kube-system.svc.cluster.local:2379 

[jzhang@dhcp-140-18 test]$ oc get netnamespaces
NAME                                          NETID
bmengp1                                       1838761
default                                       0
dfp5e                                         1306495
hpa                                           6315676
kube-public                                   4402285
kube-service-catalog                          8747814
kube-service-catalog-controller-manager       15176173
kube-system                                   1

So kube-service-catalog, kube-service-catalog-controller-manager, and kube-system should share the same NETID.

Workaround:
$ oc adm pod-network join-projects kube-service-catalog --to kube-system

[jzhang@dhcp-140-18 test]$ oc get pods -n kube-service-catalog 
NAME              READY   STATUS    RESTARTS   AGE
apiserver-89pg8   1/1     Running   0          10m
apiserver-grcb2   1/1     Running   0          10m
apiserver-rf6b4   1/1     Running   0          10m

Transferring this bug to the Networking component.

Comment 6 Meng Bo 2019-02-22 08:37:38 UTC
Looks like we can pre-create the netnamespace for kube-service-catalog like other projects.

But the project kube-service-catalog will not be created automatically during cluster setup; it is only created by additional user-side operations. Not sure whether the NETID should be handled on the network operator side or the service catalog side.
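Pre-creating the netnamespace would mean shipping a NetNamespace object that places the project on the same VNID as kube-system. A minimal sketch, assuming the top-level netname/netid fields of the network.openshift.io/v1 NetNamespace resource (this manifest is illustrative, not the actual operator change):

```yaml
# Sketch: pre-created NetNamespace putting kube-service-catalog on the
# same VNID as kube-system (NETID 1), so the apiserver pods can reach
# etcd under the ovs-multitenant plugin.
apiVersion: network.openshift.io/v1
kind: NetNamespace
metadata:
  name: kube-service-catalog
netname: kube-service-catalog
netid: 1
```

This has the same effect as the `oc adm pod-network join-projects` workaround above, but applied declaratively before the project's pods start.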

Comment 7 Casey Callendrello 2019-02-22 12:52:50 UTC
I'll fix this in the operator.

Comment 8 Casey Callendrello 2019-02-22 15:30:41 UTC
Filed https://github.com/openshift/cluster-network-operator/pull/106

Comment 9 Dan Winship 2019-02-22 15:39:09 UTC
So has it been confirmed that this is the only remaining problem with ovs-multitenant? Everything else works after this?

Comment 11 Jian Zhang 2019-02-26 07:42:21 UTC
LGTM, verified. Detailed steps below:

SDN image info:
[jzhang@dhcp-140-18 ocp-26]$ oc get pods -n openshift-sdn -o yaml |grep image
      image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:cfac5973deabd649304ce8d741c8989bd2cd92e2eb019a1cc53490f8db8e8fde

1, Create the apiserver of Service Catalog:
[jzhang@dhcp-140-18 ocp-26]$ oc create -f api.cr.yaml 
servicecatalogapiserver.operator.openshift.io/cluster created
[jzhang@dhcp-140-18 ocp-26]$ cat api.cr.yaml 
apiVersion: operator.openshift.io/v1
kind: ServiceCatalogAPIServer
metadata:
  name: cluster
spec:
  logLevel: Debug
  managementState: Managed 

2, Check the NETID of "kube-service-catalog". It is the same as the NETID of "kube-system".
[jzhang@dhcp-140-18 ocp-26]$ oc get netnamespaces
NAME                                          NETID
default                                       0
kakatest                                      14056789
kube-public                                   2458915
kube-service-catalog                          1
kube-system                                   1

3, Check the apiserver pods:
[jzhang@dhcp-140-18 ocp-26]$ oc get pods -n kube-service-catalog
NAME              READY   STATUS    RESTARTS   AGE
apiserver-4xt6t   1/1     Running   0          7m18s
apiserver-bbd59   1/1     Running   0          7m18s
apiserver-cb675   1/1     Running   0          7m18s

Further test:
4, Install the controller-manager of the Service Catalog:
[jzhang@dhcp-140-18 ocp-26]$ oc create -f controller.cr.yaml 
servicecatalogcontrollermanager.operator.openshift.io/cluster created
[jzhang@dhcp-140-18 ocp-26]$ cat controller.cr.yaml 
apiVersion: operator.openshift.io/v1
kind: ServiceCatalogControllerManager
metadata:
  name: cluster
spec:
  managementState: Managed
  logLevel: Debug

[jzhang@dhcp-140-18 ocp-26]$ oc get pods -n kube-service-catalog-controller-manager
NAME                       READY   STATUS    RESTARTS   AGE
controller-manager-6qs8l   1/1     Running   0          7m10s
controller-manager-jqvsx   1/1     Running   0          7m10s
controller-manager-mpf4t   1/1     Running   0          7m10s

[jzhang@dhcp-140-18 ocp-26]$ oc get netnamespaces |grep catalog
kube-service-catalog                          1
kube-service-catalog-controller-manager       2706770


5, Create a broker in a project called "test", and join "test" to "kube-service-catalog-controller-manager":
[jzhang@dhcp-140-18 ocp-26]$ oc adm pod-network join-projects test --to kube-service-catalog-controller-manager

[jzhang@dhcp-140-18 ocp-26]$ oc get netnamespaces
NAME                                          NETID
default                                       0
kakatest                                      14056789
kube-public                                   2458915
kube-service-catalog                          1
kube-service-catalog-controller-manager       2706770
...
test                                          2706770

[jzhang@dhcp-140-18 ocp-26]$ oc get clusterservicebroker
NAME         URL                                        STATUS   AGE
ups-broker   http://ups-broker.test.svc.cluster.local   Ready    5m
[jzhang@dhcp-140-18 ocp-26]$ oc get pods -n test
NAME                          READY   STATUS    RESTARTS   AGE
ups-broker-779c4fd54c-vvgl7   1/1     Running   0          3m49s

Comment 12 Jay Boyd 2019-02-26 13:47:54 UTC
*** Bug 1679510 has been marked as a duplicate of this bug. ***

Comment 15 errata-xmlrpc 2019-06-04 10:44:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758