Bug 1680214

Summary: Service catalog cannot access brokers in the cluster with Multitenant
Product: OpenShift Container Platform Reporter: Jian Zhang <jiazha>
Component: Networking Assignee: Casey Callendrello <cdc>
Status: CLOSED ERRATA QA Contact: Jian Zhang <jiazha>
Severity: high Docs Contact:
Priority: high    
Version: 4.1.0 CC: aos-bugs, chezhang, dyan, jfan, zitang
Target Milestone: ---   
Target Release: 4.1.0   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2019-06-04 10:44:27 UTC Type: Bug

Description Jian Zhang 2019-02-23 04:43:30 UTC
Description of problem:
The brokers' netnamespaces have different NETIDs from the Service Catalog's, so the Service Catalog cannot connect to them. Error:
  Warning  ErrorFetchingCatalog         61m (x57 over 157m)    service-catalog-controller-manager  Error getting broker catalog: Get https://asb.openshift-ansible-service-broker.svc:1338/osb/v2/catalog: dial tcp 172.30.197.247:1338: i/o timeout

Version-Release number of selected component (if applicable):
[jzhang@dhcp-140-18 test]$ oc get pods -n openshift-sdn -o yaml |grep image
      image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:264328fd9d294345de7c730ef74d8bd8aee503bd934aa77450f692b9ac89b6f4

How reproducible:
always

Steps to Reproduce:
1. Install the cluster with Multitenant networking.
2. Install the Service Catalog. Its namespaces:
kube-service-catalog                          
kube-service-catalog-controller-manager

3. Install the Ansible-Service-Broker and Template-Service-Broker. Their namespaces:
openshift-ansible-service-broker 
openshift-template-service-broker

4. Check their netnamespaces:
[jzhang@dhcp-140-18 test]$ oc get netnamespaces
NAME                                          NETID
kube-public                                   4402285
kube-service-catalog                          1
kube-service-catalog-controller-manager       1
kube-system                                   1
openshift                                     13051904
openshift-ansible-service-broker              16557791
openshift-apiserver                           1
openshift-template-service-broker             1641502

5. Check the status of the ClusterServiceBroker.

Actual results:
[jzhang@dhcp-140-18 test]$ oc get clusterservicebroker
NAME                      URL                                                                                         STATUS                 AGE
ansible-service-broker    https://asb.openshift-ansible-service-broker.svc:1338/osb/                                  ErrorFetchingCatalog   21h
template-service-broker   https://apiserver.openshift-template-service-broker.svc:443/brokers/template.openshift.io   ErrorFetchingCatalog   21h
[jzhang@dhcp-140-18 test]$ oc describe clusterservicebroker ansible-service-broker
...
  Warning  ErrorFetchingCatalog         3m38s (x25 over 37m)   service-catalog-controller-manager  Error getting broker catalog: Get https://asb.openshift-ansible-service-broker.svc:1338/osb/v2/catalog: dial tcp 172.30.197.247:1338: i/o timeout

Expected results:
kube-service-catalog-controller-manager, openshift-ansible-service-broker, and openshift-template-service-broker should have the same NETID.

The service-catalog-controller-manager can connect to the brokers successfully.
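
The expectation above can be checked mechanically. The following is a minimal Python sketch, not part of the original report: it parses text in the `oc get netnamespaces` table format shown in step 4 and reports whether the three namespaces share one NETID. The helper names (`parse_netnamespaces`, `share_netid`) and the `SAMPLE` excerpt are illustrative assumptions, and the check only compares NETIDs for equality; it does not model any other SDN rule.

```python
def parse_netnamespaces(text):
    """Parse `oc get netnamespaces` table output into {name: netid}."""
    netids = {}
    for line in text.strip().splitlines():
        fields = line.split()
        # Skip the header row and any line that is not a NAME/NETID pair.
        if len(fields) < 2 or fields[0] == "NAME":
            continue
        netids[fields[0]] = int(fields[1])
    return netids


def share_netid(netids, namespaces):
    """Return True when every listed namespace has the same NETID."""
    return len({netids[ns] for ns in namespaces}) == 1


# Excerpt of the output from step 4 of this report.
SAMPLE = """
NAME                                          NETID
kube-service-catalog-controller-manager       1
openshift-ansible-service-broker              16557791
openshift-template-service-broker             1641502
"""

# Namespaces that the "Expected results" section says must match.
EXPECTED_TOGETHER = [
    "kube-service-catalog-controller-manager",
    "openshift-ansible-service-broker",
    "openshift-template-service-broker",
]

print(share_netid(parse_netnamespaces(SAMPLE), EXPECTED_TOGETHER))  # False
```

On a live cluster, the captured output of `oc get netnamespaces` could be passed to `parse_netnamespaces` in place of `SAMPLE`.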

Additional info:
Workaround:
1) oc adm pod-network join-projects openshift-template-service-broker --to kube-service-catalog-controller-manager
2) oc adm pod-network join-projects openshift-ansible-service-broker --to kube-service-catalog-controller-manager

[jzhang@dhcp-140-18 test]$ oc get netnamespaces
NAME                                          NETID
kube-public                                   4402285
kube-service-catalog                          1
kube-service-catalog-controller-manager       1
kube-system                                   1
openshift                                     13051904
openshift-ansible-service-broker              1
openshift-apiserver                           1
openshift-template-service-broker             1


[jzhang@dhcp-140-18 test]$ oc get clusterservicebroker
NAME                      URL                                                                                         STATUS   AGE
ansible-service-broker    https://asb.openshift-ansible-service-broker.svc:1338/osb/                                  Ready    21h
template-service-broker   https://apiserver.openshift-template-service-broker.svc:443/brokers/template.openshift.io   Ready    21h
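
As an illustration of the workaround, the sketch below builds the `oc adm pod-network join-projects` command lines for every broker namespace whose NETID differs from that of the controller manager. It is only a sketch under stated assumptions: the function name `join_commands` is hypothetical, the NETID values are copied from this report, and the command form mirrors the two workaround commands above. It prints the commands and does not execute them.

```python
def join_commands(netids, target, candidates):
    """Build join-projects commands for namespaces isolated from `target`."""
    target_id = netids[target]
    return [
        f"oc adm pod-network join-projects {ns} --to {target}"
        for ns in candidates
        if netids[ns] != target_id  # already-joined namespaces are skipped
    ]


# NETIDs as reported before the workaround was applied.
BEFORE = {
    "kube-service-catalog-controller-manager": 1,
    "openshift-ansible-service-broker": 16557791,
    "openshift-template-service-broker": 1641502,
}

BROKERS = [
    "openshift-template-service-broker",
    "openshift-ansible-service-broker",
]

for cmd in join_commands(BEFORE, "kube-service-catalog-controller-manager", BROKERS):
    print(cmd)
```

Run against the NETIDs listed after the workaround (all "1"), the function returns an empty list, so nothing further would need to be joined.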

Comment 1 Jian Zhang 2019-02-23 04:48:08 UTC
@Jay,

Based on my understanding, kube-service-catalog and kube-service-catalog-controller-manager can have different NETIDs.
However, kube-service-catalog must have the same NETID as kube-system, and kube-service-catalog-controller-manager must have the same NETID as the brokers.
Correct me if I'm wrong.

Comment 2 Jay Boyd 2019-02-24 02:47:41 UTC
@Jian this sounds right.  Certainly the pods in kube-service-catalog-controller-manager must be able to communicate with all service brokers.

Comment 3 Casey Callendrello 2019-02-25 12:25:11 UTC
Understood. Can you outline exactly which namespaces need access to which? I can update the netnamespaces.

Comment 4 Jian Zhang 2019-02-26 08:15:30 UTC
Casey,

The broker namespaces are: openshift-template-service-broker, openshift-ansible-service-broker
Their NETIDs should be the same as the NETID of "kube-service-catalog-controller-manager".

If possible, we can set all of them to "1", the same as the NETID of "kube-system".

Comment 5 Jay Boyd 2019-02-26 13:33:01 UTC
Additionally, the pods in kube-service-catalog must be able to communicate with etcd (kube-system).
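
The connectivity requirements collected in comments 1 through 5 can be written down as namespace pairs and evaluated against a NETID mapping. The sketch below is an illustration only: `REQUIRED_PAIRS` and `unmet_pairs` are hypothetical names, the NETID values are taken from the outputs in the description, and "same NETID" is assumed to be the only condition for two namespaces to reach each other.

```python
# Pairs of namespaces that must be able to communicate, per comments 1-5.
REQUIRED_PAIRS = [
    ("kube-service-catalog", "kube-system"),
    ("kube-service-catalog-controller-manager", "openshift-ansible-service-broker"),
    ("kube-service-catalog-controller-manager", "openshift-template-service-broker"),
]


def unmet_pairs(netids, pairs=REQUIRED_PAIRS):
    """Return the required pairs whose namespaces have different NETIDs."""
    return [(a, b) for a, b in pairs if netids[a] != netids[b]]


# NETIDs from the original description (before any workaround).
BEFORE = {
    "kube-service-catalog": 1,
    "kube-service-catalog-controller-manager": 1,
    "kube-system": 1,
    "openshift-ansible-service-broker": 16557791,
    "openshift-template-service-broker": 1641502,
}

# Proposed state from comment 4: every listed namespace set to NETID 1.
PROPOSED = {name: 1 for name in BEFORE}

print(unmet_pairs(BEFORE))    # the two controller-manager/broker pairs
print(unmet_pairs(PROPOSED))  # []
```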

Comment 6 Casey Callendrello 2019-02-26 17:59:22 UTC
PR https://github.com/openshift/cluster-network-operator/pull/108 filed

Comment 8 Jian Zhang 2019-03-01 07:27:31 UTC
The openshift-network-operator image that includes the merged fix PR has the following labels:
             io.openshift.build.commit.id=15204e63ace4000afa531193af95894371d4cfe0
             io.openshift.build.commit.url=https://github.com/openshift/cluster-network-operator/commit/15204e63ace4000afa531193af95894371d4cfe0
             io.openshift.build.source-location=https://github.com/openshift/cluster-network-operator

[jzhang@dhcp-140-18 multitenant]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.nightly-2019-02-28-054829   True        False         3h49m   Cluster version is 4.0.0-0.nightly-2019-02-28-054829

Checked the netnamespaces: all of their NETIDs are "1". Looks good.

[jzhang@dhcp-140-18 multitenant]$ oc get netnamespaces
NAME                                          NETID
kube-service-catalog                          1
kube-service-catalog-controller-manager       1
kube-system                                   1
openshift-ansible-service-broker              1
...
openshift-template-service-broker             1


Installed the Service Catalog and a fake broker in the corresponding namespaces.
[jzhang@dhcp-140-18 multitenant]$ oc get pods -n kube-service-catalog 
NAME              READY   STATUS    RESTARTS   AGE
apiserver-752j7   1/1     Running   0          21m
apiserver-gtz24   1/1     Running   0          21m
apiserver-nwp6h   1/1     Running   0          21m
[jzhang@dhcp-140-18 multitenant]$ oc get pods -n kube-service-catalog-controller-manager 
NAME                       READY   STATUS    RESTARTS   AGE
controller-manager-d5x5m   1/1     Running   0          21m
controller-manager-fq7b4   1/1     Running   0          21m
controller-manager-pbr9c   1/1     Running   0          21m
[jzhang@dhcp-140-18 multitenant]$ oc get clusterservicebroker
NAME         URL                                                                    STATUS   AGE
ups-broker   http://ups-broker.openshift-ansible-service-broker.svc.cluster.local   Ready    15m
[jzhang@dhcp-140-18 multitenant]$ oc get pods -n openshift-ansible-service-broker 
NAME                          READY   STATUS    RESTARTS   AGE
ups-broker-779c4fd54c-cj2tq   1/1     Running   0          15m

LGTM, marking it as verified. Thanks!

Comment 11 errata-xmlrpc 2019-06-04 10:44:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758