Description of problem:
Every request to our broker causes a new connection to be established from the catalog. These connections never get closed, eventually exhausting resources and causing the broker to become unresponsive.

tcp6       0      0 10.128.1.168:1338       10.128.0.1:56106        ESTABLISHED
tcp6       0      0 10.128.1.168:1338       10.128.0.1:56958        ESTABLISHED
tcp6       0      0 10.128.1.168:1338       10.128.0.1:52064        ESTABLISHED
tcp6       0      0 10.128.1.168:1338       10.128.0.1:53778        ESTABLISHED
tcp6       0      0 10.128.1.168:1338       10.128.0.1:55298        ESTABLISHED
tcp6       0      0 10.128.1.168:1338       10.128.0.1:52136        ESTABLISHED
tcp6       0      0 10.128.1.168:1338       10.128.0.1:55194        ESTABLISHED
tcp6       0      0 10.128.1.168:1338       10.128.0.1:54974        ESTABLISHED
tcp6       0      0 10.128.1.168:1338       10.128.0.1:52404        ESTABLISHED
tcp6       0      0 10.128.1.168:1338       10.128.0.1:52050        ESTABLISHED

# netstat -an | grep ESTABLISHED | grep 10.128.0.1 | wc -l
681

Version-Release number of selected component (if applicable):
# rpm -q openshift-ansible
openshift-ansible-3.11.0-0.32.0.git.0.b27b349.el7.noarch

# oc exec -it -n kube-service-catalog controller-manager-cbrw7 /bin/bash
bash-4.2$ service-catalog --version
v3.11.0-0.28.0;Upstream:v0.1.30

# oc exec -it -n openshift-ansible-service-broker asb-1-977lr -- rpm -q ansible-service-broker
ansible-service-broker-1.3.16-1.el7.x86_64

How reproducible:
Seems always. The problem is especially evident when a bad APB is provisioned and deprovision requests end up being issued endlessly.

Steps to Reproduce:
1. Perform a task that spawns a request from the service catalog to the broker.
2. Note the number of open connections.
3. Perform another task that spawns a request.
4. Note that the number of connections increases, and so on.

Actual results:
New connections are established endlessly.

Expected results:
A single connection is reused, or old connections are closed after the request completes.

Additional info:
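For context, this symptom matches a well-known Go anti-pattern: constructing a fresh http.Transport (and client) for every request. The sketch below is purely illustrative (the function and names are hypothetical, not the actual service-catalog source); each call opens a new TCP connection, and the abandoned Transport never has CloseIdleConnections() called on it, so its kept-alive socket stays ESTABLISHED:

package main

import (
	"io"
	"net/http"
)

// fetchCatalogLeaky is a hypothetical helper showing the leak: a brand-new
// Transport per call defeats Go's connection pooling entirely.
func fetchCatalogLeaky(url string) error {
	client := &http.Client{Transport: &http.Transport{}} // fresh pool every call
	resp, err := client.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	// Drain the body; the connection then sits in this Transport's idle
	// pool, where it is never reused and never closed.
	_, err = io.Copy(io.Discard, resp.Body)
	return err
}

func main() {
	// Each iteration leaves one more ESTABLISHED socket behind, mirroring
	// the netstat output above. (Stand-in URL for a runnable example.)
	for i := 0; i < 10; i++ {
		_ = fetchCatalogLeaky("http://example.com/")
	}
}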
From the controller-manager side I see:

[root@controller-manager-d5cqs /]# netstat -an
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 10.128.0.167:53502      172.30.24.188:1338      ESTABLISHED
tcp        0      0 10.128.0.167:53486      172.30.24.188:1338      ESTABLISHED
tcp        0      0 10.128.0.167:35434      172.30.0.1:443          ESTABLISHED
tcp        0      0 10.128.0.167:53232      172.30.24.188:1338      ESTABLISHED
tcp        0      0 10.128.0.167:53358      172.30.24.188:1338      ESTABLISHED
tcp        0      0 10.128.0.167:53362      172.30.24.188:1338      ESTABLISHED
tcp        0      0 10.128.0.167:53156      172.30.24.188:1338      ESTABLISHED

# oc describe service --all-namespaces | grep -w 172.30.24.188 -B8
Name:              asb
Namespace:         openshift-ansible-service-broker
Labels:            app=openshift-ansible-service-broker
                   service=asb
Annotations:       service.alpha.openshift.io/serving-cert-secret-name=asb-tls
                   service.alpha.openshift.io/serving-cert-signed-by=openshift-service-serving-signer@1536674889
Selector:          app=openshift-ansible-service-broker,service=asb
Type:              ClusterIP
IP:                172.30.24.188

Endlessly growing as well:
# netstat -an | wc -l
123
# netstat -an | wc -l
134
And so on.
Controller-manager side for the template service broker, without having performed any requests, so this is probably just relists:

[root@controller-manager-d5cqs /]# netstat -an | grep -w 172.30.68.208
tcp        0      0 10.128.0.167:48316      172.30.68.208:443       ESTABLISHED
tcp        0      0 10.128.0.167:55248      172.30.68.208:443       ESTABLISHED
tcp        0      0 10.128.0.167:40592      172.30.68.208:443       ESTABLISHED
tcp        0      0 10.128.0.167:44852      172.30.68.208:443       ESTABLISHED
tcp        0      0 10.128.0.167:34402      172.30.68.208:443       ESTABLISHED
tcp        0      0 10.128.0.167:51478      172.30.68.208:443       ESTABLISHED
tcp        0      0 10.128.0.167:59026      172.30.68.208:443       ESTABLISHED

# oc describe services -n openshift-template-service-broker
Name:              apiserver
Namespace:         openshift-template-service-broker
Labels:            <none>
Annotations:       kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"v1","kind":"Service","metadata":{"annotations":{"service.alpha.openshift.io/serving-cert-secret-name":"apiserver-serving-cert"},"name":"...
                   service.alpha.openshift.io/serving-cert-secret-name=apiserver-serving-cert
                   service.alpha.openshift.io/serving-cert-signed-by=openshift-service-serving-signer@1536674889
Selector:          apiserver=true
Type:              ClusterIP
IP:                172.30.68.208
Port:              <unset>  443/TCP
TargetPort:        8443/TCP
Endpoints:         10.128.0.13:8443
Session Affinity:  None
Events:            <none>
After restarting the template service broker to add the net-tools package to the image, I see similar behavior after running a few provisions:

Controller Manager:
[root@controller-manager-d5cqs /]# netstat -an | grep -w 172.30.68.208
tcp        0      0 10.128.0.167:36818      172.30.68.208:443       ESTABLISHED
tcp        0      0 10.128.0.167:37568      172.30.68.208:443       ESTABLISHED
tcp        0      0 10.128.0.167:36848      172.30.68.208:443       ESTABLISHED
tcp        0      0 10.128.0.167:38504      172.30.68.208:443       ESTABLISHED
tcp        0      0 10.128.0.167:37430      172.30.68.208:443       ESTABLISHED
tcp        0      0 10.128.0.167:38370      172.30.68.208:443       ESTABLISHED
tcp        0      0 10.128.0.167:38510      172.30.68.208:443       ESTABLISHED
tcp        0      0 10.128.0.167:36756      172.30.68.208:443       ESTABLISHED
tcp        0      0 10.128.0.167:37918      172.30.68.208:443       ESTABLISHED
tcp        0      0 10.128.0.167:37672      172.30.68.208:443       ESTABLISHED
tcp        0      0 10.128.0.167:38984      172.30.68.208:443       ESTABLISHED
tcp        0      0 10.128.0.167:37470      172.30.68.208:443       ESTABLISHED
tcp        0      0 10.128.0.167:36806      172.30.68.208:443       ESTABLISHED
tcp        0      0 10.128.0.167:37488      172.30.68.208:443       ESTABLISHED
tcp        0      0 10.128.0.167:38538      172.30.68.208:443       ESTABLISHED
tcp        0      0 10.128.0.167:37618      172.30.68.208:443       ESTABLISHED
tcp        0      0 10.128.0.167:36894      172.30.68.208:443       ESTABLISHED
tcp        0      0 10.128.0.167:38450      172.30.68.208:443       ESTABLISHED
tcp        0      0 10.128.0.167:38502      172.30.68.208:443       ESTABLISHED
tcp        0      0 10.128.0.167:38580      172.30.68.208:443       ESTABLISHED
tcp        0      0 10.128.0.167:38312      172.30.68.208:443       ESTABLISHED
tcp        0      0 10.128.0.167:36996      172.30.68.208:443       ESTABLISHED
tcp        0      0 10.128.0.167:37522      172.30.68.208:443       ESTABLISHED
tcp        0      0 10.128.0.167:37194      172.30.68.208:443       ESTABLISHED

TSB:
bash-4.2$ netstat -an | grep ESTABLISHED
tcp        0      0 10.128.1.37:36654       172.30.0.1:443          ESTABLISHED
tcp6       0      0 10.128.1.37:8443        10.128.0.1:36818        ESTABLISHED
tcp6       0      0 10.128.1.37:8443        10.128.0.1:37470        ESTABLISHED
tcp6       0      0 10.128.1.37:8443        10.128.0.1:38984        ESTABLISHED
tcp6       0      0 10.128.1.37:8443        10.128.0.1:37918        ESTABLISHED
tcp6       0      0 10.128.1.37:8443        10.128.0.1:37672        ESTABLISHED
tcp6       0      0 10.128.1.37:8443        10.128.0.1:38538        ESTABLISHED
tcp6       0      0 10.128.1.37:8443        10.128.0.1:38370        ESTABLISHED
tcp6       0      0 10.128.1.37:8443        10.128.0.1:38504        ESTABLISHED
tcp6       0      0 10.128.1.37:8443        10.128.0.1:37430        ESTABLISHED
tcp6       0      0 10.128.1.37:8443        10.128.0.1:38510        ESTABLISHED
tcp6       0      0 10.128.1.37:8443        10.128.0.1:36848        ESTABLISHED
tcp6       0      0 10.128.1.37:8443        10.128.0.1:38312        ESTABLISHED
tcp6       0      0 10.128.1.37:8443        10.128.0.1:36996        ESTABLISHED
tcp6       0      0 10.128.1.37:8443        10.128.0.1:36756        ESTABLISHED
tcp6       0      0 10.128.1.37:8443        10.128.0.1:37522        ESTABLISHED
tcp6       0      0 10.128.1.37:8443        10.128.0.1:37618        ESTABLISHED
tcp6       0      0 10.128.1.37:8443        10.128.0.1:37488        ESTABLISHED
tcp6       0      0 10.128.1.37:8443        10.128.0.1:36894        ESTABLISHED
tcp6       0      0 10.128.1.37:8443        10.128.0.1:38580        ESTABLISHED
tcp6       0      0 10.128.1.37:8443        10.128.0.1:38502        ESTABLISHED
tcp6       0      0 10.128.1.37:8443        10.128.0.1:37568        ESTABLISHED
tcp6       0      0 10.128.1.37:8443        10.128.0.1:37194        ESTABLISHED
tcp6       0      0 10.128.1.37:8443        10.128.0.1:38450        ESTABLISHED
tcp6       0      0 10.128.1.37:8443        10.128.0.1:36806        ESTABLISHED
The error you'll see from the controller-manager when things become unresponsive is:

W0912 13:10:38.790205       1 controller_clusterservicebroker.go:216] ClusterServiceBroker "ansible-service-broker" v566580: Error getting broker catalog: Get https://asb.openshift-ansible-service-broker.svc:1338/osb/v2/catalog: dial tcp 172.30.24.188:1338: connect: cannot assign requested address

"cannot assign requested address" on connect() means the client side has run out of local ephemeral ports, which is consistent with the leaked connections described above.
Hopefully fixed by https://github.com/pmorie/go-open-service-broker-client/pull/131, which would still require a new Service Catalog release.
We tested with this patch and, from what we could see, it never caused the number of connections to decrease.
We need to do more investigation into this issue, but there is general agreement that it is not a stop-ship issue; we should have a fix to address it immediately after the 3.11 release.
This issue will be fixed by PR https://github.com/kubernetes-incubator/service-catalog/pull/2337.

I've confirmed that connections keep piling up without the patch in said PR:

$ sudo nsenter -t $BROKER_PID --net netstat -n | grep ESTABLISHED | wc
     19     114    1520
$ svcat sync broker ups-broker
Synchronization requested for broker: ups-broker
// repeat above command a few times
$ sudo nsenter -t $BROKER_PID --net netstat -n | grep ESTABLISHED | wc
     23     138    1840

After I apply the patch, the service catalog controller keeps only a single connection open to the broker, no matter how many times I sync the broker:

$ sudo nsenter -t $BROKER_PID --net netstat -n | grep ESTABLISHED
tcp6       0      0 172.17.0.6:8080         172.17.0.4:59306        ESTABLISHED
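Conceptually, the fix amounts to the pattern below: build the HTTP client once and reuse it for every broker request, so the transport's pool keeps a single kept-alive connection. This is a minimal sketch under that assumption (names are illustrative; it is not the PR's actual diff):

package main

import (
	"io"
	"net/http"
	"time"
)

// One shared, long-lived client; its Transport pools connections, so
// repeated catalog syncs ride a single TCP socket.
var brokerClient = &http.Client{
	Transport: &http.Transport{
		MaxIdleConnsPerHost: 2,
		IdleConnTimeout:     90 * time.Second,
	},
	Timeout: 60 * time.Second,
}

func fetchCatalog(url string) error {
	resp, err := brokerClient.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	// Fully drain the body so the connection is returned to the idle pool
	// and reused by the next request.
	_, err = io.Copy(io.Discard, resp.Body)
	return err
}

func main() {
	// Ten "syncs" leave a single ESTABLISHED connection, matching the
	// post-patch netstat output above. (Stand-in URL.)
	for i := 0; i < 10; i++ {
		_ = fetchCatalog("http://example.com/")
	}
}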
Fixed in 3.11 by https://github.com/openshift/service-catalog/pull/29 and included in atomic-enterprise-service-catalog-3.11.29-1.
Test steps:

The version info:
[root@ip-172-18-0-56 ~]# oc exec controller-manager-x8jfr -- service-catalog --version
v3.11.36;Upstream:v0.1.35

1. Record the connection count for the service catalog controller and apiserver.

[root@ip-172-18-0-56 ~]# oc get svc -n kube-service-catalog
NAME                 TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
apiserver            ClusterIP   172.30.3.135   <none>        443/TCP   7h
controller-manager   ClusterIP   172.30.58.56   <none>        443/TCP   7h
[root@ip-172-18-0-56 ~]# netstat -an | grep -w 172.30.3.135
tcp        0      0 10.128.0.1:51940        172.30.3.135:443        ESTABLISHED
tcp        0      0 10.128.0.1:55834        172.30.3.135:443        ESTABLISHED
tcp        0      0 10.128.0.1:55842        172.30.3.135:443        ESTABLISHED
[root@ip-172-18-0-56 ~]# netstat -an | grep -w 172.30.58.56

2. Access the slave node to record the connection count of the ansible-service-broker.

[root@ip-172-18-3-151 ~]# ps -elf | grep broker
4 S 1000240+  1853  1837  0  80   0 - 115988 ep_pol 02:37 ?        00:00:10 asbd -c /etc/ansible-service-broker/config.yaml
[root@ip-172-18-3-151 ~]# nsenter -t 1853 --net netstat -n | grep ESTABLISHED
tcp        0      0 10.130.0.10:57394       172.30.0.1:443          ESTABLISHED
tcp6       0      0 10.130.0.10:1338        10.128.0.1:50876        ESTABLISHED
[root@ip-172-18-3-151 ~]# nsenter -t 1853 --net netstat -n | grep ESTABLISHED | wc
      2      12     160

3. On the master node, run the sync operation against the ansible-service-broker via the svcat tool ten times:

[root@ip-172-18-0-56 ~]# svcat sync broker ansible-service-broker --scope cluster
  synced clusterservicebroker (ansible-service-broker)
Synchronization requested for broker: ansible-service-broker
(identical output for each of the ten runs)

4. Perform some Provision/Deprovision operations.

5. Check the connections between the service catalog and the ansible-service-broker. The count is still 2. LGTM.
[root@ip-172-18-3-151 ~]# nsenter -t 1853 --net netstat -n | grep ESTABLISHED | wc
      2      12     160

And check the controller-manager and apiserver connection counts. No increase:

[root@ip-172-18-0-56 ~]# netstat -an | grep -w 172.30.58.56
[root@ip-172-18-0-56 ~]# netstat -an | grep -w 172.30.3.135
tcp        0      0 10.128.0.1:51940        172.30.3.135:443        ESTABLISHED
tcp        0      0 10.128.0.1:42680        172.30.3.135:443        ESTABLISHED
tcp        0      0 10.128.0.1:42676        172.30.3.135:443        ESTABLISHED
[root@ip-172-18-0-56 ~]# lsof -i:42680
COMMAND     PID USER   FD   TYPE  DEVICE SIZE/OFF NODE NAME
openshift 14846 root  177u  IPv4 1492771      0t0  TCP ip-10-128-0-1.ec2.internal:42680->apiserver.kube-service-catalog.svc.cluster.local:https (ESTABLISHED)
[root@ip-172-18-0-56 ~]# lsof -i:42676
COMMAND     PID USER   FD   TYPE  DEVICE SIZE/OFF NODE NAME
openshift 14846 root  173u  IPv4 1492763      0t0  TCP ip-10-128-0-1.ec2.internal:42676->apiserver.kube-service-catalog.svc.cluster.local:https (ESTABLISHED)
[root@ip-172-18-0-56 ~]# lsof -i:51940
COMMAND     PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
openshift 14846 root   99u  IPv4  895865      0t0  TCP ip-10-128-0-1.ec2.internal:51940->apiserver.kube-service-catalog.svc.cluster.local:https (ESTABLISHED)
[root@ip-172-18-0-56 ~]# ps aux | grep 14846
root     14846 12.1 12.2 1795564 917560 ?      Ssl  Nov01  58:12 openshift start master api --config=/etc/origin/master/master-config.yaml --loglevel=5

One more question: the connection count for the apiserver is 3. Is that expected? @Jay

PS: I know this question has nothing to do with this bug, so verifying this bug.
Jay, I created bug 1645465 for the 3.10 version. Correct me if I'm wrong.
> One more question: the connection count for the apiserver is 3. Is that expected?

I'm not surprised to see some connections open on the apiserver. PID 14846 looks to be the master API server, which is not surprising: the master API will be communicating with our API server on a regular basis, including readiness and liveness probes. As long as this number stays low and is not continuously increasing, I think it's fine.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:3537