Bug 1628235

Summary: Service Catalog establishes a new connection for every request and never closes them.
Product: OpenShift Container Platform Reporter: Jason Montleon <jmontleo>
Component: Service CatalogAssignee: Jay Boyd <jaboyd>
Status: CLOSED ERRATA QA Contact: Jian Zhang <jiazha>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 3.11.0CC: awestbro, chezhang, jaboyd, jmatthew, mluksa, spadgett, travi, zitang
Target Milestone: ---   
Target Release: 3.11.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-11-20 03:10:43 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1627473    

Description Jason Montleon 2018-09-12 13:53:05 UTC
Description of problem:
Every request to our broker causes a new connection to be establisted from the catalog. These never get closed, eventually exhausting resources and causing the broker to become unresponsive.

tcp6       0      0 10.128.1.168:1338       10.128.0.1:56106        ESTABLISHED
tcp6       0      0 10.128.1.168:1338       10.128.0.1:56958        ESTABLISHED
tcp6       0      0 10.128.1.168:1338       10.128.0.1:52064        ESTABLISHED
tcp6       0      0 10.128.1.168:1338       10.128.0.1:53778        ESTABLISHED
tcp6       0      0 10.128.1.168:1338       10.128.0.1:55298        ESTABLISHED
tcp6       0      0 10.128.1.168:1338       10.128.0.1:52136        ESTABLISHED
tcp6       0      0 10.128.1.168:1338       10.128.0.1:55194        ESTABLISHED
tcp6       0      0 10.128.1.168:1338       10.128.0.1:54974        ESTABLISHED
tcp6       0      0 10.128.1.168:1338       10.128.0.1:52404        ESTABLISHED
tcp6       0      0 10.128.1.168:1338       10.128.0.1:52050        ESTABLISHED

netstat -an | grep ESTABLISHED | grep 10.128.0.1 | wc -l
681

Version-Release number of selected component (if applicable):
rpm -q openshift-ansible 
openshift-ansible-3.11.0-0.32.0.git.0.b27b349.el7.noarch

oc exec -it -n kube-service-catalog controller-manager-cbrw7 /bin/bash
bash-4.2$ service-catalog --version
v3.11.0-0.28.0;Upstream:v0.1.30

oc exec -it -n openshift-ansible-service-broker    asb-1-977lr -- rpm -q ansible-service-broker
ansible-service-broker-1.3.16-1.el7.x86_64

How reproducible:
Seems always. The problem is especially evident when a bad APB is provisioned and deprovision requests end up endlessly being requested.

Steps to Reproduce:
1. Perform a task that spawns a request from the service catalog to broker.
2. Note the number of open connections
3. Perform another tasks that spawns a request.
4. Not the number of connections increases and so on.

Actual results:
new connections are endlessly established.

Expected results:
a single connection is reused or old connections are closed after the request is completed.

Additional info:

Comment 1 Jason Montleon 2018-09-12 14:19:49 UTC
From the controller manager side I see:

[root@controller-manager-d5cqs /]# netstat -an
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State      
tcp        0      0 10.128.0.167:53502      172.30.24.188:1338      ESTABLISHED
tcp        0      0 10.128.0.167:53486      172.30.24.188:1338      ESTABLISHED
tcp        0      0 10.128.0.167:35434      172.30.0.1:443          ESTABLISHED
tcp        0      0 10.128.0.167:53232      172.30.24.188:1338      ESTABLISHED
tcp        0      0 10.128.0.167:53358      172.30.24.188:1338      ESTABLISHED
tcp        0      0 10.128.0.167:53362      172.30.24.188:1338      ESTABLISHED
tcp        0      0 10.128.0.167:53156      172.30.24.188:1338      ESTABLISHED


oc describe service --all-namespaces | grep -w 172.30.24.188 -B8
Name:              asb
Namespace:         openshift-ansible-service-broker
Labels:            app=openshift-ansible-service-broker
                   service=asb
Annotations:       service.alpha.openshift.io/serving-cert-secret-name=asb-tls
                   service.alpha.openshift.io/serving-cert-signed-by=openshift-service-serving-signer@1536674889
Selector:          app=openshift-ansible-service-broker,service=asb
Type:              ClusterIP
IP:                172.30.24.188


Endlessly growing as well:
netstat -an | wc -l
123

# netstat -an | wc -l
134

And so on.

Comment 2 Jason Montleon 2018-09-12 15:06:58 UTC
controller-manager side for the template service broker without having performed any requests, so this is probably just relists:

[root@controller-manager-d5cqs /]# netstat -an | grep -w 172.30.68.208
tcp        0      0 10.128.0.167:48316      172.30.68.208:443       ESTABLISHED
tcp        0      0 10.128.0.167:55248      172.30.68.208:443       ESTABLISHED
tcp        0      0 10.128.0.167:40592      172.30.68.208:443       ESTABLISHED
tcp        0      0 10.128.0.167:44852      172.30.68.208:443       ESTABLISHED
tcp        0      0 10.128.0.167:34402      172.30.68.208:443       ESTABLISHED
tcp        0      0 10.128.0.167:51478      172.30.68.208:443       ESTABLISHED
tcp        0      0 10.128.0.167:59026      172.30.68.208:443       ESTABLISHED

oc describe services -n openshift-template-service-broker
Name:              apiserver
Namespace:         openshift-template-service-broker
Labels:            <none>
Annotations:       kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"v1","kind":"Service","metadata":{"annotations":{"service.alpha.openshift.io/serving-cert-secret-name":"apiserver-serving-cert"},"name":"...
                   service.alpha.openshift.io/serving-cert-secret-name=apiserver-serving-cert
                   service.alpha.openshift.io/serving-cert-signed-by=openshift-service-serving-signer@1536674889
Selector:          apiserver=true
Type:              ClusterIP
IP:                172.30.68.208
Port:              <unset>  443/TCP
TargetPort:        8443/TCP
Endpoints:         10.128.0.13:8443
Session Affinity:  None
Events:            <none>

Comment 3 Jason Montleon 2018-09-12 15:22:32 UTC
After restarting the template service broker to add the net-tools package to the image I see similar after running a few provisions:

Controller Manager:
[root@controller-manager-d5cqs /]# netstat -an | grep -w 172.30.68.208
tcp        0      0 10.128.0.167:36818      172.30.68.208:443       ESTABLISHED
tcp        0      0 10.128.0.167:37568      172.30.68.208:443       ESTABLISHED
tcp        0      0 10.128.0.167:36848      172.30.68.208:443       ESTABLISHED
tcp        0      0 10.128.0.167:38504      172.30.68.208:443       ESTABLISHED
tcp        0      0 10.128.0.167:37430      172.30.68.208:443       ESTABLISHED
tcp        0      0 10.128.0.167:38370      172.30.68.208:443       ESTABLISHED
tcp        0      0 10.128.0.167:38510      172.30.68.208:443       ESTABLISHED
tcp        0      0 10.128.0.167:36756      172.30.68.208:443       ESTABLISHED
tcp        0      0 10.128.0.167:37918      172.30.68.208:443       ESTABLISHED
tcp        0      0 10.128.0.167:37672      172.30.68.208:443       ESTABLISHED
tcp        0      0 10.128.0.167:38984      172.30.68.208:443       ESTABLISHED
tcp        0      0 10.128.0.167:37470      172.30.68.208:443       ESTABLISHED
tcp        0      0 10.128.0.167:36806      172.30.68.208:443       ESTABLISHED
tcp        0      0 10.128.0.167:37488      172.30.68.208:443       ESTABLISHED
tcp        0      0 10.128.0.167:38538      172.30.68.208:443       ESTABLISHED
tcp        0      0 10.128.0.167:37618      172.30.68.208:443       ESTABLISHED
tcp        0      0 10.128.0.167:36894      172.30.68.208:443       ESTABLISHED
tcp        0      0 10.128.0.167:38450      172.30.68.208:443       ESTABLISHED
tcp        0      0 10.128.0.167:38502      172.30.68.208:443       ESTABLISHED
tcp        0      0 10.128.0.167:38580      172.30.68.208:443       ESTABLISHED
tcp        0      0 10.128.0.167:38312      172.30.68.208:443       ESTABLISHED
tcp        0      0 10.128.0.167:36996      172.30.68.208:443       ESTABLISHED
tcp        0      0 10.128.0.167:37522      172.30.68.208:443       ESTABLISHED
tcp        0      0 10.128.0.167:37194      172.30.68.208:443       ESTABLISHED


TSB: 
bash-4.2$ netstat -an | grep ESTABLISHED
tcp        0      0 10.128.1.37:36654       172.30.0.1:443          ESTABLISHED
tcp6       0      0 10.128.1.37:8443        10.128.0.1:36818        ESTABLISHED
tcp6       0      0 10.128.1.37:8443        10.128.0.1:37470        ESTABLISHED
tcp6       0      0 10.128.1.37:8443        10.128.0.1:38984        ESTABLISHED
tcp6       0      0 10.128.1.37:8443        10.128.0.1:37918        ESTABLISHED
tcp6       0      0 10.128.1.37:8443        10.128.0.1:37672        ESTABLISHED
tcp6       0      0 10.128.1.37:8443        10.128.0.1:38538        ESTABLISHED
tcp6       0      0 10.128.1.37:8443        10.128.0.1:38370        ESTABLISHED
tcp6       0      0 10.128.1.37:8443        10.128.0.1:38504        ESTABLISHED
tcp6       0      0 10.128.1.37:8443        10.128.0.1:37430        ESTABLISHED
tcp6       0      0 10.128.1.37:8443        10.128.0.1:38510        ESTABLISHED
tcp6       0      0 10.128.1.37:8443        10.128.0.1:36848        ESTABLISHED
tcp6       0      0 10.128.1.37:8443        10.128.0.1:38312        ESTABLISHED
tcp6       0      0 10.128.1.37:8443        10.128.0.1:36996        ESTABLISHED
tcp6       0      0 10.128.1.37:8443        10.128.0.1:36756        ESTABLISHED
tcp6       0      0 10.128.1.37:8443        10.128.0.1:37522        ESTABLISHED
tcp6       0      0 10.128.1.37:8443        10.128.0.1:37618        ESTABLISHED
tcp6       0      0 10.128.1.37:8443        10.128.0.1:37488        ESTABLISHED
tcp6       0      0 10.128.1.37:8443        10.128.0.1:36894        ESTABLISHED
tcp6       0      0 10.128.1.37:8443        10.128.0.1:38580        ESTABLISHED
tcp6       0      0 10.128.1.37:8443        10.128.0.1:38502        ESTABLISHED
tcp6       0      0 10.128.1.37:8443        10.128.0.1:37568        ESTABLISHED
tcp6       0      0 10.128.1.37:8443        10.128.0.1:37194        ESTABLISHED
tcp6       0      0 10.128.1.37:8443        10.128.0.1:38450        ESTABLISHED
tcp6       0      0 10.128.1.37:8443        10.128.0.1:36806        ESTABLISHED

Comment 4 Jason Montleon 2018-09-12 15:54:39 UTC
The error you'll see from the controller-manager when things become unresponsive is:
W0912 13:10:38.790205       1 controller_clusterservicebroker.go:216] ClusterServiceBroker "ansible-service-broker" v566580: Error getting broker catalog: Get https://asb.openshift-ansible-service-broker.svc:1338/osb/v2/catalog: dial tcp 172.30.24.188:1338: connect: cannot assign requested address

Comment 6 Jay Boyd 2018-09-12 19:15:40 UTC
hopefully fixed by https://github.com/pmorie/go-open-service-broker-client/pull/131
which would still require a new Service Catalog release.

Comment 7 Jason Montleon 2018-09-12 20:04:09 UTC
We tested with this patch and from what we could see it didn't cause the number of connections to ever decrease.

Comment 8 Jay Boyd 2018-09-12 21:42:25 UTC
We need to do more investigation into this issue but there is general agreement this is not a stop shop issue but we should have a fix to address this immediately post the 3.11 release.

Comment 9 Marko Luksa 2018-09-13 11:32:33 UTC
This issue will be fixed with PR https://github.com/kubernetes-incubator/service-catalog/pull/2337

I've confirmed that connections keep piling up without the patch in said PR:

$ sudo nsenter -t $BROKER_PID --net netstat -n | grep ESTABLISHED | wc
     19     114    1520

$ svcat sync broker ups-broker
Synchronization requested for broker: ups-broker

// repeat above command a few times

$ sudo nsenter -t $BROKER_PID --net netstat -n | grep ESTABLISHED | wc
     23     138    1840


After I apply the patch, the service catalog controller keeps only a single connection open to the broker (no matter how many times I sync the broker):

$ sudo nsenter -t $BROKER_PID --net netstat -n | grep ESTABLISHED
tcp6       0      0 172.17.0.6:8080         172.17.0.4:59306        ESTABLISHED

Comment 14 Jay Boyd 2018-10-22 19:12:18 UTC
Fixed in 3.11 by https://github.com/openshift/service-catalog/pull/29 and included in atomic-enterprise-service-catalog-3.11.29-1

Comment 16 Jian Zhang 2018-11-02 09:28:19 UTC
Test steps:
The version info:
[root@ip-172-18-0-56 ~]# oc exec controller-manager-x8jfr -- service-catalog --version
v3.11.36;Upstream:v0.1.35

1, Record the connection num about the service catalog controller and apiserver.

[root@ip-172-18-0-56 ~]# oc get svc -n kube-service-catalog
NAME                 TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
apiserver            ClusterIP   172.30.3.135   <none>        443/TCP   7h
controller-manager   ClusterIP   172.30.58.56   <none>        443/TCP   7h
[root@ip-172-18-0-56 ~]# netstat -an|grep -w 172.30.3.135
tcp        0      0 10.128.0.1:51940        172.30.3.135:443        ESTABLISHED
tcp        0      0 10.128.0.1:55834        172.30.3.135:443        ESTABLISHED
tcp        0      0 10.128.0.1:55842        172.30.3.135:443        ESTABLISHED
[root@ip-172-18-0-56 ~]# netstat -an|grep -w 172.30.58.56

2, Access the slave node to record the connection number of the ansible-service-broker.

[root@ip-172-18-3-151 ~]# ps -elf | grep broker
4 S 1000240+  1853  1837  0  80   0 - 115988 ep_pol 02:37 ?       00:00:10 asbd -c /etc/ansible-service-broker/config.yaml
[root@ip-172-18-3-151 ~]# nsenter -t 1853 --net netstat -n | grep ESTABLISHED
tcp        0      0 10.130.0.10:57394       172.30.0.1:443          ESTABLISHED
tcp6       0      0 10.130.0.10:1338        10.128.0.1:50876        ESTABLISHED
[root@ip-172-18-3-151 ~]# nsenter -t 1853 --net netstat -n | grep ESTABLISHED | wc
      2      12     160

3, In the master node, do some sync operation with the ansible-service-broker via the svcat tool.
[root@ip-172-18-0-56 ~]# svcat sync broker ansible-service-broker --scope cluster
synced clusterservicebroker (ansible-service-broker)
Synchronization requested for broker: ansible-service-broker
[root@ip-172-18-0-56 ~]# svcat sync broker ansible-service-broker --scope cluster
synced clusterservicebroker (ansible-service-broker)
Synchronization requested for broker: ansible-service-broker
[root@ip-172-18-0-56 ~]# svcat sync broker ansible-service-broker --scope cluster
\synced clusterservicebroker (ansible-service-broker)
Synchronization requested for broker: ansible-service-broker
[root@ip-172-18-0-56 ~]# svcat sync broker ansible-service-broker --scope cluster
\synced clusterservicebroker (ansible-service-broker)
Synchronization requested for broker: ansible-service-broker
[root@ip-172-18-0-56 ~]# svcat sync broker ansible-service-broker --scope cluster
synced clusterservicebroker (ansible-service-broker)
Synchronization requested for broker: ansible-service-broker
[root@ip-172-18-0-56 ~]# svcat sync broker ansible-service-broker --scope cluster
\synced clusterservicebroker (ansible-service-broker)
Synchronization requested for broker: ansible-service-broker
[root@ip-172-18-0-56 ~]# svcat sync broker ansible-service-broker --scope cluster
synced clusterservicebroker (ansible-service-broker)
Synchronization requested for broker: ansible-service-broker
[root@ip-172-18-0-56 ~]# svcat sync broker ansible-service-broker --scope cluster
synced clusterservicebroker (ansible-service-broker)
Synchronization requested for broker: ansible-service-broker
[root@ip-172-18-0-56 ~]# svcat sync broker ansible-service-broker --scope cluster
synced clusterservicebroker (ansible-service-broker)
Synchronization requested for broker: ansible-service-broker
[root@ip-172-18-0-56 ~]# svcat sync broker ansible-service-broker --scope cluster
synced clusterservicebroker (ansible-service-broker)
Synchronization requested for broker: ansible-service-broker

4, Do some Provision/Deprovision opertions.
 
5, Check the connection between service-catalog and the ansible-service-broker. It's still the 2, LGTM.
[root@ip-172-18-3-151 ~]# nsenter -t 1853 --net netstat -n | grep ESTABLISHED | wc
      2      12     160


And, check the controller-manager and apiserver connection number. No increase.

[root@ip-172-18-0-56 ~]# netstat -an|grep -w 172.30.58.56

[root@ip-172-18-0-56 ~]# netstat -an|grep -w 172.30.3.135
tcp        0      0 10.128.0.1:51940        172.30.3.135:443        ESTABLISHED
tcp        0      0 10.128.0.1:42680        172.30.3.135:443        ESTABLISHED
tcp        0      0 10.128.0.1:42676        172.30.3.135:443        ESTABLISHED

[root@ip-172-18-0-56 ~]# lsof -i:42680
COMMAND     PID USER   FD   TYPE  DEVICE SIZE/OFF NODE NAME
openshift 14846 root  177u  IPv4 1492771      0t0  TCP ip-10-128-0-1.ec2.internal:42680->apiserver.kube-service-catalog.svc.cluster.local:https (ESTABLISHED)
[root@ip-172-18-0-56 ~]# lsof -i:42676
COMMAND     PID USER   FD   TYPE  DEVICE SIZE/OFF NODE NAME
openshift 14846 root  173u  IPv4 1492763      0t0  TCP ip-10-128-0-1.ec2.internal:42676->apiserver.kube-service-catalog.svc.cluster.local:https (ESTABLISHED)
[root@ip-172-18-0-56 ~]# lsof -i:51940
COMMAND     PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
openshift 14846 root   99u  IPv4 895865      0t0  TCP ip-10-128-0-1.ec2.internal:51940->apiserver.kube-service-catalog.svc.cluster.local:https (ESTABLISHED)
[root@ip-172-18-0-56 ~]# ps aux | grep 14846
root     14846 12.1 12.2 1795564 917560 ?      Ssl  Nov01  58:12 openshift start master api --config=/etc/origin/master/master-config.yaml --loglevel=5

One more question, the connection number of the apiserver is 3.  Is it as excepted? @Jay

PS: I know this issue has nothing with this bug, so verify this bug.

Comment 17 Jian Zhang 2018-11-02 09:56:02 UTC
Jay,

I created a bug 1645465 for 3.10 version. Correct me if I'm wrong.

Comment 18 Jay Boyd 2018-11-02 15:12:30 UTC
> One more question, the connection number of the apiserver is 3.  Is it as excepted?

I'm not surprised to see some connections open on the apiserver.  14846 looks to be from the master API server, not surprising.  Master API will be communicating with our API server on a regular basis including readiness & liveness probes.  As long as this number is low and not continuously increasing I think it's good.

Comment 20 errata-xmlrpc 2018-11-20 03:10:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3537