Bug 1641791
| Summary: | controller-manager pod stops responding with max memory usage and lot of open tcp sockets | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Jay Boyd <jaboyd> |
| Component: | Service Catalog | Assignee: | Jay Boyd <jaboyd> |
| Status: | CLOSED ERRATA | QA Contact: | Jian Zhang <jiazha> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.9.0 | CC: | chezhang, dyan, jfan, jiazha, suchaudh, zitang |
| Target Milestone: | --- | | |
| Target Release: | 3.10.z | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | Cause: The OSB Client Library used by the Service Catalog controller pod did not close and free the TCP connections used to communicate with Brokers. Consequence: Over time, many TCP connections remained open; eventually communication between the Service Catalog controller and Brokers failed, and the pod became unresponsive. Fix: Use a new version of the OSB Client Library that closes connections when an HTTP request finishes and frees idle connections (see the sketch after this table). | | |
| Story Points: | --- | | |
| Clone Of: | 1638726 | Environment: | |
| Last Closed: | 2018-12-13 17:09:08 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1638726, 1641796 | | |
| Bug Blocks: | | | |
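
To illustrate the class of fix described in the Doc Text, below is a minimal Go sketch using only the standard net/http package. It is not the actual OSB Client Library code; the broker URL and the `queryBroker` helper are placeholders. It shows the two remedies the fix relies on: closing the response body when an HTTP request finishes, and bounding/freeing idle connections on the transport.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"time"
)

// queryBroker issues a single request to a broker endpoint. Failing to drain
// and close resp.Body pins the underlying TCP connection, which is the kind
// of leak described in this bug.
func queryBroker(client *http.Client, url string) error {
	resp, err := client.Get(url)
	if err != nil {
		return err
	}
	// Fix, part 1: always drain and close the response body so the
	// connection can be reused or torn down instead of staying ESTABLISHED.
	defer resp.Body.Close()
	_, err = io.Copy(io.Discard, resp.Body)
	return err
}

func main() {
	// Fix, part 2: bound and expire idle connections on the transport.
	transport := &http.Transport{
		MaxIdleConns:    10,
		IdleConnTimeout: 30 * time.Second,
	}
	client := &http.Client{Transport: transport, Timeout: 30 * time.Second}

	// "broker.example.com" is a placeholder, not a real broker endpoint.
	if err := queryBroker(client, "https://broker.example.com/v2/catalog"); err != nil {
		fmt.Println("request failed:", err)
	}

	// Explicitly release any idle connections once the client is done
	// talking to the broker for a while.
	transport.CloseIdleConnections()
}
```

In the actual fix the same pattern is applied inside the OSB Client Library's request path, so each broker call releases its connection instead of accumulating ESTABLISHED sockets like those counted in the netstat output below.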
Comment 1
Jay Boyd
2018-10-22 19:08:32 UTC
*** Bug 1645465 has been marked as a duplicate of this bug. ***
Builds atomic-enterprise-service-catalog-3.10.56-1 and newer contain this fix.
LGTM, verified. Please see the details below.
The latest Service Catalog image:
registry.reg-aws.openshift.com:443/openshift3/ose-service-catalog:v3.10.83-1
The Service Catalog version:
[root@ip-172-18-9-32 ~]# oc exec controller-manager-qj2m5 -- service-catalog --version
v3.10.83;Upstream:v0.1.19
1. Record the number of connections to the service catalog controller and apiserver.
[root@ip-172-18-9-32 ~]# oc get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
apiserver ClusterIP 172.30.49.143 <none> 443/TCP 50m
controller-manager ClusterIP 172.30.88.52 <none> 443/TCP 49m
[root@ip-172-18-9-32 ~]# netstat -an|grep -w 172.30.49.143
tcp 0 0 10.128.0.1:55530 172.30.49.143:443 ESTABLISHED
[root@ip-172-18-9-32 ~]# netstat -an|grep -w 172.30.49.52
[root@ip-172-18-9-32 ~]#
2. Access the node running the ansible-service-broker and record its connection count.
[root@ip-172-18-0-29 ~]# ps -elf|grep broker
4 S 1000110+ 14118 14099 0 80 0 - 44393 futex_ Dec03 ? 00:00:41 asbd -c /etc/ansible-service-broker/config.yaml
0 S root 27251 26846 0 80 0 - 28171 pipe_w 03:14 pts/0 00:00:00 grep --color=auto broker
[root@ip-172-18-0-29 ~]# nsenter -t 14118 --net netstat -n | grep ESTABLISHED
tcp 0 0 10.129.0.3:41590 172.30.0.1:443 ESTABLISHED
tcp6 0 0 10.129.0.3:1338 10.128.0.1:49396 ESTABLISHED
[root@ip-172-18-0-29 ~]# nsenter -t 14118 --net netstat -n | grep ESTABLISHED|wc
2 12 160
3. Perform several APB provision/deprovision operations.
4. Run several sync operations against the ansible-service-broker with the svcat tool, as shown below:
[root@ip-172-18-9-32 ~]# svcat sync broker ansible-service-broker
Synchronization requested for broker: ansible-service-broker
[root@ip-172-18-9-32 ~]# svcat sync broker ansible-service-broker
Synchronization requested for broker: ansible-service-broker
[root@ip-172-18-9-32 ~]# svcat sync broker ansible-service-broker
Synchronization requested for broker: ansible-service-broker
[root@ip-172-18-9-32 ~]# svcat sync broker ansible-service-broker
Synchronization requested for broker: ansible-service-broker
...
5. Check the connections between the service catalog and the ansible-service-broker.
The number of established connections has increased.
[root@ip-172-18-0-29 ~]# nsenter -t 14118 --net netstat -n | grep ESTABLISHED|wc
11 66 880
[root@ip-172-18-0-29 ~]# nsenter -t 14118 --net netstat -n | grep ESTABLISHED
tcp 0 0 10.129.0.3:41590 172.30.0.1:443 ESTABLISHED
tcp 0 0 10.129.0.3:44894 23.5.234.71:443 ESTABLISHED
tcp6 0 0 10.129.0.3:1338 10.128.0.1:54738 ESTABLISHED
tcp6 0 0 10.129.0.3:1338 10.128.0.1:54714 ESTABLISHED
tcp6 0 0 10.129.0.3:1338 10.128.0.1:54634 ESTABLISHED
tcp6 0 0 10.129.0.3:1338 10.128.0.1:54688 ESTABLISHED
tcp6 0 0 10.129.0.3:1338 10.128.0.1:54702 ESTABLISHED
tcp6 0 0 10.129.0.3:1338 10.128.0.1:54628 ESTABLISHED
tcp6 0 0 10.129.0.3:1338 10.128.0.1:54668 ESTABLISHED
After a while, the number of connections drops back down. Looks good.
[root@ip-172-18-0-29 ~]# nsenter -t 14118 --net netstat -n | grep ESTABLISHED
tcp 0 0 10.129.0.3:41590 172.30.0.1:443 ESTABLISHED
[root@ip-172-18-0-29 ~]# nsenter -t 14118 --net netstat -n | grep ESTABLISHED|wc
1 6 80
[root@ip-172-18-0-29 ~]# nsenter -t 14118 --net netstat -n | grep ESTABLISHED|wc
2 12 160
[root@ip-172-18-0-29 ~]# nsenter -t 14118 --net netstat -n | grep ESTABLISHED
tcp 0 0 10.129.0.3:46348 23.5.234.71:443 ESTABLISHED
tcp 0 0 10.129.0.3:41590 172.30.0.1:443 ESTABLISHED
6. Check the number of connections between the controller-manager and the apiserver. The count stays low, and seeing 3 connections is not surprising, because the master API communicates with our API server on a regular basis, including readiness and liveness probes.
[root@ip-172-18-9-32 ~]# netstat -an|grep -w 172.30.49.143
tcp 0 0 10.128.0.1:34094 172.30.49.143:443 ESTABLISHED
tcp 0 0 10.128.0.1:55530 172.30.49.143:443 ESTABLISHED
tcp 0 0 10.128.0.1:34102 172.30.49.143:443 ESTABLISHED
[root@ip-172-18-9-32 ~]# netstat -an|grep -w 172.30.49.52
[root@ip-172-18-9-32 ~]#
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:3750