Bug 1641791 - controller-manager pod stops responding with max memory usage and lot of open tcp sockets
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Service Catalog
Version: 3.9.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 3.10.z
Assignee: Jay Boyd
QA Contact: Jian Zhang
URL:
Whiteboard:
Duplicates: 1645465
Depends On: 1638726 1641796
Blocks:
Reported: 2018-10-22 19:03 UTC by Jay Boyd
Modified: 2018-12-13 17:09 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: The OSB Client Library used by the Service Catalog controller pod was not closing and freeing the TCP connections it used to communicate with Brokers.
Consequence: Over time, many TCP connections remained open, and communication between the Service Catalog controller and Brokers eventually failed. The pod also became unresponsive.
Fix: Use a new version of the OSB Client Library, which contains a fix to close connections when an HTTP request finishes and to free idle connections.
Clone Of: 1638726
Environment:
Last Closed: 2018-12-13 17:09:08 UTC




Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2018:3750 None None None 2018-12-13 17:09:14 UTC

Comment 1 Jay Boyd 2018-10-22 19:08:32 UTC
Fixed in 3.10 by https://github.com/openshift/service-catalog/pull/28

Comment 2 Jay Boyd 2018-11-02 13:51:33 UTC
*** Bug 1645465 has been marked as a duplicate of this bug. ***

Comment 3 Jay Boyd 2018-11-02 17:23:31 UTC
Builds atomic-enterprise-service-catalog-3.10.56-1 and newer contain this fix.

Comment 9 Jian Zhang 2018-12-04 08:46:37 UTC
LGTM, verified. Please see below for details.

The latest image of the ServiceCatalog:
registry.reg-aws.openshift.com:443/openshift3/ose-service-catalog:v3.10.83-1
The version of the ServiceCatalog:
[root@ip-172-18-9-32 ~]# oc exec controller-manager-qj2m5 -- service-catalog --version
v3.10.83;Upstream:v0.1.19

1, Record the number of connections to the service catalog controller-manager and apiserver services.
[root@ip-172-18-9-32 ~]# oc get svc
NAME                 TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
apiserver            ClusterIP   172.30.49.143   <none>        443/TCP   50m
controller-manager   ClusterIP   172.30.88.52    <none>        443/TCP   49m
[root@ip-172-18-9-32 ~]# netstat -an|grep -w 172.30.49.143
tcp        0      0 10.128.0.1:55530        172.30.49.143:443       ESTABLISHED
[root@ip-172-18-9-32 ~]# netstat -an|grep -w 172.30.49.52
[root@ip-172-18-9-32 ~]# 

2, On the node running the ansible-service-broker, record the broker's connection count.
[root@ip-172-18-0-29 ~]# ps -elf|grep broker
4 S 1000110+  14118  14099  0  80   0 - 44393 futex_ Dec03 ?        00:00:41 asbd -c /etc/ansible-service-broker/config.yaml
0 S root      27251  26846  0  80   0 - 28171 pipe_w 03:14 pts/0    00:00:00 grep --color=auto broker

[root@ip-172-18-0-29 ~]# nsenter -t 14118 --net netstat -n | grep ESTABLISHED
tcp        0      0 10.129.0.3:41590        172.30.0.1:443          ESTABLISHED
tcp6       0      0 10.129.0.3:1338         10.128.0.1:49396        ESTABLISHED
[root@ip-172-18-0-29 ~]# nsenter -t 14118 --net netstat -n | grep ESTABLISHED|wc 
      2      12     160
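
The raw `wc` count above lumps every peer together. A minimal sketch of a per-peer breakdown follows, assuming `netstat -n` output in the column layout shown above (foreign address in column 5, state in the last column). The function name `count_established` is hypothetical and not part of any tool used in this report.

```shell
#!/bin/sh
# count_established (hypothetical helper): read `netstat -n` output on stdin
# and print "<count> <remote-ip>" for every peer that has ESTABLISHED
# connections, highest count first.
count_established() {
    awk '$NF == "ESTABLISHED" {
             peer = $5
             sub(/:[0-9]+$/, "", peer)   # strip the trailing :port
             n[peer]++
         }
         END { for (p in n) print n[p], p }' | sort -rn
}
```

It could be used as `nsenter -t 14118 --net netstat -n | count_established`, which makes it easier to see whether growth comes from the service catalog side (10.128.0.1 in this setup) or from other peers.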

3, Run some APB provision/deprovision operations.
4, Trigger several syncs with the ansible-service-broker using the svcat tool, for example:
[root@ip-172-18-9-32 ~]# svcat sync broker  ansible-service-broker 
Synchronization requested for broker: ansible-service-broker
[root@ip-172-18-9-32 ~]# svcat sync broker  ansible-service-broker 
Synchronization requested for broker: ansible-service-broker
[root@ip-172-18-9-32 ~]# svcat sync broker  ansible-service-broker 
Synchronization requested for broker: ansible-service-broker
[root@ip-172-18-9-32 ~]# svcat sync broker  ansible-service-broker 
Synchronization requested for broker: ansible-service-broker
...

5, Check the connections between service-catalog and the ansible-service-broker.
The count has increased.
[root@ip-172-18-0-29 ~]# nsenter -t 14118 --net netstat -n | grep ESTABLISHED|wc 
     11      66     880

[root@ip-172-18-0-29 ~]# nsenter -t 14118 --net netstat -n | grep ESTABLISHED
tcp        0      0 10.129.0.3:41590        172.30.0.1:443          ESTABLISHED
tcp        0      0 10.129.0.3:44894        23.5.234.71:443         ESTABLISHED
tcp6       0      0 10.129.0.3:1338         10.128.0.1:54738        ESTABLISHED
tcp6       0      0 10.129.0.3:1338         10.128.0.1:54714        ESTABLISHED
tcp6       0      0 10.129.0.3:1338         10.128.0.1:54634        ESTABLISHED
tcp6       0      0 10.129.0.3:1338         10.128.0.1:54688        ESTABLISHED
tcp6       0      0 10.129.0.3:1338         10.128.0.1:54702        ESTABLISHED
tcp6       0      0 10.129.0.3:1338         10.128.0.1:54628        ESTABLISHED
tcp6       0      0 10.129.0.3:1338         10.128.0.1:54668        ESTABLISHED

After a while, the count drops again. Looks good.
[root@ip-172-18-0-29 ~]# nsenter -t 14118 --net netstat -n | grep ESTABLISHED
tcp        0      0 10.129.0.3:41590        172.30.0.1:443          ESTABLISHED
[root@ip-172-18-0-29 ~]# nsenter -t 14118 --net netstat -n | grep ESTABLISHED|wc
      1       6      80
[root@ip-172-18-0-29 ~]# nsenter -t 14118 --net netstat -n | grep ESTABLISHED|wc
      2      12     160
[root@ip-172-18-0-29 ~]# nsenter -t 14118 --net netstat -n | grep ESTABLISHED
tcp        0      0 10.129.0.3:46348        23.5.234.71:443         ESTABLISHED
tcp        0      0 10.129.0.3:41590        172.30.0.1:443          ESTABLISHED


6, Check the connection counts for the controller-manager and apiserver. The counts are low, and the 3 apiserver connections are not surprising, because the master API communicates with our API server on a regular basis, including for readiness and liveness probes.
[root@ip-172-18-9-32 ~]# netstat -an|grep -w 172.30.49.143
tcp        0      0 10.128.0.1:34094        172.30.49.143:443       ESTABLISHED
tcp        0      0 10.128.0.1:55530        172.30.49.143:443       ESTABLISHED
tcp        0      0 10.128.0.1:34102        172.30.49.143:443       ESTABLISHED
[root@ip-172-18-9-32 ~]# netstat -an|grep -w 172.30.49.52
[root@ip-172-18-9-32 ~]#

Comment 11 errata-xmlrpc 2018-12-13 17:09:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3750

