Bug 1641791
Summary: | controller-manager pod stops responding with max memory usage and lot of open tcp sockets | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Jay Boyd <jaboyd> |
Component: | Service Catalog | Assignee: | Jay Boyd <jaboyd> |
Status: | CLOSED ERRATA | QA Contact: | Jian Zhang <jiazha> |
Severity: | medium | Docs Contact: | |
Priority: | unspecified | ||
Version: | 3.9.0 | CC: | chezhang, dyan, jfan, jiazha, suchaudh, zitang |
Target Milestone: | --- | ||
Target Release: | 3.10.z | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: |
Cause: The OSB Client Library used by the Service Catalog controller pod was not closing and freeing the TCP connections it used to communicate with Brokers.
Consequence: Over time, many TCP connections would remain open, and eventually communication between the Service Catalog controller and Brokers would fail. Additionally, the pod would become unresponsive.
Fix: Use a new version of the OSB Client Library that contains a fix to close connections when finishing an HTTP request and to free idle connections.
|
Story Points: | --- |
Clone Of: | 1638726 | Environment: | |
Last Closed: | 2018-12-13 17:09:08 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 1638726, 1641796 | ||
Bug Blocks: |
Comment 1
Jay Boyd
2018-10-22 19:08:32 UTC
*** Bug 1645465 has been marked as a duplicate of this bug. ***

Builds atomic-enterprise-service-catalog-3.10.56-1 and newer contain this fix. LGTM, verified it. Please see below for details.

The latest image of the Service Catalog: registry.reg-aws.openshift.com:443/openshift3/ose-service-catalog:v3.10.83-1

The version of the Service Catalog:

```
[root@ip-172-18-9-32 ~]# oc exec controller-manager-qj2m5 -- service-catalog --version
v3.10.83;Upstream:v0.1.19
```

1. Record the number of connections between the service catalog controller and the apiserver.

```
[root@ip-172-18-9-32 ~]# oc get svc
NAME                 TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
apiserver            ClusterIP   172.30.49.143   <none>        443/TCP   50m
controller-manager   ClusterIP   172.30.88.52    <none>        443/TCP   49m
[root@ip-172-18-9-32 ~]# netstat -an | grep -w 172.30.49.143
tcp        0      0 10.128.0.1:55530   172.30.49.143:443   ESTABLISHED
[root@ip-172-18-9-32 ~]# netstat -an | grep -w 172.30.49.52
[root@ip-172-18-9-32 ~]#
```

2. Access the slave node to record the number of connections of the ansible-service-broker.

```
[root@ip-172-18-0-29 ~]# ps -elf | grep broker
4 S 1000110+ 14118 14099  0  80   0 - 44393 futex_ Dec03 ?      00:00:41 asbd -c /etc/ansible-service-broker/config.yaml
0 S root     27251 26846  0  80   0 - 28171 pipe_w 03:14 pts/0  00:00:00 grep --color=auto broker
[root@ip-172-18-0-29 ~]# nsenter -t 14118 --net netstat -n | grep ESTABLISHED
tcp        0      0 10.129.0.3:41590   172.30.0.1:443     ESTABLISHED
tcp6       0      0 10.129.0.3:1338    10.128.0.1:49396   ESTABLISHED
[root@ip-172-18-0-29 ~]# nsenter -t 14118 --net netstat -n | grep ESTABLISHED | wc
      2      12     160
```

3. Perform some Provision/Deprovision APB operations.

4. Perform some sync operations against the ansible-service-broker via the svcat tool, like below:

```
[root@ip-172-18-9-32 ~]# svcat sync broker ansible-service-broker
Synchronization requested for broker: ansible-service-broker
[root@ip-172-18-9-32 ~]# svcat sync broker ansible-service-broker
Synchronization requested for broker: ansible-service-broker
[root@ip-172-18-9-32 ~]# svcat sync broker ansible-service-broker
Synchronization requested for broker: ansible-service-broker
[root@ip-172-18-9-32 ~]# svcat sync broker ansible-service-broker
Synchronization requested for broker: ansible-service-broker
...
```

5. Check the connections between service-catalog and the ansible-service-broker. We can see the number has increased.

```
[root@ip-172-18-0-29 ~]# nsenter -t 14118 --net netstat -n | grep ESTABLISHED | wc
     11      66     880
[root@ip-172-18-0-29 ~]# nsenter -t 14118 --net netstat -n | grep ESTABLISHED
tcp        0      0 10.129.0.3:41590   172.30.0.1:443     ESTABLISHED
tcp        0      0 10.129.0.3:44894   23.5.234.71:443    ESTABLISHED
tcp6       0      0 10.129.0.3:1338    10.128.0.1:54738   ESTABLISHED
tcp6       0      0 10.129.0.3:1338    10.128.0.1:54714   ESTABLISHED
tcp6       0      0 10.129.0.3:1338    10.128.0.1:54634   ESTABLISHED
tcp6       0      0 10.129.0.3:1338    10.128.0.1:54688   ESTABLISHED
tcp6       0      0 10.129.0.3:1338    10.128.0.1:54702   ESTABLISHED
tcp6       0      0 10.129.0.3:1338    10.128.0.1:54628   ESTABLISHED
tcp6       0      0 10.129.0.3:1338    10.128.0.1:54668   ESTABLISHED
```

After a while, we can see the number drops back down. Looks good.

```
[root@ip-172-18-0-29 ~]# nsenter -t 14118 --net netstat -n | grep ESTABLISHED
tcp        0      0 10.129.0.3:41590   172.30.0.1:443     ESTABLISHED
[root@ip-172-18-0-29 ~]# nsenter -t 14118 --net netstat -n | grep ESTABLISHED | wc
      1       6      80
[root@ip-172-18-0-29 ~]# nsenter -t 14118 --net netstat -n | grep ESTABLISHED | wc
      2      12     160
[root@ip-172-18-0-29 ~]# nsenter -t 14118 --net netstat -n | grep ESTABLISHED
tcp        0      0 10.129.0.3:46348   23.5.234.71:443    ESTABLISHED
tcp        0      0 10.129.0.3:41590   172.30.0.1:443     ESTABLISHED
```

6. Check the number of connections between the controller-manager and the apiserver. The number is low, and a count of 3 is not surprising, because the Master API communicates with our API server on a regular basis, including readiness and liveness probes.

```
[root@ip-172-18-9-32 ~]# netstat -an | grep -w 172.30.49.143
tcp        0      0 10.128.0.1:34094   172.30.49.143:443   ESTABLISHED
tcp        0      0 10.128.0.1:55530   172.30.49.143:443   ESTABLISHED
tcp        0      0 10.128.0.1:34102   172.30.49.143:443   ESTABLISHED
[root@ip-172-18-9-32 ~]# netstat -an | grep -w 172.30.49.52
[root@ip-172-18-9-32 ~]#
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3750