Bug 1638726
| Field | Value |
|---|---|
| Summary | controller-manager pod stops responding with max memory usage and lot of open tcp sockets |
| Product | OpenShift Container Platform |
| Reporter | Sudarshan Chaudhari <suchaudh> |
| Component | Service Catalog |
| Assignee | Jay Boyd <jaboyd> |
| Status | CLOSED ERRATA |
| QA Contact | Jian Zhang <jiazha> |
| Severity | medium |
| Docs Contact | |
| Priority | unspecified |
| Version | 3.9.0 |
| CC | dyan, jaboyd, jfan, jiazha, jlee, zitang |
| Target Milestone | --- |
| Target Release | 3.9.z |
| Hardware | Unspecified |
| OS | Unspecified |
| Whiteboard | |
| Fixed In Version | |
| Doc Type | Bug Fix |
Doc Text:
Cause: The OSB Client Library used by the Service Catalog controller pod was not closing and freeing the TCP connections it used to communicate with Brokers.
Consequence: Over time, many TCP connections remained open, and eventually communication between the Service Catalog controller and the Brokers failed. Additionally, the pod became unresponsive.
Fix: Use a new version of the OSB Client Library, which contains a fix to close connections when finishing an HTTP request and to free idle connections.
| Field | Value |
|---|---|
| Story Points | --- |
| Clone Of | |
| | 1641791 1641796 |
| Environment | |
| Last Closed | 2018-12-13 19:27:05 UTC |
| Type | Bug |
| Regression | --- |
| Mount Type | --- |
| Documentation | --- |
| CRM | |
| Verified Versions | |
| Category | --- |
| oVirt Team | --- |
| RHEL 7.3 requirements from Atomic Host | |
| Cloudforms Team | --- |
| Target Upstream Version | |
| Embargoed | |
| Bug Depends On | |
| Bug Blocks | 1641791, 1641796 |
| Attachments | |
Description
Sudarshan Chaudhari
2018-10-12 10:38:06 UTC
I will investigate backporting two fixes made to the go-open-service-broker package to 1.9 and 1.10: https://github.com/pmorie/go-open-service-broker-client/pull/131 and https://github.com/pmorie/go-open-service-broker-client/pull/132. There were other changes made in Service Catalog for reusing an established connection, but those changes will not be possible to backport to 1.9. The service broker client changes should address the connection and memory issues you are seeing.

Correction: not 1.9 & 1.10, I should have said 3.9 & 3.10.

Fixed by https://github.com/openshift/ose/pull/1439 in 3.9.48-1.

LGTM, verified. Details below:

[root@ip-172-18-2-26 ~]# oc exec controller-manager-xqjsz -- service-catalog --version
v0.1.9.1

1. Record the number of connections for the service catalog controller and apiserver.

[root@ip-172-18-11-165 ~]# oc get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
apiserver ClusterIP 172.30.42.121 <none> 443/TCP 1h
[root@ip-172-18-11-165 ~]# netstat -an|grep -w 172.30.42.121
tcp 0 0 10.129.0.1:52754 172.30.42.121:443 ESTABLISHED

2. Access the slave node to record the number of connections of the ansible-service-broker.

[root@ip-172-18-1-79 ~]# ps -elf |grep broker
4 S 1000100+ 59768 59753 0 80 0 - 57802 futex_ 00:00 ? 00:00:04 asbd -c /etc/ansible-service-broker/config.yaml
[root@ip-172-18-1-79 ~]# nsenter -t 59768 --net netstat -n | grep ESTABLISHED
tcp 0 0 10.128.0.15:38448 172.30.0.1:443 ESTABLISHED
tcp 0 0 10.128.0.15:37646 172.30.223.196:2379 ESTABLISHED

3. Sync the ansible-service-broker with the service catalog many times.
4. Do some provision/deprovision APB operations.
5. Check the connections between service-catalog and the ansible-service-broker.

[root@ip-172-18-1-79 ~]# nsenter -t 59768 --net netstat -n | grep ESTABLISHED
tcp 0 0 10.128.0.15:38448 172.30.0.1:443 ESTABLISHED
tcp 0 0 10.128.0.15:37646 172.30.223.196:2379 ESTABLISHED
tcp 0 0 10.128.0.15:45686 104.108.121.159:443 ESTABLISHED
tcp6 0 0 10.128.0.15:1338 10.129.0.1:49556 ESTABLISHED
tcp6 0 0 10.128.0.15:1338 10.129.0.1:49580 ESTABLISHED
tcp6 0 0 10.128.0.15:1338 10.129.0.1:49578 ESTABLISHED
tcp6 0 0 10.128.0.15:1338 10.129.0.1:49558 ESTABLISHED
tcp6 0 0 10.128.0.15:1338 10.129.0.1:49592 ESTABLISHED
tcp6 0 0 10.128.0.15:1338 10.129.0.1:49586 ESTABLISHED
tcp6 0 0 10.128.0.15:1338 10.129.0.1:49554 ESTABLISHED
tcp6 0 0 10.128.0.15:1338 10.129.0.1:49692 ESTABLISHED
tcp6 0 0 10.128.0.15:1338 10.129.0.1:49616 ESTABLISHED
tcp6 0 0 10.128.0.15:1338 10.129.0.1:49588 ESTABLISHED
tcp6 0 0 10.128.0.15:1338 10.129.0.1:49642 ESTABLISHED
tcp6 0 0 10.128.0.15:1338 10.129.0.1:49786 ESTABLISHED
tcp6 0 0 10.128.0.15:1338 10.129.0.1:49560 ESTABLISHED

After a while, we can see:

[root@ip-172-18-1-79 ~]# nsenter -t 59768 --net netstat -n | grep ESTABLISHED
tcp 0 0 10.128.0.15:38448 172.30.0.1:443 ESTABLISHED
tcp 0 0 10.128.0.15:37646 172.30.223.196:2379 ESTABLISHED
tcp6 0 0 10.128.0.15:1338 10.129.0.1:51756 ESTABLISHED

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3748
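
The verification above compares how many ESTABLISHED connections the broker holds right after the sync/provision operations with how many remain after a while. A minimal Python sketch of that tally follows; it is not part of the original report. The function name `count_established` is hypothetical, the sample text is copied from the netstat output above, and treating 1338 as the broker's listening port is an assumption read off that output.

```python
from collections import Counter

# netstat output captured right after the sync/provision operations (from the report).
RIGHT_AFTER = """\
tcp 0 0 10.128.0.15:38448 172.30.0.1:443 ESTABLISHED
tcp 0 0 10.128.0.15:37646 172.30.223.196:2379 ESTABLISHED
tcp 0 0 10.128.0.15:45686 104.108.121.159:443 ESTABLISHED
tcp6 0 0 10.128.0.15:1338 10.129.0.1:49556 ESTABLISHED
tcp6 0 0 10.128.0.15:1338 10.129.0.1:49580 ESTABLISHED
tcp6 0 0 10.128.0.15:1338 10.129.0.1:49578 ESTABLISHED
tcp6 0 0 10.128.0.15:1338 10.129.0.1:49558 ESTABLISHED
tcp6 0 0 10.128.0.15:1338 10.129.0.1:49592 ESTABLISHED
tcp6 0 0 10.128.0.15:1338 10.129.0.1:49586 ESTABLISHED
tcp6 0 0 10.128.0.15:1338 10.129.0.1:49554 ESTABLISHED
tcp6 0 0 10.128.0.15:1338 10.129.0.1:49692 ESTABLISHED
tcp6 0 0 10.128.0.15:1338 10.129.0.1:49616 ESTABLISHED
tcp6 0 0 10.128.0.15:1338 10.129.0.1:49588 ESTABLISHED
tcp6 0 0 10.128.0.15:1338 10.129.0.1:49642 ESTABLISHED
tcp6 0 0 10.128.0.15:1338 10.129.0.1:49786 ESTABLISHED
tcp6 0 0 10.128.0.15:1338 10.129.0.1:49560 ESTABLISHED
"""

# netstat output captured "after a while" (from the report).
AFTER_A_WHILE = """\
tcp 0 0 10.128.0.15:38448 172.30.0.1:443 ESTABLISHED
tcp 0 0 10.128.0.15:37646 172.30.223.196:2379 ESTABLISHED
tcp6 0 0 10.128.0.15:1338 10.129.0.1:51756 ESTABLISHED
"""


def count_established(netstat_output: str, local_port: int) -> Counter:
    """Count ESTABLISHED connections on local_port, grouped by remote IP."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: proto, recv-q, send-q, local addr, remote addr, state.
        if len(fields) < 6 or fields[5] != "ESTABLISHED":
            continue
        _, _, port = fields[3].rpartition(":")
        if port != str(local_port):
            continue
        remote_ip, _, _ = fields[4].rpartition(":")
        counts[remote_ip] += 1
    return counts


BROKER_PORT = 1338  # assumption: the port asbd listens on, inferred from the output

before = count_established(RIGHT_AFTER, BROKER_PORT)
after = count_established(AFTER_A_WHILE, BROKER_PORT)
print("right after operations:", dict(before))
print("after a while:", dict(after))
```

With the fixed build, the count for 10.129.0.1 drops from 13 to 1 between the two captures, which is the behaviour the verification relies on: connections opened during the operations are released rather than accumulating.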