Bug 1641791
| Summary: | controller-manager pod stops responding with max memory usage and lot of open tcp sockets | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Jay Boyd <jaboyd> |
| Component: | Service Catalog | Assignee: | Jay Boyd <jaboyd> |
| Status: | CLOSED ERRATA | QA Contact: | Jian Zhang <jiazha> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.9.0 | CC: | chezhang, dyan, jfan, jiazha, suchaudh, zitang |
| Target Milestone: | --- | | |
| Target Release: | 3.10.z | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | Cause: The OSB Client Library used by the Service Catalog controller pod did not close and free the TCP connections used to communicate with Brokers. Consequence: Over time, many TCP connections remained open; eventually communication between the Service Catalog controller and Brokers failed, and the pod became unresponsive. Fix: Use a new version of the OSB Client Library that closes connections when an HTTP request finishes and frees idle connections (see the sketch after this table). | | |
| Story Points: | --- | | |
| Clone Of: | 1638726 | Environment: | |
| Last Closed: | 2018-12-13 17:09:08 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1638726, 1641796 | | |
| Bug Blocks: | | | |
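
To illustrate the class of fix described in the Doc Text, below is a minimal Go sketch using only the standard net/http package. It is not the actual OSB Client Library code; the broker URL and the `queryBroker` helper are placeholders. It shows the two remedies the fix relies on: closing the response body when an HTTP request finishes, and bounding/freeing idle connections on the transport.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"time"
)

// queryBroker issues a single request to a broker endpoint. Failing to drain
// and close resp.Body pins the underlying TCP connection, which is the kind
// of leak described in this bug.
func queryBroker(client *http.Client, url string) error {
	resp, err := client.Get(url)
	if err != nil {
		return err
	}
	// Fix, part 1: always drain and close the response body so the
	// connection can be reused or torn down instead of staying ESTABLISHED.
	defer resp.Body.Close()
	_, err = io.Copy(io.Discard, resp.Body)
	return err
}

func main() {
	// Fix, part 2: bound and expire idle connections on the transport.
	transport := &http.Transport{
		MaxIdleConns:    10,
		IdleConnTimeout: 30 * time.Second,
	}
	client := &http.Client{Transport: transport, Timeout: 30 * time.Second}

	// "broker.example.com" is a placeholder, not a real broker endpoint.
	if err := queryBroker(client, "https://broker.example.com/v2/catalog"); err != nil {
		fmt.Println("request failed:", err)
	}

	// Explicitly release any idle connections once the client is done
	// talking to the broker for a while.
	transport.CloseIdleConnections()
}
```

In the actual fix the same pattern is applied inside the OSB Client Library's request path, so each broker call releases its connection instead of accumulating ESTABLISHED sockets like those counted in the netstat output below.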
Comment 1
Jay Boyd
2018-10-22 19:08:32 UTC
*** Bug 1645465 has been marked as a duplicate of this bug. ***
Builds atomic-enterprise-service-catalog-3.10.56-1 and newer contain this fix.
LGTM, verified. Please see the details below.
The latest Service Catalog image:
registry.reg-aws.openshift.com:443/openshift3/ose-service-catalog:v3.10.83-1
The Service Catalog version:
[root@ip-172-18-9-32 ~]# oc exec controller-manager-qj2m5 -- service-catalog --version
v3.10.83;Upstream:v0.1.19
1. Record the number of connections to the service catalog controller and apiserver.
[root@ip-172-18-9-32 ~]# oc get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
apiserver ClusterIP 172.30.49.143 <none> 443/TCP 50m
controller-manager ClusterIP 172.30.88.52 <none> 443/TCP 49m
[root@ip-172-18-9-32 ~]# netstat -an|grep -w 172.30.49.143
tcp 0 0 10.128.0.1:55530 172.30.49.143:443 ESTABLISHED
[root@ip-172-18-9-32 ~]# netstat -an|grep -w 172.30.49.52
[root@ip-172-18-9-32 ~]#
2. Access the node running the ansible-service-broker and record its connection count.
[root@ip-172-18-0-29 ~]# ps -elf|grep broker
4 S 1000110+ 14118 14099 0 80 0 - 44393 futex_ Dec03 ? 00:00:41 asbd -c /etc/ansible-service-broker/config.yaml
0 S root 27251 26846 0 80 0 - 28171 pipe_w 03:14 pts/0 00:00:00 grep --color=auto broker
[root@ip-172-18-0-29 ~]# nsenter -t 14118 --net netstat -n | grep ESTABLISHED
tcp 0 0 10.129.0.3:41590 172.30.0.1:443 ESTABLISHED
tcp6 0 0 10.129.0.3:1338 10.128.0.1:49396 ESTABLISHED
[root@ip-172-18-0-29 ~]# nsenter -t 14118 --net netstat -n | grep ESTABLISHED|wc
2 12 160
3. Perform several APB provision/deprovision operations.
4. Run several sync operations against the ansible-service-broker with the svcat tool, as shown below:
[root@ip-172-18-9-32 ~]# svcat sync broker ansible-service-broker
Synchronization requested for broker: ansible-service-broker
[root@ip-172-18-9-32 ~]# svcat sync broker ansible-service-broker
Synchronization requested for broker: ansible-service-broker
[root@ip-172-18-9-32 ~]# svcat sync broker ansible-service-broker
Synchronization requested for broker: ansible-service-broker
[root@ip-172-18-9-32 ~]# svcat sync broker ansible-service-broker
Synchronization requested for broker: ansible-service-broker
...
5. Check the connections between the service catalog and the ansible-service-broker.
The number of established connections has increased.
[root@ip-172-18-0-29 ~]# nsenter -t 14118 --net netstat -n | grep ESTABLISHED|wc
11 66 880
[root@ip-172-18-0-29 ~]# nsenter -t 14118 --net netstat -n | grep ESTABLISHED
tcp 0 0 10.129.0.3:41590 172.30.0.1:443 ESTABLISHED
tcp 0 0 10.129.0.3:44894 23.5.234.71:443 ESTABLISHED
tcp6 0 0 10.129.0.3:1338 10.128.0.1:54738 ESTABLISHED
tcp6 0 0 10.129.0.3:1338 10.128.0.1:54714 ESTABLISHED
tcp6 0 0 10.129.0.3:1338 10.128.0.1:54634 ESTABLISHED
tcp6 0 0 10.129.0.3:1338 10.128.0.1:54688 ESTABLISHED
tcp6 0 0 10.129.0.3:1338 10.128.0.1:54702 ESTABLISHED
tcp6 0 0 10.129.0.3:1338 10.128.0.1:54628 ESTABLISHED
tcp6 0 0 10.129.0.3:1338 10.128.0.1:54668 ESTABLISHED
After a while, the number of connections drops back down. Looks good.
[root@ip-172-18-0-29 ~]# nsenter -t 14118 --net netstat -n | grep ESTABLISHED
tcp 0 0 10.129.0.3:41590 172.30.0.1:443 ESTABLISHED
[root@ip-172-18-0-29 ~]# nsenter -t 14118 --net netstat -n | grep ESTABLISHED|wc
1 6 80
[root@ip-172-18-0-29 ~]# nsenter -t 14118 --net netstat -n | grep ESTABLISHED|wc
2 12 160
[root@ip-172-18-0-29 ~]# nsenter -t 14118 --net netstat -n | grep ESTABLISHED
tcp 0 0 10.129.0.3:46348 23.5.234.71:443 ESTABLISHED
tcp 0 0 10.129.0.3:41590 172.30.0.1:443 ESTABLISHED
6. Check the number of connections between the controller-manager and the apiserver. The count stays low, and seeing 3 connections is not surprising, because the master API communicates with our API server on a regular basis, including readiness and liveness probes.
[root@ip-172-18-9-32 ~]# netstat -an|grep -w 172.30.49.143
tcp 0 0 10.128.0.1:34094 172.30.49.143:443 ESTABLISHED
tcp 0 0 10.128.0.1:55530 172.30.49.143:443 ESTABLISHED
tcp 0 0 10.128.0.1:34102 172.30.49.143:443 ESTABLISHED
[root@ip-172-18-9-32 ~]# netstat -an|grep -w 172.30.49.52
[root@ip-172-18-9-32 ~]#
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:3750