Bug 1368641

Summary: Memory leak in swift used as backend to glance
Product: Red Hat OpenStack
Component: openstack-tripleo-heat-templates
Version: 8.0 (Liberty)
Reporter: Robin Cernin <rcernin>
Assignee: Christian Schwede (cschwede) <cschwede>
QA Contact: Arik Chernetsky <achernet>
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: high
Keywords: Triaged
Target Milestone: async
Target Release: 8.0 (Liberty)
Hardware: Unspecified
OS: Unspecified
Fixed In Version: openstack-tripleo-heat-templates-0.8.14-34.el7ost
Last Closed: 2018-09-25 08:57:40 UTC
Type: Bug
Clones: 1533948
Bug Blocks: 1533948
CC: aschultz, cschwede, derekh, egafford, ekuvaja, jmelvin, ltoscano, mandreou, markmc, mburns, pgrist, rhel-osp-director-maint, smykhail, srevivo, zaitcev

Description Robin Cernin 2016-08-20 09:44:28 UTC
Description of problem:

There appears to be a memory leak in Swift when it is used as the backend for Glance. When the connection is terminated during a transfer for some reason (possibly by HAProxy), the memory stays allocated in the swift-proxy process, which subsequently gets killed by the OOM killer.

Version-Release number of selected component (if applicable):

openstack-swift-2.5.0-2.el7ost.noarch
openstack-swift-account-2.5.0-2.el7ost.noarch
openstack-swift-container-2.5.0-2.el7ost.noarch
openstack-swift-object-2.5.0-2.el7ost.noarch
openstack-swift-plugin-swift3-1.9-1.el7ost.noarch
openstack-swift-proxy-2.5.0-2.el7ost.noarch
python-swiftclient-2.6.0-1.el7ost.noarch

openstack-glance-11.0.1-4.el7ost.noarch
python-glance-11.0.1-4.el7ost.noarch
python-glance-store-0.9.1-3.el7ost.noarch
python-glanceclient-1.1.1-1.el7ost.noarch

How reproducible:

glance -d image-download 03d98224-66ba-11e6-8b77-86f30ca893d3 --file edge.iso --progress
DEBUG:keystoneclient.session:REQ: curl -g -i -X GET http://192.168.0.5:5000/v2.0 -H "Accept: application/json" -H "User-Agent: python-keystoneclient"
INFO:requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): 192.168.0.5
DEBUG:requests.packages.urllib3.connectionpool:"GET /v2.0 HTTP/1.1" 200 335
DEBUG:keystoneclient.session:RESP: [200] Vary: X-Auth-Token Content-Type: application/json Content-Length: 335 X-Openstack-Request-Id: req-0f622e02-66ba-11e6-8b77-86f30ca893d3 Date: Sat, 20 Aug 2016 03:37:49 GMT
RESP BODY: {"version": {"status": "stable", "updated": "2014-04-17T00:00:00Z", "media-types": [{"base": "application/json", "type": "application/vnd.openstack.identity-v2.0+json"}], "id": "v2.0", "links": [{"href": "http://192.168.0.5:5000/v2.0/", "rel": "self"}, {"href": "http://docs.openstack.org/", "type": "text/html", "rel": "describedby"}]}}

DEBUG:keystoneclient.auth.identity.v2:Making authentication request to http://192.168.0.5:5000/v2.0/tokens
DEBUG:requests.packages.urllib3.connectionpool:"POST /v2.0/tokens HTTP/1.1" 200 3423
DEBUG:keystoneclient.session:REQ: curl -g -i -X GET http://192.168.0.5:9292/v2/images/03d98224-66ba-11e6-8b77-86f30ca893d3/file -H "User-Agent: python-glanceclient" -H "Content-Type: application/octet-stream" -H "X-Auth-Token: xxxxxxxxxxxxxxx"
INFO:requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): 192.168.0.5
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/glanceclient/shell.py", line 668, in main
    OpenStackImagesShell().main(argv)
  File "/usr/lib/python2.7/site-packages/glanceclient/shell.py", line 599, in main
    args.func(client, args)
  File "/usr/lib/python2.7/site-packages/glanceclient/v2/shell.py", line 282, in do_image_download
    body = gc.images.data(args.id)
  File "/usr/lib/python2.7/site-packages/glanceclient/v2/images.py", line 195, in data
    resp, body = self.http_client.get(url)
  File "/usr/lib/python2.7/site-packages/keystoneclient/adapter.py", line 176, in get
    return self.request(url, 'GET', **kwargs)
  File "/usr/lib/python2.7/site-packages/glanceclient/common/http.py", line 343, in request
    raise exc.CommunicationError(message=message)
CommunicationError: Error finding address for http://192.168.0.5:9292/v2/images/03d98224-66ba-11e6-8b77-86f30ca893d3/file: Unable to establish connection to http://192.168.0.5:9292/v2/images/03d98224-66ba-11e6-8b77-86f30ca893d3/file
Error finding address for http://192.168.0.5:9292/v2/images/03d98224-66ba-11e6-8b77-86f30ca893d3/file: Unable to establish connection to http://192.168.0.5:9292/v2/images/03d98224-66ba-11e6-8b77-86f30ca893d3/file

Right around the time the glance client errored out, the second sub-process of swift-proxy started to grow:
[root@overcloud-controller-0 swift]# ps awfux|grep swift-proxy
swift    13084  1.0  0.0 302536 35248 ?        Ss   03:36   0:01 /usr/bin/python2 /usr/bin/swift-proxy-server /etc/swift/proxy-server.conf
swift    13160 35.3 20.0 13699376 13196652 ?   S    03:36   0:48  \_ /usr/bin/python2 /usr/bin/swift-proxy-server /etc/swift/proxy-server.conf
swift    13167  3.9  2.0 1608432 1331140 ?     R    03:36   0:05  \_ /usr/bin/python2 /usr/bin/swift-proxy-server /etc/swift/proxy-server.conf

and here is where it settled:
[root@overcloud-controller-0 swift]# ps awfux|grep swift-proxy
swift    13084  0.7  0.0 302536 31448 ?        Ss   03:36   0:04 /usr/bin/python2 /usr/bin/swift-proxy-server /etc/swift/proxy-server.conf
swift    26553 58.8 52.5 34980264 34553208 ?   S    03:41   3:06  \_ /usr/bin/python2 /usr/bin/swift-proxy-server /etc/swift/proxy-server.conf
swift      455  0.0  0.0 302836 31556 ?        S    03:43   0:00  \_ /usr/bin/python2 /usr/bin/swift-proxy-server /etc/swift/proxy-server.conf
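
For reference, a quick way to watch that growth live while reproducing (the same ps invocation as above; the bracketed "s" only keeps grep from matching itself):

watch -n 5 'ps awfux | grep "[s]wift-proxy"'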

Comment 6 Christian Schwede (cschwede) 2016-08-22 06:23:21 UTC
I'm pretty sure this is related to a missing proxy-logging middleware at the beginning of the pipeline in /etc/swift/proxy-server.conf. Could you share that file here?
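
A quick way to pull out just the relevant line for comparison (plain grep on the path above):

grep '^pipeline' /etc/swift/proxy-server.conf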

This issue was reported earlier in CI:

https://bugs.launchpad.net/tripleo/+bug/1595916

And a fix was proposed too:

https://review.openstack.org/#/c/340389/

I was not able to reproduce this with a Swift-only installation.

Comment 8 Christian Schwede (cschwede) 2016-08-22 06:46:03 UTC
Thanks for sharing! You need the proxy-logging middleware twice: once at the beginning and once at the end. The correct pipeline would look like this:

pipeline = catch_errors healthcheck proxy-logging cache ratelimit tempurl formpost authtoken keystone staticweb proxy-logging proxy-server

Some background on this: if an error happens before the end of the pipeline, it will not be properly logged if there is no proxy-logging at the beginning of the pipeline.
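
For illustration, the relevant parts of /etc/swift/proxy-server.conf would then look roughly like this (a sketch; the single [filter:proxy-logging] section backs both occurrences in the pipeline):

[pipeline:main]
pipeline = catch_errors healthcheck proxy-logging cache ratelimit tempurl formpost authtoken keystone staticweb proxy-logging proxy-server

[filter:proxy-logging]
use = egg:swift#proxy_logging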

The fix for director is merged upstream:

https://review.openstack.org/#/c/340389/1/puppet/hieradata/controller.yaml

I think we need a backport for this.

Comment 9 Mark McLoughlin 2016-08-22 06:56:39 UTC
Moving to openstack-tripleo-heat-templates

Comment 14 Paul Grist 2016-08-26 17:57:13 UTC
Back to Christian to backport the change for OSP8.

>>>>>>
Christian's reply:

Please don't remove the proxy-logging middleware, but add it a second time at the beginning of the pipeline, after healthcheck - removing it will most likely break logging for the customer (please see also my comment https://bugzilla.redhat.com/show_bug.cgi?id=1368641#c8).

FTR - I already had a quick look into this, and the issue goes like this: some non-critical warning is logged by one of the middlewares before the end of the pipeline (where proxy-logging sits in OSP7/8), and that breaks because logging is not yet set up at that point. Therefore you need a second proxy-logging middleware early in the pipeline.

I will have a closer look at the issue itself next week; in any case, the correct way is to always define proxy-logging twice in the pipeline - that's how Swift has had to be configured for a very long time.
<<<<<
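
To make the change concrete, a before/after sketch of the pipeline line (the "before" is assumed to be comment 8's corrected pipeline minus the first proxy-logging entry):

# Before (proxy-logging only at the end; warnings logged earlier in the pipeline break):
pipeline = catch_errors healthcheck cache ratelimit tempurl formpost authtoken keystone staticweb proxy-logging proxy-server

# After (second proxy-logging added right after healthcheck):
pipeline = catch_errors healthcheck proxy-logging cache ratelimit tempurl formpost authtoken keystone staticweb proxy-logging proxy-server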