Bug 759200
| Summary: | Memory leak in python-psycopg2 | ||
|---|---|---|---|
| Product: | Red Hat Enterprise MRG | Reporter: | Trevor McKay <tmckay> |
| Component: | cumin | Assignee: | Trevor McKay <tmckay> |
| Status: | CLOSED ERRATA | QA Contact: | Stanislav Graf <sgraf> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | Development | CC: | iboverma, jross, matt, mkudlej, sgraf |
| Target Milestone: | 2.1.1 | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | python-psycopg2-2.0.14-3.el5 | Doc Type: | Bug Fix |
| Doc Text: |
Cause
A bug in python-pyscopg2-2.0.14 causes a reference leak when cumin-data updates objects in the database. Specifically, the bug manifests when cursor.mogrify(operations, params) or cursor.execute(operations, params) is called and the 'operations' string references the same value from 'params' more than once.
Consequence
Long-running instances of cumin can leak significant amounts of memory.
Fix
The _mogrify() routine in cursor_type.c was patched in python-pscyopg2-2.0.14-3.
Result
This reference leak has been repaired and no longer is an issue when cumin-data updates obejcts.
|
Story Points: | --- |
| Clone Of: | Environment: | ||
| Last Closed: | 2012-02-06 18:18:25 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 765607 | ||
| Attachments: | |||
|
Description
Trevor McKay
2011-12-01 16:35:13 UTC
The cursor.mogrify(operation, parameters) method in psycopg2 produces query strings from format/value pairs. The mogrify code is called from cursor.execute(operation, parameters) to produce a sql query before execution. Cumin-data uses this mechanism to fill in "template" queries with values from particular objects.
It appears that cursor.mogrify() will leak a reference to a value if that value is referred to more than once in the format string. For example:
cursor.mogrify("%(a)s %(a)s", {"a": "foo"})
will incorrectly increment the reference count for the string "foo".
When an object is updated, cumin-data updates objects and sample data in the same sql operation string (an update and an insert, separated by a semicolon) passed to cursor.execute(). So whenever an object column is written through to the database and is also tracked in sample data, the value for that column is leaked because the format string contains multiple references to the value.
Workaround:
It should be possible to work around this leak in cumin-data by issuing the update and insert operations for object data and sample data as two separate operations and therefore two calls to mogrify() (but this still will require only a single commit). This should avoid the same value being referenced more than once in the format string.
Ultimately, this needs to be fixed in pyscopg2 (will be submitting a bug against it).
See attachment for a reproducer script that shows the bogus reference count. The reproducer requires a running Postgresql server (the script assumes the "cumin" database and user).
Created attachment 539925 [details]
Script that shows reference count problem with cursor.mogrify()
Illustrates the problem with cursor.mogrify().
Output of the test script. Note in the last fragment that the reference count will be incremented incorrectly for EACH reference beyond the first! b is referenced 3 times, so the count is incremented twice.
[tmckay@tmckaylt mint]$ ./psycopg_leak.py
sql is %(a)s %(b)s
vals {'a': 'unique string', 'b': 'some other string'}
reference count for vals[a] = 6
reference count for vals[b] = 6
after call to cursor.mogrify(sql, vals)...
reference count for vals[a] = 6
reference count for vals[b] = 6
sql is %(a)s %(a)s
vals {'a': 'unique string', 'b': 'some other string'}
reference count for vals[a] = 6
reference count for vals[b] = 6
after call to cursor.mogrify(sql, vals)...
reference count for vals[a] = 7
reference count for vals[b] = 6
sql is %(b)s %(b)s
vals {'a': 'unique string', 'b': 'some other string'}
reference count for vals[a] = 7
reference count for vals[b] = 6
after call to cursor.mogrify(sql, vals)...
reference count for vals[a] = 7
reference count for vals[b] = 7
sql is %(b)s %(b)s %(b)s
vals {'a': 'unique string', 'b': 'some other string'}
reference count for vals[a] = 7
reference count for vals[b] = 7
after call to cursor.mogrify(sql, vals)...
reference count for vals[a] = 7
reference count for vals[b] = 9
hmmm, we package psycopg2 for RHEL (I think jross has built it). Maybe we should attempt a patch ourselves, coincident with submitting a bug to upstream. Here is a link to the psycopg ticket http://psycopg.lighthouseapp.com/projects/62710/tickets/81-_mogrify-leaks-reference-when-same-dict-value-referenced-multiple-times-with Created attachment 541037 [details]
Modify update.py to use two operations instead of one to work around leak
Running this patch as a test. So far so good. This is an alternative to a patched psycopg2. I need to run a scale test to see what the impact is of an extra operation.
Note, archives of notices on releases can be found here: http://initd.org/psycopg/articles/ Here's a summary of changes from Python 2.0 to 2.4.2 (currrent version, jumped from 2.0 straight to 2.2, not sure what happened to 2.1) after reading through posts. Sounds to me like we could benefit in general from moving forward to 2.4.x in the near future -- a few leaks have been fixed, and performance in general has been improved. As RHEL moves forward in the future and uses newer versions of Postgres, it makes sense to update pyscopg2 as well. However, aside from maybe having some more reference leaks we haven't discovered yet, there does not seem to be anything requiring us to move to a newer version at this time. A patched 2.0.x would work for us. two phase commits a reference leak or two (or three) better handling of a variety of date/time objects better handling for decimals optimizations for fewer copies, superfluous operations fix leak related to libpq (part of postgresql-devel and/or postgresql-debuginfo) doc updates asynchronous queries support for Postgresql 8.4 and 9.0 That was psycopg 2.0 to 2.4.2, not Python (In reply to comment #7) > > Here's a summary of changes from Python 2.0 to 2.4.2 (currrent version, jumped > from 2.0 straight to 2.2, not sure what happened to 2.1) after reading through > posts. fyi, the patch to cumin-data appears to have taken care of the leak. There may still be a slow leak somewhere else, however -- hard to tell whether the current increase is do to normal operation and changing grid landscape or a real leak. Will run on another machine with latest psycopg2 release (there have been other leak fixes since 2.0.13) Waiting for a response from upstream on the psycopg patch for the mogrify issue. Abbreviated ps output, with RSS (unit is k) 12/5/2011 at 12:30 pm 34508 /usr/bin/cumin-data --section=data.grid --daemon --heapy 36832 /usr/bin/cumin-data --section=data.grid-slots --daemon --heapy 101904 /usr/bin/cumin-data --section=data.grid-submissions --daemon --heapy 32300 /usr/bin/cumin-data --section=data.sesame --daemon --heapy Output on 12/6/2011 at about 10am. 34580 /usr/bin/cumin-data --section=data.grid --daemon --heapy 36964 /usr/bin/cumin-data --section=data.grid-slots --daemon --heapy 105428 /usr/bin/cumin-data --section=data.grid-submissions --daemon --heapy 32308 /usr/bin/cumin-data --section=data.sesame --daemon --heapy Based on cumin-data stats output, 2 queries vs 1 query does not make a significant difference. Still looks like 0.009-0.010 seconds to process an update. This seems reasonable, since the difference is processing "update; insert" versus "update" followed by "insert" -- Postgresql I'm guessing treats the first as two separate queries anyway. The difference would be in the overhead in psycopg2 in that case. I would say that splitting the query is a viable patch for the leak that was found. Update on cumin-data leakage (with psycopg2 patched) New numbers on 12/8/2011, a few scale tests with lots of submissions and creation/deletion of groups of slots on afternoon of 12/7 39556 /usr/bin/cumin-data --section=data.grid --daemon --heapy 129644 /usr/bin/cumin-data --section=data.grid-slots --daemon --heapy 112624 /usr/bin/cumin-data --section=data.grid-submissions --daemon --heapy 37504 /usr/bin/cumin-data --section=data.sesame --daemon --heapy Continue to investigate whether 2.4.2 will give better numbers... fyi, we will attempt to correct this with a patched version of psycopg2 Created attachment 545841 [details]
Patch applied to cursor_type.c in python-psycopg2-2.0.14-3
Changed the check for an existing value to use PyDict_Contains() instead of creating a new temporary object with PyObject_GetItem(), this removes the "item" variable from the code.
Added the reference decrement for "value" when the the value is already in the temporary dictionary from a previous reference.
Comment on attachment 541037 [details]
Modify update.py to use two operations instead of one to work around leak
This alternative is no longer needed given the patch to psycopg2
Technical note added. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.
New Contents:
Cause
A bug in python-pyscopg2-2.0.14 causes a reference leak when cumin-data updates objects in the database. Specifically, the bug manifests when cursor.mogrify(operations, params) or cursor.execute(operations, params) is called and the 'operations' string references the same value from 'params' more than once.
Consequence
Long-running instances of cumin can leak significant amounts of memory.
Fix
The _mogrify() routine in cursor_type.c was patched in python-pscyopg2-2.0.14-3.
Result
This reference leak has been repaired and no longer is an issue when cumin-data updates obejcts.
Reproduced:
RHEL 5.7 i386/x86_64, MRG 2.1.0
cumin-0.1.5098-2.el5
python-psycopg2-2.0.14-2.el5
sql is %(a)s %(b)s
vals {'a': 'unique string', 'b': 'some other string'}
reference count for vals[a] = 5
reference count for vals[b] = 5
after call to cursor.mogrify(sql, vals)...
reference count for vals[a] = 5
reference count for vals[b] = 5
sql is %(a)s %(a)s
vals {'a': 'unique string', 'b': 'some other string'}
reference count for vals[a] = 5
reference count for vals[b] = 5
after call to cursor.mogrify(sql, vals)...
reference count for vals[a] = 6
reference count for vals[b] = 5
sql is %(b)s %(b)s
vals {'a': 'unique string', 'b': 'some other string'}
reference count for vals[a] = 6
reference count for vals[b] = 5
after call to cursor.mogrify(sql, vals)...
reference count for vals[a] = 6
reference count for vals[b] = 6
sql is %(b)s %(b)s %(b)s
vals {'a': 'unique string', 'b': 'some other string'}
reference count for vals[a] = 6
reference count for vals[b] = 6
after call to cursor.mogrify(sql, vals)...
reference count for vals[a] = 6
reference count for vals[b] = 8
Verified:
RHEL 5.7 i386/x86_64, MRG 2.1.x
cumin-0.1.5180-1.el5
python-psycopg2-2.0.14-3.el5
sql is %(a)s %(b)s
vals {'a': 'unique string', 'b': 'some other string'}
reference count for vals[a] = 5
reference count for vals[b] = 5
after call to cursor.mogrify(sql, vals)...
reference count for vals[a] = 5
reference count for vals[b] = 5
sql is %(a)s %(a)s
vals {'a': 'unique string', 'b': 'some other string'}
reference count for vals[a] = 5
reference count for vals[b] = 5
after call to cursor.mogrify(sql, vals)...
reference count for vals[a] = 5
reference count for vals[b] = 5
sql is %(b)s %(b)s
vals {'a': 'unique string', 'b': 'some other string'}
reference count for vals[a] = 5
reference count for vals[b] = 5
after call to cursor.mogrify(sql, vals)...
reference count for vals[a] = 5
reference count for vals[b] = 5
sql is %(b)s %(b)s %(b)s
vals {'a': 'unique string', 'b': 'some other string'}
reference count for vals[a] = 5
reference count for vals[b] = 5
after call to cursor.mogrify(sql, vals)...
reference count for vals[a] = 5
reference count for vals[b] = 5
VERIFIED
RHEL 6 part is tracked in Bug 765998
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2012-0100.html |