Description of problem:

A Python application using lxml/python-lxml causes memory leaks and intermittent segfaults. Following are the error reports from valgrind. It reports two separate occurrences of invalid memory reads: one when freeing an XML property, and another when freeing a node.

==14402== Invalid read of size 8
==14402==    at 0x35C7A54AED: xmlFreeProp (tree.c:2032)
==14402==    by 0x35C7A54DD8: xmlFreePropList (tree.c:2016)
==14402==    by 0x35C7A54462: xmlFreeNodeList (tree.c:3617)
==14402==    by 0x35C7A54442: xmlFreeNodeList (tree.c:3612)
==14402==    by 0x35C7A54442: xmlFreeNodeList (tree.c:3612)
==14402==    by 0x35C7A54442: xmlFreeNodeList (tree.c:3612)
==14402==    by 0x35C7A54442: xmlFreeNodeList (tree.c:3612)
==14402==    by 0x35C7A54442: xmlFreeNodeList (tree.c:3612)
==14402==    by 0x35C7A54297: xmlFreeDoc (tree.c:1224)
==14402==    by 0x10AEBA08: __pyx_tp_dealloc_4lxml_5etree__Document (lxml.etree.c:28182)
==14402==    by 0x10AEB8AC: __pyx_tp_dealloc_4lxml_5etree__Element (lxml.etree.c:7079)
==14402==    by 0x35CBAA1A15: subtype_dealloc (typeobject.c:1019)
==14402==  Address 0xcae87d8 is 152 bytes inside a block of size 176 free'd
==14402==    at 0x4A04D72: free (vg_replace_malloc.c:325)
==14402==    by 0x35C7A5434A: xmlFreeDoc (tree.c:1231)
==14402==    by 0x10AEBA08: __pyx_tp_dealloc_4lxml_5etree__Document (lxml.etree.c:28182)
==14402==    by 0x10AF6EDA: __pyx_f_4lxml_5etree_moveNodeToDocument (lxml.etree.c:7182)
==14402==    by 0x10AF6FC6: __pyx_f_4lxml_5etree__appendChild (lxml.etree.c:18977)
==14402==    by 0x10AF72F2: __pyx_pf_4lxml_5etree_8_Element_append (lxml.etree.c:31584)
==14402==    by 0x35CBADF7F8: PyEval_EvalFrameEx (ceval.c:3738)
==14402==    by 0x35CBADF52C: PyEval_EvalFrameEx (ceval.c:3836)
==14402==    by 0x35CBADF52C: PyEval_EvalFrameEx (ceval.c:3836)
==14402==    by 0x35CBADF52C: PyEval_EvalFrameEx (ceval.c:3836)
==14402==    by 0x35CBADF52C: PyEval_EvalFrameEx (ceval.c:3836)
==14402==    by 0x35CBAE05A3: PyEval_EvalCodeEx (ceval.c:3000)
==14402==
==14402== Invalid read of size 8
==14402==    at 0x35C7A54409: xmlFreeNodeList (tree.c:3602)
==14402==    by 0x35C7A54B1F: xmlFreeProp (tree.c:2041)
==14402==    by 0x35C7A54DD8: xmlFreePropList (tree.c:2016)
==14402==    by 0x35C7A54462: xmlFreeNodeList (tree.c:3617)
==14402==    by 0x35C7A54442: xmlFreeNodeList (tree.c:3612)
==14402==    by 0x35C7A54442: xmlFreeNodeList (tree.c:3612)
==14402==    by 0x35C7A54442: xmlFreeNodeList (tree.c:3612)
==14402==    by 0x35C7A54442: xmlFreeNodeList (tree.c:3612)
==14402==    by 0x35C7A54442: xmlFreeNodeList (tree.c:3612)
==14402==    by 0x35C7A54297: xmlFreeDoc (tree.c:1224)
==14402==    by 0x10AEBA08: __pyx_tp_dealloc_4lxml_5etree__Document (lxml.etree.c:28182)
==14402==    by 0x10AEB8AC: __pyx_tp_dealloc_4lxml_5etree__Element (lxml.etree.c:7079)
==14402==  Address 0xcae87d8 is 152 bytes inside a block of size 176 free'd
==14402==    at 0x4A04D72: free (vg_replace_malloc.c:325)
==14402==    by 0x35C7A5434A: xmlFreeDoc (tree.c:1231)
==14402==    by 0x10AEBA08: __pyx_tp_dealloc_4lxml_5etree__Document (lxml.etree.c:28182)
==14402==    by 0x10AF6EDA: __pyx_f_4lxml_5etree_moveNodeToDocument (lxml.etree.c:7182)
==14402==    by 0x10AF6FC6: __pyx_f_4lxml_5etree__appendChild (lxml.etree.c:18977)
==14402==    by 0x10AF72F2: __pyx_pf_4lxml_5etree_8_Element_append (lxml.etree.c:31584)
==14402==    by 0x35CBADF7F8: PyEval_EvalFrameEx (ceval.c:3738)
==14402==    by 0x35CBADF52C: PyEval_EvalFrameEx (ceval.c:3836)
==14402==    by 0x35CBADF52C: PyEval_EvalFrameEx (ceval.c:3836)
==14402==    by 0x35CBADF52C: PyEval_EvalFrameEx (ceval.c:3836)
==14402==    by 0x35CBADF52C: PyEval_EvalFrameEx (ceval.c:3836)
==14402==    by 0x35CBAE05A3: PyEval_EvalCodeEx (ceval.c:3000)
==14402==

Version-Release number of selected component (if applicable):
python-lxml-2.2.3-1.1.el6.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Install the valgrind packages (if not already installed).
2. Create two files on disk, named test.definition and packages.xml respectively.
2a. Paste the following XML content into the test.definition file.
---------------------------------------
<distribution xmlns:xi="http://www.w3.org/2001/XInclude">
  <main>
    <fullname>Example System</fullname>
    <name>example</name>
    <version>5</version>
    <arch>i386</arch>
  </main>
  <packages>
    <xi:include href="packages.xml" xpointer="xpointer(/distribution/packages/*)"/>
  </packages>
  <repos>
    <repo id="base">
      <name>CentOS-$releasever - Base</name>
      <baseurl>http://mirror.centos.org/centos/$releasever/os/$basearch/</baseurl>
    </repo>
  </repos>
</distribution>
---------------------------------
2b. Paste the following content into the packages.xml file.
---------------------------------
<?xml version="1.0" encoding="utf-8"?>
<distribution xmlns:xi="http://www.w3.org/2001/XInclude">
  <packages>
    <group repoid='base'>core</group>
  </packages>
</distribution>
----------------------------------
3. Follow the instructions at http://www.renditionsoftware.com/systemstudio/source to download and run the application from source code.
4. Run the application as the root user from within valgrind using the following command line. Replace $WORKSPACE with the location of the systemstudio sources from step 3 above.

# valgrind python $WORKSPACE/bin/systemstudio test.definition --force all --debug --log-level 2

Actual results:
Memory leak.

Expected results:
Application should get installed without any memory leak or segmentation fault.

Additional info:
* Valgrind may report an error "Syscall param utimes(tvp[1]) points to uninitialised byte(s)". This is a false positive (as mentioned in the valgrind documentation) and can be safely ignored.
* The first run of the application can take five minutes or more, as megabytes of data are downloaded over the internet (in this case from the CentOS mirror website - sorry). Be patient. Alternatively, change the baseurl in test.definition to point to a local RHEL 5 or 6 install tree :-)
* At some point during processing (often within the "depsolve" step) valgrind will report the two errors listed above. Following that, the process may continue to the end. Often, however, it dies with a segfault.
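For anyone repeating step 2, the two input files can be written by a short script rather than pasted by hand. The following is only a sketch: the function name write_reproducer_files and the use of a caller-supplied directory are assumptions made for illustration, while the XML content is taken from steps 2a and 2b. It uses only the Python standard library and parses each file back to confirm it is well-formed; it does not process the XInclude.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Content from step 2a of the report.
TEST_DEFINITION = """\
<distribution xmlns:xi="http://www.w3.org/2001/XInclude">
  <main>
    <fullname>Example System</fullname>
    <name>example</name>
    <version>5</version>
    <arch>i386</arch>
  </main>
  <packages>
    <xi:include href="packages.xml" xpointer="xpointer(/distribution/packages/*)"/>
  </packages>
  <repos>
    <repo id="base">
      <name>CentOS-$releasever - Base</name>
      <baseurl>http://mirror.centos.org/centos/$releasever/os/$basearch/</baseurl>
    </repo>
  </repos>
</distribution>
"""

# Content from step 2b of the report.
PACKAGES_XML = """\
<?xml version="1.0" encoding="utf-8"?>
<distribution xmlns:xi="http://www.w3.org/2001/XInclude">
  <packages>
    <group repoid='base'>core</group>
  </packages>
</distribution>
"""


def write_reproducer_files(directory):
    """Write both files into `directory` and return a {name: Path} mapping.

    Hypothetical helper, not part of systemstudio or lxml.
    """
    directory = Path(directory)
    paths = {}
    for name, content in (("test.definition", TEST_DEFINITION),
                          ("packages.xml", PACKAGES_XML)):
        path = directory / name
        path.write_text(content, encoding="utf-8")
        # Parsing raises xml.etree.ElementTree.ParseError if the file is malformed.
        ET.parse(str(path))
        paths[name] = path
    return paths
```

Calling write_reproducer_files(".") in the directory where valgrind will be run leaves both files where the command in step 4 expects them.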
This request was evaluated by Red Hat Product Management for inclusion in the current release of Red Hat Enterprise Linux. Because the affected component is not scheduled to be updated in the current release, Red Hat is unfortunately unable to address this request at this time. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux. If you would like it considered as an exception in the current release, please ask your support representative.
Thanks for the description. However, I'm not able to reproduce it at the moment because the systemstudio application crashes during startup with:

.../kickstart.py", line 32, in __init__
    h = list(ts.dbMatch('name', 'pykickstart'))[0]
IndexError: list index out of range

I'll try it again later.
(In reply to comment #2)
> .../kickstart.py", line 32, in __init__
>     h = list(ts.dbMatch('name', 'pykickstart'))[0]
> IndexError: list index out of range

The problem was a missing pykickstart package. I wrote to Rendition Software and they updated http://www.renditionsoftware.com/systemstudio/source to reflect that.
I've run systemstudio in valgrind as you described many times, but I've never seen the crash or the 'invalid read of size 8' warning. I've been trying it on RHEL-6.1, and I see the report is against RHEL-6.0. Is it possible that it no longer occurs on 6.1 (although I'm not sure how, since python-lxml and libxml2 were not updated in 6.1)?

According to the valgrind log you posted, the first idea for a fix would be something like this:

diff src/lxml/lxml.etree.c
 static void __pyx_pf_4lxml_5etree_9_Document___dealloc__(PyObject *__pyx_v_self) {
   __Pyx_SetupRefcountContext("__dealloc__");
+  if (((struct LxmlDocument *)__pyx_v_self)->_c_doc != NULL) {
   xmlFreeDoc(((struct LxmlDocument *)__pyx_v_self)->_c_doc);
+    ((struct LxmlDocument *)__pyx_v_self)->_c_doc = NULL;
+  }
   __Pyx_FinishRefcountContext();
 }

But I can't test it because I'm not able to reproduce the crash.
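The idea behind that guard (free the document only if the pointer is still set, then clear the pointer) can be modelled in a few lines of Python. This is a toy model only: FakeDocument and its _free method are hypothetical stand-ins for the LxmlDocument struct and xmlFreeDoc, and the model says nothing about whether the guard would actually cure the reported crash.

```python
class FakeDocument:
    """Toy stand-in for the LxmlDocument struct (hypothetical, not lxml code)."""

    def __init__(self, c_doc):
        self._c_doc = c_doc   # models the _c_doc pointer
        self.free_calls = 0   # counts how often the "free" really ran

    def _free(self, c_doc):
        # Stand-in for xmlFreeDoc(); a real double free would corrupt memory.
        self.free_calls += 1

    def dealloc(self):
        # Same shape as the proposed guard: free once, then clear the pointer
        # so that any later call becomes a no-op.
        if self._c_doc is not None:
            self._free(self._c_doc)
            self._c_doc = None


doc = FakeDocument(object())
doc.dealloc()
doc.dealloc()  # second call does nothing because _c_doc is already None
```

Note that the guard only protects against the same document object being deallocated twice; it does not help if some other structure still points into the freed document.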
Regarding the memory leak issue: I certainly see some memory leaks when running systemstudio via valgrind. When running with --leak-check=full I see something like:

240 bytes in 1 blocks are possibly lost in loss record 4,920 of 7,015
   at 0x4A04820: memalign (vg_replace_malloc.c:581)
   by 0x4A048D7: posix_memalign (vg_replace_malloc.c:709)
   by 0x1290DF87: slab_allocator_alloc_chunk (gslice.c:1136)
   by 0x1290E80B: g_slice_alloc (gslice.c:661)
   by 0x1290FDBD: g_slist_prepend (gslist.c:160)
   by 0x1291461F: g_string_chunk_insert_len (gstring.c:334)
   by 0x126B0E47: primary_sax_end_element (xml-parser.c:379)
   by 0x3DAE83F58C: xmlParseEndTag1 (parser.c:8228)
   by 0x3DAE84637A: xmlParseElement (parser.c:9568)
   by 0x3DAE846649: xmlParseContent (parser.c:9371)
   by 0x3DAE8461A0: xmlParseElement (parser.c:9542)
   by 0x3DAE846649: xmlParseContent (parser.c:9371)

This tells us that something (probably python-lxml) allocated some memory via libxml2 (parser.c) and didn't free it. Problem is that there's no more info to it so I don't know how to dig into this any deeper. And I'm not sure that memory leaks are a big deal when systemstudio isn't a long-running program/daemon.
(In reply to comment #5)
> Problem is that there's no more info to it so I don't know how to dig into this any deeper.

I'll try --num-callers to get some more info.
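For reference, a deeper-stack run can be assembled like this. It is a sketch under assumptions: the helper name valgrind_command and the depth of 40 frames are arbitrary choices for illustration, while --leak-check=full, --num-callers, and the systemstudio arguments are the ones mentioned in this report.

```python
import shlex


def valgrind_command(workspace, num_callers=40):
    """Return the argv list for a leak-check run with a deeper stack.

    Hypothetical helper; `num_callers=40` is an arbitrary depth chosen
    for illustration, not a value taken from the report.
    """
    return [
        "valgrind",
        "--leak-check=full",
        f"--num-callers={num_callers}",
        "python",
        f"{workspace}/bin/systemstudio",
        "test.definition",
        "--force", "all",
        "--debug",
        "--log-level", "2",
    ]


if __name__ == "__main__":
    # Print a copy-pasteable shell command; /path/to/systemstudio is a placeholder.
    print(shlex.join(valgrind_command("/path/to/systemstudio")))
```

The list form can be passed directly to subprocess.run when the run should be scripted instead of typed.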
(In reply to comment #6)
> I'll try --num-callers to get some more info.

OK, widening the stack has shown that it's not python-lxml but yum-metadata-parser that leaks memory via libxml2.

...
   by 0x3DAE8461A0: xmlParseElement (parser.c:9542)
   by 0x3DAE84D131: xmlParseDocument (parser.c:10204)
   by 0x3DAE84DF0E: xmlSAXUserParseFile (parser.c:13591)
   by 0x126B1B39: yum_xml_parse_primary (xml-parser.c:593)
   by 0x126B3EC3: py_update (sqlitecache.c:420)
   by 0x126B4749: py_update_primary (sqlitecache.c:578)

So I'm not aware of any evidence of python-lxml leaking memory.
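The attribution step (reading down the widened stack until the frames leave libxml2) can also be done mechanically. The sketch below is an illustration only: the regular expression, the function names, and the OWNERS table are assumptions, with the file-to-component mapping taken from what this thread says about parser.c, xml-parser.c, and sqlitecache.c.

```python
import re

# Matches frames such as "by 0x126B1B39: yum_xml_parse_primary (xml-parser.c:593)".
FRAME_RE = re.compile(
    r"(?:at|by)\s+0x[0-9A-Fa-f]+:\s+(?P<func>\S+)\s+\((?P<file>[^:()]+):(?P<line>\d+)\)"
)

# Assumed mapping from source file to component, based on this thread only.
OWNERS = {
    "parser.c": "libxml2",
    "xml-parser.c": "yum-metadata-parser",
    "sqlitecache.c": "yum-metadata-parser",
}

# Frames copied from the widened stack in the comment above.
SAMPLE = """\
by 0x3DAE8461A0: xmlParseElement (parser.c:9542)
by 0x3DAE84D131: xmlParseDocument (parser.c:10204)
by 0x3DAE84DF0E: xmlSAXUserParseFile (parser.c:13591)
by 0x126B1B39: yum_xml_parse_primary (xml-parser.c:593)
by 0x126B3EC3: py_update (sqlitecache.c:420)
by 0x126B4749: py_update_primary (sqlitecache.c:578)
"""


def parse_frames(text):
    """Return (function, source file, line number) for every frame in `text`."""
    return [(m["func"], m["file"], int(m["line"])) for m in FRAME_RE.finditer(text)]


def components(frames):
    """Return the components touched by the stack, innermost first, no repeats."""
    seen = []
    for _func, source_file, _line in frames:
        owner = OWNERS.get(source_file, "unknown")
        if owner not in seen:
            seen.append(owner)
    return seen


if __name__ == "__main__":
    for func, source_file, line in parse_frames(SAMPLE):
        print(f"{OWNERS.get(source_file, 'unknown'):22} {func} ({source_file}:{line})")
```

With the sample frames, the caller of the libxml2 parser is yum_xml_parse_primary in xml-parser.c, and no lxml.etree.c frame appears anywhere in the stack.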
The summary so far: I'm not able to reproduce the crash on RHEL-6.1 (x86_64) and I don't see any memory leaks in python-lxml. So I'm inclined to close this BZ as WORKSFORME. What do you think?
Jiri Popelka, please make sure that you are using the correct revision of the source code, i.e. revision 2139:

hg clone https://www.renditionsoftware.com/hg/public/systemstudio -r 2139
Yes, I'm able to reproduce the segfault with revision 2139. According to the systemstudio changelog, the problem seems to be (actually, was) in systemstudio:

# hg log -r 2140
changeset:   2140:37679abc80b7
branch:      trunk
user:        Kay Williams
date:        Tue Jan 25 17:35:24 2011 -0800
summary:     fixed lxml segfaults with rhel6, also fixed schema validation

I'm closing this ticket as NOTABUG because the problem is (was) in third-party software and, per comment #7, it's not python-lxml that leaks memory.