Bug 132426

Summary: ccsd memory leak
Product: [Retired] Red Hat Cluster Suite
Reporter: Christine Caulfield <ccaulfie>
Component: gfs
Assignee: Jonathan Earl Brassow <jbrassow>
Status: CLOSED NEXTRELEASE
QA Contact: GFS Bugs <gfs-bugs>
Severity: medium
Priority: medium    
Version: 4   
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2004-09-16 07:57:49 UTC

Description Christine Caulfield 2004-09-13 09:58:09 UTC
Description of problem:
Running Dean's join/leave script causes ccsd to allocate more and more
memory until it gets killed by the OOM killer.

while true 
do 
  sleep 1 
  cman_tool leave 
  sleep 1 
  cman_tool join 
done 

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. Run above script
2. Run top in another window and press M to watch ccsd climb up the list

Additional info:

Comment 1 Christine Caulfield 2004-09-13 10:18:29 UTC
One interesting additional piece of information: If I "cman_tool
leave" the cluster and leave ccsd running, it continues to allocate
memory (according to top). When I join the cluster again it settles
down again.


Comment 2 Jonathan Earl Brassow 2004-09-13 20:52:27 UTC
At least one memory leak is located in ccs/daemon/misc.c:get_cluster_name

The normal operation does not free the xml structures.

Comment 3 Jonathan Earl Brassow 2004-09-13 21:55:27 UTC
should be fixed... the above was all I found.

Comment 4 Christine Caulfield 2004-09-14 07:49:30 UTC
That helps.

It gets rid of the leak when the node is not a cluster member. But
when doing the loop test there is still a small leak coming from
somewhere.

Comment 5 Jonathan Earl Brassow 2004-09-14 15:08:11 UTC
Firstly, does the cluster remain quorate when you are doing "the loop test"?


Valgrind seems to indicate that this is in the ld library?

Note that many of these gripes are simply state that is held until exit,
and therefore not the memory leak.

==21570== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 5122 from 1)
==21570== malloc/free: in use at exit: 83777 bytes in 2557 blocks.
==21570== malloc/free: 7430188 allocs, 7427631 frees, 497194386 bytes allocated.
==21570== For counts of detected errors, rerun with: -v
==21570== searching for pointers to 2557 not-freed blocks.
==21570== checked 2893752 bytes.
==21570== 
==21570== 8 bytes in 1 blocks are still reachable in loss record 1 of 36
==21570==    at 0x1B902A80: malloc (vg_replace_malloc.c:131)
==21570==    by 0xC544EF: strdup (in /lib/tls/libc-2.3.3.so)
==21570==    by 0x8052F1E: get_cluster_name (misc.c:136)
==21570==    by 0x804E328: process_connect (cnx_mgr.c:586)
==21570== 
==21570== 
==21570== 8 bytes in 1 blocks are still reachable in loss record 2 of 36
==21570==    at 0x1B902A80: malloc (vg_replace_malloc.c:131)
==21570==    by 0x80501C0: process_request (cnx_mgr.c:1071)
==21570==    by 0x8049F0D: main (ccsd.c:195)
==21570== 
==21570== 
==21570== 8 bytes in 1 blocks are still reachable in loss record 3 of 36
==21570==    at 0x1B902A80: malloc (vg_replace_malloc.c:131)
==21570==    by 0x8050565: process_broadcast (cnx_mgr.c:1211)
==21570==    by 0x804A04F: main (ccsd.c:201)
==21570== 
==21570== 
==21570== 12 bytes in 1 blocks are still reachable in loss record 4 of 36
==21570==    at 0x1B902A80: malloc (vg_replace_malloc.c:131)
==21570==    by 0x625822: xmlHashCreate (in /usr/lib/libxml2.so.2.6.8)
==21570==    by 0x64C1EA: xmlXPathNewContext (in /usr/lib/libxml2.so.2.6.8)
==21570==    by 0x8052D78: get_cluster_name (misc.c:109)
==21570== 
==21570== 
==21570== 20 bytes in 1 blocks are still reachable in loss record 5 of 36
==21570==    at 0x1B902A80: malloc (vg_replace_malloc.c:131)
==21570==    by 0x804C606: broadcast_for_doc (cnx_mgr.c:235)
==21570==    by 0x804DEF0: process_connect (cnx_mgr.c:638)
==21570==    by 0x804FCA7: process_request (cnx_mgr.c:1088)
==21570== 
==21570== 
==21570== 20 bytes in 1 blocks are still reachable in loss record 6 of 36
==21570==    at 0x1B902A80: malloc (vg_replace_malloc.c:131)
==21570==    by 0x804FC53: process_request (cnx_mgr.c:1055)
==21570==    by 0x8049F0D: main (ccsd.c:195)
==21570== 
==21570== 
==21570== 24 bytes in 1 blocks are still reachable in loss record 7 of 36
==21570==    at 0x1B902A80: malloc (vg_replace_malloc.c:131)
==21570==    by 0x647245: (within /usr/lib/libxml2.so.2.6.8)
==21570==    by 0x64C2ED: xmlXPathNewParserContext (in /usr/lib/libxml2.so.2.6.8)
==21570==    by 0x657C39: xmlXPathEvalExpression (in /usr/lib/libxml2.so.2.6.8)
==21570== 
==21570== 
==21570== 24 bytes in 2 blocks are still reachable in loss record 8 of 36
==21570==    at 0x1B902A80: malloc (vg_replace_malloc.c:131)
==21570==    by 0x649978: xmlXPathNodeSetCreate (in /usr/lib/libxml2.so.2.6.8)
==21570==    by 0x64A422: xmlXPathNewNodeSet (in /usr/lib/libxml2.so.2.6.8)
==21570==    by 0x64E9A9: xmlXPathRoot (in /usr/lib/libxml2.so.2.6.8)
==21570== 
==21570== 
==21570== 26 bytes in 1 blocks are still reachable in loss record 9 of 36
==21570==    at 0x1B902A80: malloc (vg_replace_malloc.c:131)
==21570==    by 0x804A79E: parse_cli_args (ccsd.c:282)
==21570==    by 0x8049AAA: main (ccsd.c:55)
==21570== 
==21570== 
==21570== 26 bytes in 1 blocks are still reachable in loss record 10 of 36
==21570==    at 0x1B902A80: malloc (vg_replace_malloc.c:131)
==21570==    by 0x804A777: parse_cli_args (ccsd.c:281)
==21570==    by 0x8049AAA: main (ccsd.c:55)
==21570== 
==21570== 
==21570== 32 bytes in 2 blocks are still reachable in loss record 11 of 36
==21570==    at 0x1B9033FD: calloc (vg_replace_malloc.c:176)
==21570==    by 0xD0C308: _dlerror_run (in /lib/libdl-2.3.3.so)
==21570==    by 0xD0BED0: dlsym (in /lib/libdl-2.3.3.so)
==21570==    by 0x1B91B61E: open64 (vg_libpthread.c:2331)
==21570== 
==21570== 
==21570== 40 bytes in 1 blocks are still reachable in loss record 12 of 36
==21570==    at 0x1B902A80: malloc (vg_replace_malloc.c:131)
==21570==    by 0x6499A6: xmlXPathNodeSetCreate (in /usr/lib/libxml2.so.2.6.8)
==21570==    by 0x64A422: xmlXPathNewNodeSet (in /usr/lib/libxml2.so.2.6.8)
==21570==    by 0x64E9A9: xmlXPathRoot (in /usr/lib/libxml2.so.2.6.8)
==21570== 
==21570== 
==21570== 40 bytes in 1 blocks are still reachable in loss record 13 of 36
==21570==    at 0x1B902A80: malloc (vg_replace_malloc.c:131)
==21570==    by 0x64A3F8: xmlXPathNewNodeSet (in /usr/lib/libxml2.so.2.6.8)
==21570==    by 0x64E9A9: xmlXPathRoot (in /usr/lib/libxml2.so.2.6.8)
==21570==    by 0x65643C: (within /usr/lib/libxml2.so.2.6.8)
==21570== 
==21570== 
==21570== 40 bytes in 1 blocks are still reachable in loss record 14 of 36
==21570==    at 0x1B902A80: malloc (vg_replace_malloc.c:131)
==21570==    by 0x65753E: (within /usr/lib/libxml2.so.2.6.8)
==21570==    by 0x657C43: xmlXPathEvalExpression (in /usr/lib/libxml2.so.2.6.8)
==21570==    by 0x8052D93: get_cluster_name (misc.c:116)
==21570== 
==21570== 
==21570== 40 bytes in 1 blocks are still reachable in loss record 15 of 36
==21570==    at 0x1B902A80: malloc (vg_replace_malloc.c:131)
==21570==    by 0x804E4A1: process_connect (cnx_mgr.c:532)
==21570==    by 0x804FCA7: process_request (cnx_mgr.c:1088)
==21570==    by 0x8049F0D: main (ccsd.c:195)
==21570== 
==21570== 
==21570== 40 bytes in 2 blocks are still reachable in loss record 16 of 36
==21570==    at 0x1B902A80: malloc (vg_replace_malloc.c:131)
==21570==    by 0x1B94502C: clist_insert (clist.c:69)
==21570==    by 0x1B9497C0: msg_listen (message.c:552)
==21570==    by 0x8051B46: cluster_communicator (cluster_mgr.c:298)
==21570== 
==21570== 
==21570== 44 bytes in 1 blocks are still reachable in loss record 17 of 36
==21570==    at 0x1B902A80: malloc (vg_replace_malloc.c:131)
==21570==    by 0x64C2C8: xmlXPathNewParserContext (in /usr/lib/libxml2.so.2.6.8)
==21570==    by 0x657C39: xmlXPathEvalExpression (in /usr/lib/libxml2.so.2.6.8)
==21570==    by 0x8052D93: get_cluster_name (misc.c:116)
==21570== 
==21570== 
==21570== 48 bytes in 2 blocks are still reachable in loss record 18 of 36
==21570==    at 0x1B902A80: malloc (vg_replace_malloc.c:131)
==21570==    by 0x625FB1: xmlHashAddEntry3 (in /usr/lib/libxml2.so.2.6.8)
==21570==    by 0x625C89: xmlHashAddEntry2 (in /usr/lib/libxml2.so.2.6.8)
==21570==    by 0x64AED4: xmlXPathRegisterFuncNS (in /usr/lib/libxml2.so.2.6.8)
==21570== 
==21570== 
==21570== 48 bytes in 2 blocks are still reachable in loss record 19 of 36
==21570==    at 0x1B902A80: malloc (vg_replace_malloc.c:131)
==21570==    by 0x66B392: xmlNewMutex (in /usr/lib/libxml2.so.2.6.8)
==21570==    by 0x66A508: xmlInitGlobals (in /usr/lib/libxml2.so.2.6.8)
==21570==    by 0x61B70F: xmlInitParser (in /usr/lib/libxml2.so.2.6.8)
==21570== 
==21570== 
==21570== 68 bytes in 1 blocks are possibly lost in loss record 20 of 36
==21570==    at 0x1B9033FD: calloc (vg_replace_malloc.c:176)
==21570==    by 0x1B8F1E38: _dl_allocate_tls_storage (in /lib/ld-2.3.3.so)
==21570==    by 0x1B8F26A8: __GI__dl_allocate_tls (in /lib/ld-2.3.3.so)
==21570==    by 0x1B918550: pthread_create (vg_libpthread.c:1155)
==21570== 
==21570== 
==21570== 160 bytes in 8 blocks are still reachable in loss record 21 of 36
==21570==    at 0x1B902A80: malloc (vg_replace_malloc.c:131)
==21570==    by 0x601C3B: xmlNewCharEncodingHandler (in /usr/lib/libxml2.so.2.6.8)
==21570==    by 0x601D7D: xmlInitCharEncodingHandlers (in /usr/lib/libxml2.so.2.6.8)
==21570==    by 0x61B71E: xmlInitParser (in /usr/lib/libxml2.so.2.6.8)
==21570== 
==21570== 
==21570== 176 bytes in 2 blocks are still reachable in loss record 22 of 36
==21570==    at 0x1B902A80: malloc (vg_replace_malloc.c:131)
==21570==    by 0x61E403: xmlNewDoc (in /usr/lib/libxml2.so.2.6.8)
==21570==    by 0x69E7AB: xmlSAX2StartDocument (in /usr/lib/libxml2.so.2.6.8)
==21570==    by 0x61736D: xmlParseDocument (in /usr/lib/libxml2.so.2.6.8)
==21570== 
==21570== 
==21570== 196 bytes in 1 blocks are still reachable in loss record 23 of 36
==21570==    at 0x1B902A80: malloc (vg_replace_malloc.c:131)
==21570==    by 0x64C195: xmlXPathNewContext (in /usr/lib/libxml2.so.2.6.8)
==21570==    by 0x8052D78: get_cluster_name (misc.c:109)
==21570==    by 0x804CA0A: broadcast_for_doc (cnx_mgr.c:381)
==21570== 
==21570== 
==21570== 200 bytes in 1 blocks are still reachable in loss record 24 of 36
==21570==    at 0x1B902A80: malloc (vg_replace_malloc.c:131)
==21570==    by 0x601CF5: xmlInitCharEncodingHandlers (in /usr/lib/libxml2.so.2.6.8)
==21570==    by 0x61B71E: xmlInitParser (in /usr/lib/libxml2.so.2.6.8)
==21570==    by 0x61AEEB: xmlSAXParseFileWithData (in /usr/lib/libxml2.so.2.6.8)
==21570== 
==21570== 
==21570== 320 bytes in 1 blocks are still reachable in loss record 25 of 36
==21570==    at 0x1B9034EA: realloc (vg_replace_malloc.c:197)
==21570==    by 0x649E19: xmlXPathNodeSetAddUnique (in /usr/lib/libxml2.so.2.6.8)
==21570==    by 0x6549EC: (within /usr/lib/libxml2.so.2.6.8)
==21570==    by 0x6565B5: (within /usr/lib/libxml2.so.2.6.8)
==21570== 
==21570== 
==21570== 400 bytes in 1 blocks are still reachable in loss record 26 of 36
==21570==    at 0x1B902A80: malloc (vg_replace_malloc.c:131)
==21570==    by 0x647270: (within /usr/lib/libxml2.so.2.6.8)
==21570==    by 0x64C2ED: xmlXPathNewParserContext (in /usr/lib/libxml2.so.2.6.8)
==21570==    by 0x657C39: xmlXPathEvalExpression (in /usr/lib/libxml2.so.2.6.8)
==21570== 
==21570== 
==21570== 1232 bytes in 1 blocks are possibly lost in loss record 28 of 36
==21570==    at 0x1B9035B5: memalign (vg_replace_malloc.c:217)
==21570==    by 0x1B8F1DF1: _dl_allocate_tls_storage (in /lib/ld-2.3.3.so)
==21570==    by 0x1B8F26A8: __GI__dl_allocate_tls (in /lib/ld-2.3.3.so)
==21570==    by 0x1B918550: pthread_create (vg_libpthread.c:1155)
==21570== 
==21570== 
==21570== 1677 bytes in 1 blocks are still reachable in loss record 29 of 36
==21570==    at 0x1B902A80: malloc (vg_replace_malloc.c:131)
==21570==    by 0x804C97A: broadcast_for_doc (cnx_mgr.c:365)
==21570==    by 0x804DEF0: process_connect (cnx_mgr.c:638)
==21570==    by 0x804FCA7: process_request (cnx_mgr.c:1088)
==21570== 
==21570== 
==21570== 2786 bytes in 427 blocks are still reachable in loss record 30 of 36
==21570==    at 0x1B902A80: malloc (vg_replace_malloc.c:131)
==21570==    by 0x66D7E5: xmlStrndup (in /usr/lib/libxml2.so.2.6.8)
==21570==    by 0x66D883: xmlStrdup (in /usr/lib/libxml2.so.2.6.8)
==21570==    by 0x601C26: xmlNewCharEncodingHandler (in /usr/lib/libxml2.so.2.6.8)
==21570== 
==21570== 
==21570== 4124 bytes in 1 blocks are possibly lost in loss record 31 of 36
==21570==    at 0x1B902A80: malloc (vg_replace_malloc.c:131)
==21570==    by 0xC700C7: opendir (in /lib/tls/libc-2.3.3.so)
==21570==    by 0x1B942708: clu_connect (global.c:63)
==21570==    by 0x8051BA3: cluster_communicator (cluster_mgr.c:310)
==21570== 
==21570== 
==21570== 4416 bytes in 92 blocks are still reachable in loss record 32 of 36
==21570==    at 0x1B902A80: malloc (vg_replace_malloc.c:131)
==21570==    by 0x61F516: xmlNewNsProp (in /usr/lib/libxml2.so.2.6.8)
==21570==    by 0x6A0486: (within /usr/lib/libxml2.so.2.6.8)
==21570==    by 0x6A072F: xmlSAX2StartElementNs (in /usr/lib/libxml2.so.2.6.8)
==21570== 
==21570== 
==21570== 4440 bytes in 74 blocks are still reachable in loss record 33 of 36
==21570==    at 0x1B902A80: malloc (vg_replace_malloc.c:131)
==21570==    by 0x61FB51: xmlNewNode (in /usr/lib/libxml2.so.2.6.8)
==21570==    by 0x61FCA8: xmlNewDocNode (in /usr/lib/libxml2.so.2.6.8)
==21570==    by 0x6A097E: xmlSAX2StartElementNs (in /usr/lib/libxml2.so.2.6.8)
==21570== 
==21570== 
==21570== 6144 bytes in 1 blocks are still reachable in loss record 34 of 36
==21570==    at 0x1B902A80: malloc (vg_replace_malloc.c:131)
==21570==    by 0x625844: xmlHashCreate (in /usr/lib/libxml2.so.2.6.8)
==21570==    by 0x64C1EA: xmlXPathNewContext (in /usr/lib/libxml2.so.2.6.8)
==21570==    by 0x8052D78: get_cluster_name (misc.c:109)
==21570== 
==21570== 
==21570== 13140 bytes in 219 blocks are still reachable in loss record 35 of 36
==21570==    at 0x1B902A80: malloc (vg_replace_malloc.c:131)
==21570==    by 0x69FE7E: (within /usr/lib/libxml2.so.2.6.8)
==21570==    by 0x6A0034: (within /usr/lib/libxml2.so.2.6.8)
==21570==    by 0x6A072F: xmlSAX2StartElementNs (in /usr/lib/libxml2.so.2.6.8)
==21570== 
==21570== 
==21570== 43350 bytes in 1700 blocks are definitely lost in loss record 36 of 36
==21570==    at 0x1B902A80: malloc (vg_replace_malloc.c:131)
==21570==    by 0x1B8E86E3: _dl_map_object (in /lib/ld-2.3.3.so)
==21570==    by 0xCDA113: dl_open_worker (in /lib/tls/libc-2.3.3.so)
==21570==    by 0x1B8EFB45: _dl_catch_error (in /lib/ld-2.3.3.so)
==21570== 
==21570== LEAK SUMMARY:
==21570==    definitely lost: 43350 bytes in 1700 blocks.
==21570==    possibly lost:   5424 bytes in 3 blocks.
==21570==    still reachable: 34603 bytes in 852 blocks.
==21570==         suppressed: 400 bytes in 2 blocks.
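
For readers unfamiliar with the valgrind categories, the difference between state held until exit ("still reachable") and a growing leak ("definitely lost") can be sketched roughly as follows. This is only an illustration, not ccsd code: cached_name, get_name_cached() and name_length() are hypothetical names, and the string "cluster" is a placeholder.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical long-lived state: allocated once and kept until exit. */
static char *cached_name;

/* Allocates on the first call only; later calls reuse the same pointer.
 * The block is never freed, but a pointer to it still exists at exit,
 * so valgrind would list it as "still reachable". Memory use does not
 * grow with the number of calls. */
const char *get_name_cached(void)
{
    if (cached_name == NULL) {
        cached_name = malloc(sizeof "cluster");
        if (cached_name != NULL)
            strcpy(cached_name, "cluster");
    }
    return cached_name;
}

/* Allocates a temporary copy on every call and releases it again. */
size_t name_length(void)
{
    char *copy = malloc(sizeof "cluster");
    size_t n;

    if (copy == NULL)
        return 0;
    strcpy(copy, "cluster");
    n = strlen(copy);
    /* Without this free() every call would drop its only pointer to the
     * block: valgrind would report "definitely lost" and memory use
     * would grow with each call, which is the kind of leak that ends
     * with the OOM killer. */
    free(copy);
    return n;
}
```

Only the second pattern, with the free() missing, grows over time; the first stays at a fixed size no matter how often it is called.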


Comment 6 Christine Caulfield 2004-09-15 12:58:28 UTC
It's yer threads.

You either need to call pthread_join() on the thread when it exits, or
call pthread_detach() on it after creation so it can clean up after
itself.

I tried pthread_detach() and it gets rid of the leak for me, but I'll
let you decide which solution you prefer.
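
A minimal sketch of the two options, assuming a plain POSIX threads setup. The worker function and the spawn_* helpers are hypothetical and do not come from the ccsd source; they only show where the pthread_detach() or pthread_join() call would go.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical worker; stands in for whatever the real thread does. */
static void *worker(void *arg)
{
    (void)arg;
    return NULL;
}

/* Option 1: detach right after creation. The thread's resources are
 * released automatically when it exits, and it must not be joined
 * afterwards. Returns 0 on success or an error number. */
int spawn_detached(void)
{
    pthread_t tid;
    int rc = pthread_create(&tid, NULL, worker, NULL);

    if (rc != 0)
        return rc;
    return pthread_detach(tid);
}

/* Option 2: leave the thread joinable and reap it with pthread_join().
 * Its resources are held until the join happens. Returns 0 on success
 * or an error number. */
int spawn_and_join(void)
{
    pthread_t tid;
    int rc = pthread_create(&tid, NULL, worker, NULL);

    if (rc != 0)
        return rc;
    return pthread_join(tid, NULL);
}
```

A joinable thread that is neither joined nor detached keeps its stack and thread-local storage allocated after it exits, so creating one per request without either call would accumulate memory over time. Detaching suits fire-and-forget threads; joining suits cases where the creator needs to wait for the result.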

Comment 7 Jonathan Earl Brassow 2004-09-15 17:29:12 UTC
I added the pthread_detach()

I still see memory increasing when running under valgrind, but not otherwise...
maybe something in valgrind?

Anyway, I don't see the problem anymore.

Comment 8 Christine Caulfield 2004-09-16 07:57:49 UTC
Looks fine to me.

Comment 9 Kiersten (Kerri) Anderson 2004-11-16 19:10:29 UTC
Updating version to the right level in the defects.  Sorry for the storm.