Created attachment 331573 [details] valgrind reports Description of problem: I tested bug 482954 and I found another memory-leak. See attachment. Version-Release number of selected component (if applicable): condor-7.2.2-0.1 How reproducible: always Steps to Reproduce: see bug 482954 Actual results: Memory-leak found. Expected results: No memory-leak. Additional info: 7.2.2-0.1.el5 i386: ==24262== ==24262== Process terminating with default action of signal 2 (SIGINT) ==24262== at 0x40DF1DD: ___newselect_nocancel (in /lib/libc-2.5.so) ==24262== by 0x817C13F: DaemonCore::Driver() (in /usr/sbin/condor_schedd) ==24262== by 0x818483B: main (in /usr/sbin/condor_schedd) --24262-- Discarding syms at 0x425D000-0x4268000 in /lib/libnss_files-2.5.so due to munmap() ==24262== ==24262== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 35 from 1) --24262-- --24262-- supp: 35 Fedora-Core-6-hack3-ld25 ==24262== malloc/free: in use at exit: 468,074 bytes in 4,889 blocks. ==24262== malloc/free: 408,834 allocs, 403,945 frees, 1,036,918,136 bytes allocated. ==24262== ==24262== searching for pointers to 4,889 not-freed blocks. ==24262== checked 1,328,720 bytes. ==24262== ==24262== 56 bytes in 1 blocks are definitely lost in loss record 113 of 202 ==24262== at 0x40053C0: malloc (vg_replace_malloc.c:149) ==24262== by 0xD20695: soap_init (in /usr/lib/libgsoapssl++.so.0.0.0) ==24262== by 0xD207F7: soap::soap() (in /usr/lib/libgsoapssl++.so.0.0.0) ==24262== by 0x8175173: DaemonCore::reconfig() (in /usr/sbin/condor_schedd) ==24262== by 0x818474B: main (in /usr/sbin/condor_schedd) ==24262== ==24262== ==24262== 516 bytes in 1 blocks are possibly lost in loss record 173 of 202 ==24262== at 0x40057F5: operator new[](unsigned) (vg_replace_malloc.c:195) ==24262== by 0x81C3D96: ExtArray<RuntimeConfigItem>::ExtArray(int) (in /usr/sbin/condor_schedd) ==24262== by 0x81BE7F8: (within /usr/sbin/condor_schedd) ==24262== by 0x81BE850: (within /usr/sbin/condor_schedd) ==24262== by 0x827FC3A: (within /usr/sbin/condor_schedd) ==24262== by 0x80E0B28: (within /usr/sbin/condor_schedd) ==24262== by 0x827FBC8: __libc_csu_init (in /usr/sbin/condor_schedd) ==24262== by 0x402AE30: (below main) (in /lib/libc-2.5.so) ==24262== ==24262== LEAK SUMMARY: ==24262== definitely lost: 56 bytes in 1 blocks. ==24262== possibly lost: 516 bytes in 1 blocks. ==24262== still reachable: 467,502 bytes in 4,887 blocks. ==24262== suppressed: 0 bytes in 0 blocks. ==24262== Reachable blocks (those to which a pointer was found) are not shown. ==24262== To see them, rerun with: --show-reachable=yes --24262-- memcheck: sanity checks: 9192 cheap, 368 expensive --24262-- memcheck: auxmaps: 0 auxmap entries (0k, 0M) in use --24262-- memcheck: auxmaps: 0 searches, 0 comparisons --24262-- memcheck: SMs: n_issued = 5623 (89968k, 87M) --24262-- memcheck: SMs: n_deissued = 5491 (87856k, 85M) --24262-- memcheck: SMs: max_noaccess = 65535 (1048560k, 1023M) --24262-- memcheck: SMs: max_undefined = 1 (16k, 0M) --24262-- memcheck: SMs: max_defined = 136 (2176k, 2M) --24262-- memcheck: SMs: max_non_DSM = 137 (2192k, 2M) --24262-- memcheck: max sec V bit nodes: 8 (0k, 0M) --24262-- memcheck: set_sec_vbits8 calls: 97 (new: 8, updates: 89) --24262-- memcheck: max shadow mem size: 2496k, 2M --24262-- translate: fast SP updates identified: 37,037 ( 90.4%) --24262-- translate: generic_known SP updates identified: 2,196 ( 5.3%) --24262-- translate: generic_unknown SP updates identified: 1,726 ( 4.2%) --24262-- tt/tc: 2,506,009 tt lookups requiring 2,898,943 probes --24262-- tt/tc: 2,506,009 fast-cache updates, 4 flushes --24262-- transtab: new 16,800 (419,415 -> 7,036,504; ratio 167:10) [0 scs] --24262-- transtab: dumped 0 (0 -> ??) --24262-- transtab: discarded 171 (3,070 -> ??) --24262-- scheduler: 919,221,331 jumps (bb entries). --24262-- scheduler: 9,192/6,315,618 major/minor sched events. --24262-- sanity: 9193 cheap, 368 expensive checks. --24262-- exectx: 30,011 lists, 5,240 contexts (avg 0 per list) --24262-- exectx: 812,734 searches, 809,704 full compares (996 per 1000) --24262-- exectx: 167,368 cmp2, 79 cmp4, 0 cmpAll
Created attachment 358137 [details] Patch for daemon_core memory leak relating to soap_init When traversing the documentation for gsoap I had noticed that initialization differs for heap allocated objects. Excerpt from page 17-18: My Comments >>>>>: For example, the following code stack-allocates the runtime environment which is used for multiple remote method calls: int main() { struct soap soap; ... soap init(&soap); // initialize runtime environment ... 17 soap call ns method1(&soap, ...); // make a remote call ... call ns method2(&soap, ...); // make another remote call soap ... destroy(&soap); // remove deserialized class instances (C++ only) soap soap end(&soap); // clean up and remove deserialized data soap done(&soap); // detach environment (last use and no longer in scope) ... } The runtime environment can also be heap allocated: >>>>: This is the interesting point, when changed, memory leaks are cleaned up int main() { struct soap *soap; ... soap = soap new(); // allocate and initialize runtime environment if (!soap) // couldn’t allocate: stop ... soap call ns method1(soap, ...); // make a remote call ... soap call ns method2(soap, ...); // make another remote call ... soap destroy(soap); // remove deserialized class instances (C++ only) soap end(soap); // clean up and remove deserialized data soap free(soap); // detach and free runtime environment }
The use of soap_destroy/end/free looks good as well as soap_new instead of new. The reconfig still needs to call init_soap(), which is defined in soap_core.cpp. It does all the meaningful setup. Actually, the original code looks to never re-create the soap struct on reconfig, which is wrong because it forces a restart to change any soap specific config.
Created attachment 358160 [details] Updated patch per matts comments I had talked with some folks here, and updated the patch per matts previous comments. Likely fixes a couple of potential bugs, due to the reconfig.
Looks good. I'd add a comment near the "//soap_init(soap);" explaining why it is commented out and shouldn't be done anymore.
http://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=686
Fix should be in 7.4.0-0.1
Tested on RHEL5.4 condor-7.4.0-0.5 i386/x86_64 and RHEL4.8 condor-7.4.0-0.4 i386/x86_64 and it works without memory leaks(valgrind result) --> VERIFIED
Release note added. If any revisions are required, please set the "requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. New Contents: please see bug summary.
Can someone please verify my interpretation of this bug? Thanks, LKB
Release note updated. If any revisions are required, please set the "requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. Diffed Contents: @@ -1 +1,8 @@ -please see bug summary.+Grid bug fix + +C: A daemon is reconfigured +C: A memory leak appeared in SOAP +F: SOAP is now initialized in a different way +R: The memory leak no longer appears + +NEEDS FACT CHECKING BEFORE ADDING TO RELNOTES
From IRC: <tstclair> Lana, re 485090, yes that is correct <Lana> tstclair: thanks LKB
Release note updated. If any revisions are required, please set the "requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. Diffed Contents: @@ -5,4 +5,4 @@ F: SOAP is now initialized in a different way R: The memory leak no longer appears -NEEDS FACT CHECKING BEFORE ADDING TO RELNOTES+When a daemon was reconfigured, a memory leak appeared in SOAP. SOAP is now initialized in a different way and the memory leak no longer appears.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on therefore solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHEA-2009-1633.html