Bug 1359975
Summary: | pmchart run-away mem leak replaying multi-archive when rewinding | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Mark Goodwin <mgoodwin> | ||||||||
Component: | pcp | Assignee: | Dave Brolley <brolley> | ||||||||
Status: | CLOSED ERRATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa> | ||||||||
Severity: | urgent | Docs Contact: | |||||||||
Priority: | unspecified | ||||||||||
Version: | 24 | CC: | brolley, fche, lberk, mbenitez, mgoodwin, nathans, pcp, scox | ||||||||
Target Milestone: | --- | ||||||||||
Target Release: | --- | ||||||||||
Hardware: | Unspecified | ||||||||||
OS: | Unspecified | ||||||||||
Whiteboard: | |||||||||||
Fixed In Version: | 3.11.7-1 pcp-3.11.7-1.fc25 pcp-3.11.7-1.fc24 pcp-3.11.7-1.el5 | Doc Type: | If docs needed, set a value | ||||||||
Doc Text: | Story Points: | --- | |||||||||
Clone Of: | Environment: | ||||||||||
Last Closed: | 2017-01-02 19:50:59 UTC | Type: | Bug | ||||||||
Regression: | --- | Mount Type: | --- | ||||||||
Documentation: | --- | CRM: | |||||||||
Verified Versions: | Category: | --- | |||||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||||
Embargoed: | |||||||||||
Attachments: |
|
Description
Mark Goodwin
2016-07-26 04:00:50 UTC
This is still easily reproducible with pmchart from pcp-3.11.6 and latest from the current 3.11.7, as per the Steps to Reproduce in Comment #1 I re-tested since pmchart has had some changes recently. I guess maybe the priority should be bumped up a bit since it basically kills your desktop if you hit it (until the oom killer steps in to stop it). Here's an ltrace of the repeating library call sequence - this repeats until the system eventually runs out of memory and the pmchart process is oom/killed 10440 pmFetch(7, 0x11e8866b0, 0x7ffc4a5b02c0, 7) = 0 10440 QVectorData::free(QVectorData*, int)(0x11e8866a0, 8, 0x149c400, 0) = 0 10440 __pmtimevalSub(0x3616fa0, 0x3616fb0, 0x62b2c, 0) = 0 10440 pmFreeResult(0x11e8866e0, 0x7ffc4a5b01c0, 7, 0) = 0 10440 pmUseContext(0, 1, 0x3616fd0, 0) = 0 10440 pmCtime(0x7ffc4a5b0358, 0x7677a0, 0x7f89f42141e0, 0) = 0x7677a0 10440 strlen("Mon Jul 18 13:53:12 2016\n") = 25 10440 qQNaN()(1, 0xffffcfc1, 1, 0x3685400) = 0x3689d10 10440 memmove(0x42c1560, "DDDDD\304$@\0\0\0\0\0\0\370\377\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 25536) = 0x42c1560 10440 memmove(0x42c9570, "\256&\026;\264\242\320?\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 25536) = 0x42c9570 10440 qQNaN()(1, 0xffffcfc1, 1, 0x3685400) = 0x36aa820 10440 memmove(0x42d1580, "DDDDDD\020@\0\0\0\0\0\0\370\377\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 25536) = 0x42d1580 10440 memmove(0x370b360, "\342\306o\264\243&\327?\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 25536) = 0x370b360 10440 qQNaN()(1, 0xffffcfc1, 1, 0x3685400) = 0x36a7340 10440 memmove(0x42d9590, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\370\377\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 25536) = 0x42d9590 10440 memmove(0x36f0bc0, "\342\306o\264\243&\327?\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 25536) = 0x36f0bc0 10440 qQNaN()(1, 0xffffcfc1, 1, 0x3685400) = 0x36ba4d0 10440 memmove(0x3713380, "\021\021\021\021\021\021\221?\0\0\0\0\0\0\370\377\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 25536) = 0x3713380 10440 memmove(0x42e15a0, "\227\310}\253y-\327?\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 25536) = 0x42e15a0 10440 qQNaN()(1, 0xffffcfc1, 1, 0x3685400) = 0x36bfe20 10440 memmove(0x42e95b0, "UUUUUU"@\0\0\0\0\0\0\370\377\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 25536) = 0x42e95b0 10440 memmove(0x371b3a0, "`\271\3129\225\356\342?\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 25536) = 0x371b3a0 10440 qQNaN()(1, 0xffffcfc1, 1, 0x3685400) = 0x36caeb0 10440 memmove(0x42f15c0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\370\377\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 25536) = 0x42f15c0 10440 memmove(0x36f8be0, "`\271\3129\225\356\342?\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 25536) = 0x36f8be0 10440 qQNaN()(1, 0xffffcfc1, 1, 0x3685400) = 0x36c7730 10440 memmove(0x42f95d0, "~\261\344\027\v\006\257@\0\0\0\0\0\0\370\377\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 25536) = 0x42f95d0 10440 memmove(0x37233c0, "\0\0\0\0\0\0Y@\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 25536) = 0x37233c0 10440 qIsNaN(double)(0x366f350, 0xffffffff, 1, 5) = 0 10440 qIsNaN(double)(0x36a8610, 0xffffffff, 1, 0) = 0 10440 qIsNaN(double)(0x36b3b70, 0xffffffff, 1, 0) = 0 10440 qIsNaN(double)(0x36bb680, 0xffffffff, 1, 0) = 0 10440 qIsNaN(double)(0x36c1f70, 0xffffffff, 1, 0) = 0 10440 qIsNaN(double)(0x36ca620, 0xffffffff, 1, 0) = 0 10440 qIsNaN(double)(0x36d1a70, 0xffffffff, 1, 0) = 0 10440 qIsNaN(double)(0x366f350, 0xffffffff, 1, 0) = 0 10440 qIsNaN(double)(0x36a8610, 0xffffffff, 1, 1) = 0 10440 qIsNaN(double)(0x36b3b70, 0xffffffff, 1, 1) = 0 10440 qIsNaN(double)(0x36bb680, 0xffffffff, 1, 1) = 0 10440 qIsNaN(double)(0x36c1f70, 0xffffffff, 1, 1) = 0 10440 qIsNaN(double)(0x36ca620, 0xffffffff, 1, 1) = 0 10440 qIsNaN(double)(0x36d1a70, 0xffffffff, 1, 1) = 0 10440 __pmtimevalFromReal(0x7ffc4a5b03d0, 1, 1, 1) = 0 10440 pmUseContext(0, 1, 0x3616f60, 0x7ffc4a5b0320) = 0 10440 pmSetMode(0x1030001, 0x7ffc4a5b03d0, 0xfffffda8, 0) = 0 10440 pmUseContext(0, 0, 0x3616fd0, 0) = 0 10440 pmCtime(0x7ffc4a5b0358, 0x7677a0, 0x7f89f42141e0, 0) = 0x7677a0 10440 strlen("Mon Jul 18 13:43:12 2016\n") = 25 10440 pmUseContext(0, 0xffffcfc1, 0, 0) = 0 10440 QVectorData::allocate(int, int)(44, 8, 0, 0) = 0x11f80d620 10440 qMemSet(void*, int, unsigned long)(0x11f80d630, 0, 28, 0) = 0x11f80d630 -- pause, then the cycle repeats 10440 pmFetch(7, 0x11f80d630, 0x7ffc4a5b02c0, 7) = 0 10440 QVectorData::free(QVectorData*, int)(0x11f80d620, 8, 0x149c400, 0) = 0 10440 __pmtimevalSub(0x3616fa0, 0x3616fb0, 0x62b2c, 0) = 0 10440 pmExtractValue(1, 0x11ec7bf30, 3, 0x7ffc4a5b01d0) = 0 10440 pmExtractValue(1, 0x11ec7c1b0, 3, 0x7ffc4a5b01d0) = 0 10440 pmExtractValue(1, 0x11ec7c200, 3, 0x7ffc4a5b01d0) = 0 10440 pmExtractValue(1, 0x11ec7c250, 3, 0x7ffc4a5b01d0) = 0 10440 pmExtractValue(1, 0x11ec7c2a0, 3, 0x7ffc4a5b01d0) = 0 10440 pmExtractValue(1, 0x11ec7c2f0, 3, 0x7ffc4a5b01d0) = 0 10440 pmExtractValue(1, 0x11ec7c340, 3, 0x7ffc4a5b01d0) = 0 10440 pmFreeResult(0x11f80d660, 0x7f89f3ffa6d8, 7, 0) = 0 10440 pmUseContext(0, 1, 0x3616fd0, 0) = 0 10440 pmCtime(0x7ffc4a5b0358, 0x7677a0, 0x7f89f42141e0, 0) = 0x7677a0 10440 strlen("Mon Jul 18 13:43:12 2016\n") = 25 10440 qQNaN()(1, 0xffffcfc1, 1, 0x3685400) = 0x3689d10 10440 memmove(0x42c1560, "\0\0\0\0\0\0\370\377\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 25536) = 0x42c1560 10440 memmove(0x42c9570, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 25536) = 0x42c9570 10440 qQNaN()(1, 0xffffcfc1, 1, 0x3685400) = 0x36aa820 10440 memmove(0x42d1580, "\0\0\0\0\0\0\370\377\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 25536) = 0x42d1580 10440 memmove(0x370b360, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 25536) = 0x370b360 10440 qQNaN()(1, 0xffffcfc1, 1, 0x3685400) = 0x36a7340 10440 memmove(0x42d9590, "\0\0\0\0\0\0\370\377\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 25536) = 0x42d9590 10440 memmove(0x36f0bc0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 25536) = 0x36f0bc0 10440 qQNaN()(1, 0xffffcfc1, 1, 0x3685400) = 0x36ba4d0 10440 memmove(0x3713380, "\0\0\0\0\0\0\370\377\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 25536) = 0x3713380 10440 memmove(0x42e15a0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 25536) = 0x42e15a0 10440 qQNaN()(1, 0xffffcfc1, 1, 0x3685400) = 0x36bfe20 10440 memmove(0x42e95b0, "\0\0\0\0\0\0\370\377\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 25536) = 0x42e95b0 10440 memmove(0x371b3a0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 25536) = 0x371b3a0 10440 qQNaN()(1, 0xffffcfc1, 1, 0x3685400) = 0x36caeb0 10440 memmove(0x42f15c0, "\0\0\0\0\0\0\370\377\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 25536) = 0x42f15c0 10440 memmove(0x36f8be0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 25536) = 0x36f8be0 10440 qQNaN()(1, 0xffffcfc1, 1, 0x3685400) = 0x36c7730 10440 memmove(0x42f95d0, "\0\0\0\0\0\0\370\377\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 25536) = 0x42f95d0 10440 memmove(0x37233c0, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 25536) = 0x37233c0 10440 qIsNaN(double)(0x366f350, 0xffffffff, 1, 5) = 0 10440 qIsNaN(double)(0x36a8610, 0xffffffff, 1, 0) = 0 10440 qIsNaN(double)(0x36b3b70, 0xffffffff, 1, 0) = 0 10440 qIsNaN(double)(0x36bb680, 0xffffffff, 1, 0) = 0 10440 qIsNaN(double)(0x36c1f70, 0xffffffff, 1, 0) = 0 10440 qIsNaN(double)(0x36ca620, 0xffffffff, 1, 0) = 0 10440 qIsNaN(double)(0x36d1a70, 0xffffffff, 1, 0) = 0 10440 qIsNaN(double)(0x366f350, 0xffffffff, 1, 0) = 0 10440 qIsNaN(double)(0x36a8610, 0xffffffff, 1, 1) = 0 10440 qIsNaN(double)(0x36b3b70, 0xffffffff, 1, 1) = 0 10440 qIsNaN(double)(0x36bb680, 0xffffffff, 1, 1) = 0 10440 qIsNaN(double)(0x36c1f70, 0xffffffff, 1, 1) = 0 10440 qIsNaN(double)(0x36ca620, 0xffffffff, 1, 1) = 0 10440 qIsNaN(double)(0x36d1a70, 0xffffffff, 1, 1) = 0 10440 __pmtimevalFromReal(0x7ffc4a5b03d0, 1, 1, 1) = 0 10440 pmUseContext(0, 1, 0x3616f60, 0x7ffc4a5b0320) = 0 10440 pmSetMode(0x1030001, 0x7ffc4a5b03d0, 0xfffffda8, 0) = 0 10440 pmUseContext(0, 0, 0x3616fd0, 0) = 0 10440 pmCtime(0x7ffc4a5b0358, 0x7677a0, 0x7f89f42141e0, 0) = 0x7677a0 10440 strlen("Mon Jul 18 13:33:12 2016\n") = 25 10440 pmUseContext(0, 0, 0, 0) = 0 10440 QVectorData::allocate(int, int)(44, 8, 0, 0) = 0x120794690 10440 qMemSet(void*, int, unsigned long)(0x1207946a0, 0, 28, 0) = 0x1207946a0 .. continues to repeat, as above env PCP_DEDBUG=14336 pmchart ... indicates that once you hit the backwards button, the app disregards stop, and keeps on chugging. One more bit of data: % build a copy of pmchart with debuginfo in it, if able (pcp configure --with-qt=debug one way) % gdb -args pmchart .... (gdb) run ... until popping up the time selector ^C (gdb) break mmap (gdb) cont #0 0x00002aaaad2ed3e0 in __mmap (addr=addr@entry=0x0, len=len@entry=1048576, prot=prot@entry=3, flags=flags@entry=34, fd=fd@entry=-1, offset=offset@entry=0) at ../sysdeps/unix/sysv/linux/wordsize-64/mmap.c:33 #1 0x00002aaaad26f070 in sysmalloc (nb=nb@entry=2032, av=0x2aaaad5acb00 <main_arena>) at malloc.c:2515 #2 0x00002aaaad26fa44 in _int_malloc (av=av@entry=0x2aaaad5acb00 <main_arena>, bytes=bytes@entry=2024) at malloc.c:3825 #3 0x00002aaaad2710b0 in __GI___libc_malloc (bytes=2024) at malloc.c:2911 #4 0x00002aaaaacfed0a in __pmLogLoadMeta (lcp=lcp@entry=0xb43790) at logmeta.c:345 #5 0x00002aaaaad03320 in __pmLogOpen (name=name@entry=0xaa76c0 "goody/20160719.12.55", ctxp=ctxp@entry=0xa08070) at logutil.c:1086 #6 0x00002aaaaace4349 in __pmFindOrOpenArchive (ctxp=ctxp@entry=0xa08070, name=0xaa76c0 "goody/20160719.12.55", multi_arch=multi_arch@entry=1) at context.c:555 #7 0x00002aaaaad03779 in __pmLogChangeArchive (ctxp=ctxp@entry=0xa08070, arch=arch@entry=1) at logutil.c:2829 #8 0x00002aaaaad053c6 in __pmLogSetTime (ctxp=ctxp@entry=0xa08070) at logutil.c:2292 #9 0x00002aaaaace78b8 in pmSetMode (mode=16973825, when=0x7fffffffcf90, delta=-600) at fetch.c:264 #10 0x00000000004a38a3 in QmcGroup::setArchiveMode(int, timeval const*, int) () #11 0x000000000048729e in GroupControl::adjustArchiveWorldViewBackward(QmcTime::Packet*, bool) (this=0xa07e70, packet=0xaad9c0, setup=false) at groupcontrol.cpp:478 ... and try a (gdb)cont a few times; many such requests come in this way. This seems to implicate libpcp multi-archive mode time scanning. (In reply to Frank Ch. Eigler from comment #4) > > This seems to implicate libpcp multi-archive mode time scanning. yes, and this more or less agrees with the repeating library call sequence in Comment #2, which is leaking memory on every iteration. It only occurs in multi-archive mode. Thanks -- Mark Created attachment 1225074 [details]
possible reproducer
this little script seems to access the archive in a similar idiosyncratic way as pmchart does (pmSetMode calls between pmFetch). Needs the sample multi-archive directory under /tmp/goody.
% valgrind --leak-check=full FOO.py
After a few seconds' run, notice how it's getting slower and slower and slower. Then ^C, valgrind reports:
[...]
==24643== 206,400 bytes in 10,320 blocks are definitely lost in loss record 2,497 of 2,497
==24643== at 0x4C2BBAD: malloc (vg_replace_malloc.c:299)
==24643== by 0xDCCFA72: __pmLogLoadMeta (logmeta.c:143)
==24643== by 0xDCD431F: __pmLogOpen (logutil.c:1086)
==24643== by 0xDCB5348: __pmFindOrOpenArchive (context.c:555)
==24643== by 0xDCD4778: __pmLogChangeArchive (logutil.c:2829)
==24643== by 0xDCD63C5: __pmLogSetTime (logutil.c:2292)
==24643== by 0xDCB88B7: pmSetMode (fetch.c:264)
==24643== by 0x11A20C57: ffi_call_unix64 (unix64.S:76)
==24643== by 0x11A206B9: ffi_call (ffi64.c:525)
==24643== by 0x1180D67E: _call_function_pointer (callproc.c:837)
==24643== by 0x1180D67E: _ctypes_callproc (callproc.c:1180)
==24643== by 0x11807371: PyCFuncPtr_call (_ctypes.c:3954)
==24643== by 0x4E81FD2: PyObject_Call (abstract.c:2546)
% python FOO.py
[... runs ... and bloats memory more and more and more ...]
Created attachment 1225100 [details]
fix for one small leak
One leak down with this patch.
One monster leak found with valgrind on the python reproducer, with a SIGUSR1 interrupt (to preclude libpcp cleanup on context creation):
==26053== 38,135,288 bytes in 5,072 blocks are still reachable in loss record 3,616 of 3,616
==26053== at 0x4C2BBAD: malloc (vg_replace_malloc.c:299)
==26053== by 0xDCD0FD7: __pmLogLoadMeta (logmeta.c:313)
==26053== by 0xDCD567F: __pmLogOpen (logutil.c:1086)
==26053== by 0xDCB66B8: __pmFindOrOpenArchive (context.c:555)
==26053== by 0xDCD5AD8: __pmLogChangeArchive (logutil.c:2829)
==26053== by 0xDCD5EC9: __pmLogChangeToPreviousArchive (logutil.c:3010)
==26053== by 0xDCD645C: __pmLogRead (logutil.c:1617)
==26053== by 0xDCD9DE2: cache_read (interp.c:230)
==26053== by 0xDCDC2A7: __pmLogFetchInterp (interp.c:1034)
==26053== by 0xDCD7609: __pmLogFetch (logutil.c:1944)
==26053== by 0xDCB9664: pmFetch (fetch.c:154)
==26053== by 0xDCBBA61: pmFetchGroup (fetchgroup.c:1559)
[etc]
What's happening is that as interp.c and pals scan back and forth in time across the archives, a new copies of the entire instance domains are constantly allocated and saved via addindom(). addindom() does not perform any duplicate elimination over e.g. idp->stamp, or the contents of the indom, so these copies pile up in memory at an impressive pace.
Heh, I just arrived at exactly the same patch :) It fixes the original pmchart issue but your FOO.py repro still seems to leak (another python gc issue maybe??) (In reply to Mark Goodwin from comment #8) > [...] (another python gc issue maybe??) Nope, see above re. monster leak: a design problem in (multi-?) archive work. indoms are being repeatedly stored in memory. yes you are correct - it doesn't fix the original pmchart issue (but it does fix a leak nevertheless, so at least some progress has been made). # pmchart -z -a goody -t 10m -O-0 -s 400 -v 400 -geometry 800x450 -c CPU (where 'goody' is a directory containing multiple archives) The 'monster' indom leak is going to need some careful thought ... Cheers Created attachment 1227374 [details]
Proposed patch for indom duplication
Here is a proposed patch for filtering out the duplicate indoms. It stops the runaway memory growth, but pm chart still ends up in a loop reading from the multi-archive and remains unresposive.
Hi Dave, I applied your patch from Comment #12 but it didn't go too well. Looks like an invalid free at __pmLogLoadMeta (logmeta.c:434), which is : 430 /* 431 * This pmid is already known, and matches. We can free the newly 432 * read copy and use the one in the hash table. 433 */ 434 free(dp); 435 dp = olddp; i.e. your new patch seems to be incompatible with the following commit which is in my tree but maybe not in yours : commit 6ed4b2452ff24356f90d37ac658cdbae4053e8c0 Author: Frank Ch. Eigler <fche> Date: Thu Dec 1 15:03:01 2016 +1100 libpcp: plug a small mem leak in __pmLogLoadMeta() Plug a leaking pmDesc in __pmLogLoadMeta() for the case where the pmid is already in the hash table. Partial fix for RHBZ#1359975. There are still other leaks involving multi-archive replay remaining for this bug. Should we just back-out that change instead since the code has now been re-worked? $ valgrind pmchart -a goody -O-0 -c CPU ==25136== Memcheck, a memory error detector ==25136== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al. ==25136== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info ==25136== Command: pmchart -a goody -O-0 -c CPU ==25136== ==25136== Conditional jump or move depends on uninitialised value(s) ==25136== at 0x4C2CD0D: free (vg_replace_malloc.c:530) ==25136== by 0x4E65E7A: __pmLogLoadMeta (logmeta.c:434) ==25136== by 0x4E6A43F: __pmLogOpen (logutil.c:1086) ==25136== by 0x4E4B348: __pmFindOrOpenArchive (context.c:555) ==25136== by 0x4E4CAF1: initarchive (context.c:759) ==25136== by 0x4E4CAF1: pmNewContext (context.c:1126) ==25136== by 0x4A603B: QmcSource::retryConnect(int, QString&) (in /usr/bin/pmchart) ==25136== by 0x4A6F16: QmcSource::QmcSource(int, QString&, int) (in /usr/bin/pmchart) ==25136== by 0x4A71C6: QmcSource::getSource(int, QString&, int, bool) (in /usr/bin/pmchart) ==25136== by 0x49C1B1: QmcGroup::use(int, QString const&, int) (in /usr/bin/pmchart) ==25136== by 0x424DB0: main (in /usr/bin/pmchart) ==25136== ==25136== Invalid free() / delete / delete[] / realloc() ==25136== at 0x4C2CD5A: free (vg_replace_malloc.c:530) ==25136== by 0x4E65E7A: __pmLogLoadMeta (logmeta.c:434) ==25136== by 0x4E6A43F: __pmLogOpen (logutil.c:1086) ==25136== by 0x4E4B348: __pmFindOrOpenArchive (context.c:555) ==25136== by 0x4E4CAF1: initarchive (context.c:759) ==25136== by 0x4E4CAF1: pmNewContext (context.c:1126) ==25136== by 0x4A603B: QmcSource::retryConnect(int, QString&) (in /usr/bin/pmchart) ==25136== by 0x4A6F16: QmcSource::QmcSource(int, QString&, int) (in /usr/bin/pmchart) ==25136== by 0x4A71C6: QmcSource::getSource(int, QString&, int, bool) (in /usr/bin/pmchart) ==25136== by 0x49C1B1: QmcGroup::use(int, QString const&, int) (in /usr/bin/pmchart) ==25136== by 0x424DB0: main (in /usr/bin/pmchart) ==25136== Address 0x4e68af9 is in the Text segment of /usr/lib64/libpcp.so.3 ==25136== at 0x4E68AF9: __pmLogChkLabel (logutil.c:233) ==25136== ==25136== Invalid read of size 4 ==25136== at 0x4E65C01: __pmLogLoadMeta (logmeta.c:243) ==25136== by 0x4E6A43F: __pmLogOpen (logutil.c:1086) ==25136== by 0x4E4B348: __pmFindOrOpenArchive (context.c:555) ==25136== by 0x4E4CAF1: initarchive (context.c:759) ==25136== by 0x4E4CAF1: pmNewContext (context.c:1126) ==25136== by 0x4A603B: QmcSource::retryConnect(int, QString&) (in /usr/bin/pmchart) ==25136== by 0x4A6F16: QmcSource::QmcSource(int, QString&, int) (in /usr/bin/pmchart) ==25136== by 0x4A71C6: QmcSource::getSource(int, QString&, int, bool) (in /usr/bin/pmchart) ==25136== by 0x49C1B1: QmcGroup::use(int, QString const&, int) (in /usr/bin/pmchart) ==25136== by 0x424DB0: main (in /usr/bin/pmchart) ==25136== Address 0x1f35e374 is 4 bytes inside an unallocated block of size 112 in arena "client" ==25136== ==25136== Invalid read of size 4 ==25136== at 0x4E65C0E: __pmLogLoadMeta (logmeta.c:248) ==25136== by 0x4E6A43F: __pmLogOpen (logutil.c:1086) ==25136== by 0x4E4B348: __pmFindOrOpenArchive (context.c:555) ==25136== by 0x4E4CAF1: initarchive (context.c:759) ==25136== by 0x4E4CAF1: pmNewContext (context.c:1126) ==25136== by 0x4A603B: QmcSource::retryConnect(int, QString&) (in /usr/bin/pmchart) ==25136== by 0x4A6F16: QmcSource::QmcSource(int, QString&, int) (in /usr/bin/pmchart) ==25136== by 0x4A71C6: QmcSource::getSource(int, QString&, int, bool) (in /usr/bin/pmchart) ==25136== by 0x49C1B1: QmcGroup::use(int, QString const&, int) (in /usr/bin/pmchart) ==25136== by 0x424DB0: main (in /usr/bin/pmchart) ==25136== Address 0x1f35e37c is 12 bytes inside an unallocated block of size 112 in arena "client" ==25136== ==25136== Invalid read of size 4 ==25136== at 0x4E65C1B: __pmLogLoadMeta (logmeta.c:253) ==25136== by 0x4E6A43F: __pmLogOpen (logutil.c:1086) ==25136== by 0x4E4B348: __pmFindOrOpenArchive (context.c:555) ==25136== by 0x4E4CAF1: initarchive (context.c:759) ==25136== by 0x4E4CAF1: pmNewContext (context.c:1126) ==25136== by 0x4A603B: QmcSource::retryConnect(int, QString&) (in /usr/bin/pmchart) ==25136== by 0x4A6F16: QmcSource::QmcSource(int, QString&, int) (in /usr/bin/pmchart) ==25136== by 0x4A71C6: QmcSource::getSource(int, QString&, int, bool) (in /usr/bin/pmchart) ==25136== by 0x49C1B1: QmcGroup::use(int, QString const&, int) (in /usr/bin/pmchart) ==25136== by 0x424DB0: main (in /usr/bin/pmchart) ==25136== Address 0x1f35e378 is 8 bytes inside an unallocated block of size 112 in arena "client" ==25136== ==25136== Invalid read of size 1 ==25136== at 0x4E65C28: __pmLogLoadMeta (logmeta.c:258) ==25136== by 0x4E6A43F: __pmLogOpen (logutil.c:1086) ==25136== by 0x4E4B348: __pmFindOrOpenArchive (context.c:555) ==25136== by 0x4E4CAF1: initarchive (context.c:759) ==25136== by 0x4E4CAF1: pmNewContext (context.c:1126) ==25136== by 0x4A603B: QmcSource::retryConnect(int, QString&) (in /usr/bin/pmchart) ==25136== by 0x4A6F16: QmcSource::QmcSource(int, QString&, int) (in /usr/bin/pmchart) ==25136== by 0x4A71C6: QmcSource::getSource(int, QString&, int, bool) (in /usr/bin/pmchart) ==25136== by 0x49C1B1: QmcGroup::use(int, QString const&, int) (in /usr/bin/pmchart) ==25136== by 0x424DB0: main (in /usr/bin/pmchart) ==25136== Address 0x1f35e383 is 19 bytes inside an unallocated block of size 112 in arena "client" ==25136== ==25136== Invalid read of size 1 ==25136== at 0x4E65C36: __pmLogLoadMeta (logmeta.c:260) ==25136== by 0x4E6A43F: __pmLogOpen (logutil.c:1086) ==25136== by 0x4E4B348: __pmFindOrOpenArchive (context.c:555) ==25136== by 0x4E4CAF1: initarchive (context.c:759) ==25136== by 0x4E4CAF1: pmNewContext (context.c:1126) ==25136== by 0x4A603B: QmcSource::retryConnect(int, QString&) (in /usr/bin/pmchart) ==25136== by 0x4A6F16: QmcSource::QmcSource(int, QString&, int) (in /usr/bin/pmchart) ==25136== by 0x4A71C6: QmcSource::getSource(int, QString&, int, bool) (in /usr/bin/pmchart) ==25136== by 0x49C1B1: QmcGroup::use(int, QString const&, int) (in /usr/bin/pmchart) ==25136== by 0x424DB0: main (in /usr/bin/pmchart) ==25136== Address 0x1f35e382 is 18 bytes inside an unallocated block of size 112 in arena "client" ==25136== ==25136== Invalid read of size 1 ==25136== at 0x4E65C44: __pmLogLoadMeta (logmeta.c:262) ==25136== by 0x4E6A43F: __pmLogOpen (logutil.c:1086) ==25136== by 0x4E4B348: __pmFindOrOpenArchive (context.c:555) ==25136== by 0x4E4CAF1: initarchive (context.c:759) ==25136== by 0x4E4CAF1: pmNewContext (context.c:1126) ==25136== by 0x4A603B: QmcSource::retryConnect(int, QString&) (in /usr/bin/pmchart) ==25136== by 0x4A6F16: QmcSource::QmcSource(int, QString&, int) (in /usr/bin/pmchart) ==25136== by 0x4A71C6: QmcSource::getSource(int, QString&, int, bool) (in /usr/bin/pmchart) ==25136== by 0x49C1B1: QmcGroup::use(int, QString const&, int) (in /usr/bin/pmchart) ==25136== by 0x424DB0: main (in /usr/bin/pmchart) ==25136== Address 0x1f35e381 is 17 bytes inside an unallocated block of size 112 in arena "client" ==25136== ==25136== Invalid read of size 4 ==25136== at 0x4E65CA7: __pmLogLoadMeta (logmeta.c:349) ==25136== by 0x4E6A43F: __pmLogOpen (logutil.c:1086) ==25136== by 0x4E4B348: __pmFindOrOpenArchive (context.c:555) ==25136== by 0x4E4CAF1: initarchive (context.c:759) ==25136== by 0x4E4CAF1: pmNewContext (context.c:1126) ==25136== by 0x4A603B: QmcSource::retryConnect(int, QString&) (in /usr/bin/pmchart) ==25136== by 0x4A6F16: QmcSource::QmcSource(int, QString&, int) (in /usr/bin/pmchart) ==25136== by 0x4A71C6: QmcSource::getSource(int, QString&, int, bool) (in /usr/bin/pmchart) ==25136== by 0x49C1B1: QmcGroup::use(int, QString const&, int) (in /usr/bin/pmchart) ==25136== by 0x424DB0: main (in /usr/bin/pmchart) ==25136== Address 0x1f35e370 is 0 bytes inside an unallocated block of size 112 in arena "client" ==25136== ==25136== Invalid free() / delete / delete[] / realloc() ==25136== at 0x4C2CD5A: free (vg_replace_malloc.c:530) ==25136== by 0x4E68260: logFreePMNS (logutil.c:707) ==25136== by 0x4E6A393: __pmLogOpen (logutil.c:1107) ==25136== by 0x4E4B348: __pmFindOrOpenArchive (context.c:555) ==25136== by 0x4E4CAF1: initarchive (context.c:759) ==25136== by 0x4E4CAF1: pmNewContext (context.c:1126) ==25136== by 0x4A603B: QmcSource::retryConnect(int, QString&) (in /usr/bin/pmchart) ==25136== by 0x4A6F16: QmcSource::QmcSource(int, QString&, int) (in /usr/bin/pmchart) ==25136== by 0x4A71C6: QmcSource::getSource(int, QString&, int, bool) (in /usr/bin/pmchart) ==25136== by 0x49C1B1: QmcGroup::use(int, QString const&, int) (in /usr/bin/pmchart) ==25136== by 0x424DB0: main (in /usr/bin/pmchart) ==25136== Address 0x1f361640 is 0 bytes inside a block of size 20 free'd ==25136== at 0x4C2CD5A: free (vg_replace_malloc.c:530) ==25136== by 0x4E65C59: __pmLogLoadMeta (logmeta.c:273) ==25136== by 0x4E6A43F: __pmLogOpen (logutil.c:1086) ==25136== by 0x4E4B348: __pmFindOrOpenArchive (context.c:555) ==25136== by 0x4E4CAF1: initarchive (context.c:759) ==25136== by 0x4E4CAF1: pmNewContext (context.c:1126) ==25136== by 0x4A603B: QmcSource::retryConnect(int, QString&) (in /usr/bin/pmchart) ==25136== by 0x4A6F16: QmcSource::QmcSource(int, QString&, int) (in /usr/bin/pmchart) ==25136== by 0x4A71C6: QmcSource::getSource(int, QString&, int, bool) (in /usr/bin/pmchart) ==25136== by 0x49C1B1: QmcGroup::use(int, QString const&, int) (in /usr/bin/pmchart) ==25136== by 0x424DB0: main (in /usr/bin/pmchart) ==25136== Block was alloc'd at ==25136== at 0x4C2BBAD: malloc (vg_replace_malloc.c:299) ==25136== by 0x4E65B92: __pmLogLoadMeta (logmeta.c:208) ==25136== by 0x4E6A43F: __pmLogOpen (logutil.c:1086) ==25136== by 0x4E4B348: __pmFindOrOpenArchive (context.c:555) ==25136== by 0x4E4CAF1: initarchive (context.c:759) ==25136== by 0x4E4CAF1: pmNewContext (context.c:1126) ==25136== by 0x4A603B: QmcSource::retryConnect(int, QString&) (in /usr/bin/pmchart) ==25136== by 0x4A6F16: QmcSource::QmcSource(int, QString&, int) (in /usr/bin/pmchart) ==25136== by 0x4A71C6: QmcSource::getSource(int, QString&, int, bool) (in /usr/bin/pmchart) ==25136== by 0x49C1B1: QmcGroup::use(int, QString const&, int) (in /usr/bin/pmchart) ==25136== by 0x424DB0: main (in /usr/bin/pmchart) ==25136== I backed out 6ed4b2452ff24356f90d37ac658cdbae4053e8c0 from my tree and rebuilt just with Dave's patch - it's much better behaved, even works actually, doesn't spin nor get killed like it used to using the repro from Comment #0 I think this patch is a tremendous improvement. 6ed4b2452ff24 can be reverted since on inspection it's included in Dave's patch (hence the double free reported in previous comment) -- Mark Both leaks must be fixed in the final version. I'll review and make sure that's the case. Dave's current patch - a quick test shows it seems to work In the tree/branch: git://git.pcp.io/brolley/pcp rhbz1359975 commit 0015487d1fd1c70f94bb75b08109a13de81dc878 Author: Dave Brolley <brolley> Date: Wed Dec 7 11:50:40 2016 -0500 RHBZ 1359975: pmchart run-away mem leak replaying multi-archive when rewinding Two leaks were fixed. Both occurred during the transition between archives is a multi-archive context. 1) a leak of pmDesc * used to check for consistency of the PMNS between archives. 2) a build up of duplicate instance domains. Additional proposed commit (on top of the previous one): In the tree/branch: git://git.pcp.io/brolley/pcp rhbz1359975 commit 82e02ffea204d26e7617a4bcf4be0b9326c64457 Author: Dave Brolley <brolley> Date: Mon Dec 12 14:39:09 2016 -0500 RHBZ 1359975: pmchart run-away mem leak replaying multi-archive when rewinding Refinement. Exposed by qa regressions. Ensure that instance domains are in the correct order in the hash chain: - Primary sort by timestamp (descending) - Secondary sort: latest added at the head of each time slot. This includes moving duplicates to the head of their time slot when detected. Before multi-archive contexts and duplicate-filtering, this happened automatically. More commits for this bug: In the tree/branch: git://git.pcp.io/brolley/pcp rhbz1359975 commit e574b71de8420c719ae2924a64ca9a5a4411bd52 Author: Dave Brolley <brolley> Date: Wed Dec 14 11:10:45 2016 -0500 RHBZ 1359975: pmchart run-away mem leak replaying multi-archive when rewinding. Don't assume that instances within the same instance domain are always sorted in the same order. commit f32bf1e7a669a2bee719e9d3f6c3828acfb8c518 Author: Dave Brolley <brolley> Date: Tue Dec 13 12:35:30 2016 -0500 Create and use __pmTimevalCmp for comparing __pmTimeval. Similar to __pmtimevalcmp which compares struct timeval. commit af8cac3b71b3104157a268c6ebb6c652c87d01ca Author: Nathan Scott <nathans> Date: Tue Dec 13 12:06:52 2016 -0500 Typos and style corrections for recent duplicate __pmLogInDom changes. All commits have been pushed upstream to git://git.pcp.io/pcp master pcp-3.11.7-1.el5 has been submitted as an update to Fedora EPEL 5. https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2016-10df136b72 pcp-3.11.7-1.fc24 has been submitted as an update to Fedora 24. https://bodhi.fedoraproject.org/updates/FEDORA-2016-98254aaaa3 pcp-3.11.7-1.fc25 has been submitted as an update to Fedora 25. https://bodhi.fedoraproject.org/updates/FEDORA-2016-9cbe6566bd pcp-3.11.7-1.fc24 has been pushed to the Fedora 24 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-98254aaaa3 pcp-3.11.7-1.fc25 has been pushed to the Fedora 25 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2016-9cbe6566bd pcp-3.11.7-1.el5 has been pushed to the Fedora EPEL 5 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2016-10df136b72 pcp-3.11.7-1.fc25 has been pushed to the Fedora 25 stable repository. If problems still persist, please make note of it in this bug report. pcp-3.11.7-1.fc24 has been pushed to the Fedora 24 stable repository. If problems still persist, please make note of it in this bug report. pcp-3.11.7-1.el5 has been pushed to the Fedora EPEL 5 stable repository. If problems still persist, please make note of it in this bug report. |