Bug 518079
Summary: | memory leak in tar xattr | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Issue Tracker <tao> | ||||||||
Component: | tar | Assignee: | Ondrej Vasik <ovasik> | ||||||||
Status: | CLOSED CURRENTRELEASE | QA Contact: | Alex Sersen <asersen> | ||||||||
Severity: | medium | Docs Contact: | |||||||||
Priority: | medium | ||||||||||
Version: | 6.0 | CC: | asersen, azelinka, cww, kdudka, ohudlick, tao | ||||||||
Target Milestone: | rc | ||||||||||
Target Release: | --- | ||||||||||
Hardware: | All | ||||||||||
OS: | Linux | ||||||||||
Whiteboard: | |||||||||||
Fixed In Version: | tar-1.22-10.fc12 | Doc Type: | Bug Fix | ||||||||
Doc Text: | Story Points: | --- | |||||||||
Clone Of: | Environment: | ||||||||||
Last Closed: | 2010-11-10 21:45:40 UTC | Type: | --- | ||||||||
Regression: | --- | Mount Type: | --- | ||||||||
Documentation: | --- | CRM: | |||||||||
Verified Versions: | Category: | --- | |||||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||||
Embargoed: | |||||||||||
Attachments: |
|
Description
Issue Tracker
2009-08-18 19:40:19 UTC
Event posted on 08-18-2009 03:25pm EDT by woodard Date: Tue, 18 Aug 2009 10:29:23 -0700 From: Jim Garlick <garlick> To: bwoodard Subject: tar memory leak with --xattrs Message-ID: <20090818172923.GA12239> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.4.2.2i Ben, Just a heads-up: we found and fixed a massive memory leak in tar's --xattr support, documented here: https://bugzilla.lustre.org/show_bug.cgi?id=20213 I just verified that the leak is present in tar-1.22-2.fc11 included in the rhel6 alpha. Might want to pick up the (one-line) patch in the above bug for rhel6. Jim p.s. I probably should have mentioned to you earlier, but the --xattr support in RHEL5 is pretty broken. Segfaults on restore as I recall. We got around that by packaging the fc10 tar as "ltar" on the side. This event sent from IssueTracker by kbaxley [LLNL (HPC)] issue 331903 Event posted on 08-18-2009 03:26pm EDT by woodard We attempted to use tar to move a production MDS to a new lun this weekend because of a problem. We mounted both file systems as ldiskfs with -ouser_xattr on /mnt/from and /mnt/to and ran: tar --xattrs --no-selinux --sparse --exclude ’./OBJECTS/*’ \ --exclude ’./CATALOGS’ -C/mnt/from -cf- . | tar -C /mnt/to -xf- After several hours the second tar was shot by the oom killer. Probing a bit further we noted that the memory footprint of the restoring tar grows rapidly. It is reproduceable if the second tar just lists the table of contents (...|tar -tf-). We ran an archive produced produced by 1.20-5.fc10 through the following versions: tar-1.19.sun1-6 tar-1.20-5.fc10 tar-1.15.1-23.0.1.el5 The first two exhibited the leak. The third did not, however its xattr support is incomplete. ------- Comment #2 From Colin Faber 2009-07-20 09:11:01 ------- Hi Jim, Thanks for the report. I'm looking into this for you now. -cf ------- Comment #3 From Colin Faber 2009-07-20 15:05:50 ------- Hi Jim, How many files are you taring up. Also does the tar (archive) complete then the tar (restore) fails? -cf ------- Comment #4 From Jim Garlick 2009-07-20 15:58:44 ------- In this case there were about 70M files in the fs. Neither tar (archive) nor tar (restore) actually failed on its own. The problem was the memory footprint of the tar (restore) grew until it ate all available system RAM, so the kernel OOM killer got it. You don't need that many files to observe the problem though. Just use top to watch the memory footprint while doing a restore and you will see the obvious leak right away. ------- Comment #6 From Colin Faber 2009-07-21 13:31:51 ------- Hi Jim, Thanks I'm escalating this to engineering now. -cf ------- Comment #8 From Terry Rutledge 2009-07-22 08:57:18 ------- Reassign to Beaver team. ------- Comment #9 From Johann Lombardi 2009-07-27 15:15:50 ------- (In reply to comment #0) > We ran an archive produced produced by 1.20-5.fc10 through the following versions: > tar-1.19.sun1-6 > tar-1.20-5.fc10 > tar-1.15.1-23.0.1.el5 > > The first two exhibited the leak. The third did not, however its xattr support is > incomplete. Jim, to be clear (i'm just a bit confused by the bug summary), this means that the problem is in the upstream tar, right? not in the lustre patches applied on top of it? ------- Comment #10 From Jim Garlick 2009-07-27 15:43:46 ------- The problem is present in 1.20-5.fc10 also, but that version has xattr patches embedded in the .src.rpm, so I cannot vouch for the ultimate upstream. In fact my impression was that the patches were not submitted upstream, but I could be mistaken. ------- Comment #11 From Johann Lombardi 2009-07-27 16:19:18 ------- (In reply to comment #10) > The problem is present in 1.20-5.fc10 also, but that version has xattr patches embedded in the > .src.rpm, so I cannot vouch for the ultimate upstream. In fact my impression was that the patches > were not submitted upstream, but I could be mistaken. sorry, it is me who is mistaken. I thought the xattr patch was merged long time ago in the upstream tar, but i'm wrong. Looking at the xattr patch (please note that i'm not at all familiar with this code), there is a place that looks suspicious, it is in xattrs_xattrs_get(): ... static char *xatrs = NULL; ... if (!xatrs) xatrs = xmalloc (xsz); ... xatrs = xrealloc (xatrs, xsz); ... I don't see where xatrs is actually freed. Girish, could you please have a look? ------- Comment #12 From Girish Shilamkar 2009-07-27 21:18:48 ------- Created an attachment (id=24988) [details] fix memory leak in tar xattr patch I ran tar --xattrs -cf in a loop for about 30 mins and I do see the free memory has reduced. But I wanted to see an OOM occur so I increased xsz to 1024*512 and ran tar in a loop for another 30 mins. But I did not see an OOM occur. Still looking at the code I think this patch is needed. ------- Comment #13 From Johann Lombardi 2009-07-28 02:15:00 ------- (From update of attachment 24988 [details]) Inspection template(s): Bug: 20213 Developer: girish.shilamkar Size: 4 Lines of Change Date: 2009-7-28 Defects: 0 Type: CODE Inspector: johann -------------- ------- Comment #15 From Jim Garlick 2009-07-28 11:18:17 ------- I applied attachment 24988 [details] to tar-1.20-5.fc10 and repeated my test with an MDT archive file. There is still quite a large leak. When running top in one window and tar tvf file in another, VIRT and RES of the tar process grow at about 50 megabytes per second (limited by the speed of my xterm listing the table of contents I am sure). If it would help for me to upload this 1.7G tar file, let me know where to drop it. ------- Comment #16 From Johann Lombardi 2009-07-28 17:02:21 ------- Don't block 1.8.2 since this is an issue in an external tool. ------- Comment #17 From Dmitry Zogin 2009-07-29 09:35:30 ------- (From update of attachment 24988 [details]) Inspection template(s): Bug: 20213 Developer: girish.shilamkar Size: 4 Lines of Change Date: 2009-7-29 Defects: 0 Type: CODE Inspector: dmitry.zogin -------------- This looks fine to me ------- Comment #18 From Girish Shilamkar 2009-08-03 09:49:01 ------- 2009-08-03: Attached patch does not solve issue. Trying to find the remaining the memory leak. ------- Comment #20 From Girish Shilamkar 2009-08-09 20:49:54 ------- 2009-08-10: Attached patch does not solve issue. Trying to find the remaining the memory leak. ------- Comment #21 From Girish Shilamkar 2009-08-10 03:05:11 ------- - The VIRT and RES fields in top increase very fast due to the memory leak - without the xattr patch the problem cannot be reproduced - the problem only occurs during extraction or listing of the archive and not during creation of the archive. I tried out valgrind but its unable to point to any proper memory leak. ------- Comment #22 From Girish Shilamkar 2009-08-10 03:36:52 ------- Here is the relevant output from valgrind. ==13601== malloc/free: in use at exit: 83,522,601 bytes in 110,295 blocks. ==13601== malloc/free: 1,764,653 allocs, 1,654,358 frees, 647,653,340 bytes allocated. ==13601== ==13601== searching for pointers to 110,295 not-freed blocks. ==13601== checked 189,936 bytes. ==13601== ==13601== 4 bytes in 1 blocks are definitely lost in loss record 1 of 8 ==13601== at 0x4005B18: malloc (vg_replace_malloc.c:207) ==13601== by 0x4DD37088: strndup (in /lib/libc-2.5.so) ==13601== by 0x8082C77: xstrndup (xstrndup.c:33) ==13601== by 0x80723AF: base_name (basename.c:104) ==13601== by 0x806CAFA: argp_parse (argp-parse.c:561) ==13601== by 0x8063DDF: main (tar.c:2131) ==13601== ==13601== ==13601== 80 bytes in 1 blocks are still reachable in loss record 2 of 8 ==13601== at 0x4005B18: malloc (vg_replace_malloc.c:207) ==13601== by 0x4DCE2009: __gconv_open (in /lib/libc-2.5.so) ==13601== by 0x4DCE1C19: iconv_open (in /lib/libc-2.5.so) ==13601== by 0x806771C: utf8_convert (utf8.c:57) ==13601== by 0x80559F7: decode_string (xheader.c:915) ==13601== by 0x8057053: decode_record (xheader.c:622) ==13601== by 0x8057155: xheader_decode (xheader.c:667) ==13601== by 0x805CB90: decode_header (list.c:621) ==13601== by 0x805D439: list_archive (list.c:209) ==13601== by 0x805D7C5: read_and (list.c:124) ==13601== by 0x80644FF: main (tar.c:2472) ==13601== ==13601== ==13601== 120 bytes in 1 blocks are still reachable in loss record 3 of 8 ==13601== at 0x4005B18: malloc (vg_replace_malloc.c:207) ==13601== by 0x4DCE9DC5: __gconv_lookup_cache (in /lib/libc-2.5.so) ==13601== by 0x4DCE333B: __gconv_find_transform (in /lib/libc-2.5.so) ==13601== by 0x4DCE1FB3: __gconv_open (in /lib/libc-2.5.so) ==13601== by 0x4DCE1C19: iconv_open (in /lib/libc-2.5.so) ==13601== by 0x806771C: utf8_convert (utf8.c:57) ==13601== by 0x80559F7: decode_string (xheader.c:915) ==13601== by 0x8057053: decode_record (xheader.c:622) ==13601== by 0x8057155: xheader_decode (xheader.c:667) ==13601== by 0x805CB90: decode_header (list.c:621) ==13601== by 0x805D439: list_archive (list.c:209) ==13601== by 0x805D7C5: read_and (list.c:124) ==13601== ==13601== ==13601== 4,076 bytes in 2 blocks are still reachable in loss record 4 of 8 ==13601== at 0x4005B18: malloc (vg_replace_malloc.c:207) ==13601== by 0x80824B1: xmalloc (xmalloc.c:50) ==13601== by 0x4DD36654: _obstack_begin (in /lib/libc-2.5.so) ==13601== by 0x8063BAD: main (tar.c:2423) ==13601== ==13601== ==13601== 10,607 bytes in 17 blocks are indirectly lost in loss record 5 of 8 ==13601== at 0x4005B18: malloc (vg_replace_malloc.c:207) ==13601== by 0x80824B1: xmalloc (xmalloc.c:50) ==13601== by 0x80824E1: xmemdup (xmalloc.c:114) ==13601== by 0x8082520: xstrdup (xmalloc.c:122) ==13601== by 0x8063D10: main (tar.c:2103) ==13601== ==13601== ==13601== 32,640 bytes in 1 blocks are still reachable in loss record 6 of 8 ==13601== at 0x4005B18: malloc (vg_replace_malloc.c:207) ==13601== by 0x4DCE2188: __gconv_open (in /lib/libc-2.5.so) ==13601== by 0x4DCE1C19: iconv_open (in /lib/libc-2.5.so) ==13601== by 0x806771C: utf8_convert (utf8.c:57) ==13601== by 0x80559F7: decode_string (xheader.c:915) ==13601== by 0x8057053: decode_record (xheader.c:622) ==13601== by 0x8057155: xheader_decode (xheader.c:667) ==13601== by 0x805CB90: decode_header (list.c:621) ==13601== by 0x805D439: list_archive (list.c:209) ==13601== by 0x805D7C5: read_and (list.c:124) ==13601== by 0x80644FF: main (tar.c:2472) ==13601== ==13601== ==13601== 44,663 bytes in 59 blocks are possibly lost in loss record 7 of 8 ==13601== at 0x4005B18: malloc (vg_replace_malloc.c:207) ==13601== by 0x80824B1: xmalloc (xmalloc.c:50) ==13601== by 0x8056DB8: xheader_read (xheader.c:730) ==13601== by 0x805D1AE: read_header_primitive (list.c:389) ==13601== by 0x805D3F9: read_header (list.c:469) ==13601== by 0x805D516: read_and (list.c:80) ==13601== by 0x80644FF: main (tar.c:2472) ==13601== ==13601== ==13601== 83,441,018 (83,430,411 direct, 10,607 indirect) bytes in 110,213 blocks are definitely lost in loss record 8 of 8 ==13601== at 0x4005B18: malloc (vg_replace_malloc.c:207) ==13601== by 0x80824B1: xmalloc (xmalloc.c:50) ==13601== by 0x8063CE0: main (tar.c:2090) ==13601== ==13601== LEAK SUMMARY: ==13601== definitely lost: 83,430,415 bytes in 110,214 blocks. ==13601== indirectly lost: 10,607 bytes in 17 blocks. ==13601== possibly lost: 44,663 bytes in 59 blocks. ==13601== still reachable: 36,916 bytes in 5 blocks. ==13601== suppressed: 0 bytes in 0 blocks. ------- Comment #23 From Jim Garlick 2009-08-10 12:15:44 ------- When I built tar with no optimization, valgrind reported the major leak under the xheader_read() path. ------- Comment #24 From Jim Garlick 2009-08-10 12:16:34 ------- Created an attachment (id=25233) [details] fix another leak This seems to fix the leak, but I'm not at all certain it is correct. Please have a look. ------- Comment #25 From Jim Garlick 2009-08-10 12:53:51 ------- Created an attachment (id=25235) [details] fix another leak That was not right - this may be better. ------- Comment #27 From Girish Shilamkar 2009-08-11 03:07:46 ------- (From update of attachment 24988 [details]) This patch is actually incorrect since both val and xatrs are static variables and are not declared during each function call. ------- Comment #29 From Girish Shilamkar 2009-08-16 21:36:38 ------- (From update of attachment 25235 [details]) Inspection template(s): Bug: 20213 Developer: garlick Size: 1 Lines of Change Date: 2009-8-17 Defects: 0 Type: CODE Inspector: girish.shilamkar -------------- yes, this looks good and fixes the problem. I will make the new RPMs with this patch. ------- Comment #30 From Girish Shilamkar 2009-08-17 21:57:31 ------- 2009-08-18: Creating RPMs with attached patch for all distros/architectures ------- Comment #32 From Johann Lombardi 2009-08-18 07:52:48 ------- (From update of attachment 25235 [details]) Inspection template(s): Bug: 20213 Developer: garlick Size: 1 Lines of Change Date: 2009-8-18 Defects: 0 Type: CODE Inspector: johann -------------- ------- Comment #34 From Peter Jones 2009-08-18 10:48:18 ------- Girish What work is outstanding for this fix to be landed? Please advise PJones woodard assigned to issue for LLNL (HPC). Status set to: Waiting on Tech This event sent from IssueTracker by kbaxley [LLNL (HPC)] issue 331903 Created attachment 357843 [details]
Patch that fixed the issue for LLNL
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux major release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Major release. This request is not yet committed for inclusion. Thanks for report, I don't think that patch is correct way to fix that leak - as that removed xheader_init() is coming not from xattr patch, but from upstream - so is such removal could cause regression in other functionality (http://git.savannah.gnu.org/cgit/tar.git/commit/?id=64cddf2fbc246e331edcbc82e419785af7f819a5) I would prefer to find out where the memory cleanup should be made and actually clean up the memory properly rather than removing initialization of header stack. I'll try to find something better, but I would appreciate simpler reproducer of the leak (it doesn't leak for me on ext3 with xattrs)... Created attachment 357964 [details]
test case
Trying to untar this file or even list the contents should reproduce the problem or so I've been told.
The person that made the patch seems to be generally in agreement that his patch might not be the right approach.
Event posted on 2009-08-19 10:56 PDT by woodard Date: Wed, 19 Aug 2009 09:51:25 -0700 From: Jim Garlick <garlick> To: Ben Woodard <bwoodard> Subject: Re: tar memory leak with --xattrs Message-ID: <20090819165125.GA28491> References: <20090818172923.GA12239> <4A8B0470.8060809> <20090818201820.GA12706> <4A8B1CC1.5010205> <20090818222647.GA13239> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <20090818222647.GA13239> User-Agent: Mutt/1.4.2.2i Ben, Ondrej is asking for a reproducer for the leak in 518079. Could you provide him with test-ltar--1.20-5.3.ch4.2.tar? It should make the leak evident in valgrind (though I had to build tar with no optimization for valgrind to report the correct stack. He has a good point that my patch may be the wrong way to go about it since I am changing the upstream code not code added by the xattr patch. The problem would seem to be that xheader_read() mallocs xhdr->buffer and its call to xheader_init() that I removed allocates xhdr->stack; yet xheader_destroy() only frees xhdr->buffer if xhdr->stack is NULL, which xheader_read() ensures is never the case. Jim This event sent from IssueTracker by woodard issue 331903 Created attachment 358115 [details] Fix for xheader_destroy memleak Thanks for testcase, I'm now able to reproduce the leak in xheader_destroy. I guess attached patch should safely solve the leak with unfreed xhdr->buffer in xheader_destroy ... but is very similar to https://bugzilla.lustre.org/attachment.cgi?id=25233 abandoned by Jim Garlick in lustre bugzilla. Could he explain what was not correct on his attachment? The only thing I can imagine was xhdr->size was not set to 0 for non-NULL xhdr->stk and NULL xhdr->buffer (which is solved by my patch). Or is there something else? Comment on attachment 358115 [details]
Fix for xheader_destroy memleak
Aaah, I see ... double free errors in testsuite... bad luck, I'll check what could be done.
Just as a sidenote - reported upstream : http://lists.gnu.org/archive/html/bug-tar/2009-09/msg00000.html - with no response so far ... probably removing that xheader_init is really safe, as I don't see any reason for it. Fixed in 'tar-1.22-10.fc12'. 'tar-1.22-11.el6' included in compose 'RHEL6.0-20091216.nightly'. Moving to ON_QA. Red Hat Enterprise Linux 6.0 is now available and should resolve the problem described in this bug report. This report is therefore being closed with a resolution of CURRENTRELEASE. You may reopen this bug report if the solution does not work for you. |