Bug 518079 - memory leak in tar xattr
Summary: memory leak in tar xattr
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: tar
Version: 6.0
Hardware: All
OS: Linux
medium
medium
Target Milestone: rc
: ---
Assignee: Ondrej Vasik
QA Contact: Alex Sersen
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2009-08-18 19:40 UTC by Issue Tracker
Modified: 2018-11-14 19:52 UTC (History)
6 users (show)

Fixed In Version: tar-1.22-10.fc12
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-11-10 21:45:40 UTC


Attachments (Terms of Use)
Patch that fixed the issue for LLNL (281 bytes, patch)
2009-08-18 19:41 UTC, Kent Baxley
no flags Details | Diff
test case (2.56 MB, application/x-tar)
2009-08-19 17:54 UTC, Ben Woodard
no flags Details
Fix for xheader_destroy memleak (437 bytes, patch)
2009-08-20 15:11 UTC, Ondrej Vasik
no flags Details | Diff

Description Issue Tracker 2009-08-18 19:40:19 UTC
Escalated to Bugzilla from IssueTracker

Comment 1 Issue Tracker 2009-08-18 19:40:20 UTC
Event posted on 08-18-2009 03:25pm EDT by woodard

Date: Tue, 18 Aug 2009 10:29:23 -0700
From: Jim Garlick <garlick@llnl.gov>
To: bwoodard@llnl.gov
Subject: tar memory leak with --xattrs
Message-ID: <20090818172923.GA12239@llnl.gov>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.4.2.2i

Ben,

Just a heads-up: we found and fixed a massive memory leak in tar's
--xattr support, documented here:

https://bugzilla.lustre.org/show_bug.cgi?id=20213

I just verified that the leak is present in tar-1.22-2.fc11 included
in the rhel6 alpha.  Might want to pick up the (one-line) patch in the
above bug for rhel6.

Jim

p.s. I probably should have mentioned to you earlier, but the --xattr
support in RHEL5 is pretty broken.  Segfaults on restore as I recall.
We got around that by packaging the fc10 tar as "ltar" on the side.

This event sent from IssueTracker by kbaxley  [LLNL (HPC)]
 issue 331903

Comment 2 Issue Tracker 2009-08-18 19:40:22 UTC
Event posted on 08-18-2009 03:26pm EDT by woodard

We attempted to use tar to move a production MDS to a new lun this weekend
because
of a problem.  We mounted both file systems as ldiskfs with -ouser_xattr
on 
/mnt/from and /mnt/to and ran:

tar --xattrs --no-selinux --sparse --exclude ’./OBJECTS/*’ \
    --exclude ’./CATALOGS’ -C/mnt/from -cf- . | tar -C /mnt/to -xf-

After several hours the second tar was shot by the oom killer.
Probing a bit further we noted that the memory footprint of the restoring
tar grows rapidly.
It is reproduceable if the second tar just lists the table of contents
(...|tar -tf-).

We ran an archive produced produced by 1.20-5.fc10 through the following
versions:
tar-1.19.sun1-6
tar-1.20-5.fc10
tar-1.15.1-23.0.1.el5

The first two exhibited the leak.  The third did not, however its xattr
support is
incomplete.

------- Comment #2 From Colin Faber 2009-07-20 09:11:01 -------

Hi Jim,

Thanks for the report. I'm looking into this for you now.

-cf

------- Comment #3 From Colin Faber 2009-07-20 15:05:50 -------

Hi Jim,

How many files are you taring up. Also does the tar (archive) complete
then the tar (restore)
fails?

-cf

------- Comment #4 From Jim Garlick 2009-07-20 15:58:44 -------

In this case there were about 70M files in the fs.
Neither tar (archive) nor tar (restore) actually failed on its own.
The problem was the memory footprint of the tar (restore) grew until it
ate all available system RAM, so the kernel OOM killer got it.

You don't need that many files to observe the problem though.
Just use top to watch the memory footprint while doing a restore and you
will
see the obvious leak right away.

------- Comment #6 From Colin Faber 2009-07-21 13:31:51 -------

Hi Jim,

Thanks I'm escalating this to engineering now.

-cf

------- Comment #8 From Terry Rutledge 2009-07-22 08:57:18 -------

Reassign to Beaver team.

------- Comment #9 From Johann Lombardi 2009-07-27 15:15:50 -------

(In reply to comment #0)
> We ran an archive produced produced by 1.20-5.fc10 through the following
versions:
> tar-1.19.sun1-6
> tar-1.20-5.fc10
> tar-1.15.1-23.0.1.el5
> 
> The first two exhibited the leak.  The third did not, however its xattr
support is
> incomplete.

Jim, to be clear (i'm just a bit confused by the bug summary), this means
that the
problem is in the upstream tar, right? not in the lustre patches applied
on top
of it?

------- Comment #10 From Jim Garlick 2009-07-27 15:43:46 -------

The problem is present in 1.20-5.fc10 also, but that version has xattr
patches embedded in the
.src.rpm, so I cannot vouch for the ultimate upstream.  In fact my
impression was that the patches
were not submitted upstream, but I could be mistaken.

------- Comment #11 From Johann Lombardi 2009-07-27 16:19:18 -------

(In reply to comment #10)
> The problem is present in 1.20-5.fc10 also, but that version has xattr
patches embedded in the
> .src.rpm, so I cannot vouch for the ultimate upstream.  In fact my
impression was that the patches
> were not submitted upstream, but I could be mistaken.

sorry, it is me who is mistaken. I thought the xattr patch was merged long
time ago
in the upstream tar, but i'm wrong.

Looking at the xattr patch (please note that i'm not at all familiar with
this code),
there is a place that looks suspicious, it is in xattrs_xattrs_get():

...
static char *xatrs = NULL;
...
if (!xatrs) xatrs = xmalloc (xsz);
...
xatrs = xrealloc (xatrs, xsz);
...

I don't see where xatrs is actually freed.

Girish, could you please have a look?

------- Comment #12 From Girish Shilamkar 2009-07-27 21:18:48 -------

Created an attachment (id=24988) [details]
fix memory leak in tar xattr patch

I ran tar --xattrs -cf in a loop for about 30 mins and I do see the free
memory has reduced. But I
wanted to see an OOM occur so I increased xsz to 1024*512 and ran tar in a
loop for another 30
mins. But I did not see an OOM occur.

Still looking at the code I think this patch is needed.

------- Comment #13 From Johann Lombardi 2009-07-28 02:15:00 -------

(From update of attachment 24988 [details])
Inspection template(s):
Bug:       20213 
Developer: girish.shilamkar@sun.com 
Size:      4 Lines of Change 
Date:      2009-7-28
Defects:   0
Type:      CODE
Inspector: johann@sun.com

--------------

------- Comment #15 From Jim Garlick 2009-07-28 11:18:17 -------

I applied attachment 24988 [details] to tar-1.20-5.fc10 and repeated my
test with an MDT archive file.
There is still quite a large leak.  When running top in one window and tar
tvf file in another,
VIRT and RES of the tar process grow at about 50 megabytes per second
(limited by the speed
of my xterm listing the table of contents I am sure).

If it would help for me to upload this 1.7G tar file, let me know where to
drop it.

------- Comment #16 From Johann Lombardi 2009-07-28 17:02:21 -------

Don't block 1.8.2 since this is an issue in an external tool.

------- Comment #17 From Dmitry Zogin 2009-07-29 09:35:30 -------

(From update of attachment 24988 [details])
Inspection template(s):
Bug:       20213 
Developer: girish.shilamkar@sun.com 
Size:      4 Lines of Change 
Date:      2009-7-29
Defects:   0
Type:      CODE
Inspector: dmitry.zogin@sun.com

--------------
This looks fine to me

------- Comment #18 From Girish Shilamkar 2009-08-03 09:49:01 -------

2009-08-03: Attached patch does not solve issue. Trying to find the
remaining the memory leak.

------- Comment #20 From Girish Shilamkar 2009-08-09 20:49:54 -------

2009-08-10: Attached patch does not solve issue. Trying to find the
remaining the memory leak.

------- Comment #21 From Girish Shilamkar 2009-08-10 03:05:11 -------

- The VIRT and RES fields in top increase very fast due to the memory
leak
- without the xattr patch the problem cannot be reproduced
- the problem only occurs during extraction or listing of the archive and
not during creation of
the archive.

I tried out valgrind but its unable to point to any proper memory leak.

------- Comment #22 From Girish Shilamkar 2009-08-10 03:36:52 -------

Here is the relevant output from valgrind.

==13601== malloc/free: in use at exit: 83,522,601 bytes in 110,295
blocks.
==13601== malloc/free: 1,764,653 allocs, 1,654,358 frees, 647,653,340
bytes allocated.
==13601== 
==13601== searching for pointers to 110,295 not-freed blocks.
==13601== checked 189,936 bytes.
==13601== 
==13601== 4 bytes in 1 blocks are definitely lost in loss record 1 of 8
==13601==    at 0x4005B18: malloc (vg_replace_malloc.c:207)
==13601==    by 0x4DD37088: strndup (in /lib/libc-2.5.so)
==13601==    by 0x8082C77: xstrndup (xstrndup.c:33)
==13601==    by 0x80723AF: base_name (basename.c:104)
==13601==    by 0x806CAFA: argp_parse (argp-parse.c:561)
==13601==    by 0x8063DDF: main (tar.c:2131)
==13601== 
==13601== 
==13601== 80 bytes in 1 blocks are still reachable in loss record 2 of 8
==13601==    at 0x4005B18: malloc (vg_replace_malloc.c:207)
==13601==    by 0x4DCE2009: __gconv_open (in /lib/libc-2.5.so)
==13601==    by 0x4DCE1C19: iconv_open (in /lib/libc-2.5.so)
==13601==    by 0x806771C: utf8_convert (utf8.c:57)
==13601==    by 0x80559F7: decode_string (xheader.c:915)
==13601==    by 0x8057053: decode_record (xheader.c:622)
==13601==    by 0x8057155: xheader_decode (xheader.c:667)
==13601==    by 0x805CB90: decode_header (list.c:621)
==13601==    by 0x805D439: list_archive (list.c:209)
==13601==    by 0x805D7C5: read_and (list.c:124)
==13601==    by 0x80644FF: main (tar.c:2472)
==13601== 
==13601== 
==13601== 120 bytes in 1 blocks are still reachable in loss record 3 of 8
==13601==    at 0x4005B18: malloc (vg_replace_malloc.c:207)
==13601==    by 0x4DCE9DC5: __gconv_lookup_cache (in /lib/libc-2.5.so)
==13601==    by 0x4DCE333B: __gconv_find_transform (in /lib/libc-2.5.so)
==13601==    by 0x4DCE1FB3: __gconv_open (in /lib/libc-2.5.so)
==13601==    by 0x4DCE1C19: iconv_open (in /lib/libc-2.5.so)
==13601==    by 0x806771C: utf8_convert (utf8.c:57)
==13601==    by 0x80559F7: decode_string (xheader.c:915)
==13601==    by 0x8057053: decode_record (xheader.c:622)
==13601==    by 0x8057155: xheader_decode (xheader.c:667)
==13601==    by 0x805CB90: decode_header (list.c:621)
==13601==    by 0x805D439: list_archive (list.c:209)
==13601==    by 0x805D7C5: read_and (list.c:124)
==13601== 
==13601== 
==13601== 4,076 bytes in 2 blocks are still reachable in loss record 4 of
8
==13601==    at 0x4005B18: malloc (vg_replace_malloc.c:207)
==13601==    by 0x80824B1: xmalloc (xmalloc.c:50)
==13601==    by 0x4DD36654: _obstack_begin (in /lib/libc-2.5.so)
==13601==    by 0x8063BAD: main (tar.c:2423)
==13601== 
==13601== 
==13601== 10,607 bytes in 17 blocks are indirectly lost in loss record 5
of 8
==13601==    at 0x4005B18: malloc (vg_replace_malloc.c:207)
==13601==    by 0x80824B1: xmalloc (xmalloc.c:50)
==13601==    by 0x80824E1: xmemdup (xmalloc.c:114)
==13601==    by 0x8082520: xstrdup (xmalloc.c:122)
==13601==    by 0x8063D10: main (tar.c:2103)
==13601== 
==13601== 
==13601== 32,640 bytes in 1 blocks are still reachable in loss record 6 of
8
==13601==    at 0x4005B18: malloc (vg_replace_malloc.c:207)
==13601==    by 0x4DCE2188: __gconv_open (in /lib/libc-2.5.so)
==13601==    by 0x4DCE1C19: iconv_open (in /lib/libc-2.5.so)
==13601==    by 0x806771C: utf8_convert (utf8.c:57)
==13601==    by 0x80559F7: decode_string (xheader.c:915)
==13601==    by 0x8057053: decode_record (xheader.c:622)
==13601==    by 0x8057155: xheader_decode (xheader.c:667)
==13601==    by 0x805CB90: decode_header (list.c:621)
==13601==    by 0x805D439: list_archive (list.c:209)
==13601==    by 0x805D7C5: read_and (list.c:124)
==13601==    by 0x80644FF: main (tar.c:2472)
==13601== 
==13601== 
==13601== 44,663 bytes in 59 blocks are possibly lost in loss record 7 of
8
==13601==    at 0x4005B18: malloc (vg_replace_malloc.c:207)
==13601==    by 0x80824B1: xmalloc (xmalloc.c:50)
==13601==    by 0x8056DB8: xheader_read (xheader.c:730)
==13601==    by 0x805D1AE: read_header_primitive (list.c:389)
==13601==    by 0x805D3F9: read_header (list.c:469)
==13601==    by 0x805D516: read_and (list.c:80)
==13601==    by 0x80644FF: main (tar.c:2472)
==13601== 
==13601== 
==13601== 83,441,018 (83,430,411 direct, 10,607 indirect) bytes in 110,213
blocks are definitely
lost in loss record 8 of 8
==13601==    at 0x4005B18: malloc (vg_replace_malloc.c:207)
==13601==    by 0x80824B1: xmalloc (xmalloc.c:50)
==13601==    by 0x8063CE0: main (tar.c:2090)
==13601== 
==13601== LEAK SUMMARY:
==13601==    definitely lost: 83,430,415 bytes in 110,214 blocks.
==13601==    indirectly lost: 10,607 bytes in 17 blocks.
==13601==      possibly lost: 44,663 bytes in 59 blocks.
==13601==    still reachable: 36,916 bytes in 5 blocks.
==13601==         suppressed: 0 bytes in 0 blocks.

------- Comment #23 From Jim Garlick 2009-08-10 12:15:44 -------

When I built tar with no optimization, valgrind reported the major leak
under the xheader_read()
path.

------- Comment #24 From Jim Garlick 2009-08-10 12:16:34 -------

Created an attachment (id=25233) [details]
fix another leak

This seems to fix the leak, but I'm not at all certain it is correct. 
Please have a look.

------- Comment #25 From Jim Garlick 2009-08-10 12:53:51 -------

Created an attachment (id=25235) [details]
fix another leak

That was not right - this may be better.

------- Comment #27 From Girish Shilamkar 2009-08-11 03:07:46 -------

(From update of attachment 24988 [details])
This patch is actually incorrect since both val and xatrs are static
variables and are not declared
during each function call.

------- Comment #29 From Girish Shilamkar 2009-08-16 21:36:38 -------

(From update of attachment 25235 [details])
Inspection template(s):
Bug:       20213 
Developer: garlick@llnl.gov 
Size:      1 Lines of Change 
Date:      2009-8-17
Defects:   0
Type:      CODE
Inspector: girish.shilamkar@sun.com

--------------
yes, this looks good and fixes the problem. I will make the new RPMs with
this patch.

------- Comment #30 From Girish Shilamkar 2009-08-17 21:57:31 -------

2009-08-18: Creating RPMs with attached patch for all
distros/architectures

------- Comment #32 From Johann Lombardi 2009-08-18 07:52:48 -------

(From update of attachment 25235 [details])
Inspection template(s):
Bug:       20213 
Developer: garlick@llnl.gov 
Size:      1 Lines of Change 
Date:      2009-8-18
Defects:   0
Type:      CODE
Inspector: johann@sun.com

--------------

------- Comment #34 From Peter Jones 2009-08-18 10:48:18 -------

Girish

What work is outstanding for this fix to be landed?

Please advise

PJones



woodard assigned to issue for LLNL (HPC).
Status set to: Waiting on Tech

This event sent from IssueTracker by kbaxley  [LLNL (HPC)]
 issue 331903

Comment 3 Kent Baxley 2009-08-18 19:41:50 UTC
Created attachment 357843 [details]
Patch that fixed the issue for LLNL

Comment 5 RHEL Product and Program Management 2009-08-18 20:24:36 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux major release.  Product Management has requested further
review of this request by Red Hat Engineering, for potential inclusion in a Red
Hat Enterprise Linux Major release.  This request is not yet committed for
inclusion.

Comment 6 Ondrej Vasik 2009-08-19 10:53:10 UTC
Thanks for report, I don't think that patch is correct way to fix that leak - as that removed xheader_init() is coming not from xattr patch, but from upstream - so is such removal could cause regression in other functionality (http://git.savannah.gnu.org/cgit/tar.git/commit/?id=64cddf2fbc246e331edcbc82e419785af7f819a5)

I would prefer to find out where the memory cleanup should be made and actually clean up the memory properly rather than removing initialization of header stack.
I'll try to find something better, but I would appreciate simpler reproducer of the leak (it doesn't leak for me on ext3 with xattrs)...

Comment 7 Ben Woodard 2009-08-19 17:54:23 UTC
Created attachment 357964 [details]
test case

Trying to untar this file or even list the contents should reproduce the problem or so I've been told. 

The person that made the patch seems to be generally in agreement that his patch might not be the right approach.

Comment 8 Issue Tracker 2009-08-19 17:56:02 UTC
Event posted on 2009-08-19 10:56 PDT by woodard

Date: Wed, 19 Aug 2009 09:51:25 -0700
From: Jim Garlick <garlick@llnl.gov>
To: Ben Woodard <bwoodard@llnl.gov>
Subject: Re: tar memory leak with --xattrs
Message-ID: <20090819165125.GA28491@llnl.gov>
References: <20090818172923.GA12239@llnl.gov> <4A8B0470.8060809@llnl.gov>
<20090818201820.GA12706@llnl.gov> <4A8B1CC1.5010205@llnl.gov>
<20090818222647.GA13239@llnl.gov>
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <20090818222647.GA13239@llnl.gov>
User-Agent: Mutt/1.4.2.2i

Ben, Ondrej is asking for a reproducer for the leak in 518079.
Could you provide him with test-ltar--1.20-5.3.ch4.2.tar?
It should make the leak evident in valgrind (though I had to build
tar with no optimization for valgrind to report the correct stack.

He has a good point that my patch may be the wrong way to go about it
since I am changing the upstream code not code added by the xattr patch.

The problem would seem to be that xheader_read() mallocs xhdr->buffer and
its call to xheader_init() that I removed allocates xhdr->stack;
yet xheader_destroy() only frees xhdr->buffer if xhdr->stack
is NULL, which xheader_read() ensures is never the case.

Jim



This event sent from IssueTracker by woodard 
 issue 331903

Comment 9 Ondrej Vasik 2009-08-20 15:11:22 UTC
Created attachment 358115 [details]
Fix for xheader_destroy memleak

Thanks for testcase, I'm now able to reproduce the leak in xheader_destroy.

I guess attached patch should safely solve the leak with unfreed xhdr->buffer in xheader_destroy ... but is very similar to https://bugzilla.lustre.org/attachment.cgi?id=25233 abandoned by Jim Garlick in lustre bugzilla. Could he explain what was not correct on his attachment? The only thing I can imagine was xhdr->size was not set to 0 for non-NULL xhdr->stk and NULL xhdr->buffer (which is solved by my patch). Or is there something else?

Comment 10 Ondrej Vasik 2009-08-20 16:06:31 UTC
Comment on attachment 358115 [details]
Fix for xheader_destroy memleak

Aaah, I see ... double free errors in testsuite... bad luck, I'll check what could be done.

Comment 11 Ondrej Vasik 2009-09-08 07:24:16 UTC
Just as a sidenote - reported upstream : http://lists.gnu.org/archive/html/bug-tar/2009-09/msg00000.html - with no response so far ... probably removing that xheader_init is really safe, as I don't see any reason for it.

Comment 14 releng-rhel@redhat.com 2009-12-16 21:52:51 UTC
Fixed in 'tar-1.22-10.fc12'. 'tar-1.22-11.el6' included in compose 'RHEL6.0-20091216.nightly'.
Moving to ON_QA.

Comment 16 releng-rhel@redhat.com 2010-11-10 21:45:40 UTC
Red Hat Enterprise Linux 6.0 is now available and should resolve
the problem described in this bug report. This report is therefore being closed
with a resolution of CURRENTRELEASE. You may reopen this bug report if the
solution does not work for you.


Note You need to log in before you can comment on or make changes to this bug.