Bug 132947

Summary: [PATCH] kernel memory leak on x86_64 in 32/64 mixed mode
Product: Fedora
Reporter: Axel Thimm <axel.thimm>
Component: kernel
Assignee: Jim Paradis <jparadis>
Status: CLOSED RAWHIDE
Severity: high
Priority: medium
Version: 3
CC: barryn, benny+bugzilla, bryans, davej, dgunchev, oliva, peterm, stesmi, wtogami
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2004-11-03 19:49:39 UTC
Bug Blocks: 130887, 135876
Attachments:
Patch that fixes the Fedora-local patch that introduces an ia32-compat memory leak on x86_64

Description Axel Thimm 2004-09-20 10:01:21 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.2)
Gecko/20040809

Description of problem:
Running 32-bit applications on x86_64 FC distributions with kernels
based on 2.6.8 (currently 2.6.8-1.521 and 2.6.8-1.541) produces a
continuous kernel memory leak. Returning to the previous 2.6.7-based
kernels makes the leak disappear.


Version-Release number of selected component (if applicable):
2.6.8-1.521 and 2.6.8-1.541

How reproducible:
Always

Steps to Reproduce:
1. Install FC2/x86_64 or FC3t2/x86_64
2a. Set up an i386 chroot, e.g. for FC2/i386
3a. Work in the chroot, e.g. building packages

2b. Or run a 32-bit application directly, e.g. uvscan
    

Actual Results:  The kernel leaks memory at a rate of a couple of
MB/sec. When all memory is consumed, the oom-killer strikes.
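A leak this fast is easy to watch from the shell. The sketch below is a hypothetical watcher (not part of the report): it samples the kernel's PageTables counter from /proc/meminfo while the 32-bit workload runs, since that is the counter later shown to be leaking.

```shell
#!/bin/sh
# Hypothetical watcher: sample the kernel's PageTables counter while
# the 32-bit workload runs.  A leak of a couple of MB/sec shows up
# within a few iterations.  Assumes a Linux /proc/meminfo with the
# 2.6-era field names.

# Extract the PageTables value (in kB) from meminfo-formatted input.
pagetables_kb() {
    awk '/^PageTables:/ { print $2 }'
}

if [ -r /proc/meminfo ]; then
    for i in 1 2 3; do
        printf 'sample %s: PageTables=%s kB\n' "$i" \
            "$(pagetables_kb < /proc/meminfo)"
        sleep 1
    done
fi
```

On an affected kernel the sampled value climbs steadily while, e.g., a 32-bit configure script runs; on 2.6.7-based kernels it stays flat.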

Expected Results:  No kernel memory leak ;)

Additional info:

This bug was first added to bug #131251, which sounded
phenomenologically similar. Since it turns out to be an
x86_64-specific bug, I am opening a new entry, as the causes/fixes
will be different from those of the original reporter on bug #131251.

See also vmstat/free etc. listings in bug #131251.

Another reference to this bug can be found at
http://nic.phys.ethz.ch/news/1093511266/index_html

Comment 1 Warren Togami 2004-09-20 10:34:05 UTC
http://people.redhat.com/arjanv/2.6/
Please see if it is still an issue with newer kernels from here.


Comment 2 Axel Thimm 2004-09-22 17:55:05 UTC
Is there any indication that these packages have this leak fixed?
Diffing 541 against them and checking the changelogs of the base
kernels, I couldn't find anything addressing it.

I'd love to test those kernels, but they don't even boot on either of
my Tyan systems. The oopsing I see seems to be an unrelated bug, so
I'll file it separately.

Could someone@redhat verify this memory leak on x86_64 when running
32-bit apps? Any longer configure script will eat up all your memory,
or you can try rebuilding the kernel; that works too :(


Comment 3 Axel Thimm 2004-09-22 19:01:35 UTC
In reply to comment #2:
> I'd love to test those kernels but they don't even boot on both my
> Tyan systems.

That was true for 582. 584 boots fine, but shows the same memory leak
behaviour.


Comment 4 Arjan van de Ven 2004-09-23 10:25:06 UTC
does slabtop show any obvious signs of leakage?


Comment 5 Axel Thimm 2004-09-23 15:27:08 UTC
When I checked with 541, slabtop did not show any numbers summing to
anything near the 1 GB that had leaked. Would that be the indicator,
or are there other metrics relevant here?
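For a cross-check of slabtop's totals, per-cache usage can be summed directly from /proc/slabinfo. This is a hypothetical helper, assuming the 2.6 "slabinfo 2.x" column layout (two header lines, then name / active_objs / num_objs / objsize / ...); if slab usage is nowhere near the missing memory, the leak is outside the slab allocator, which is consistent with the page-table findings later in this bug.

```shell
#!/bin/sh
# Sum <num_objs> * <objsize> over all caches to get total slab usage
# in kB, for comparison with the ~1 GB that went missing.  Assumes
# the 2.6 "slabinfo 2.x" format: two header lines, then
#   name active_objs num_objs objsize objperslab pagesperslab ...
slab_total_kb() {
    awk 'NR > 2 { kb += $3 * $4 / 1024 } END { printf "%.0f", kb }' "$1"
}

# /proc/slabinfo is typically readable by root only.
if [ -r /proc/slabinfo ]; then
    echo "slab total: $(slab_total_kb /proc/slabinfo) kB"
fi
```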

Comment 6 Axel Thimm 2004-09-24 08:06:15 UTC
Migrating to FC3t2 (the problem still persists in FC2, but I hope
FC3t2 gets more focus, especially through bug #130887).

Comment 7 Axel Thimm 2004-09-27 00:58:13 UTC
It looks like it's a process creation/destruction issue, i.e. the
memory leak seems to scale with the number of processes created, not
with process lifetime.

That's why it is only visible in certain process-generating scenarios
like configure scripts or email scanners. With openoffice it would
take days to create enough 32-bit processes to detect the memory
leak.
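The process-count theory is directly testable. The following is a hypothetical stress script (not from the report): it execs a short-lived binary many times and compares the PageTables counter before and after. On an affected kernel, substituting a 32-bit binary makes the delta grow with the iteration count; with a 64-bit binary, or on a fixed kernel, it stays near zero.

```shell
#!/bin/sh
# Hypothetical stress test for the process-count theory: exec a
# short-lived binary many times and compare PageTables before/after.
# Reads /proc/meminfo by default, or a meminfo-formatted file given
# as an argument.
pt() { awk '/^PageTables:/ { print $2 }' "${1:-/proc/meminfo}"; }

if [ -r /proc/meminfo ]; then
    before=$(pt)
    i=0
    while [ "$i" -lt 500 ]; do
        /bin/true      # substitute a 32-bit binary to trigger the bug
        i=$((i + 1))
    done
    after=$(pt)
    echo "PageTables delta after 500 execs: $((after - before)) kB"
fi
```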

Comment 8 Stefan Smietanowski 2004-10-03 16:23:15 UTC
I definately agree with that. I am currently running a 64bit FC2 521
kernel on my testserver with a 32bit mail antivirus scanner installed
and every 8 or so days it OOMs. I have not started digging yet but the
bug definately exists. Turning off the antivirus scanner does not help
at all.

Comment 9 Jim Paradis 2004-10-18 21:53:05 UTC
This looks like it might be a page table leak of some kind.  I brought
up a system to a text-mode login (i.e. relatively quiescent) and
logged in on two VTs.  On one of them I mounted a 32-bit FC root,
chroot'ed to it, and mounted /proc under that.  Then on each I did
several iterations of "cat /proc/meminfo".  Every time the 32-bit
"cat" ran, the PageTables entry increased by exactly 24 kB, whereas
multiple iterations of the 64-bit "cat" didn't change the number at
all.  Continuing to investigate.
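The per-exec measurement described here can be scripted; this is a hypothetical sketch, where the stand-in 64-bit cat should leave the counter unchanged and a 32-bit cat run from the chroot is the interesting case (+24 kB per invocation, per this comment).

```shell
#!/bin/sh
# Single-exec PageTables delta, along the lines of comment 9.
# Reads /proc/meminfo by default, or a meminfo-formatted file given
# as an argument.
pt() { awk '/^PageTables:/ { print $2 }' "${1:-/proc/meminfo}"; }

if [ -r /proc/meminfo ]; then
    a=$(pt)
    cat /proc/meminfo > /dev/null   # stand-in for the 32-bit cat
    b=$(pt)
    # +24 per run on a leaking kernel with a 32-bit cat; 0 otherwise.
    echo "per-exec PageTables delta: $((b - a)) kB"
fi
```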


Comment 10 Alexandre Oliva 2004-10-18 22:04:40 UTC
This is still a problem in 1.624 :-(

Comment 11 Alexandre Oliva 2004-10-19 05:42:55 UTC
I've looked into it a little bit, adding debug statements before every
increment and decrement of nr_page_table_pages.  I found out that,
when I run /lib/ld-linux.so.2 from an init=/bin/bash session, it leaks
exactly one PTE page, allocated by install_arg_page, called from
ia32_setup_arg_pages, called by load_elf32_binary (that's
load_elf_binary, #define-renamed in arch/x86_64/ia32/ia32_binfmt.c).

When I run /lib/libc.so.6, it leaks two PTE pages, one exactly as
above, and one allocated by handle_mm_fault, called by inode_has_perm,
called by do_page_fault, called by vma_prio_tree_insert, called by
error_exit, called by __clear_user, called by __clear_user (presumably
the stack dump just got confused because of the MMU exception), called
by load_elf_interp, called by load_elf32_binary.

AFAICT, the latter is zeroing out the BSS for libc.so, that had been
previously allocated with a MAP_ANON mmap.

Unfortunately, I can't see anything particularly wrong with the way
these PTEs are allocated, so presumably the problem is on the other
end: whatever should be deallocating them isn't.  I haven't
investigated this possibility yet.

vsyscall32, which was my first suspicion, doesn't make any difference
as far as the leak is concerned.

Comment 12 Jim Paradis 2004-10-20 16:19:09 UTC
It seems to be happening on static as well as dynamic executables, so
it's nothing that ld-linux.so or libc is doing (not that I thought
so).  Diffing 2.6.7 and FC3 doesn't show anything obvious to me, but
I'll keep looking.


Comment 13 Alexandre Oliva 2004-10-22 18:11:11 UTC
If I had to hazard a guess, I'd say it's something from the 64-bit
executable that fails to be deallocated at exec() time.  The new
definition of TASK_SIZE limits the maps considered valid for the
executable, and I don't see as many ptes being freed with 2.6.9 as I
do with 2.6.7 at exec time.

Comment 14 Alexandre Oliva 2004-10-22 18:43:03 UTC
The comment above the first SET_PERSONALITY in fs/binfmt_elf.c is
particularly enlightening, and pretty much proves my theory is
correct.  Now on to figure out how to fix it.

Comment 15 Alexandre Oliva 2004-10-22 21:12:37 UTC
Created attachment 105676 [details]
Patch that fixes the Fedora-local patch that introduces an ia32-compat memory leak on x86_64

As it turns out, the culprit is a Fedora-local patch,
linux-2.6.8-flexmmap-x86-64.patch, which modifies the way TASK_SIZE
is defined.  I'm not entirely sure why it breaks, since
SET_PERSONALITY appears to be defined such that the TIF_IA32 flag is
only set at the point it should be, but it somehow still causes
memory to leak.  Since the modified TASK_SIZE setting is not upstream,
and it is apparently only necessary for exec-shield randomized mmap
(?), I came up with this patch for the patch we install at build
time.  I've rebuilt vmlinuz in a tree on which I'd previously built
1.640 plus a few unrelated patches, booted into it, and am now
halfway through a 32-bit-only GCC+binutils+GDB bootstrap, without
apparent leaks.  Yay!

Comment 16 Arjan van de Ven 2004-10-23 11:03:35 UTC
flexmmap has nothing to do with exec-shield; it is about getting the
maximum use out of the virtual address space, and in this case also
about compatibility with our 32-bit distro.

Comment 17 Alexandre Oliva 2004-10-23 17:13:34 UTC
Right, but I was concerned that the upstream definition of TASK_SIZE
might break exec-shield randomization, since, without the
32-bit-limiting TASK_SIZE, exec-shield might choose memory addresses
for the executable or the dynamic loader that were not within the
32-bit address space.  As it turns out, it apparently doesn't, but it
now occurs to me that this box is already fully prelinked, so maybe
that would hide any problem in randomizing the dynamic loader load
location.  But then, TASK_SIZE *is* overridden to 0xffffffff in
ia32_binfmt.c, so perhaps that would be enough to avoid trouble.

Comment 18 Bryan Stillwell 2004-10-25 18:06:51 UTC
I'm also experiencing the same problem with 2.6.8-1.521smp x86_64 on
FC2. Is there a planned update from Red Hat that will resolve this
problem?  Also, does the 2.6.9-1.640smp kernel in the development
tree have this fixed?

Comment 19 Axel Thimm 2004-10-26 16:17:20 UTC
Alexandre, thanks for spotting and fixing this!

I have rebuilt FC2 kernels with your fix and voila, x86_64 can run
i386 binaries without leaking again! This is just a verification on
the FC2 platform.

I hope this fix makes it not only into rawhide/FC3 but also into the
next FC2 kernel errata. Thanks again, it was quite a painful bug ... :)

Bryan, just install the 2.6.8-1.521 src.rpm, go to where the
sources/patches were extracted (usually /usr/src/redhat/SOURCES),
apply Alexandre's patch in attachment (id=105676), and rebuild the
kernel rpm.
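The recipe above can be sketched as a script. The SOURCES path is from the comment; the src.rpm name matches the kernel mentioned in this bug, but the local patch file name, the -p level, and the spec file name are assumptions to adjust for your tree (remember that Alexandre's fix is a patch *for* linux-2.6.8-flexmmap-x86-64.patch, not a new kernel patch).

```shell
#!/bin/sh
# Sketch of comment 19's rebuild recipe.  The patch file name, -p
# level, and spec file name are assumptions -- adjust to your tree.
rebuild_with_fix() {
    srpm=$1 fix=$2
    if [ ! -f "$srpm" ] || [ ! -f "$fix" ]; then
        echo "missing $srpm or $fix; nothing to do"
        return 1
    fi
    rpm -ivh "$srpm"
    # Apply Alexandre's fix to the extracted Fedora patches in place:
    ( cd /usr/src/redhat/SOURCES && patch -p0 < "$fix" )
    rpmbuild -ba --target x86_64 /usr/src/redhat/SPECS/kernel-2.6.spec
}

rebuild_with_fix kernel-2.6.8-1.521.src.rpm attachment-105676.patch || true
```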

Comment 21 Warren Togami 2004-10-29 07:57:05 UTC
*** Bug 137518 has been marked as a duplicate of this bug. ***

Comment 22 Dave Jones 2004-11-01 19:05:53 UTC
Will be fixed in the next build.


Comment 23 Benny Amorsen 2004-11-03 19:33:33 UTC
2.6.9-1.667 does indeed fix the problem for me. As far as I'm
concerned this bug can be closed.