Bug 491685 - vmalloc_user() panics 2.6.18-128.1.1.el5 if a kmem cache grows
Summary: vmalloc_user() panics 2.6.18-128.1.1.el5 if a kmem cache grows
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.3
Hardware: All
OS: Linux
Priority: low
Severity: high
Target Milestone: ---
Assignee: Jiri Olsa
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2009-03-23 16:15 UTC by Brice Goglin
Modified: 2009-09-02 08:58 UTC
CC List: 3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-09-02 08:58:59 UTC
Target Upstream Version:
Embargoed:


Attachments
kernel module reproducing the BUG (830 bytes, application/x-bzip-compressed-tar)
2009-03-23 16:15 UTC, Brice Goglin


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2009:1243 0 normal SHIPPED_LIVE Important: Red Hat Enterprise Linux 5.4 kernel security and bug fix update 2009-09-01 08:53:34 UTC

Description Brice Goglin 2009-03-23 16:15:14 UTC
Created attachment 336313
kernel module reproducing the BUG

Description of problem:
=======================
If vmalloc_user() is called and a kmem cache must grow while allocating the vmalloc internal structures, the kernel panics: __GFP_ZERO is passed from vmalloc_user() all the way down to cache_grow(), which trips the BUG_ON() at mm/slab.c:2650.



Version-Release number:
=======================
kernel 2.6.18-128.1.1.el5



How reproducible:
=================
Very easy to reproduce. Just call vmalloc_user() many times (without vfree() in between) until one of the kernel kmem caches has to grow.

Steps to Reproduce:
===================
Compile the attached kernel module and load it with module parameter nb=1000 (for instance). It will call vmalloc_user() 1000 times. Assuming 1000 is enough to get one of the kmem caches to grow, the kernel will panic.
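
For reference, a minimal module along these lines might look like the sketch below (this is not the attached code; only the "nb" parameter name and the printk text are taken from this report):

#include <linux/init.h>
#include <linux/module.h>
#include <linux/vmalloc.h>

static int nb = 50;
module_param(nb, int, 0444);

static int __init vmalloc_test_init(void)
{
        int i;

        for (i = 0; i < nb; i++) {
                printk(KERN_INFO "trying vmalloc_user #%d\n", i);
                /* deliberately never vfree()d, so the slab cache backing
                 * the page-pointer arrays eventually has to grow */
                if (!vmalloc_user(PAGE_SIZE))
                        return -ENOMEM;
        }
        return 0;
}
module_init(vmalloc_test_init);

static void __exit vmalloc_test_exit(void)
{
}
module_exit(vmalloc_test_exit);

MODULE_LICENSE("GPL");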

Panics on amd64, ia64 and i386.

Vanilla kernels do not fail; the problem seems to be specific to Red Hat patches.


  
Actual results:
===============
Here's the BUG/panic from dmesg:

Note: "tainted" probably comes from my earlier attempts to reproduce without MODULE_LICENSE() in the attached module. No proprietary/custom module is actually loaded.

[...]
trying vmalloc_user #182
Kernel BUG at mm/slab.c:2650
invalid opcode: 0000 [1] SMP 
last sysfs file: /class/net/lo/operstate
CPU 0 
Modules linked in: vmalloc(U) ipv6 xfrm_nalgo crypto_api nfs lockd fscache nfs_2
Pid: 4475, comm: insmod Tainted: G      2.6.18-128.1.1.el5 #1
RIP: 0010:[<ffffffff80017388>]  [<ffffffff80017388>] cache_grow+0x1e/0x395
RSP: 0018:ffff81012a2c9e08  EFLAGS: 00010006
RAX: 0000000000000000 RBX: 00000000000080d0 RCX: 00000000ffffffff
RDX: 0000000000000000 RSI: 00000000000080d0 RDI: ffff8101042ca040
RBP: ffff8101042ced60 R08: ffff8101042f5000 R09: ffff8101042f9000
R10: 0000000000000000 R11: 0000000000000080 R12: ffff8101042ca040
R13: ffff8101042ced40 R14: 0000000000000000 R15: ffff8101042ca040
FS:  00002af3a41926f0(0000) GS:ffffffff803ac000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000001612102f CR3: 000000012a929000 CR4: 00000000000006e0
Process insmod (pid: 4475, threadinfo ffff81012a2c8000, task ffff81012fd0e0c0)
Stack:  000000d000000040 0000004400000001 000000000000013f 000000000000013f
 ffff810126f6c000 00000000ffffffff ffff8101042ced60 ffff8101042f5000
 ffff8101042ced40 000000000000003c ffff8101042ca040 ffffffff8005bc9a
Call Trace:
 [<ffffffff8005bc9a>] cache_alloc_refill+0x136/0x186
 [<ffffffff800d6f51>] kmem_cache_alloc_node+0x98/0xb2
 [<ffffffff800cd111>] __vmalloc_area_node+0x62/0x153
 [<ffffffff800cd457>] vmalloc_user+0x15/0x50
 [<ffffffff882e4042>] :vmalloc:init_module+0x41/0x93
 [<ffffffff800a3e6a>] sys_init_module+0xaf/0x1e8
 [<ffffffff8005d116>] system_call+0x7e/0x83


Code: 0f 0b 68 34 ca 29 80 c2 5a 0a f6 c7 20 0f 85 53 03 00 00 89 
RIP  [<ffffffff80017388>] cache_grow+0x1e/0x395
 RSP <ffff81012a2c9e08>
 <0>Kernel panic - not syncing: Fatal exception



Additional info:
================
Each vmalloc area requires an array of page pointers to be allocated first. If you allocate many areas with vmalloc_user() without freeing them in between, one of the kernel object allocator caches will end up having to grow, and the kernel will panic because of the BUG_ON() at mm/slab.c:2650.
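
For scale, the size of that page-pointer array grows linearly with the mapping size, roughly as in this sketch (derived from the description here and the patch quoted in comment 1, not copied from the RHEL source):

        /* one struct page pointer per mapped page */
        unsigned long nr_pages   = size >> PAGE_SHIFT;
        unsigned long array_size = nr_pages * sizeof(struct page *);
        /* array_size <= PAGE_SIZE: allocated with kmalloc_node(), the
         * path that panics here; larger arrays go through __vmalloc_node() */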

I reproduced the bug by just allocating a lot of memory areas with vmalloc_user(). Extract the attached tarball, compile it with make, and load it with "insmod vmalloc.ko nb=1000". Note that "nb" should be chosen large enough that the kernel allocator has to grow its cache, which may take anywhere from a few to many iterations depending on the cache state (the default is nb=50).

What's happening is:
1) vmalloc_user() calls __vmalloc() with a gfp_mask containing __GFP_ZERO; __vmalloc() then calls __vmalloc_node(), which calls __vmalloc_area_node()
2) __vmalloc_area_node() calls kmalloc_node() to allocate the array of pages. It may actually use __vmalloc_node() instead if the array is very large (Open-MX uses this path), but we end up in __vmalloc_area_node() calling kmalloc_node() later anyway.
3) kmalloc_node() is called with the same gfp_mask except that __GFP_HIGHMEM is removed. __GFP_ZERO is still there.
4) kmem_cache_alloc_node() is called; it probably goes through ____cache_alloc() since the machine isn't NUMA
5) if the cache has to grow, cache_alloc_refill() is called, and it calls cache_grow() with flags still containing __GFP_ZERO
6) cache_grow() hits the BUG_ON() at mm/slab.c:2650:
        BUG_ON(flags & ~(SLAB_DMA | SLAB_LEVEL_MASK | SLAB_NO_GROW));
because __GFP_ZERO is not part of SLAB_DMA | SLAB_LEVEL_MASK | SLAB_NO_GROW.
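
Put as code, the flag flow in steps 1-6 is roughly the following (a simplified sketch; the initial mask is the one vmalloc_user() is expected to pass per step 1, not a quote of the RHEL source):

        gfp_t mask = GFP_KERNEL | __GFP_HIGHMEM | __GFP_ZERO; /* step 1: vmalloc_user() */
        mask &= ~__GFP_HIGHMEM;                               /* step 3: kmalloc_node() */
        /* steps 4-6: __GFP_ZERO is still set when cache_grow() evaluates
         * BUG_ON(mask & ~(SLAB_DMA | SLAB_LEVEL_MASK | SLAB_NO_GROW)),
         * and since __GFP_ZERO is in none of those masks, the BUG fires. */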

One way to solve this might be to clear __GFP_ZERO from the gfp_mask in step (3), since __GFP_ZERO only applies to the actual vmalloc'ed pages, not to the array that will point to these struct page. Alternatively, the BUG_ON() line at slab.c:2650 could be changed to accept __GFP_ZERO, but that looks risky to me...

All this may be related to the following commit in 2.6.23. It confirms that passing __GFP_ZERO down to cache_grow() was never supposed to work on older kernels such as Red Hat's 2.6.18...

commit 94f6030ca792c57422f04a73e7a872d8325946d3
Author: Christoph Lameter <clameter>
Date:   Tue Jul 17 04:03:29 2007 -0700

Slab allocators: Replace explicit zeroing with __GFP_ZERO
   
kmalloc_node() and kmem_cache_alloc_node() were not available in a zeroing
variant in the past.  But with __GFP_ZERO it is possible now to do zeroing
while allocating.
 
Use __GFP_ZERO to remove the explicit clearing of memory via memset wherever
we can.

Comment 1 Brice Goglin 2009-03-23 17:43:15 UTC
When I said "vanilla is ok", I meant "recent vanilla". I haven't actually tested vanilla 2.6.18, but Debian's 2.6.18 seems to have the same problem.

The same problem has been discussed at http://www.nabble.com/-PATCH%2CRFC--Add-__GFP_ZERO-to-GFP_LEVEL_MASK-tt6803305.html#a6803305

It has been fixed in 2.6.19 with the following patch. Red Hat should apply it. Thanks.


commit 286e1ea3ac1ca4f503ebbb3020bdb0cbe6adffac
Author: Andrew Morton <akpm>
Date:   Tue Oct 17 00:09:57 2006 -0700

[PATCH] vmalloc(): don't pass __GFP_ZERO to slab
    
A recent change to the vmalloc() code accidentally resulted in us passing
__GFP_ZERO into the slab allocator.  But we only wanted __GFP_ZERO for the
actual pages which are being vmalloc()ed, and passing __GFP_ZERO into slab is
not a rational thing to ask for.
    
Cc: Jonathan Corbet <corbet>
Signed-off-by: Andrew Morton <akpm>
Signed-off-by: Linus Torvalds <torvalds>

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 750ab6e..1133dd3 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -428,8 +428,11 @@ void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
        if (array_size > PAGE_SIZE) {
                pages = __vmalloc_node(array_size, gfp_mask, PAGE_KERNEL, node);
                area->flags |= VM_VPAGES;
-       } else
-               pages = kmalloc_node(array_size, (gfp_mask & ~__GFP_HIGHMEM), node);
+       } else {
+               pages = kmalloc_node(array_size,
+                               (gfp_mask & ~(__GFP_HIGHMEM | __GFP_ZERO)),
+                               node);
+       }
        area->pages = pages;
        if (!area->pages) {
                remove_vm_area(area->addr);

Comment 2 RHEL Program Management 2009-04-27 14:49:14 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 3 Don Zickus 2009-05-06 17:17:06 UTC
in kernel-2.6.18-144.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team
has sent specific instructions indicating when to do so.  However, feel free
to provide a comment indicating that this fix has been verified.

Comment 5 Brice Goglin 2009-06-14 10:09:56 UTC
Sorry for the delay, I couldn't test earlier.
2.6.18-144.el5 isn't available in your directory anymore.
But 2.6.18-153.el5 seems to work fine, thanks.

Comment 6 Brice Goglin 2009-06-15 16:23:33 UTC
By the way, is there an easy way to know at module-build time whether your fix was applied? Maybe by checking whether RHEL_RELEASE_CODE is bigger than 1283 in include/linux/version.h? A sketch of such a check follows below.

thanks,
Brice
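
A compile-time guard along those lines might look like this (a sketch; it assumes, as suggested above, that any RHEL kernel with RHEL_RELEASE_CODE greater than 1283, i.e. newer than 5.3, carries the fix; HAVE_FIXED_VMALLOC_USER is just a placeholder name):

#include <linux/version.h>

#if defined(RHEL_RELEASE_CODE) && RHEL_RELEASE_CODE > 1283
/* assumed fixed: safe to call vmalloc_user() repeatedly */
#define HAVE_FIXED_VMALLOC_USER 1
#endif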

Comment 10 errata-xmlrpc 2009-09-02 08:58:59 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1243.html

