Red Hat Bugzilla – Bug 474347
[REG][5.3] Kernel panics when you prepare hugepages.
Last modified: 2015-05-04 21:15:59 EDT
This bug has been copied from bug #472802 and has been proposed
to be backported to 5.2 z-stream (EUS).
This bugzilla has Keywords: Regression.
Since no regressions are allowed between releases,
it is also being proposed as a blocker for this release.
Please resolve ASAP.
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.
*** Bug 473941 has been marked as a duplicate of this bug. ***
added email@example.com to the cc list for mirroring to IBM
Can you reproduce this problem on a Power box with rhel5.3 snap4 by following the steps below (posted by Larry) and post your findings asap?
To reproduce the problem I just ran these commands over and over until the
# cat /proc/meminfo                       (look for no hugepages allocated)
# echo 100 > /proc/sys/vm/nr_hugepages    (allocate 100 hugepages)
# cat /proc/meminfo                       (look for 100 hugepages allocated)
# echo 0 > /proc/sys/vm/nr_hugepages      (free the 100 hugepages)
The system panic()'d within a few iterations without the patch, but it stays up
indefinitely with the patch applied. Allocating hugepages overflows the kernel
stack and corrupts the memory below it, so the system crashes as soon as that
corruption damages anything important.
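For anyone repeating this, the allocate/free cycle above can be wrapped in a small loop. This is only a sketch of the manual steps, not the script Red Hat used: the function names (show_hugepages, toggle_hugepages) and the NR_FILE/MEMINFO override variables are made up here so the loop can be dry-run against ordinary files before pointing it at procfs.

```shell
#!/bin/sh
# Sketch of the allocate/free loop from the steps above.
# NR_FILE and MEMINFO are hypothetical overrides for dry runs; the defaults
# are the real procfs paths, which need root to write.
NR_FILE="${NR_FILE:-/proc/sys/vm/nr_hugepages}"
MEMINFO="${MEMINFO:-/proc/meminfo}"

# Print the HugePages_* counters (prints nothing if the file has none).
show_hugepages() {
    grep '^HugePages_' "$MEMINFO" || true
}

# toggle_hugepages COUNT ITERATIONS
# Allocate COUNT hugepages, then free them, ITERATIONS times.
toggle_hugepages() {
    count=$1
    iterations=$2
    i=1
    while [ "$i" -le "$iterations" ]; do
        show_hugepages                 # expect no hugepages allocated
        echo "$count" > "$NR_FILE"     # allocate COUNT hugepages
        show_hugepages                 # expect COUNT hugepages allocated
        echo 0 > "$NR_FILE"            # free them again
        i=$((i + 1))
    done
}
```

As root, `toggle_hugepages 100 10000` would run the same 100-page cycle 10000 times; on an affected kernel the box is expected to panic partway through, so do not run it on a machine you care about.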
I was not able to reproduce this issue here.
[root@keechi-lp1 ~]# uname -a
Linux keechi-lp1.ltc.austin.ibm.com 2.6.18-124.el5 #1 SMP Mon Nov 17 16:58:59 EST 2008 ppc64 ppc64 ppc64 GNU/Linux
[root@keechi-lp1 ~]# cat /proc/meminfo
MemTotal: 33452928 kB
MemFree: 31227136 kB
Buffers: 97920 kB
Cached: 191488 kB
SwapCached: 0 kB
Active: 268928 kB
Inactive: 145664 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 33452928 kB
LowFree: 31227136 kB
SwapTotal: 1048448 kB
SwapFree: 1048448 kB
Dirty: 512 kB
Writeback: 0 kB
AnonPages: 124736 kB
Mapped: 47616 kB
Slab: 125440 kB
PageTables: 9024 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
CommitLimit: 16955712 kB
Committed_AS: 449856 kB
VmallocTotal: 8589934592 kB
VmallocUsed: 11776 kB
VmallocChunk: 8589921856 kB
Hugepagesize: 16384 kB
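For scale, the Hugepagesize line above means the reproducer's 100 hugepages amount to 100 x 16384 kB = 1638400 kB on this box. A minimal sketch of that arithmetic, reading the page size from a meminfo-style file (hugepage_kb is a hypothetical helper name, not an existing tool):

```shell
#!/bin/sh
# hugepage_kb MEMINFO_FILE COUNT
# Print the kB needed to back COUNT hugepages, using the Hugepagesize
# line of the given meminfo-style file.
hugepage_kb() {
    awk -v n="$2" '$1 == "Hugepagesize:" { print $2 * n }' "$1"
}
```

With the output above, `hugepage_kb /proc/meminfo 100` prints 1638400, a small fraction of the 33452928 kB MemTotal.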
With 10,000 iterations of Larry's steps I could not trigger this issue:
Ramon could not reproduce this bug with rhel5.3 snap4 on a p 575 (with 32GB memory).
Can you check with Red Hat whether any specific system setup/configuration was used when they reproduced the bug?
This has been a difficult bug for me to reproduce as well. I've done it successfully on an ia64 system with a very large amount of memory. Try putting the system under a load which uses most of the memory (ltp-stress, sys_basher's memory test or something similar), then try the reproducer again.
IBM, it turns out this repro case only occurs on IA64. However, it would be good to run POWER and x86/64 through the usual largepage testing to ensure the patch does not affect any other functionality.
I found a little hugepage test program on lkml. I think the program itself is
buggy, but running it while toggling hugepages as described before reproduces
the bug very quickly and easily with the -124 kernel on ia64. I couldn't reproduce it in about 30 minutes of running with the -125 kernel.
*** This bug has been marked as a duplicate of 474347 ***