Description of problem:
On RHEL3 U6 beta kernels, when running under the IBM pounder test suite, the
following panic can be observed:

------------[ cut here ]------------
kernel BUG at page_alloc.c:391!
invalid operand: 0000
nfsd nfs lockd sunrpc nls_iso8859-1 ide-cd cdrom udf audit usbserial lp parport autofs4 tg3 floppy sg microcode keybdev mousedev hid input usb-ohci usbcore ex
CPU:    0
EIP:    0060:[<c0159120>]    Not tainted
EFLAGS: 00010202

EIP is at rmqueue [kernel] 0x310 (2.4.21-32.11.ELsmp/i686)
eax: 0104008c   ebx: c03a8e00   ecx: 00001000   edx: 0003372f
esi: 00037000   edi: c03a8e00   ebp: c1c0ef30   esp: d7241d00
ds: 0068   es: 0068   ss: 0068
Process snake.exe (pid: 5017, stackpage=d7241000)
Stack: 00030481 00030480 00001000 00000000 00000000 0003272f 0003272c 00000202
       00000000 c03a8e00 c03a8e00 c03ab428 00000003 00000000 c01592fd c01593f4
       c03ab42c 00000000 000001d2 00000000 c01593f4 c03ab420 00000000 00000003
Call Trace:   [<c01592fd>] __alloc_pages_limit [kernel] 0x7d (0xd7241d38)
[<c01593f4>] __alloc_pages [kernel] 0xb4 (0xd7241d3c)
[<c01593f4>] __alloc_pages [kernel] 0xb4 (0xd7241d50)
[<c015d85c>] shmem_getpage_locked [kernel] 0x2ac (0xd7241d94)
[<c015d9a3>] shmem_getpage [kernel] 0x63 (0xd7241db8)
[<c015da39>] shmem_nopage [kernel] 0x39 (0xd7241dd8)
[<c0143648>] do_no_page [kernel] 0x138 (0xd7241df0)
[<c0243d79>] rt_hash_code [kernel] 0x29 (0xd7241e44)
[<c024a035>] ip_local_deliver_finish [kernel] 0xc5 (0xd7241e78)
[<c0143ea1>] handle_mm_fault [kernel] 0xe1 (0xd7241eb8)
[<c012006c>] do_page_fault [kernel] 0x14c (0xd7241ef4)
[<f8ab2378>] tg3_restart_ints [tg3] 0x28 (0xd7241f1c)
[<c0135422>] timer_bh [kernel] 0x62 (0xd7241f48)
[<c0230229>] net_rx_action [kernel] 0x99 (0xd7241f50)
[<c012fff5>] bh_action [kernel] 0x55 (0xd7241f5c)
[<c012fe97>] tasklet_hi_action [kernel] 0x67 (0xd7241f64)
[<c010e018>] do_IRQ [kernel] 0x148 (0xd7241f98)
[<c011ff20>] do_page_fault [kernel] 0x0 (0xd7241fb0)

Code: 0f 0b 87 01 1f ed 2b c0 8b 45 18 a9 00 01 00 00 74 08 0f 0b

Kernel panic: Fatal exception

Version-Release number of selected component (if applicable):

How reproducible:
sometimes

Steps to Reproduce:
1. Boot a RHEL3 U6 beta kernel
2. Run IBM's pounder test suite
3.

Actual results:

Expected results:

Additional info:
Does this occur on other arches besides i386/i686?
Please make the "IBM pounder test suite" available, with instructions on how to use it.
Jeff Burke, by any chance do you already have this test suite running amongst your bag of tricks?
Dave, I am currently running it on RHEL4-U2 beta but I can reboot into RHEL3-U6 if you would like. I have also made changes to the pounder21.tgz that we originally got from IBM. The changes allow it to fully operate in our environment. If you have a RHEL3 U6 test system I can set it up on there. Jeff
No, I don't have a test i386 machine to test it on -- that was going to be my next question to you -- you guys have hoarded all the hardware!
The pounder test suite is now running on an i686 test machine here in Westford; so we'll wait for it to crash, and look at the resultant netdump. Many thanks to Jeff Burke for his help in setting this up.
Ok, we can reproduce the "kernel BUG at page_alloc.c:391!" in-house with the
pounder test:

crash> bt
PID: 31     TASK: f6ec0000   CPU: 2    COMMAND: "kjournald"
 #0 [f6ec1afc] netconsole_netdump at fa433783
 #1 [f6ec1b10] try_crashdump at c0128e83
 #2 [f6ec1b20] die at c010c682
 #3 [f6ec1b34] do_invalid_op at c010c892
 #4 [f6ec1bd4] error_code (via invalid_op) at c03f61c0
    EAX: 012c0008  EBX: c03a8e80  ECX: 00001000  EDX: 00017c1b  EBP: c1591680
    DS:  0068      ESI: 00037000  ES:  0068      EDI: c03a8e80
    CS:  0060      EIP: c015921e  ERR: ffffffff  EFLAGS: 00010206
 #5 [f6ec1c10] rmqueue at c015921e
 #6 [f6ec1c4c] __alloc_pages_limit at c0159408
 #7 [f6ec1c64] __alloc_pages at c015956f
 #8 [f6ec1ca8] alloc_bounce_page at c0161a5e
 #9 [f6ec1cb4] create_bounce at c0161c17
#10 [f6ec1cf8] __make_request at c01d3064
#11 [f6ec1d54] generic_make_request at c01d37a7
#12 [f6ec1d7c] lvm_push_callback at f88509c2
#13 [f6ec1d94] lvm_map at f885057f
#14 [f6ec1dec] lvm_make_request_fn at f8850a62
#15 [f6ec1df8] generic_make_request at c01d37a7
#16 [f6ec1e20] submit_bh_rsector at c01d3844
#17 [f6ec1e3c] ll_rw_block at c01d3c60
#18 [f6ec1e64] journal_commit_transaction at f8864e01
#19 [f6ec1fb0] kjournald at f88675a5
#20 [f6ec1ff0] kernel_thread_helper at c01095ab
crash>

This is the page that rmqueue() removed from the free page list during the
alloc_bounce_page() allocation:

crash> page c1591680
struct page {
  list = {
    next = 0xc13739e8,
    prev = 0xc1803ef4
  },
  mapping = 0xea467a44,
  index = 0x18a095d,
  next_hash = 0x0,
  count = {
    counter = 0x1
  },
  flags = 0x12c0008,
  lru = {
    next = 0xc1373a04,
    prev = 0xc1803f10
  },
  pte = {
    chain = 0x0,
    direct = 0x0
  },
  age = 0x1,
  pprev_hash = 0xc98a4a80,
  buffers = 0x0,
  virtual = 0xd7c1b000
}

The counter of 1 is OK, as rmqueue() just bumped it, but its relevant flags
bits equate to:

  PG_uptodate  PG_lru  PG_active_cache  PG_fresh_page

This is pretty bad. I haven't a clue how the page could get into this state;
the PG_lru, PG_active_cache and PG_fresh_page bits would have to have been set
after the page was originally freed.

Larry, any ideas on how to possibly debug this?
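For reference, the failing check is the kind of LRU-state sanity test the
allocator runs when rmqueue() hands out a page from a free list. A minimal
sketch of such a test follows; this is illustrative only, not the literal
RHEL3 page_alloc.c source, and the PageLRU()/PageActiveCache() helper names
are assumed from the flag bits decoded above:

/*
 * Illustrative sketch only: roughly the sort of sanity check that shows
 * up as "kernel BUG at page_alloc.c:391!".  A page being handed out from
 * a buddy free list must not still look like a live page-cache/LRU page.
 * PageLRU()/PageActiveCache() are assumed helper names matching the
 * PG_lru/PG_active_cache bits decoded above.
 */
static inline void sanity_check_free_page(struct page *page)
{
	if (PageLRU(page))		/* still linked on an LRU list?     */
		BUG();
	if (PageActiveCache(page))	/* still marked active page cache?  */
		BUG();
	if (page->mapping)		/* still owned by an address_space? */
		BUG();
	if (page->buffers)		/* still has buffer_heads attached? */
		BUG();
}

Given the flags and the non-NULL mapping shown above, several of these tests
would trip for this page.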
Upon further investigation, the problem is not due to a single page being mishandled. Rather, one of the LRU lists (the "active cache" list in the first vmcore, the "inactive dirty" list in my latest one) gets linked into a buddy-allocator free list. As soon as that happens, literally thousands of LRU pages become reachable from one of the buddy allocator's free lists. Eventually rmqueue() comes along, processing a page allocation request, and unlinks one of those LRU pages from a free list. But even with this new piece of debug information, I'm still at a loss as to how to catch the bogus LRU-to-free-list manipulation in the act.
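One way to see how far the damage has spread in a vmcore, or to catch it a
little earlier at run time, would be to walk each zone's buddy free lists and
report any page that still has PG_lru set. A rough sketch, assuming the stock
2.4 zone_t/free_area_t layout and a PageLRU() test (the field and macro names
here are assumptions, not verified against the RHEL3 source):

/*
 * Debug sketch: scan every buddy free list in a zone and report pages
 * that still claim LRU membership.  Assumes the stock 2.4 layout, i.e.
 * zone->free_area[order].free_list links free pages through page->list,
 * and PageLRU() tests the PG_lru bit.  Caller is assumed to hold
 * zone->lock when run against a live system.
 */
static void scan_zone_free_lists(zone_t *zone)
{
	int order;

	for (order = 0; order < MAX_ORDER; order++) {
		struct list_head *curr;

		list_for_each(curr, &zone->free_area[order].free_list) {
			struct page *page = list_entry(curr, struct page, list);

			if (PageLRU(page))
				printk(KERN_ERR "%s order %d: free list holds "
				       "LRU page %p, flags %08lx\n",
				       zone->name, order, page, page->flags);
		}
	}
}

(The crash utility's "kmem -f" command does a similar free-list walk against a
vmcore.)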
It's probably worth noting that during a subsequent "pounder" run on the same test machine in Westford, the system experienced a catastrophic root filesystem melt-down (the root filesystem was where the pounder test suite was running), requiring a re-installation. I don't know whether splicing one of the page cache lists onto one of the buddy allocator free lists could somehow have caused a misdirected filesystem write. What's bothersome is that the buddy allocator free list pages are linked using the page->list list_head, which is also used when linking pages belonging to an inode. Note that the page cache LRU lists are linked with the page->lru list_head, so when they are traversed, they would never veer off into a buddy allocator list. Anyway, the rmqueue() BUG() happens well after the point of corruption. Code inspection of the places in the kernel that use the page->list list_head doesn't show any obvious way a page could end up erroneously linked onto a buddy allocator free list. I'll keep adding debug code and re-testing in hopes of catching it earlier in time.
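To make that overlap concrete, here is an abridged 2.4-era struct page,
annotated with which subsystem links through each list_head. The field set and
ordering follow the "crash> page" dump earlier in this bug; treat the details
as an approximation, not the verbatim RHEL3 definition:

/*
 * Abridged 2.4-era struct page (field set/order follows the crash> page
 * dump above; an approximation, not the verbatim RHEL3 source).
 */
struct page {
	struct list_head list;		/* shared: buddy free lists AND the
					 * per-inode clean/dirty/locked page
					 * lists link through this */
	struct address_space *mapping;	/* owning page cache, if any */
	unsigned long index;		/* offset within the mapping */
	struct page *next_hash;		/* page cache hash chain */
	atomic_t count;			/* reference count */
	unsigned long flags;		/* PG_lru, PG_active_cache, ... */
	struct list_head lru;		/* active/inactive LRU lists only */
	/* ... pte chain, age, pprev_hash, buffers, virtual ... */
};

The point is that page->list does double duty (buddy free lists and inode page
lists) while page->lru is exclusive to the LRU, which is exactly the asymmetry
described above.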
tao: please pass this request back to the IBM test team: If you run the pounder test suite *without* invoking any NFS tests, can you still reproduce the problem?
The Westford machine that we were originally able to reproduce the problem on was re-installed after the root filesystem corruption occurred, with a fresh RHEL3 U6 installation. However, since then we cannot get the problem to reproduce, because shortly after the NFS tests start, the pounder test suite causes the system to degenerate into issuing a never-ending stream of ENOMEM error messages:

  RPC: sendmsg returned error 12

flooding the logs and effectively shutting the machine down. We don't understand why this did not occur prior to the re-installation. Unfortunately, when running without the NFS tests, we cannot reproduce the failure, although that is not to say the problem is specific to NFS. That is why the question to IBM in comment #14 was posed, and we're still interested in their answer.
> Turning off NFS does not affect my x445. The box still hangs.

> looks like turning off nfs still crashes the system...

No, he says "The box still hangs." -- which completely confuses the issue. We are specifically debugging the "kernel BUG at page_alloc.c:391!" BUG(), which most definitely causes a system *crash*. We have never seen any "hangs" while running the pounder test suite. Are we even talking about the same problem now?

In any case, I'll run our machine in Westford without any highmem (mem=1GB), and see if it can avoid the "sendmsg" error flurry. But again, I have absolutely no idea what he's talking about regarding "hangs". If the system *hangs*, then they need to send us alt-sysrq-w output, or force-crash the system with alt-sysrq-c. And then file a completely new issue, because the "kernel BUG at page_alloc.c:391!" crash is most definitely not a hang.

Please clarify...
A fix for this problem has just been committed to the RHEL3 U7 patch pool this evening (in kernel version 2.4.21-37.9.EL).
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0144.html
Internal Status set to 'Resolved'
Status set to: Closed by Client
Resolution set to: 'Closed by Client'

This event sent from IssueTracker by Chris McDermott
 issue 68975