Description of problem:
On RHEL3 U6 beta kernels, when running under the IBM pounder test suite, the
following panic can be observed:

------------[ cut here ]------------
kernel BUG at page_alloc.c:391!
invalid operand: 0000
nfsd nfs lockd sunrpc nls_iso8859-1 ide-cd cdrom udf audit usbserial lp parport autofs4 tg3 floppy sg microcode keybdev mousedev hid input usb-ohci usbcore ex
CPU:    0
EIP:    0060:[<c0159120>]    Not tainted
EFLAGS: 00010202

EIP is at rmqueue [kernel] 0x310 (2.4.21-32.11.ELsmp/i686)
eax: 0104008c   ebx: c03a8e00   ecx: 00001000   edx: 0003372f
esi: 00037000   edi: c03a8e00   ebp: c1c0ef30   esp: d7241d00
ds: 0068   es: 0068   ss: 0068
Process snake.exe (pid: 5017, stackpage=d7241000)
Stack: 00030481 00030480 00001000 00000000 00000000 0003272f 0003272c 00000202
       00000000 c03a8e00 c03a8e00 c03ab428 00000003 00000000 c01592fd c01593f4
       c03ab42c 00000000 000001d2 00000000 c01593f4 c03ab420 00000000 00000003
Call Trace:   [<c01592fd>] __alloc_pages_limit [kernel] 0x7d (0xd7241d38)
[<c01593f4>] __alloc_pages [kernel] 0xb4 (0xd7241d3c)
[<c01593f4>] __alloc_pages [kernel] 0xb4 (0xd7241d50)
[<c015d85c>] shmem_getpage_locked [kernel] 0x2ac (0xd7241d94)
[<c015d9a3>] shmem_getpage [kernel] 0x63 (0xd7241db8)
[<c015da39>] shmem_nopage [kernel] 0x39 (0xd7241dd8)
[<c0143648>] do_no_page [kernel] 0x138 (0xd7241df0)
[<c0243d79>] rt_hash_code [kernel] 0x29 (0xd7241e44)
[<c024a035>] ip_local_deliver_finish [kernel] 0xc5 (0xd7241e78)
[<c0143ea1>] handle_mm_fault [kernel] 0xe1 (0xd7241eb8)
[<c012006c>] do_page_fault [kernel] 0x14c (0xd7241ef4)
[<f8ab2378>] tg3_restart_ints [tg3] 0x28 (0xd7241f1c)
[<c0135422>] timer_bh [kernel] 0x62 (0xd7241f48)
[<c0230229>] net_rx_action [kernel] 0x99 (0xd7241f50)
[<c012fff5>] bh_action [kernel] 0x55 (0xd7241f5c)
[<c012fe97>] tasklet_hi_action [kernel] 0x67 (0xd7241f64)
[<c010e018>] do_IRQ [kernel] 0x148 (0xd7241f98)
[<c011ff20>] do_page_fault [kernel] 0x0 (0xd7241fb0)

Code: 0f 0b 87 01 1f ed 2b c0 8b 45 18 a9 00 01 00 00 74 08 0f 0b

Kernel panic: Fatal exception

Version-Release number of selected component (if applicable):

How reproducible:
sometimes

Steps to Reproduce:
1. Boot a RHEL3 U6 beta kernel
2. Run IBM's pounder test suite
3.

Actual results:

Expected results:

Additional info:
Does this occur on other arches besides i386/i686?
Please make the "IBM pounder test suite" available, with instructions on how to use it.
Jeff Burke, by any chance do you already have this test suite running amongst your bag of tricks?
Dave, I am currently running it on RHEL4-U2 beta but I can reboot into RHEL3-U6 if you would like. I have also made changes to the pounder21.tgz that we originally got from IBM. The changes allow it to fully operate in our environment. If you have a RHEL3 U6 test system I can set it up on there. Jeff
No, I don't have a test i386 machine to test it on -- that was going to be my next question to you -- you guys have hoarded all the hardware!
The pounder test suite is now running on an i686 test machine here in Westford; so we'll wait for it to crash, and look at the resultant netdump. Many thanks to Jeff Burke for his help in setting this up.
Ok, we can reproduce the "kernel BUG at page_alloc.c:391!" in-house with the
pounder test:

crash> bt
PID: 31     TASK: f6ec0000   CPU: 2    COMMAND: "kjournald"
 #0 [f6ec1afc] netconsole_netdump at fa433783
 #1 [f6ec1b10] try_crashdump at c0128e83
 #2 [f6ec1b20] die at c010c682
 #3 [f6ec1b34] do_invalid_op at c010c892
 #4 [f6ec1bd4] error_code (via invalid_op) at c03f61c0
    EAX: 012c0008  EBX: c03a8e80  ECX: 00001000  EDX: 00017c1b  EBP: c1591680
    DS:  0068      ESI: 00037000  ES:  0068      EDI: c03a8e80
    CS:  0060      EIP: c015921e  ERR: ffffffff  EFLAGS: 00010206
 #5 [f6ec1c10] rmqueue at c015921e
 #6 [f6ec1c4c] __alloc_pages_limit at c0159408
 #7 [f6ec1c64] __alloc_pages at c015956f
 #8 [f6ec1ca8] alloc_bounce_page at c0161a5e
 #9 [f6ec1cb4] create_bounce at c0161c17
#10 [f6ec1cf8] __make_request at c01d3064
#11 [f6ec1d54] generic_make_request at c01d37a7
#12 [f6ec1d7c] lvm_push_callback at f88509c2
#13 [f6ec1d94] lvm_map at f885057f
#14 [f6ec1dec] lvm_make_request_fn at f8850a62
#15 [f6ec1df8] generic_make_request at c01d37a7
#16 [f6ec1e20] submit_bh_rsector at c01d3844
#17 [f6ec1e3c] ll_rw_block at c01d3c60
#18 [f6ec1e64] journal_commit_transaction at f8864e01
#19 [f6ec1fb0] kjournald at f88675a5
#20 [f6ec1ff0] kernel_thread_helper at c01095ab
crash>

This is the page that rmqueue() removed from the free page list during the
alloc_bounce_page() allocation:

crash> page c1591680
struct page {
  list = {
    next = 0xc13739e8,
    prev = 0xc1803ef4
  },
  mapping = 0xea467a44,
  index = 0x18a095d,
  next_hash = 0x0,
  count = {
    counter = 0x1
  },
  flags = 0x12c0008,
  lru = {
    next = 0xc1373a04,
    prev = 0xc1803f10
  },
  pte = {
    chain = 0x0,
    direct = 0x0
  },
  age = 0x1,
  pprev_hash = 0xc98a4a80,
  buffers = 0x0,
  virtual = 0xd7c1b000
}

The counter of 1 is OK, as rmqueue() just bumped it, but its relevant flags
bits equate to:

  PG_uptodate  PG_lru  PG_active_cache  PG_fresh_page

This is pretty bad. I haven't a clue how the page could get into this state;
the PG_lru, PG_active_cache and PG_fresh_page bits would have to have been set
after the page was originally freed.

Larry, any ideas on how to possibly debug this?
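For reference, the failing check is the kind of LRU-state sanity test the
allocator runs when rmqueue() hands out a page from a free list. A minimal
sketch of such a test follows; this is illustrative only, not the literal
RHEL3 page_alloc.c source, and the PageLRU()/PageActiveCache() helper names
are assumed from the flag bits decoded above:

/*
 * Illustrative sketch only: roughly the sort of sanity check that shows
 * up as "kernel BUG at page_alloc.c:391!".  A page being handed out from
 * a buddy free list must not still look like a live page-cache/LRU page.
 * PageLRU()/PageActiveCache() are assumed helper names matching the
 * PG_lru/PG_active_cache bits decoded above.
 */
static inline void sanity_check_free_page(struct page *page)
{
	if (PageLRU(page))		/* still linked on an LRU list?     */
		BUG();
	if (PageActiveCache(page))	/* still marked active page cache?  */
		BUG();
	if (page->mapping)		/* still owned by an address_space? */
		BUG();
	if (page->buffers)		/* still has buffer_heads attached? */
		BUG();
}

Given the flags and the non-NULL mapping shown above, several of these tests
would trip for this page.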
Upon further investigation, the problem is not due to a single page being mishandled. Rather, one of the LRU lists (the "active cache" list in the first vmcore, the "inactive dirty" list in my latest one) gets linked into a buddy-allocator free list. As soon as that happens, literally thousands of LRU pages become reachable from one of the buddy allocator's free lists. Eventually rmqueue() comes along, processing a page allocation request, and unlinks one of those LRU pages from a free list. But even with this new piece of debug information, I'm still at a loss as to how to catch the bogus LRU-to-free-list manipulation in the act.
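One way to see how far the damage has spread in a vmcore, or to catch it a
little earlier at run time, would be to walk each zone's buddy free lists and
report any page that still has PG_lru set. A rough sketch, assuming the stock
2.4 zone_t/free_area_t layout and a PageLRU() test (the field and macro names
here are assumptions, not verified against the RHEL3 source):

/*
 * Debug sketch: scan every buddy free list in a zone and report pages
 * that still claim LRU membership.  Assumes the stock 2.4 layout, i.e.
 * zone->free_area[order].free_list links free pages through page->list,
 * and PageLRU() tests the PG_lru bit.  Caller is assumed to hold
 * zone->lock when run against a live system.
 */
static void scan_zone_free_lists(zone_t *zone)
{
	int order;

	for (order = 0; order < MAX_ORDER; order++) {
		struct list_head *curr;

		list_for_each(curr, &zone->free_area[order].free_list) {
			struct page *page = list_entry(curr, struct page, list);

			if (PageLRU(page))
				printk(KERN_ERR "%s order %d: free list holds "
				       "LRU page %p, flags %08lx\n",
				       zone->name, order, page, page->flags);
		}
	}
}

(The crash utility's "kmem -f" command does a similar free-list walk against a
vmcore.)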
It's probably worth noting that during a subsequent "pounder" run on the same test machine in Westford, the system experienced a catastrophic root filesystem melt-down (the root filesystem was where the pounder test suite was running), requiring a re-installation. I don't know whether splicing one of the page cache lists onto one of the buddy allocator free lists could somehow have caused a misdirected filesystem write. What's bothersome is that the buddy allocator free list pages are linked using the page->list list_head, which is also used when linking pages belonging to an inode. Note that the page cache LRU lists are linked with the page->lru list_head, so when they are traversed, they would never veer off into a buddy allocator list. Anyway, the rmqueue() BUG() happens well after the point of corruption. Code inspection of the places in the kernel that use the page->list list_head doesn't show any obvious way a page could end up erroneously linked onto a buddy allocator free list. I'll keep adding debug code and re-testing in hopes of catching it earlier in time.
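To make that overlap concrete, here is an abridged 2.4-era struct page,
annotated with which subsystem links through each list_head. The field set and
ordering follow the "crash> page" dump earlier in this bug; treat the details
as an approximation, not the verbatim RHEL3 definition:

/*
 * Abridged 2.4-era struct page (field set/order follows the crash> page
 * dump above; an approximation, not the verbatim RHEL3 source).
 */
struct page {
	struct list_head list;		/* shared: buddy free lists AND the
					 * per-inode clean/dirty/locked page
					 * lists link through this */
	struct address_space *mapping;	/* owning page cache, if any */
	unsigned long index;		/* offset within the mapping */
	struct page *next_hash;		/* page cache hash chain */
	atomic_t count;			/* reference count */
	unsigned long flags;		/* PG_lru, PG_active_cache, ... */
	struct list_head lru;		/* active/inactive LRU lists only */
	/* ... pte chain, age, pprev_hash, buffers, virtual ... */
};

The point is that page->list does double duty (buddy free lists and inode page
lists) while page->lru is exclusive to the LRU, which is exactly the asymmetry
described above.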
tao: please pass this request back to the IBM test team: If you run the pounder test suite *without* invoking any NFS tests, can you still reproduce the problem?
The Westford machine that we were originally able to reproduce the problem on was re-installed after the root filesystem corruption occurred, with a fresh RHEL3 U6 installation. However, since then we cannot get the problem to reproduce, because shortly after the NFS tests start, the pounder test suite causes the system to degenerate into issuing a never-ending stream of ENOMEM error messages:

  RPC: sendmsg returned error 12

flooding the logs and effectively shutting the machine down. We don't understand why this did not occur prior to the re-installation. Unfortunately, when running without the NFS tests, we cannot reproduce the failure, although that is not to say the problem is specific to NFS. That is why the question to IBM in comment #14 was posed, and we're still interested in their answer.
> Turning off NFS does not affect my x445. The box still hangs.

> looks like turning off nfs still crashes the system...

No, he says "The box still hangs." -- which completely confuses the issue. We are specifically debugging the "kernel BUG at page_alloc.c:391!" BUG(), which most definitely causes a system *crash*. We have never seen any "hangs" while running the pounder test suite. Are we even talking about the same problem now?

In any case, I'll run our machine in Westford without any highmem (mem=1GB), and see if it can avoid the "sendmsg" error flurry. But again, I have absolutely no idea what he's talking about regarding "hangs". If the system *hangs*, then they need to send us alt-sysrq-w output, or force-crash the system with alt-sysrq-c. And then file a completely new issue, because the "kernel BUG at page_alloc.c:391!" crash is most definitely not a hang.

Please clarify...
A fix for this problem has just been committed to the RHEL3 U7 patch pool this evening (in kernel version 2.4.21-37.9.EL).
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0144.html
Internal Status set to 'Resolved'
Status set to: Closed by Client
Resolution set to: 'Closed by Client'

This event sent from IssueTracker by Chris McDermott
 issue 68975