Red Hat Bugzilla – Bug 59212
kernel reports "Assertion failure" in unmap_underlying_metadata() at buffer.c: 1540
Last modified: 2007-04-18 12:39:41 EDT
From Bugzilla Helper:
User-Agent: Mozilla/4.79 [en] (Windows NT 5.0; U)
Description of problem:
When compiling with gcc, the kernel running on my node "server" reports the error:
server kernel: Assertion failure in unmap_underlying_metadata() at buffer.c: 1540: "!buffer_jlist_eq(old_bh, 3)".
The errors occur at random times during the compilation sequence and appear to be associated with a cascade of file system faults.
Version-Release number of selected component (if applicable): 2.4.7-10
Steps to Reproduce:
1. Run gcc using a command like "cc -c -o myprog.o myprog.c". I expect that gcc is not the cause, just a suitable tool to trigger the fault in the kernel.
2. The above message is generated
3. There is also at times a cascade of log messages of the form
Feb 2 22:22:58 server kernel: EXT3-fs error (device ide(22,1)): ext3_free_blocks: bit already cleared for block 17844
Actual Results: System hangs with message
(time) server kernel: kernel BUG at buffer.c:1540!
(time) server kernel: invalid operand: 0000
(time) server kernel: CPU: 0
(time) server kernel: EIP: 0010[unmap_underlying_metadata+180/244]
(time) server kernel: EIP: 0010[<c01322f78>]
(time) server kernel: EFLAGS: 00010282
... plus lots more of probably decreasing diagnostic value
Expected Results: Nothing, I guess
I have tried a new copy of vmlinuz, a full bad block check ("badblocks -nvs /dev/hda"), and a full run of fsck.
I have today noticed a similar bug report at:
(A reply in corresponding URL http://www.cygnus.co.uk/mailing-lists/ext3-users/msg00981.html did not offer any resolution)
First of all, upgrading to the errata 2.4.9-21 kernel is recommended; a
few ext3 corner cases are fixed in it.
However, you're the first to report THIS failure; is there anything
not-so-common about your hardware? Do you have any uncommon modules loaded?
The problem has now been traced to a failed fan on the processor chip. Presumably the errors were associated with instructions using the most
temperature-sensitive pathways on the chip.
It's good to confirm that it takes a severe hardware failure to take out Linux.
Sounds like "NOTABUG" to me.... closing. (well, NOTOURBUG anyway:)
thanks for following up.