From Bugzilla Helper:
User-Agent: Mozilla/4.79 [en] (Windows NT 5.0; U)

Description of problem:
When compiling with gcc, the kernel running on my node "server" reports the error:
server kernel: Assertion failure in unmap_underlying_metadata() at buffer.c: 1540: "!buffer_jlist_eq(old_bh, 3)".
The errors occur at random times during the compilation sequence, and appear to occur in association with a cascade of file system faults.

Version-Release number of selected component (if applicable):
2.4.7-10

How reproducible:
Sometimes

Steps to Reproduce:
1. Run gcc using a command like "cc -c -o myprog.o myprog.c". I expect that gcc is not the cause, just a suitable tool to trigger the fault in the kernel.
2. The above message is generated.
3. There is also at times a cascade of log messages of a form such as:
Feb 2 22:22:58 server kernel: EXT3-fs error (device ide(22,1)): ext3_free_blocks: bit already cleared for block 17844

Actual Results:
System hangs with the messages:
(time) server kernel: kernel BUG at buffer.c:1540!
(time) server kernel: invalid operand: 0000
(time) server kernel: CPU: 0
(time) server kernel: EIP: 0010[unmap_underlying_metadata+180/244]
(time) server kernel: EIP: 0010[<c01322f78>]
(time) server kernel: EFLAGS: 00010282
... plus lots more of probably decreasing diagnostic value

Expected Results:
Nothing, I guess

Additional info:
I have tried a new copy of vmlinuz, run a full bad block check ("badblocks -nvs /dev/hda"), and done a full run of fsck.
Today I noticed a similar bug report at: http://www.cygnus.co.uk/mailing-lists/ext3-users/msg00978.html (A reply to it at http://www.cygnus.co.uk/mailing-lists/ext3-users/msg00981.html did not offer any resolution.)
First of all, upgrading to the errata 2.4.9-21 kernel is recommended; there are a few ext3 corner cases fixed in that. However, you're the first to report THIS failure; is there anything not-so-common about your hardware? Do you have any uncommon modules loaded?
The problem has now been traced to a failed fan on the processor chip. Presumably the errors were associated with instructions using the most temperature-sensitive pathways on the chip. It's good to confirm that it takes a severe hardware failure to take out Linux.
Sounds like "NOTABUG" to me.... closing. (well, NOTOURBUG anyway:) thanks for following up.