While trying to compile OCaml under the current x86_64 Rawhide tree and the kernel-smp-2.4.20-18.9 from http://people.redhat.com/arjanv/amd64/, got the following Oops:

ocamlrun[4543]: segfault at 000000000000002f rip 0000002a95c9d455 rsp 0000007fbfffd7b0 error 4
Unable to handle kernel paging request at virtual address ffffffffffffffff
printing rip: ffffffff80169bcb
PML4 103027 PGD 2067 PMD 0
Oops: 0002
CPU 0
Pid: 4543, comm: ocamlrun Not tainted
RIP: 0010:[<ffffffff80169bcb>]{inode_init_once+11}
RSP: 0000:00000103fa233a90 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000040 RCX: 000000000000005f
RDX: 0000000000000001 RSI: ffffffffffffffff RDI: ffffffffffffffff
RBP: ffffffffffffffff R08: 0000000000000000 R09: 00000100164fff98
R10: 0000000000000001 R11: 0000000000000000 R12: 000001001719a840
R13: 00000000000001f0 R14: 0000000000000000 R15: 00000103dfffe000
FS: 0000000000525e80(0000) GS:ffffffff804b8fc0(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffffffffff CR3: 0000000000101000 CR4: 00000000000006e0
Call Trace:
  [<ffffffff80141faa>]{kmem_cache_grow+634}
  [<ffffffffa00157c5>]{:ext3:ext3_getblk+181}
  [<ffffffff801427e1>]{kmem_cache_alloc+641}
  [<ffffffff80169a6d>]{alloc_inode+45}
  [<ffffffff8016b5a9>]{new_inode+9}
  [<ffffffff801509a2>]{end_buffer_io_sync+34}
  [<ffffffffa001393b>]{:ext3:ext3_new_inode+75}
  [<ffffffffa0008f8c>]{:jbd:.rodata.str1.1+60}
  [<ffffffffa0000332>]{:jbd:start_this_handle+306}
  [<ffffffffa0019634>]{:ext3:ext3_create+164}
  [<ffffffff8015defe>]{vfs_create+430}
  [<ffffffff8015dbd3>]{lookup_hash+243}
  [<ffffffff8015e0fe>]{open_namei+350}
  [<ffffffff8014e333>]{filp_open+51}
  [<ffffffff8015aee1>]{do_coredump+289}
  [<ffffffff8012dc7a>]{collect_signal+202}
  [<ffffffff8010ee70>]{do_signal+768}
  [<ffffffff8015c2d0>]{permission+224}
  [<ffffffff8010f99a>]{error_signal_test+0}
Process ocamlrun (pid: 4543, stackpage=103fa233000)
Stack:
  00000103fa233a90 0000000000000000 ffffffff80141faa 000001001719a87c 0000000000000001
  0000000000000000 0000010001000048 0000000000000202 ffffffffa00157c5 000001001719a87c
  0000000000000000 00000103fceea928 00000000000001f0 000001001719a840 00000103fceea928
  0000000000000000 00000103e07a4258 0000000000000180 ffffffff801427e1 0000000000000010
  0000000000000246 0000000000000001 00000103fa233c48 0000000000000000 00000103fed75800
  00000103fceea928 00000103fc0c3a60 00000103e07a4258 ffffffff80169a6d 4e4d4c4b4a494847
  0000000000008180 00000103fed75800 ffffffff8016b5a9 3736353433323130 ffffffff801509a2
  00000103fcb5b988 ffffffffa001393b 000001007a797877 0000000000000001 0000000000000000
Call Trace:
  [<ffffffff80141faa>]{kmem_cache_grow+634}
  [<ffffffffa00157c5>]{:ext3:ext3_getblk+181}
  [<ffffffff801427e1>]{kmem_cache_alloc+641}
  [<ffffffff80169a6d>]{alloc_inode+45}
  [<ffffffff8016b5a9>]{new_inode+9}
  [<ffffffff801509a2>]{end_buffer_io_sync+34}
  [<ffffffffa001393b>]{:ext3:ext3_new_inode+75}
  [<ffffffffa0008f8c>]{:jbd:.rodata.str1.1+60}
  [<ffffffffa0000332>]{:jbd:start_this_handle+306}
  [<ffffffffa0019634>]{:ext3:ext3_create+164}
  [<ffffffff8015defe>]{vfs_create+430}
  [<ffffffff8015dbd3>]{lookup_hash+243}
  [<ffffffff8015e0fe>]{open_namei+350}
  [<ffffffff8014e333>]{filp_open+51}
  [<ffffffff8015aee1>]{do_coredump+289}
  [<ffffffff8012dc7a>]{collect_signal+202}
  [<ffffffff8010ee70>]{do_signal+768}
  [<ffffffff8015c2d0>]{permission+224}
  [<ffffffff8010f99a>]{error_signal_test+0}
Code: f3 48 ab 48 8d 96 18 01 00 00 48 b9 01 00 00 00 ad 4e ad de

"cat /proc/version":
Linux version 2.4.20-18.9smp (bhcompile.redhat.com) (gcc version 3.2.2 20030222 (Red Hat Linux 3.2.2-5)) #1 SMP Thu May 29 06:45:34 EDT 2003
This seems to be perfectly reproducible - after a reboot I tried compiling again, and at the exact same place (sh ./runocamldoc true -man -d stdlib_man ...) the machine Oopsed and froze.
The same problem exists in 2.5.69-ac1, but it exhibits itself in a somewhat different way. At the same place in the OCaml compilation process, I get:

ocamlrun[4737] segfault at rip:2a95bda18d rsp:7fbfffe860 adr:fffffffffffffff7 err:4
Slab corruption: start=00000103dfffe000, expend=00000103dfffefff, problemat=00000103dfffe000
Data: FF FF [... a huge number of FFs ...] FF
Next: FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
slab error in check_poison_obj(): cache `size-4096': object was modified after freeing
Call Trace:
  <ffffffff801d7432>{journal_get_undo_access+130}
  <ffffffff80164cc7>{kmalloc+231}
  <ffffffff801d7432>{journal_get_undo_access+130}
  <ffffffff801c4775>{ext3_new_block+933}
  <ffffffff801c7101>{ext3_alloc_block+17}
  <ffffffff801c74d5>{ext3_alloc_branch+85}
  <ffffffff801c7b7f>{ext3_get_block_handle+751}
  <ffffffff80184d1f>{alloc_buffer_head+111}
  <ffffffff80181f86>{create_buffers+102}
  <ffffffff8018310d>{__block_prepare_write+333}
  <ffffffff801c7c40>{ext3_get_block+0}
  <ffffffff80183c0a>{block_prepare_write+26}
  <ffffffff801c842e>{ext3_prepare_write+302}
  <ffffffff8015daa1>{generic_file_aio_write_nolock+1297}
  <ffffffff8016e47d>{do_anonymous_page+1661}
  <ffffffff801c79ad>{ext3_get_block_handle+285}
  <ffffffff8015e01d>{generic_file_aio_write+109}
  <ffffffff801c57f3>{ext3_file_write+35}
  <ffffffff8017f343>{do_sync_write+115}
  <ffffffff801d7961>{journal_dirty_metadata+465}
  <ffffffff80164469>{cache_alloc_refill+1129}
  <ffffffff80162b9c>{check_poison_obj+60}
  <ffffffff801b0956>{elf_core_dump+262}
  <ffffffff80164cc7>{kmalloc+231}
  <ffffffff801b02c2>{dump_write+18}
  <ffffffff801b0dce>{elf_core_dump+1406}
  <ffffffff801a57af>{__mark_inode_dirty+47}
  <ffffffff8019f043>{notify_change+483}
  <ffffffff8017d385>{do_truncate+69}
  <ffffffff8018d2e4>{do_coredump+452}
  <ffffffff80147e18>{__dequeue_signal+392}
  <ffffffff8014a85c>{get_signal_to_deliver+1548}
  <ffffffff801206bc>{do_page_fault+668}
  <ffffffff80111a8d>{do_signal+125}
  <ffffffff80171514>{do_brk+340}
  <ffffffff80112360>{retint_signal+62}
Assertion failure in ext3_new_block() at fs/ext3/balloc.c:562: "!(__builtin_constant_p((ret_block)) ? constant_test_bit(((ret_block)),((unsigned long*)bh2jh(bitmap_bh)->b_committed_data)) : variable_test_bit(((ret_block)),((unsigned long*)bh2jh(bitmap_bh)->b_committed_data)))"
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at balloc:562
invalid operand: 0000 [1]
CPU 0
Pid: 4737, comm: ocamlrun Not tainted
RIP: 0010:[<ffffffff801c4948>] <ffffffff801c4948>{ext3_new_block+1400}
RSP: 0018:00000103fc511688 EFLAGS: 00010212
RAX: 0000000000000119 RBX: 0000000000020875 RCX: 0000000000000000
RDX: 0000000000000001 RSI: 0000000000000000 RDI: 00000103fe789af0
RBP: 00000103ffcc8400 R08: 0000000000000000 R09: 0000000000000720
R10: 00000000ffffffff R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000004 R14: 00000103ed92d080 R15: 00000103f05a7168
FS: 0000002a95571fe0(0000) GS:ffffffff8046f380(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: fffffffffffffff7 CR3: 0000000000101000 CR4: 00000000000006a0
Call Trace:
  <ffffffff801c4948>{ext3_new_block+1400}
  <ffffffff801c7101>{ext3_alloc_block+17}
  <ffffffff801c74d5>{ext3_alloc_branch+85}
  <ffffffff801c7b7f>{ext3_get_block_handle+751}
  <ffffffff80184d1f>{alloc_buffer_head+111}
  <ffffffff80181f86>{create_buffers+102}
  <ffffffff8018310d>{__block_prepare_write+333}
  <ffffffff801c7c40>{ext3_get_block+0}
  <ffffffff80183c0a>{block_prepare_write+26}
  <ffffffff801c842e>{ext3_prepare_write+302}
  <ffffffff8015daa1>{generic_file_aio_write_nolock+1297}
  <ffffffff8016e47d>{do_anonymous_page+1661}
  <ffffffff801c79ad>{ext3_get_block_handle+285}
  <ffffffff8015e01d>{generic_file_aio_write+109}
  <ffffffff801c57f3>{ext3_file_write+35}
  <ffffffff8017f343>{do_sync_write+115}
  <ffffffff801d7961>{journal_dirty_metadata+465}
  <ffffffff80164469>{cache_alloc_refill+1129}
  <ffffffff80162b9c>{check_poison_obj+60}
  <ffffffff801b0956>{elf_core_dump+262}
  <ffffffff80164cc7>{kmalloc+231}
  <ffffffff801b02c2>{dump_write+18}
  <ffffffff801b0dce>{elf_core_dump+1406}
  <ffffffff801a57af>{__mark_inode_dirty+47}
  <ffffffff8019f043>{notify_change+483}
  <ffffffff8017d385>{do_truncate+69}
  <ffffffff8018d2e4>{do_coredump+452}
  <ffffffff80147e18>{__dequeue_signal+392}
  <ffffffff8014a85c>{get_signal_to_deliver+1548}
  <ffffffff801206bc>{do_page_fault+668}
  <ffffffff80111a8d>{do_signal+125}
  <ffffffff80171514>{do_brk+340}
  <ffffffff80112360>{retint_signal+62}
Process ocamlrun (pid: 4737, stackpage=103f99c0460)
Stack:
  0000000000000005 00000103ffcc8498 00000103edc4f400 0000000000000001 0000087500020875
  00000103ee216320 00000103fc51174c 00000103ee009780 00000103fa903c80 aaaaaaaaaaaaaaab
Call Trace:
  <ffffffff801c7101>{ext3_alloc_block+17}
  <ffffffff801c74d5>{ext3_alloc_branch+85}
  <ffffffff801c7b7f>{ext3_get_block_handle+751}
  <ffffffff80184d1f>{alloc_buffer_head+111}
  <ffffffff80181f86>{create_buffers+102}
  <ffffffff8018310d>{__block_prepare_write+333}
  <ffffffff801c7c40>{ext3_get_block+0}
  <ffffffff80183c0a>{block_prepare_write+26}
  <ffffffff801c842e>{ext3_prepare_write+302}
  <ffffffff8015daa1>{generic_file_aio_write_nolock+1297}
  <ffffffff8016e47d>{do_anonymous_page+1661}
  <ffffffff801c79ad>{ext3_get_block_handle+285}
  <ffffffff8015e01d>{generic_file_aio_write+109}
  <ffffffff801c57f3>{ext3_file_write+35}
  <ffffffff8017f343>{do_sync_write+115}
  <ffffffff801d7961>{journal_dirty_metadata+465}
  <ffffffff80164469>{cache_alloc_refill+1129}
  <ffffffff80162b9c>{check_poison_obj+60}
  <ffffffff801b0956>{elf_core_dump+262}
  <ffffffff80164cc7>{kmalloc+231}
  <ffffffff801b02c2>{dump_write+18}
  <ffffffff801b0dce>{elf_core_dump+1406}
  <ffffffff801a57af>{__mark_inode_dirty+47}
  <ffffffff8019f043>{notify_change+483}
  <ffffffff8017d385>{do_truncate+69}
  <ffffffff8018d2e4>{do_coredump+452}
  <ffffffff80147e18>{__dequeue_signal+392}
  <ffffffff8014a85c>{get_signal_to_deliver+1548}
  <ffffffff801206bc>{do_page_fault+668}
  <ffffffff80111a8d>{do_signal+125}
  <ffffffff80171514>{do_brk+340}
  <ffffffff80112360>{retint_signal+62}
Code: 0f 0b b1 da 32 80 ff ff ff ff 32 02 48 8b 74 24 28 48 8b 7c
I also filed http://bugme.osdl.org/show_bug.cgi?id=862 for the 2.5.69-ac1 crash.
I tried (under 2.5.69-ac1) mounting the partition as ext2 and it still crashed.
According to Xavier Leroy (OCaml author), this step of the OCaml compilation is probably doing a huge malloc:

> This is a problem we've seen on other 64-bit Linux platforms, and it
> is due to the fact that malloc() can return *widely* spaced pointers.
> Since OCaml likes to maintain a table of memory pages it has
> allocated, this causes the page table to become *huge* and its
> allocation fails.
>
> The workaround is ...
>
> However, a failed malloc() request shouldn't cause a kernel oops
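
To illustrate the effect described in the quoted explanation, here is a minimal Python sketch. It is not the OCaml runtime's actual code. It assumes a dense table with one entry per 4096-byte page covering everything between the lowest and highest allocated address; the function name and the sample addresses are made up for illustration.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def dense_page_table_entries(chunks, page_size=PAGE_SIZE):
    """Number of entries a dense page table needs to cover all chunks.

    chunks is a list of (start_address, length_in_bytes) pairs. The table
    is assumed to span every page from the lowest start to the highest
    end, whether or not the pages in between were ever allocated.
    """
    lowest = min(start for start, _ in chunks)
    highest = max(start + length for start, length in chunks)
    first_page = lowest // page_size
    last_page = (highest + page_size - 1) // page_size  # round up
    return last_page - first_page


# Two 1 MiB chunks that sit next to each other (hypothetical addresses).
close = [(0x600000, 1 << 20), (0x700000, 1 << 20)]

# The same two chunks, but the second one is returned far away, as a
# 64-bit malloc() may do when it falls back to mmap (hypothetical address).
far = [(0x600000, 1 << 20), (0x2A95600000, 1 << 20)]

if __name__ == "__main__":
    print("adjacent chunks:", dense_page_table_entries(close), "entries")
    print("widely spaced chunks:", dense_page_table_entries(far), "entries")
```

Both cases hold the same 2 MiB of data, yet the table for the widely spaced chunks has to cover the whole gap between them, so a table sized this way can become too large to allocate.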
It turned out that the machines had a buggy version of the BIOS. Upgrading the BIOS solved a lot of problems. I am not sure whether this particular one was also solved, but it probably was.