Bug 98327
Summary: | [x86_64] Reproducible kernel Oops in fs code (caused by a huge malloc?) | ||
---|---|---|---|
Product: | [Retired] Red Hat Raw Hide | Reporter: | Aleksey Nogin <aleksey> |
Component: | kernel | Assignee: | Arjan van de Ven <arjanv> |
Status: | CLOSED WORKSFORME | QA Contact: | Brian Brock <bbrock> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 1.0 | CC: | crt, jyh |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
URL: | http://people.redhat.com/arjanv/amd64/ | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2004-01-07 05:31:00 UTC | Type: | --- |
Description
Aleksey Nogin
2003-07-01 04:19:57 UTC
This seems to be perfectly reproducible: after a reboot I tried compiling again, and at the exact same place (sh ./runocamldoc true -man -d stdlib_man ...) the machine Oopsed and froze.

The same problem exists in 2.5.69-ac1, but it exhibits itself in a somewhat different way. At the same place in the OCaml compilation process, I get:

ocamlrun[4737] segfault at rip:2a95bda18d rsp:7fbfffe860 adr:fffffffffffffff7 err:4
Slab corruption: start=00000103dfffe000, expend=00000103dfffefff, problemat=00000103dfffe000
Data: FF FF [... a huge number of FFs ...] FF
Next: FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
slab error in check_poison_obj(): cache `size-4096': object was modified after freeing

Call Trace:<ffffffff801d7432>{journal_get_undo_access+130} <ffffffff80164cc7>{kmalloc+231}
<ffffffff801d7432>{journal_get_undo_access+130} <ffffffff801c4775>{ext3_new_block+933}
<ffffffff801c7101>{ext3_alloc_block+17} <ffffffff801c74d5>{ext3_alloc_branch+85}
<ffffffff801c7b7f>{ext3_get_block_handle+751} <ffffffff80184d1f>{alloc_buffer_head+111}
<ffffffff80181f86>{create_buffers+102} <ffffffff8018310d>{__block_prepare_write+333}
<ffffffff801c7c40>{ext3_get_block+0} <ffffffff80183c0a>{block_prepare_write+26}
<ffffffff801c842e>{ext3_prepare_write+302} <ffffffff8015daa1>{generic_file_aio_write_nolock+1297}
<ffffffff8016e47d>{do_anonymous_page+1661} <ffffffff801c79ad>{ext3_get_block_handle+285}
<ffffffff8015e01d>{generic_file_aio_write+109} <ffffffff801c57f3>{ext3_file_write+35}
<ffffffff8017f343>{do_sync_write+115} <ffffffff801d7961>{journal_dirty_metadata+465}
<ffffffff80164469>{cache_alloc_refill+1129} <ffffffff80162b9c>{check_poison_obj+60}
<ffffffff801b0956>{elf_core_dump+262} <ffffffff80164cc7>{kmalloc+231}
<ffffffff801b02c2>{dump_write+18} <ffffffff801b0dce>{elf_core_dump+1406}
<ffffffff801a57af>{__mark_inode_dirty+47} <ffffffff8019f043>{notify_change+483}
<ffffffff8017d385>{do_truncate+69} <ffffffff8018d2e4>{do_coredump+452}
<ffffffff80147e18>{__dequeue_signal+392} <ffffffff8014a85c>{get_signal_to_deliver+1548}
<ffffffff801206bc>{do_page_fault+668} <ffffffff80111a8d>{do_signal+125}
<ffffffff80171514>{do_brk+340} <ffffffff80112360>{retint_signal+62}

Assertion failure in ext3_new_block() at fs/ext3/balloc.c:562: "!(__builtin_constant_p((ret_block)) ? constant_test_bit(((ret_block)),((unsigned long*)bh2jh(bitmap_bh)->b_committed_data)) : variable_test_bit(((ret_block)),((unsigned long*)bh2jh(bitmap_bh)->b_committed_data)))"
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at balloc:562
invalid operand: 0000 [1]
CPU 0
Pid: 4737, comm: ocamlrun Not tainted
RIP: 0010:[<ffffffff801c4948>] <ffffffff801c4948>{ext3_new_block+1400}
RSP: 0018:00000103fc511688 EFLAGS: 00010212
RAX: 0000000000000119 RBX: 0000000000020875 RCX: 0000000000000000
RDX: 0000000000000001 RSI: 0000000000000000 RDI: 00000103fe789af0
RBP: 00000103ffcc8400 R08: 0000000000000000 R09: 0000000000000720
R10: 00000000ffffffff R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000004 R14: 00000103ed92d080 R15: 00000103f05a7168
FS: 0000002a95571fe0(0000) GS:ffffffff8046f380(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: fffffffffffffff7 CR3: 0000000000101000 CR4: 00000000000006a0

Call Trace:<ffffffff801c4948>{ext3_new_block+1400} <ffffffff801c7101>{ext3_alloc_block+17}
<ffffffff801c74d5>{ext3_alloc_branch+85} <ffffffff801c7b7f>{ext3_get_block_handle+751}
<ffffffff80184d1f>{alloc_buffer_head+111} <ffffffff80181f86>{create_buffers+102}
<ffffffff8018310d>{__block_prepare_write+333} <ffffffff801c7c40>{ext3_get_block+0}
<ffffffff80183c0a>{block_prepare_write+26} <ffffffff801c842e>{ext3_prepare_write+302}
<ffffffff8015daa1>{generic_file_aio_write_nolock+1297} <ffffffff8016e47d>{do_anonymous_page+1661}
<ffffffff801c79ad>{ext3_get_block_handle+285} <ffffffff8015e01d>{generic_file_aio_write+109}
<ffffffff801c57f3>{ext3_file_write+35} <ffffffff8017f343>{do_sync_write+115}
<ffffffff801d7961>{journal_dirty_metadata+465} <ffffffff80164469>{cache_alloc_refill+1129}
<ffffffff80162b9c>{check_poison_obj+60} <ffffffff801b0956>{elf_core_dump+262}
<ffffffff80164cc7>{kmalloc+231} <ffffffff801b02c2>{dump_write+18}
<ffffffff801b0dce>{elf_core_dump+1406} <ffffffff801a57af>{__mark_inode_dirty+47}
<ffffffff8019f043>{notify_change+483} <ffffffff8017d385>{do_truncate+69}
<ffffffff8018d2e4>{do_coredump+452} <ffffffff80147e18>{__dequeue_signal+392}
<ffffffff8014a85c>{get_signal_to_deliver+1548} <ffffffff801206bc>{do_page_fault+668}
<ffffffff80111a8d>{do_signal+125} <ffffffff80171514>{do_brk+340}
<ffffffff80112360>{retint_signal+62}

Process ocamlrun (pid: 4737, stackpage=103f99c0460)
Stack: 0000000000000005 00000103ffcc8498 00000103edc4f400 0000000000000001 0000087500020875 00000103ee216320 00000103fc51174c 00000103ee009780 00000103fa903c80 aaaaaaaaaaaaaaab

Call Trace:<ffffffff801c7101>{ext3_alloc_block+17} <ffffffff801c74d5>{ext3_alloc_branch+85}
<ffffffff801c7b7f>{ext3_get_block_handle+751} <ffffffff80184d1f>{alloc_buffer_head+111}
<ffffffff80181f86>{create_buffers+102} <ffffffff8018310d>{__block_prepare_write+333}
<ffffffff801c7c40>{ext3_get_block+0} <ffffffff80183c0a>{block_prepare_write+26}
<ffffffff801c842e>{ext3_prepare_write+302} <ffffffff8015daa1>{generic_file_aio_write_nolock+1297}
<ffffffff8016e47d>{do_anonymous_page+1661} <ffffffff801c79ad>{ext3_get_block_handle+285}
<ffffffff8015e01d>{generic_file_aio_write+109} <ffffffff801c57f3>{ext3_file_write+35}
<ffffffff8017f343>{do_sync_write+115} <ffffffff801d7961>{journal_dirty_metadata+465}
<ffffffff80164469>{cache_alloc_refill+1129} <ffffffff80162b9c>{check_poison_obj+60}
<ffffffff801b0956>{elf_core_dump+262} <ffffffff80164cc7>{kmalloc+231}
<ffffffff801b02c2>{dump_write+18} <ffffffff801b0dce>{elf_core_dump+1406}
<ffffffff801a57af>{__mark_inode_dirty+47} <ffffffff8019f043>{notify_change+483}
<ffffffff8017d385>{do_truncate+69} <ffffffff8018d2e4>{do_coredump+452}
<ffffffff80147e18>{__dequeue_signal+392} <ffffffff8014a85c>{get_signal_to_deliver+1548}
<ffffffff801206bc>{do_page_fault+668} <ffffffff80111a8d>{do_signal+125}
<ffffffff80171514>{do_brk+340} <ffffffff80112360>{retint_signal+62}

Code: 0f 0b b1 da 32 80 ff ff ff ff 32 02 48 8b 74 24 28 48 8b 7c

I also filed http://bugme.osdl.org/show_bug.cgi?id=862 for the 2.5.69-ac1 crash. I tried (under 2.5.69-ac1) mounting the partition as ext2 and it still crashed.

According to Xavier Leroy (OCaml author), this point in the OCaml compilation is probably doing a huge malloc:

> This is a problem we've seen on other 64-bit Linux platforms, and it
> is due to the fact that malloc() can return *widely* spaced pointers.
> Since OCaml likes to maintain a table of memory pages it has
> allocated, this causes the page table to become *huge* and its
> allocation fails.
>
> The workaround is ...
>
> However, a failed malloc() request shouldn't cause a kernel oops

It turned out that the machines had a buggy version of the BIOS. Upgrading the BIOS solved a lot of problems; I am not sure whether this particular one was also solved, but it probably was.
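
Xavier Leroy's explanation quoted above can be made concrete with a little arithmetic. The sketch below assumes a flat table holding one entry per page between the lowest and highest allocated address. That layout, the 4096-byte page size, the function name `flat_table_entries`, and both address lists are assumptions made for illustration, not details taken from the OCaml runtime or from this report. The high address is only loosely modeled on the 0x2a... user-space addresses visible in the log.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def flat_table_entries(addresses, page_size=PAGE_SIZE):
    """Entries needed by a flat table covering every page from the
    lowest to the highest address in `addresses` (hypothetical layout)."""
    pages = [addr // page_size for addr in addresses]
    return max(pages) - min(pages) + 1


# Hypothetical chunk addresses: two neighbours low in the address space ...
near = [0x600000, 0x700000]
# ... versus one low chunk and one far away, as malloc() may return on 64-bit.
far = [0x600000, 0x2A95571000]

if __name__ == "__main__":
    for label, addrs in (("near", near), ("far", far)):
        print(label, flat_table_entries(addrs), "entries")
```

With these made-up addresses the table grows from a few hundred entries to tens of millions, although only two chunks are in use in both cases. That is the kind of growth that would make allocating the table itself fail.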