Bug 175925
Summary: | Bad page state at free_hot_cold_page (in process 'kswapd0'... | |
---|---|---|---
Product: | [Fedora] Fedora | Reporter: | Frode Tennebø <frodet>
Component: | kernel | Assignee: | Dave Jones <davej>
Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | Brian Brock <bbrock>
Severity: | high | Priority: | medium
Version: | 5 | CC: | jonstanley, pfrields, trevor, wtogami
Hardware: | i386 | OS: | Linux
Whiteboard: | MassClosed | Doc Type: | Bug Fix
Last Closed: | 2008-01-20 04:42:00 UTC | |
Description
Frode Tennebø
2005-12-16 12:05:54 UTC
The first post was almost unreadable. I have re-submitted the actual contents below:

Description of problem:
At relatively frequent intervals (approx. every two weeks) the OS hangs and I'm forced to do a hardware reset. Just prior to the incident I have one or two of these in /var/log/messages:

Dec 16 10:03:44 garvin kernel: Bad page state at free_hot_cold_page (in process 'kswapd0', page c115c580)
Dec 16 10:03:44 garvin kernel: flags:0x40000000 mapping:00000000 mapcount:-1 count:0 (Tainted: G B)
Dec 16 10:03:44 garvin kernel: Backtrace:
Dec 16 10:03:44 garvin kernel: [<c013f38d>] bad_page+0x8c/0xc3
Dec 16 10:03:44 garvin kernel: [<c013fbf7>] free_hot_cold_page+0x47/0xca
Dec 16 10:03:44 garvin kernel: [<c01403a5>] __pagevec_free+0x1f/0x2e
Dec 16 10:03:44 garvin kernel: [<c0145123>] __pagevec_release_nonlru+0x29/0x8a
Dec 16 10:03:44 garvin kernel: [<c01460fd>] shrink_list+0x207/0x47b
Dec 16 10:03:44 garvin kernel: [<c014651e>] shrink_cache+0xe7/0x29a
Dec 16 10:03:44 garvin kernel: [<c0146b41>] shrink_zone+0x88/0xd6
Dec 16 10:03:44 garvin kernel: [<c0146f95>] balance_pgdat+0x20d/0x3e7
Dec 16 10:03:44 garvin kernel: [<c014723a>] kswapd+0xcb/0x109
Dec 16 10:03:44 garvin kernel: [<c012db56>] autoremove_wake_function+0x0/0x37
Dec 16 10:03:44 garvin kernel: [<c014716f>] kswapd+0x0/0x109
Dec 16 10:03:44 garvin kernel: [<c0101301>] kernel_thread_helper+0x5/0xb
Dec 16 10:03:44 garvin kernel: Trying to fix it up, but a reboot is needed

Version-Release number of selected component (if applicable):
It has been like this for all FC4 kernels I have tried. This includes:
kernel-2.6.11-1.1369_FC4
kernel-2.6.12-1.1398_FC4
kernel-2.6.12-1.1447_FC4
kernel-2.6.13-1.1526_FC4
kernel-2.6.13-1.1532_FC4
kernel-2.6.14-1.1637_FC4

Currently I'm running (for a few hours):
kernel-2.6.14-1.1644_FC4

...but will shortly boot into:
kernel-2.6.14-1.1653_FC4

I will confirm when/if it happens again with any of these releases.

How reproducible:
It occurs periodically, and I have no deterministic way of reproducing it.

Additional info:
Prior to the actual hang I have the following in messages:

Dec 15 02:57:13 garvin kernel: ------------[ cut here ]------------
Dec 15 02:57:13 garvin kernel: kernel BUG at mm/rmap.c:487!
Dec 15 02:57:13 garvin kernel: invalid operand: 0000 [#1]
Dec 15 02:57:13 garvin kernel: Modules linked in: loop parport_pc lp parport nfs lockd nfs_acl autofs4 sunrpc dm_mod ipv6 uhci_hcd i2c_piix4 i2c_core snd_es18xx snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss snd_pcm snd_page_alloc snd_opl3_lib snd_timer snd_hwdep snd_mpu401_uart snd_rawmidi snd_seq_device snd soundcore tlan floppy ext3 jbd aic7xxx scsi_transport_spi sd_mod scsi_mod
Dec 15 02:57:13 garvin kernel: CPU: 0
Dec 15 02:57:13 garvin kernel: EIP: 0060:[<c014f97b>] Not tainted VLI
Dec 15 02:57:13 garvin kernel: EFLAGS: 00010286 (2.6.14-1.1637_FC4)
Dec 15 02:57:13 garvin kernel: EIP is at page_remove_rmap+0x37/0x41
Dec 15 02:57:13 garvin kernel: eax: ffffffff ebx: c85d5e30 ecx: 00000006 edx: c115c580
Dec 15 02:57:13 garvin kernel: esi: c115c580 edi: 0038c000 ebp: c03f7a7c esp: cd7ddec8
Dec 15 02:57:13 garvin kernel: ds: 007b es: 007b ss: 0068
Dec 15 02:57:13 garvin kernel: Process udev (pid: 4008, threadinfo=cd7dd000 task=c7059ab0)
Dec 15 02:57:13 garvin kernel: Stack: c0149137 00000000 00391000 c03f7a7c c0a7d000 00391000 00391000 00390fff
Dec 15 02:57:13 garvin kernel: c01492ca 00391000 00000000 c03f7a7c 00009000 00391000 c4ce3ddc 00391000
Dec 15 02:57:13 garvin kernel: c0149401 00391000 00000000 cd7dd000 cdb671c0 cd7ddf58 002d7000 00000000
Dec 15 02:57:13 garvin kernel: Call Trace:
Dec 15 02:57:13 garvin kernel: [<c0149137>] zap_pte_range+0xe5/0x1f5
Dec 15 02:57:13 garvin kernel: [<c01492ca>] unmap_page_range+0x83/0xb7
Dec 15 02:57:13 garvin kernel: [<c0149401>] unmap_vmas+0x103/0x222
Dec 15 02:57:13 garvin kernel:
[<c014dc05>] exit_mmap+0x7c/0x14c Dec 15 02:57:13 garvin kernel: [<c01189a0>] mmput+0x1f/0x95 Dec 15 02:57:13 garvin kernel: [<c011d33d>] do_exit+0xe0/0x3b8 Dec 15 02:57:13 garvin kernel: [<c011d66a>] do_group_exit+0x29/0x90 Dec 15 02:57:13 garvin kernel: [<c0102edd>] syscall_call+0x7/0xb Dec 15 02:57:13 garvin kernel: Code: ff 0f 98 c0 84 c0 75 01 c3 8b 42 08 83 c0 0 1 90 78 19 ba ff ff ff ff b8 10 00 00 00 e9 43 0c ff ff 0f 0b e4 01 ad 4a 32 c0 eb d2 <0f> 0b e7 01 ad 4a 32 c0 eb dd 55 57 56 53 83 ec 04 89 c7 89 d3 Dec 15 02:57:13 garvin kernel: <3>Debug: sleeping function called from invalid context at include/linux/rwsem.h:43 Dec 15 02:57:13 garvin kernel: in_atomic():1, irqs_disabled():0 Dec 15 02:57:13 garvin kernel: [<c011ba33>] profile_task_exit+0x13/0x48 Dec 15 02:57:13 garvin kernel: [<c011d278>] do_exit+0x1b/0x3b8 Dec 15 02:57:13 garvin kernel: [<c0103827>] do_divide_error+0x0/0xa8 Dec 15 02:57:14 garvin kernel: [<c01039b7>] do_invalid_op+0x0/0xab Dec 15 02:57:14 garvin kernel: [<c0103a59>] do_invalid_op+0xa2/0xab Dec 15 02:57:14 garvin kernel: [<c014f97b>] page_remove_rmap+0x37/0x41 Dec 15 02:57:14 garvin kernel: [<c0148c69>] pte_alloc_map+0x29/0xab Dec 15 02:57:14 garvin kernel: [<c0148e3b>] copy_pte_range+0xe8/0x214 Dec 15 02:57:14 garvin kernel: [<c0103107>] error_code+0x4f/0x54 Dec 15 02:57:14 garvin kernel: [<c014007b>] __alloc_pages+0x14e/0x403 Dec 15 02:57:14 garvin kernel: [<c014f97b>] page_remove_rmap+0x37/0x41 Dec 15 02:57:14 garvin kernel: [<c0149137>] zap_pte_range+0xe5/0x1f5 Dec 15 02:57:14 garvin kernel: [<c01492ca>] unmap_page_range+0x83/0xb7 Dec 15 02:57:14 garvin kernel: [<c0149401>] unmap_vmas+0x103/0x222 Dec 15 02:57:14 garvin kernel: [<c014dc05>] exit_mmap+0x7c/0x14c Dec 15 02:57:14 garvin kernel: [<c01189a0>] mmput+0x1f/0x95 Dec 15 02:57:14 garvin kernel: [<c011d33d>] do_exit+0xe0/0x3b8 Dec 15 02:57:14 garvin kernel: [<c011d66a>] do_group_exit+0x29/0x90 Dec 15 02:57:14 garvin kernel: [<c0102edd>] syscall_call+0x7/0xb Dec 15 02:57:14 
garvin kernel: Fixing recursive fault but reboot is needed! Dec 15 02:57:14 garvin kernel: scheduling while atomic: udev/0x00000001/4008 Dec 15 02:57:14 garvin kernel: [<c030b8b4>] schedule+0x504/0x5bb Dec 15 02:57:14 garvin kernel: [<c0102edd>] syscall_call+0x7/0xb Dec 15 02:57:14 garvin kernel: [<c012b1a6>] __kernel_text_address+0x1c/0x27 Dec 15 02:57:14 garvin kernel: [<c0103329>] show_trace+0x2a/0x78 Dec 15 02:57:14 garvin kernel: [<c0102edd>] syscall_call+0x7/0xb Dec 15 02:57:14 garvin kernel: [<c011d599>] do_exit+0x33c/0x3b8 Dec 15 02:57:14 garvin kernel: [<c0103827>] do_divide_error+0x0/0xa8 Dec 15 02:57:14 garvin kernel: [<c01039b7>] do_invalid_op+0x0/0xab Dec 15 02:57:14 garvin kernel: [<c0103a59>] do_invalid_op+0xa2/0xab Dec 15 02:57:14 garvin kernel: [<c014f97b>] page_remove_rmap+0x37/0x41 Dec 15 02:57:14 garvin kernel: [<c0148c69>] pte_alloc_map+0x29/0xab Dec 15 02:57:14 garvin kernel: [<c0148e3b>] copy_pte_range+0xe8/0x214 Dec 15 02:57:15 garvin kernel: [<c0103107>] error_code+0x4f/0x54 Dec 15 02:57:15 garvin kernel: [<c014007b>] __alloc_pages+0x14e/0x403 Dec 15 02:57:15 garvin kernel: [<c014f97b>] page_remove_rmap+0x37/0x41 Dec 15 02:57:15 garvin kernel: [<c0149137>] zap_pte_range+0xe5/0x1f5 Dec 15 02:57:15 garvin kernel: [<c01492ca>] unmap_page_range+0x83/0xb7 Dec 15 02:57:15 garvin kernel: [<c0149401>] unmap_vmas+0x103/0x222 Dec 15 02:57:15 garvin kernel: [<c014dc05>] exit_mmap+0x7c/0x14c Dec 15 02:57:15 garvin kernel: [<c01189a0>] mmput+0x1f/0x95 Dec 15 02:57:15 garvin kernel: [<c011d33d>] do_exit+0xe0/0x3b8 Dec 15 02:57:15 garvin kernel: [<c011d66a>] do_group_exit+0x29/0x90 Dec 15 02:57:15 garvin kernel: [<c0102edd>] syscall_call+0x7/0xb Dec 15 04:40:04 garvin kernel: ------------[ cut here ]------------ Dec 15 04:40:04 garvin kernel: kernel BUG at mm/rmap.c:487! 
Dec 15 04:40:04 garvin kernel: invalid operand: 0000 [#2] Dec 15 04:40:04 garvin kernel: Modules linked in: loop parport_pc lp parport nfs lockd nfs_acl autofs4 sunrpc dm_mod ipv6 uhci_hcd i2c_piix4 i2c_core snd_es18xx snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss snd_pcm snd_page_alloc snd_opl3_lib snd_timer snd_hwdep snd_mpu401_uart snd_rawm idi snd_seq_device snd soundcore tlan floppy ext3 jbd aic7xxx scsi_transport_spi sd_mod scsi_mod Dec 15 04:40:04 garvin kernel: CPU: 0 Dec 15 04:40:04 garvin kernel: EIP: 0060:[<c014f97b>] Not tainted VLI Dec 15 04:40:04 garvin kernel: EFLAGS: 00010286 (2.6.14-1.1637_FC4) Dec 15 04:40:04 garvin kernel: EIP is at page_remove_rmap+0x37/0x41 Dec 15 04:40:04 garvin kernel: eax: ffffffff ebx: c82ffe30 ecx: 00000002 e dx: c111b8c0 Dec 15 04:40:04 garvin kernel: esi: c111b8c0 edi: 09b8c000 ebp: c03f7a7c e sp: c1843de8 Dec 15 04:40:04 garvin kernel: ds: 007b es: 007b ss: 0068 Dec 15 04:40:04 garvin kernel: Process udev (pid: 6933, threadinfo=c1843000 task =cd943570) Dec 15 04:40:04 garvin kernel: Stack: c0149137 00000000 09bec000 c03f7a7c c80430 98 09bec000 09bec000 09bebfff Dec 15 04:40:04 garvin kernel: c01492ca 09bec000 00000000 c03f7a7c 000c60 00 09bec000 cf329284 09bec000 Dec 15 04:40:04 garvin kernel: c0149401 09bec000 00000000 c1843000 cdb661 40 c1843e78 00298000 00000000 Dec 15 04:40:04 garvin kernel: Call Trace: Dec 15 04:40:04 garvin kernel: [<c0149137>] zap_pte_range+0xe5/0x1f5 Dec 15 04:40:04 garvin kernel: [<c01492ca>] unmap_page_range+0x83/0xb7 Dec 15 04:40:04 garvin kernel: [<c0149401>] unmap_vmas+0x103/0x222 Dec 15 04:40:04 garvin kernel: [<c014dc05>] exit_mmap+0x7c/0x14c Dec 15 04:40:04 garvin kernel: [<c01189a0>] mmput+0x1f/0x95 Dec 15 04:40:04 garvin kernel: [<c011d33d>] do_exit+0xe0/0x3b8 Dec 15 04:40:04 garvin kernel: [<c012444e>] __dequeue_signal+0xef/0x1b6 Dec 15 04:40:04 garvin kernel: [<c011d66a>] do_group_exit+0x29/0x90 Dec 15 04:40:04 garvin kernel: [<c0125f38>] 
get_signal_to_deliver+0x260/0x36d Dec 15 04:40:05 garvin kernel: [<c030d6c0>] do_page_fault+0x0/0x640 Dec 15 04:40:05 garvin kernel: [<c0102ced>] do_signal+0x4b/0x105 Dec 15 04:40:05 garvin kernel: [<c01616a1>] vfs_lstat+0x11/0x37 Dec 15 04:40:05 garvin kernel: [<c0148c69>] pte_alloc_map+0x29/0xab Dec 15 04:40:05 garvin kernel: [<c014ae5a>] __handle_mm_fault+0x14a/0x190 Dec 15 04:40:05 garvin kernel: [<c0126f96>] notifier_call_chain+0x17/0x27 Dec 15 04:40:05 garvin kernel: [<c030d9fd>] do_page_fault+0x33d/0x640 Dec 15 04:40:05 garvin kernel: [<c030d6c0>] do_page_fault+0x0/0x640 Dec 15 04:40:05 garvin kernel: [<c0102dce>] do_notify_resume+0x27/0x35 Dec 15 04:40:05 garvin kernel: [<c0102f6e>] work_notifysig+0x13/0x19 Dec 15 04:40:05 garvin kernel: Code: ff 0f 98 c0 84 c0 75 01 c3 8b 42 08 83 c0 0 1 90 78 19 ba ff ff ff ff b8 10 00 00 00 e9 43 0c ff ff 0f 0b e4 01 ad 4a 32 c0 eb d2 <0f> 0b e7 01 ad 4a 32 c0 eb dd 55 57 56 53 83 ec 04 89 c7 89 d3 Dec 15 04:40:05 garvin kernel: <3>Debug: sleeping function called from invalid context at include/linux/rwsem.h:43 Dec 15 04:40:05 garvin kernel: in_atomic():1, irqs_disabled():0 Dec 15 04:40:05 garvin kernel: [<c011ba33>] profile_task_exit+0x13/0x48 Dec 15 04:40:05 garvin kernel: [<c011d278>] do_exit+0x1b/0x3b8 Dec 15 04:40:05 garvin kernel: [<c0103827>] do_divide_error+0x0/0xa8 Dec 15 04:40:05 garvin kernel: [<c01039b7>] do_invalid_op+0x0/0xab Dec 15 04:40:05 garvin kernel: [<c0103a59>] do_invalid_op+0xa2/0xab Dec 15 04:40:05 garvin kernel: [<c014f97b>] page_remove_rmap+0x37/0x41 Dec 15 04:40:05 garvin kernel: [<c030cb5e>] _read_unlock_irq+0x5/0x7 Dec 15 04:40:05 garvin kernel: [<c013bd4d>] find_get_page+0x36/0x41 Dec 15 04:40:05 garvin kernel: [<c017039e>] alloc_inode+0xee/0x18c Dec 15 04:40:05 garvin kernel: [<c01045b8>] do_IRQ+0x51/0x82 Dec 15 04:40:05 garvin kernel: [<c0103107>] error_code+0x4f/0x54 Dec 15 04:40:05 garvin kernel: [<c014007b>] __alloc_pages+0x14e/0x403 Dec 15 04:40:05 garvin kernel: [<c014f97b>] 
page_remove_rmap+0x37/0x41 Dec 15 04:40:05 garvin kernel: [<c0149137>] zap_pte_range+0xe5/0x1f5 Dec 15 04:40:05 garvin kernel: [<c01492ca>] unmap_page_range+0x83/0xb7 Dec 15 04:40:05 garvin kernel: [<c0149401>] unmap_vmas+0x103/0x222 Dec 15 04:40:05 garvin kernel: [<c014dc05>] exit_mmap+0x7c/0x14c Dec 15 04:40:06 garvin kernel: [<c01189a0>] mmput+0x1f/0x95 Dec 15 04:40:06 garvin kernel: [<c011d33d>] do_exit+0xe0/0x3b8 Dec 15 04:40:06 garvin kernel: [<c012444e>] __dequeue_signal+0xef/0x1b6 Dec 15 04:40:06 garvin kernel: [<c011d66a>] do_group_exit+0x29/0x90 Dec 15 04:40:06 garvin kernel: [<c0125f38>] get_signal_to_deliver+0x260/0x36d Dec 15 04:40:06 garvin kernel: [<c030d6c0>] do_page_fault+0x0/0x640 Dec 15 04:40:06 garvin kernel: [<c0102ced>] do_signal+0x4b/0x105 Dec 15 04:40:06 garvin kernel: [<c01616a1>] vfs_lstat+0x11/0x37 Dec 15 04:40:06 garvin kernel: [<c0148c69>] pte_alloc_map+0x29/0xab Dec 15 04:40:06 garvin kernel: [<c014ae5a>] __handle_mm_fault+0x14a/0x190 Dec 15 04:40:06 garvin kernel: [<c0126f96>] notifier_call_chain+0x17/0x27 Dec 15 04:40:06 garvin kernel: [<c030d9fd>] do_page_fault+0x33d/0x640 Dec 15 04:40:06 garvin kernel: [<c030d6c0>] do_page_fault+0x0/0x640 Dec 15 04:40:06 garvin kernel: [<c0102dce>] do_notify_resume+0x27/0x35 Dec 15 04:40:06 garvin kernel: [<c0102f6e>] work_notifysig+0x13/0x19 Dec 15 04:40:06 garvin kernel: Fixing recursive fault but reboot is needed! 
Dec 15 04:40:06 garvin kernel: scheduling while atomic: udev/0x00000001/6933 Dec 15 04:40:06 garvin kernel: [<c030b8b4>] schedule+0x504/0x5bb Dec 15 04:40:06 garvin kernel: [<c0102f6e>] work_notifysig+0x13/0x19 Dec 15 04:40:06 garvin kernel: [<c012b1a6>] __kernel_text_address+0x1c/0x27 Dec 15 04:40:06 garvin kernel: [<c0103329>] show_trace+0x2a/0x78 Dec 15 04:40:06 garvin kernel: [<c0102f6e>] work_notifysig+0x13/0x19 Dec 15 04:40:06 garvin kernel: [<c011d599>] do_exit+0x33c/0x3b8 Dec 15 04:40:06 garvin kernel: [<c0103827>] do_divide_error+0x0/0xa8 Dec 15 04:40:06 garvin kernel: [<c01039b7>] do_invalid_op+0x0/0xab Dec 15 04:40:06 garvin kernel: [<c0103a59>] do_invalid_op+0xa2/0xab Dec 15 04:40:06 garvin kernel: [<c014f97b>] page_remove_rmap+0x37/0x41 Dec 15 04:40:06 garvin kernel: [<c030cb5e>] _read_unlock_irq+0x5/0x7 Dec 15 04:40:07 garvin kernel: [<c013bd4d>] find_get_page+0x36/0x41 Dec 15 04:40:07 garvin kernel: [<c017039e>] alloc_inode+0xee/0x18c Dec 15 04:40:07 garvin kernel: [<c01045b8>] do_IRQ+0x51/0x82 Dec 15 04:40:07 garvin kernel: [<c0103107>] error_code+0x4f/0x54 Dec 15 04:40:07 garvin kernel: [<c014007b>] __alloc_pages+0x14e/0x403 Dec 15 04:40:07 garvin kernel: [<c014f97b>] page_remove_rmap+0x37/0x41 Dec 15 04:40:07 garvin kernel: [<c0149137>] zap_pte_range+0xe5/0x1f5 Dec 15 04:40:07 garvin kernel: [<c01492ca>] unmap_page_range+0x83/0xb7 Dec 15 04:40:07 garvin kernel: [<c0149401>] unmap_vmas+0x103/0x222 Dec 15 04:40:07 garvin kernel: [<c014dc05>] exit_mmap+0x7c/0x14c Dec 15 04:40:07 garvin kernel: [<c01189a0>] mmput+0x1f/0x95 Dec 15 04:40:07 garvin kernel: [<c011d33d>] do_exit+0xe0/0x3b8 Dec 15 04:40:07 garvin kernel: [<c012444e>] __dequeue_signal+0xef/0x1b6 Dec 15 04:40:07 garvin kernel: [<c011d66a>] do_group_exit+0x29/0x90 Dec 15 04:40:07 garvin kernel: [<c0125f38>] get_signal_to_deliver+0x260/0x36d Dec 15 04:40:07 garvin kernel: [<c030d6c0>] do_page_fault+0x0/0x640 Dec 15 04:40:07 garvin kernel: [<c0102ced>] do_signal+0x4b/0x105 Dec 15 04:40:07 garvin 
kernel: [<c01616a1>] vfs_lstat+0x11/0x37
Dec 15 04:40:07 garvin kernel: [<c0148c69>] pte_alloc_map+0x29/0xab
Dec 15 04:40:07 garvin kernel: [<c014ae5a>] __handle_mm_fault+0x14a/0x190
Dec 15 04:40:07 garvin kernel: [<c0126f96>] notifier_call_chain+0x17/0x27
Dec 15 04:40:07 garvin kernel: [<c030d9fd>] do_page_fault+0x33d/0x640
Dec 15 04:40:07 garvin kernel: [<c030d6c0>] do_page_fault+0x0/0x640
Dec 15 04:40:07 garvin kernel: [<c0102dce>] do_notify_resume+0x27/0x35
Dec 15 04:40:07 garvin kernel: [<c0102f6e>] work_notifysig+0x13/0x19

*** Bug 175924 has been marked as a duplicate of this bug. ***

Can you try the test kernels at http://people.redhat.com/davej/kernels/Fedora/FC4 ? There's recently been quite a bit of churn upstream in this area, and it'll be very interesting to know how that behaves.

I can report that kernel-2.6.14-1.1644_FC4 exhibits the same problem as previously reported. I also tried kernel-2.6.14-1.1769_FC4, and it worked as expected for some time. Then I did a 'find / -name "*.mp3"' and a few minutes later everything froze. The machine continued to answer ping, but did not respond to either already logged-in sessions or the console. And there were no indications in /var/log/messages as to what happened.

Created attachment 122685 [details]
/var/log/messages from the machine in trouble
During Christmas (with very little activity) it has happened again. This time it has
logged various kernel bugs, bad page states and other oopses.
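
For readers triaging a log like the attached one, the failure signatures quoted in this report can be counted mechanically. The following is a minimal Python sketch and not part of the original report: the function names (`count_signatures`, `parse_frames`) and the sample lines are illustrative assumptions, and the patterns cover only message formats that appear verbatim in this bug.

```python
import re
import sys
from collections import Counter

# Patterns taken from messages quoted in this bug report (assumed set, not exhaustive).
SIGNATURES = {
    "bad_page": re.compile(r"Bad page state"),
    "kernel_bug": re.compile(r"kernel BUG at (\S+?):(\d+)!"),
    "sched_atomic": re.compile(r"scheduling while atomic"),
    "recursive_fault": re.compile(r"Fixing recursive fault"),
}

# A backtrace frame looks like: [<c013f38d>] bad_page+0x8c/0xc3
# i.e. [<address>] symbol+0xoffset/0xsize, all numbers in hex.
FRAME_RE = re.compile(r"\[<([0-9a-f]{8})>\]\s+(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)")


def count_signatures(lines):
    """Count how many lines match each known failure signature."""
    counts = Counter()
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def parse_frames(line):
    """Return (address, symbol, offset, size) for every frame on a line."""
    return [
        (int(addr, 16), symbol, int(offset, 16), int(size, 16))
        for addr, symbol, offset, size in FRAME_RE.findall(line)
    ]


# Sample lines copied from the excerpts in this report.
SAMPLE = [
    "Dec 16 10:03:44 garvin kernel: Bad page state at free_hot_cold_page (in process 'kswapd0', page c115c580)",
    "Dec 16 10:03:44 garvin kernel: [<c013fbf7>] free_hot_cold_page+0x47/0xca",
    "Dec 15 02:57:13 garvin kernel: kernel BUG at mm/rmap.c:487!",
    "Dec 15 02:57:14 garvin kernel: scheduling while atomic: udev/0x00000001/4008",
]

if __name__ == "__main__":
    # With a path argument, scan that file; otherwise use the built-in sample.
    if len(sys.argv) > 1:
        with open(sys.argv[1], errors="replace") as handle:
            lines = handle.readlines()
    else:
        lines = SAMPLE
    for name, count in sorted(count_signatures(lines).items()):
        print(f"{name}: {count}")
```

Run against a saved copy of the log, a per-signature count makes it easier to see whether a given kernel produced the `rmap.c` BUG, the bad page state message, or both.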
This is a mass-update to all currently open kernel bugs. A new kernel update has been released (Version: 2.6.15-1.1830_FC4) based upon a new upstream kernel release. Please retest against this new kernel, as a large number of patches go into each upstream release, possibly including changes that may address this problem. This bug has been placed in NEEDINFO_REPORTER state. Due to the large volume of inactive bugs in bugzilla, if this bug is still in this state in two weeks time, it will be closed. Should this bug still be relevant after this period, the reporter can reopen the bug at any time. Any other users on the Cc: list of this bug can request that the bug be reopened by adding a comment to the bug. If this bug is a problem preventing you from installing the release this version is filed against, please see bug 169613. Thank you.

It happened again. The printouts were a bit different, but it still appears to be the same (or a related) problem.

[root@garvin log]# uptime
09:18:09 up 1 day, 29 min, 4 users, load average: 1.06, 1.10, 0.76
[root@garvin log]# uname -a
Linux garvin 2.6.15-1.1830_FC4 #1 Thu Feb 2 17:23:41 EST 2006 i686 i686 i386 GNU/Linux

/var/log/messages:
Feb 7 09:04:13 garvin kernel: Eeek! page_mapcount(page) went negative! (-1)
Feb 7 09:04:13 garvin kernel: page->flags = 80000864
Feb 7 09:04:13 garvin kernel: page->count = 2
Feb 7 09:04:13 garvin kernel: page->mapping = cfebb194
Feb 7 09:04:13 garvin kernel: ------------[ cut here ]------------
Feb 7 09:04:13 garvin kernel: kernel BUG at mm/rmap.c:493!
Feb 7 09:04:13 garvin kernel: invalid operand: 0000 [#1] Feb 7 09:04:13 garvin kernel: last sysfs file: /class/vc/vcs8/dev Feb 7 09:04:13 garvin kernel: Modules linked in: parport_pc lp parport nfs lockd nfs_acl autofs4 sunrpc dm_mod ipv6 uhci_hcd i2c_piix4 i2c_core snd_es18xx snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss snd_pcm snd_page_alloc snd_opl3_lib snd_timer snd_hwdep snd_mpu401_uart snd_ rawmidi snd_seq_device snd soundcore tlan floppy ext3 jbd aic7xxx scsi_transport _spi sd_mod scsi_mod Feb 7 09:04:13 garvin kernel: CPU: 0 Feb 7 09:04:13 garvin kernel: EIP: 0060:[<c0151d57>] Not tainted VLI Feb 7 09:04:13 garvin kernel: EFLAGS: 00010286 (2.6.15-1.1830_FC4) Feb 7 09:04:13 garvin kernel: EIP is at page_remove_rmap+0x9a/0xa8 Feb 7 09:04:14 garvin kernel: eax: ffffffff ebx: c115daa0 ecx: ffffffff edx: 00000000 Feb 7 09:04:14 garvin kernel: esi: 00b8c000 edi: c115daa0 ebp: 00000020 esp: c9851ea4 Feb 7 09:04:14 garvin kernel: ds: 007b es: 007b ss: 0068 Feb 7 09:04:14 garvin kernel: Process udev (pid: 24088, threadinfo=c9851000 task=cc5fdab0) Feb 7 09:04:14 garvin kernel: Stack: c0332b94 cfebb194 c856fe30 c014b4c0 c4cc6c 24 c041dba0 c52a5140 fffffffc Feb 7 09:04:14 garvin kernel: 00000000 c52a5190 c81ba008 00ba3000 c9851f 34 c81ba008 c014b6fb 00b89000 Feb 7 09:04:14 garvin kernel: 00ba3000 c9851f34 00000000 c4cc6c24 c041 dba0 c81ba008 00ba2fff 00b89000 Feb 7 09:04:14 garvin kernel: Call Trace: Feb 7 09:04:14 garvin kernel: [<c014b4c0>] zap_pte_range+0x105/0x25a [<c 014b6fb>] unmap_page_range+0xe6/0x110 Feb 7 09:04:14 garvin kernel: [<c014b7f7>] unmap_vmas+0xd2/0x1f1 [<c 0150022>] exit_mmap+0x5f/0xda Feb 7 09:04:14 garvin kernel: [<c011ad09>] mmput+0x1f/0x95 [<c011f647>] do _exit+0xfc/0x3cf Feb 7 09:04:14 garvin kernel: [<c011f96f>] do_group_exit+0x29/0x90 [<c0102 e75>] syscall_call+0x7/0xb Feb 7 09:04:14 garvin kernel: Code: 01 89 44 24 04 c7 04 24 7d 2b 33 c0 e8 b2 b 7 fc ff 8b 43 10 89 44 24 04 c7 04 24 94 2b 33 c0 e8 9f b7 fc 
ff eb 84 8b 53 0c eb d0 <0f> 0b ed 01 52 2b 33 c0 90 e9 79 ff ff ff 55 57 56 53 83 ec 0c Feb 7 09:04:14 garvin kernel: Continuing in 120 seconds. ^MContinuing in 119 seconds. ^MContinuing in 118 seconds. ^MContinuing in 117 seconds. ^MContinuing in 116 seconds. ^MContinuing in 115 seconds. ^MContinuing in 114 seconds. ^ MContinuing in 113 seconds. ^MContinuing in 112 seconds. ^MContinuing in 111 seconds. ^MContinuing in 110 seconds. ^MContinuing in 109 seconds. ^MContinuing in 108 seconds. ^MContinuing in 107 seconds. ^MContinuing in 106 seconds. ^ MContinuing in 105 seconds. ^MContinuing in 104 seconds. ^MContinuing in 103 seconds. ^MContinuing in 102 seconds. ^MContinuing in 101 seconds. ^MContinuing in 100 seconds. ^MContinuing in 99 seconds. ^MContinuing in 98 seconds. ^ MContinuing in 97 seconds. ^MContinuing in 96 seconds. ^MContinuing in 95 seconds. ^MContinuing in 94 seconds. ^MContinuing in 93 seconds. ^MContinuing in 92 seconds. ^MContinuing in 91 seconds. ^MContinuing in 90 seconds. ^ MContinuing in 89 seconds. ^MContinuing in 88 seconds. ^MContinuing in 87 seconds. ^MContinuing in 86 seconds. ^MContinuing in 85 seconds. ^MCo Feb 7 09:04:14 garvin kernel: tinuing in 84 seconds. ^MContinuing in 83 seconds. ^MContinuing in 82 seconds. ^MContinuing in 81 seconds. ^MContinuing in 80 seconds. ^MContinuing in 79 seconds. ^MContinuing in 78 seconds. ^ MContinuing in 77 seconds. ^MContinuing in 76 seconds. ^MContinuing in 75 seconds. ^MContinuing in 74 seconds. ^MContinuing in 73 seconds. ^MContinuing in 72 seconds. ^MContinuing in 71 seconds. ^MContinuing in 70 seconds. ^ MContinuing in 69 seconds. ^MContinuing in 68 seconds. ^MContinuing in 67 seconds. ^MContinuing in 66 seconds. ^MContinuing in 65 seconds. ^MContinuing in 64 seconds. ^MContinuing in 63 seconds. ^M^MContinuing in 62 seconds. ^ MContinuing in 61 seconds. ^MContinuing in 60 seconds. ^MContinuing in 59 seconds. ^MContinuing in 58 seconds. ^MContinuing in 57 seconds. ^MContinuing in 56 seconds. 
^MContinuing in 55 seconds. ^MContinuing in 54 seconds. ^ MContinuing in 53 seconds. ^MContinuing in 52 seconds. ^MContinuing in 51 seconds. ^MContinuing in 50 seconds. ^MContinuing in 49 seconds. ^MContinuing in 48 seconds. Feb 7 09:04:14 garvin kernel: tinuing in 47 seconds. ^MContinuing in 46 seconds. ^MContinuing in 45 seconds. ^MContinuing in 44 seconds. ^MContinuing in 43 seconds. ^MContinuing in 42 seconds. ^MContinuing in 41 seconds. ^ MContinuing in 40 seconds. ^MContinuing in 39 seconds. ^MContinuing in 38 seconds. ^MContinuing in 37 seconds. ^MContinuing in 36 seconds. ^MContinuing in 35 seconds. ^MContinuing in 34 seconds. ^MContinuing in 33 seconds. ^ MContinuing in 32 seconds. ^MContinuing in 31 seconds. ^MContinuing in 30 seconds. ^MContinuing in 29 seconds. ^MContinuing in 28 seconds. ^MContinuing in 27 seconds. ^MContinuing in 26 seconds. ^M^MContinuing in 25 seconds. ^ MContinuing in 24 seconds. ^MContinuing in 23 seconds. ^MContinuing in 22 seconds. ^MContinuing in 21 seconds. ^MContinuing in 20 seconds. ^MContinuing in 19 seconds. ^MContinuing in 18 seconds. ^MContinuing in 17 seconds. ^ MContinuing in 16 seconds. ^MContinuing in 15 seconds. ^MContinuing in 14 seconds. ^MContinuing in 13 seconds. ^MContinuing in 12 seconds. ^MContinuing in 11 seconds. Feb 7 09:04:14 garvin kernel: tinuing in 10 seconds. ^MContinuing in 9 seconds. ^MContinuing in 8 seconds. ^MContinuing in 7 seconds. ^MContinuing in 6 seconds. ^MContinuing in 5 seconds. ^MContinuing in 4 seconds. ^MContinuing in 3 seconds. ^MContinuing in 2 seconds. ^MContinuing in 1 seconds. 
Feb 7 09:04:14 garvin kernel: <3>Debug: sleeping function called from invalid context at include/linux/rwsem.h:43 Feb 7 09:04:14 garvin kernel: in_atomic():1, irqs_disabled():0 Feb 7 09:04:14 garvin kernel: [<c011dda3>] profile_task_exit+0x13/0x43 [<c 011f566>] do_exit+0x1b/0x3cf Feb 7 09:04:14 garvin kernel: [<c01041ab>] do_divide_error+0x0/0xa8 [<c 010433b>] do_invalid_op+0x0/0xab Feb 7 09:04:14 garvin kernel: [<c01043dd>] do_invalid_op+0xa2/0xab [<c0151 d57>] page_remove_rmap+0x9a/0xa8 Feb 7 09:04:14 garvin kernel: [<c011d2ff>] call_console_drivers+0x80/0x14c [<c011d8ac>] release_console_sem+0x77/0xb4 Feb 7 09:04:14 garvin kernel: [<c011d6f5>] vprintk+0x1e7/0x2a9 [<c014c569 >] do_wp_page+0x204/0x311 Feb 7 09:04:14 garvin kernel: [<c01039a7>] error_code+0x4f/0x54 [<c0151d57 >] page_remove_rmap+0x9a/0xa8 Feb 7 09:04:15 garvin kernel: [<c014b4c0>] zap_pte_range+0x105/0x25a [<c 014b6fb>] unmap_page_range+0xe6/0x110 Feb 7 09:04:15 garvin kernel: [<c014b7f7>] unmap_vmas+0xd2/0x1f1 [<c 0150022>] exit_mmap+0x5f/0xda Feb 7 09:04:15 garvin kernel: [<c011ad09>] mmput+0x1f/0x95 [<c011f647>] do _exit+0xfc/0x3cf Feb 7 09:04:15 garvin kernel: [<c011f96f>] do_group_exit+0x29/0x90 [<c0102 e75>] syscall_call+0x7/0xb Feb 7 09:04:15 garvin kernel: Fixing recursive fault but reboot is needed! 
Feb 7 09:04:15 garvin kernel: scheduling while atomic: udev/0x00000001/24088
Feb 7 09:04:15 garvin kernel: [<c03159d4>] schedule+0x504/0x5bb [<c012d3a6>] __kernel_text_address+0x1c/0x27
Feb 7 09:04:15 garvin kernel: [<c0103bcc>] show_trace+0x2d/0xb5 [<c0102e75>] syscall_call+0x7/0xb
Feb 7 09:04:15 garvin kernel: [<c011f89e>] do_exit+0x353/0x3cf [<c01041ab>] do_divide_error+0x0/0xa8
Feb 7 09:04:15 garvin kernel: [<c010433b>] do_invalid_op+0x0/0xab [<c01043dd>] do_invalid_op+0xa2/0xab
Feb 7 09:04:15 garvin kernel: [<c0151d57>] page_remove_rmap+0x9a/0xa8 [<c011d2ff>] call_console_drivers+0x80/0x14c
Feb 7 09:04:15 garvin kernel: [<c011d8ac>] release_console_sem+0x77/0xb4 [<c011d6f5>] vprintk+0x1e7/0x2a9
Feb 7 09:04:15 garvin kernel: [<c014c569>] do_wp_page+0x204/0x311 [<c01039a7>] error_code+0x4f/0x54
Feb 7 09:04:15 garvin kernel: [<c0151d57>] page_remove_rmap+0x9a/0xa8 [<c014b4c0>] zap_pte_range+0x105/0x25a
Feb 7 09:04:15 garvin kernel: [<c014b6fb>] unmap_page_range+0xe6/0x110 [<c014b7f7>] unmap_vmas+0xd2/0x1f1
Feb 7 09:04:15 garvin kernel: [<c0150022>] exit_mmap+0x5f/0xda [<c011ad09>] mmput+0x1f/0x95
Feb 7 09:04:15 garvin kernel: [<c011f647>] do_exit+0xfc/0x3cf [<c011f96f>] do_group_exit+0x29/0x90
Feb 7 09:04:15 garvin kernel: [<c0102e75>] syscall_call+0x7/0xb

Created attachment 124444 [details]
/var/log/messages
This happens quite regularly now. I have attached a copy of /var/log/messages.
Also note that udev is behaving unexpectedly:
top - 18:27:37 up 3 days, 9:39, 4 users, load average: 7.34, 7.61, 7.26
Tasks: 68 total, 1 running, 67 sleeping, 0 stopped, 0 zombie
Cpu(s): 45.4% us, 5.2% sy, 0.0% ni, 49.3% id, 0.0% wa, 0.0% hi, 0.0% si
Mem: 255940k total, 250736k used, 5204k free, 8408k buffers
Swap: 522104k total, 104k used, 522000k free, 151500k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
465 root 13 -4 45104 42m 336 S 42.2 17.1 877:49.38 udevd
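
To put the udevd figures in the `top` output above into perspective, its accumulated CPU time can be compared with the machine's uptime. This is a small worked calculation in Python, added as an illustration only. It assumes udevd has been running since boot and that the machine has a single CPU (the oops output shows only `CPU: 0`); the helper names are hypothetical.

```python
def parse_top_time(field):
    """Convert top's TIME+ field ('minutes:seconds.hundredths') to seconds."""
    minutes, seconds = field.split(":")
    return int(minutes) * 60 + float(seconds)


def uptime_seconds(days, hours, minutes):
    """Convert an uptime of days, hours and minutes to seconds."""
    return (days * 24 + hours) * 3600 + minutes * 60


# Values read from the top output above: TIME+ 877:49.38, up 3 days, 9:39.
cpu = parse_top_time("877:49.38")
up = uptime_seconds(3, 9, 39)
share = cpu / up

print(f"udevd CPU time: {cpu:.0f} s of {up} s uptime ({share:.1%})")
```

Under those assumptions udevd has consumed roughly 18% of all CPU time since boot, which is consistent with the reporter's remark that it is misbehaving.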
[This comment added as part of a mass-update to all open FC4 kernel bugs] FC4 has now transitioned to the Fedora legacy project, which will continue to release security related updates for the kernel. As this bug is not security related, it is unlikely to be fixed in an update for FC4, and has been migrated to FC5. Please retest with Fedora Core 5. Thank you.

A new kernel update has been released (Version: 2.6.18-1.2200.fc5) based upon a new upstream kernel release. Please retest against this new kernel, as a large number of patches go into each upstream release, possibly including changes that may address this problem. This bug has been placed in NEEDINFO state. Due to the large volume of inactive bugs in bugzilla, if this bug is still in this state in two weeks time, it will be closed. Should this bug still be relevant after this period, the reporter can reopen the bug at any time. Any other users on the Cc: list of this bug can request that the bug be reopened by adding a comment to the bug. In the last few updates, some users upgrading from FC4->FC5 have reported that installing a kernel update has left their systems unbootable. If you have been affected by this problem please check you only have one version of device-mapper & lvm2 installed. See bug 207474 for further details. If this bug is a problem preventing you from installing the release this version is filed against, please see bug 169613. If this bug has been fixed, but you are now experiencing a different problem, please file a separate bug for the new problem. Thank you.

Never had this happen before, until today. I had recently updated this box to the newest FC5 kernel 2.6.20-1.2300.fc5smp. Within 24 hours I get the below and system crash/hang. If it happens again, will switch back to 2288, which did not do this (at least not in the few weeks it was out). Oddities about this system: Arco IDE DupliDisk1 hardware RAID 1. Everything else is pretty standard.

Message from syslogd@firewall at Fri Mar 16 04:35:37 2007 ...
firewall kernel: Bad page state in process 'cat'
Message from syslogd@firewall at Fri Mar 16 04:35:37 2007 ...
firewall kernel: page:c1307ff0 flags:0x40000000 mapping:c1af05c8 mapcount:0 count:0 (Not tainted)
Message from syslogd@firewall at Fri Mar 16 04:35:37 2007 ...
firewall kernel: Trying to fix it up, but a reboot is needed

Since upgrading to the FC5 kernels 2.6.20-1.2300.fc5smp and 2.6.20-1.2307.fc5smp my system periodically hangs, requiring a hardware reset. I find the following messages in /var/log/messages:

Mar 21 14:29:30 des119 kernel: Bad page state in process 'grep'
Mar 22 15:28:44 des119 kernel: Bad page state in process 'grep'
Mar 22 15:28:44 des119 kernel: page:c1307ff0 flags:0x40000000 mapping:f7ec35c8 mapcount:0 count:0 (Tainted: PF )
Mar 22 15:28:44 des119 kernel: Trying to fix it up, but a reboot is needed
Mar 24 10:58:08 des119 kernel: Bad page state in process 'apt-get'
Mar 26 12:42:59 des119 kernel: Bad page state in process 'apt-cache'

Hamish, you got this behaviour for sure in 2307? The box this happened on for me is back to 2288 and 100% stable. 2300 crashed 3 times before I gave up. I was hoping 2307 solved it. 2.6.20.* seems very buggy so far (this and other problems).

I believe I have had this same problem with 2.6.19-1.2288.fc5 and 2.6.20-1.2312.fc5 but NOT with 2.6.19-1.2288.2.4.fc5. The machine never hung during the month 2.6.19-1.2288.2.4.fc5 was booted, but hung frequently while running 2.6.19-1.2288.fc5 or 2.6.20-1.2312.fc5. Most of the time it did not log anything useful, but last night it finally gave me "kernel: Bad page state in process 'apple2'" when it hung running 2.6.20-1.2312.fc5.

I started seeing this problem with 2.6.20-1.2316.fc5smp. I had forgotten to turn on yum nightly updates, so I went directly from 2.6.19-1.2288.2.4.fc5smp to 2.6.20-1.2316.fc5smp. I have reverted to 2.6.19-1.2288.2.4.fc5smp, which seems to have been stable for me.
I should add that based on the thread above, I would hazard a guess that we're looking at a new bug, not the one against which this ticket was originally opened.

(this is a mass-close to kernel bugs in NEEDINFO state) As indicated previously there has been no update on the progress of this bug therefore I am closing it as INSUFFICIENT_DATA. Please re-open if the issue still occurs for you and I will try to assist in its resolution. Thank you for taking the time to report the initial bug. If you believe that this bug was closed in error, please feel free to reopen this bug.