Description of problem: Kernel BUG at include/linux/list.h:180

Version-Release number of selected component (if applicable): 2.6.16-1.2133_FC5

How reproducible: It has occurred 3 times in the last month. All 3 times it has happened in the middle of the night. The header list.h only has 167 lines.

Steps to Reproduce:
1. Not sure, but it may be related to MythTV recordings and/or how full the disks are.
2.
3.

Actual results:
Jun 17 01:06:33 redwood kernel: List corruption. prev->next should be ffff810035c8d000, but was ffff8100171a1814
Jun 17 01:06:33 redwood kernel: ----------- [cut here ] --------- [please bite here ] ---------
Jun 17 01:06:33 redwood kernel: Kernel BUG at include/linux/list.h:180
Jun 17 01:06:33 redwood kernel: invalid opcode: 0000 [1] SMP
Jun 17 01:06:33 redwood kernel: last sysfs file: /devices/system/cpu/cpu0/cpufreq/scaling_setspeed
Jun 17 01:06:33 redwood kernel: CPU 0
Jun 17 01:06:33 redwood kernel: Modules linked in: nfs lockd nfs_acl autofs4 sunrpc video button battery ac sg ipv6 lp parport_pc parport floppy nvram usblp ehci_hcd usb_storage ohci_hcd snd_emu10k1_synth snd_emux_synth snd_seq_virmidi snd_seq_midi_emul snd_emu10k1 bt878 tuner snd_intel8x0 snd_rawmidi bttv snd_ac97_codec snd_ac97_bus snd_seq_dummy video_buf snd_seq_oss nvidia(U) compat_ioctl32 i2c_algo_bit snd_seq_midi_event snd_seq snd_pcm_oss v4l2_common btcx_risc ir_common tveeprom videodev emu10k1_gp gameport snd_mixer_oss i2c_nforce2 i2c_core snd_pcm snd_seq_device snd_util_mem snd_hwdep forcedeth snd_timer snd soundcore snd_page_alloc dm_snapshot dm_zero dm_mirror dm_mod ext3 jbd sata_nv libata sd_mod scsi_mod
Jun 17 01:06:33 redwood kernel: Pid: 202, comm: kswapd0 Tainted: P 2.6.16-1.2133_FC5 #1
Jun 17 01:06:33 redwood kernel: RIP: 0010:[<ffffffff8017c39d>] <ffffffff8017c39d>{free_block+135}
Jun 17 01:06:33 redwood kernel: RSP: 0018:ffff81000235fb68 EFLAGS: 00010086
Jun 17 01:06:33 redwood kernel: RAX: 0000000000000054 RBX: ffff810035c8d000 RCX: 000000000000aa52
Jun 17 01:06:33 redwood kernel: RDX: 0000000000000000 RSI: 0000000000000096 RDI: ffffffff803c89e0
Jun 17 01:06:33 redwood kernel: RBP: ffff810037ff0f40 R08: ffffffff803c89f8 R09: ffff81003ea87480
Jun 17 01:06:33 redwood kernel: R10: 0000000000000010 R11: 0000000000000000 R12: ffff810035c8dad8
Jun 17 01:06:34 redwood kernel: R13: 0000000000000000 R14: ffff81003ffd9400 R15: ffff81003ffdacd0
Jun 17 01:06:34 redwood kernel: FS: 0000000046e0c940(0000) GS:ffffffff80514000(0000) knlGS:00000000f7fd68e0
Jun 17 01:06:34 redwood kernel: CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Jun 17 01:06:34 redwood kernel: CR2: 00002aaab0c8a000 CR3: 000000002e2b0000 CR4: 00000000000006e0
Jun 17 01:06:34 redwood kernel: Process kswapd0 (pid: 202, threadinfo ffff81000235e000, task ffff810002241820)
Jun 17 01:06:34 redwood kernel: Stack: ffff8100032ac090 000000000000002f 000000150000003c ffff810037ff0f40
Jun 17 01:06:34 redwood kernel: 000000000000003c ffff81003ffdac00 0000000000000000 ffff81003ffd9400
Jun 17 01:06:34 redwood kernel: ffff810037ff0f90 ffffffff8017c0e9

Expected results:

Additional info:
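For readers unfamiliar with the "List corruption" message in the trace above: it reads like a sanity check run just before a doubly linked list entry is unlinked, comparing each neighbour's back-pointer with the entry itself. The following is only a userspace sketch of that idea, not the kernel's code. `list_del_checked` is a hypothetical name, and it returns false where the kernel would call BUG().

```c
#include <stdbool.h>
#include <stdio.h>

/* Minimal circular doubly linked list node, modelled on the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

/* Make head an empty list that points at itself. */
static void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert entry directly after head. */
static void list_add(struct list_head *entry, struct list_head *head)
{
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

/*
 * Hypothetical helper: unlink entry only if both neighbours still point
 * back at it. On a mismatch it prints a message in the style of the trace
 * and returns false without touching the list.
 */
static bool list_del_checked(struct list_head *entry)
{
    if (entry->prev->next != entry) {
        fprintf(stderr, "List corruption. prev->next should be %p, but was %p\n",
                (void *)entry, (void *)entry->prev->next);
        return false;
    }
    if (entry->next->prev != entry) {
        fprintf(stderr, "List corruption. next->prev should be %p, but was %p\n",
                (void *)entry, (void *)entry->next->prev);
        return false;
    }
    entry->next->prev = entry->prev;
    entry->prev->next = entry->next;
    entry->next = NULL;
    entry->prev = NULL;
    return true;
}
```

In this sketch a stray write that overwrites a neighbour's pointer is caught at unlink time, which may be long after the corrupting write happened. That would be consistent with the faulting task (kswapd0 here) not necessarily being the culprit.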
*** This bug has been marked as a duplicate of 73733 ***
Since I removed the nvidia kernel module and I have upgraded the kernel to 2.6.17-1.2139_FC5 we have a different issue. Here is the latest kernel trace. Jul 2 04:03:12 redwood kernel: List corruption. next->prev should be ffff81000cb10000, but was 0556401754cc3938 Jul 2 04:03:12 redwood kernel: ----------- [cut here ] --------- [please bite here ] --------- Jul 2 04:03:12 redwood kernel: Kernel BUG at include/linux/list.h:185 Jul 2 04:03:12 redwood kernel: invalid opcode: 0000 [1] SMP Jul 2 04:03:12 redwood kernel: last sysfs file: /devices/system/cpu/cpu0/cpufreq/scaling_setspeed Jul 2 04:03:12 redwood kernel: CPU 0 Jul 2 04:03:12 redwood kernel: Modules linked in: nfs lockd nfs_acl ipv6 autofs4 sunrpc sg video button battery acpi_memhotplug ac lp parport_pc parport usblp usb_storage ohci_hcd ehci_hcd floppy snd_intel8x0 snd_emu10k1_synth bt878 snd_emux_synth tuner snd_seq_virmidi snd_seq_midi_emul bttv snd_emu10k1 video_buf ir_common compat_ioctl32 emu10k1_gp gameport i2c_algo_bit v4l2_common btcx_risc tveeprom videodev snd_rawmidi snd_ac97_codec snd_ac97_bus snd_util_mem snd_seq_dummy snd_hwdep snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm forcedeth i2c_nforce2 snd_timer snd soundcore snd_page_alloc i2c_core dm_snapshot dm_zero dm_mirror dm_mod ext3 jbd sata_nv libata sd_mod scsi_mod Jul 2 04:03:12 redwood kernel: Pid: 31471, comm: beagle-build-in Not tainted 2.6.17-1.2139_FC5 #1 Jul 2 04:03:12 redwood kernel: RIP: 0010:[<ffffffff80261fd7>] <ffffffff80261fd7>{cache_alloc_refill+337} Jul 2 04:03:12 redwood kernel: RSP: 0018:ffff81000dff3b08 EFLAGS: 00010086 Jul 2 04:03:12 redwood kernel: RAX: 0000000000000054 RBX: 0000000000000027 RCX: ffffffff80548a98 Jul 2 04:03:12 redwood kernel: RDX: 0000000000000000 RSI: 0000000000000096 RDI: ffffffff80548a80 Jul 2 04:03:12 redwood kernel: RBP: ffff81000cb10000 R08: ffffffff80548a98 R09: ffff81000dff3858 Jul 2 04:03:12 redwood kernel: R10: 0000000000000010 R11: 0000000000000000 R12: 
ffff81003efc6e40 Jul 2 04:03:12 redwood kernel: R13: ffff8100011bd000 R14: ffff81003efc6e50 R15: 0000000000000015 Jul 2 04:03:12 redwood kernel: FS: 0000000040485940(0063) GS:ffffffff8069c000(0000) knlGS:00000000f7fd68e0 Jul 2 04:03:12 redwood kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b Jul 2 04:03:12 redwood kernel: CR2: 00002aaaaaaac000 CR3: 00000000366fa000 CR4: 00000000000006e0 Jul 2 04:03:12 redwood kernel: Process beagle-build-in (pid: 31471, threadinfo ffff81000dff2000, task ffff81003105e100) Jul 2 04:03:12 redwood kernel: Stack: 000000d03e0ab4e0 ffff81003efc9500 ffff81003efc6e80 00000000000000d0 Jul 2 04:03:12 redwood kernel: ffff81003efc9500 0000000000000246 ffff810019221830 ffff81003e04f400 Jul 2 04:03:12 redwood kernel: ffff810001170a28 ffffffff8020a35f Jul 2 04:03:12 redwood kernel: Call Trace: <ffffffff8020a35f>{kmem_cache_alloc+127} Jul 2 04:03:12 redwood kernel: <ffffffff80313c50>{selinux_inode_alloc_security+44} Jul 2 04:03:12 redwood kernel: <ffffffff8022692a>{alloc_inode+257} <ffffffff8022426c>{iget_locked+114} Jul 2 04:03:12 redwood kernel: <ffffffff8809077c>{:ext3:ext3_lookup+81} <ffffffff8020ca3f>{do_lookup+206} Jul 2 04:03:12 redwood kernel: <ffffffff802098b1>{__link_path_walk+2590} <ffffffff8020e6a7>{link_path_walk+92} Jul 2 04:03:12 redwood kernel: <ffffffff803170af>{selinux_inode_getattr+80} <ffffffff8020c7fe>{do_path_lookup+633} Jul 2 04:03:12 redwood kernel: <ffffffff802249c9>{__user_walk_fd+55} <ffffffff8022988c>{vfs_stat_fd+27} Jul 2 04:03:12 redwood kernel: <ffffffff802246b3>{sys_newstat+25} <ffffffff80262d8e>{system_call+126} Jul 2 04:03:12 redwood kernel: Jul 2 04:03:12 redwood kernel: Code: 0f 0b 68 73 8e 47 80 c2 b9 00 48 8b 55 00 48 8b 45 08 48 89 Jul 2 04:03:12 redwood kernel: RIP <ffffffff80261fd7>{cache_alloc_refill+337} RSP <ffff81000dff3b08> Jul 2 04:03:12 redwood kernel: <3>BUG: sleeping function called from invalid context at include/linux/rwsem.h:43 Jul 2 04:03:12 redwood kernel: in_atomic():0, 
irqs_disabled():1 Jul 2 04:03:12 redwood kernel: Jul 2 04:03:12 redwood kernel: Call Trace: <ffffffff80299bc2>{blocking_notifier_call_chain+31} Jul 2 04:03:12 redwood kernel: <ffffffff80215c97>{do_exit+32} <ffffffff80270b57>{kernel_math_error+0} Jul 2 04:03:12 redwood kernel: <ffffffff802710f4>{do_invalid_op+173} <ffffffff80261fd7>{cache_alloc_refill+337} Jul 2 04:03:12 redwood kernel: <ffffffff8021b4f2>{bad_range+16} <ffffffff8020a13f>{get_page_from_freelist+841} Jul 2 04:03:12 redwood kernel: <ffffffff80290991>{printk+82} <ffffffff80263c65>{error_exit+0} Jul 2 04:03:12 redwood kernel: <ffffffff80261fd7>{cache_alloc_refill+337} <ffffffff80261fd7>{cache_alloc_refill+337} Jul 2 04:03:12 redwood kernel: <ffffffff8020a35f>{kmem_cache_alloc+127} <ffffffff80313c50>{selinux_inode_alloc_security+44} Jul 2 04:03:12 redwood kernel: <ffffffff8022692a>{alloc_inode+257} <ffffffff8022426c>{iget_locked+114} Jul 2 04:03:12 redwood kernel: <ffffffff8809077c>{:ext3:ext3_lookup+81} <ffffffff8020ca3f>{do_lookup+206} Jul 2 04:03:12 redwood kernel: <ffffffff802098b1>{__link_path_walk+2590} <ffffffff8020e6a7>{link_path_walk+92} Jul 2 04:03:12 redwood kernel: <ffffffff803170af>{selinux_inode_getattr+80} <ffffffff8020c7fe>{do_path_lookup+633} Jul 2 04:03:12 redwood kernel: <ffffffff802249c9>{__user_walk_fd+55} <ffffffff8022988c>{vfs_stat_fd+27} Jul 2 04:03:12 redwood kernel: <ffffffff802246b3>{sys_newstat+25} <ffffffff80262d8e>{system_call+126} Jul 2 04:03:12 redwood kernel: BUG: beagle-build-in/31471, lock held at task exit time! Jul 2 04:03:12 redwood kernel: [ffff810019221830] {inode_init_once} Jul 2 04:03:12 redwood kernel: .. held by: beagle-build-in:31471 [ffff81003105e100, 134] Jul 2 04:03:12 redwood kernel: ... acquired at: do_lookup+0x8b/0x188
Also seen on our FC4 webserver with 2.6.17-1.2141_FC4smp

Jul 22 05:09:32 hawk kernel: kernel BUG at include/linux/list.h:180!
Jul 22 05:09:32 hawk kernel: invalid opcode: 0000 [#1]
Jul 22 05:09:32 hawk kernel: SMP
Jul 22 05:09:32 hawk kernel: last sysfs file: /class/vc/vcsa5/dev
Jul 22 05:09:32 hawk kernel: Modules linked in: ipv6 autofs4 nfs lockd nfs_acl sunrpc ip_conntrack_ftp ipt_REJECT ipt_LOG xt_state ip_conntrack nfnetlink xt_tcpudp iptable_filter ip_tables x_tables uhci_hcd i2c_piix4 i2c_core e100 mii floppy dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod aic7xxx scsi_transport_spi sd_mod scsi_mod
Jul 22 05:09:32 hawk kernel: CPU: 1
Jul 22 05:09:32 hawk kernel: EIP: 0060:[<c0462f0a>] Not tainted VLI
Jul 22 05:09:32 hawk kernel: EFLAGS: 00010092 (2.6.17-1.2141_FC4smp #1)
Jul 22 05:09:32 hawk kernel: EIP is at free_block+0x69/0x189
Jul 22 05:09:32 hawk kernel: eax: 00000044 ebx: e04a9000 ecx: 00000046 edx: 00000006
Jul 22 05:09:32 hawk kernel: esi: e04a9980 edi: effe11e0 ebp: 00000000 esp: effeeef4
Jul 22 05:09:32 hawk kernel: ds: 007b es: 007b ss: 0068
Jul 22 05:09:32 hawk kernel: Process events/1 (pid: 9, threadinfo=effee000 task=c18a0050)
Jul 22 05:09:32 hawk kernel: Stack: c0615a58 e04a9000 e04e9000 efe116a0 0000001e c1872d00 00000005 efe0f034
Jul 22 05:09:32 hawk kernel: efe0f020 0000001e efe0f000 00000000 c04630af 00000000 00000000 c1872d00
Jul 22 05:09:32 hawk kernel: effe1204 c1792a60 effe11e0 effe1170 c1872d00 c046315c 00000000 00000000
Jul 22 05:09:32 hawk kernel: Call Trace:
Jul 22 05:09:32 hawk kernel: <c04630af> drain_array+0x85/0xb1 <c046315c> cache_reap+0x81/0x1c0
Jul 22 05:09:32 hawk kernel: <c04310ef> run_workqueue+0x77/0xb4 <c04630db> cache_reap+0x0/0x1c0
Jul 22 05:09:32 hawk kernel: <c04311b9> worker_thread+0x0/0x106 <c043128e> worker_thread+0xd5/0x106
Jul 22 05:09:32 hawk kernel: <c041ee2d> default_wake_function+0x0/0xc <c0434001> kthread+0x9d/0xc9
Jul 22 05:09:32 hawk kernel: <c0433f64> kthread+0x0/0xc9 <c0402005> kernel_thread_helper+0x5/0xb
Jul 22 05:09:32 hawk kernel: Code: 03 8b 52 0c 8b 5a 24 8b 4c 24 08 8b 54 24 28 8b 43 04 8b bc 91 90 00 00 00 8b 00 39 d8 74 17 50 53 68 58 5a 61 c0 e8 e4 13 fc ff <0f> 0b b4 00 8e 5a 61 c0 83 c4 0c 8b 03 8b 40 04 39 d8 74 17 50
Jul 22 05:09:32 hawk kernel: EIP: [<c0462f0a>] free_block+0x69/0x189 SS:ESP 0068:effeeef4
Jul 22 05:09:32 hawk kernel: <3>BUG: sleeping function called from invalid context at include/linux/rwsem.h:43
Jul 22 05:09:32 hawk kernel: in_atomic():0, irqs_disabled():1
Jul 22 05:09:32 hawk kernel: <c042f3ce> blocking_notifier_call_chain+0x18/0x4b <c0425a44> do_exit+0x1c/0x78d
Jul 22 05:09:32 hawk kernel: <c040541e> die+0x25b/0x263 <c04056e3> do_invalid_op+0x0/0x9d
Jul 22 05:09:32 hawk kernel: <c0405774> do_invalid_op+0x91/0x9d <c0462f0a> free_block+0x69/0x189
Jul 22 05:09:32 hawk kernel: <c04242ca> vprintk+0x2a5/0x2c9 <c04048cb> error_code+0x4f/0x54
Jul 22 05:09:32 hawk kernel: <c0462f0a> free_block+0x69/0x189 <c04630af> drain_array+0x85/0xb1
Jul 22 05:09:32 hawk kernel: <c046315c> cache_reap+0x81/0x1c0 <c04310ef> run_workqueue+0x77/0xb4
Jul 22 05:09:32 hawk kernel: <c04630db> cache_reap+0x0/0x1c0 <c04311b9> worker_thread+0x0/0x106
Jul 22 05:09:32 hawk kernel: <c043128e> worker_thread+0xd5/0x106 <c041ee2d> default_wake_function+0x0/0xc
Jul 22 05:09:32 hawk kernel: <c0434001> kthread+0x9d/0xc9 <c0433f64> kthread+0x0/0xc9
Jul 22 05:09:32 hawk kernel: <c0402005> kernel_thread_helper+0x5/0xb
Jul 22 05:09:32 hawk kernel: BUG: events/1/9, lock held at task exit time!
Jul 22 05:09:32 hawk kernel: [c06ff200] {cache_chain_mutex}
Jul 22 05:09:32 hawk kernel: .. held by: events/1: 9 [c18a0050, 110]
Jul 22 05:09:32 hawk kernel: ... acquired at: cache_reap+0x11/0x1c0
Since the kernel trace was removed I have had two kernel faults in two days. Both seemed to involve cpu scaling. Linux redwood 2.6.17-1.2157_FC5 #1 SMP Tue Jul 11 22:53:56 EDT 2006 x86_64 x86_64 x86_64 GNU/Linux Jul 30 16:45:59 redwood kernel: List corruption. prev->next should be ffff810033e1b000, but was ffff810098aaa6af Jul 30 16:45:59 redwood kernel: ----------- [cut here ] --------- [please bite here ] --------- Jul 30 16:45:59 redwood kernel: Kernel BUG at include/linux/list.h:180 Jul 30 16:45:59 redwood kernel: invalid opcode: 0000 [1] SMP Jul 30 16:45:59 redwood kernel: last sysfs file: /devices/system/cpu/cpu0/cpufreq/scaling_setspeed Jul 30 16:45:59 redwood kernel: CPU 0 Jul 30 16:45:59 redwood kernel: Modules linked in: nfs lockd nfs_acl autofs4 sunrpc video button battery acpi_memhotplug ac sg ipv6 lp parport_pc parport usb_storage usblp snd_intel8x0 snd_emu10k1_synth snd_emux_synth snd_seq_virmidi snd_seq_midi_emul bt878 ehci_hcd tuner bttv video_buf ir_common compat_ioctl32 i2c_algo_bit snd_emu10k1 ohci_hcd snd_rawmidi emu10k1_gp gameport v4l2_common btcx_risc tveeprom snd_ac97_codec snd_ac97_bus snd_util_mem floppy videodev snd_hwdep nvidia(U) snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm i2c_nforce2 i2c_core snd_timer forcedeth snd soundcore snd_page_alloc dm_snapshot dm_zero dm_mirror dm_mod ext3 jbd sata_nv libata sd_mod scsi_mod Jul 30 16:45:59 redwood kernel: Pid: 207, comm: kswapd0 Tainted: P 2.6.17-1.2157_FC5 #1 Jul 30 16:45:59 redwood kernel: RIP: 0010:[<ffffffff802c6622>] <ffffffff802c6622>{free_block+159} Jul 30 16:45:59 redwood kernel: RSP: 0018:ffff81003ed35b38 EFLAGS: 00010086 Jul 30 16:45:59 redwood kernel: RAX: 0000000000000054 RBX: ffff810033e1b000 RCX: ffffffff80549a98 Jul 30 16:45:59 redwood kernel: RDX: 0000000000000000 RSI: 0000000000000096 RDI: ffffffff80549a80 Jul 30 16:45:59 redwood kernel: RBP: ffff810037ff5ec0 R08: ffffffff80549a98 R09: ffff81003ed35888 Jul 30 16:45:59 
redwood kernel: R10: 0000000000000010 R11: 0000000000000000 R12: ffff810033e1b4f0 Jul 30 16:45:59 redwood kernel: R13: 0000000000000000 R14: ffff81003f1c1440 R15: ffff8100011bd968 Jul 30 16:45:59 redwood kernel: FS: 0000000040a00940(0000) GS:ffffffff8069d000(0000) knlGS:00000000f7fd68e0 Jul 30 16:45:59 redwood kernel: CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b Jul 31 20:00:44 redwood kernel: List corruption. next->prev should be ffff810033f37000, but was 00e0084156075805 Jul 31 20:00:44 redwood kernel: ----------- [cut here ] --------- [please bite here ] --------- Jul 31 20:00:44 redwood kernel: Kernel BUG at include/linux/list.h:185 Jul 31 20:00:44 redwood kernel: invalid opcode: 0000 [1] SMP Jul 31 20:00:44 redwood kernel: last sysfs file: /devices/system/cpu/cpu0/cpufreq/scaling_setspeed Jul 31 20:00:44 redwood kernel: CPU 0 Jul 31 20:00:44 redwood kernel: Modules linked in: nfs lockd nfs_acl autofs4 sunrpc video button battery acpi_memhotplug ac sg ipv6 lp parport_pc parport usblp usb_storage snd_intel8x0 snd_emu10k1_synth snd_emux_synth snd_seq_virmidi snd_seq_midi_emul snd_emu10k1 ohci_hcd bt878 ehci_hcd tuner emu10k1_gp gameport snd_rawmidi snd_ac97_codec snd_ac97_bus snd_util_mem bttv video_buf ir_common compat_ioctl32 nvidia(U) snd_hwdep i2c_algo_bit v4l2_common floppy snd_seq_dummy btcx_risc tveeprom videodev snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm forcedeth snd_timer snd soundcore i2c_nforce2 i2c_core snd_page_alloc dm_snapshot dm_zero dm_mirror dm_mod ext3 jbd sata_nv libata sd_mod scsi_mod Jul 31 20:00:44 redwood kernel: Pid: 5, comm: events/0 Tainted: P 2.6.17-1.2157_FC5 #1 Jul 31 20:00:44 redwood kernel: RIP: 0010:[<ffffffff802c6649>] <ffffffff802c6649>{free_block+198} Jul 31 20:00:44 redwood kernel: RSP: 0000:ffff810037e17d48 EFLAGS: 00010086 Jul 31 20:00:44 redwood kernel: RAX: 0000000000000054 RBX: ffff810033f37000 RCX: ffffffff80549a98 Jul 31 20:00:44 redwood kernel: RDX: 0000000000000000 
RSI: 0000000000000092 RDI: ffffffff80549a80 Jul 31 20:00:44 redwood kernel: RBP: ffff81003f1c6c40 R08: ffffffff80549a98 R09: ffff810037e17a98 Jul 31 20:00:44 redwood kernel: R10: 0000000000000010 R11: 0000000000000000 R12: ffff810033f376d0 Jul 31 20:00:44 redwood kernel: R13: ffff81003f1c6c80 R14: ffff81003f1d10c0 R15: ffff81003f1d21b0 Jul 31 20:00:44 redwood kernel: FS: 0000000048c0d940(0000) GS:ffffffff8069d000(0000) knlGS:0000000000000000 Jul 31 20:00:44 redwood kernel: CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b Jul 31 20:00:44 redwood kernel: CR2: 00002aaaaaaac000 CR3: 000000001fb83000 CR4: 00000000000006e0 Jul 31 20:00:44 redwood kernel: Process events/0 (pid: 5, threadinfo ffff810037e16000, task ffff810037fef820) Jul 31 20:00:44 redwood kernel: Stack: ffffffffffffffff 0000000037fef820 0000003100000060 ffff81003f1d2028 Jul 31 20:00:45 redwood kernel: 0000000000000060 ffff81003f1d2000 ffff81003f1c6c80 0000000000000000 Jul 31 20:00:45 redwood kernel: ffff81003f1d10c0 ffffffff802c6813 Jul 31 20:00:45 redwood kernel: Call Trace: <ffffffff802c6813>{drain_array+139} <ffffffff802c801d>{cache_reap+250} Jul 31 20:00:45 redwood kernel: <ffffffff802c7f23>{cache_reap+0} <ffffffff80250f16>{run_workqueue+159} Jul 31 20:00:45 redwood kernel: <ffffffff8024d69e>{worker_thread+0} <ffffffff8024d78e>{worker_thread+240} Jul 31 20:00:45 redwood kernel: <ffffffff8028b810>{default_wake_function+0} <ffffffff80234d6c>{kthread+246} Jul 31 20:00:45 redwood kernel: <ffffffff80263ade>{child_rip+8} <ffffffff80234c76>{kthread+0} Jul 31 20:00:45 redwood kernel: <ffffffff80263ad6>{child_rip+0} Jul 31 20:00:45 redwood kernel: Jul 31 20:00:45 redwood kernel: Code: 0f 0b 68 73 9e 47 80 c2 b9 00 48 8b 13 48 8b 43 08 48 89 42 Jul 31 20:00:45 redwood kernel: RIP <ffffffff802c6649>{free_block+198} RSP <ffff810037e17d48> Jul 31 20:00:45 redwood kernel: <3>BUG: sleeping function called from invalid context at include/linux/rwsem.h:43 Jul 31 20:00:45 redwood kernel: in_atomic():0, 
irqs_disabled():1 Jul 31 20:00:45 redwood kernel: Jul 31 20:00:45 redwood kernel: Call Trace: <ffffffff802998a6>{blocking_notifier_call_chain+31} Jul 31 20:00:45 redwood kernel: <ffffffff80215c43>{do_exit+32} <ffffffff80270817>{kernel_math_error+0} Jul 31 20:00:45 redwood kernel: <ffffffff80270db4>{do_invalid_op+173} <ffffffff802c6649>{free_block+198} Jul 31 20:00:45 redwood kernel: <ffffffff80290675>{printk+82} <ffffffff80263925>{error_exit+0} Jul 31 20:00:45 redwood kernel: <ffffffff802c6649>{free_block+198} <ffffffff802c6649>{free_block+198} Jul 31 20:00:45 redwood kernel: <ffffffff802c6813>{drain_array+139} <ffffffff802c801d>{cache_reap+250} Jul 31 20:00:45 redwood kernel: <ffffffff802c7f23>{cache_reap+0} <ffffffff80250f16>{run_workqueue+159} Jul 31 20:00:45 redwood kernel: <ffffffff8024d69e>{worker_thread+0} <ffffffff8024d78e>{worker_thread+240} Jul 31 20:00:45 redwood kernel: <ffffffff8028b810>{default_wake_function+0} <ffffffff80234d6c>{kthread+246} Jul 31 20:00:45 redwood kernel: <ffffffff80263ade>{child_rip+8} <ffffffff80234c76>{kthread+0} Jul 31 20:00:45 redwood kernel: <ffffffff80263ad6>{child_rip+0} Jul 31 20:00:45 redwood kernel: BUG: events/0/5, lock held at task exit time! Jul 31 20:00:45 redwood kernel: [ffffffff80551a80] {cache_chain_mutex} Jul 31 20:00:45 redwood kernel: .. held by: events/0: 5 [ffff810037fef820, 110] Jul 31 20:00:45 redwood kernel: ... acquired at: cache_reap+0x26/0x2fd
Additional notes: the motherboard is a Gigabyte GA-K8NS (K8 Triton) and the processor is an AMD Athlon ADA3000AEP4AX. I'm running two other systems with the same board and processor, but on Fedora Core 4, and they do not have this issue.

Linux spruce 2.6.17-1.2142_FC4 #1 Tue Jul 11 22:41:06 EDT 2006 x86_64 x86_64 x86_64 GNU/Linux
The traces with the nvidia module loaded are pretty uninteresting, as that module has been known to cause memory corruption issues in the past. However, they are similar to the traces in comments #2 & #3. I assume the affected boxes pass memtest86? It's unusual because if that list gets corrupted, we notice bad things happening very quickly, yet it only seems to be affecting you and Orion, whereas bugs here would tend to affect a lot more people. Something else that may help isolate this would be to try running without some of the modules loaded. Knowing, for example, "It only happens if bttv has been loaded" would be a useful datapoint. But first of all, please rule out the nvidia module from any further traces.
After the first posting I did remove the nvidia module and ran until the system crashed again, which ruled out the nvidia driver module. I have not run memtest86 but will see if I can figure out a way to do it. The system does not have a floppy drive, and we have been trying to create bootable thumb drives at work, which seems to be somewhat of a black art.
yum install memtest86+

This should install /boot/memtest86+-1.65. Then add to /etc/grub.conf:

    title Memtest86+
    root (hd0,0)
    kernel /memtest86+-1.65

(use the root and kernel prefix from your existing kernel entries), then reboot and select Memtest86+. Voila!
They had an ISO image, so I burned a CD and ran memory tests for 2.5 hours (6 passes) with no failures. That gave me time to help my son with his Winders virus clean up :-) I have disabled the cpuspeed service, since cpufreq's scaling_setspeed was listed as the last sysfs file in the crash. Since the crash is so infrequent, it may take a few weeks before I know if I have found the problem.
Note, rmmod nvidia is not enough to ensure an untainted kernel. It must never have been loaded during boot. By the time you get to rmmod it, it may already have corrupted kernel text.
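One way to confirm that a boot has stayed clean is to look at the taint mask the kernel exposes in /proc/sys/kernel/tainted, which stays set after rmmod. Below is a minimal sketch of reading and decoding it. The helper names are made up for illustration, and only bit 0 (the "P" shown in the traces, set when a proprietary module has been loaded) is decoded.

```c
#include <stdbool.h>
#include <stdio.h>

/* Bit 0 of the taint mask: a proprietary module has been loaded ("P"). */
#define TAINT_PROPRIETARY_MODULE (1UL << 0)

/* Hypothetical helper: true when no taint flag at all is set. */
static bool taint_is_clean(unsigned long mask)
{
    return mask == 0;
}

/* Hypothetical helper: true when the proprietary-module flag is set. */
static bool taint_has_proprietary(unsigned long mask)
{
    return (mask & TAINT_PROPRIETARY_MODULE) != 0;
}

/*
 * Read the decimal taint mask from path, normally
 * "/proc/sys/kernel/tainted". Returns 0 on success, -1 on failure.
 */
static int read_taint_mask(const char *path, unsigned long *mask)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    int ok = (fscanf(f, "%lu", mask) == 1);
    fclose(f);
    return ok ? 0 : -1;
}
```

A trace is only worth posting if `taint_is_clean` holds for the mask read on the boot that produced it; a value of 0 corresponds to "Not tainted" in the oops header.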
I did reboot the system and went through a couple of kernel updates. So the issue I'm experiencing is not related to the nvidia module.
I think turning off the cpuspeed service has stopped the issue. There has been a lengthy discussion on the mailing list http://www.ivtvdriver.org/pipermail/ivtv-users/ with the subject "The ivtv DMA error". It appears that others are experiencing related issues on AMD systems with the powernow-k8 and cpuspeed functions.
A new kernel update has been released (Version: 2.6.18-1.2200.fc5) based upon a new upstream kernel release. Please retest against this new kernel, as a large number of patches go into each upstream release, possibly including changes that may address this problem.

This bug has been placed in NEEDINFO state. Due to the large volume of inactive bugs in bugzilla, if this bug is still in this state in two weeks time, it will be closed. Should this bug still be relevant after this period, the reporter can reopen the bug at any time. Any other users on the Cc: list of this bug can request that the bug be reopened by adding a comment to the bug.

In the last few updates, some users upgrading from FC4->FC5 have reported that installing a kernel update has left their systems unbootable. If you have been affected by this problem please check you only have one version of device-mapper & lvm2 installed. See bug 207474 for further details.

If this bug is a problem preventing you from installing the release this version is filed against, please see bug 169613.

If this bug has been fixed, but you are now experiencing a different problem, please file a separate bug for the new problem.

Thank you.
Created attachment 138867 [details] Part of /var/log/message file.
Damn, I dislike unfriendly modern applications. I entered more comments before doing the attachment, and they were discarded! To re-explain the steps that go with the above log file fragment:

I upgraded to kernel 2.6.18-1.2200.fc5 and then turned the cpuspeed service back on. I did not upgrade from FC4 to FC5 but checked the device-mapper & lvm2 rpms anyway, and found that there are two device-mapper rpms (x86-64 & i386) and one lvm2 rpm installed. I have had one crash since doing this. The trace is in the attachment file.

I also see the following at boot time in the file /var/log/messages:

Oct 16 22:00:37 redwood kernel: powernow-k8: Found 1 AMD Athlon(tm) 64 Processor 3000+ processors (version 2.00.00)
Oct 16 22:00:37 redwood kernel: powernow-k8: invalid freq entries 3900000 kHz vs. 65535000 kHz
Oct 16 22:00:37 redwood kernel: powernow-k8: invalid freq entries 3900000 kHz vs. 65535000 kHz
Oct 16 22:00:37 redwood kernel: powernow-k8: 0 : fid 0xc (2000 MHz), vid 0x2
Oct 16 22:00:37 redwood kernel: powernow-k8: 1 : fid 0xa (1800 MHz), vid 0x6
Oct 16 22:00:37 redwood kernel: powernow-k8: 2 : fid 0x2 (1000 MHz), vid 0x12
Oct 16 22:00:37 redwood kernel: ACPI: (supports S0 S1 S4 S5)
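As a reading aid for the powernow-k8 table in the boot messages above: the three fid values are consistent with a frequency of 800 MHz plus 100 MHz per fid step. The sketch below encodes that mapping. `k8_fid_to_mhz` is a name chosen here for illustration, and the formula is an assumption checked only against these three log entries.

```c
#include <stdint.h>

/*
 * Hypothetical helper: convert a K8 frequency ID (fid) to MHz, assuming
 * an 800 MHz base and 100 MHz per step. This matches the three P-states
 * printed in the log (0xc, 0xa, 0x2) but is not taken from the driver.
 */
static uint32_t k8_fid_to_mhz(uint32_t fid)
{
    return 800 + fid * 100;
}
```

Under that assumption the three listed P-states decode as expected, so the table itself looks sane; the "invalid freq entries" lines (3900000 kHz vs. 65535000 kHz) are the part that looks wrong.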
This bug check has been triggered in a number of different reports with the nvidia module loaded. Unfortunately, there's nothing we can do to fix problems in Nvidia's code.
This bug has nothing to do with the nvidia module. I have removed the module in the past and had the same failures occur.
Ok, but please don't post tainted traces. They are worthless, and just add to the noise. Apart from comment #2, every trace posted so far has been tainted. We need an untainted trace from the current kernel.
Never mind, it seems too hard to figure out. I think it is a BIOS and powernow-k8 compatibility issue. Just close the bug, as I can work around it by turning off the cpuspeed service.