kernel-2.6.11-1.14_FC3

All released versions of the AMD64 kernels have been causing sporadic oopses and complete hangs. This includes kernels:

title Fedora Core (2.6.11-1.14_FC3)
title Fedora Core (2.6.10-1.770_FC3)
title Fedora Core (2.6.10-1.766_FC3)
title Fedora Core (2.6.10-1.760_FC3)
title Fedora Core (2.6.9-1.667)

The system has hung during boot-up a few times: once while invoking CUPS; other times I'm not sure where the hang occurred. With the latest kernel, 2.6.11-1.14_FC3, the sendmail process has cored with a segv a few times during boot-up. After a few reboots it will boot without error, then generate an occasional oops and kernel diagnostic. It will almost always continue to run after the oops/diagnostic, but will normally hang completely some random amount of time thereafter. I believe it has also hung without ever producing the oops/diagnostic, but I can't be sure of that now.

I've had it happen during an up2date run (fun to correct), and sometimes when no real activity has been occurring on the system. Sometimes it will hang within a short period of time (whether in use or in screen saver) and sometimes when actually trying to use the system. I cannot find any consistent timing or activity that causes it.

Reproducibility:
Hangs - unpredictable
oops/kernel diagnostics - most times after reboot...

1. Nothing special to reproduce it.

Today's console output follows; the dmesg file contents are attached. I've tried to cause it intentionally by leaving the system on for several days. Sometimes it works fine, and sometimes it will hang very shortly after the screen awakens (and sometimes not).

Message from syslogd@argonaut at Fri Apr 22 07:06:14 2005 ...
argonaut kernel: Oops: 0000 [1]

Message from syslogd@argonaut at Fri Apr 22 07:06:14 2005 ...
argonaut kernel: CR2: 0000000000002000

Dmesg file attached

----------------------------------------------------------

Following is an earlier message from the kernel:

Unable to handle kernel NULL pointer dereference at 0000000000000078 RIP:
<ffffffff801a54d5>{clear_inode+241}
PML4 31dd4067 PGD 31a83067 PMD 31380067 PTE 0
Oops: 0000 [1]
CPU 0
Modules linked in: parport_pc lp parport autofs4 i2c_dev i2c_core sunrpc pcmcia yenta_socket pcmcia_core md5 ipv6 vfat fat dm_mod video button battery ac ohci1394 ieee1394 ohci_hcd ehci_hcd snd_emu10k1 snd_rawmidi snd_seq_device snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_util_mem snd_hwdep snd soundcore forcedeth floppy ext3 jbd sata_sil libata sd_mod scsi_mod
Pid: 196, comm: kswapd0 Not tainted 2.6.10-1.770_FC3
RIP: 0010:[<ffffffff801a54d5>] <ffffffff801a54d5>{clear_inode+241}
RSP: 0018:0000010037c1fd98 EFLAGS: 00010206
RAX: 0000000000000000 RBX: 000001002fe69238 RCX: 000001002fee3c00
RDX: 0000000000000002 RSI: 000000000000004e RDI: 000001002fe69538
RBP: 000001003c8ac5d8 R08: 0000000000000000 R09: ffffffff804949e8
R10: 7fffffffffffffff R11: 0000000000000000 R12: 0000000000000079
R13: 0000000000000020 R14: 00000000000000d0 R15: 0000000000000838
FS: 0000002a9558a3e0(0000) GS:ffffffff804ff980(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000078 CR3: 0000000000101000 CR4: 00000000000006e0
Process kswapd0 (pid: 196, threadinfo 0000010037c1e000, task 000001003fda8030)
Stack: 000001002fe69238 ffffffff801a81eb 000001003c8ac630 ffffffff801a17de
       00000000000004b7 000001003ffe9440 00000000000000ca ffffffff801a293d
       0000000000000000 ffffffff8016ccb7
Call Trace:<ffffffff801a81eb>{generic_drop_inode+665} <ffffffff801a17de>{prune_dcache+950}
       <ffffffff801a293d>{shrink_dcache_memory+21} <ffffffff8016ccb7>{shrink_slab+188}
       <ffffffff8016ea0d>{balance_pgdat+510} <ffffffff8016ec1f>{kswapd+224}
       <ffffffff801511dc>{autoremove_wake_function+0} <ffffffff801511dc>{autoremove_wake_function+0}
       <ffffffff8012fa5c>{schedule_tail+11} <ffffffff8010f303>{child_rip+8}
       <ffffffff8016eb3f>{kswapd+0} <ffffffff8010f2fb>{child_rip+0}

Code: 48 8b 40 78 48 85 c0 74 05 48 89 df ff d0 48 83 bb d8 02 00
RIP <ffffffff801a54d5>{clear_inode+241} RSP <0000010037c1fd98>
CR2: 0000000000000078
Created attachment 113558 [details] Dmesg for 04/22 oops and diagnostic referenced 1st above
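For anyone comparing the traces in this report, the fields that matter most for triage (fault address, faulting symbol, taint status) can be pulled out mechanically. The following is only a minimal sketch in Python, assuming the 2.6-era x86_64 oops text layout pasted above. The function name `summarize_oops` and the regular expressions are illustrative and are not part of any kernel tooling.

```python
import re


def summarize_oops(text):
    """Pull a few triage fields out of pasted oops text.

    Assumes the layout seen in this report; any field that cannot be
    found is returned as None.
    """
    # "Unable to handle kernel NULL pointer dereference at <16 hex digits>"
    addr = re.search(r"dereference at ([0-9a-f]{16})", text)
    # "RIP: <addr>{symbol+offset}" or the closing "RIP <addr>{symbol+offset}"
    sym = re.search(r"RIP:?\s+<[0-9a-f]{16}>\{(\w+)\+(\d+)\}", text)
    # "Pid: N, comm: NAME Not tainted ..." or "... Tainted: P ..."
    taint = re.search(r"comm: \S+ (Not tainted|Tainted: \S+)", text)
    return {
        "fault_address": int(addr.group(1), 16) if addr else None,
        "symbol": sym.group(1) if sym else None,
        "offset": int(sym.group(2)) if sym else None,
        "tainted": None if taint is None else taint.group(1) != "Not tainted",
    }


if __name__ == "__main__":
    sample = (
        "Unable to handle kernel NULL pointer dereference at "
        "0000000000000078 RIP: <ffffffff801a54d5>{clear_inode+241} "
        "Pid: 196, comm: kswapd0 Not tainted 2.6.10-1.770_FC3"
    )
    print(summarize_oops(sample))
```

When the RIP itself is zero or carries no symbol, the `symbol` and `offset` fields come back as `None`, and the call trace has to be read by hand.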
I have seen hangs as well on 2.6.9-1.681_FC3 (caps lock starts blinking). I upgraded to 2.6.11-1.14_FC3 yesterday and received my first one of these messages. I run the binary nvidia driver as well (NVIDIA-Linux-x86_64-1.0-7174-pkg2.run). No lockups on 2.6.11 yet. Compaq R3240US laptop. Let me know if I can help with testing.

------
console message:

Message from syslogd@seti at Tue May 3 10:41:46 2005 ...
seti kernel: Oops: 0010 [1]

Message from syslogd@seti at Tue May 3 10:41:47 2005 ...
seti kernel: CR2: 0000000000000000

/var/log/messages:

May 3 10:41:46 seti kernel: Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
May 3 10:41:46 seti kernel: [<0000000000000000>]
May 3 10:41:46 seti kernel: PGD 4130d067 PUD 41301067 PMD 0
May 3 10:41:46 seti kernel: Oops: 0010 [1]
May 3 10:41:46 seti kernel: CPU 0
May 3 10:41:46 seti kernel: Modules linked in: nvidia(U) md5 ipv6 parport_pc lp parport autofs4 pcmcia ipt_REJECT ipt_state ip_conntrack iptable_filter ip_tables dm_mod video button battery ac ohci1394 ieee1394 yenta_socket rsrc_nonstatic pcmcia_core ohci_hcd ehci_hcd i2c_nforce2 i2c_core snd_intel8x0m snd_intel8x0 snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore snd_page_alloc orinoco hermes 8139too mii ext3 jbd
May 3 10:41:46 seti kernel: Pid: 4786, comm: pam-panel-icon Tainted: P 2.6.11-1.14_FC3
May 3 10:41:46 seti kernel: RIP: 0010:[<0000000000000000>] [<0000000000000000>]
May 3 10:41:46 seti kernel: RSP: 0000:ffff8100412d1ef0 EFLAGS: 00010282
May 3 10:41:46 seti kernel: RAX: ffffffff804aa1a0 RBX: 0000000000000145 RCX: 00000000c0000100
May 3 10:41:46 seti kernel: RDX: 0000000000000000 RSI: ffff81004121a440 RDI: ffff8100412f90c0
May 3 10:41:46 seti kernel: RBP: ffff8100412f90c0 R08: ffff8100412d0000 R09: 00000000001e709b
May 3 10:41:46 seti kernel: R10: 000000004277b7da R11: 0000000000000000 R12: ffff81002c45da4c
May 3 10:41:46 seti kernel: R13: 0000000000000000 R14: ffff81002c45da40 R15: 0000000000000003
May 3 10:41:46 seti kernel: FS: 00002aaaaaad4e80(0000) GS:ffffffff80550700(0000) knlGS:000000005b50dbb0
May 3 10:41:46 seti kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
May 3 10:41:46 seti kernel: CR2: 0000000000000000 CR3: 0000000041319000 CR4: 00000000000006e0
May 3 10:41:46 seti kernel: Process pam-panel-icon (pid: 4786, threadinfo ffff8100412d0000, task ffff8100477397e0)
May 3 10:41:47 seti kernel: Stack: ffffffff801b2549 ffff81002c45da4c 7fffffffffffffff ffff81002c45da40
May 3 10:41:47 seti kernel: 0000000000544b70 00000000412f90c0 0000000000000000 ffffffff801b16b0
May 3 10:41:47 seti kernel: ffff81002a8f3000 00007fff00000000
May 3 10:41:47 seti kernel: Call Trace:<ffffffff801b2549>{sys_poll+489} <ffffffff801b16b0>{__pollwait+0}
May 3 10:41:47 seti kernel: <ffffffff801b0ffa>{sys_ioctl+106} <ffffffff8010ec0a>{system_call+126}
May 3 10:41:47 seti kernel:
May 3 10:41:47 seti kernel:
May 3 10:41:47 seti kernel: Code: Bad RIP value.
May 3 10:41:47 seti kernel: RIP [<0000000000000000>] RSP <ffff8100412d1ef0>
May 3 10:41:47 seti kernel: CR2: 0000000000000000
Another one while running unixbench (4.1.0) on kernel-2.6.11-1.27_FC3:

May 25 17:06:01 seti kernel: Unable to handle kernel NULL pointer dereference at 0000000000000040 RIP:
May 25 17:06:01 seti kernel: <ffffffff80320983>{sock_poll+19}
May 25 17:06:01 seti kernel: PGD 422e2067 PUD 422dc067 PMD 0
May 25 17:06:01 seti kernel: Oops: 0000 [1]
May 25 17:06:01 seti kernel: CPU 0
May 25 17:06:01 seti kernel: Modules linked in: nls_utf8 lp autofs4 vmnet(U) parport_pc parport vmmon(U) pcmcia ipt_REJECT ipt_state ip_conntrack iptable_filter ip_tables dm_mod video button battery ac nvidia(U) md5 ipv6 cdc_acm ohci1394 ieee1394 yenta_socket rsrc_nonstatic pcmcia_core ohci_hcd ehci_hcd i2c_nforce2 i2c_core snd_intel8x0m snd_intel8x0 snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore snd_page_alloc orinoco hermes 8139too mii ext3 jbd
May 25 17:06:01 seti kernel: Pid: 5032, comm: X Tainted: P 2.6.11-1.27_FC3
May 25 17:06:01 seti kernel: RIP: 0010:[<ffffffff80320983>] <ffffffff80320983>{sock_poll+19}
May 25 17:06:01 seti kernel: RSP: 0000:ffff810042253de0 EFLAGS: 00010246
May 25 17:06:01 seti kernel: RAX: 0000000000000000 RBX: ffff810042b4f680 RCX: 0000000000000000
May 25 17:06:01 seti kernel: RDX: 0000000000000000 RSI: ffff810041a70918 RDI: ffff810042b4f680
Comment #2: Oopses that occur while the nvidia binary drivers are loaded can't be investigated. Can you reproduce the problem without the binary drivers? (The original oops from Comment #0 was not tainted, though.)
The same goes if you are using the vmware binary modules...
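Besides the "Tainted: P" / "Not tainted" text in the oops banner, the taint state of a running kernel can be read before a crash ever happens. The sketch below is a rough illustration in Python and makes two assumptions: that the kernel exposes its taint bitmask at `/proc/sys/kernel/tainted`, and that bit 0 of that mask is the proprietary-module flag (the "P" in the banner). The helper names are made up for this example.

```python
from pathlib import Path

# Assumption: bit 0 of the taint mask marks a proprietary module ("P").
TAINT_PROPRIETARY_MODULE = 1 << 0


def has_proprietary_taint(value: int) -> bool:
    """Return True if the proprietary-module bit is set in a taint mask."""
    return bool(value & TAINT_PROPRIETARY_MODULE)


def read_taint(path: str = "/proc/sys/kernel/tainted") -> int:
    """Read the running kernel's taint mask as an integer."""
    return int(Path(path).read_text().strip())


if __name__ == "__main__":
    try:
        mask = read_taint()
    except (OSError, ValueError):
        print("taint mask not available on this system")
    else:
        state = "tainted (P)" if has_proprietary_taint(mask) else "no proprietary taint"
        print(f"taint mask {mask}: {state}")
```

Note that the flag stays set after the module is unloaded, so a reboot without the binary modules is needed before collecting an oops that can be investigated.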
The system in the original post is not using the nvidia binary video drivers, if that's what you mean. It does have an nvidia-based graphics card, but it is running the xfree86-provided drivers. It is also an nvidia nforce 3 based motherboard (Gigabyte brand), if you mean that too. I'm just not sure what you are referring to.
I've pulled the binary-only nvidia and vmware kernel modules and will try to reproduce. I don't get oopses every day, so it may take a few days. Is there anything else I can do to gather more info if I have another occurrence?
Well, that didn't take long . . .

May 26 10:01:58 seti kernel: <1>Unable to handle kernel NULL pointer dereference at 0000000000000040 RIP:
May 26 10:01:58 seti kernel: <ffffffff80320983>{sock_poll+19}
May 26 10:01:58 seti kernel: PGD 452b6067 PUD 452b0067 PMD 44e76067 PTE 0
May 26 10:01:58 seti kernel: Oops: 0000 [2]
May 26 10:01:58 seti kernel: CPU 0
May 26 10:01:58 seti kernel: Modules linked in: parport_pc lp parport autofs4 pcmcia ipt_REJECT ipt_state ip_conntrack iptable_filter ip_tables dm_mod video button battery ac md5 ipv6 ohci1394 ieee1394 yenta_socket rsrc_nonstatic pcmcia_core ohci_hcd ehci_hcd i2c_nforce2 i2c_core snd_intel8x0m snd_intel8x0 snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore snd_page_alloc orinoco hermes 8139too mii ext3 jbd
May 26 10:01:58 seti kernel: Pid: 4385, comm: X Not tainted 2.6.11-1.27_FC3
May 26 10:01:58 seti kernel: RIP: 0010:[<ffffffff80320983>] <ffffffff80320983>{sock_poll+19}
May 26 10:01:58 seti kernel: RSP: 0018:ffff81004515bde0 EFLAGS: 00010246
May 26 10:01:58 seti kernel: RAX: 0000000000000000 RBX: ffff8100453125c0 RCX: 0000000000000000
May 26 10:01:58 seti kernel: RDX: 0000000000000000 RSI: ffff810044b90918 RDI: ffff8100453125c0
May 26 10:01:58 seti kernel: RBP: 0000000000000002 R08: ffff81004515a000 R09: 0000000000000000
May 26 10:01:58 seti kernel: R10: 0000000000000118 R11: 0000000000000002 R12: 0000000000000001
May 26 10:01:58 seti gconfd (molson-4514): Received signal 15, shutting down cleanly
May 26 10:01:59 seti kernel: R13: 0000000000000001 R14: 0000000000000145 R15: 0000001fffffff8a
May 26 10:02:01 seti gconfd (molson-4514): Exiting
May 26 10:02:03 seti kernel: FS: 00002aaaaaacb3e0(0000) GS:ffffffff80543380(0000) knlGS:0000000000000000
May 26 10:02:05 seti crond(pam_unix)[26982]: session opened for user molson by (uid=0)
May 26 10:02:08 seti kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
May 26 10:02:11 seti kernel: CR2: 0000000000000040 CR3: 0000000044d17000 CR4: 00000000000006e0
May 26 10:02:15 seti crond(pam_unix)[26982]: session closed for user molson
May 26 10:02:16 seti kernel: Process X (pid: 4385, threadinfo ffff81004515a000, task ffff810044e12030)
May 26 10:02:19 seti kernel: Stack: ffffffff801b1b1b 0000000000000000 ffff81004515be88 0000000000000000
May 26 10:02:20 seti kernel: 0000000000040000 0000001fffffff8a 0000000000000000 0000000000000000
May 26 10:02:21 seti kernel: 0000000000000000 ffff8100481e4708
May 26 10:02:21 seti kernel: Call Trace:<ffffffff801b1b1b>{do_select+1307} <ffffffff801b1510>{__pollwait+0}
May 26 10:02:22 seti kernel: <ffffffff801b201a>{sys_select+890} <ffffffff8010ec0a>{system_call+126}
May 26 10:02:22 seti kernel:
May 26 10:02:22 seti kernel:
May 26 10:02:22 seti kernel: Code: 4c 8b 58 40 41 ff e3 66 66 90 66 66 90 48 8b 47 10 48 89 f2
May 26 10:02:22 seti kernel: RIP <ffffffff80320983>{sock_poll+19} RSP <ffff81004515bde0>
May 26 10:02:22 seti kernel: CR2: 0000000000000040
This may be a hardware problem on my end. The occurrence of this roughly correlates with a memory upgrade (512M to 1280M) put into this machine. I updated the kernel from 2.6.9-1.681 at that time as well, so I'm not sure which may be the cause. I'm going to put the old memory back in and am obtaining an RMA on the new memory. Until I do that, you may want to hold off on any further investigation. In the meantime I'll try to reproduce on 512M of memory. It's worth asking whether this could somehow be related to having > 1024M of memory. I doubt it, though, as you would have seen more reports of problems. I'll follow up in a week or two after I get the replacement memory installed.
Created attachment 115131 [details] Original OOPS and RC2 message after new kernel tests
Created attachment 115132 [details] General Fault after attempted reboots and before poweroff new kernel
I think in my case this was a hardware problem. I've been re-testing with a new memory module for the past week and have been unable to reproduce the error. Sorry for the false alarm.
There has as yet been no fix or temporary workaround for the original and subsequent postings. There do not appear to be any hardware problems on the original system. This version of linux will be removed shortly due to the continued problems and the inability to use it for any current need. If any diagnostic info is needed before removal, please advise.
I have this (very similar) problem after installing FC4. I have two installations on different partitions (32 & 64 bit for AMD). The 32-bit kernel seems fine (although it has once hung on my machine at work). The 64-bit kernel randomly hangs or reboots without notification, in text mode or X. I have an nVidia card. Booting rescue mode from the install disk behaves the same. It always gets as far as login, but a hang occurs as a result of one or more commands. No messages in syslog. When hung, the system is completely dead. Hardware reset needed.
An update has been released for Fedora Core 3 (kernel-2.6.12-1.1372_FC3) which may contain a fix for your problem. Please update to this new kernel, and report whether or not it fixes your problem. If you have updated to Fedora Core 4 since this bug was opened, and the problem still occurs with the latest updates for that release, please change the version field of this bug to 'fc4'. Thank you.
This bug has been automatically closed as part of a mass update. It had been in NEEDINFO state since July 2005. If this bug still exists in current errata kernels, please reopen this bug. There are a large number of inactive bugs in the database, and this is the only way to purge them. Thank you.