Bug 155711
Summary: | Kernel (several versions) hangs and oops on AMD64 | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Need Real Name <clido01> |
Component: | kernel | Assignee: | Dave Jones <davej> |
Status: | CLOSED CANTFIX | QA Contact: | Brian Brock <bbrock> |
Severity: | high | Docs Contact: | |
Priority: | medium | ||
Version: | 3 | CC: | pfrields, redhat |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2005-10-03 00:22:38 UTC | Type: | --- |
Attachments: |
Description
Need Real Name
2005-04-22 14:28:33 UTC
Created attachment 113558 [details]
Dmesg for 04/22 oops and diagnostic referenced 1st above
I have seen hangs as well on 2.6.9-1.681_FC3 (caps lock starts blinking). Yesterday I upgraded to 2.6.11-1.14_FC3 and received my first one of these messages. I run the binary nvidia driver as well (NVIDIA-Linux-x86_64-1.0-7174-pkg2.run). No lockups on 2.6.11 yet. Compaq R3240US laptop. Let me know if I can help with testing.

------ console message:
Message from syslogd@seti at Tue May 3 10:41:46 2005 ...
seti kernel: Oops: 0010 [1]
Message from syslogd@seti at Tue May 3 10:41:47 2005 ...
seti kernel: CR2: 0000000000000000

/var/log/messages:
May 3 10:41:46 seti kernel: Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
May 3 10:41:46 seti kernel: [<0000000000000000>]
May 3 10:41:46 seti kernel: PGD 4130d067 PUD 41301067 PMD 0
May 3 10:41:46 seti kernel: Oops: 0010 [1]
May 3 10:41:46 seti kernel: CPU 0
May 3 10:41:46 seti kernel: Modules linked in: nvidia(U) md5 ipv6 parport_pc lp parport autofs4 pcmcia ipt_REJECT ipt_state ip_conntrack iptable_filter ip_tables dm_mod video button battery ac ohci1394 ieee1394 yenta_socket rsrc_nonstatic pcmcia_core ohci_hcd ehci_hcd i2c_nforce2 i2c_core snd_intel8x0m snd_intel8x0 snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore snd_page_alloc orinoco hermes 8139too mii ext3 jbd
May 3 10:41:46 seti kernel: Pid: 4786, comm: pam-panel-icon Tainted: P 2.6.11-1.14_FC3
May 3 10:41:46 seti kernel: RIP: 0010:[<0000000000000000>] [<0000000000000000>]
May 3 10:41:46 seti kernel: RSP: 0000:ffff8100412d1ef0 EFLAGS: 00010282
May 3 10:41:46 seti kernel: RAX: ffffffff804aa1a0 RBX: 0000000000000145 RCX: 00000000c0000100
May 3 10:41:46 seti kernel: RDX: 0000000000000000 RSI: ffff81004121a440 RDI: ffff8100412f90c0
May 3 10:41:46 seti kernel: RBP: ffff8100412f90c0 R08: ffff8100412d0000 R09: 00000000001e709b
May 3 10:41:46 seti kernel: R10: 000000004277b7da R11: 0000000000000000 R12: ffff81002c45da4c
May 3 10:41:46 seti kernel: R13: 0000000000000000 R14: ffff81002c45da40 R15: 0000000000000003
May 3 10:41:46 seti kernel: FS: 00002aaaaaad4e80(0000) GS:ffffffff80550700 (0000) knlGS:000000005b50dbb0
May 3 10:41:46 seti kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
May 3 10:41:46 seti kernel: CR2: 0000000000000000 CR3: 0000000041319000 CR4: 00000000000006e0
May 3 10:41:46 seti kernel: Process pam-panel-icon (pid: 4786, threadinfo ffff8100412d0000, task ffff8100477397e0)
May 3 10:41:47 seti kernel: Stack: ffffffff801b2549 ffff81002c45da4c 7fffffffffffffff ffff81002c45da40
May 3 10:41:47 seti kernel: 0000000000544b70 00000000412f90c0 0000000000000000 ffffffff801b16b0
May 3 10:41:47 seti kernel: ffff81002a8f3000 00007fff00000000
May 3 10:41:47 seti kernel: Call Trace:<ffffffff801b2549>{sys_poll+489} <ffffffff801b16b0>{__pollwait+0}
May 3 10:41:47 seti kernel: <ffffffff801b0ffa>{sys_ioctl+106} <ffffffff8010ec0a>{system_call+126}
May 3 10:41:47 seti kernel:
May 3 10:41:47 seti kernel:
May 3 10:41:47 seti kernel: Code: Bad RIP value.
May 3 10:41:47 seti kernel: RIP [<0000000000000000>] RSP <ffff8100412d1ef0>
May 3 10:41:47 seti kernel: CR2: 0000000000000000

Another one while running unixbench (4.1.0) on kernel-2.6.11-1.27_FC3:

May 25 17:06:01 seti kernel: Unable to handle kernel NULL pointer dereference at 0000000000000040 RIP:
May 25 17:06:01 seti kernel: <ffffffff80320983>{sock_poll+19}
May 25 17:06:01 seti kernel: PGD 422e2067 PUD 422dc067 PMD 0
May 25 17:06:01 seti kernel: Oops: 0000 [1]
May 25 17:06:01 seti kernel: CPU 0
May 25 17:06:01 seti kernel: Modules linked in: nls_utf8 lp autofs4 vmnet(U) parport_pc parport vmmon(U) pcmcia ipt_REJECT ipt_state ip_conntrack iptable_filter ip_tables dm_mod video button battery ac nvidia(U) md5 ipv6 cdc_acm ohci1394 ieee1394 yenta_socket rsrc_nonstatic pcmcia_core ohci_hcd ehci_hcd i2c_nforce2 i2c_core snd_intel8x0m snd_intel8x0 snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore snd_page_alloc orinoco hermes 8139too mii ext3 jbd
May 25 17:06:01 seti kernel: Pid: 5032, comm: X Tainted: P 2.6.11-1.27_FC3
May 25 17:06:01 seti kernel: RIP: 0010:[<ffffffff80320983>] <ffffffff80320983>{sock_poll+19}
May 25 17:06:01 seti kernel: RSP: 0000:ffff810042253de0 EFLAGS: 00010246
May 25 17:06:01 seti kernel: RAX: 0000000000000000 RBX: ffff810042b4f680 RCX: 0000000000000000
May 25 17:06:01 seti kernel: RDX: 0000000000000000 RSI: ffff810041a70918 RDI: ffff810042b4f680

Comment #2: Oopses while using the nvidia binary drivers can't be investigated. Can you reproduce the problem without the binary drivers? (The original oops from Comment #0 was not tainted, though.) The same goes if you are using vmware binary modules...

The original post is not using the nvidia binary video drivers, if that's what you mean. It does have an nvidia-based graphics card, but it is running the xfree86-provided drivers. It is also an nvidia nforce 3 based motherboard (Gigabyte brand), if you mean that too. I'm just not sure what you are referring to.

I've pulled the binary-only nvidia and vmware kernel modules and will try to reproduce. I don't get oopses every day, so it may take a few days. Is there anything else I can do to gather more info if I have another occurrence?

Well, that didn't take long . . .
May 26 10:01:58 seti kernel: <1>Unable to handle kernel NULL pointer dereference at 0000000000000040 RIP:
May 26 10:01:58 seti kernel: <ffffffff80320983>{sock_poll+19}
May 26 10:01:58 seti kernel: PGD 452b6067 PUD 452b0067 PMD 44e76067 PTE 0
May 26 10:01:58 seti kernel: Oops: 0000 [2]
May 26 10:01:58 seti kernel: CPU 0
May 26 10:01:58 seti kernel: Modules linked in: parport_pc lp parport autofs4 pcmcia ipt_REJECT ipt_state ip_conntrack iptable_filter ip_tables dm_mod video button battery ac md5 ipv6 ohci1394 ieee1394 yenta_socket rsrc_nonstatic pcmcia_core ohci_hcd ehci_hcd i2c_nforce2 i2c_core snd_intel8x0m snd_intel8x0 snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd soundcore snd_page_alloc orinoco hermes 8139too mii ext3 jbd
May 26 10:01:58 seti kernel: Pid: 4385, comm: X Not tainted 2.6.11-1.27_FC3
May 26 10:01:58 seti kernel: RIP: 0010:[<ffffffff80320983>] <ffffffff80320983>{sock_poll+19}
May 26 10:01:58 seti kernel: RSP: 0018:ffff81004515bde0 EFLAGS: 00010246
May 26 10:01:58 seti kernel: RAX: 0000000000000000 RBX: ffff8100453125c0 RCX: 0000000000000000
May 26 10:01:58 seti kernel: RDX: 0000000000000000 RSI: ffff810044b90918 RDI: ffff8100453125c0
May 26 10:01:58 seti kernel: RBP: 0000000000000002 R08: ffff81004515a000 R09: 0000000000000000
May 26 10:01:58 seti kernel: R10: 0000000000000118 R11: 0000000000000002 R12: 0000000000000001
May 26 10:01:58 seti gconfd (molson-4514): Received signal 15, shutting down cleanly
May 26 10:01:59 seti kernel: R13: 0000000000000001 R14: 0000000000000145 R15: 0000001fffffff8a
May 26 10:02:01 seti gconfd (molson-4514): Exiting
May 26 10:02:03 seti kernel: FS: 00002aaaaaacb3e0(0000) GS:ffffffff80543380 (0000) knlGS:0000000000000000
May 26 10:02:05 seti crond(pam_unix)[26982]: session opened for user molson by (uid=0)
May 26 10:02:08 seti kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
May 26 10:02:11 seti kernel: CR2: 0000000000000040 CR3: 0000000044d17000 CR4: 00000000000006e0
May 26 10:02:15 seti crond(pam_unix)[26982]: session closed for user molson
May 26 10:02:16 seti kernel: Process X (pid: 4385, threadinfo ffff81004515a000, task ffff810044e12030)
May 26 10:02:19 seti kernel: Stack: ffffffff801b1b1b 0000000000000000 ffff81004515be88 0000000000000000
May 26 10:02:20 seti kernel: 0000000000040000 0000001fffffff8a 0000000000000000 0000000000000000
May 26 10:02:21 seti kernel: 0000000000000000 ffff8100481e4708
May 26 10:02:21 seti kernel: Call Trace:<ffffffff801b1b1b>{do_select+1307} <ffffffff801b1510>{__pollwait+0}
May 26 10:02:22 seti kernel: <ffffffff801b201a>{sys_select+890} <ffffffff8010ec0a>{system_call+126}
May 26 10:02:22 seti kernel:
May 26 10:02:22 seti kernel:
May 26 10:02:22 seti kernel: Code: 4c 8b 58 40 41 ff e3 66 66 90 66 66 90 48 8b 47 10 48 89 f2
May 26 10:02:22 seti kernel: RIP <ffffffff80320983>{sock_poll+19} RSP <ffff81004515bde0>
May 26 10:02:22 seti kernel: CR2: 0000000000000040

This may be a hardware problem on my end. The occurrence of this roughly correlates with a memory upgrade (512M to 1280M) put into this machine. I updated the kernel from 2.6.9-1.681 at that time as well, so I'm not sure which may be the cause. I'm going to put the old memory back and am obtaining an RMA on the new memory. Until I do that, you may want to hold off on any further investigation. In the meantime I'll try to reproduce on 512M of memory. It's worth asking if this could somehow be related to having > 1024M of memory. I doubt it, though, as you would have seen more reports of problems. I'll follow up in a week or two after I get the replacement memory installed.

Created attachment 115131 [details]
Original OOPS and RC2 message after new kernel tests
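A note for anyone reading the untainted May 26 trace: the first bytes of its "Code:" line can be decoded by hand and checked against the register dump. The sketch below assumes the Code: bytes start at the faulting RIP (an assumption about how this kernel prints them) and only handles the one instruction form that appears here (REX.W `8b` load with an 8-bit displacement and no SIB byte). The function name `decode_mov_load` is hypothetical, written for this illustration.

```python
# Minimal sketch: decode the first instruction of the "Code:" line from the
# May 26 oops and compare the implied load address with CR2.
# Assumption: the Code: bytes begin at the faulting RIP.

REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
         "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]


def decode_mov_load(code: bytes):
    """Decode `REX.W 8B /r` with mod=01 (disp8), no SIB. Returns (dest, base, disp)."""
    rex, opcode, modrm, disp = code[0], code[1], code[2], code[3]
    if rex & 0xF8 != 0x48 or opcode != 0x8B:
        raise ValueError("not a REX.W mov r64, r/m64 instruction")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 1 or rm == 4:
        raise ValueError("only the disp8, no-SIB form is handled in this sketch")
    dest = REG64[reg | ((rex & 0x04) << 1)]   # REX.R extends the reg field
    base = REG64[rm | ((rex & 0x01) << 3)]    # REX.B extends the r/m field
    disp8 = disp - 256 if disp >= 128 else disp  # sign-extend the displacement
    return dest, base, disp8


# Values copied from the May 26 trace in this report.
code = bytes.fromhex("4c 8b 58 40 41 ff e3 66 66 90 66 66 90 48 8b 47 10 48 89 f2")
registers = {"rax": 0x0000000000000000}
cr2 = 0x0000000000000040

dest, base, disp = decode_mov_load(code)
fault = registers[base] + disp
print(f"mov {disp:#x}(%{base}),%{dest} -> load from {fault:#x}, CR2 = {cr2:#x}")
```

Under those assumptions the load address equals CR2, which is consistent with a load through a NULL base pointer in RAX. It does not say whether hardware or software left that pointer NULL.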
Created attachment 115132 [details]
General Fault after attempted reboots and before poweroff new kernel
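When comparing several oops reports like the ones quoted above, it can help to pull out the handful of fields that matter for triage: fault address, RIP symbol, taint status, and kernel version. The following is a minimal sketch, not an official tool. The regular expressions are assumptions derived only from the 2.6.11 x86_64 log lines in this report, and `summarize_oops` is a hypothetical helper name.

```python
import re

# Patterns are assumptions based on the oops lines quoted in this report.
FAULT_RE = re.compile(r"Unable to handle kernel (.+?) at ([0-9a-f]{16}) RIP:")
PID_RE = re.compile(r"Pid: (\d+), comm: (\S+) (Not tainted|Tainted: \S+) (\S+)")
RIP_RE = re.compile(
    r"RIP: [0-9a-f]{4}:\[<([0-9a-f]{16})>\](?: <[0-9a-f]{16}>\{([^}]+)\})?"
)


def summarize_oops(lines):
    """Collect triage fields from the syslog lines of a single oops."""
    summary = {}
    for line in lines:
        if (m := FAULT_RE.search(line)):
            summary["fault_kind"] = m.group(1)
            summary["fault_addr"] = int(m.group(2), 16)
        elif (m := PID_RE.search(line)):
            summary["pid"] = int(m.group(1))
            summary["comm"] = m.group(2)
            summary["tainted"] = m.group(3) != "Not tainted"
            summary["kernel"] = m.group(4)
        elif (m := RIP_RE.search(line)):
            summary["rip"] = int(m.group(1), 16)
            summary["rip_symbol"] = m.group(2)  # None when no symbol is printed
    return summary


# Lines copied from the May 26 trace in this report.
sample = [
    "May 26 10:01:58 seti kernel: <1>Unable to handle kernel NULL pointer "
    "dereference at 0000000000000040 RIP:",
    "May 26 10:01:58 seti kernel: Pid: 4385, comm: X Not tainted 2.6.11-1.27_FC3",
    "May 26 10:01:58 seti kernel: RIP: 0010:[<ffffffff80320983>] "
    "<ffffffff80320983>{sock_poll+19}",
]

summary = summarize_oops(sample)
print(summary)
```

The taint flag is the field to check first, since oopses from kernels with binary-only modules loaded were not investigated in this report.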
I think in my case this was a hardware problem. I've been re-testing with a new memory module for the past week and have been unable to reproduce the error. Sorry for the false alarm.

There has as yet been no fix or temporary workaround for the original and subsequent posting. There do not appear to be any hardware problems on the original system. This version of Linux will be removed shortly due to the continued problems and the inability to use it for any current need. If any diagnostic info is needed before removal, please advise.

I have this (very similar) problem after installing FC4. I have two installations on different partitions (32-bit and 64-bit for AMD). The 32-bit kernel seems fine (although it has once hung on my machine at work). The 64-bit kernel randomly hangs, or reboots without notification, in text mode or X. I have an nVidia card. Booting rescue mode from the install disk is the same: it always gets as far as login, but a hang occurs as a result of one or more commands. No messages in syslog. When hung, the system is completely dead. Hardware reset needed.

An update has been released for Fedora Core 3 (kernel-2.6.12-1.1372_FC3) which may contain a fix for your problem. Please update to this new kernel, and report whether or not it fixes your problem. If you have updated to Fedora Core 4 since this bug was opened, and the problem still occurs with the latest updates for that release, please change the version field of this bug to 'fc4'. Thank you.

This bug has been automatically closed as part of a mass update. It had been in NEEDINFO state since July 2005. If this bug still exists in current errata kernels, please reopen this bug. There are a large number of inactive bugs in the database, and this is the only way to purge them. Thank you.