Bug 445458 - Hard Freezes
Status: CLOSED NOTABUG
Product: Fedora
Classification: Fedora
Component: xorg-x11-drv-nv
Version: 9
Hardware: All Linux
Priority: low Severity: low
Assigned To: Adam Jackson
QA Contact: Fedora Extras Quality Assurance
Reported: 2008-05-06 18:33 EDT by Andy Lawrence
Modified: 2008-10-11 11:55 EDT
Doc Type: Bug Fix
Last Closed: 2008-10-11 11:55:03 EDT

Attachments
lspci -vvv (24.29 KB, text/plain)
2008-05-06 18:33 EDT, Andy Lawrence
X log (55.54 KB, text/plain)
2008-05-06 18:34 EDT, Andy Lawrence
dmesg output from Dell Precision 380 (26.38 KB, text/plain)
2008-07-14 12:59 EDT, Daniel Qarras
lspci -vvv from Dell Precision 380 (21.72 KB, text/plain)
2008-07-14 13:00 EDT, Daniel Qarras
Xorg.log with nv driver from Dell Precision 380 before freeze (24.09 KB, text/plain)
2008-07-14 13:01 EDT, Daniel Qarras
Xorg.log with nouveau driver from Dell Precision 380 before freeze (80.90 KB, text/plain)
2008-07-15 11:59 EDT, Daniel Qarras

Description Andy Lawrence 2008-05-06 18:33:47 EDT
I am currently running the Fedora 9 release, fully updated.  The freeze
doesn't happen very often, maybe once every 3-4 days.  Twice I've caught it
at idle with the screen saver (picture viewer) frozen, and twice while
browsing in Gnome.

I'm not really noticing anything weird in the logs.

Attached are lspci -vvv and /var/log/Xorg.0.log.
Comment 1 Andy Lawrence 2008-05-06 18:33:48 EDT
Created attachment 304692 [details]
lspci -vvv
Comment 2 Andy Lawrence 2008-05-06 18:34:16 EDT
Created attachment 304693 [details]
X log
Comment 3 Andy Lawrence 2008-05-08 18:57:08 EDT
I have removed the vboxdrv kernel module and it has still locked up.  So that
wasn't it!  Trying this was suggested on the fedora-devel mailing list.
Comment 4 Andy Lawrence 2008-05-08 20:21:49 EDT
Ok, now trying with the vboxdrv module disabled at boot, as suggested by the
mailing list.
Comment 5 Andy Lawrence 2008-05-09 06:36:33 EDT
Ok, that wasn't it either!

The log files don't seem to be showing me anything!  Here are the messages
from around that time:

May  8 21:13:06 localhost dhclient: bound to 192.168.2.37 -- renewal in 1670 seconds.
May  8 21:40:56 localhost dhclient: DHCPREQUEST on eth1 to 192.168.2.10 port 67
May  8 21:40:56 localhost dhclient: DHCPACK from 192.168.2.10
May  8 21:40:56 localhost dhclient: bound to 192.168.2.37 -- renewal in 1577 seconds.
May  8 22:07:13 localhost dhclient: DHCPREQUEST on eth1 to 192.168.2.10 port 67
May  8 22:07:13 localhost dhclient: DHCPACK from 192.168.2.10
May  8 22:07:13 localhost dhclient: bound to 192.168.2.37 -- renewal in 1684 seconds.
May  8 22:35:17 localhost dhclient: DHCPREQUEST on eth1 to 192.168.2.10 port 67
May  8 22:35:17 localhost dhclient: DHCPACK from 192.168.2.10
May  8 22:35:17 localhost dhclient: bound to 192.168.2.37 -- renewal in 1547 seconds.
May  8 23:01:04 localhost dhclient: DHCPREQUEST on eth1 to 192.168.2.10 port 67
May  8 23:01:04 localhost dhclient: DHCPACK from 192.168.2.10
May  8 23:01:04 localhost dhclient: bound to 192.168.2.37 -- renewal in 1541 seconds.
May  8 23:08:28 localhost ntpd[1889]: synchronized to 209.193.103.214, stratum 2
May  8 23:26:45 localhost dhclient: DHCPREQUEST on eth1 to 192.168.2.10 port 67
May  8 23:26:45 localhost dhclient: DHCPACK from 192.168.2.10
May  8 23:26:45 localhost dhclient: bound to 192.168.2.37 -- renewal in 1398 seconds.
May  9 06:26:29 localhost kernel: imklog 3.14.1, log source = /proc/kmsg started.
Comment 6 Andy Lawrence 2008-05-13 16:41:54 EDT
Finally caught something useful!!!  Happened again earlier today!

May 13 11:50:44 localhost dhclient: bound to 192.168.2.37 -- renewal in 1574 seconds.
May 13 12:11:50 localhost kernel: slideshow[16311]: segfault at 14 ip 3c69238d5f sp 7fff73f4f31f error 6 in libX11.so.6.2.0[3c69200000+106000]
May 13 12:15:07 localhost kernel: slideshow[16576]: segfault at 14 ip 3c69238d5f sp 7fff680a647f error 6 in libX11.so.6.2.0[3c69200000+106000]
May 13 12:16:58 localhost dhclient: DHCPREQUEST on eth1 to 192.168.2.10 port 67
May 13 12:16:58 localhost dhclient: DHCPACK from 192.168.2.10
May 13 12:16:58 localhost dhclient: bound to 192.168.2.37 -- renewal in 1544 seconds.
May 13 12:25:07 localhost kernel: slideshow[16842]: segfault at 14 ip 3c69238d5f sp 7fff52abbe8f error 6 in libX11.so.6.2.0[3c69200000+106000]
May 13 12:35:07 localhost kernel: slideshow[17107]: segfault at 14 ip 3c69238d5f sp 7fff907edbcf error 6 in libX11.so.6.2.0[3c69200000+106000]
May 13 14:11:42 localhost kernel: imklog 3.14.1, log source = /proc/kmsg started.

Andy
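
Editor's note: the four slideshow segfault lines in this comment share the same instruction pointer and the same fault address (0x14), so they are most likely the same crash repeating. A minimal Python sketch of how such a line can be picked apart is shown below. The helper names (`parse_segfault`, `decode_error`) are hypothetical and not part of any tool mentioned in this report; the error field is decoded using the low three bits of the x86 page-fault error code, and the offset is simply the instruction pointer minus the mapping base printed in brackets.

```python
import re

# Matches the kernel's "segfault at ..." line in both the x86_64 form
# (ip/sp) and the 32-bit form (eip/esp). All numbers are hexadecimal.
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"[er]?ip (?P<ip>[0-9a-f]+) [er]?sp (?P<sp>[0-9a-f]+) "
    r"error (?P<err>[0-9a-f]+)"
    r"(?: in (?P<obj>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\])?"
)


def decode_error(code):
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        "cause": "protection violation" if code & 1 else "page not present",
        "access": "write" if code & 2 else "read",
        "mode": "user" if code & 4 else "kernel",
    }


def parse_segfault(line):
    """Return a dict describing one segfault log line, or None if no match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    info = {
        "comm": m["comm"],
        "pid": int(m["pid"]),
        "fault_addr": int(m["addr"], 16),
        "ip": int(m["ip"], 16),
        "error": decode_error(int(m["err"], 16)),
    }
    if m["obj"]:
        # Offset of the faulting instruction inside the mapped object.
        info["object"] = m["obj"]
        info["offset"] = int(m["ip"], 16) - int(m["base"], 16)
    return info


line = ("May 13 12:11:50 localhost kernel: slideshow[16311]: segfault at 14 "
        "ip 3c69238d5f sp 7fff73f4f31f error 6 in "
        "libX11.so.6.2.0[3c69200000+106000]")
info = parse_segfault(line)
print(info["object"], hex(info["offset"]), info["error"])
```

For the line above this gives offset 0x38d5f into libX11.so.6.2.0 and a user-mode write to a non-present page. A write to an address as small as 0x14 is consistent with a store through a near-null pointer, though the log alone does not say why the pointer was bad.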
Comment 7 Bug Zapper 2008-05-14 06:44:29 EDT
Changing version to '9' as part of upcoming Fedora 9 GA.
More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Comment 8 Andy Lawrence 2008-05-21 18:23:33 EDT
Well, 24 hours of memtest86 shows no errors (27 passes), and the system is
still hosed!  Are there any other tests to prove this isn't a hardware
failure?  I do see amarok in messages, but I don't think it has been running
on every lockup occasion.

Please Help!

messages:

May 21 17:53:47 localhost kernel: BUG: unable to handle kernel paging request at
00000000001200d2
May 21 17:53:47 localhost kernel: IP: [__d_lookup+246/285] __d_lookup+0xf6/0x11d
May 21 17:53:47 localhost kernel: PGD 2e08c067 PUD 10e6b067 PMD 2b4b5067 PTE 0
May 21 17:53:47 localhost kernel: Oops: 0000 [21] SMP 
May 21 17:53:47 localhost kernel: CPU 0 
May 21 17:53:47 localhost kernel: Modules linked in: nfs lockd nfs_acl coretemp
w83627ehf hwmon_vid hwmon fuse sunrpc ipt_REJECT xt_tcpudp nf_conntrack_ipv4
xt_state nf_conntrack iptable_filter ip_tables x_tables loop dm_multipath ipv6
snd_hda_intel snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq
snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_timer firewire_ohci
snd_page_alloc snd_hwdep firewire_core r8169 snd pata_jmicron crc_itu_t pl2303
pata_acpi i2c_i801 usb_storage iTCO_wdt pcspkr usbserial bay button i2c_core
ata_generic soundcore iTCO_vendor_support sg floppy dm_snapshot dm_zero
dm_mirror dm_mod ahci libata sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd
ehci_hcd [last unloaded: microcode]
May 21 17:53:47 localhost kernel: Pid: 12207, comm: amarokcollectio Tainted: G 
    D  2.6.25.3-18.fc9.x86_64 #1
May 21 17:53:47 localhost kernel: RIP: 0010:[__d_lookup+246/285] 
[__d_lookup+246/285] __d_lookup+0xf6/0x11d
May 21 17:53:47 localhost kernel: RSP: 0018:ffff810010e4bb08  EFLAGS: 00010206
May 21 17:53:47 localhost kernel: RAX: 00000000001200d2 RBX: ffff810013db0ea0
RCX: 0000000000000011
May 21 17:53:47 localhost kernel: RDX: 000000000001c256 RSI: ffff810010e4bbf8
RDI: ffff81003f443ea0
May 21 17:53:47 localhost kernel: RBP: ffff810010e4bb58 R08: 0000000000000000
R09: 0000000000000000
May 21 17:53:47 localhost kernel: R10: 0000000000000003 R11: ffff81002508f390
R12: 00000000001200d2
May 21 17:53:47 localhost kernel: R13: ffff81003f443ea0 R14: ffff810010e4bbf8
R15: 0000000099fa0098
May 21 17:53:47 localhost kernel: FS:  0000000000000000(0000)
GS:ffffffff813f6000(0000) knlGS:0000000000000000
May 21 17:53:47 localhost kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 21 17:53:47 localhost kernel: CR2: 00000000001200d2 CR3: 0000000010e69000
CR4: 00000000000006e0
May 21 17:53:47 localhost kernel: DR0: 0000000000000000 DR1: 0000000000000000
DR2: 0000000000000000
May 21 17:53:47 localhost kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0
DR7: 0000000000000400
May 21 17:53:47 localhost kernel: Process amarokcollectio (pid: 12207,
threadinfo ffff810010e4a000, task ffff81000f34a000)
May 21 17:53:47 localhost kernel: Stack:  000000000000000f 0000000f3f5441ed
ffff810026d50900 ffffffff8104184a
May 21 17:53:47 localhost kernel:  00000000001200d2 0000000000000000
ffff810010e4be88 ffff81003f54e3c8
May 21 17:53:47 localhost kernel:  ffff810026d50900 ffff810010e4be88
ffff810010e4bba8 ffffffff810ac3fc
May 21 17:53:47 localhost kernel: Call Trace:
May 21 17:53:47 localhost kernel:  [in_group_p+42/44] ? in_group_p+0x2a/0x2c
May 21 17:53:47 localhost kernel:  [do_lookup+44/424] do_lookup+0x2c/0x1a8
May 21 17:53:47 localhost kernel:  [__link_path_walk+2481/3731]
__link_path_walk+0x9b1/0xe93
May 21 17:53:47 localhost kernel:  [touch_atime+131/274] ? touch_atime+0x83/0x112
May 21 17:53:47 localhost kernel:  [__link_path_walk+3133/3731]
__link_path_walk+0xc3d/0xe93
May 21 17:53:47 localhost kernel:  [path_walk+97/195] path_walk+0x61/0xc3
May 21 17:53:47 localhost kernel:  [do_path_lookup+470/561]
do_path_lookup+0x1d6/0x231
May 21 17:53:47 localhost kernel:  [__path_lookup_intent_open+92/159]
__path_lookup_intent_open+0x5c/0x9f
May 21 17:53:47 localhost kernel:  [path_lookup_open+12/14] path_lookup_open+0xc/0xe
May 21 17:53:47 localhost kernel:  [open_namei+118/1696] open_namei+0x76/0x6a0
May 21 17:53:47 localhost kernel:  [do_filp_open+40/75] do_filp_open+0x28/0x4b
May 21 17:53:47 localhost kernel:  [__strncpy_from_user+44/83] ?
__strncpy_from_user+0x2c/0x53
May 21 17:53:47 localhost kernel:  [get_unused_fd_flags+139/287] ?
get_unused_fd_flags+0x8b/0x11f
May 21 17:53:47 localhost kernel:  [do_sys_open+81/210] do_sys_open+0x51/0xd2
May 21 17:53:47 localhost kernel:  [sys_open+27/29] sys_open+0x1b/0x1d
May 21 17:53:47 localhost kernel:  [tracesys+213/218] tracesys+0xd5/0xda
May 21 17:53:47 localhost kernel: 
May 21 17:53:47 localhost kernel: 
May 21 17:53:47 localhost kernel: Code: 04 10 75 06 f0 ff 03 48 89 d8 fe 43 08
eb 34 41 fe 44 24 f0 48 8b 45 d0 48 8b 00 48 89 45 d0 48 8b 45 d0 48 85 c0 74 19
49 89 c4 <48> 8b 00 49 8d 5c 24 e8 44 39 7b 30 0f 18 08 75 d8 e9 65 ff ff 
May 21 17:53:47 localhost kernel: RIP  [__d_lookup+246/285] __d_lookup+0xf6/0x11d
May 21 17:53:47 localhost kernel:  RSP <ffff810010e4bb08>
May 21 17:53:47 localhost kernel: CR2: 00000000001200d2
May 21 17:53:47 localhost kernel: ---[ end trace c2ff9cce82980cc9 ]---
May 21 17:53:49 localhost kerneloops: Submitted 1 kernel oopses to
www.kerneloops.org


Dmesg:

BUG: unable to handle kernel paging request at 00000000001200d2
IP: [<ffffffff810b6273>] __d_lookup+0xf6/0x11d
PGD 2e088067 PUD 10f70067 PMD 2dd90067 PTE 0
Oops: 0000 [18] SMP 
CPU 0 
Modules linked in: nfs lockd nfs_acl coretemp w83627ehf hwmon_vid hwmon fuse
sunrpc ipt_REJECT xt_tcpudp nf_conntrack_ipv4 xt_state nf_conntrack
iptable_filter ip_tables x_tables loop dm_multipath ipv6 snd_hda_intel
snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss
snd_mixer_oss snd_pcm snd_timer firewire_ohci snd_page_alloc snd_hwdep
firewire_core r8169 snd pata_jmicron crc_itu_t pl2303 pata_acpi i2c_i801
usb_storage iTCO_wdt pcspkr usbserial bay button i2c_core ata_generic soundcore
iTCO_vendor_support sg floppy dm_snapshot dm_zero dm_mirror dm_mod ahci libata
sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd [last unloaded:
microcode]
Pid: 12190, comm: amarokcollectio Tainted: G      D  2.6.25.3-18.fc9.x86_64 #1
RIP: 0010:[<ffffffff810b6273>]  [<ffffffff810b6273>] __d_lookup+0xf6/0x11d
RSP: 0018:ffff810010f5fb08  EFLAGS: 00010206
RAX: 00000000001200d2 RBX: ffff810013db0ea0 RCX: 0000000000000011
RDX: 000000000001c256 RSI: ffff810010f5fbf8 RDI: ffff81003f443ea0
RBP: ffff810010f5fb58 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000003 R11: ffff81002508f390 R12: 00000000001200d2
R13: ffff81003f443ea0 R14: ffff810010f5fbf8 R15: 0000000099fa0098
FS:  0000000000000000(0000) GS:ffffffff813f6000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000001200d2 CR3: 0000000010e45000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process amarokcollectio (pid: 12190, threadinfo ffff810010f5e000, task
ffff810010e26000)
Stack:  000000000000000f 0000000f3f5441ed ffff810026d50900 ffffffff8104184a
 00000000001200d2 0000000000000000 ffff810010f5fe88 ffff81003f54e3c8
 ffff810026d50900 ffff810010f5fe88 ffff810010f5fba8 ffffffff810ac3fc
Call Trace:
 [<ffffffff8104184a>] ? in_group_p+0x2a/0x2c
 [<ffffffff810ac3fc>] do_lookup+0x2c/0x1a8
 [<ffffffff810ae699>] __link_path_walk+0x9b1/0xe93
 [<ffffffff810b76f8>] ? touch_atime+0x83/0x112
 [<ffffffff810ae925>] __link_path_walk+0xc3d/0xe93
 [<ffffffff810aebdc>] path_walk+0x61/0xc3
 [<ffffffff810aefda>] do_path_lookup+0x1d6/0x231
 [<ffffffff810af2e9>] __path_lookup_intent_open+0x5c/0x9f
 [<ffffffff810af338>] path_lookup_open+0xc/0xe
 [<ffffffff810afe06>] open_namei+0x76/0x6a0
 [<ffffffff810a3787>] do_filp_open+0x28/0x4b
 [<ffffffff81132b05>] ? __strncpy_from_user+0x2c/0x53
 [<ffffffff810a3430>] ? get_unused_fd_flags+0x8b/0x11f
 [<ffffffff810a37fb>] do_sys_open+0x51/0xd2
 [<ffffffff810a38a5>] sys_open+0x1b/0x1d
 [<ffffffff8100c052>] tracesys+0xd5/0xda


Code: 04 10 75 06 f0 ff 03 48 89 d8 fe 43 08 eb 34 41 fe 44 24 f0 48 8b 45 d0 48
8b 00 48 89 45 d0 48 8b 45 d0 48 85 c0 74 19 49 89 c4 <48> 8b 00 49 8d 5c 24 e8
44 39 7b 30 0f 18 08 75 d8 e9 65 ff ff 
RIP  [<ffffffff810b6273>] __d_lookup+0xf6/0x11d
 RSP <ffff810010f5fb08>
CR2: 00000000001200d2
---[ end trace c2ff9cce82980cc9 ]---
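
Editor's note: in the oops above, CR2 holds the address the kernel failed to access, and the byte marked `<48>` in the Code line begins the sequence `48 8b 00`, which decodes to `mov (%rax),%rax`. The sketch below, with hypothetical helper names, parses the register dump and lists which general-purpose registers hold the faulting address. It assumes the dmesg layout shown in this comment and the x86_64 convention that kernel addresses sit in the upper canonical half.

```python
import re

# Register lines copied from the dmesg oops in this comment.
OOPS = """\
RAX: 00000000001200d2 RBX: ffff810013db0ea0 RCX: 0000000000000011
RDX: 000000000001c256 RSI: ffff810010f5fbf8 RDI: ffff81003f443ea0
RBP: ffff810010f5fb58 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000003 R11: ffff81002508f390 R12: 00000000001200d2
R13: ffff81003f443ea0 R14: ffff810010f5fbf8 R15: 0000000099fa0098
CR2: 00000000001200d2 CR3: 0000000010e45000 CR4: 00000000000006e0
"""


def parse_registers(text):
    """Map register names (RAX, R12, CR2, ...) to their integer values."""
    pairs = re.findall(r"\b([A-Z][A-Z0-9]{1,2}): ([0-9a-f]{16})\b", text)
    return {name: int(value, 16) for name, value in pairs}


def registers_holding_fault_address(regs):
    """Names of registers, other than CR2 itself, equal to CR2."""
    fault = regs["CR2"]
    return sorted(n for n, v in regs.items() if v == fault and n != "CR2")


def is_canonical_kernel_address(addr):
    """True if addr lies in the upper canonical half used by the kernel."""
    return addr >= 0xFFFF800000000000


regs = parse_registers(OOPS)
print(registers_holding_fault_address(regs))
print(is_canonical_kernel_address(regs["CR2"]))
```

RAX and R12 both hold 0x1200d2, which is not a kernel address, so `__d_lookup` appears to have followed a corrupted pointer while walking its dentry list. A pointer like that can come from a software bug or from faulty hardware; the oops by itself does not distinguish the two.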
Comment 9 Daniel Qarras 2008-07-11 10:01:17 EDT
I see these same freezing problems every day on a Dell Precision WorkStation 380
which identifies its card as:

nVidia Corporation NV43GL [Quadro FX 540] rev 162

Public Smolt profile is at:

http://www.smolts.org/client/show_all/pub_d27d5dc8-5b71-4334-9d90-6a774979d681

If I use the NVidia binary-only driver, my machine works fine, but I'd like
to stick with the "nv" driver. That at least gives me evidence that the nv
driver is at fault and nothing else.

For me the nv driver causes the display to hang on an almost hourly basis!
After such a hang I can log in with ssh and everything seems to be normal,
but the display has frozen. I can even press keys and see that they still
"work" (e.g., if a hang happens while in a terminal, commands typed blindly,
like "sleep 100", can be observed from an ssh connection). I've tried some
kernel parameters like pci=nommconf but nothing seems to help. Based on
Google results my card variant seems to be rarely used; perhaps it has
received less testing than other variants?

I am willing to test any configuration options or patches that you might
suggest. Thanks!

PS. Please consider raising this bug's priority.
Comment 10 Daniel Qarras 2008-07-14 12:59:12 EDT
I am seeing the same kinds of freezes with the nouveau driver as well, so only
the binary driver works. I'll attach dmesg/lspci/Xorg.log from my system.
The dmesg and lspci output are of course almost identical with both drivers;
the Xorg.log is from the system running with the nv driver. Next time the
system freezes I'll check whether any errors are logged.
Comment 11 Daniel Qarras 2008-07-14 12:59:50 EDT
Created attachment 311734 [details]
dmesg output from Dell Precision 380
Comment 12 Daniel Qarras 2008-07-14 13:00:26 EDT
Created attachment 311735 [details]
lspci -vvv from Dell Precision 380
Comment 13 Daniel Qarras 2008-07-14 13:01:10 EDT
Created attachment 311736 [details]
Xorg.log with nv driver from Dell Precision 380 before freeze
Comment 14 Daniel Qarras 2008-07-15 11:58:02 EDT
Actually, with nouveau the system hangs even worse: it responds to ping but
won't let me in over ssh. With nv I can still enter commands blindly, but
with nouveau that does not seem to be the case. Also worth mentioning is that
if I log into the machine and kill Xorg etc., the display still shows my
jammed X session.

I'll attach the Xorg log from a nouveau session. It also generated an oops
once, which is available at:

http://www.kerneloops.org/submitresult.php?number=39343

As mentioned, I've tried using kernel parameters like pci=nommconf to no avail;
other suggestions would be welcome. I am using the latest BIOS available from Dell.

Lastly, I think I've been using Firefox every time these hangs have happened,
and afterwards I see something like this in dmesg:

npviewer.bin[3260]: segfault at 74646977 eip 0084b85f esp b6c615c0 error 4
npviewer.bin[3383]: segfault at 68632d6e eip 0084b85f esp b6c785a0 error 4
npviewer.bin[3476]: segfault at 67696568 eip 0084b85f esp b6c20540 error 4
npviewer.bin[3507]: segfault at 73776f72 eip 0084b85f esp 0816acb0 error 4
npviewer.bin[3566]: segfault at 73746962 eip 0084b85f esp 094f0d60 error 4
npviewer.bin[3661]: segfault at 68632d6e eip 0084b85f esp b6cc0500 error 4

PS. My hardware is 64-bit. I've tried both 32-bit and 64-bit installations
and the results are the same: the hangs happen several times a day.
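
Editor's note: the fault addresses in the npviewer.bin lines above are all made of printable ASCII bytes when read in little-endian order. The short Python sketch below (the function name is hypothetical) shows the conversion. It only reinterprets the numbers already in the log and makes no claim about what npviewer.bin was doing.

```python
def address_as_ascii(addr_hex):
    """Read a 32-bit address as four little-endian bytes of ASCII text.

    Returns the decoded string, or None if any byte is not printable.
    """
    raw = int(addr_hex, 16).to_bytes(4, "little")
    if all(32 <= b < 127 for b in raw):
        return raw.decode("ascii")
    return None


# Fault addresses copied from the npviewer.bin segfault lines.
FAULT_ADDRS = ["74646977", "68632d6e", "67696568",
               "73776f72", "73746962", "68632d6e"]

for addr in FAULT_ADDRS:
    print(addr, repr(address_as_ascii(addr)))
```

The addresses decode to "widt", "n-ch", "heig", "rows", "bits" and "n-ch". One possible reading is that string data was being used as a pointer inside the plugin process, which would be a userspace crash separate from the display hang; that is a guess from the numbers, not a confirmed diagnosis.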
Comment 15 Daniel Qarras 2008-07-15 11:59:00 EDT
Created attachment 311851 [details]
Xorg.log with nouveau driver from Dell Precision 380 before freeze
Comment 16 Daniel Qarras 2008-07-15 12:01:02 EDT
Hmm, reading the oopses from the original reporter one more time, I'm not sure
this is the same issue after all. Perhaps I'll open a new bug for my case
unless I hear otherwise. In any case, I've provided all the information I can
without some guidance on enabling more verbose debugging or something like that.

Thanks!
Comment 17 Daniel Qarras 2008-08-18 11:25:53 EDT
After comparing the original report with my own findings once again, it seems to me that these are two different problems after all. I'll open a new bug for my problem; feel free to ignore my comments in this bug.
Comment 18 Daniel Qarras 2008-08-18 11:46:04 EDT
FWIW, the new bug for my problem is Bug 459405.
Comment 19 Andy Lawrence 2008-10-11 11:55:03 EDT
Thought I posted this earlier, my bad!  This was caused by a defective stick of memory in my machine.  One that just happened to fry while booting F9 for the first time!

Closing as not a bug.

Andy
