Bug 601638

Summary: Enabled PCIe ASPM technology causes systems with GeForce 8+ to freeze randomly
Product: [Fedora] Fedora Reporter: Benny <mail2benny>
Component: xorg-x11-drv-nouveau Assignee: Ben Skeggs <bskeggs>
Status: CLOSED INSUFFICIENT_DATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium Docs Contact:
Priority: low    
Version: 14 CC: airlied, ajax, anton, bskeggs, dcantrell, dougsland, gansalmon, itamar, ivan.razuvaev, jfeeney, jonathan, kernel-maint
Target Milestone: --- Keywords: Triaged
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-06-24 13:00:07 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Benny 2010-06-08 11:25:05 UTC
Description of problem:


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:

Comment 1 Benny 2010-06-08 11:31:08 UTC
Sorry, something went wrong while posting...

First I encountered bug 590437. After working around it, the installation seemed to go normally. Whatever video driver I use, though, the system freezes up randomly while I work. SSH still works while the computer is frozen, but I cannot init 3, reboot, or halt. Over SSH I get loads of these messages:

(using the nvidia driver)

Message from syslogd@host at Jun 8 13:13:56
kernel:Stack:
kernel:Call trace:
kernel: <IRQ>
kernel: <EOI>
kernel:Code: e8 89 c0 0f b7 04 42 0f b7 c0 c3 89 d1 b8 00 00 00 00 39 96 84 02 00 00 76 11 48 8b 96 c0 02 00 00 89 c8 c1 e8 02 89 c0 8b 04 82 <f3> c3 39 96 90 02 00 00 76 0c 89 d2 48 8b 86 d8 02 00 00 88 0c

I tried the vesa driver (nomodeset, vga=0x31B), the nouveau driver, and the rpmfusion nvidia driver. Same results.

If I should provide any more info, I'll be happy to.

Comment 2 Benny 2010-06-08 11:38:41 UTC
ABRT detected the freeze and I sent the reports to kerneloops (using ABRT). It describes the bug as a soft lockup: CPU #n stuck for 61s! I can't seem to find the address where the submitted report can be viewed.

Comment 3 Benny 2010-06-09 13:12:35 UTC
BUG: soft lockup - CPU#2 stuck for 61s! [Xorg:2237]
Modules linked in: fuse cpufreq_ondemand acpi_cpufreq freq_table ipv6 saa7134_alsa mt352 saa7134_dvb videobuf_dvb dvb_core uinput mt20xx tea5767 tda9887 tda8290 tuner snd_hda_codec_realtek snd_hda_intel snd_hda_codec saa7134 ir_common snd_hwdep snd_seq snd_seq_device v4l2_common snd_pcm videodev v4l1_compat v4l2_compat_ioctl32 videobuf_dma_sg snd_timer videobuf_core 8139too ir_core snd 8139cp tveeprom soundcore nvidia(P) i2c_viapro i2c_core microcode shpchp mii snd_page_alloc pata_acpi ata_generic usb_storage pata_via sata_via [last unloaded: scsi_wait_scan]
CPU 2 
Pid: 2237, comm: Xorg Tainted: P           2.6.33.5-112.fc13.x86_64 #1 PT890-8237A/OEM
RIP: 0010:[<ffffffffa0472fff>]  [<ffffffffa0472fff>] _nv006601rm+0x20/0x22 [nvidia]
RSP: 0000:ffff880001f03c58  EFLAGS: 00003246
RAX: 00000000ffffffff RBX: ffff8800a7c0dc30 RCX: 0000000000000000
RDX: ffffc90016100000 RSI: ffff8800b01f0000 RDI: ffff8800375bc800
RBP: ffffffff8100a4d3 R08: ffff8800b6520000 R09: 0000000000000001
R10: 000000000000010e R11: 0000000000094d19 R12: ffff880001f03bd0
R13: ffff8800375bc800 R14: ffff8800b01f0000 R15: ffffffff8102022f
FS:  00007fa539460840(0000) GS:ffff880001f00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000002631e98 CR3: 00000000b03d5000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process Xorg (pid: 2237, threadinfo ffff8800b56be000, task ffff88009c0e9750)
Stack:
ffffffffa02f6136 0000000000000b68 000000000000016d ffff8800b01f0000
 ffff8800a2278b70 0000000000000200 ffffffffa02be687 ffff8800b56a5000
 ffff8800b01f0000 0000000000000200 ffff8800b02e3c00 ffff8800b0258800
Call Trace:

Comment 4 Benny 2010-06-09 13:25:58 UTC
Found same problem on nvnews.net: http://www.nvnews.net/vbulletin/showthread.php?t=149056

Comment 5 Ivan Razuvaev 2010-06-22 05:55:41 UTC
Try kernel option pcie_aspm=off. This fixes freezes for some people using GF8600 video cards.
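
For anyone trying this workaround, here is a minimal Python sketch for checking whether the option took effect on the running system. It assumes the usual Linux locations /proc/cmdline and /sys/module/pcie_aspm/parameters/policy; the sysfs file may be absent on kernels built without ASPM support. The helper names are made up for illustration and are not part of any existing tool.

```python
from pathlib import Path


def aspm_disabled_on_cmdline(cmdline: str) -> bool:
    """Return True if the kernel command line contains pcie_aspm=off."""
    return "pcie_aspm=off" in cmdline.split()


def active_aspm_policy(policy_text: str):
    """Return the bracketed (active) entry from the policy file, or None.

    The file lists every policy and marks the active one with brackets,
    for example: "[default] performance powersave".
    """
    for word in policy_text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return None


def read_text_or_none(path: str):
    """Read a file, returning None if it is missing or unreadable."""
    try:
        return Path(path).read_text()
    except OSError:
        return None


if __name__ == "__main__":
    # Both paths are assumptions about a typical Linux system.
    cmdline = read_text_or_none("/proc/cmdline")
    policy = read_text_or_none("/sys/module/pcie_aspm/parameters/policy")
    if cmdline is not None:
        print("pcie_aspm=off on command line:", aspm_disabled_on_cmdline(cmdline))
    if policy is not None:
        print("active ASPM policy:", active_aspm_policy(policy))
```

The option only lasts for the current boot unless it is also appended to the kernel line in the boot loader configuration.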

Comment 6 Benny 2010-06-22 08:49:44 UTC
Ivan, you are a miracle worker! The system passed several YouTube tests and has been stable for at least half an hour now. Since the bug occurs randomly I could just be lucky, but I've got a good feeling about this one. Will report back later.

Comment 7 Benny 2010-06-22 21:43:32 UTC
pcie_aspm=off really did the trick!!! Many thanks. 
Changing the bug name since we know what causes the bug.

Comment 8 Ivan Razuvaev 2010-06-23 05:43:22 UTC
I think the bug should be named something like "Enabled PCIe ASPM technology causes systems with PCIe GeForce 8+ freezes". There are several bug reports found by the keyword "aspm" on Bugzilla. Do we need to start a separate thread to put the emphasis on the ASPM error?

Comment 9 Benny 2010-06-23 09:39:53 UTC
Ok, I changed the name.

Comment 10 Chuck Ebbert 2010-07-22 02:36:35 UTC
Please try 2.6.34.1-20 from koji which will disable ASPM if the motherboard does not support it.
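
When testing a kernel like this, it can help to confirm whether ASPM is actually active on the video card's PCIe link. Below is a rough Python sketch that pulls the ASPM state out of `lspci -vv` output. It assumes the "LnkCtl: ASPM ...;" line format printed by pciutils, which may differ between versions, and that lspci was run as root so the capability lines are shown. The function name is hypothetical.

```python
import re

# Matches e.g. "LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes ..." (assumed format).
LNKCTL_RE = re.compile(r"LnkCtl:\s*ASPM\s+([^;]+);")


def aspm_states(lspci_output: str) -> dict:
    """Map each PCI address to the ASPM state reported on its LnkCtl line."""
    states = {}
    device = None
    for line in lspci_output.splitlines():
        if line and not line[0].isspace():
            # Unindented lines start a new device block, e.g. "01:00.0 VGA ...".
            device = line.split()[0]
        else:
            match = LNKCTL_RE.search(line)
            if match and device is not None:
                states[device] = match.group(1).strip()
    return states
```

Feed it the text captured from `lspci -vv`; a value of "Disabled" for the video card's address would suggest ASPM is off on that link.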

Comment 11 Benny 2010-07-22 16:16:18 UTC
I was about to press "Save Changes" to report that everything was fixed when my computer froze after half an hour of running 2.6.34.1-20.fc13.x86_64 without pcie_aspm=off.

Hence, this bug does not seem fixed with 2.6.34.1-20.fc13.x86_64.  

Another irritating thing with this kernel is that my external hard disks keep swapping device names on each boot (sdb1 becomes sdc1 and vice versa), but that is probably another bug, if a bug at all...

Comment 12 Benny 2011-02-26 15:25:25 UTC
Remains a problem in fc14...

Comment 13 Chuck Ebbert 2011-02-28 17:16:47 UTC
Looks like nouveau should be disabling aspm on these adapters?

Comment 14 Benny 2011-06-10 15:09:08 UTC
I don't have this problem any more in Fedora 15. I have a new video card, though... So I don't know if the software is fixed or the hardware is just more compatible now...

Old video card: nVidia GF8500 GT
New video card: nVidia GT218