Bug 602956

Summary: large pages in firefox locks up X
Product: [Fedora] Fedora
Reporter: Pierre Ossman <pierre-bugzilla>
Component: xorg-x11-drv-nouveau
Assignee: Ben Skeggs <bskeggs>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: low
Version: 13
CC: airlied, ajax, bskeggs, chemobejk, mcepl
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: All
OS: Linux
Fixed In Version: kernel-2.6.34.6-47.fc13
Doc Type: Bug Fix
Last Closed: 2010-08-30 18:23:26 UTC
Attachments:
Description    Flags
xorg.conf      none
Xorg.0.log     none
dmesg          none
messages       none

Description Pierre Ossman 2010-06-11 06:25:47 UTC
Since the upgrade to FC13, I've been getting hangs now and then. I finally correlated them with Firefox opening very large pages, so it seems like a buffer check or something similar is missing. I never had this problem on FC12.

At first it seems like only Xorg is affected: the mouse pointer still works and the machine is reachable over the network. Xorg is eating 100% CPU, and killing it (SIGKILL is needed) results in the entire machine hanging.

The nouveau kernel module logs messages like this whenever the machine gets hosed:

Jun 11 08:12:41 mjolnir kernel: [drm] nouveau 0000:01:00.0: PFIFO_DMA_PUSHER - Ch 2

Hardware is a Quadro NVS 140M.

Please have a look at this soon as this makes the machine very difficult to use as it might hang on you at any moment. :/

Comment 1 Matěj Cepl 2010-06-11 13:28:37 UTC
Thanks for the bug report.  We have reviewed the information you have provided above, and there is some additional information we require that will be helpful in our diagnosis of this issue.

Please add drm.debug=0x04 to the kernel command line, restart the computer, wait until Xorg freezes, and collect the following via ssh:

* your X server config file (/etc/X11/xorg.conf, if available),
* X server log file (/var/log/Xorg.*.log)
* output of the dmesg command, and
* system log (/var/log/messages)

and attach them to the bug report as individual uncompressed file attachments using the bugzilla file attachment link above.
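
The collection step above could be scripted roughly as follows. This is a minimal Python sketch, not part of the original request: the helper name `collect_debug_files`, its `root` and `run_dmesg` parameters, and the output directory are hypothetical. It assumes it is run as root over ssh while X is hung and that `dmesg` is on the PATH.

```python
import glob
import shutil
import subprocess
from pathlib import Path


def collect_debug_files(out_dir, root="/", run_dmesg=True):
    """Copy the requested files into out_dir and return their names.

    `root` exists only so the sketch can be tried against a fake tree.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    root = Path(root)

    # xorg.conf is optional; the system log and X server logs should exist.
    candidates = [root / "etc/X11/xorg.conf", root / "var/log/messages"]
    candidates += [
        Path(p) for p in sorted(glob.glob(str(root / "var/log/Xorg.*.log")))
    ]

    collected = []
    for src in candidates:
        if src.is_file():
            # Keep each file separate and uncompressed, as requested.
            shutil.copy(src, out / src.name)
            collected.append(src.name)

    if run_dmesg:
        # Save the kernel ring buffer as its own file.
        result = subprocess.run(
            ["dmesg"], capture_output=True, text=True, check=True
        )
        (out / "dmesg").write_text(result.stdout)
        collected.append("dmesg")

    return collected
```

Each file in the output directory can then be attached to the bug individually.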

We will review this issue again once you've had a chance to attach this information.

Thanks in advance.

Comment 2 Pierre Ossman 2010-06-11 17:17:55 UTC
Created attachment 423342 [details]
xorg.conf

Comment 3 Pierre Ossman 2010-06-11 17:18:31 UTC
Created attachment 423343 [details]
Xorg.0.log

Comment 4 Pierre Ossman 2010-06-11 17:18:49 UTC
Created attachment 423345 [details]
dmesg

Comment 5 Pierre Ossman 2010-06-11 17:19:11 UTC
Created attachment 423346 [details]
messages

Comment 6 Matěj Cepl 2010-06-14 16:44:59 UTC
Backtrace:
[   214.084] 0: /usr/bin/Xorg (xorg_backtrace+0x28) [0x49e708]
[   214.084] 1: /usr/bin/Xorg (mieqEnqueue+0x1f4) [0x49e0b4]
[   214.084] 2: /usr/bin/Xorg (xf86PostButtonEventP+0xcf) [0x477a3f]
[   214.085] 3: /usr/bin/Xorg (xf86PostButtonEvent+0xbe) [0x477b6e]
[   214.085] 4: /usr/lib64/xorg/modules/input/synaptics_drv.so (0x7ff04ec47000+0x3a12) [0x7ff04ec4aa12]
[   214.085] 5: /usr/lib64/xorg/modules/input/synaptics_drv.so (0x7ff04ec47000+0x5cc8) [0x7ff04ec4ccc8]
[   214.085] 6: /usr/bin/Xorg (0x400000+0x6aae7) [0x46aae7]
[   214.085] 7: /usr/bin/Xorg (0x400000+0x117b43) [0x517b43]
[   214.085] 8: /lib64/libc.so.6 (0x326d200000+0x32a20) [0x326d232a20]
[   214.085] 9: /usr/lib64/xorg/modules/drivers/nouveau_drv.so (0x7ff04f07c000+0x1f458) [0x7ff04f09b458]
[   214.085] 10: /usr/lib64/xorg/modules/drivers/nouveau_drv.so (0x7ff04f07c000+0x2093d) [0x7ff04f09c93d]
[   214.085] 11: /usr/lib64/xorg/modules/libexa.so (0x7ff04e5e7000+0x9080) [0x7ff04e5f0080]
[   214.085] 12: /usr/lib64/xorg/modules/libexa.so (0x7ff04e5e7000+0xeac8) [0x7ff04e5f5ac8]
[   214.085] 13: /usr/bin/Xorg (0x400000+0xd21e0) [0x4d21e0]
[   214.085] 14: /usr/bin/Xorg (0x400000+0xcb91e) [0x4cb91e]
[   214.085] 15: /usr/bin/Xorg (0x400000+0x2c32c) [0x42c32c]
[   214.085] 16: /usr/bin/Xorg (0x400000+0x219ca) [0x4219ca]
[   214.085] 17: /lib64/libc.so.6 (__libc_start_main+0xfd) [0x326d21ec5d]
[   214.085] 18: /usr/bin/Xorg (0x400000+0x21579) [0x421579]
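
For the unsymbolized frames in the backtrace above, the address in square brackets is the module's load base plus the offset shown in parentheses, so the offset is the value one would look up in that module's debug information. The following Python sketch only illustrates that arithmetic. The helper name `parse_frame` is hypothetical, and the frame format is inferred from the lines above rather than from the X server source.

```python
import re

# Matches frames such as:
#   9: /usr/lib64/.../nouveau_drv.so (0x7ff04f07c000+0x1f458) [0x7ff04f09b458]
FRAME_RE = re.compile(
    r"(?P<index>\d+): (?P<module>\S+) "
    r"\((?P<base>0x[0-9a-f]+)\+(?P<offset>0x[0-9a-f]+)\) "
    r"\[(?P<addr>0x[0-9a-f]+)\]"
)


def parse_frame(line):
    """Return (module, offset) for an unsymbolized frame, or None.

    Frames that already carry a symbol name (for example
    mieqEnqueue+0x1f4) do not match and yield None.
    """
    m = FRAME_RE.search(line)
    if m is None:
        return None
    base = int(m["base"], 16)
    offset = int(m["offset"], 16)
    addr = int(m["addr"], 16)
    # Sanity check: load base + offset must equal the absolute address.
    if base + offset != addr:
        raise ValueError("inconsistent frame: " + line)
    return m["module"], offset
```

Applied to frame 9, this yields the module path of nouveau_drv.so and the offset 0x1f458.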

Comment 7 Pierre Ossman 2010-08-05 12:34:21 UTC
Anyone had a chance to look at this? It seems to be less frequent with current updates, but it still happens now and then.

Comment 8 Ben Skeggs 2010-08-05 12:40:39 UTC
Ah, the cause is known and a patch available.  I'll fix it in F13 in the morning.

Comment 9 Ben Skeggs 2010-08-06 01:17:03 UTC
Can you give this build a try please and see how you go: http://koji.fedoraproject.org/koji/taskinfo?taskID=2383661

Comment 10 Stefan Becker 2010-08-06 11:48:50 UTC
This looks suspiciously like bug #609764 or bug #566987.

I'm running the new kernel on my Dell T3500 where the older kernels would freeze up within 10 minutes. Works fine up to now, keeping my fingers crossed...

Comment 11 Fedora Update System 2010-08-07 05:00:27 UTC
kernel-2.6.34.2-34.fc13 has been submitted as an update for Fedora 13.
http://admin.fedoraproject.org/updates/kernel-2.6.34.2-34.fc13

Comment 12 Fedora Update System 2010-08-07 23:28:22 UTC
kernel-2.6.34.2-34.fc13 has been pushed to the Fedora 13 testing repository.  If problems still persist, please make note of it in this bug report.
 If you want to test the update, you can install it with 
 su -c 'yum --enablerepo=updates-testing update kernel'.  You can provide feedback for this update here: http://admin.fedoraproject.org/updates/kernel-2.6.34.2-34.fc13

Comment 13 Ben Skeggs 2010-08-08 22:51:29 UTC
(In reply to comment #10)
> This looks suspiciously like bug #609764 or bug #566987.
> 
> I'm running the new kernel on my Dell T3500 where the older kernels would
> freeze up within 10 minutes. Works fine up to now, keeping my fingers crossed...

This particular issue should only occur if something's chewed up all your VRAM. If that's what's happening for you, then it's definitely a possible candidate.

Comment 14 Ben Skeggs 2010-08-09 04:47:10 UTC
(In reply to comment #13)
> (In reply to comment #10)
> > This looks suspiciously like bug #609764 or bug #566987.
> > 
> > I'm running the new kernel on my Dell T3500 where the older kernels would
> > freeze up within 10 minutes. Works fine up to now, keeping my fingers crossed...
> 
> This particular issue should only occur if something's chewed up all your VRAM.
> If that's what's happening for you, then it's definitely a possible candidate.

I just tested with KDE.  It actually appears that KDE does indeed use a massive amount of VRAM compared to Gnome, so it's entirely possible that you're hitting this bug if you're using KDE.

Comment 15 Stefan Becker 2010-08-09 05:36:55 UTC
(In reply to comment #14)
> 
> I just tested with KDE.  It actually appears that KDE does indeed use a massive
> amount of VRAM compared to Gnome, so it's entirely possible that you're hitting
> this bug if you're using KDE.    

I didn't check on the desktop if the corrupted KDE icons bug #591570 is fixed with this kernel. It is still running on my laptop so I'll check there.

While this new kernel didn't fix the lockup bug #609764, the machine behaved differently. Before, there was no chance of getting the card back to work again without a hard reboot. With this kernel, the framebuffer console came back at X server reset and the machine shut down normally.

I'm using KDE4 with kwin4 running XRender composite. OpenGL from the nouveau mesa driver is still too incomplete, so kwin4 rejects it. As far as I can tell all my lockups happened when some popup window appeared (window menus, menus in task bar).

Comment 16 Pierre Ossman 2010-08-09 21:28:31 UTC
I've installed kernel-2.6.34.2-34.fc13 and it seems to solve at least one test case I managed to produce.

Comment 17 Fedora Update System 2010-08-10 23:53:27 UTC
kernel-2.6.34.3-37.fc13 has been submitted as an update for Fedora 13.
http://admin.fedoraproject.org/updates/kernel-2.6.34.3-37.fc13

Comment 18 Fedora Update System 2010-08-11 07:25:48 UTC
kernel-2.6.34.3-37.fc13 has been pushed to the Fedora 13 testing repository.  If problems still persist, please make note of it in this bug report.
 If you want to test the update, you can install it with 
 su -c 'yum --enablerepo=updates-testing update kernel'.  You can provide feedback for this update here: http://admin.fedoraproject.org/updates/kernel-2.6.34.3-37.fc13

Comment 19 Pierre Ossman 2010-08-11 15:32:47 UTC
Seems like there is some issue remaining. I got a hang today again. Nothing in Xorg.0.log.old, but I got this in messages:

Aug 11 17:05:18 mjolnir kernel: [drm] nouveau 0000:01:00.0: PGRAPH_TRAP - Ch 2/5 Class 0x8297 Mthd 0x15e0 Data 0x00000000:0x00000000
Aug 11 17:05:18 mjolnir kernel: [drm] nouveau 0000:01:00.0: PGRAPH_TRAP_TPDMA - no VM fault?
Aug 11 17:05:18 mjolnir kernel: [drm] nouveau 0000:01:00.0: PGRAPH_TRAP_TPDMA - TP0: Unhandled ustatus 0x00000008
Aug 11 17:05:18 mjolnir kernel: [drm] nouveau 0000:01:00.0: PFIFO_DMA_PUSHER - Ch 2
Aug 11 17:05:18 mjolnir kernel: [drm] nouveau 0000:01:00.0: PGRAPH_DATA_ERROR - Ch 2/5 Class 0x8297 Mthd 0x0fa4 Data 0x00000000:0x0008ae04
Aug 11 17:05:18 mjolnir kernel: [drm] nouveau 0000:01:00.0: PGRAPH_DATA_ERROR - INVALID_BITFIELD
Aug 11 17:05:18 mjolnir kernel: [drm] nouveau 0000:01:00.0: PGRAPH_DATA_ERROR - Ch 2/5 Class 0x8297 Mthd 0x0fa8 Data 0x00000000:0x0151014d
Aug 11 17:05:18 mjolnir kernel: [drm] nouveau 0000:01:00.0: PGRAPH_DATA_ERROR - INVALID_VALUE
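
To compare messages like these across hangs, the error name and channel can be pulled out of each line. The following Python sketch is illustrative only: the helper name `parse_nouveau_line` is hypothetical, and the pattern is derived solely from the lines quoted in this report, not from the driver's source. The part after the slash in "Ch 2/5" is left uninterpreted in the detail text.

```python
import re

# Pattern inferred from the syslog lines quoted above, e.g.
#   ... kernel: [drm] nouveau 0000:01:00.0: PGRAPH_TRAP - Ch 2/5 Class 0x8297 ...
LINE_RE = re.compile(
    r"\[drm\] nouveau (?P<pci>[0-9a-f:.]+): "
    r"(?P<error>[A-Z_]+) - (?P<detail>.*)$"
)
CHANNEL_RE = re.compile(r"^Ch (?P<channel>\d+)")


def parse_nouveau_line(line):
    """Split one nouveau log line into device, error name, and detail.

    Returns None for lines that do not look like nouveau messages.
    """
    m = LINE_RE.search(line)
    if m is None:
        return None
    detail = m["detail"].strip()
    ch = CHANNEL_RE.match(detail)
    return {
        "pci": m["pci"],
        "error": m["error"],
        # Only some messages name a channel; others carry free text.
        "channel": int(ch["channel"]) if ch else None,
        "detail": detail,
    }
```

Grouping the parsed lines by the "error" field makes it easy to see that PFIFO_DMA_PUSHER appears both in the original report and in this later hang.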

Comment 20 Fedora Update System 2010-08-27 11:23:19 UTC
kernel-2.6.34.6-47.fc13 has been submitted as an update for Fedora 13.
https://admin.fedoraproject.org/updates/kernel-2.6.34.6-47.fc13

Comment 21 Stefan Becker 2010-08-27 14:28:39 UTC
(In reply to comment #19)
> Seems like there is some issue remaining. I got a hang today again. Nothing in
> Xorg.0.log.old, but I got this in messages:
> 
> Aug 11 17:05:18 mjolnir kernel: [drm] nouveau 0000:01:00.0: PGRAPH_TRAP - Ch
> 2/5 Class 0x8297 Mthd 0x15e0 Data 0x00000000:0x00000000

Your problem is probably fixed and now you see bug #566987 or bug #609764.

IMvvvHO 2.6.34 is a dud (radeon suspend & hibernate broken, worse powertop results than in .33 or .35) and F13 should go to .35 directly.

Comment 22 Fedora Update System 2010-08-30 18:22:06 UTC
kernel-2.6.34.6-47.fc13 has been pushed to the Fedora 13 stable repository.  If problems still persist, please make note of it in this bug report.