Bug 751753
Summary: | [abrt] kernel: BUG: unable to handle kernel paging request at bffffffc
---|---
Product: | [Fedora] Fedora
Reporter: | Jozef Mlich <jmlich>
Component: | xorg-x11-drv-nouveau
Assignee: | Ben Skeggs <bskeggs>
Status: | CLOSED ERRATA
QA Contact: | Fedora Extras Quality Assurance <extras-qa>
Severity: | unspecified
Priority: | unspecified
Version: | 16
CC: | airlied, ajax, bibo, brian, bskeggs, cje, fedora, gansalmon, hhorak, itamar, jaakko.airo, jonathan, kernel-maint, madhu.chinakonda, mcepl, redhat, szen012, trevor
Target Milestone: | ---
Target Release: | ---
Hardware: | i686
OS: | Unspecified
Whiteboard: | abrt_hash:a8b0a2bdc3fad7b0e288fb40e213916438f21b64
Fixed In Version: | kernel-3.1.4-1.fc16
Doc Type: | Bug Fix
Story Points: | ---
Last Closed: | 2011-12-06 01:02:20 UTC
Type: | ---
Regression: | ---
Mount Type: | ---
Documentation: | ---
Category: | ---
oVirt Team: | ---
Cloudforms Team: | ---
Description
Jozef Mlich
2011-11-07 13:03:44 UTC
Created attachment 532036 [details]
File: IMG_1578.jpg
Created attachment 532037 [details]
File: Xorg.0.log
Thanks for the bug report. We have reviewed the information you have provided above, and there is some additional information we require that will be helpful in our diagnosis of this issue.

Please add drm.debug=0x04 to the kernel command line, restart the computer, and attach

* your X server config file (/etc/X11/xorg.conf, if available),
* X server log file (/var/log/Xorg.*.log*; check with grep Backtrace /var/log/Xorg* which logs might be the most interesting ones, send us at least Xorg.0.log)
* output of the dmesg command, and
* system log (/var/log/messages)

to the bug report as individual uncompressed file attachments using the bugzilla file attachment link above.

We will review this issue again once you've had a chance to attach this information.

Thanks in advance.

Created attachment 532070 [details]
Xorg.0.log
modified grub2 settings as follows
[imlich@pcmlich ~]$ cat /etc/default/grub
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="Fedora"
GRUB_DEFAULT=saved
GRUB_CMDLINE_LINUX="rd.md=0 rd.lvm=0 rd.dm=0 KEYTABLE=us quiet SYSFONT=latarcyrheb-sun16 rhgb rd.luks=0 LANG=en_US.UTF-8 drm.debug=0x04"
/etc/X11/xorg.conf is not available
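The change shown above can of course be made by hand in an editor. As a minimal sketch (not taken from the report), the helper below appends a parameter to the GRUB_CMDLINE_LINUX line of a grub defaults file unless it is already there. The function name add_kernel_arg is hypothetical, and the sketch assumes GNU sed and a double-quoted, single-line GRUB_CMDLINE_LINUX value like the one in the listing.

```shell
#!/bin/sh
# Hypothetical helper (not from the bug report): append a kernel parameter to
# the GRUB_CMDLINE_LINUX line of a grub defaults file, unless already present.
# Assumes GNU sed, a double-quoted value on one line, and an argument that
# contains no '|', '&' or backslash characters.
add_kernel_arg() {
    grub_file=$1
    kernel_arg=$2

    # Nothing to do if the parameter is already on the line.
    if grep '^GRUB_CMDLINE_LINUX=' "$grub_file" | grep -qF -- "$kernel_arg"; then
        return 0
    fi

    # Insert the parameter just before the closing double quote.
    sed -i "s|^\(GRUB_CMDLINE_LINUX=\".*\)\"\$|\1 $kernel_arg\"|" "$grub_file"
}

# Example (run as root on the real file, then regenerate the config and reboot):
#   add_kernel_arg /etc/default/grub drm.debug=0x04
#   grub2-mkconfig -o /boot/grub2/grub.cfg
```

After the reboot, `cat /proc/cmdline` should list the new parameter. The output path /boot/grub2/grub.cfg is the usual location on BIOS-based Fedora installs; treat it as an assumption on other setups.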
Created attachment 532071 [details]
/var/log/messages
Created attachment 532072 [details]
dmesg output
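To check quickly whether a saved dmesg or /var/log/messages file contains this oops, one option is to search for the signature from the bug summary. The sketch below is an illustration only: the function name find_paging_oops is hypothetical, and the pattern assumes the log uses the same wording as the summary line.

```shell
#!/bin/sh
# Hypothetical helper (not from the bug report): print every line, with its
# line number, that carries the oops signature quoted in the bug summary.
# Returns non-zero when nothing matches, like grep itself.
find_paging_oops() {
    grep -n 'BUG: unable to handle kernel paging request' "$@"
}

# Example:
#   dmesg > dmesg.txt
#   find_paging_oops dmesg.txt /var/log/messages
```

When more than one file is given, grep prefixes each match with the file name, which makes it easier to tell which log to attach.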
I have hit a similar-looking oops three times: "unable to handle kernel paging request at bffffffc", ":IP: [<f87159e0>] nouveau_sgdma_clear+0x26/0x45 [nouveau]", while loading the page http://www.computeraudiophile.com/content/HD-music-fft-atlas-reference-thread in Firefox, or soon afterwards. My hardware's smolt page is at http://www.smolts.org/client/show/pub_eafd4357-c35b-4b7f-8ead-ce167a9052c0

F16 up-to-date on a Dell Inspiron 9300 laptop with NV41.8 [GeForce Go 6800].

*** Bug 755149 has been marked as a duplicate of this bug. ***

Package: kernel
Architecture: i686
OS Release: Fedora release 16 (Verne)

Comment
-----
Not know...

Package: kernel
Architecture: i686
OS Release: Fedora release 16 (Verne)

Comment
-----
Not know...

Another "me too". The only (useful) thing I have to add is my graphics card type, which is NV44:

08:00.0 VGA compatible controller: nVidia Corporation NV44 [Quadro NVS 285] (rev a1)

*** Bug 753066 has been marked as a duplicate of this bug. ***

*** Bug 753563 has been marked as a duplicate of this bug. ***

And here is a "me too" from me. The graphics card, as lspci reports it, is the following:

:(4318:471:6058:8293) pci, nouveau, VIDEO, G72M [Quadro NVS 110M/GeForce Go 7300]

I've managed to reproduce this exactly once so far, so progress in tracking this down is slow. I'm working on it though.

FWIW i'm pretty certain i'm getting this daily on:

nVidia Corporation NV34 [GeForce FX 5500] [10de:0326] (rev a1)

(In reply to comment #15)
> I've managed to reproduce this exactly once so far, so progress in tracking
> this down is slow. I'm working on it though.

I got the oops now for the fifth and sixth time by loading the page

http://www.computeraudiophile.com/content/HD-music-fft-atlas-reference-thread

in Firefox and maybe jumping to page 2 in the thread, and maybe reloading the page 1 again. FWIW.

A temporary workaround for me has been forcing fallback mode in Gnome 3 (System Settings > System Info > Graphics > Forced Fallback Mode.)
I suspect disabling nouveau's hardware acceleration will also work for other environments (appending nouveau.noaccel=1 to the kernel options.) I've not experienced any freezes since fallback mode was turned on.

lspci:
01:00.0 VGA compatible controller: nVidia Corporation NV41GL [Quadro FX 1400] (rev a2)

(In reply to comment #17)
> (In reply to comment #15)
> > I've managed to reproduce this exactly once so far, so progress in tracking
> > this down is slow. I'm working on it though.
>
> I got the oops now for the fifth and sixth time by loading the page
>
> http://www.computeraudiophile.com/content/HD-music-fft-atlas-reference-thread
>
> in Firefox and maybe jumping to page 2 in the thread, and maybe reloading the
> page 1 again. FWIW.

Yep, that was the page that allowed me to reproduce any issue at all. The trouble is that what I was seeing was a complete hard lockup, not even netconsole was useful - which makes it very difficult to debug this when you can get zero output about what's going on. I got the lockup once, randomly.

Until today. I decided to take a different approach and tracked the issue as occurring between 3.1.0-0.rc6.git0.3.fc16.i686, and 3.1.0-0.rc7.git0.0.fc16.i686. The former seemed to work correctly, and the latter gives me the same backtrace you're all seeing. So, I have a starting point now at least :)

(In reply to comment #19)
> (In reply to comment #17)
> > (In reply to comment #15)
> > > I've managed to reproduce this exactly once so far, so progress in tracking
> > > this down is slow. I'm working on it though.
> >
> > I got the oops now for the fifth and sixth time by loading the page
> >
> > http://www.computeraudiophile.com/content/HD-music-fft-atlas-reference-thread
> >
> > in Firefox and maybe jumping to page 2 in the thread, and maybe reloading the
> > page 1 again. FWIW.
>
> Yep, that was the page that allowed me to reproduce any issue at all.
> The trouble is that what I was seeing was a complete hard lockup, not even
> netconsole was useful - which makes it very difficult to debug this when you
> can get zero output about what's going on. I got the lockup once, randomly.

Err, got the *backtrace* once, randomly.

> Until today. I decided to take a different approach and tracked the issue as
> occurring between 3.1.0-0.rc6.git0.3.fc16.i686, and
> 3.1.0-0.rc7.git0.0.fc16.i686. The former seemed to work correctly, and the
> latter gives me the same backtrace you're all seeing. So, I have a starting
> point now at least :)

Ok. I think I have a solution. This has only been tested on top of the first kernel I found that was bad (3.1.0-0.rc7.git0.0.fc16.i686), but I've prepared a build on top of the latest f16 kernel.

http://koji.fedoraproject.org/koji/taskinfo?taskID=3539966

It hasn't finished building yet so keep an eye out for that, and let me know how you fare!

On Monday I'll confirm myself the fix works on top of latest git, and the latest F16 kernel, and get it sorted out properly.

i'm using that kernel and tried looking at the killer web page from comment 17 - and got a new (i think) crash on my FX5500. i wrote down what i hope is the most helpful bit of the error but it shouldn't be too hard to reproduce if you need more data.

so, from the error:

EIP is at kthread_data+0xf/0x13
...
Call Trace:
wq_worker_sleeping

kernel-3.1.3-0.rc1.1.fc16.i686 still gives the oops. NV41.8 [GeForce Go 6800] (rev a2).

(In reply to comment #23)
> kernel-3.1.3-0.rc1.1.fc16.i686 still gives the oops. NV41.8 [GeForce Go 6800]
> (rev a2).

The version of the kernel I posted, or the vanilla f16 kernel of the same version?

(In reply to comment #24)
> (In reply to comment #23)
> > kernel-3.1.3-0.rc1.1.fc16.i686 still gives the oops. NV41.8 [GeForce Go 6800]
> > (rev a2).
>
> The version of the kernel I posted, or the vanilla f16 kernel of the same
> version?
I managed to reproduce a second instance of this bug, with a different backtrace (this is important to know). So, it's possible this second case is what you were seeing.

I've put the patch covering both instances into an "official" fedora kernel build[1], once again it hasn't finished building at the time of posting this, so keep an eye out.

Thanks,
Ben.

[1] http://koji.fedoraproject.org/koji/taskinfo?taskID=3545261

(In reply to comment #22)
> i'm using that kernel and tried looking at the killer web page from comment 17
> - and got a new (i think) crash on my FX5500. i wrote down what i hope is the
> most helpful bit of the error but it shouldn't be too hard to reproduce if you
> need more data.
>
> so, from the error:
>
> EIP is at kthread_data+0xf/0x13
> ...
> Call Trace:
> wq_worker_sleeping

I'm not sure this is the same bug, was there any more of a backtrace than this?

(In reply to comment #26)
> (In reply to comment #22)
> I'm not sure this is the same bug, was there any more of a backtrace than this?

there was but i didn't manage to capture it. i'm going to enable kdump to help if it happens again.

speaking of which, i've installed your 3.1.3-2 kernel and have been happily refreshing the 'killer' page for a while without any problems. i'll try playing with some heavy 3D apps etc - see what it can cope with.

(In reply to comment #24)
> (In reply to comment #23)
> > kernel-3.1.3-0.rc1.1.fc16.i686 still gives the oops. NV41.8 [GeForce Go 6800]
> > (rev a2).
>
> The version of the kernel I posted, or the vanilla f16 kernel of the same
> version?
In comment #23 I tested and got the oops like the one in comment #22

$ rpm -qi kernel-3.1.3-0.rc1.1.fc16 | grep 'Build Date'
Build Date : Fri 25 Nov 2011 10:17:08 AM EET

Today I tested

$ rpm -qi kernel-3.1.1-2.fc16.i686 | grep 'Build Date'
Build Date : Mon 14 Nov 2011 07:16:41 PM EET

I got once "[drm] nouveau 0000:01:00.0: GPU lockup - switching to software fbcon" and once "EIP is at nouveau_fence_update+0xe/0xa5 [nouveau]". But this is likely a separate bug. To get this, I have glxgears running, and I was loading/reloading the fft-atlas page mentioned above, and hitting Ctrl-Alt-PgUp/PgDn randomly. These were much harder to hit than the "EIP nouveau_sgdma_clear".

(In reply to comment #28)
> (In reply to comment #24)
> > (In reply to comment #23)
> > > kernel-3.1.3-0.rc1.1.fc16.i686 still gives the oops. NV41.8 [GeForce Go 6800]
> > > (rev a2).
> >
> > The version of the kernel I posted, or the vanilla f16 kernel of the same
> > version?
>
> In comment #23 I tested and got the oops like the one in comment #22
> $ rpm -qi kernel-3.1.3-0.rc1.1.fc16 | grep 'Build Date'
> Build Date : Fri 25 Nov 2011 10:17:08 AM EET
>
> Today I tested
> $ rpm -qi kernel-3.1.1-2.fc16.i686 | grep 'Build Date'
> Build Date : Mon 14 Nov 2011 07:16:41 PM EET
>
> I got once "[drm] nouveau 0000:01:00.0: GPU lockup - switching to software
> fbcon" and once "EIP is at nouveau_fence_update+0xe/0xa5 [nouveau]". But this
> is likely a separate bug. To get this, I have glxgears running, and I was
> loading/reloading the fft-atlas page mentioned above, and hitting
> Ctrl-Alt-PgUp/PgDn randomly. These were much harder to hit than the "EIP
> nouveau_sgdma_clear".

Yep, a completely separate bug here. If you could file a new one and post the entire backtrace (hopefully it appears in your kernel log), that'll be helpful.
(In reply to comment #28)
> Today I tested
> $ rpm -qi kernel-3.1.1-2.fc16.i686 | grep 'Build Date'
> Build Date : Mon 14 Nov 2011 07:16:41 PM EET

The kernel I tested was the new candidate, not the stock Fedora 3.1.1

$ rpm -qi kernel-3.1.3-2.fc16 | grep 'Build Date'
Build Date : Mon 28 Nov 2011 06:22:51 AM EET

New bug is filed at https://bugzilla.redhat.com/show_bug.cgi?id=757989

kernel-3.1.4-1.fc16 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/kernel-3.1.4-1.fc16

kernel-3.1.4-1.fc16 works for me so far. No crashes when I visit and refresh the "killer website."

yup, kernel-3.1.4-1.fc16.i686 is looking good to me too. still get one or two 'lock ups' (mouse moves but nothing else happens) per day but looks like that's a separate issue. thanks Ben! :-)

Package kernel-3.1.4-1.fc16:
* should fix your issue,
* was pushed to the Fedora 16 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-3.1.4-1.fc16'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2011-16645/kernel-3.1.4-1.fc16
then log in and leave karma (feedback).

kernel-3.1.4-1.fc16 has been pushed to the Fedora 16 stable repository. If problems still persist, please make note of it in this bug report.

So far no crashes in 4 days. However, sometimes takes 7 days to crash so will report back.

"Killer web page" didn't kill me now (but I didn't test pre-3.1.4 for comparison).
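For anyone checking whether a machine already has the fix, the running kernel can be compared with the version in the "Fixed In Version" field (kernel-3.1.4-1.fc16). The following is a rough sketch rather than a proper RPM version comparison: the function name kernel_has_fix is hypothetical, it relies on GNU `sort -V`, and it only looks at the part of the version string before ".fc".

```shell
#!/bin/sh
# Version-release of the fixed kernel, from "Fixed In Version: kernel-3.1.4-1.fc16".
FIXED_VR="3.1.4-1"

# Hypothetical helper (not from the bug report): succeed if the given kernel
# version string (e.g. the output of `uname -r`) is at least FIXED_VR.
# Approximation: uses GNU `sort -V`, not RPM's own comparison rules.
kernel_has_fix() {
    candidate_vr=${1%%.fc*}   # drop the ".fc16.i686" style suffix
    lowest=$(printf '%s\n%s\n' "$FIXED_VR" "$candidate_vr" | sort -V | head -n 1)
    [ "$lowest" = "$FIXED_VR" ]
}

# Example:
#   kernel_has_fix "$(uname -r)" && echo "running kernel includes the fix"
```

A version check like this cannot recognise the patched test builds mentioned in the thread (such as the 3.1.3-2 candidate), since their version numbers are lower than the official fixed release.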