Bug 730582
Field: | Value
---|---
Summary: | suspend/resume crashes with nouveau and freezes my system completely
Product: | [Fedora] Fedora
Component: | xorg-x11-drv-nouveau
Version: | 15
Hardware: | x86_64
OS: | Linux
Status: | CLOSED ERRATA
Severity: | unspecified
Priority: | unspecified
Reporter: | Mr-4 <mr.dash.four>
Assignee: | Ben Skeggs <bskeggs>
QA Contact: | Fedora Extras Quality Assurance <extras-qa>
CC: | aeriksson, airlied, ajax, bskeggs, gansalmon, itamar, jan.public, jonathan, kernel-maint, madhu.chinakonda, pawelprazak
Doc Type: | Bug Fix
Last Closed: | 2011-10-31 20:12:25 UTC
Description
Mr-4
2011-08-14 16:49:58 UTC
I can confirm this problem on my ancient P-II machine with an NV17 card (actually, running gentoo -stable). X11 starts up fine, all VTs work fine. After the first hibernation cycle, X11 on vt7 still works OK, but any attempt to switch to another VT results in the above logs and a blank VT. Switching back to vt7 brings back a functioning X11. This is on vanilla 3.0.1 kernel.

Possible solution: Replace (compile and install) the Nouveau DRM driver which exists in the kernel with the one from the Nouveau web site following this guide: http://nouveau.freedesktop.org/wiki/InstallDRM

So, I suppose the kernel maintainers need to get off their backsides and sync the Nouveau tree with the Fedora kernel one to bring in the latest Nouveau code updates.

Please note that simply compiling and installing Nouveau from the above link *won't work* because in some cases (mine included) the Nouveau DRM driver is also included in initramfs as there are Plymouth dependencies there which have to be satisfied. What needs to be done in this case is:

0. Back up your old initramfs;
1. Unpack initramfs in some temporary directory;
2. Copy the 6 files installed by the DRM guide above to the same directory where all files from the initramfs image were unpacked in step 1 above;
3. Re-package initramfs again and install it in its place (normally /boot).

Once this is done everything should be OK - I've had 7/7 hibernation/restore cycles since then and no problems were encountered.

(In reply to comment #2)
> Possible solution: Replace (compile and install) the Nouveau DRM driver which
> exists in the kernel with the one from the Nouveau web site following this
> guide: http://nouveau.freedesktop.org/wiki/InstallDRM

I fixed this problem *very* recently in the nouveau tree. It's queued for Linux 3.1, but didn't make it for 3.0, hence not being in the 2.6.40 tree. I'll look today at how invasive it'd be to fix in the F15 kernel.

>
> So, I suppose the kernel maintainers need to get off their backsides and sync
> the Nouveau tree with the Fedora kernel one to bring in the latest Nouveau code
> updates.

That's generally not a good plan. The Nouveau tree at any given time can be horrifically unstable, and contain stuff that's raw and untested. I (both upstream nouveau and Fedora nouveau maintainer FWIW) get the stable patches into the upstream kernel tree, which in turn end up in Fedora the same way.

>
> Please note that simply compiling and installing Nouveau from the above link
> *won't work* because in some cases (mine included) Nouveau DRM driver is also
> included in initramfs as there are Plymouth dependencies there which have to be
> satisfied. What needs to be done in this case is:
>
> 0. Backup your old initramfs
> 1. Unpack initramfs in some temporary directory;
> 2. Copy the 6 files installed by the DRM guide above to the same directory
> where all files from the initramfs image were unpacked in step 1 above;
> 3. Re-package initramfs again and install it in its place (normally /boot)
>
> Once this is done everything should be OK - I've had 7/7 hibernation/restore
> cycles since then and no problems were encountered.

(In reply to comment #3)
> I fixed this problem *very* recently in the nouveau tree. It's queued for
> Linux 3.1, but didn't make it for 3.0, hence not being in the 2.6.40 tree.
> I'll look today at how invasive it'd be to fix in the F15 kernel.

It shouldn't pose any problems as I have been using the DRM drivers from the Nouveau tree on 2.6.40-(1-4) for more than 10 days now and had no issues so far - it works 100% as far as hibernate/restore is concerned.

> > So, I suppose the kernel maintainers need to get off their backsides and sync
> > the Nouveau tree with the Fedora kernel one to bring in the latest Nouveau code
> > updates.
> That's generally not a good plan.
> The Nouveau tree at any given time can be
> horrifically unstable, and contain stuff that's raw and untested. I (both
> upstream nouveau and Fedora nouveau maintainer FWIW) get the stable patches
> into the upstream kernel tree, which in turn end up in Fedora the same way.

As I have indicated in the initial bug report (above), the existing DRM driver (the one which comes with the 2.6.40/3.0.0 Fedora kernel) is not working - never has!

Hibernate/restore always freezes my system - without fail - and judging by the comments written in this bug report I am not the only one! So, I don't see how allowing this abomination in mainstream is "a good plan", particularly given the fact that I did not experience these issues with the 2.6.38 version of the Fedora kernel.

(In reply to comment #4)
> (In reply to comment #3)
> > I fixed this problem *very* recently in the nouveau tree. It's queued for
> > Linux 3.1, but didn't make it for 3.0, hence not being in the 2.6.40 tree.
> > I'll look today at how invasive it'd be to fix in the F15 kernel.
> It shouldn't pose any problems as I have been using the DRM drivers from the
> Nouveau tree on 2.6.40-(1-4) for more than 10 days now and had no issues so far
> - it works 100% as far as hibernate/restore is concerned.

For *you*. And it's not that straightforward. It'll be fixed somehow, but the same patches that are in upstream may not necessarily be appropriate.

> > > So, I suppose the kernel maintainers need to get off their backsides and sync
> > > the Nouveau tree with the Fedora kernel one to bring in the latest Nouveau code
> > > updates.
> > That's generally not a good plan. The Nouveau tree at any given time can be
> > horrifically unstable, and contain stuff that's raw and untested. I (both
> > upstream nouveau and Fedora nouveau maintainer FWIW) get the stable patches
> > into the upstream kernel tree, which in turn end up in Fedora the same way.
> As I have indicated in the initial bug report (above), the existing DRM driver
> (the one which comes with the 2.6.40/3.0.0 Fedora kernel) is not working -
> never has!
>
> Hibernate/restore always freezes my system - without fail - and judging by the
> comments written in this bug report I am not the only one! So, I don't see how
> allowing this abomination in mainstream is "a good plan", particularly given
> the fact that I did not experience these issues with the 2.6.38 version of the
> Fedora kernel.

For *you*. There's also brand spanking new fan control code in nouveau git which could quite possibly accidentally switch off the GPU's fan completely and burn someone's card. Would you like me to push that into Fedora too?

I've pushed the patches from 3.1 that should fix this issue into the f15 kernel git repository. I haven't done a build yet; I'll leave that for the kernel maintainers. There are several other commits pending there without a build, so I'm not sure if they're ready yet.

This bug should get updated automatically once an update has been submitted.

The same is happening here, F15 x64, geforce go7300.

Give this kernel a try: http://koji.fedoraproject.org/koji/buildinfo?buildID=260424

Thanks for the quick reply :)

Unfortunately the new kernel didn't fix the problem, maybe I have a different bug?

I have lots of artifacts with colors from the original image that should be displayed, and they are blinking like a broken fluorescent lamp.

The system is responsive, I can hear it (sound and hard drive) and it responds to the keyboard.

I use the nouveau driver with a GeForce 7300; the bug is 100% reproducible on every kernel (vmlinuz-2.6.40.3-2.fc15.x86_64, vmlinuz-2.6.40.3-0.fc15.x86_64, vmlinuz-2.6.38.6-26.rc1.fc15.x86_64).

How can I help to pinpoint the problem?

Created attachment 520024 [details]
suspend log
Created attachment 520025 [details]
messages log
Created attachment 520026 [details]
xorg log
Created attachment 520027 [details]
artifacts photo
Update: screenshots are completely normal (no artifacts) and I have an ssh session working. What commands should I try?

(In reply to comment #5)

I will have the opportunity to compile/build and install the new kernel from koji (as per your post above) later today or over the weekend at the latest and will let you know whether that fixes the problem on my machine.

> For *you*. There's also brand spanking new fan control code which could quite
> possibly accidentally switch off the GPU's fan completely in nouveau git and
> burn someone's card. Would you like me to push that into Fedora too?

On a slightly different note, yesterday I compiled and installed the latest DRM, using the latest git source, which introduces the fan control feature (checked in more than a week ago according to the git logs). I wanted to see whether I could use it on my card (a feature I have been missing - badly!).

All went well, except that when I try to change the performance level (after setting the appropriate kernel parameter to 7777 as instructed) via echo X > .../performance_level the value does change (cat .../performance_level shows that change), but nothing *actually* changes - the fan is still at 100% and not at the reduced speed required by this performance level. Using the pwm0_min value (30 in my case) to "force" the issue (i.e. echo 30 > .../pwm0) changes the value (i.e. cat .../pwm0 shows "30") but does not actually change the fan speed at all!

It is also worth noting that I have tried reducing my fan speed before - as instructed here - https://github.com/pathscale/pscnv/wiki/Power-Management - by using nvpeek/nvpoke (writing the appropriate values to port 0x10f0 - as applicable to my NV49 card), but that didn't work! I was hoping that the new nouveau DRM code would address this. I would also have opened a new bug for this, but don't know where to submit it.
I am willing to test this and give you a hand, if needed, as I am very keen to use this feature - my fan is always @ 100% when I start my Linux system, which is extremely annoying! That, compared to 0% when I boot Windows.

(In reply to comment #8)
> Give this kernel a try:
> http://koji.fedoraproject.org/koji/buildinfo?buildID=260424

That is so far so good!

I've done about 10 hibernate/restore cycles with no issues to report, except a very minor one - sometimes, maybe on 2 or 3 occasions, the hibernate process (I use the standard one which comes with the kernel - nothing fancy like) introduces some pretty heavy snow-flickers when the screen goes blank (this usually was a precursor for a restore failure previously!), but when I restore - successfully - and check the syslogs there isn't anything there in terms of unusual behaviour or errors, so I suppose the nouveau code is now capable of handling this sort of thing.

I will continue to test this further and will report any issues arising from this.

Thanks for fixing it - at long last, a decent restore/hibernate on my system! Now for the nVidia fan speed... :)

(In reply to comment #9)
> Thank for a quick reply :)
>
> Unfortunately the new kernel didn't fix the problem, maybe I have a different
> bug?
>
> I have a lots of artifacts with colors from the original image that should be
> displayed and they are blinking like a broken fluorescent lamp.
>
> System is responsive, I can hear it (sound and hard drive) and it responds to
> the keyboard.
>
> I use nouveau driver with GeForce 7300, the bug is reproducible 100% on every
> kernel (vmlinuz-2.6.40.3-2.fc15.x86_64, vmlinuz-2.6.40.3-0.fc15.x86_64,
> vmlinuz-2.6.38.6-26.rc1.fc15.x86_64)
>
> How can I help to pinpoint the problem?

This is a separate issue. Can you please file a new bug report containing all the logs you've submitted here? Can you also ensure you have a suspend/resume dmesg log with "drm.debug=14 log_buf_len=1M" in your kernel boot options?
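Before capturing the suspend/resume dmesg log requested above, it may help to confirm that both boot options actually reached the running kernel. The following is only a minimal sketch: the function name and the idea of comparing against /proc/cmdline are my own illustration, not something the maintainer asked for.

```python
# Hypothetical helper (not part of any Fedora or nouveau tooling): checks a
# kernel command line for the two debug options named in the comment above.
REQUIRED_OPTIONS = {"drm.debug": "14", "log_buf_len": "1M"}


def missing_boot_options(cmdline, required=REQUIRED_OPTIONS):
    """Return the required "key=value" options that are absent from cmdline,
    or present with a different value, as a sorted list."""
    present = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep:  # ignore bare flags such as "ro" or "quiet"
            present[key] = value
    return sorted(
        f"{key}={value}"
        for key, value in required.items()
        if present.get(key) != value
    )


def read_running_cmdline(path="/proc/cmdline"):
    """Read the command line the running Linux kernel was booted with."""
    with open(path) as handle:
        return handle.read()
```

On the affected machine, `missing_boot_options(read_running_cmdline())` returning an empty list would suggest the options are active, so the next suspend/resume cycle should produce the verbose DRM output in dmesg.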
(In reply to comment #16)
> (In reply to comment #8)
> > Give this kernel a try:
> > http://koji.fedoraproject.org/koji/buildinfo?buildID=260424
> That is so far so good!
>
> I've done about 10 hibernate/restore cycles with no issues to report, except a
> very minor one - sometimes, may be on 2 or 3 occasions, the hibernate process
> (I use the standard one which comes with the kernel - nothing fancy like)
> introduces some pretty heavy snow-flickers when the screen goes blank (this
> usually was a precursor for a restore failure previously!), but when I restore
> - successfully - and check the syslogs there isn't anything there in terms of
> unusual behaviour or errors, so I suppose the nouveau code is now capable of
> handling this sort of thing.
>
> I will continue to test this further and will report any issues arising from
> this.
>
> Thanks for fixing it - at long last, a decent restore/hibernate on my system!
> Now for the nVidia fan speed... :)

Thanks for letting me know it worked.

As for fan speed.. Ignore the hype you see on phoronix. This is still very much a work in progress, and in my opinion, we should not be trumpeting this as anywhere near ready..

Anyway.. I have encountered one other person with an NV49 that doesn't respond to the normal PWM control regs, and have not managed to track this down yet. If you email me privately (skeggsb, gmail) with your vbios image, we can work on tracking this down.

(In reply to comment #18)
> Thanks for letting me know it worked.

OK, this is what happened last night - I restored my computer as normal, but this time the machine rebooted - "automatically" - straight after the restore was done. I hadn't touched anything, nor did I see the screen show me anything from the last time I executed hibernate - it was an immediate reboot.
After checking the logs, there was nothing suspicious (nouveau and everything else restored normally - at least according to the logs), but as soon as the restore completed the machine rebooted (soft reboot - no memory test). I can't be 100% certain that this is caused by Nouveau though - it might be something else, which caused this (some other hw misbehaving, maybe). Just thought to let you know.

> I have encountered one other person with a NV49 that doesn't respond the the
> normal PWM control regs, have not managed to track this down yet. If you email
> me privately (skeggsb, gmail) with your vbios image, we can work on tracking
> this down.

Will do, but it will be later tonight when I get home. I take it, I need to use the nvbios tool to do that, right?

(In reply to comment #17)

Thank you, I did as you said, here is the new report with logs: https://bugzilla.redhat.com/show_bug.cgi?id=734914

kernel-2.6.40.4-5.fc15 has been submitted as an update for Fedora 15. https://admin.fedoraproject.org/updates/kernel-2.6.40.4-5.fc15

kernel-2.6.40.4-5.fc15 has been pushed to the Fedora 15 stable repository. If problems still persist, please make note of it in this bug report.

New set of errors after hibernate/restore cycle below. I get a different black-and-white "pattern" to the one which I used to get when I submitted the above bug report: the pattern now seems to be black and white squares instead of stripes. The system hangs completely (hardware reset needed) after seemingly futile attempt by nouveau to rectify the problem.

My syslog is:

Sep 12 23:01:52 test1 kernel: PM: Syncing filesystems ... done.
Sep 12 23:01:52 test1 kernel: Freezing user space processes ... (elapsed 0.01 seconds) done.
Sep 12 23:01:52 test1 kernel: Freezing remaining freezable tasks ... (elapsed 0.01 seconds) done.
Sep 12 23:01:52 test1 kernel: PM: Preallocating image memory...
done (allocated 221311 pages) Sep 12 23:01:52 test1 kernel: PM: Allocated 885244 kbytes in 0.49 seconds (1806.62 MB/s) Sep 12 23:01:52 test1 kernel: Suspending console(s) (use no_console_suspend to debug) Sep 12 23:01:52 test1 kernel: i8042 kbd 00:0a: wake-up capability enabled by ACPI Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: Disabling fbcon acceleration... Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: Unpinning framebuffer(s)... Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: Evicting buffers... Sep 12 23:01:52 test1 kernel: sata_via 0000:00:0f.0: PCI INT B disabled Sep 12 23:01:52 test1 kernel: pciehp 0000:00:02.0:pcie04: pciehp_suspend ENTRY Sep 12 23:01:52 test1 kernel: HDA Intel 0000:80:01.0: PCI INT A disabled Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: Idling channels... Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: Suspending GPU objects... Sep 12 23:01:52 test1 kernel: sd 2:0:0:0: [sda] Synchronizing SCSI cache Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: And we're gone! Sep 12 23:01:52 test1 kernel: PM: freeze of devices complete after 202.706 msecs Sep 12 23:01:52 test1 kernel: PM: late freeze of devices complete after 0.678 msecs Sep 12 23:01:52 test1 kernel: ACPI: Preparing to enter system sleep state S4 Sep 12 23:01:52 test1 restorecond: Read error (Interrupted system call) Sep 12 23:01:52 test1 kernel: PM: Saving platform NVS memory Sep 12 23:01:52 test1 kernel: Disabling non-boot CPUs ... Sep 12 23:01:52 test1 kernel: CPU 1 is now offline Sep 12 23:01:52 test1 kernel: PM: Creating hibernation image: Sep 12 23:01:52 test1 kernel: PM: Need to copy 105168 pages Sep 12 23:01:52 test1 kernel: PM: Restoring platform NVS memory Sep 12 23:01:52 test1 kernel: Enabling non-boot CPUs ... Sep 12 23:01:52 test1 kernel: Booting Node 0 Processor 1 APIC 0x1 Sep 12 23:01:52 test1 kernel: NMI watchdog enabled, takes one hw-pmu counter. 
Sep 12 23:01:52 test1 kernel: Switched to NOHz mode on CPU #1 Sep 12 23:01:52 test1 kernel: CPU1 is up Sep 12 23:01:52 test1 kernel: ACPI: Waking up from system sleep state S4 Sep 12 23:01:52 test1 kernel: PM: early restore of devices complete after 0.959 msecs Sep 12 23:01:52 test1 kernel: pciehp 0000:00:02.0:pcie04: pciehp_resume ENTRY Sep 12 23:01:52 test1 kernel: sata_via 0000:00:0f.0: PCI INT B -> GSI 21 (level, low) -> IRQ 21 Sep 12 23:01:52 test1 kernel: usb usb2: root hub lost power or was reset Sep 12 23:01:52 test1 kernel: usb usb3: root hub lost power or was reset Sep 12 23:01:52 test1 kernel: usb usb4: root hub lost power or was reset Sep 12 23:01:52 test1 kernel: usb usb5: root hub lost power or was reset Sep 12 23:01:52 test1 kernel: usb usb1: root hub lost power or was reset Sep 12 23:01:52 test1 kernel: via-rhine 0000:00:12.0: eth0: link up, 100Mbps, full-duplex, lpa 0xCDE1 Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: We're back, enabling device... Sep 12 23:01:52 test1 kernel: agpgart-via 0000:00:00.0: AGP 3.5 bridge Sep 12 23:01:52 test1 kernel: agpgart: kworker/u:7 tried to set rate=x12. Setting to AGP3 x8 mode. Sep 12 23:01:52 test1 kernel: agpgart-via 0000:00:00.0: putting AGP V3 device into 8x mode Sep 12 23:01:52 test1 kernel: nouveau 0000:01:00.0: putting AGP V3 device into 8x mode Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: POSTing device... 
Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: Parsing VBIOS init table 0 at offset 0xDFFC Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: Parsing VBIOS init table 1 at offset 0xE8EF Sep 12 23:01:52 test1 kernel: HDA Intel 0000:80:01.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17 Sep 12 23:01:52 test1 kernel: i8042 kbd 00:0a: wake-up capability disabled by ACPI Sep 12 23:01:52 test1 kernel: sd 2:0:0:0: [sda] Starting disk Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: Parsing VBIOS init table 2 at offset 0xF310 Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: Parsing VBIOS init table 3 at offset 0xF48B Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: Parsing VBIOS init table 4 at offset 0xF5DF Sep 12 23:01:52 test1 kernel: agpgart-via 0000:00:00.0: AGP 3.5 bridge Sep 12 23:01:52 test1 kernel: agpgart: kworker/u:7 tried to set rate=x12. Setting to AGP3 x8 mode. Sep 12 23:01:52 test1 kernel: agpgart-via 0000:00:00.0: putting AGP V3 device into 8x mode Sep 12 23:01:52 test1 kernel: nouveau 0000:01:00.0: putting AGP V3 device into 8x mode Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: Restoring GPU objects... Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: Reinitialising engines... Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: Restoring mode... 
Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ERROR nsource: ILLEGAL_MTHD nstatus: PROTECTION_FAULT Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ch 0 (0x00042000) subc 0 class 0x0000 mthd 0x0184 data 0x00004001 Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: 0xD3FB: Parsing digital output script table Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ERROR nsource: ILLEGAL_MTHD nstatus: PROTECTION_FAULT Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ch 0 (0x00042000) subc 0 class 0x0000 mthd 0x0188 data 0x00004000 Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ERROR nsource: ILLEGAL_MTHD nstatus: PROTECTION_FAULT Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ch 0 (0x00042000) subc 0 class 0x0000 mthd 0x030c data 0x030bb000 Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ERROR nsource: ILLEGAL_MTHD nstatus: PROTECTION_FAULT Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ch 0 (0x00042000) subc 0 class 0x0000 mthd 0x0310 data 0x00040000 Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ERROR nsource: ILLEGAL_MTHD nstatus: PROTECTION_FAULT Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ch 0 (0x00042000) subc 0 class 0x0000 mthd 0x0314 data 0x00001000 Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ERROR nsource: ILLEGAL_MTHD nstatus: PROTECTION_FAULT Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ch 0 (0x00042000) subc 0 class 0x0000 mthd 0x0318 data 0x00001000 Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ERROR nsource: ILLEGAL_MTHD nstatus: PROTECTION_FAULT Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ch 0 (0x00042000) subc 0 class 0x0000 mthd 0x031c data 0x00001000 Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ERROR nsource: ILLEGAL_MTHD nstatus: 
PROTECTION_FAULT Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ch 0 (0x00042000) subc 0 class 0x0000 mthd 0x0320 data 0x00000004 Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ERROR nsource: ILLEGAL_MTHD nstatus: PROTECTION_FAULT Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ch 0 (0x00042000) subc 0 class 0x0000 mthd 0x0324 data 0x00000101 Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ERROR nsource: ILLEGAL_MTHD nstatus: PROTECTION_FAULT Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ch 0 (0x00042000) subc 0 class 0x0000 mthd 0x0328 data 0x00000000 Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ERROR nsource: ILLEGAL_MTHD nstatus: PROTECTION_FAULT Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ch 0 (0x00042000) subc 0 class 0x0000 mthd 0x0184 data 0x00004001 Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ERROR nsource: ILLEGAL_MTHD nstatus: PROTECTION_FAULT Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ch 0 (0x00042000) subc 0 class 0x0000 mthd 0x0188 data 0x00004000 Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ERROR nsource: ILLEGAL_MTHD nstatus: PROTECTION_FAULT Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ch 0 (0x00042000) subc 0 class 0x0000 mthd 0x030c data 0x030bf000 Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ERROR nsource: ILLEGAL_MTHD nstatus: PROTECTION_FAULT Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ch 0 (0x00042000) subc 0 class 0x0000 mthd 0x0310 data 0x00044000 Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ERROR nsource: ILLEGAL_MTHD nstatus: PROTECTION_FAULT Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ch 0 (0x00042000) subc 0 class 0x0000 mthd 0x0314 data 0x00001000 Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - 
ERROR nsource: ILLEGAL_MTHD nstatus: PROTECTION_FAULT Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ch 0 (0x00042000) subc 0 class 0x0000 mthd 0x0318 data 0x00001000 Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ERROR nsource: ILLEGAL_MTHD nstatus: PROTECTION_FAULT Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ch 0 (0x00042000) subc 0 class 0x0000 mthd 0x031c data 0x00001000 Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ERROR nsource: ILLEGAL_MTHD nstatus: PROTECTION_FAULT Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ch 0 (0x00042000) subc 0 class 0x0000 mthd 0x0320 data 0x00000004 Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ERROR nsource: ILLEGAL_MTHD nstatus: PROTECTION_FAULT Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ch 0 (0x00042000) subc 0 class 0x0000 mthd 0x0324 data 0x00000101 Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ERROR nsource: ILLEGAL_MTHD nstatus: PROTECTION_FAULT Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ch 0 (0x00042000) subc 0 class 0x0000 mthd 0x0328 data 0x00000000 Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: Setting dpms mode 3 on tmds encoder (output 1) Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: Setting dpms mode 3 on vga encoder (output 0) Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: Setting dpms mode 3 on vga encoder (output 2) Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: Setting dpms mode 3 on TV encoder (output 3) Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: 0xD3FB: Parsing digital output script table Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: Setting dpms mode 0 on tmds encoder (output 1) Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: Output DVI-I-1 is running on CRTC 0 using output C Sep 12 23:01:52 test1 kernel: ata4.00: ACPI cmd 
ef/03:42:00:00:00:a0 (SET FEATURES) filtered out Sep 12 23:01:52 test1 kernel: ata3.00: ACPI cmd ef/03:45:00:00:00:a0 (SET FEATURES) filtered out Sep 12 23:01:52 test1 kernel: ata3.00: ACPI cmd ef/03:01:00:00:00:a0 (SET FEATURES) filtered out Sep 12 23:01:52 test1 kernel: ata4.00: configured for UDMA/33 Sep 12 23:01:52 test1 kernel: ata3.00: configured for UDMA/100 Sep 12 23:01:52 test1 kernel: usb 1-2: reset high speed USB device number 2 using ehci_hcd Sep 12 23:01:52 test1 kernel: usb 1-2.2: reset low speed USB device number 3 using ehci_hcd Sep 12 23:01:52 test1 kernel: PM: restore of devices complete after 1168.258 msecs Sep 12 23:01:52 test1 kernel: Restarting tasks ... done. Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PFIFO still angry after 101 spins, halt Sep 12 23:01:55 test1 kernel: [drm] nouveau 0000:01:00.0: reloc wait_idle failed: -16 Sep 12 23:01:55 test1 kernel: [drm] nouveau 0000:01:00.0: reloc apply: -16 Sep 12 23:01:58 test1 kernel: [drm] nouveau 0000:01:00.0: reloc wait_idle failed: -16 Sep 12 23:01:58 test1 kernel: [drm] nouveau 0000:01:00.0: reloc apply: -16 Sep 12 23:02:01 test1 kernel: [drm] nouveau 0000:01:00.0: fail ttm_validate Sep 12 23:02:01 test1 kernel: [drm] nouveau 0000:01:00.0: validate vram_list Sep 12 23:02:01 test1 kernel: [drm] nouveau 0000:01:00.0: validate: -16 Sep 12 23:02:04 test1 kernel: [drm] nouveau 0000:01:00.0: fail ttm_validate Sep 12 23:02:04 test1 kernel: [drm] nouveau 0000:01:00.0: validate vram_list Sep 12 23:02:04 test1 kernel: [drm] nouveau 0000:01:00.0: validate: -16 Sep 12 23:02:07 test1 kernel: [drm] nouveau 0000:01:00.0: reloc wait_idle failed: -16 Sep 12 23:02:07 test1 kernel: [drm] nouveau 0000:01:00.0: reloc apply: -16 Sep 12 23:02:17 test1 kernel: [drm] nouveau 0000:01:00.0: GPU lockup - switching to software fbcon Sep 12 23:02:18 test1 abrt[11306]: saved core dump of pid 1944 (/usr/bin/Xorg) to /var/spool/abrt/ccpp-1315864937-1944.new/coredump (29745152 bytes) Sep 12 23:02:18 test1 abrtd: 
Directory 'ccpp-1315864937-1944' creation detected
Sep 12 23:02:21 test1 kernel: [drm] nouveau 0000:01:00.0: Failed to idle channel 1.
Sep 12 23:02:21 test1 kernel: [drm] nouveau 0000:01:00.0: RAMHT entry not found. ch=1, handle=0x80000019
Sep 12 23:02:21 test1 kernel: [drm] nouveau 0000:01:00.0: RAMHT entry not found. ch=1, handle=0x80000018
Sep 12 23:02:21 test1 kernel: [drm] nouveau 0000:01:00.0: RAMHT entry not found. ch=1, handle=0x8000001a
Sep 12 23:02:21 test1 kernel: [drm] nouveau 0000:01:00.0: RAMHT entry not found. ch=1, handle=0x80000013
Sep 12 23:02:21 test1 kernel: [drm] nouveau 0000:01:00.0: RAMHT entry not found. ch=1, handle=0x80000017
Sep 12 23:02:21 test1 kernel: [drm] nouveau 0000:01:00.0: RAMHT entry not found. ch=1, handle=0x80000015
Sep 12 23:02:21 test1 kernel: [drm] nouveau 0000:01:00.0: RAMHT entry not found. ch=1, handle=0x80000016
Sep 12 23:02:21 test1 kernel: [drm] nouveau 0000:01:00.0: RAMHT entry not found. ch=1, handle=0x80000011
Sep 12 23:02:21 test1 kernel: [drm] nouveau 0000:01:00.0: RAMHT entry not found. ch=1, handle=0x80000012
Sep 12 23:02:21 test1 kernel: [drm] nouveau 0000:01:00.0: RAMHT entry not found. ch=1, handle=0x8000001c
Sep 12 23:02:21 test1 kernel: [drm] nouveau 0000:01:00.0: RAMHT entry not found. ch=1, handle=0x8000001b
Sep 12 23:02:21 test1 kernel: [drm] nouveau 0000:01:00.0: RAMHT entry not found. ch=1, handle=0x80000010
Sep 12 23:02:21 test1 kernel: [drm] nouveau 0000:01:00.0: RAMHT entry not found. ch=1, handle=0x00000000

Are you certain you were running the updated kernel at this point? Some of the errors in the kernel log would've been fixed by the kernel I pointed you at. I just double-checked to make sure the current f15 kernel still has the patches, and it should.
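For anyone comparing a log like the one above across kernel builds, a rough tally of the nouveau messages may be easier to read than the raw syslog. This is only a sketch: the marker strings are copied from the log in this report, while the label names, the function name and the regular expressions are my own assumptions rather than anything defined by the driver.

```python
import re
from collections import Counter

# Matches the "[drm] nouveau <pci-address>: <message>" part of a syslog line.
NOUVEAU_RE = re.compile(r"\[drm\] nouveau (?P<dev>[0-9a-f:.]+): (?P<msg>.*)")
# Pulls the method offset out of a "PGRAPH - ch ... mthd 0x0184 ..." line.
MTHD_RE = re.compile(r"\bmthd (0x[0-9a-f]+)")

# Substrings seen in the log above, mapped to short labels (labels are made up).
MARKERS = {
    "PGRAPH - ERROR": "pgraph_error",
    "PFIFO still angry": "pfifo_stuck",
    "reloc wait_idle failed": "reloc_wait_idle_failed",
    "fail ttm_validate": "ttm_validate_failed",
    "GPU lockup": "gpu_lockup",
    "RAMHT entry not found": "ramht_entry_missing",
}


def summarise_nouveau_log(lines):
    """Count known nouveau failure messages and PGRAPH method offsets.

    Returns (kinds, methods): two Counters keyed by label and by "0x...."
    method offset. Lines that are not nouveau DRM messages are skipped.
    """
    kinds = Counter()
    methods = Counter()
    for line in lines:
        match = NOUVEAU_RE.search(line)
        if match is None:
            continue
        message = match.group("msg")
        for marker, label in MARKERS.items():
            if marker in message:
                kinds[label] += 1
        method = MTHD_RE.search(message)
        if method is not None:
            methods[method.group(1)] += 1
    return kinds, methods
```

Feeding it the lines of a saved syslog excerpt (for example `summarise_nouveau_log(open(path))`, where `path` is wherever the log was saved) gives counts that can be compared between a failing and a working resume.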
I am running that kernel, though the nouveau driver has been compiled from the nouveau git dated 30 August 2011 (the current point of master as far as I can see from the git logs), as I was under the impression that this is newer than the kernel version. Isn't that the case?

Yeah, that should have the fixes for sure..

(In reply to comment #26)
> Yeah, that should have the fixes for sure..

Well, clearly, it does not fix that particular bug, as evidenced by the syslogs I posted above, though, admittedly, this does not happen as frequently as before - it is the first instance I am getting the above errors after about 20+ hibernate/restore cycles, compared to getting it every time with the previous revision of the nouveau driver.

I am inclined to close this bug as I haven't had this (or any other nouveau) error for over a month now - the nouveau driver in the 2.6.40-6 (3.0.6) kernel seems to be very stable. Hibernate/restore works every time and although I get the occasional data corruption when I reboot - as opposed to hibernate again (the kernel-implemented hibernate is not 100% there yet, unfortunately), my system - and nouveau in particular - seems very stable.

Thank you for letting us know.

Please refer to https://bugs.freedesktop.org/show_bug.cgi?id=50121 for the full history of this. The above bug was finally fixed in 3.7.4 (with kernel versions before 3.7.4 nouveau was crashing, albeit infrequently; with 3.7.4 it was absolutely rock-solid - never had any crashes with over 100 hibernate/resume cycles completed), but since upgrading to 3.7.9 the nightmare has returned!

As there are no noticeable changes to the nouveau kernel driver in the "normal" kernel tree, I am wondering whether there is something done on the Fedora-flavour side of the kernel, hence placing this comment here to see if that is the case.