Bug 730582

Summary: suspend/resume crashes with nouveau and freezes my system completely
Product: [Fedora] Fedora
Reporter: Mr-4 <mr.dash.four>
Component: xorg-x11-drv-nouveau
Assignee: Ben Skeggs <bskeggs>
Status: CLOSED ERRATA
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: 15
CC: aeriksson, airlied, ajax, bskeggs, gansalmon, itamar, jan.public, jonathan, kernel-maint, madhu.chinakonda, pawelprazak
Hardware: x86_64   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2011-10-31 20:12:25 UTC
Attachments:
Description       Flags
suspend log       none
messages log      none
xorg log          none
artifacts photo   none

Description Mr-4 2011-08-14 16:49:58 UTC
Description of problem:
Resuming after suspend to disk (hibernation; the log below shows ACPI sleep state S4) crashes my machine every time, after which I cannot use it until I switch it off and back on again. This wasn't the case with the previous kernel (2.6.38); it only started when I upgraded to 2.6.40-4.

Version-Release number of selected component (if applicable):
2.6.40-4 on FC15

How reproducible:
Always

Steps to Reproduce:
1. Use the nouveau.ko driver supplied with the above kernel version.
2. Start X/gdm (I use GNOME).
3. Suspend (via shutdown/suspend) and then resume.
  
Actual results:
This is the log I get during suspend/resume (/var/log/messages):
Aug 14 16:29:32 test1 kernel: PM: Syncing filesystems ... done.
Aug 14 16:29:32 test1 kernel: Freezing user space processes ... (elapsed 0.01 seconds) done.
Aug 14 16:29:32 test1 kernel: Freezing remaining freezable tasks ... (elapsed 0.01 seconds) done.
Aug 14 16:29:32 test1 kernel: PM: Preallocating image memory... done (allocated 220081 pages)
Aug 14 16:29:32 test1 kernel: PM: Allocated 880324 kbytes in 0.44 seconds (2000.73 MB/s)
Aug 14 16:29:32 test1 kernel: Suspending console(s) (use no_console_suspend to debug)
Aug 14 16:29:32 test1 kernel: sd 2:0:0:0: [sda] Synchronizing SCSI cache
Aug 14 16:29:32 test1 kernel: i8042 kbd 00:0a: wake-up capability enabled by ACPI
Aug 14 16:29:32 test1 kernel: [drm] nouveau 0000:01:00.0: Disabling fbcon acceleration...
Aug 14 16:29:32 test1 kernel: [drm] nouveau 0000:01:00.0: Unpinning framebuffer(s)...
Aug 14 16:29:32 test1 kernel: [drm] nouveau 0000:01:00.0: Evicting buffers...
Aug 14 16:29:32 test1 kernel: sata_via 0000:00:0f.0: PCI INT B disabled
Aug 14 16:29:32 test1 kernel: pciehp 0000:00:02.0:pcie04: pciehp_suspend ENTRY
Aug 14 16:29:32 test1 kernel: agpgart-via 0000:00:00.0: Refused to change power state, currently in D0
Aug 14 16:29:32 test1 kernel: HDA Intel 0000:80:01.0: PCI INT A disabled
Aug 14 16:29:32 test1 kernel: [drm] nouveau 0000:01:00.0: Idling channels...
Aug 14 16:29:32 test1 kernel: [drm] nouveau 0000:01:00.0: Suspending GPU objects...
Aug 14 16:29:32 test1 kernel: [drm] nouveau 0000:01:00.0: And we're gone!
Aug 14 16:29:32 test1 kernel: PM: freeze of devices complete after 250.760 msecs
Aug 14 16:29:32 test1 kernel: PM: late freeze of devices complete after 0.567 msecs
Aug 14 16:29:32 test1 kernel: ACPI: Preparing to enter system sleep state S4
Aug 14 16:29:32 test1 kernel: PM: Saving platform NVS memory
Aug 14 16:29:32 test1 kernel: Disabling non-boot CPUs ...
Aug 14 16:29:32 test1 kernel: Broke affinity for irq 23
Aug 14 16:29:32 test1 kernel: CPU 1 is now offline
Aug 14 16:29:32 test1 kernel: PM: Creating hibernation image:
Aug 14 16:29:32 test1 kernel: PM: Need to copy 181763 pages
Aug 14 16:29:32 test1 kernel: PM: Restoring platform NVS memory
Aug 14 16:29:32 test1 kernel: Enabling non-boot CPUs ...
Aug 14 16:29:32 test1 restorecond: Read error (Interrupted system call)
Aug 14 16:29:32 test1 kernel: Booting Node 0 Processor 1 APIC 0x1
Aug 14 16:29:32 test1 kernel: Switched to NOHz mode on CPU #1
Aug 14 16:29:32 test1 kernel: NMI watchdog enabled, takes one hw-pmu counter.
Aug 14 16:29:32 test1 kernel: CPU1 is up
Aug 14 16:29:32 test1 kernel: ACPI: Waking up from system sleep state S4
Aug 14 16:29:32 test1 kernel: PM: early restore of devices complete after 0.942 msecs
Aug 14 16:29:32 test1 kernel: pciehp 0000:00:02.0:pcie04: pciehp_resume ENTRY
Aug 14 16:29:32 test1 kernel: sata_via 0000:00:0f.0: PCI INT B -> GSI 21 (level, low) -> IRQ 21
Aug 14 16:29:32 test1 kernel: usb usb2: root hub lost power or was reset
Aug 14 16:29:32 test1 kernel: usb usb3: root hub lost power or was reset
Aug 14 16:29:32 test1 kernel: usb usb4: root hub lost power or was reset
Aug 14 16:29:32 test1 kernel: usb usb5: root hub lost power or was reset
Aug 14 16:29:32 test1 kernel: usb usb1: root hub lost power or was reset
Aug 14 16:29:32 test1 kernel: [drm] nouveau 0000:01:00.0: We're back, enabling device...
Aug 14 16:29:32 test1 kernel: agpgart-via 0000:00:00.0: AGP 3.5 bridge
Aug 14 16:29:32 test1 kernel: agpgart: kworker/u:1 tried to set rate=x12. Setting to AGP3 x8 mode.
Aug 14 16:29:32 test1 kernel: agpgart-via 0000:00:00.0: putting AGP V3 device into 8x mode
Aug 14 16:29:32 test1 kernel: nouveau 0000:01:00.0: putting AGP V3 device into 8x mode
Aug 14 16:29:32 test1 kernel: [drm] nouveau 0000:01:00.0: POSTing device...
Aug 14 16:29:32 test1 kernel: [drm] nouveau 0000:01:00.0: Parsing VBIOS init table 0 at offset 0xDFFC
Aug 14 16:29:32 test1 kernel: [drm] nouveau 0000:01:00.0: Parsing VBIOS init table 1 at offset 0xE8EF
Aug 14 16:29:32 test1 kernel: via-rhine 0000:00:12.0: eth0: link up, 100Mbps, full-duplex, lpa 0xCDE1
Aug 14 16:29:32 test1 kernel: HDA Intel 0000:80:01.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
Aug 14 16:29:32 test1 kernel: sd 2:0:0:0: [sda] Starting disk
Aug 14 16:29:32 test1 kernel: i8042 kbd 00:0a: wake-up capability disabled by ACPI
Aug 14 16:29:32 test1 kernel: [drm] nouveau 0000:01:00.0: Parsing VBIOS init table 2 at offset 0xF310
Aug 14 16:29:32 test1 kernel: [drm] nouveau 0000:01:00.0: Parsing VBIOS init table 3 at offset 0xF48B
Aug 14 16:29:32 test1 kernel: [drm] nouveau 0000:01:00.0: Parsing VBIOS init table 4 at offset 0xF5DF
Aug 14 16:29:32 test1 kernel: agpgart-via 0000:00:00.0: AGP 3.5 bridge
Aug 14 16:29:32 test1 kernel: agpgart: kworker/u:1 tried to set rate=x12. Setting to AGP3 x8 mode.
Aug 14 16:29:32 test1 kernel: agpgart-via 0000:00:00.0: putting AGP V3 device into 8x mode
Aug 14 16:29:32 test1 kernel: nouveau 0000:01:00.0: putting AGP V3 device into 8x mode
Aug 14 16:29:32 test1 kernel: [drm] nouveau 0000:01:00.0: Restoring GPU objects...
Aug 14 16:29:32 test1 kernel: [drm] nouveau 0000:01:00.0: Reinitialising engines...
Aug 14 16:29:32 test1 kernel: [drm] nouveau 0000:01:00.0: Restoring mode...
Aug 14 16:29:32 test1 kernel: [drm] nouveau 0000:01:00.0: 0xD3FB: Parsing digital output script table
Aug 14 16:29:32 test1 kernel: [drm] nouveau 0000:01:00.0: Setting dpms mode 3 on tmds encoder (output 1)
Aug 14 16:29:32 test1 kernel: [drm] nouveau 0000:01:00.0: Setting dpms mode 3 on vga encoder (output 0)
Aug 14 16:29:32 test1 kernel: [drm] nouveau 0000:01:00.0: Setting dpms mode 3 on vga encoder (output 2)
Aug 14 16:29:32 test1 kernel: [drm] nouveau 0000:01:00.0: Setting dpms mode 3 on TV encoder (output 3)
Aug 14 16:29:32 test1 kernel: [drm] nouveau 0000:01:00.0: 0xD3FB: Parsing digital output script table
Aug 14 16:29:32 test1 kernel: [drm] nouveau 0000:01:00.0: Setting dpms mode 0 on tmds encoder (output 1)
Aug 14 16:29:32 test1 kernel: [drm] nouveau 0000:01:00.0: Output DVI-I-1 is running on CRTC 0 using output C
Aug 14 16:29:32 test1 kernel: ata4.00: ACPI cmd ef/03:42:00:00:00:a0 (SET FEATURES) filtered out
Aug 14 16:29:32 test1 kernel: ata4.00: configured for UDMA/33
Aug 14 16:29:32 test1 kernel: ata3.00: ACPI cmd ef/03:45:00:00:00:a0 (SET FEATURES) filtered out
Aug 14 16:29:32 test1 kernel: ata3.00: ACPI cmd ef/03:01:00:00:00:a0 (SET FEATURES) filtered out
Aug 14 16:29:32 test1 kernel: ata3.00: configured for UDMA/100
Aug 14 16:29:32 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ERROR nsource: ILLEGAL_MTHD nstatus: PROTECTION_FAULT
Aug 14 16:29:32 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ch 0 (0x00042000) subc 3 class 0x0039 mthd 0x0be4 data 0x00000000
Aug 14 16:29:32 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ERROR nsource: ILLEGAL_MTHD nstatus: PROTECTION_FAULT
Aug 14 16:29:32 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ch 0 (0x00042000) subc 3 class 0x0039 mthd 0x0be8 data 0x00100008
Aug 14 16:29:32 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ERROR nsource: ILLEGAL_MTHD nstatus: PROTECTION_FAULT
Aug 14 16:29:32 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ch 0 (0x00042000) subc 3 class 0x0039 mthd 0x0bec data 0x00000000
Aug 14 16:29:32 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ERROR nsource: ILLEGAL_MTHD nstatus: PROTECTION_FAULT
Aug 14 16:29:32 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ch 0 (0x00042000) subc 3 class 0x0039 mthd 0x0bf0 data 0x00aaaaaa
Aug 14 16:29:32 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ERROR nsource: ILLEGAL_MTHD nstatus: PROTECTION_FAULT
Aug 14 16:29:32 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ch 0 (0x00042000) subc 3 class 0x0039 mthd 0x0bf4 data 0x00100008
Aug 14 16:29:32 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ERROR nsource: ILLEGAL_MTHD nstatus: PROTECTION_FAULT
Aug 14 16:29:32 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ch 0 (0x00042000) subc 3 class 0x0039 mthd 0x0bf8 data 0x00100008
Aug 14 16:29:32 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ERROR nsource: ILLEGAL_MTHD nstatus: PROTECTION_FAULT
Aug 14 16:29:32 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ch 0 (0x00042000) subc 3 class 0x0039 mthd 0x0bfc data 0x00000000
Aug 14 16:29:32 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ERROR nsource: ILLEGAL_MTHD nstatus: PROTECTION_FAULT
Aug 14 16:29:32 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ch 0 (0x00042000) subc 3 class 0x0039 mthd 0x0c00 data 0x00000000
Aug 14 16:29:32 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ERROR nsource: ILLEGAL_MTHD nstatus: PROTECTION_FAULT
Aug 14 16:29:32 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ch 0 (0x00042000) subc 3 class 0x0039 mthd 0x0c04 data 0x00000000
Aug 14 16:29:32 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ERROR nsource: ILLEGAL_MTHD nstatus: PROTECTION_FAULT
Aug 14 16:29:32 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ch 0 (0x00042000) subc 3 class 0x0039 mthd 0x0c08 data 0x00000000
Aug 14 16:29:32 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ERROR nsource: ILLEGAL_MTHD nstatus: PROTECTION_FAULT
Aug 14 16:29:32 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ch 0 (0x00042000) subc 3 class 0x0039 mthd 0x0c0c data 0x00000000
Aug 14 16:29:32 test1 kernel: usb 1-2: reset high speed USB device number 2 using ehci_hcd
Aug 14 16:29:32 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ERROR nsource: ILLEGAL_MTHD nstatus: PROTECTION_FAULT
Aug 14 16:29:32 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ch 0 (0x00042000) subc 3 class 0x0039 mthd 0x0be4 data 0x00000000
Aug 14 16:29:32 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ERROR nsource: ILLEGAL_MTHD nstatus: PROTECTION_FAULT
Aug 14 16:29:32 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ch 0 (0x00042000) subc 3 class 0x0039 mthd 0x0be8 data 0x00100008
Aug 14 16:29:32 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ERROR nsource: ILLEGAL_MTHD nstatus: PROTECTION_FAULT
Aug 14 16:29:32 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ch 0 (0x00042000) subc 3 class 0x0039 mthd 0x0bec data 0x00000000
Aug 14 16:29:32 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ERROR nsource: ILLEGAL_MTHD nstatus: PROTECTION_FAULT
Aug 14 16:29:32 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ch 0 (0x00042000) subc 3 class 0x0039 mthd 0x0bf0 data 0x00aaaaaa
Aug 14 16:29:32 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ERROR nsource: ILLEGAL_MTHD nstatus: PROTECTION_FAULT
Aug 14 16:29:32 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ch 0 (0x00042000) subc 3 class 0x0039 mthd 0x0bf4 data 0x00100008
Aug 14 16:29:32 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ERROR nsource: ILLEGAL_MTHD nstatus: PROTECTION_FAULT
Aug 14 16:29:32 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ch 0 (0x00042000) subc 3 class 0x0039 mthd 0x0bf8 data 0x00100008
Aug 14 16:29:32 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ERROR nsource: ILLEGAL_MTHD nstatus: PROTECTION_FAULT
Aug 14 16:29:32 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ch 0 (0x00042000) subc 3 class 0x0039 mthd 0x0bfc data 0x00000000
Aug 14 16:29:32 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ERROR nsource: ILLEGAL_MTHD nstatus: PROTECTION_FAULT
Aug 14 16:29:32 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ch 0 (0x00042000) subc 3 class 0x0039 mthd 0x0c00 data 0x00000000
Aug 14 16:29:32 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ERROR nsource: ILLEGAL_MTHD nstatus: PROTECTION_FAULT
Aug 14 16:29:32 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ch 0 (0x00042000) subc 3 class 0x0039 mthd 0x0c04 data 0x00000000
Aug 14 16:29:32 test1 kernel: usb 1-2.2: reset low speed USB device number 3 using ehci_hcd
Aug 14 16:29:32 test1 kernel: PM: restore of devices complete after 1165.489 msecs
Aug 14 16:29:32 test1 kernel: Restarting tasks ... done.
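
The PGRAPH errors above repeat the same few methods over and over. When comparing logs like this across kernels, one quick way to condense them is to count faults per (class, method) pair. A minimal Python sketch, assuming the exact line format shown in this log; summarize_pgraph is a hypothetical helper, not part of nouveau or any existing tool:

```python
import re
from collections import Counter

# Matches the detail line nouveau prints after each PGRAPH error, e.g.
# "PGRAPH - ch 0 (0x00042000) subc 3 class 0x0039 mthd 0x0be4 data 0x00000000"
# (format assumed from the log above; other kernels may print it differently).
PGRAPH_RE = re.compile(
    r"PGRAPH - ch (?P<ch>\d+) \((?P<inst>0x[0-9a-fA-F]+)\) "
    r"subc (?P<subc>\d+) class (?P<cls>0x[0-9a-fA-F]+) "
    r"mthd (?P<mthd>0x[0-9a-fA-F]+) data (?P<data>0x[0-9a-fA-F]+)"
)


def summarize_pgraph(lines):
    """Count how often each (class, method) pair faulted in a kernel log."""
    counts = Counter()
    for line in lines:
        m = PGRAPH_RE.search(line)
        if m:  # the "PGRAPH - ERROR nsource: ..." lines do not match
            counts[(m.group("cls"), m.group("mthd"))] += 1
    return counts
```

For example, passing open("/var/log/messages", errors="replace") to summarize_pgraph would show, for the log above, that methods 0x0be4 through 0x0c04 of class 0x0039 each faulted twice.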


Expected results:
Resume should work properly and restore my system to its previous state.

Additional info:
I am using an Intel Core 2 (x86_64) system with an nVidia GeForce 7800 GS. uname -a shows:
Linux test1.me.net 2.6.40-4.fc15.x86_64 #1 SMP Fri Aug 12 23:34:52 BST 2011 x86_64 x86_64 x86_64 GNU/Linux

nvclock -i reports the following about my video card (it may help pin down the problem):
-- General info --
Card: 		nVidia Geforce 7800GS
Architecture: 	NV49/G71 A2
PCI id: 	0xf5
GPU clock: 	275.400 MHz
Bustype: 	AGP (BR02)

-- Pipeline info --
Pixel units: 4x4 (101110b)
Vertex units: 6x1 (00111111b)
HW masked units: pixel 000111b vertex 00000000b
SW masked units: pixel 010001b vertex 11000000b

-- Memory info --
Amount: 	256 MB
Type: 		256 bit DDR3
Clock: 		600.750 MHz

-- Sensor info --
Sensor: National Semiconductor LM99
Board temperature: 41C
GPU temperature: 51C

-- VideoBios information --
Version: 05.71.22.21.0a
Signon message: GeForce 7800 GS AGP VGA BIOS
Performance level 0: gpu 275MHz/memory 600MHz/1.10V/50%
Performance level 1: gpu 440MHz/memory 650MHz/1.10V/79%
VID mask: 3
Voltage level 0: 1.05V, VID: 0
Voltage level 1: 1.10V, VID: 1
Voltage level 2: 1.20V, VID: 2

Comment 1 Anders Eriksson 2011-08-18 14:11:29 UTC
I can confirm this problem on my ancient P-II machine with an NV17 card (actually running Gentoo stable).

X11 starts up fine and all VTs work. After the first hibernation cycle, X11 on VT7 still works, but any attempt to switch to another VT results in the log messages above and a blank VT. Switching back to VT7 brings back a functioning X11. This is on a vanilla 3.0.1 kernel.

Comment 2 Mr-4 2011-08-19 09:54:34 UTC
Possible solution: replace the Nouveau DRM driver that ships with the kernel by compiling and installing the one from the Nouveau web site, following this guide: http://nouveau.freedesktop.org/wiki/InstallDRM

So, I suppose the kernel maintainers need to get off their backsides and sync the Nouveau tree with the Fedora kernel one to bring in the latest Nouveau code updates.

Please note that simply compiling and installing Nouveau from the above link *won't work* in some cases (mine included), because the Nouveau DRM driver is also included in the initramfs to satisfy Plymouth's dependencies. In that case you need to:

0. Back up your old initramfs;
1. Unpack the initramfs into a temporary directory;
2. Copy the 6 files installed by the DRM guide above into the same directory tree where the initramfs contents were unpacked in step 1;
3. Repackage the initramfs and install it in place of the old one (normally in /boot).

Once this is done, everything should be OK: I've had 7 out of 7 successful hibernation/restore cycles since then, with no problems.
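
Steps 1 to 3 above can be sketched in Python, assuming the image is a single gzip-compressed cpio archive in the "newc" format (the usual layout of Fedora initramfs images at the time). With standard tools this is roughly zcat initramfs.img | cpio -idm to unpack and find . | cpio -o -H newc | gzip to repack, which is what one should actually use. The helpers read_newc, write_newc and add_files are hypothetical names; this minimal version ignores hard links, drops device numbers and ownership, and is meant only to illustrate the unpack/add/repack cycle:

```python
import gzip
import io

# Field order of the 110-byte "newc" cpio header (after the 6-byte magic).
FIELDS = ("ino", "mode", "uid", "gid", "nlink", "mtime", "filesize",
          "devmajor", "devminor", "rdevmajor", "rdevminor", "namesize", "check")
MAGIC = b"070701"
HEADER_LEN = 110


def _pad4(n):
    """Number of NUL bytes needed to reach the next 4-byte boundary."""
    return (-n) % 4


def read_newc(blob):
    """Parse an uncompressed newc archive into {name: (mode, data)}."""
    entries = {}
    pos = 0
    while True:
        header = blob[pos:pos + HEADER_LEN]
        if header[:6] != MAGIC:
            raise ValueError(f"no newc header at offset {pos}")
        f = {k: int(header[6 + 8 * i:14 + 8 * i], 16) for i, k in enumerate(FIELDS)}
        name_start = pos + HEADER_LEN
        # namesize counts the trailing NUL, which we drop.
        name = blob[name_start:name_start + f["namesize"] - 1].decode()
        pos = name_start + f["namesize"]
        pos += _pad4(pos)
        if name == "TRAILER!!!":
            return entries
        data = blob[pos:pos + f["filesize"]]
        pos += f["filesize"] + _pad4(f["filesize"])
        entries[name] = (f["mode"], data)


def write_newc(entries):
    """Serialize {name: (mode, data)} as a newc archive owned by root."""
    out = io.BytesIO()

    def emit(ino, name, mode, data):
        encoded = name.encode() + b"\0"
        values = dict.fromkeys(FIELDS, 0)  # uid/gid/mtime/devices all zero
        values.update(ino=ino, mode=mode, nlink=1,
                      filesize=len(data), namesize=len(encoded))
        out.write(MAGIC + b"".join(b"%08X" % values[k] for k in FIELDS))
        out.write(encoded + b"\0" * _pad4(HEADER_LEN + len(encoded)))
        out.write(data + b"\0" * _pad4(len(data)))

    # Sorted order keeps each directory ahead of the files inside it.
    for ino, (name, (mode, data)) in enumerate(sorted(entries.items()), start=1):
        emit(ino, name, mode, data)
    emit(0, "TRAILER!!!", 0, b"")
    return out.getvalue()


def add_files(initramfs_gz, new_files):
    """Steps 1-3: unpack a gzip'd initramfs, add/replace files, repack it."""
    entries = read_newc(gzip.decompress(initramfs_gz))
    for name, data in new_files.items():
        # Keep the old mode when replacing a file; new files become 0644.
        mode = entries.get(name, (0o100644, b""))[0]
        entries[name] = (mode, data)
    return gzip.compress(write_newc(entries))
```

Step 0 is just keeping a copy of the original image, since add_files returns new bytes and never touches the input.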

Comment 3 Ben Skeggs 2011-08-21 23:00:20 UTC
(In reply to comment #2)
> Possible solution: Replace (compile and install) the Nouveau DRM driver which
> exists in the kernel with the one from the Nouveau web site following this
> guide: http://nouveau.freedesktop.org/wiki/InstallDRM
I fixed this problem *very* recently in the nouveau tree. It's queued for Linux 3.1 but didn't make it into 3.0, which is why it isn't in the 2.6.40 tree. I'll look today at how invasive it would be to fix in the F15 kernel.

> 
> So, I suppose the kernel maintainers need to get off their backsides and sync
> the Nouveau tree with the Fedora kernel one to bring in the latest Nouveau code
> updates.
That's generally not a good plan. The Nouveau tree can be horrifically unstable at any given time and contain raw, untested code. As both the upstream nouveau and the Fedora nouveau maintainer (FWIW), I get the stable patches into the upstream kernel tree, and they reach Fedora the same way.

> 
> Please note that simply compiling and installing Nouveau from the above link
> *won't work* because in some cases (mine included) Nouveau DRM driver is also
> included in initramfs as there are Plymouth dependencies there which have to be
> satisfied. What needs to be done in this case is:
> 
> 0. Backup your old initramfs
> 1. Unpack initramfs in some temporary directory;
> 2. Copy the 6 files installed by the DRM guide above to the same directory
> where all files from the initramfs image were unpacked in step 1 above;
> 3. Re-package initramfs again and install it in its place (normally /boot)
> 
> Once this is done everything should be OK - I've had 7/7 hibernation/restore
> cycles since then and no problems were encountered.

Comment 4 Mr-4 2011-08-22 10:29:48 UTC
(In reply to comment #3)
> I fixed this problem *very* recently in the nouveau tree.  It's queued for
> Linux 3.1, but didn't make it for 3.0, hence not being in the 2.6.40 tree. 
> I'll look today at how invasive it'd be to fix in the F15 kernel.
It shouldn't pose any problems: I have been using the DRM driver from the Nouveau tree on 2.6.40-1 through 2.6.40-4 for more than 10 days now with no issues so far. Hibernate/restore works 100% of the time.

> > So, I suppose the kernel maintainers need to get off their backsides and sync
> > the Nouveau tree with the Fedora kernel one to bring in the latest Nouveau code
> > updates.
> That's generally not a good plan.  The Nouveau tree at any given time can be
> horrifically unstable, and contain stuff that's raw and untested.  I (both
> upstream nouveau and Fedora nouveau maintainer FWIW) get the stable patches
> into the upstream kernel tree, which in turn end up in Fedora the same way.
As I indicated in the initial bug report above, the existing DRM driver (the one that comes with the 2.6.40/3.0.0 Fedora kernel) does not work, and never has!

Hibernate/restore freezes my system every single time, and judging by the comments in this bug report I am not the only one! So I don't see how allowing this abomination into the mainline kernel is "a good plan", particularly given that I did not experience these issues with the 2.6.38 Fedora kernel.

Comment 5 Ben Skeggs 2011-08-22 23:29:18 UTC
(In reply to comment #4)
> (In reply to comment #3)
> > I fixed this problem *very* recently in the nouveau tree.  It's queued for
> > Linux 3.1, but didn't make it for 3.0, hence not being in the 2.6.40 tree. 
> > I'll look today at how invasive it'd be to fix in the F15 kernel.
> It shouldn't pose any problems as I have been using the DRM drivers from the
> Nouveau tree on 2.6.40-(1-4) for more than 10 days now and had no issues so far
> - it works 100% as far as hibernate/restore is concerned.
For *you*. And it's not that straightforward. It'll be fixed somehow, but the patches that are upstream may not be appropriate as they are.

> 
> > > So, I suppose the kernel maintainers need to get off their backsides and sync
> > > the Nouveau tree with the Fedora kernel one to bring in the latest Nouveau code
> > > updates.
> > That's generally not a good plan.  The Nouveau tree at any given time can be
> > horrifically unstable, and contain stuff that's raw and untested.  I (both
> > upstream nouveau and Fedora nouveau maintainer FWIW) get the stable patches
> > into the upstream kernel tree, which in turn end up in Fedora the same way.
> As I have indicated in the initial bug report (above), the existing DRM driver
> (the one which comes with the 2.6.40/3.0.0 Fedora kernel) is not working -
> never has!
> 
> Hibernate/restore always freezes my system - without fail - and judging by the
> comments written in this bug report I am not the only one! So, I don't see how
> allowing this abomination in mainstream is "a good plan", particularly given
> the fact that I did not experience these issues with the 2.6.38 version of the
> Fedora kernel.

For *you*. Nouveau git also contains brand spanking new fan control code that could quite possibly switch off the GPU's fan completely by accident and burn someone's card. Would you like me to push that into Fedora too?

Comment 6 Ben Skeggs 2011-08-23 05:00:07 UTC
I've pushed the patches from 3.1 that should fix this issue into the F15 kernel git repository. I haven't done a build yet; I'll leave that to the kernel maintainers, since several other commits are pending there without a build and I'm not sure whether they're ready yet.

This bug should get updated automatically once an update has been submitted.

Comment 7 Paweł Prażak 2011-08-26 00:19:00 UTC
The same is happening here: F15 x86_64, GeForce Go 7300.

Comment 8 Ben Skeggs 2011-08-26 00:36:02 UTC
Give this kernel a try: http://koji.fedoraproject.org/koji/buildinfo?buildID=260424

Comment 9 Paweł Prażak 2011-08-26 07:57:08 UTC
Thanks for the quick reply :)

Unfortunately the new kernel didn't fix the problem. Maybe I have a different bug?

I get lots of artifacts in the colors of the image that should be displayed, and they blink like a broken fluorescent lamp.

The system is responsive: I can hear it (sound and hard drive) and it responds to the keyboard.

I use the nouveau driver with a GeForce 7300, and the bug reproduces 100% of the time on every kernel I tried (vmlinuz-2.6.40.3-2.fc15.x86_64, vmlinuz-2.6.40.3-0.fc15.x86_64, vmlinuz-2.6.38.6-26.rc1.fc15.x86_64).

How can I help to pinpoint the problem?

Comment 10 Paweł Prażak 2011-08-26 07:59:12 UTC
Created attachment 520024 [details]
suspend log

Comment 11 Paweł Prażak 2011-08-26 07:59:55 UTC
Created attachment 520025 [details]
messages log

Comment 12 Paweł Prażak 2011-08-26 08:00:37 UTC
Created attachment 520026 [details]
xorg log

Comment 13 Paweł Prażak 2011-08-26 08:02:05 UTC
Created attachment 520027 [details]
artifacts photo

Comment 14 Paweł Prażak 2011-08-26 10:08:26 UTC
Update: screenshots are completely normal (no artifacts) and an ssh session into the machine works. What commands should I try?

Comment 15 Mr-4 2011-08-26 10:31:38 UTC
(In reply to comment #5)
I will have the chance to build and install the new kernel from koji (as per your post above) later today, or over the weekend at the latest, and will let you know whether it fixes the problem on my machine.

> For *you*.  There's also brand spanking new fan control code which could quite
> possibly accidentally switch off the GPU's fan completely in nouveau git and
> burn someone's card.  Would you like me to push that into Fedora too?
On a slightly different note: yesterday I compiled and installed the latest DRM from the git source, which introduces the fan control feature (checked in more than a week ago according to the git log). I wanted to see whether I could use it on my card, as it is a feature I have been missing badly!

All went well except for one thing. When I change the performance level via echo X > .../performance_level (after setting the appropriate kernel parameter to 7777 as instructed), the value does change (cat .../performance_level shows the new value), but nothing *actually* happens: the fan stays at 100% instead of the reduced speed this performance level calls for.

Writing the pwm0_min value (30 in my case) to "force" the issue (echo 30 > .../pwm0) changes the value (cat .../pwm0 shows "30") but does not change the fan speed at all!

It is also worth noting that I had previously tried reducing my fan speed with nvpeek/nvpoke, as instructed at https://github.com/pathscale/pscnv/wiki/Power-Management (writing the appropriate values to register 0x10f0, as applicable to my NV49 card), but that didn't work either! I was hoping the new nouveau DRM code would address this.

I would have opened a new bug for this, but I don't know where to submit it.

I am willing to test this and give you a hand if needed, as I am very keen to use this feature: my fan is always at 100% when I start my Linux system, which is extremely annoying compared to 0% when I boot Windows!
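
The check described above (write a value with echo, then read it back with cat) only shows that the driver stored the value, not that the fan hardware acted on it. A minimal Python sketch of that write-then-read-back step; the attribute paths are placeholders because the report elides the full sysfs path, and set_pwm (including its clamping to pwm0_min) is a hypothetical helper for illustration only:

```python
from pathlib import Path


def write_and_read_back(attr_path, value):
    """Write value to a sysfs-style attribute file and return what reads back.

    A matching read-back only means the driver stored the value; it does not
    prove the hardware (here, the fan) acted on it.
    """
    path = Path(attr_path)
    path.write_text(f"{value}\n")
    return path.read_text().strip()


def set_pwm(pwm_path, min_path, requested):
    """Hypothetical helper: clamp a requested duty cycle to the pwm0_min floor."""
    floor = int(Path(min_path).read_text())
    return write_and_read_back(pwm_path, max(requested, floor))
```

Confirming that the fan really slowed down still needs an independent observation, such as listening to it, which is exactly where the behaviour described above diverges from the read-back value.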

Comment 16 Mr-4 2011-08-28 17:43:21 UTC
(In reply to comment #8)
> Give this kernel a try:
> http://koji.fedoraproject.org/koji/buildinfo?buildID=260424
So far so good!

I've done about 10 hibernate/restore cycles with no issues to report, except a very minor one. On maybe 2 or 3 occasions, the hibernate process (I use the standard one that comes with the kernel, nothing fancy) produced some pretty heavy snow-like flicker when the screen went blank (previously this was usually a precursor to a failed restore!). However, when I restore (successfully) and check the syslog, there is nothing unusual in terms of behaviour or errors, so I suppose the nouveau code is now capable of handling this sort of thing.

I will continue to test this further and will report any issues arising from this. 

Thanks for fixing it - at long last, a decent restore/hibernate on my system! Now for the nVidia fan speed... :)

Comment 17 Ben Skeggs 2011-08-30 03:48:00 UTC
(In reply to comment #9)
> Thank for a quick reply :)
> 
> Unfortunately the new kernel didn't fix the problem, maybe I have a different
> bug?
> 
> I have a lots of artifacts with colors from the original image that should be
> displayed and they are blinking like a broken fluorescent lamp.
> 
> System is responsive, I can hear it (sound and hard drive) and it responds to
> the keyboard.
> 
> I use nouveau driver with GeForce 7300, the bug is reproducible 100% on every
> kernel (vmlinuz-2.6.40.3-2.fc15.x86_64, vmlinuz-2.6.40.3-0.fc15.x86_64,
> vmlinuz-2.6.38.6-26.rc1.fc15.x86_64)
> 
> How can I help to pinpoint the problem?

This is a separate issue. Can you please file a new bug report containing all the logs you've submitted here? Please also make sure it includes a suspend/resume dmesg log captured with "drm.debug=14 log_buf_len=1M" in your kernel boot options.
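
drm.debug is a bitmask of debug message categories, and log_buf_len=1M enlarges the kernel log buffer to 1 MiB so the extra output is not lost. A small Python sketch that decodes the mask, assuming the category bits defined by the DRM core of that era (DRM_UT_CORE = 0x01, DRM_UT_DRIVER = 0x02, DRM_UT_KMS = 0x04); later kernels define more categories, so any other set bit is simply reported as unknown here. decode_drm_debug is a hypothetical helper:

```python
# Assumed bit assignments (DRM core, Linux 3.0 era); later kernels add more.
DRM_DEBUG_BITS = {
    0x01: "CORE",
    0x02: "DRIVER",
    0x04: "KMS",
}


def decode_drm_debug(mask):
    """Return the names of the debug categories enabled by a drm.debug mask."""
    names = [name for bit, name in sorted(DRM_DEBUG_BITS.items()) if mask & bit]
    unknown = mask & ~sum(DRM_DEBUG_BITS)  # bits outside the assumed table
    if unknown:
        names.append(f"unknown(0x{unknown:02x})")
    return names
```

Under these assumptions, decode_drm_debug(14) returns ["DRIVER", "KMS", "unknown(0x08)"], i.e. the requested setting turns on driver and modesetting messages.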

Comment 18 Ben Skeggs 2011-08-30 03:52:03 UTC
(In reply to comment #16)
> (In reply to comment #8)
> > Give this kernel a try:
> > http://koji.fedoraproject.org/koji/buildinfo?buildID=260424
> That is so far so good! 
> 
> I've done about 10 hibernate/restore cycles with no issues to report, except a
> very minor one - sometimes, may be on 2 or 3 occasions, the hibernate process
> (I use the standard one which comes with the kernel - nothing fancy like)
> introduces some pretty heavy snow-flickers when the screen goes blank (this
> usually was a precursor for a restore failure previously!), but when I restore
> - successfully - and check the syslogs there isn't anything there in terms of
> unusual behaviour or errors, so I suppose the nouveau code is now capable of
> handling this sort of thing.
> 
> I will continue to test this further and will report any issues arising from
> this. 
> 
> Thanks for fixing it - at long last, a decent restore/hibernate on my system!
> Now for the nVidia fan speed... :)

Thanks for letting me know it worked.

As for fan speed... ignore the hype you see on Phoronix. This is still very much a work in progress and, in my opinion, should not be trumpeted as anywhere near ready. Anyway...

I have encountered one other person with an NV49 that doesn't respond to the normal PWM control registers, and I haven't managed to track this down yet. If you email me privately (skeggsb, gmail) with your VBIOS image, we can work on tracking it down.

Comment 19 Mr-4 2011-08-30 12:20:10 UTC
(In reply to comment #18)
> Thanks for letting me know it worked.
OK, this is what happened last night: I restored my computer as normal, but this time the machine rebooted "automatically" straight after the restore finished. I didn't touch anything, and the screen never showed anything from my session before hibernation; it rebooted immediately.

The logs show nothing suspicious (nouveau and everything else restored normally, at least according to the logs), but as soon as the restore completed the machine rebooted (a soft reboot, with no memory test).

I can't be 100% certain that this was caused by Nouveau, though; it might be something else (some other hardware misbehaving, maybe). I just thought I'd let you know.

> I have encountered one other person with a NV49 that doesn't respond the the
> normal PWM control regs, have not managed to track this down yet.  If you email
> me privately (skeggsb, gmail) with your vbios image, we can work on tracking
> this down.
Will do, but it will be later tonight when I get home. I take it I need to use the nvbios tool to do that, right?

Comment 20 Paweł Prażak 2011-08-31 20:27:18 UTC
(In reply to comment #17)

Thank you. I did as you said; here is the new report with the logs:
https://bugzilla.redhat.com/show_bug.cgi?id=734914

Comment 21 Fedora Update System 2011-09-01 11:06:41 UTC
kernel-2.6.40.4-5.fc15 has been submitted as an update for Fedora 15.
https://admin.fedoraproject.org/updates/kernel-2.6.40.4-5.fc15

Comment 22 Fedora Update System 2011-09-07 00:00:48 UTC
kernel-2.6.40.4-5.fc15 has been pushed to the Fedora 15 stable repository.  If problems still persist, please make note of it in this bug report.
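
To check whether a machine is actually running this update, one can compare the output of uname -r against 2.6.40.4-5.fc15. A simplified Python sketch, assuming purely numeric version and release fields followed by a Fedora tag (fc15) and an optional architecture suffix. Pre-release builds such as 2.6.38.6-26.rc1.fc15 are rejected rather than guessed at, and rpm's own version comparison (rpmvercmp) handles many cases this does not, so rpm -q kernel remains the authoritative check. parse_kernel_release and has_update are hypothetical names:

```python
import re


def parse_kernel_release(release):
    """Split e.g. '2.6.40.4-5.fc15.x86_64' into ((2, 6, 40, 4), (5,), 'fc15')."""
    m = re.fullmatch(r"([\d.]+)-([\d.]+)\.(fc\d+)(?:\.\w+)?", release)
    if not m:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    version = tuple(int(p) for p in m.group(1).split("."))
    rel = tuple(int(p) for p in m.group(2).split("."))
    return version, rel, m.group(3)


def has_update(running, fixed="2.6.40.4-5.fc15"):
    """True if the running kernel is at least the fixed build on the same release."""
    v1, r1, dist1 = parse_kernel_release(running)
    v2, r2, dist2 = parse_kernel_release(fixed)
    if dist1 != dist2:
        raise ValueError("different Fedora releases; compare manually")
    # Tuples compare element by element, so 2.6.40 sorts before 2.6.40.4.
    return (v1, r1) >= (v2, r2)
```

With this sketch, the reporter's original 2.6.40-4.fc15.x86_64 kernel would be reported as not yet containing the update.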

Comment 23 Mr-4 2011-09-13 15:53:33 UTC
Below is a new set of errors after a hibernate/restore cycle.

I get a different black-and-white "pattern" from the one I used to get when I submitted the bug report above: it now looks like black and white squares instead of stripes. The system hangs completely (a hardware reset is needed) after a seemingly futile attempt by nouveau to recover.

My syslog is:

Sep 12 23:01:52 test1 kernel: PM: Syncing filesystems ... done.
Sep 12 23:01:52 test1 kernel: Freezing user space processes ... (elapsed 0.01 seconds) done.
Sep 12 23:01:52 test1 kernel: Freezing remaining freezable tasks ... (elapsed 0.01 seconds) done.
Sep 12 23:01:52 test1 kernel: PM: Preallocating image memory... done (allocated 221311 pages)
Sep 12 23:01:52 test1 kernel: PM: Allocated 885244 kbytes in 0.49 seconds (1806.62 MB/s)
Sep 12 23:01:52 test1 kernel: Suspending console(s) (use no_console_suspend to debug)
Sep 12 23:01:52 test1 kernel: i8042 kbd 00:0a: wake-up capability enabled by ACPI
Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: Disabling fbcon acceleration...
Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: Unpinning framebuffer(s)...
Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: Evicting buffers...
Sep 12 23:01:52 test1 kernel: sata_via 0000:00:0f.0: PCI INT B disabled
Sep 12 23:01:52 test1 kernel: pciehp 0000:00:02.0:pcie04: pciehp_suspend ENTRY
Sep 12 23:01:52 test1 kernel: HDA Intel 0000:80:01.0: PCI INT A disabled
Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: Idling channels...
Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: Suspending GPU objects...
Sep 12 23:01:52 test1 kernel: sd 2:0:0:0: [sda] Synchronizing SCSI cache
Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: And we're gone!
Sep 12 23:01:52 test1 kernel: PM: freeze of devices complete after 202.706 msecs
Sep 12 23:01:52 test1 kernel: PM: late freeze of devices complete after 0.678 msecs
Sep 12 23:01:52 test1 kernel: ACPI: Preparing to enter system sleep state S4
Sep 12 23:01:52 test1 restorecond: Read error (Interrupted system call)
Sep 12 23:01:52 test1 kernel: PM: Saving platform NVS memory
Sep 12 23:01:52 test1 kernel: Disabling non-boot CPUs ...
Sep 12 23:01:52 test1 kernel: CPU 1 is now offline
Sep 12 23:01:52 test1 kernel: PM: Creating hibernation image:
Sep 12 23:01:52 test1 kernel: PM: Need to copy 105168 pages
Sep 12 23:01:52 test1 kernel: PM: Restoring platform NVS memory
Sep 12 23:01:52 test1 kernel: Enabling non-boot CPUs ...
Sep 12 23:01:52 test1 kernel: Booting Node 0 Processor 1 APIC 0x1
Sep 12 23:01:52 test1 kernel: NMI watchdog enabled, takes one hw-pmu counter.
Sep 12 23:01:52 test1 kernel: Switched to NOHz mode on CPU #1
Sep 12 23:01:52 test1 kernel: CPU1 is up
Sep 12 23:01:52 test1 kernel: ACPI: Waking up from system sleep state S4
Sep 12 23:01:52 test1 kernel: PM: early restore of devices complete after 0.959 msecs
Sep 12 23:01:52 test1 kernel: pciehp 0000:00:02.0:pcie04: pciehp_resume ENTRY
Sep 12 23:01:52 test1 kernel: sata_via 0000:00:0f.0: PCI INT B -> GSI 21 (level, low) -> IRQ 21
Sep 12 23:01:52 test1 kernel: usb usb2: root hub lost power or was reset
Sep 12 23:01:52 test1 kernel: usb usb3: root hub lost power or was reset
Sep 12 23:01:52 test1 kernel: usb usb4: root hub lost power or was reset
Sep 12 23:01:52 test1 kernel: usb usb5: root hub lost power or was reset
Sep 12 23:01:52 test1 kernel: usb usb1: root hub lost power or was reset
Sep 12 23:01:52 test1 kernel: via-rhine 0000:00:12.0: eth0: link up, 100Mbps, full-duplex, lpa 0xCDE1
Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: We're back, enabling device...
Sep 12 23:01:52 test1 kernel: agpgart-via 0000:00:00.0: AGP 3.5 bridge
Sep 12 23:01:52 test1 kernel: agpgart: kworker/u:7 tried to set rate=x12. Setting to AGP3 x8 mode.
Sep 12 23:01:52 test1 kernel: agpgart-via 0000:00:00.0: putting AGP V3 device into 8x mode
Sep 12 23:01:52 test1 kernel: nouveau 0000:01:00.0: putting AGP V3 device into 8x mode
Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: POSTing device...
Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: Parsing VBIOS init table 0 at offset 0xDFFC
Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: Parsing VBIOS init table 1 at offset 0xE8EF
Sep 12 23:01:52 test1 kernel: HDA Intel 0000:80:01.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
Sep 12 23:01:52 test1 kernel: i8042 kbd 00:0a: wake-up capability disabled by ACPI
Sep 12 23:01:52 test1 kernel: sd 2:0:0:0: [sda] Starting disk
Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: Parsing VBIOS init table 2 at offset 0xF310
Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: Parsing VBIOS init table 3 at offset 0xF48B
Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: Parsing VBIOS init table 4 at offset 0xF5DF
Sep 12 23:01:52 test1 kernel: agpgart-via 0000:00:00.0: AGP 3.5 bridge
Sep 12 23:01:52 test1 kernel: agpgart: kworker/u:7 tried to set rate=x12. Setting to AGP3 x8 mode.
Sep 12 23:01:52 test1 kernel: agpgart-via 0000:00:00.0: putting AGP V3 device into 8x mode
Sep 12 23:01:52 test1 kernel: nouveau 0000:01:00.0: putting AGP V3 device into 8x mode
Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: Restoring GPU objects...
Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: Reinitialising engines...
Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: Restoring mode...
Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ERROR nsource: ILLEGAL_MTHD nstatus: PROTECTION_FAULT
Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ch 0 (0x00042000) subc 0 class 0x0000 mthd 0x0184 data 0x00004001
Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: 0xD3FB: Parsing digital output script table
Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ERROR nsource: ILLEGAL_MTHD nstatus: PROTECTION_FAULT
Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ch 0 (0x00042000) subc 0 class 0x0000 mthd 0x0188 data 0x00004000
Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ERROR nsource: ILLEGAL_MTHD nstatus: PROTECTION_FAULT
Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ch 0 (0x00042000) subc 0 class 0x0000 mthd 0x030c data 0x030bb000
Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ERROR nsource: ILLEGAL_MTHD nstatus: PROTECTION_FAULT
Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ch 0 (0x00042000) subc 0 class 0x0000 mthd 0x0310 data 0x00040000
Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ERROR nsource: ILLEGAL_MTHD nstatus: PROTECTION_FAULT
Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ch 0 (0x00042000) subc 0 class 0x0000 mthd 0x0314 data 0x00001000
Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ERROR nsource: ILLEGAL_MTHD nstatus: PROTECTION_FAULT
Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ch 0 (0x00042000) subc 0 class 0x0000 mthd 0x0318 data 0x00001000
Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ERROR nsource: ILLEGAL_MTHD nstatus: PROTECTION_FAULT
Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ch 0 (0x00042000) subc 0 class 0x0000 mthd 0x031c data 0x00001000
Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ERROR nsource: ILLEGAL_MTHD nstatus: PROTECTION_FAULT
Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ch 0 (0x00042000) subc 0 class 0x0000 mthd 0x0320 data 0x00000004
Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ERROR nsource: ILLEGAL_MTHD nstatus: PROTECTION_FAULT
Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ch 0 (0x00042000) subc 0 class 0x0000 mthd 0x0324 data 0x00000101
Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ERROR nsource: ILLEGAL_MTHD nstatus: PROTECTION_FAULT
Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ch 0 (0x00042000) subc 0 class 0x0000 mthd 0x0328 data 0x00000000
Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ERROR nsource: ILLEGAL_MTHD nstatus: PROTECTION_FAULT
Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ch 0 (0x00042000) subc 0 class 0x0000 mthd 0x0184 data 0x00004001
Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ERROR nsource: ILLEGAL_MTHD nstatus: PROTECTION_FAULT
Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ch 0 (0x00042000) subc 0 class 0x0000 mthd 0x0188 data 0x00004000
Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ERROR nsource: ILLEGAL_MTHD nstatus: PROTECTION_FAULT
Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ch 0 (0x00042000) subc 0 class 0x0000 mthd 0x030c data 0x030bf000
Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ERROR nsource: ILLEGAL_MTHD nstatus: PROTECTION_FAULT
Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ch 0 (0x00042000) subc 0 class 0x0000 mthd 0x0310 data 0x00044000
Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ERROR nsource: ILLEGAL_MTHD nstatus: PROTECTION_FAULT
Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ch 0 (0x00042000) subc 0 class 0x0000 mthd 0x0314 data 0x00001000
Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ERROR nsource: ILLEGAL_MTHD nstatus: PROTECTION_FAULT
Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ch 0 (0x00042000) subc 0 class 0x0000 mthd 0x0318 data 0x00001000
Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ERROR nsource: ILLEGAL_MTHD nstatus: PROTECTION_FAULT
Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ch 0 (0x00042000) subc 0 class 0x0000 mthd 0x031c data 0x00001000
Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ERROR nsource: ILLEGAL_MTHD nstatus: PROTECTION_FAULT
Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ch 0 (0x00042000) subc 0 class 0x0000 mthd 0x0320 data 0x00000004
Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ERROR nsource: ILLEGAL_MTHD nstatus: PROTECTION_FAULT
Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ch 0 (0x00042000) subc 0 class 0x0000 mthd 0x0324 data 0x00000101
Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ERROR nsource: ILLEGAL_MTHD nstatus: PROTECTION_FAULT
Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PGRAPH - ch 0 (0x00042000) subc 0 class 0x0000 mthd 0x0328 data 0x00000000
Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: Setting dpms mode 3 on tmds encoder (output 1)
Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: Setting dpms mode 3 on vga encoder (output 0)
Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: Setting dpms mode 3 on vga encoder (output 2)
Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: Setting dpms mode 3 on TV encoder (output 3)
Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: 0xD3FB: Parsing digital output script table
Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: Setting dpms mode 0 on tmds encoder (output 1)
Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: Output DVI-I-1 is running on CRTC 0 using output C
Sep 12 23:01:52 test1 kernel: ata4.00: ACPI cmd ef/03:42:00:00:00:a0 (SET FEATURES) filtered out
Sep 12 23:01:52 test1 kernel: ata3.00: ACPI cmd ef/03:45:00:00:00:a0 (SET FEATURES) filtered out
Sep 12 23:01:52 test1 kernel: ata3.00: ACPI cmd ef/03:01:00:00:00:a0 (SET FEATURES) filtered out
Sep 12 23:01:52 test1 kernel: ata4.00: configured for UDMA/33
Sep 12 23:01:52 test1 kernel: ata3.00: configured for UDMA/100
Sep 12 23:01:52 test1 kernel: usb 1-2: reset high speed USB device number 2 using ehci_hcd
Sep 12 23:01:52 test1 kernel: usb 1-2.2: reset low speed USB device number 3 using ehci_hcd
Sep 12 23:01:52 test1 kernel: PM: restore of devices complete after 1168.258 msecs
Sep 12 23:01:52 test1 kernel: Restarting tasks ... done.
Sep 12 23:01:52 test1 kernel: [drm] nouveau 0000:01:00.0: PFIFO still angry after 101 spins, halt
Sep 12 23:01:55 test1 kernel: [drm] nouveau 0000:01:00.0: reloc wait_idle failed: -16
Sep 12 23:01:55 test1 kernel: [drm] nouveau 0000:01:00.0: reloc apply: -16
Sep 12 23:01:58 test1 kernel: [drm] nouveau 0000:01:00.0: reloc wait_idle failed: -16
Sep 12 23:01:58 test1 kernel: [drm] nouveau 0000:01:00.0: reloc apply: -16
Sep 12 23:02:01 test1 kernel: [drm] nouveau 0000:01:00.0: fail ttm_validate
Sep 12 23:02:01 test1 kernel: [drm] nouveau 0000:01:00.0: validate vram_list
Sep 12 23:02:01 test1 kernel: [drm] nouveau 0000:01:00.0: validate: -16
Sep 12 23:02:04 test1 kernel: [drm] nouveau 0000:01:00.0: fail ttm_validate
Sep 12 23:02:04 test1 kernel: [drm] nouveau 0000:01:00.0: validate vram_list
Sep 12 23:02:04 test1 kernel: [drm] nouveau 0000:01:00.0: validate: -16
Sep 12 23:02:07 test1 kernel: [drm] nouveau 0000:01:00.0: reloc wait_idle failed: -16
Sep 12 23:02:07 test1 kernel: [drm] nouveau 0000:01:00.0: reloc apply: -16
Sep 12 23:02:17 test1 kernel: [drm] nouveau 0000:01:00.0: GPU lockup - switching to software fbcon
Sep 12 23:02:18 test1 abrt[11306]: saved core dump of pid 1944 (/usr/bin/Xorg) to /var/spool/abrt/ccpp-1315864937-1944.new/coredump (29745152 bytes)
Sep 12 23:02:18 test1 abrtd: Directory 'ccpp-1315864937-1944' creation detected
Sep 12 23:02:21 test1 kernel: [drm] nouveau 0000:01:00.0: Failed to idle channel 1.
Sep 12 23:02:21 test1 kernel: [drm] nouveau 0000:01:00.0: RAMHT entry not found. ch=1, handle=0x80000019
Sep 12 23:02:21 test1 kernel: [drm] nouveau 0000:01:00.0: RAMHT entry not found. ch=1, handle=0x80000018
Sep 12 23:02:21 test1 kernel: [drm] nouveau 0000:01:00.0: RAMHT entry not found. ch=1, handle=0x8000001a
Sep 12 23:02:21 test1 kernel: [drm] nouveau 0000:01:00.0: RAMHT entry not found. ch=1, handle=0x80000013
Sep 12 23:02:21 test1 kernel: [drm] nouveau 0000:01:00.0: RAMHT entry not found. ch=1, handle=0x80000017
Sep 12 23:02:21 test1 kernel: [drm] nouveau 0000:01:00.0: RAMHT entry not found. ch=1, handle=0x80000015
Sep 12 23:02:21 test1 kernel: [drm] nouveau 0000:01:00.0: RAMHT entry not found. ch=1, handle=0x80000016
Sep 12 23:02:21 test1 kernel: [drm] nouveau 0000:01:00.0: RAMHT entry not found. ch=1, handle=0x80000011
Sep 12 23:02:21 test1 kernel: [drm] nouveau 0000:01:00.0: RAMHT entry not found. ch=1, handle=0x80000012
Sep 12 23:02:21 test1 kernel: [drm] nouveau 0000:01:00.0: RAMHT entry not found. ch=1, handle=0x8000001c
Sep 12 23:02:21 test1 kernel: [drm] nouveau 0000:01:00.0: RAMHT entry not found. ch=1, handle=0x8000001b
Sep 12 23:02:21 test1 kernel: [drm] nouveau 0000:01:00.0: RAMHT entry not found. ch=1, handle=0x80000010
Sep 12 23:02:21 test1 kernel: [drm] nouveau 0000:01:00.0: RAMHT entry not found. ch=1, handle=0x00000000

Comment 24 Ben Skeggs 2011-09-13 20:36:23 UTC
Are you certain you were running the updated kernel at this point? The kernel I pointed you at should have fixed some of these errors. I just double-checked that the current F15 kernel still has the patches, and it should.

Comment 25 Mr-4 2011-09-14 10:56:23 UTC
I am running that kernel, though I compiled the nouveau driver from nouveau git dated 30 August 2011 (the current head of master as far as I can see from the git logs), as I was under the impression that this is newer than the version in the kernel. Isn't that the case?

Comment 26 Ben Skeggs 2011-09-14 14:42:10 UTC
Yeah, that should have the fixes for sure.

Comment 27 Mr-4 2011-09-14 15:05:05 UTC
(In reply to comment #26)
> Yeah, that should have the fixes for sure.
Well, clearly it does not fix that particular bug, as is evident from the syslog I posted above. Admittedly, it does not happen as frequently as before: this is the first time I have gotten the above errors in more than 20 hibernate/restore cycles, whereas with the previous revision of the nouveau driver it happened every time.

Comment 28 Mr-4 2011-10-31 14:46:00 UTC
I am inclined to close this bug as I haven't had this (or any other nouveau) error for over a month now - the nouveau driver in the 2.6.40-6 (3.0.6) kernel seems to be very stable. 

Hibernate/restore works every time. Although I get the occasional data corruption when I reboot (as opposed to hibernating again; the kernel-implemented hibernate is not 100% there yet, unfortunately), my system, and nouveau in particular, seems very stable.

Comment 29 Josh Boyer 2011-10-31 20:12:25 UTC
Thank you for letting us know.

Comment 30 Mr-4 2013-02-25 22:58:44 UTC
Please refer to https://bugs.freedesktop.org/show_bug.cgi?id=50121 for the full history of this.

The above bug was finally fixed in 3.7.4. With kernel versions before 3.7.4, nouveau was crashing, albeit infrequently; with 3.7.4 it was absolutely rock-solid, with no crashes in over 100 completed hibernate/resume cycles. Since upgrading to 3.7.9, however, the nightmare has returned!

As there are no noticeable changes to the nouveau kernel driver in the "normal" (upstream) kernel tree, I am wondering whether something was changed on the Fedora side of the kernel, hence this comment here to check whether that is the case.