Bug 847479

Summary: Kernel 3.5.0-2 and 3.5.1-1 nouveau driver causes Xorg to abrt after resume from RAM
Product: [Fedora] Fedora Reporter: Mark Frey <mark_frey_2000>
Component: xorg-x11-drv-nouveau Assignee: Ben Skeggs <bskeggs>
Status: CLOSED CURRENTRELEASE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 17 CC: airlied, ajax, bskeggs, cs, gansalmon, itamar, jglotzer, john.ronciak, jonathan, kernel-maint, kparal, Langenbach.Lutz, madhu.chinakonda, mpdimitroff, niki.waibel, redhat, reklov
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard: RejectedNTH
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-07-02 13:18:15 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
Xorg backtrace none
Xorg backtrace none

Description Mark Frey 2012-08-11 22:33:28 UTC
Description of problem:

After resuming from suspend to RAM, nouveau emits error messages and Xorg server aborts.  Boot into kernel 3.4.6-2 instead and the system resumes correctly after suspends.  Kernels 3.5.0-2 and 3.5.1-1 result in Xorg abort, all else being the same. This is a dual-monitor setup, VGA and DVI from a single nVidia card.

Version-Release number of selected component (if applicable):

Kernel 3.5.0-2.fc17.x86_64, 3.5.1-1.fc17.x86_64

How reproducible:

Always

Steps to Reproduce:
1. Suspend to RAM
2. Press any key
3. System wakes
  
Actual results:

drm / nouveau driver issues error messages after resume.
X server crashes.  Server respawns a kdm login but logging in results in a similar crash. Reboot seems necessary to clear the fault.

Expected results:

Normal resumption

Additional info:

Snippet of lspci -v:
04:00.0 VGA compatible controller: nVidia Corporation G71 [GeForce 7300 GS] (rev a1) (prog-if 00 [VGA controller])
        Subsystem: XFX Pine Group Inc. Device 2200
        Flags: bus master, fast devsel, latency 0, IRQ 16
        Memory at cf000000 (32-bit, non-prefetchable) [size=16M]
        Memory at d0000000 (64-bit, prefetchable) [size=256M]
        Memory at ce000000 (64-bit, non-prefetchable) [size=16M]
        Expansion ROM at cdfe0000 [disabled] [size=128K]
        Capabilities: [60] Power Management version 2
        Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [78] Express Endpoint, MSI 00
        Capabilities: [100] Virtual Channel
        Capabilities: [128] Power Budgeting <?>
        Kernel driver in use: nouveau


Relevant syslog entries after resume:
Aug 11 13:58:58 linux kernel: [  104.155972] [drm] nouveau 0000:04:00.0: We're back, enabling device...
Aug 11 13:58:58 linux kernel: [  104.155985] [drm] nouveau 0000:04:00.0: POSTing device...
Aug 11 13:58:58 linux kernel: [  104.155988] [drm] nouveau 0000:04:00.0: Parsing VBIOS init table 0 at offset 0xE0F0
Aug 11 13:58:58 linux kernel: [  104.156102] [drm] nouveau 0000:04:00.0: Parsing VBIOS init table 1 at offset 0xE41F
Aug 11 13:58:58 linux kernel: [  104.171379] [drm] nouveau 0000:04:00.0: Parsing VBIOS init table 2 at offset 0xE9BD
Aug 11 13:58:58 linux kernel: [  104.171432] [drm] nouveau 0000:04:00.0: Parsing VBIOS init table 3 at offset 0xEB38
Aug 11 13:58:58 linux kernel: [  104.172568] [drm] nouveau 0000:04:00.0: Parsing VBIOS init table 4 at offset 0xED91
Aug 11 13:58:58 linux kernel: [  104.172569] [drm] nouveau 0000:04:00.0: Restoring GPU objects...
Aug 11 13:58:58 linux kernel: [  104.189255] [drm] nouveau 0000:04:00.0: Reinitialising engines...
Aug 11 13:58:58 linux kernel: [  104.189345] [drm] nouveau 0000:04:00.0: Restoring mode...
Aug 11 13:58:58 linux kernel: [  104.200325] [drm] nouveau 0000:04:00.0: 0xD4A7: Parsing digital output script table
Aug 11 13:58:58 linux kernel: [  104.251273] [drm] nouveau 0000:04:00.0: Setting dpms mode 3 on vga encoder (output 0)
Aug 11 13:58:58 linux kernel: [  104.302018] [drm] nouveau 0000:04:00.0: Setting dpms mode 3 on vga encoder (output 1)
Aug 11 13:58:58 linux kernel: [  104.302021] [drm] nouveau 0000:04:00.0: Setting dpms mode 3 on TV encoder (output 3)
Aug 11 13:58:58 linux kernel: [  104.322678] [drm] nouveau 0000:04:00.0: Setting dpms mode 0 on vga encoder (output 0)
Aug 11 13:58:58 linux kernel: [  104.322680] [drm] nouveau 0000:04:00.0: Output VGA-1 is running on CRTC 0 using output A
Aug 11 13:58:58 linux kernel: [  104.322686] [drm] nouveau 0000:04:00.0: Setting dpms mode 3 on tmds encoder (output 2)
Aug 11 13:58:58 linux kernel: [  104.343330] [drm] nouveau 0000:04:00.0: 0xD4A7: Parsing digital output script table
Aug 11 13:58:58 linux kernel: [  104.393359] [drm] nouveau 0000:04:00.0: Setting dpms mode 0 on tmds encoder (output 2)
Aug 11 13:58:58 linux kernel: [  104.393361] [drm] nouveau 0000:04:00.0: Output DVI-I-1 is running on CRTC 1 using output A
Aug 11 13:59:01 linux kernel: [  114.341030] [drm] nouveau 0000:04:00.0: reloc wait_idle failed: -16
Aug 11 13:59:01 linux kernel: [  114.341037] [drm] nouveau 0000:04:00.0: reloc apply: -16
Aug 11 13:59:04 linux kernel: [  117.542020] [drm] nouveau 0000:04:00.0: reloc wait_idle failed: -16
Aug 11 13:59:04 linux kernel: [  117.542027] [drm] nouveau 0000:04:00.0: reloc apply: -16
Aug 11 13:59:07 linux kernel: [  120.542017] [drm] nouveau 0000:04:00.0: reloc wait_idle failed: -16
Aug 11 13:59:07 linux kernel: [  120.542024] [drm] nouveau 0000:04:00.0: reloc apply: -16
Aug 11 13:59:11 linux abrtd: Directory 'ccpp-2012-08-11-13:59:10-740' creation detected
Aug 11 13:59:11 linux abrt[2415]: Saved core dump of pid 740 (/usr/bin/Xorg) to /var/spool/abrt/ccpp-2012-08-11-13:59:10-740 (44433408 bytes)
Aug 11 13:59:14 linux abrtd: New problem directory /var/spool/abrt/ccpp-2012-08-11-13:59:10-740, processing
Aug 11 13:59:14 linux kernel: [  127.718044] [drm] nouveau 0000:04:00.0: Failed to idle channel 1.
Aug 11 13:59:14 linux kdm[671]: X server for display :0 terminated unexpectedly
Aug 11 13:59:14 linux systemd-logind[626]: Removed session 1.
Aug 11 13:59:20 linux systemd-logind[626]: New session 2 of user markfrey.
Aug 11 13:59:28 linux kernel: [  141.581837] [drm] nouveau 0000:04:00.0: fail pre-validate sync
Aug 11 13:59:28 linux kernel: [  141.581844] [drm] nouveau 0000:04:00.0: validate both_list
Aug 11 13:59:28 linux kernel: [  141.581856] [drm] nouveau 0000:04:00.0: validate: -16
Aug 11 13:59:31 linux abrtd: Directory 'ccpp-2012-08-11-13:59:31-2446' creation detected
Aug 11 13:59:31 linux abrt[2752]: Saved core dump of pid 2446 (/usr/bin/Xorg) to /var/spool/abrt/ccpp-2012-08-11-13:59:31-2446 (11567104 bytes)
Aug 11 13:59:32 linux abrtd: Duplicate: core backtrace
Aug 11 13:59:32 linux abrtd: DUP_OF_DIR: /var/spool/abrt/ccpp-2012-08-11-13:59:10-740
Aug 11 13:59:32 linux abrtd: Problem directory is a duplicate of /var/spool/abrt/ccpp-2012-08-11-13:59:10-740
Aug 11 13:59:32 linux abrtd: Deleting problem directory ccpp-2012-08-11-13:59:31-2446 (dup of ccpp-2012-08-11-13:59:10-740
Aug 11 13:59:34 linux kernel: [  147.963025] [drm] nouveau 0000:04:00.0: Failed to idle channel 1.
Aug 11 13:59:34 linux kdm[671]: X server for display :0 terminated unexpectedly
Aug 11 13:59:34 linux systemd-logind[626]: Removed session 2.
Aug 11 13:59:37 linux kernel: [  151.059033] [drm] nouveau 0000:04:00.0: Failed to idle channel 3.
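
Editorial sketch (not part of the original report): the failure messages in the log above can be tallied mechanically. The Python below is a minimal illustration only. The function name summarize_nouveau_failures and the pattern keys are hypothetical, and the regular expressions cover only the message formats quoted in this bug. The -16 in the reloc messages corresponds to errno 16 (EBUSY, "Device or resource busy") on Linux.

```python
import re
from collections import Counter

# Patterns copied from message formats quoted in this report; the keys are
# hypothetical labels chosen for this sketch.
FAILURE_PATTERNS = {
    "idle_channel": re.compile(
        r"nouveau [0-9a-f:.]+: Failed to idle channel (\d+)\."),
    "reloc_wait_idle": re.compile(
        r"nouveau [0-9a-f:.]+: reloc wait_idle failed: (-?\d+)"),
    "unload_timeout": re.compile(
        r"nouveau [0-9a-f:.]+: PFIFO: channel (\d+) unload timeout"),
}


def summarize_nouveau_failures(lines):
    """Count failures keyed by (kind, channel number or error code)."""
    counts = Counter()
    for line in lines:
        for kind, pattern in FAILURE_PATTERNS.items():
            match = pattern.search(line)
            if match:
                counts[(kind, int(match.group(1)))] += 1
    return counts


# A few lines taken verbatim from the syslog excerpt in the description.
sample = [
    "Aug 11 13:59:01 linux kernel: [  114.341030] [drm] nouveau 0000:04:00.0: reloc wait_idle failed: -16",
    "Aug 11 13:59:14 linux kernel: [  127.718044] [drm] nouveau 0000:04:00.0: Failed to idle channel 1.",
    "Aug 11 13:59:34 linux kernel: [  147.963025] [drm] nouveau 0000:04:00.0: Failed to idle channel 1.",
    "Aug 11 13:59:37 linux kernel: [  151.059033] [drm] nouveau 0000:04:00.0: Failed to idle channel 3.",
]
summary = summarize_nouveau_failures(sample)
for key, count in sorted(summary.items()):
    print(key, count)
```

Running the same function over a full /var/log/messages would show which channels fail to idle most often after a resume, which may help when comparing kernels.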

Comment 1 Volker Sobek 2012-08-15 00:49:41 UTC
I see X crashing some time after wake up from suspend, too, same kernel version.

$ lspci -v -s 01:00
01:00.0 VGA compatible controller: nVidia Corporation G84M [Quadro NVS 140M] (rev a1) (prog-if 00 [VGA controller])
	Subsystem: Lenovo ThinkPad T61
	Flags: bus master, fast devsel, latency 0, IRQ 16
	Memory at d6000000 (32-bit, non-prefetchable) [size=16M]
	Memory at e0000000 (64-bit, prefetchable) [size=256M]
	Memory at d4000000 (64-bit, non-prefetchable) [size=32M]
	I/O ports at 2000 [size=128]
	Expansion ROM at <unassigned> [disabled]
	Capabilities: <access denied>
	Kernel driver in use: nouveau


Aug 14 22:04:56 tp kernel: [    1.409922] nouveau 0000:01:00.0: power state changed by ACPI to D0
Aug 14 22:04:56 tp kernel: [    1.409929] nouveau 0000:01:00.0: power state changed by ACPI to D0
Aug 14 22:04:56 tp kernel: [    1.410423] [drm] nouveau 0000:01:00.0: Detected an NV50 generation card (0x086900a2)
Aug 14 22:04:56 tp kernel: [    1.411535] fb: conflicting fb hw usage nouveaufb vs VESA VGA - removing generic driver
Aug 14 22:04:56 tp kernel: [    1.416123] [drm] nouveau 0000:01:00.0: Checking PRAMIN for VBIOS
Aug 14 22:04:56 tp kernel: [    1.477948] [drm] nouveau 0000:01:00.0: ... appears to be valid
Aug 14 22:04:56 tp kernel: [    1.477950] [drm] nouveau 0000:01:00.0: Using VBIOS from PRAMIN
Aug 14 22:04:56 tp kernel: [    1.477953] [drm] nouveau 0000:01:00.0: BIT BIOS found
Aug 14 22:04:56 tp kernel: [    1.477956] [drm] nouveau 0000:01:00.0: Bios version 60.86.3e.00
Aug 14 22:04:56 tp kernel: [    1.477959] [drm] nouveau 0000:01:00.0: TMDS table version 2.0
Aug 14 22:04:56 tp kernel: [    1.478099] [drm] nouveau 0000:01:00.0: MXM: no VBIOS data, nothing to do
Aug 14 22:04:56 tp kernel: [    1.478103] [drm] nouveau 0000:01:00.0: DCB version 4.0
Aug 14 22:04:56 tp kernel: [    1.478106] [drm] nouveau 0000:01:00.0: DCB outp 00: 01000323 00010034
Aug 14 22:04:56 tp kernel: [    1.478108] [drm] nouveau 0000:01:00.0: DCB outp 01: 02811300 00000028
Aug 14 22:04:56 tp kernel: [    1.478110] [drm] nouveau 0000:01:00.0: DCB outp 02: 02822312 00010030
Aug 14 22:04:56 tp kernel: [    1.478112] [drm] nouveau 0000:01:00.0: DCB outp 03: 014333f1 0080c080
Aug 14 22:04:56 tp kernel: [    1.478114] [drm] nouveau 0000:01:00.0: DCB conn 00: 0040
Aug 14 22:04:56 tp kernel: [    1.478117] [drm] nouveau 0000:01:00.0: DCB conn 01: 0100
Aug 14 22:04:56 tp kernel: [    1.478119] [drm] nouveau 0000:01:00.0: DCB conn 02: 1231
Aug 14 22:04:56 tp kernel: [    1.478121] [drm] nouveau 0000:01:00.0: DCB conn 03: 0311
Aug 14 22:04:56 tp kernel: [    1.478126] [drm] nouveau 0000:01:00.0: Parsing VBIOS init table 0 at offset 0xDD0F
Aug 14 22:04:56 tp kernel: [    1.501321] [drm] nouveau 0000:01:00.0: Parsing VBIOS init table 1 at offset 0xE04F
Aug 14 22:04:56 tp kernel: [    1.518159] [drm] nouveau 0000:01:00.0: Parsing VBIOS init table 2 at offset 0xEAA4
Aug 14 22:04:56 tp kernel: [    1.518166] [drm] nouveau 0000:01:00.0: Parsing VBIOS init table 3 at offset 0xEB96
Aug 14 22:04:56 tp kernel: [    1.519234] [drm] nouveau 0000:01:00.0: Parsing VBIOS init table 4 at offset 0xED83
Aug 14 22:04:56 tp kernel: [    1.519236] [drm] nouveau 0000:01:00.0: Parsing VBIOS init table at offset 0xEDE8
Aug 14 22:04:56 tp kernel: [    1.539243] [drm] nouveau 0000:01:00.0: 0xEDE8: Condition still not met after 20ms, skipping following opcodes
Aug 14 22:04:56 tp kernel: [    1.542614] [drm] nouveau 0000:01:00.0: Detected 128MiB VRAM (GDDR3)
Aug 14 22:04:56 tp kernel: [    1.543759] [drm] nouveau 0000:01:00.0: 512 MiB GART (aperture)
Aug 14 22:04:56 tp kernel: [    1.580268] [drm] nouveau 0000:01:00.0: DCB encoder 1 unknown
Aug 14 22:04:56 tp kernel: [    1.580272] [drm] nouveau 0000:01:00.0: TV-1 has no encoders, removing
Aug 14 22:04:56 tp kernel: [    1.581255] [drm] nouveau 0000:01:00.0: ACPI backlight interface available, not registering our own
Aug 14 22:04:56 tp kernel: [    1.586298] [drm] nouveau 0000:01:00.0: 3 available performance level(s)
Aug 14 22:04:56 tp kernel: [    1.586303] [drm] nouveau 0000:01:00.0: 0: core 169MHz shader 338MHz memory 100MHz voltage 1150mV fanspeed 100%
Aug 14 22:04:56 tp kernel: [    1.586306] [drm] nouveau 0000:01:00.0: 1: core 275MHz shader 550MHz memory 301MHz voltage 1150mV fanspeed 100%
Aug 14 22:04:56 tp kernel: [    1.586310] [drm] nouveau 0000:01:00.0: 2: core 400MHz shader 800MHz memory 600MHz voltage 1200mV fanspeed 100%
Aug 14 22:04:56 tp kernel: [    1.586313] [drm] nouveau 0000:01:00.0: c: core 275MHz shader 550MHz memory 302MHz voltage 1150mV
Aug 14 22:04:56 tp kernel: [    1.604893] [drm] nouveau 0000:01:00.0: MM: using CRYPT for buffer copies
Aug 14 22:04:56 tp kernel: [    2.022275] [drm] nouveau 0000:01:00.0: allocated 1680x1050 fb: 0x2b0000, bo ffff880036d57400
Aug 14 22:04:56 tp kernel: [    2.022383] fbcon: nouveaufb (fb0) is primary device
Aug 14 22:04:56 tp kernel: [    2.058825] fb0: nouveaufb frame buffer device
Aug 14 22:04:56 tp kernel: [    2.058832] [drm] Initialized nouveau 1.0.0 20120316 for 0000:01:00.0 on minor 0
Aug 15 01:46:14 tp kernel: [13224.795225] [drm] nouveau 0000:01:00.0: Disabling display...
Aug 15 01:46:14 tp kernel: [13226.101488] [drm] nouveau 0000:01:00.0: Disabling fbcon...
Aug 15 01:46:14 tp kernel: [13226.101504] [drm] nouveau 0000:01:00.0: Unpinning framebuffer(s)...
Aug 15 01:46:14 tp kernel: [13226.101560] [drm] nouveau 0000:01:00.0: Evicting buffers...
Aug 15 01:46:14 tp kernel: [13226.271958] [drm] nouveau 0000:01:00.0: Idling channels...
Aug 15 01:46:14 tp kernel: [13226.273222] [drm] nouveau 0000:01:00.0: Suspending GPU objects...
Aug 15 01:46:14 tp kernel: [13231.454058] [drm] nouveau 0000:01:00.0: And we're gone!
Aug 15 01:46:14 tp kernel: [13231.465071] nouveau 0000:01:00.0: power state changed by ACPI to D3
Aug 15 01:46:14 tp kernel: [13231.730413] [drm] nouveau 0000:01:00.0: We're back, enabling device...
Aug 15 01:46:14 tp kernel: [13231.730425] nouveau 0000:01:00.0: power state changed by ACPI to D0
Aug 15 01:46:14 tp kernel: [13231.730429] nouveau 0000:01:00.0: power state changed by ACPI to D0
Aug 15 01:46:14 tp kernel: [13231.730434] nouveau 0000:01:00.0: power state changed by ACPI to D0
Aug 15 01:46:14 tp kernel: [13231.730437] nouveau 0000:01:00.0: power state changed by ACPI to D0
Aug 15 01:46:14 tp kernel: [13231.730444] [drm] nouveau 0000:01:00.0: POSTing device...
Aug 15 01:46:14 tp kernel: [13231.730446] [drm] nouveau 0000:01:00.0: Parsing VBIOS init table 0 at offset 0xDD0F
Aug 15 01:46:14 tp kernel: [13231.753857] [drm] nouveau 0000:01:00.0: Parsing VBIOS init table 1 at offset 0xE04F
Aug 15 01:46:14 tp kernel: [13231.779650] [drm] nouveau 0000:01:00.0: Parsing VBIOS init table 2 at offset 0xEAA4
Aug 15 01:46:14 tp kernel: [13231.779678] [drm] nouveau 0000:01:00.0: Parsing VBIOS init table 3 at offset 0xEB96
Aug 15 01:46:14 tp kernel: [13231.780767] [drm] nouveau 0000:01:00.0: Parsing VBIOS init table 4 at offset 0xED83
Aug 15 01:46:14 tp kernel: [13231.780768] [drm] nouveau 0000:01:00.0: Parsing VBIOS init table at offset 0xEDE8
Aug 15 01:46:14 tp kernel: [13231.800781] [drm] nouveau 0000:01:00.0: Restoring GPU objects...
Aug 15 01:46:14 tp kernel: [13232.028965] [drm] nouveau 0000:01:00.0: Reinitialising engines...
Aug 15 01:46:14 tp kernel: [13232.029645] [drm] nouveau 0000:01:00.0: Restoring mode...
Aug 15 01:58:06 tp kernel: [13946.180009] [drm] nouveau 0000:01:00.0: PFIFO: channel 6 unload timeout
Aug 15 01:58:29 tp kernel: [13969.211018] [drm] nouveau 0000:01:00.0: Failed to idle channel 2.
Aug 15 01:58:29 tp kernel: [13969.212007] [drm] nouveau 0000:01:00.0: PFIFO: channel 2 unload timeout
Aug 15 01:58:34 tp kernel: [13974.219021] [drm] nouveau 0000:01:00.0: Failed to idle channel 3.
Aug 15 01:58:34 tp kernel: [13974.220015] [drm] nouveau 0000:01:00.0: PFIFO: channel 3 unload timeout
Aug 15 01:58:39 tp kernel: [13979.537021] [drm] nouveau 0000:01:00.0: Failed to idle channel 5.
Aug 15 01:58:39 tp kernel: [13979.538015] [drm] nouveau 0000:01:00.0: PFIFO: channel 5 unload timeout
Aug 15 01:58:44 tp kernel: [13984.545018] [drm] nouveau 0000:01:00.0: Failed to idle channel 4.
Aug 15 01:58:44 tp kernel: [13984.546007] [drm] nouveau 0000:01:00.0: PFIFO: channel 4 unload timeout
Aug 15 01:58:55 tp kernel: [13995.124015] [drm] nouveau 0000:01:00.0: Failed to idle channel 2.
Aug 15 01:58:55 tp kernel: [13995.125006] [drm] nouveau 0000:01:00.0: PFIFO: channel 2 unload timeout
Aug 15 01:59:00 tp kernel: [14000.130020] [drm] nouveau 0000:01:00.0: Failed to idle channel 3.
Aug 15 01:59:00 tp kernel: [14000.131007] [drm] nouveau 0000:01:00.0: PFIFO: channel 3 unload timeout
Aug 15 01:59:12 tp kernel: [14012.055021] [drm] nouveau 0000:01:00.0: Failed to idle channel 2.
Aug 15 01:59:12 tp kernel: [14012.056016] [drm] nouveau 0000:01:00.0: PFIFO: channel 2 unload timeout
Aug 15 01:59:17 tp kernel: [14017.062081] [drm] nouveau 0000:01:00.0: Failed to idle channel 3.
Aug 15 01:59:17 tp kernel: [14017.063049] [drm] nouveau 0000:01:00.0: PFIFO: channel 3 unload timeout
Aug 15 01:59:29 tp kernel: [14029.364018] [drm] nouveau 0000:01:00.0: Failed to idle channel 2.
Aug 15 01:59:29 tp kernel: [14029.365009] [drm] nouveau 0000:01:00.0: PFIFO: channel 2 unload timeout
Aug 15 01:59:34 tp kernel: [14034.377035] [drm] nouveau 0000:01:00.0: Failed to idle channel 3.
Aug 15 01:59:34 tp kernel: [14034.378014] [drm] nouveau 0000:01:00.0: PFIFO: channel 3 unload timeout
Aug 15 01:59:46 tp kernel: [14046.314040] [drm] nouveau 0000:01:00.0: Failed to idle channel 2.
Aug 15 01:59:46 tp kernel: [14046.315005] [drm] nouveau 0000:01:00.0: PFIFO: channel 2 unload timeout
Aug 15 01:59:51 tp kernel: [14051.325021] [drm] nouveau 0000:01:00.0: Failed to idle channel 3.
Aug 15 01:59:51 tp kernel: [14051.326016] [drm] nouveau 0000:01:00.0: PFIFO: channel 3 unload timeout
Aug 15 02:00:04 tp kernel: [14063.590041] [drm] nouveau 0000:01:00.0: Failed to idle channel 2.
Aug 15 02:00:04 tp kernel: [14063.591005] [drm] nouveau 0000:01:00.0: PFIFO: channel 2 unload timeout

...

Comment 2 Volker Sobek 2012-08-15 01:05:54 UTC
More details running lspci as root:

# lspci -v -s 01:00
01:00.0 VGA compatible controller: nVidia Corporation G84M [Quadro NVS 140M] (rev a1) (prog-if 00 [VGA controller])
	Subsystem: Lenovo ThinkPad T61
	Flags: bus master, fast devsel, latency 0, IRQ 16
	Memory at d6000000 (32-bit, non-prefetchable) [size=16M]
	Memory at e0000000 (64-bit, prefetchable) [size=256M]
	Memory at d4000000 (64-bit, non-prefetchable) [size=32M]
	I/O ports at 2000 [size=128]
	Expansion ROM at <unassigned> [disabled]
	Capabilities: [60] Power Management version 2
	Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
	Capabilities: [78] Express Endpoint, MSI 00
	Capabilities: [100] Virtual Channel
	Capabilities: [128] Power Budgeting <?>
	Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
	Kernel driver in use: nouveau

Comment 3 Volker Sobek 2012-08-15 09:32:51 UTC
Created attachment 604561 [details]
Xorg backtrace

Comment 4 Volker Sobek 2012-08-15 09:35:15 UTC
Created attachment 604563 [details]
Xorg backtrace

Comment 5 Claus Stefer 2012-08-19 21:46:30 UTC
Same problem here on a Lenovo T61 7663-E53 (Nvidia Quadro NVS 140M Graphics Card): Kernel 3.5.0 and 3.5.1 (F17, Gnome) can't resume properly after suspend. At first everything seems normal for a few seconds but then the problem appears: Touchpad and Trackpoint don't respond any more. Then the screen flashes, shows "Loading initial ramdisk..." and continues flashing. The following messages appear (x = increasing numbers like 295.842020):
[ xxx.xxxxxxxx] [drm] nouveau 0000:01:00.0: Failed to idle channel 3.
[ xxx.xxxxxxxx] [drm] nouveau 0000:01:00.0: Failed to idle channel 2.
[ xxx.xxxxxxxx] [drm] nouveau 0000:01:00.0: Failed to idle channel 3.
[ xxx.xxxxxxxx] [drm] nouveau 0000:01:00.0: Failed to idle channel 2.
and so on.
After displaying "...channel 3" the screen flashes again and shows a completely scrambled gdm login (it looks as if parts of the screen have been shuffled like a Rubik's Cube: sometimes the original upper part is displayed in the lower part and vice versa, sometimes the screen seems to be quartered and the quarters appear in the wrong places, sometimes funky b/w figures appear). This behaviour continues until I restart. Restarting is possible by switching to a console and entering reboot, but that is hard since the screen keeps flashing and the keyboard is still very, very laggy.
This problem appeared with the 3.5.0 kernel; it doesn't appear when using an older one.

Since I am a long-time Linux user but no computer scientist or programmer, I don't know which log (or what else?) to enclose to help clarify and solve this problem, but I am of course willing to help as much as I can.

Best regards,
CS

Comment 6 Niki W. Waibel 2012-08-21 22:09:34 UTC
same here:
Linux localhost.localdomain 3.5.2-1.fc17.x86_64 #1 SMP Wed Aug 15 16:09:27 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

Comment 7 Volker Sobek 2012-08-22 08:46:59 UTC
Upstream bug: https://bugs.freedesktop.org/show_bug.cgi?id=53535

Comment 8 Claus Stefer 2012-09-13 06:40:51 UTC
The problem still exists using kernel 3.5.3-1 but shows up in a different way: The first suspend-resume cycle works fine, but after suspending a second time and trying to wake the computer the screen remains black. It's not just darkened, it simply isn't switched on. Same problem on my Lenovo m58p with Intel GMA 4500: the first suspend-resume cycle is fine, but when waking up a second time the screen (DVI-connected) seems not to be recognized. The status LED, which should be green, remains orange as if the computer wasn't switched on (but it, of course, is).

Comment 9 Marek Zukal 2012-10-03 09:11:31 UTC
Happens randomly during normal usage and with a greater probability during waking up. My card is G86 [GeForce 8400M GS]

Comment 10 Aioanei Rares 2012-10-06 07:41:51 UTC
Fedora 17, x86_64, kernel 3.5.4-2, G86 [GeForce 8400M GS] - still happening. I found a commit that allegedly takes care of this: http://cgit.freedesktop.org/nouveau/linux-2.6/commit/?h=drm-nouveau-fixes&id=2064db725cc6d4ea19a24c138bc37939b63e3ae6

Comment 11 John Glotzer 2012-10-11 13:01:41 UTC
I don't think that above referenced commit fixes my resume crash because I see this issue with 3.5.6-1.fc17.x86_64 and I downloaded the kernel source and this patch is present in the source.

Comment 12 John Glotzer 2012-10-11 13:45:23 UTC
Should add my video HW:
01:00.0 VGA compatible controller: nVidia Corporation GF108 [Quadro 1000M] (rev a1)

Comment 13 John Glotzer 2012-10-12 00:18:39 UTC
One more wrinkle - in my build tree there is source under linux-3.5.6-1.fc17.x86_64 and there is source under vanilla-3.5. 

One has the proposed fix (linux-3.5.6-1.fc17.x86_64) and one does not (vanilla-3.5). 

If I had to guess I'd guess that linux-3.5.6-1.fc17.x86_64 is the one preferentially used but I guess I'm not 100% sure of that. I suppose maybe
I should patch the changes into the vanilla subtree and build with that
and see what happens.

diff /home/me/rpmbuild/BUILD/kernel-3.5.fc17/linux-3.5.6-1.fc17.x86_64/drivers/gpu/drm/nouveau/nv84_fifo.c /home/me/rpmbuild/BUILD/kernel-3.5.fc17/vanilla-3.5/drivers/gpu/drm/nouveau/nv84_fifo.c
120d119
<       u32 save;
127,128d125
<       save = nv_mask(dev, 0x002520, 0x0000003f, 0x15);
< 
134,135d130
<       nv_wr32(dev, 0x002520, save);
< 
192d186
<       u32 save;
197,198d190
<       save = nv_mask(dev, 0x002520, 0x0000003f, 0x15);
< 
210d201
<       nv_wr32(dev, 0x002520, save);
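
Editorial sketch (not part of the original comment): the lines present only in the Fedora tree follow a save / modify / restore pattern around register 0x002520. The Python model below illustrates that pattern under one assumption: that nv_mask performs a read-modify-write and returns the previous register value, which is consistent with how the diff assigns its result to save and later writes it back with nv_wr32. FakeRegisters and with_temporary_field are hypothetical names; nothing here touches real hardware.

```python
class FakeRegisters:
    """Stand-in for a 32-bit MMIO register space (illustrative only)."""

    def __init__(self):
        self.regs = {}

    def rd32(self, reg):
        return self.regs.get(reg, 0)

    def wr32(self, reg, val):
        self.regs[reg] = val & 0xFFFFFFFF

    def mask(self, reg, mask, val):
        """Replace the bits selected by mask with val; return the old value.

        Assumed to mirror the semantics of nv_mask in the diff above.
        """
        old = self.rd32(reg)
        self.wr32(reg, (old & ~mask) | val)
        return old


def with_temporary_field(dev, reg, mask, val, action):
    """Set a register field, run action, then restore the saved value."""
    save = dev.mask(reg, mask, val)
    try:
        action()
    finally:
        dev.wr32(reg, save)


dev = FakeRegisters()
dev.wr32(0x002520, 0xABCD0007)  # arbitrary starting value for the demo
seen = []
with_temporary_field(dev, 0x002520, 0x0000003F, 0x15,
                     lambda: seen.append(dev.rd32(0x002520)))
print(hex(seen[0]), hex(dev.rd32(0x002520)))
```

The point of the pattern is that the low six bits are forced to 0x15 only while the enclosed operation runs, and whatever was there before is put back afterwards.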

Comment 14 John Glotzer 2012-10-12 13:02:02 UTC
Downgraded to 

Linux 3.3.4-5.fc17.x86_64 #1 SMP Mon May 7 17:29:34 

From (most recently) 3.5.6-1.fc17.x86_64

and now suspend/resume works fine.

Comment 15 Claus Stefer 2012-10-12 13:10:22 UTC
The same here: When using a 3.4.X-Kernel everything works as intended.

Comment 16 John Glotzer 2012-10-12 14:20:40 UTC
Looks like a job for git-bisect.
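
Editorial sketch (not part of the original comment): git bisect performs a binary search over commits between a known-good and a known-bad revision. The same idea can be shown at package granularity with the kernels reported in this thread. The Python below is only an illustration: first_bad and the reported status table are hypothetical helpers, and in practice each test would mean booting the kernel and trying a suspend/resume cycle.

```python
def first_bad(versions, is_bad):
    """Return the first bad entry in an ordered list.

    Assumes versions[0] is good, versions[-1] is bad, and that every
    version after the first bad one is also bad.
    """
    lo, hi = 0, len(versions) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid
        else:
            lo = mid
    return versions[hi]


# Good/bad status as reported by commenters in this bug.
reported = {
    "3.3.4-5": False,
    "3.4.6-2": False,
    "3.5.0-2": True,
    "3.5.1-1": True,
    "3.5.6-1": True,
}
tested = []


def is_bad(version):
    tested.append(version)  # record how many "boots" the search needed
    return reported[version]


culprit = first_bad(list(reported), is_bad)
print(culprit, tested)
```

A real bisect between the 3.4 and 3.5 kernel sources would narrow this down to a single commit rather than a package version.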

Comment 17 Claus Stefer 2012-10-12 20:38:32 UTC
The Problem still exists in Kernel 3.6.1-1 on this machine: http://www.smolts.org/client/show/pub_d27be84b-a35e-461f-9c6c-eb284fdeb26b

Comment 18 John Glotzer 2012-10-13 14:20:02 UTC
Same here with this machine - crash still exists:
http://www.smolts.org/client/show/pub_248e4572-7c6c-4a85-984b-a224a36c6cc4

Comment 19 John Glotzer 2012-10-13 21:25:57 UTC
I built a kernel based on 3.5.6-1.fc17.x86_64 which had 

--------------------------------------------------------------------------
http://permalink.gmane.org/gmane.linux.redhat.fedora.extras.cvs/823735
[kernel/f16] Add patch to fix cpu pinning after suspend/resume (rhbz 714271)

commit 0e04e7c4764f828f62c0b6d61f1b0cb7e862e0a7
Author: Josh Boyer <jwboyer <at> redhat.com>
Date:   Tue Jul 24 12:59:03 2012 -0400

    Add patch to fix cpu pinning after suspend/resume (rhbz 714271)

fix backed out. 
--------------------------------------------------------------------------

The behavior is still not normal on resume but it seems improved to me.

On resume the screen is sort of a gray tiled mosaic with very fine granularity.

However unlike other crashes I can easily get a TTY, easily login, and easily
run startx. Basically I can get back to a working system. 

On logging in I get an ABRT message that Xorg crashed. 

Gone from the TTY are all the 

Aug 15 01:58:29 tp kernel: [13969.211018] [drm] nouveau 0000:01:00.0: Failed to idle channel 2.
Aug 15 01:58:29 tp kernel: [13969.212007] [drm] nouveau 0000:01:00.0: PFIFO: channel 2 unload timeout
Aug 15 01:58:34 tp kernel: [13974.219021] [drm] nouveau 0000:01:00.0: Failed to idle channel 3.
Aug 15 01:58:34 tp kernel: [13974.220015] [drm] nouveau 0000:01:00.0: PFIFO: channel 3 unload timeout
Aug 15 01:58:39 tp kernel: [13979.537021] [drm] nouveau 0000:01:00.0: Failed to idle channel 5.
Aug 15 01:58:39 tp kernel: [13979.538015] [drm] nouveau 0000:01:00.0: PFIFO: channel 5 unload timeout
Aug 15 01:58:44 tp kernel: [13984.545018] [drm] nouveau 0000:01:00.0: Failed to idle channel 4.

messages that one used to see. The behavior is not perfect, but to me it is
demonstrably better. Before, usually only a power cycle would recover from these crashes.
I think you could sometimes log in between the nouveau driver messages but
not really do anything once logged in.

Comment 20 Claus Stefer 2012-10-24 15:13:23 UTC
Problem seems to be solved on my Lenovo T61 (Nvidia Quadro NVS 140M Graphics Card): Both 3.6 kernels so far released for fedora resume now as intended.

Comment 21 John Glotzer 2012-10-27 15:23:54 UTC
For me with 3.6.2-4.fc17.x86_64 the issue is better but still not resolved, as compared with 3.3.4-5.fc17.x86_64. This is on a Lenovo W520 using Nvidia hardware and nouveau driver.

With 3.6.2-4.fc17.x86_64 on resume from suspend I am greeted by the familiar speckled white screen (I'm guessing it's "white noise" in the video frame buffer).

However, and this is a change from before, there is a rectangular box in the middle of the screen that corresponds to the Gnome3 login prompt, but it is barely recognizable as such. There is also the faint outline of a vertical cursor. I can basically type in my password, hit return and the rectangle goes away. At this point I'm presumably logged in (I could have confirmed this with a tty but didn't do so).

I can then easily obtain a tty and there are none of the previous messages scrolling by (like failed to idle channel 1). 

I can then sudo killall Xorg and then I get a normal login prompt and a normal X display and, in fact, I'm typing this report from such a session.

So my conclusion is that the kernel is functioning fine after such a resume but Xorg is in a bad state - that basically Xorg needs to be kicked after the resume. 

I think prior to the 3.6 release there were probably video driver/kernel issues. Now I think it's just an Xorg problem. There might be a clean workaround at this point (other than getting a TTY and running killall Xorg) but I haven't found it yet.

Comment 22 John Glotzer 2012-10-27 16:03:54 UTC
Addendum: Checked some of the logs after the last experiment and basically in /var/log/gdm there were a number of log files like :0.log.1, :0.log.1, etc. etc.
In some of these log files there were errors like this:

(EE) NOUVEAU(0): failed to set mode: Device or resource busy

So I basically decided to install nvidia drivers with method outlined here:
http://www.if-not-true-then-false.com/2012/fedora-17-nvidia-guide/

Once I did this (for the 3.6.2-4.fc17.x86_64 kernel) suspend/resume worked fine for me. 

The only three differences I noted:

1. The plymouth splash screen looked a bit funky with nvidia - the Fedora logo that fills in as the kernel booted was of a different size and shape.
2. I had to turn down my screen brightness once the laptop was up - maybe the default was set to 100% and I had to turn that down.
3. I had password authentication turned off for suspend/resume but with the nvidia driver I had to supply a password.

Packages Altered:
    Install     akmod-nvidia-1:304.51-1.fc17.x86_64             @rpmfusion-nonfree-updates
    Dep-Install akmods-0.4.0-4.fc17.noarch                      @rpmfusion-free
    Dep-Install kmodtool-1-21.fc17.noarch                       @rpmfusion-free-updates
    Dep-Install nvidia-settings-1.0-21.fc17.x86_64              @rpmfusion-nonfree-updates
    Dep-Install nvidia-xconfig-1.0-19.fc17.x86_64               @rpmfusion-nonfree-updates
    Dep-Install xorg-x11-drv-nvidia-1:304.51-1.fc17.x86_64      @rpmfusion-nonfree-updates
    Install     xorg-x11-drv-nvidia-libs-1:304.51-1.fc17.x86_64 @rpmfusion-nonfree-updates

# lsmod | grep nvidia
nvidia              11262717  53 
i2c_core               38314  3 i2c_i801,nvidia,videodev
# lsmod | grep nouveau
#
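
Editorial sketch (not part of the original comment): the lsmod check above can be scripted. The Python below parses lsmod-style text and reports which of the two drivers is loaded. loaded_modules is a hypothetical helper name, and the sample text is the output quoted in this comment.

```python
def loaded_modules(lsmod_output):
    """Return the set of module names from lsmod-style text.

    The first whitespace-separated field of each line is the module name;
    the "Module  Size  Used by" header line, if present, is skipped.
    """
    modules = set()
    for line in lsmod_output.splitlines():
        fields = line.split()
        if not fields or fields[0] == "Module":
            continue
        modules.add(fields[0])
    return modules


# Output quoted in the comment above (result of `lsmod | grep nvidia`).
sample = """\
nvidia              11262717  53
i2c_core               38314  3 i2c_i801,nvidia,videodev
"""
modules = loaded_modules(sample)
print("nvidia loaded:", "nvidia" in modules)
print("nouveau loaded:", "nouveau" in modules)
```

On a live system the text could come from running lsmod via subprocess, or from reading /proc/modules, whose first column is also the module name.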

Comment 23 Claus Stefer 2012-11-06 13:19:56 UTC
I have to take that back: Just now this error reappeared on my Lenovo T61 (Nvidia Quadro NVS 140M Graphics Card) with Kernel 3.6.3-1.fc17.i686, but since it was the first time in a long while I can't make out any pattern at the moment.

Comment 24 Mark Frey 2012-11-09 00:14:47 UTC
Still broken on kernel-3.6.5-1.fc17.x86_64 - still crashing X and outputting "Failed to idle channel" messages:

Nov  8 19:04:08 linux kernel: [  129.868806] [drm] nouveau 0000:04:00.0: We're back, enabling device...
Nov  8 19:04:08 linux kernel: [  129.868815] [drm] nouveau 0000:04:00.0: POSTing device...
Nov  8 19:04:08 linux kernel: [  129.868818] [drm] nouveau 0000:04:00.0: Parsing VBIOS init table 0 at offset 0xE0F0
Nov  8 19:04:08 linux kernel: [  129.868914] [drm] nouveau 0000:04:00.0: Parsing VBIOS init table 1 at offset 0xE41F
Nov  8 19:04:08 linux kernel: [  129.884190] [drm] nouveau 0000:04:00.0: Parsing VBIOS init table 2 at offset 0xE9BD
Nov  8 19:04:08 linux kernel: [  129.884244] [drm] nouveau 0000:04:00.0: Parsing VBIOS init table 3 at offset 0xEB38
Nov  8 19:04:08 linux kernel: [  129.885379] [drm] nouveau 0000:04:00.0: Parsing VBIOS init table 4 at offset 0xED91
Nov  8 19:04:08 linux kernel: [  129.885381] [drm] nouveau 0000:04:00.0: Restoring GPU objects...
Nov  8 19:04:08 linux kernel: [  129.901990] [drm] nouveau 0000:04:00.0: Reinitialising engines...
Nov  8 19:04:08 linux kernel: [  129.902093] [drm] nouveau 0000:04:00.0: Restoring mode...
Nov  8 19:04:08 linux kernel: [  129.913074] [drm] nouveau 0000:04:00.0: 0xD4A7: Parsing digital output script table
Nov  8 19:04:08 linux kernel: [  129.963968] [drm] nouveau 0000:04:00.0: Setting dpms mode 3 on vga encoder (output 0)
Nov  8 19:04:08 linux kernel: [  130.014021] [drm] nouveau 0000:04:00.0: Setting dpms mode 3 on vga encoder (output 1)
Nov  8 19:04:08 linux kernel: [  130.014023] [drm] nouveau 0000:04:00.0: Setting dpms mode 3 on TV encoder (output 3)
Nov  8 19:04:08 linux kernel: [  130.034645] [drm] nouveau 0000:04:00.0: Setting dpms mode 0 on vga encoder (output 0)
Nov  8 19:04:08 linux kernel: [  130.034648] [drm] nouveau 0000:04:00.0: Output VGA-1 is running on CRTC 0 using output A
Nov  8 19:04:08 linux kernel: [  130.035012] [drm] nouveau 0000:04:00.0: Setting dpms mode 3 on tmds encoder (output 2)
Nov  8 19:04:08 linux kernel: [  130.055624] [drm] nouveau 0000:04:00.0: 0xD4A7: Parsing digital output script table
Nov  8 19:04:08 linux kernel: [  130.105654] [drm] nouveau 0000:04:00.0: Setting dpms mode 0 on tmds encoder (output 2)
Nov  8 19:04:08 linux kernel: [  130.105656] [drm] nouveau 0000:04:00.0: Output DVI-I-1 is running on CRTC 1 using output A
Nov  8 19:04:16 linux kernel: [  144.528011] [drm] nouveau 0000:04:00.0: Failed to idle channel 1.
Nov  8 19:04:16 linux kdm[665]: X server for display :0 terminated unexpectedly
Nov  8 19:04:35 linux kernel: [  163.855043] [drm] nouveau 0000:04:00.0: Failed to idle channel 1.
Nov  8 19:04:35 linux kdm[665]: X server for display :0 terminated unexpectedly
Nov  8 19:04:39 linux kernel: [  167.546039] [drm] nouveau 0000:04:00.0: Failed to idle channel 3.

Comment 25 Kamil Páral 2012-11-14 18:59:09 UTC
Discussed at 2012-11-14 blocker bug meeting. Rejected as NTH: it isn't clear that this affects F18, it is unlikely to affect live images, and it could be fixed with an update.

Comment 26 John Glotzer 2013-01-24 18:40:27 UTC
I did see it on a fresh F18 install - then switched to Nvidia driver. So yes, it's still there on F18.

Comment 27 Mark Frey 2013-03-23 12:46:41 UTC
Seems to be finally fixed in kernel 3.8.3-103.fc17 !

Comment 28 Martin 2013-06-11 17:01:21 UTC
John, can you confirm it's fixed? If true, please close this bug.

Comment 29 John Glotzer 2013-06-11 18:04:11 UTC
Sure I'll try as soon as I can - 

{1:59pm 0 fedora-w520} ~/python_class > cat /proc/cmdline

BOOT_IMAGE=/vmlinuz-3.9.4-200.fc18.x86_64 root=UUID=ab686964-0ca8-4f00-8d9b-39753d89a74b ro rd.md=0 rd.lvm=0 rd.dm=0 noapic nouveau.modeset=0 rd.driver.blacklist=nouveau SYSFONT=True KEYTABLE=us rd.luks=0 LANG=en_US.UTF-8 quiet


Is it enough to just delete the blacklist of nouveau from my commandline or must I also uninstall nvidia driver???
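
Editorial sketch (not part of the original comment): whether the boot command line still keeps nouveau from loading can be checked by parsing it. The Python below is a minimal illustration; parse_cmdline and nouveau_disabled are hypothetical helper names, and only the two parameters visible in the command line above (rd.driver.blacklist and nouveau.modeset) are examined. It does not answer whether the nvidia packages must also be removed.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into a dict; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def nouveau_disabled(params):
    """True if either parameter seen in this report disables nouveau."""
    blacklist = (params.get("rd.driver.blacklist") or "").split(",")
    modeset_off = params.get("nouveau.modeset") == "0"
    return "nouveau" in blacklist or modeset_off


# Command line quoted in the comment above.
sample = (
    "BOOT_IMAGE=/vmlinuz-3.9.4-200.fc18.x86_64 "
    "root=UUID=ab686964-0ca8-4f00-8d9b-39753d89a74b ro rd.md=0 rd.lvm=0 "
    "rd.dm=0 noapic nouveau.modeset=0 rd.driver.blacklist=nouveau "
    "SYSFONT=True KEYTABLE=us rd.luks=0 LANG=en_US.UTF-8 quiet"
)
params = parse_cmdline(sample)
print("nouveau disabled:", nouveau_disabled(params))
```

On a running system the string would come from reading /proc/cmdline, as in the cat command above. Note that this command line carries two nouveau-related settings, so removing only the blacklist entry would still leave nouveau.modeset=0 in place.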

Comment 30 John Glotzer 2013-06-12 00:02:52 UTC
OK with Fedora 18
Linux fedora-w520 3.9.4-200.fc18.x86_64 #1 SMP Fri May 24 20:10:49 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

I did the following:

1. Ran yum erase xorg-x11-drv-nvidia* nvidia-settings nvidia-xconfig
2. Unblacklisted the nouveau driver
3. Rebooted and used lsmod to confirm expected behavior (no nvidia, yes nouveau)


Resume bug is exactly as it was - could easily suspend but on resume got 
a white screen with random flecks of color. I take from this that nothing has changed.

Reinstalled nvidia, remade the initramfs, blacklisted nouveau

and I can suspend/resume as before.

Comment 31 Claus Stefer 2013-07-02 12:24:12 UTC
Problem seems to be solved here, too, with 3.9.6-200.fc18.x86_64.