Bug 603930 - [nouveau] Resume after suspend panics kernel on G210M [NEEDINFO]

Summary: [nouveau] Resume after suspend panics kernel on G210M

Keywords:
Status:	CLOSED CANTFIX
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	18
Hardware:	x86_64
OS:	Linux
Priority:	low
Severity:	high
Target Milestone:	---
Assignee:	Ben Skeggs
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2010-06-14 22:12 UTC by Roy
Modified:	2020-03-01 00:05 UTC
CC List:	9 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2013-11-27 16:05:26 UTC
Type:	---
Embargoed:
Dependent Products:
Flags:	jforbes: needinfo?

Attachments
dmesg with debugging bits after resume (417.93 KB, text/plain) 2010-07-15 21:37 UTC, Roy
G210M video BIOS (61.00 KB, application/octet-stream) 2010-07-16 13:00 UTC, Roy

Description Roy 2010-06-14 22:12:29 UTC

Description of problem:
When I try to resume my laptop from suspend, the resume fails and the kernel panics.
The machine is an Asus UL50Vg with an nVidia GeForce G210M. It happened on both FC12 and FC13, with all updates installed.

Version-Release number of selected component (if applicable):
Kernel 2.6.33.5-112.fc13.x86_64

How reproducible:
Resume the laptop from suspend

Steps to Reproduce:
1. close laptop lid
2. open laptop lid
3. let kdump write the vmcore
  
Actual results:
A crash dump

Expected results:
A running system

Additional info:
[drm] nouveau 0000:01:00.0: Restoring mode...
[drm] nouveau 0000:01:00.0: PFIFO_INTR 0x00080010 - Ch 127
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffffa0087d32>] nouveau_gpuobj_ref_find+0x12/0x3a [nouveau]
PGD 1262df067 PUD 122dfb067 PMD 0 
Oops: 0000 [#1] SMP 
last sysfs file: /sys/power/state
CPU 0 
Pid: 2484, comm: pm-suspend Not tainted 2.6.33.5-112.fc13.x86_64 #1 UL50Vg    /UL50Vg              
RIP: 0010:[<ffffffffa0087d32>]  [<ffffffffa0087d32>] nouveau_gpuobj_ref_find+0x12/0x3a [nouveau]
RSP: 0018:ffff88000d803e18  EFLAGS: 00010086
RAX: 0000000000000000 RBX: ffff880136958800 RCX: 0000000000000000
RDX: ffff88000d803ea0 RSI: 0000000000000000 RDI: ffff880137914ee8
RBP: ffff88000d803e18 R08: ffff88011de10000 R09: ffff88011de11c68
R10: 0000000000000000 R11: ffff88011de11bc8 R12: 0000000000000001
R13: ffff880137914e00 R14: 0000000000000000 R15: ffff880136500000
FS:  00007f1f46a1b700(0000) GS:ffff88000d800000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 00000001262e2000 CR4: 00000000000406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process pm-suspend (pid: 2484, threadinfo ffff88011de10000, task ffff88012a805d40)
Stack:
 ffff88000d803ed8 ffffffffa008c424 ffff880136958800 0000000000000282
<0> 000001000d803e48 ffff880136500b98 000000000d803e68 0000000100000703
<0> 0000000000000001 ffff880136500000 000000000d803eb8 000000000000007f
Call Trace:
 <IRQ> 
 [<ffffffffa008c424>] nouveau_irq_handler+0x1ec/0xa58 [nouveau]
 [<ffffffff8109e099>] ? __rcu_process_callbacks+0x75/0x27b
 [<ffffffff8109a0d2>] handle_IRQ_event+0x5b/0x11c
 [<ffffffff81021a69>] ? ack_apic_level+0x77/0x138
 [<ffffffff8109be55>] handle_fasteoi_irq+0x8d/0xc9
 [<ffffffff8100c2dd>] handle_irq+0x83/0x8e
 [<ffffffff8100b907>] do_IRQ+0x57/0xbe
 [<ffffffff8142b253>] ret_from_intr+0x0/0x11
 <EOI> 
 [<ffffffff81207664>] ? ioread32+0xf/0x30
 [<ffffffffa00bad0d>] nv50_display_init+0xcb/0xac7 [nouveau]
 [<ffffffffa0088005>] ? nouveau_gpuobj_resume+0xe2/0xee [nouveau]
 [<ffffffffa00847b9>] nouveau_pci_resume+0x32a/0x3a1 [nouveau]
 [<ffffffff812140a1>] pci_legacy_resume+0x33/0x42
 [<ffffffff812141f1>] pci_pm_resume+0x4f/0x82
 [<ffffffff8142ae33>] ? _raw_spin_unlock_irqrestore+0xf/0x16
 [<ffffffff812b25f4>] pm_op+0x88/0x11d
 [<ffffffff812b2f78>] dpm_resume_end+0xec/0x472
 [<ffffffff8107be2a>] suspend_devices_and_enter+0x178/0x1aa
 [<ffffffff8107bf36>] enter_state+0xda/0x12b
 [<ffffffff8107b714>] state_store+0xb1/0xce
 [<ffffffff811fce23>] kobj_attr_store+0x17/0x19
 [<ffffffff811542c9>] sysfs_write_file+0x10f/0x14b
 [<ffffffff81101ad3>] vfs_write+0xa9/0x106
 [<ffffffff81101be6>] sys_write+0x45/0x69
 [<ffffffff81009b02>] system_call_fastpath+0x16/0x1b
Code: 04 24 48 89 58 08 48 89 18 31 c0 41 59 5b 41 5c 41 5d 41 5e 41 5f c9 c3 90 55 48 8b 8f e8 00 00 00 48 81 c7 e8 00 00 00 48 89 e5 <48> 8b 01 eb 17 39 71 28 75 0c 31 c0 48 85 d2 74 15 48 89 0a eb 
RIP  [<ffffffffa0087d32>] nouveau_gpuobj_ref_find+0x12/0x3a [nouveau]
 RSP <ffff88000d803e18>
CR2: 0000000000000000
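
For readers triaging the trace above, here is a small sketch of how the frames that belong to the nouveau module can be pulled out of an oops like this one. The helper name module_frames and the regular expression are illustrative assumptions, not part of any kernel tooling. Entries that the kernel prefixes with "?" are flagged as unreliable.

```python
import re

# Matches call-trace entries such as
#   [<ffffffffa008c424>] nouveau_irq_handler+0x1ec/0xa58 [nouveau]
#   [<ffffffffa0088005>] ? nouveau_gpuobj_resume+0xe2/0xee [nouveau]
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unsure>\?\s+)?"
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def module_frames(oops_text, module="nouveau"):
    """Return (symbol, reliable) pairs for frames in `module` (hypothetical helper)."""
    frames = []
    for line in oops_text.splitlines():
        m = FRAME_RE.search(line)
        if m and m.group("module") == module:
            # A "?" before the symbol means the unwinder was not sure about the frame.
            frames.append((m.group("symbol"), m.group("unsure") is None))
    return frames
```

Fed the oops text above, this should surface the path the report describes: nv50_display_init running from nouveau_pci_resume, interrupted by nouveau_irq_handler, with the fault in nouveau_gpuobj_ref_find.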

Comment 1 Ben Skeggs 2010-07-14 04:12:11 UTC

Can you update your kernel to http://koji.fedoraproject.org/koji/buildinfo?buildID=183346 and see how you go now?

Comment 2 Roy 2010-07-14 07:37:23 UTC

After installing kernel-2.6.34.1-11 the system no longer panics when resuming. It does NOT, however, give me video, even though the system clearly responds to my commands.
Pressing the power button after resume does nothing as X.org intercepts the signal and asks for a password. Pressing Alt+F2 and the power button shuts it down, which is normal behaviour for a tty.

/var/log/messages gives me this:
Jul 14 09:28:38 Torres kernel: [drm] nouveau 0000:01:00.0: We're back, enabling device...
Jul 14 09:28:38 Torres kernel: nouveau 0000:01:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
Jul 14 09:28:38 Torres kernel: [drm] nouveau 0000:01:00.0: POSTing device...
Jul 14 09:28:38 Torres kernel: [drm] nouveau 0000:01:00.0: Parsing VBIOS init table 0 at offset 0xD359
Jul 14 09:28:38 Torres kernel: HDA Intel 0000:01:00.1: PCI INT A -> GSI 16 (level, low) -> IRQ 16
Jul 14 09:28:38 Torres kernel: ath9k 0000:03:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
Jul 14 09:28:38 Torres kernel: sd 0:0:0:0: [sda] Starting disk
Jul 14 09:28:38 Torres kernel: [drm] nouveau 0000:01:00.0: 0xD62F: i2c wr fail: -6
Jul 14 09:28:38 Torres kernel: [drm] nouveau 0000:01:00.0: Parsing VBIOS init table 1 at offset 0xD8A4
Jul 14 09:28:38 Torres kernel: [drm] nouveau 0000:01:00.0: Parsing VBIOS init table 2 at offset 0xE3D3
Jul 14 09:28:38 Torres kernel: [drm] nouveau 0000:01:00.0: Parsing VBIOS init table 3 at offset 0xE408
Jul 14 09:28:38 Torres kernel: [drm] nouveau 0000:01:00.0: Parsing VBIOS init table 4 at offset 0xE59C
Jul 14 09:28:38 Torres kernel: [drm] nouveau 0000:01:00.0: Parsing VBIOS init table at offset 0xE601
Jul 14 09:28:38 Torres kernel: [drm] nouveau 0000:01:00.0: Couldn't find matching output script table
Jul 14 09:28:38 Torres kernel: [drm] nouveau 0000:01:00.0: 0xC078: parsing output script 0
Jul 14 09:28:38 Torres kernel: [drm] nouveau 0000:01:00.0: Reinitialising engines...
Jul 14 09:28:38 Torres kernel: [drm] nouveau 0000:01:00.0: Restoring GPU objects...
Jul 14 09:28:38 Torres kernel: [drm] nouveau 0000:01:00.0: Restoring mode...
Jul 14 09:28:38 Torres kernel: [drm] nouveau 0000:01:00.0: Couldn't find matching output script table
Jul 14 09:28:38 Torres kernel: [drm] nouveau 0000:01:00.0: Couldn't find matching output script table

Comment 3 Ben Skeggs 2010-07-14 23:32:35 UTC

The panic was a side effect of something else going wrong during resume. I'm glad the panic is fixed, however :)

Okay, since you have a responding system after resume, can you boot with "log_buf_len=1M drm.debug=14 nouveau.reg_debug=0x0200", suspend/resume, log in blindly from the console, run "dmesg &>dmesg.log", reboot, and attach that file here?

Using dmesg rather than /var/log/messages is important, as debug-level messages won't make it to /var/log/messages.

Thanks!
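
The capture procedure requested above can be sketched in Python. The helper names missing_params and capture_dmesg are hypothetical. The sketch only assumes that /proc/cmdline holds the boot command line and that the dmesg binary is on the PATH. Checking the command line first avoids capturing a log that lacks the debug output.

```python
import subprocess

# Boot parameters requested in the comment above.
REQUIRED = ["log_buf_len=1M", "drm.debug=14", "nouveau.reg_debug=0x0200"]


def missing_params(cmdline, required=REQUIRED):
    """Return the requested parameters that are absent from a kernel command line."""
    present = set(cmdline.split())
    return [p for p in required if p not in present]


def capture_dmesg(path="dmesg.log"):
    """Write the kernel ring buffer to `path`, similar to `dmesg &>dmesg.log`."""
    with open("/proc/cmdline") as f:
        missing = missing_params(f.read())
    if missing:
        raise RuntimeError("boot parameters not set: " + " ".join(missing))
    with open(path, "wb") as out:
        # stderr is merged into the file, as the `&>` redirection does in the shell.
        subprocess.run(["dmesg"], stdout=out, stderr=subprocess.STDOUT, check=True)
```

Calling capture_dmesg() after the suspend/resume cycle would produce the file to attach. On systems that restrict access to the kernel log, it may need to run as root.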

Comment 4 Roy 2010-07-15 21:37:30 UTC

Created attachment 432237
dmesg with debugging bits after resume

Comment 5 Ben Skeggs 2010-07-15 23:52:03 UTC

Can you also attach the /sys/kernel/debug/dri/0/vbios.rom file, please? debugfs needs to be mounted first: mount -t debugfs debugfs /sys/kernel/debug
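
A minimal sketch of the request above, assuming root privileges and the paths named in the comment. The helper names debugfs_mountpoints and copy_vbios are hypothetical. The sketch checks /proc/mounts for a mounted debugfs before copying the ROM image.

```python
import shutil

VBIOS_PATH = "/sys/kernel/debug/dri/0/vbios.rom"  # path taken from the comment above


def debugfs_mountpoints(mounts_text):
    """Return the mount points of type debugfs from /proc/mounts-style text."""
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 3 and fields[2] == "debugfs":
            points.append(fields[1])
    return points


def copy_vbios(dest="vbios.rom"):
    """Copy the VBIOS image exposed through debugfs to `dest`."""
    with open("/proc/mounts") as f:
        if "/sys/kernel/debug" not in debugfs_mountpoints(f.read()):
            raise RuntimeError(
                "debugfs not mounted; run: mount -t debugfs debugfs /sys/kernel/debug"
            )
    with open(VBIOS_PATH, "rb") as src, open(dest, "wb") as dst:
        shutil.copyfileobj(src, dst)
```

The resulting vbios.rom in the current directory is the file to attach.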

Comment 6 Roy 2010-07-16 13:00:54 UTC

Created attachment 432383
G210M video BIOS

Comment 7 Ben Skeggs 2010-07-18 01:21:15 UTC

Can you give kernel-2.6.34.1-15.fc13
(http://koji.fedoraproject.org/koji/taskinfo?taskID=2323487) a try and see if
it helps?

Comment 8 Roy 2010-07-18 09:37:47 UTC

Thank you for this ongoing work :-).
This kernel brings improvement, but I'm afraid we're not there yet. This is the current scenario:
1. Boot my laptop, log in and (optionally) do something.
2. Close laptop lid to get her to suspend
3. Open laptop lid, fill in password, and have a working system (Hooray, progress!)
4. optionally do something more
5. Close laptop lid to put her back in suspend
6. Open laptop lid and see failure.

What happens on this second resume is that I can see the X password-entry screen for a split second; it blinks "on" only once. When I switch to one of the other ttys, the image blinks on twice, and on the third "on" it stays working. Switching back to X causes the screen to turn off and blink on once for a split second again. As I have no video recording equipment that can be combined with typing, I hope this description is sufficient.

/var/log/messages shows this output for both resumes.

Jul 18 11:24:23 Torres kernel: [drm] nouveau 0000:01:00.0: POSTing device...
Jul 18 11:24:23 Torres kernel: [drm] nouveau 0000:01:00.0: Parsing VBIOS init table 0 at offset 0xD359
Jul 18 11:24:23 Torres kernel: [drm] nouveau 0000:01:00.0: 0xD62F: i2c wr fail: -6
Jul 18 11:24:23 Torres kernel: [drm] nouveau 0000:01:00.0: Parsing VBIOS init table 1 at offset 0xD8A4
Jul 18 11:24:23 Torres kernel: [drm] nouveau 0000:01:00.0: Parsing VBIOS init table 2 at offset 0xE3D3
Jul 18 11:24:23 Torres kernel: [drm] nouveau 0000:01:00.0: Parsing VBIOS init table 3 at offset 0xE408
Jul 18 11:24:23 Torres kernel: [drm] nouveau 0000:01:00.0: Parsing VBIOS init table 4 at offset 0xE59C
Jul 18 11:24:23 Torres kernel: [drm] nouveau 0000:01:00.0: Parsing VBIOS init table at offset 0xE601
Jul 18 11:24:23 Torres kernel: [drm] nouveau 0000:01:00.0: 0xBC89: parsing output script 0
Jul 18 11:24:23 Torres kernel: [drm] nouveau 0000:01:00.0: 0xC078: parsing output script 0
Jul 18 11:24:23 Torres kernel: [drm] nouveau 0000:01:00.0: Reinitialising engines...
Jul 18 11:24:23 Torres kernel: [drm] nouveau 0000:01:00.0: Restoring GPU objects...
Jul 18 11:24:23 Torres kernel: [drm] nouveau 0000:01:00.0: Restoring mode...
Jul 18 11:24:23 Torres kernel: [drm] nouveau 0000:01:00.0: 0xBB00: parsing clock script 0
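
Comparing this log with the one in comment 2 shows what the new kernel changed. The sketch below strips the syslog prefix and reports messages that appear in only one of two resume logs. The helper names are hypothetical, and the regular expression assumes the "[drm] nouveau <pci-address>:" prefix seen in the logs above.

```python
import re

# Captures the message text after "[drm] nouveau 0000:01:00.0: "
MSG_RE = re.compile(r"\[drm\] nouveau [0-9a-f:.]+: (.*)$")


def nouveau_messages(log_text):
    """Return the distinct nouveau message bodies from a syslog excerpt, in order."""
    seen = []
    for line in log_text.splitlines():
        m = MSG_RE.search(line.rstrip())
        if m and m.group(1) not in seen:
            seen.append(m.group(1))
    return seen


def only_in(first, second):
    """Return the messages present in `first` but absent from `second`."""
    other = set(nouveau_messages(second))
    return [msg for msg in nouveau_messages(first) if msg not in other]
```

Between the two excerpts in this report, the "Couldn't find matching output script table" message from comment 2 no longer appears, while "parsing output script 0" at 0xBC89 and "parsing clock script 0" at 0xBB00 are new.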

Comment 9 Roy 2010-09-06 22:13:29 UTC

Kernel 2.6.34.6-47.fc13 did not fix all problems. I can suspend once, but it goes wrong on the second resume. X becomes unusable; the terminals can be used, but the screen flickers three times before it stabilizes.
Please let me know if I can get you anything, either here or on IRC (RSpliet), including peeking registers.

Comment 10 Chuck Ebbert 2010-09-07 04:44:39 UTC

Can you try 2.6.34.6-54, which has some more suspend/resume fixes?

Comment 11 Roy 2010-09-07 19:27:04 UTC

Unfortunately 2.6.34.6-54 doesn't make a difference.

Comment 12 Roy 2011-01-10 22:41:18 UTC

I'm afraid this bug has still not been fixed in the latest Fedora kernel (2.6.35.10-74). I did a small test, so let me give an update on the current situation:
Directly after boot the screen works properly and I can switch VTs flicker-free. After a suspend and resume, the machine resumes and the screen comes up fine. When I then switch to another VT, the screen flickers once. Trying to go back to the X screen results in a black flicker, then the X screen flickering, ending on a black screen.
If I do not switch to a VT and back to X, but instead suspend and resume a second time, the X screen flickers as well, ending in a black screen.

In short: after suspend-resume I cannot switch back to the X screen. No lockups, just a black screen as result.

Comment 13 Roy 2011-03-24 23:23:05 UTC

Right, by now I'm running the 2.6.38-0.rc1 kernel from koji, with an out-of-tree build of the latest git revision of nouveau, but the problem is still the same. What has changed is that there is now an error message for each switch to X.
First switch to X:
[drm] nouveau 0000:01:00.0: EvoCh 0 Mthd 0x0080 Data 0x00000000 (0x0005 0x05)
Subsequent switches to X:
[drm] nouveau 0000:01:00.0: EvoCh 0 Mthd 0x0080 Data 0x00000000 (0x1005 0x05)

Comment 14 Bug Zapper 2011-06-02 10:50:39 UTC

This message is a reminder that Fedora 13 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 13.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '13'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 13's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 13 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 15 Roy 2011-10-20 09:44:05 UTC

I have just tested nouveau upstream (with all the recent suspend/resume work) against a Fedora 16 kernel, but unfortunately this bug still persists.

Comment 16 Dave Jones 2012-03-22 16:49:28 UTC

[mass update]
kernel-3.3.0-4.fc16 has been pushed to the Fedora 16 stable repository.
Please retest with this update.

Comment 17 Dave Jones 2012-03-22 16:54:00 UTC

[mass update]
kernel-3.3.0-4.fc16 has been pushed to the Fedora 16 stable repository.
Please retest with this update.

Comment 18 Dave Jones 2012-03-22 17:04:33 UTC

[mass update]
kernel-3.3.0-4.fc16 has been pushed to the Fedora 16 stable repository.
Please retest with this update.

Comment 19 Roy 2012-03-26 09:42:11 UTC

Negative. Still the same (broken) behaviour: after suspend I cannot reliably modeset anymore (switch to a VT, suspend again). What struck me, though, is the following. Before suspend, xrandr outputs this:
Screen 0: minimum 320 x 200, current 1366 x 768, maximum 8192 x 8192
LVDS-1 connected 1366x768+0+0 (normal left inverted right x axis y axis) 344mm x 193mm
   1366x768       60.0*+
   1024x768       59.9  
   800x600        59.9  
   640x480        59.4  
   720x400        59.6  
   640x400        60.0  
   640x350        59.8  
VGA-1 disconnected (normal left inverted right x axis y axis)
HDMI-1 disconnected (normal left inverted right x axis y axis)

After suspend it turns into this:
Screen 0: minimum 320 x 200, current 1366 x 768, maximum 8192 x 8192
LVDS-1 disconnected 1366x768+0+0 (normal left inverted right x axis y axis) 0mm x 0mm
VGA-1 disconnected (normal left inverted right x axis y axis)
HDMI-1 disconnected (normal left inverted right x axis y axis)
  1366x768 (0x64)   70.0MHz
        h: width  1366 start 1414 end 1446 total 1469 skew    0 clock   47.7KHz
        v: height  768 start  771 end  777 total  794           clock   60.0Hz

Disconnected?
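
The comparison above can be automated with a short sketch that parses two xrandr outputs and reports connectors whose state changed. The helper names are hypothetical, and the regular expression only assumes the "NAME connected ..." / "NAME disconnected ..." line format shown in the outputs above.

```python
import re

# Connector lines look like "LVDS-1 connected 1366x768+0+0 (...)" or
# "VGA-1 disconnected (...)". Mode lines are indented and do not match.
CONNECTOR_RE = re.compile(r"^(\S+) (connected|disconnected)\b", re.MULTILINE)


def connector_states(xrandr_output):
    """Map each connector name to "connected" or "disconnected"."""
    return dict(CONNECTOR_RE.findall(xrandr_output))


def changed_connectors(before, after):
    """Return {name: (old_state, new_state)} for connectors whose state differs."""
    old, new = connector_states(before), connector_states(after)
    return {
        name: (old[name], new[name])
        for name in old
        if name in new and old[name] != new[name]
    }
```

Run on the two outputs in this comment, only LVDS-1 differs: it goes from connected to disconnected, while VGA-1 and HDMI-1 stay disconnected.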

Comment 20 Dave Jones 2012-10-23 15:36:14 UTC

# Mass update to all open bugs.

Kernel 3.6.2-1.fc16 has just been pushed to updates.
This update is a significant rebase from the previous version.

Please retest with this kernel, and let us know if your problem has been fixed.

In the event that you have upgraded to a newer release and the bug you reported
is still present, please change the version field to the newest release you have
encountered the issue with.  Before doing so, please ensure you are testing the
latest kernel update in that release and attach any new and relevant information
you may have gathered.

If you are not the original bug reporter and you still experience this bug,
please file a new report, as it is possible that you may be seeing a
different problem. 
(Please don't clone this bug, a fresh bug referencing this bug in the comment is sufficient).

Comment 21 Justin M. Forbes 2012-11-14 15:35:08 UTC

With no response, we are closing this bug under the assumption that it is no longer an issue. If you still experience this bug, please feel free to reopen the bug report.

Comment 22 Roy 2012-12-02 22:42:50 UTC

Kernel 3.6.7-4 still exposes this particular problem.

Comment 23 Fedora End Of Life 2013-01-17 01:05:18 UTC

This message is a reminder that Fedora 16 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 16. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '16'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 16's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 16 is end of life. If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora, you are encouraged to click on 
"Clone This Bug" and open it against that version of Fedora.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 24 Fedora End Of Life 2013-02-14 02:37:34 UTC

Fedora 16 changed to end-of-life (EOL) status on 2013-02-12. Fedora 16 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 25 Roy 2013-02-14 18:12:07 UTC

Still present in Fedora 18, and in the upstream kernel.

Comment 26 Fedora End Of Life 2013-02-26 15:31:37 UTC

Fedora 16 changed to end-of-life (EOL) status on 2013-02-12. Fedora 16 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 27 Roy 2013-02-26 17:56:19 UTC

I didn't make that last comment for shits and giggles, you "Fedora End Of Life"...

Comment 28 Justin M. Forbes 2013-10-18 21:14:45 UTC

*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 18 kernel bugs.

Fedora 18 has now been rebased to 3.11.4-101.fc18.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 19, and are still experiencing this issue, please change the version to Fedora 19.

If you experience different issues, please open a new bug report for those.

Comment 29 Justin M. Forbes 2013-11-27 16:05:26 UTC

*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  

It has been over a month since we asked you to test the 3.11 kernel updates and let us know if your issue has been resolved or is still a problem. When this happened, the bug was set to needinfo.  Because the needinfo is still set, we assume either this is no longer a problem, or you cannot provide additional information to help us resolve the issue.  As a result we are closing with insufficient data. If this is still a problem, we apologize; feel free to reopen the bug and provide more information so that we can work towards a resolution.

If you experience different issues, please open a new bug report for those.

Comment 30 Roy 2020-03-01 00:05:46 UTC

Attempting to properly remove the "needinfo" tag; there is no need for a daily reminder. The laptop that suffered from this bug is long dead.
