Description of problem:
After trying to resume my laptop from suspend, it fails to do so with a panicking kernel. Asus UL50Vg, nVidia GeForce G210M, happened on both FC12 and FC13. All updates installed.

Version-Release number of selected component (if applicable):
Kernel 2.6.33.5-112.fc13.x86_64

How reproducible:
Resume the laptop from suspend

Steps to Reproduce:
1. close laptop lid
2. open laptop lid
3. let kdump write the vmcore

Actual results:
A crash dump

Expected results:
A running system

Additional info:
[drm] nouveau 0000:01:00.0: Restoring mode...
[drm] nouveau 0000:01:00.0: PFIFO_INTR 0x00080010 - Ch 127
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffffa0087d32>] nouveau_gpuobj_ref_find+0x12/0x3a [nouveau]
PGD 1262df067 PUD 122dfb067 PMD 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/power/state
CPU 0
Pid: 2484, comm: pm-suspend Not tainted 2.6.33.5-112.fc13.x86_64 #1 UL50Vg /UL50Vg
RIP: 0010:[<ffffffffa0087d32>] [<ffffffffa0087d32>] nouveau_gpuobj_ref_find+0x12/0x3a [nouveau]
RSP: 0018:ffff88000d803e18 EFLAGS: 00010086
RAX: 0000000000000000 RBX: ffff880136958800 RCX: 0000000000000000
RDX: ffff88000d803ea0 RSI: 0000000000000000 RDI: ffff880137914ee8
RBP: ffff88000d803e18 R08: ffff88011de10000 R09: ffff88011de11c68
R10: 0000000000000000 R11: ffff88011de11bc8 R12: 0000000000000001
R13: ffff880137914e00 R14: 0000000000000000 R15: ffff880136500000
FS: 00007f1f46a1b700(0000) GS:ffff88000d800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 00000001262e2000 CR4: 00000000000406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process pm-suspend (pid: 2484, threadinfo ffff88011de10000, task ffff88012a805d40)
Stack:
ffff88000d803ed8 ffffffffa008c424 ffff880136958800 0000000000000282
<0> 000001000d803e48 ffff880136500b98 000000000d803e68 0000000100000703
<0> 0000000000000001 ffff880136500000 000000000d803eb8 000000000000007f
Call Trace:
<IRQ>
[<ffffffffa008c424>] nouveau_irq_handler+0x1ec/0xa58 [nouveau]
[<ffffffff8109e099>] ? __rcu_process_callbacks+0x75/0x27b
[<ffffffff8109a0d2>] handle_IRQ_event+0x5b/0x11c
[<ffffffff81021a69>] ? ack_apic_level+0x77/0x138
[<ffffffff8109be55>] handle_fasteoi_irq+0x8d/0xc9
[<ffffffff8100c2dd>] handle_irq+0x83/0x8e
[<ffffffff8100b907>] do_IRQ+0x57/0xbe
[<ffffffff8142b253>] ret_from_intr+0x0/0x11
<EOI>
[<ffffffff81207664>] ? ioread32+0xf/0x30
[<ffffffffa00bad0d>] nv50_display_init+0xcb/0xac7 [nouveau]
[<ffffffffa0088005>] ? nouveau_gpuobj_resume+0xe2/0xee [nouveau]
[<ffffffffa00847b9>] nouveau_pci_resume+0x32a/0x3a1 [nouveau]
[<ffffffff812140a1>] pci_legacy_resume+0x33/0x42
[<ffffffff812141f1>] pci_pm_resume+0x4f/0x82
[<ffffffff8142ae33>] ? _raw_spin_unlock_irqrestore+0xf/0x16
[<ffffffff812b25f4>] pm_op+0x88/0x11d
[<ffffffff812b2f78>] dpm_resume_end+0xec/0x472
[<ffffffff8107be2a>] suspend_devices_and_enter+0x178/0x1aa
[<ffffffff8107bf36>] enter_state+0xda/0x12b
[<ffffffff8107b714>] state_store+0xb1/0xce
[<ffffffff811fce23>] kobj_attr_store+0x17/0x19
[<ffffffff811542c9>] sysfs_write_file+0x10f/0x14b
[<ffffffff81101ad3>] vfs_write+0xa9/0x106
[<ffffffff81101be6>] sys_write+0x45/0x69
[<ffffffff81009b02>] system_call_fastpath+0x16/0x1b
Code: 04 24 48 89 58 08 48 89 18 31 c0 41 59 5b 41 5c 41 5d 41 5e 41 5f c9 c3 90 55 48 8b 8f e8 00 00 00 48 81 c7 e8 00 00 00 48 89 e5 <48> 8b 01 eb 17 39 71 28 75 0c 31 c0 48 85 d2 74 15 48 89 0a eb
RIP [<ffffffffa0087d32>] nouveau_gpuobj_ref_find+0x12/0x3a [nouveau]
RSP <ffff88000d803e18>
CR2: 0000000000000000
Can you update your kernel to http://koji.fedoraproject.org/koji/buildinfo?buildID=183346 and see how you go now?
After installing kernel-2.6.34.1-11 the system no longer panics when resuming. It does NOT however give me video. The system clearly responds to my commands. Pressing the power button after resume does nothing as X.org intercepts the signal and asks for a password. Pressing Alt+F2 and the power button shuts it down, which is normal behaviour for a tty.

/var/log/messages gives me this:
Jul 14 09:28:38 Torres kernel: [drm] nouveau 0000:01:00.0: We're back, enabling device...
Jul 14 09:28:38 Torres kernel: nouveau 0000:01:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
Jul 14 09:28:38 Torres kernel: [drm] nouveau 0000:01:00.0: POSTing device...
Jul 14 09:28:38 Torres kernel: [drm] nouveau 0000:01:00.0: Parsing VBIOS init table 0 at offset 0xD359
Jul 14 09:28:38 Torres kernel: HDA Intel 0000:01:00.1: PCI INT A -> GSI 16 (level, low) -> IRQ 16
Jul 14 09:28:38 Torres kernel: ath9k 0000:03:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
Jul 14 09:28:38 Torres kernel: sd 0:0:0:0: [sda] Starting disk
Jul 14 09:28:38 Torres kernel: [drm] nouveau 0000:01:00.0: 0xD62F: i2c wr fail: -6
Jul 14 09:28:38 Torres kernel: [drm] nouveau 0000:01:00.0: Parsing VBIOS init table 1 at offset 0xD8A4
Jul 14 09:28:38 Torres kernel: [drm] nouveau 0000:01:00.0: Parsing VBIOS init table 2 at offset 0xE3D3
Jul 14 09:28:38 Torres kernel: [drm] nouveau 0000:01:00.0: Parsing VBIOS init table 3 at offset 0xE408
Jul 14 09:28:38 Torres kernel: [drm] nouveau 0000:01:00.0: Parsing VBIOS init table 4 at offset 0xE59C
Jul 14 09:28:38 Torres kernel: [drm] nouveau 0000:01:00.0: Parsing VBIOS init table at offset 0xE601
Jul 14 09:28:38 Torres kernel: [drm] nouveau 0000:01:00.0: Couldn't find matching output script table
Jul 14 09:28:38 Torres kernel: [drm] nouveau 0000:01:00.0: 0xC078: parsing output script 0
Jul 14 09:28:38 Torres kernel: [drm] nouveau 0000:01:00.0: Reinitialising engines...
Jul 14 09:28:38 Torres kernel: [drm] nouveau 0000:01:00.0: Restoring GPU objects...
Jul 14 09:28:38 Torres kernel: [drm] nouveau 0000:01:00.0: Restoring mode...
Jul 14 09:28:38 Torres kernel: [drm] nouveau 0000:01:00.0: Couldn't find matching output script table
Jul 14 09:28:38 Torres kernel: [drm] nouveau 0000:01:00.0: Couldn't find matching output script table
The panic was a side effect of something else going wrong during resume; I'm glad the panic is fixed, however :) Okay, since you have a responding system after resume, can you boot with "log_buf_len=1M drm.debug=14 nouveau.reg_debug=0x0200", suspend/resume, log in blindly from the console, run "dmesg &>dmesg.log", reboot, and attach that file here? dmesg rather than /var/log/messages is important, as debug-level messages won't make it to /var/log/messages. Thanks!
Created attachment 432237: dmesg with debugging bits after resume
Can you also attach (with debugfs mounted: mount -t debugfs debugfs /sys/kernel/debug) the /sys/kernel/debug/dri/0/vbios.rom file please.
Created attachment 432383: G210M video BIOS
Can you give kernel-2.6.34.1-15.fc13 (http://koji.fedoraproject.org/koji/taskinfo?taskID=2323487) a try and see if it helps?
Thank you for this ongoing work :-). This kernel brings improvement, but I'm afraid we're not there yet. This is the current scenario:
1. Boot my laptop, log in and (optionally) do something.
2. Close laptop lid to get her to suspend
3. Open laptop lid, fill in password, and have a working system (Hooray, progress!)
4. optionally do something more
5. Close laptop lid to put her back in suspend
6. Open laptop lid and see failure.

What happens with this second resume is that I can see the password entry X screen for a split second, it only blinks "on" once. When I switch to one of the other tty's, the image blinks on twice, and on the 3rd "on" it stays working. Switching back to X causes the screen to turn off and blink on once for a split second again. As I have no video recording equipment that can be combined with typing I hope this description is sufficient.

/var/log/messages shows this output for both resumes.
Jul 18 11:24:23 Torres kernel: [drm] nouveau 0000:01:00.0: POSTing device...
Jul 18 11:24:23 Torres kernel: [drm] nouveau 0000:01:00.0: Parsing VBIOS init table 0 at offset 0xD359
Jul 18 11:24:23 Torres kernel: [drm] nouveau 0000:01:00.0: 0xD62F: i2c wr fail: -6
Jul 18 11:24:23 Torres kernel: [drm] nouveau 0000:01:00.0: Parsing VBIOS init table 1 at offset 0xD8A4
Jul 18 11:24:23 Torres kernel: [drm] nouveau 0000:01:00.0: Parsing VBIOS init table 2 at offset 0xE3D3
Jul 18 11:24:23 Torres kernel: [drm] nouveau 0000:01:00.0: Parsing VBIOS init table 3 at offset 0xE408
Jul 18 11:24:23 Torres kernel: [drm] nouveau 0000:01:00.0: Parsing VBIOS init table 4 at offset 0xE59C
Jul 18 11:24:23 Torres kernel: [drm] nouveau 0000:01:00.0: Parsing VBIOS init table at offset 0xE601
Jul 18 11:24:23 Torres kernel: [drm] nouveau 0000:01:00.0: 0xBC89: parsing output script 0
Jul 18 11:24:23 Torres kernel: [drm] nouveau 0000:01:00.0: 0xC078: parsing output script 0
Jul 18 11:24:23 Torres kernel: [drm] nouveau 0000:01:00.0: Reinitialising engines...
Jul 18 11:24:23 Torres kernel: [drm] nouveau 0000:01:00.0: Restoring GPU objects...
Jul 18 11:24:23 Torres kernel: [drm] nouveau 0000:01:00.0: Restoring mode...
Jul 18 11:24:23 Torres kernel: [drm] nouveau 0000:01:00.0: 0xBB00: parsing clock script 0
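For anyone comparing resume logs between kernels, here is a minimal sketch (not part of the original report) that strips the syslog prefix and lists the nouveau messages present in a newer log but absent from an older one. The function names are hypothetical, and the sample lines are short excerpts of the Jul 14 (kernel-2.6.34.1-11) and Jul 18 logs quoted in this bug. The PCI address in the tag is specific to this machine.

```python
import re

# Syslog prefix such as "Jul 18 11:24:23 Torres kernel: "
PREFIX_RE = re.compile(r"^\w{3}\s+\d+ \d\d:\d\d:\d\d \S+ kernel: ")
# Driver tag as it appears in this report; the PCI address is machine-specific.
NOUVEAU_TAG = "[drm] nouveau 0000:01:00.0: "

def nouveau_messages(log_text):
    """Return nouveau DRM messages in order, without timestamp or driver tag."""
    messages = []
    for line in log_text.splitlines():
        body = PREFIX_RE.sub("", line.strip())
        if body.startswith(NOUVEAU_TAG):
            messages.append(body[len(NOUVEAU_TAG):])
    return messages

def new_messages(old_log, new_log):
    """Messages that occur in new_log but nowhere in old_log."""
    seen = set(nouveau_messages(old_log))
    return [m for m in nouveau_messages(new_log) if m not in seen]

# Short excerpts from the two logs quoted in this bug.
OLD_LOG = """
Jul 14 09:28:38 Torres kernel: [drm] nouveau 0000:01:00.0: Couldn't find matching output script table
Jul 14 09:28:38 Torres kernel: [drm] nouveau 0000:01:00.0: 0xC078: parsing output script 0
Jul 14 09:28:38 Torres kernel: [drm] nouveau 0000:01:00.0: Restoring mode...
"""
NEW_LOG = """
Jul 18 11:24:23 Torres kernel: [drm] nouveau 0000:01:00.0: 0xBC89: parsing output script 0
Jul 18 11:24:23 Torres kernel: [drm] nouveau 0000:01:00.0: 0xC078: parsing output script 0
Jul 18 11:24:23 Torres kernel: [drm] nouveau 0000:01:00.0: Restoring mode...
Jul 18 11:24:23 Torres kernel: [drm] nouveau 0000:01:00.0: 0xBB00: parsing clock script 0
"""

if __name__ == "__main__":
    for message in new_messages(OLD_LOG, NEW_LOG):
        print(message)
```

On these excerpts the difference is the extra output script and the clock script being parsed, while the "Couldn't find matching output script table" line no longer appears.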
Kernel 2.6.34.6-47.fc13 did not fix all problems. I can suspend and resume once, but it goes wrong on the second resume: X becomes unusable, and the terminals can be used but flicker three times before stabilizing. Please let me know if I can find you anything, either here or on IRC (RSpliet), including peeking regs.
Can you try 2.6.34.6-54, which has some more suspend/resume fixes?
Unfortunately 2.6.34.6-54 doesn't make a difference.
I'm afraid this bug has not been fixed so far on the latest Fedora kernel (2.6.35.10-74). I did a small test, so let me give an update on the current situation. Directly after boot the screen works properly, and I can switch VTs flicker-free. After a suspend and resume, the machine resumes and the screen comes up fine. When I then switch to another VT, the screen flickers once. Trying to go back to the X screen results in a black flicker, then the X screen flickering, ending on a black screen. If I do not switch to a VT and back to X, but instead suspend and resume a second time, the X screen flickers as well, ending in a black screen. In short: after suspend-resume I cannot switch back to the X screen. No lockups, just a black screen as a result.
Right, by now I'm running the 2.6.38-0.rc1 kernel from koji, with an out-of-tree build of the latest git revision of nouveau, but I still have the same problem. What has changed is that there is now an error message for each switch to X.
First switch to X:
[drm] nouveau 0000:01:00.0: EvoCh 0 Mthd 0x0080 Data 0x00000000 (0x0005 0x05)
Subsequent switches to X:
[drm] nouveau 0000:01:00.0: EvoCh 0 Mthd 0x0080 Data 0x00000000 (0x1005 0x05)
This message is a reminder that Fedora 13 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 13. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '13'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 13's end of life. Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 13 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
I have just tested nouveau upstream (with all the recent suspend/resume work) against a Fedora 16 kernel, but unfortunately this bug still persists.
[mass update] kernel-3.3.0-4.fc16 has been pushed to the Fedora 16 stable repository. Please retest with this update.
Negative. Still the same (broken) behaviour where after suspend I cannot reliably modeset anymore (switch to a VT, suspend again). What struck me though is the following. Before suspend xrandr outputs this:
Screen 0: minimum 320 x 200, current 1366 x 768, maximum 8192 x 8192
LVDS-1 connected 1366x768+0+0 (normal left inverted right x axis y axis) 344mm x 193mm
   1366x768       60.0*+
   1024x768       59.9
   800x600        59.9
   640x480        59.4
   720x400        59.6
   640x400        60.0
   640x350        59.8
VGA-1 disconnected (normal left inverted right x axis y axis)
HDMI-1 disconnected (normal left inverted right x axis y axis)

After suspend it turns into this:
Screen 0: minimum 320 x 200, current 1366 x 768, maximum 8192 x 8192
LVDS-1 disconnected 1366x768+0+0 (normal left inverted right x axis y axis) 0mm x 0mm
VGA-1 disconnected (normal left inverted right x axis y axis)
HDMI-1 disconnected (normal left inverted right x axis y axis)
  1366x768 (0x64) 70.0MHz
        h: width 1366 start 1414 end 1446 total 1469 skew 0 clock 47.7KHz
        v: height 768 start 771 end 777 total 794 clock 60.0Hz

Disconnected?
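The before/after comparison above can also be done mechanically. The following is a rough sketch, not something from the original report: it reads the connector lines out of two xrandr dumps and reports which connectors changed state. The function names and the connector-name pattern are assumptions made for illustration; the sample text is trimmed from the output quoted in this comment.

```python
import re

# Connector lines look like "LVDS-1 connected 1366x768+0+0 (...)".
# The name pattern (letters, dash, digits) is an assumption that fits
# the connectors seen in this report.
CONNECTOR_RE = re.compile(r"^(?P<name>[A-Za-z]+-\d+) (?P<state>connected|disconnected)\b")

def connector_states(xrandr_text):
    """Map connector name to 'connected' or 'disconnected'."""
    states = {}
    for line in xrandr_text.splitlines():
        match = CONNECTOR_RE.match(line)
        if match:
            states[match.group("name")] = match.group("state")
    return states

def changed_connectors(before_text, after_text):
    """Return {name: (state_before, state_after)} for connectors that differ."""
    before = connector_states(before_text)
    after = connector_states(after_text)
    return {
        name: (state, after.get(name))
        for name, state in before.items()
        if after.get(name) != state
    }

# Trimmed from the xrandr output quoted in this comment.
BEFORE = """Screen 0: minimum 320 x 200, current 1366 x 768, maximum 8192 x 8192
LVDS-1 connected 1366x768+0+0 (normal left inverted right x axis y axis) 344mm x 193mm
   1366x768       60.0*+
VGA-1 disconnected (normal left inverted right x axis y axis)
HDMI-1 disconnected (normal left inverted right x axis y axis)
"""
AFTER = """Screen 0: minimum 320 x 200, current 1366 x 768, maximum 8192 x 8192
LVDS-1 disconnected 1366x768+0+0 (normal left inverted right x axis y axis) 0mm x 0mm
VGA-1 disconnected (normal left inverted right x axis y axis)
HDMI-1 disconnected (normal left inverted right x axis y axis)
"""

if __name__ == "__main__":
    for name, (old, new) in changed_connectors(BEFORE, AFTER).items():
        print(f"{name}: {old} -> {new}")
```

In practice the two inputs would be saved xrandr dumps taken before suspend and after resume; here only LVDS-1 changes, from connected to disconnected.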
# Mass update to all open bugs. Kernel 3.6.2-1.fc16 has just been pushed to updates. This update is a significant rebase from the previous version. Please retest with this kernel, and let us know if your problem has been fixed. In the event that you have upgraded to a newer release and the bug you reported is still present, please change the version field to the newest release you have encountered the issue with. Before doing so, please ensure you are testing the latest kernel update in that release and attach any new and relevant information you may have gathered. If you are not the original bug reporter and you still experience this bug, please file a new report, as it is possible that you may be seeing a different problem. (Please don't clone this bug, a fresh bug referencing this bug in the comment is sufficient).
With no response, we are closing this bug under the assumption that it is no longer an issue. If you still experience this bug, please feel free to reopen the bug report.
Kernel 3.6.7-4 still exposes this particular problem.
This message is a reminder that Fedora 16 is nearing its end of life. Approximately 4 (four) weeks from now Fedora will stop maintaining and issuing updates for Fedora 16. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '16'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 16's end of life. Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 16 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to click on "Clone This Bug" and open it against that version of Fedora. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Fedora 16 changed to end-of-life (EOL) status on 2013-02-12. Fedora 16 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. Thank you for reporting this bug and we are sorry it could not be fixed.
Still present in Fedora 18, and in the upstream kernel.
I didn't make that last comment for shits and giggles, you "Fedora End Of Life"...
*********** MASS BUG UPDATE ************** We apologize for the inconvenience. There is a large number of bugs to go through and several of them have gone stale. Due to this, we are doing a mass bug update across all of the Fedora 18 kernel bugs. Fedora 18 has now been rebased to 3.11.4-101.fc18. Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel. If you have moved on to Fedora 19, and are still experiencing this issue, please change the version to Fedora 19. If you experience different issues, please open a new bug report for those.
*********** MASS BUG UPDATE ************** We apologize for the inconvenience. There is a large number of bugs to go through and several of them have gone stale. It has been over a month since we asked you to test the 3.11 kernel updates and let us know if your issue has been resolved or is still a problem. When this happened, the bug was set to needinfo. Because the needinfo is still set, we assume either this is no longer a problem, or you cannot provide additional information to help us resolve the issue. As a result we are closing with insufficient data. If this is still a problem, we apologize; feel free to reopen the bug and provide more information so that we can work towards a resolution. If you experience different issues, please open a new bug report for those.
Attempting to properly remove the "needinfo" tag, no need for a daily reminder. The laptop that suffered from this bug is long dead.