Description of problem:
Resuming after suspend fails on this IdeaPad Y550, which has an NV50 (NVIDIA GeForce GT 130M). I haven't seen resume work on this system yet, including on F12. I decided to try Fedora 13 (even in its bleeding-edge form) because nouveau has received updates upstream.

Version-Release number of selected component (if applicable):
2.6.33-1.fc13.x86_64

How reproducible:
always

Steps to Reproduce:
1. suspend
2. resume
3. screen stays black (a NULL pointer dereference occurs in TTM, reached via the nouveau driver)

Actual results:
[drm] nouveau 0000:01:00.0: GPU lockup - switching to software fbcon
usb 2-3: reset high speed USB device using ehci_hcd and address 2
BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
IP: [<ffffffffa005f835>] ttm_bo_pci_offset+0x5b/0x7b [ttm]
PGD 137471067 PUD 136d9e067 PMD 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/boot_vga
CPU 1
Pid: 1365, comm: Xorg Not tainted 2.6.33-1.fc13.x86_64 #1 KIWB1/4186
RIP: 0010:[<ffffffffa005f835>] [<ffffffffa005f835>] ttm_bo_pci_offset+0x5b/0x7b [ttm]
RSP: 0018:ffff8801274e18e8 EFLAGS: 00010202
RAX: 0000000000000000 RBX: ffff8801274e1b78 RCX: ffff8801274e1910
RDX: ffff8801274e1900 RSI: ffff8801274e1b78 RDI: ffff8801366800f0
RBP: ffff8801274e18e8 R08: ffff8801274e1908 R09: 0000000000000180
R10: 0000000000000002 R11: 0000000000000000 R12: ffff8801274e19f8
R13: ffff8801366800f0 R14: 0000000000000002 R15: 0000000000000000
FS: 00007f543a55d7a0(0000) GS:ffff880006600000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000028 CR3: 0000000136d7b000 CR4: 00000000000406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process Xorg (pid: 1365, threadinfo ffff8801274e0000, task ffff880137f3c920)
Stack:
 ffff8801274e1948 ffffffffa006261d ffffffff81478dbe 00000000d0000000
<0> 0000000000000000 0000000000680000 0000000000000018 ffff8801274e1b78
<0> ffff880115281120 0000000000000000 0000000000000000 0000000000000000
Call Trace:
 [<ffffffffa006261d>] ttm_mem_reg_ioremap+0x3b/0xaf [ttm]
 [<ffffffff81478dbe>] ? _raw_spin_unlock_irqrestore+0x4c/0x56
 [<ffffffffa0062ab1>] ttm_bo_move_memcpy+0x79/0x3d7 [ttm]
 [<ffffffffa007e05e>] ? nouveau_fence_wait+0x54/0xb4 [nouveau]
 [<ffffffffa007de62>] ? nouveau_fence_unref+0x29/0x34 [nouveau]
 [<ffffffffa007d2cc>] ? nouveau_bo_move_m2mf+0x354/0x366 [nouveau]
 [<ffffffffa007d66e>] nouveau_bo_move+0x390/0x41d [nouveau]
 [<ffffffff810f34db>] ? unmap_mapping_range+0x10e/0x11d
 [<ffffffffa006030b>] ttm_bo_handle_move_mem+0x1ad/0x2b5 [ttm]
 [<ffffffffa0061ec7>] ttm_bo_move_buffer+0xbc/0x10c [ttm]
 [<ffffffffa006006a>] ? ttm_bo_reserve+0x38/0xf8 [ttm]
 [<ffffffffa0061fc5>] ttm_bo_validate+0xae/0xf7 [ttm]
 [<ffffffffa007e775>] validate_list+0x157/0x288 [nouveau]
 [<ffffffffa007fa7a>] nouveau_gem_ioctl_pushbuf+0xca9/0xcdf [nouveau]
 [<ffffffffa0019418>] drm_ioctl+0x28f/0x373 [drm]
 [<ffffffffa007edd1>] ? nouveau_gem_ioctl_pushbuf+0x0/0xcdf [nouveau]
 [<ffffffff8111e922>] ? do_sync_read+0xc4/0x101
 [<ffffffff8112bcfc>] vfs_ioctl+0x32/0xa6
 [<ffffffff8112c27c>] do_vfs_ioctl+0x490/0x4d6
 [<ffffffff8112c318>] sys_ioctl+0x56/0x79
 [<ffffffff81009c72>] system_call_fastpath+0x16/0x1b
Code: 69 c0 c0 00 00 00 8b 44 07 64 a8 01 75 13 45 85 d2 74 0a a8 08 75 06 f6 46 22 01 74 04 31 c0 c9 c3 48 8b 06 4d 69 c9 c0 00 00 00 <48> 8b 40 28 48 c1 e0 0c 48 89 01 48 8b 46 10 48 c1 e0 0c 49 89
RIP [<ffffffffa005f835>] ttm_bo_pci_offset+0x5b/0x7b [ttm]
 RSP <ffff8801274e18e8>
CR2: 0000000000000028
---[ end trace 7f2055f83ed9bd03 ]---

Expected results:
No NULL pointer dereference; the system resumes successfully after suspend.

Additional info:
I'll be attaching the full dmesg.
Created attachment 396803
dmesg from y550 that shows NULL pointer after resume
Created attachment 416601
kernel log showing same NULL pointer with 2.6.33.4-95.fc13.x86_64

Still happens with the latest F13 kernel: 2.6.33.4-95.fc13.x86_64
It also happens with 2.6.33.5-112.fc13.x86_64:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
IP: [<ffffffffa006f71e>] ttm_bo_pci_offset+0x5d/0x7d [ttm]
...

Other bug reports for this same issue:
https://bugs.freedesktop.org/show_bug.cgi?id=26521
https://bugs.freedesktop.org/show_bug.cgi?id=27574
https://bugzilla.kernel.org/show_bug.cgi?id=15120
Cc'ing Jerome given his recent TTM rework, which is now upstream in 2.6.35. The following commits remove ttm_bo_pci_offset entirely:
http://git.kernel.org/linus/0c321c7962718
http://git.kernel.org/linus/82c5da6bf8b55

I'll see if I can test these changes (via Rawhide or by applying them by hand) against this bug.
(In reply to comment #5)
> Cc'ing Jerome given his recent ttm rework that is now upstream in 2.6.35. The
> following commits remove ttm_bo_pci_offset entirely:
> http://git.kernel.org/linus/0c321c7962718
> http://git.kernel.org/linus/82c5da6bf8b55
>
> I'll see if I can test these bits (via rawhide or applying by hand) relative to
> this bug.

The good news is that Rawhide's 2.6.34-20.fc14.x86_64 (which I understand has 2.6.35-rc1's DRM code) no longer hits a NULL pointer dereference in TTM. The bad news is that the display still stays blank after resume. Here are the details from the log:

kernel: PM: Syncing filesystems ... done.
kernel: Freezing user space processes ... (elapsed 0.01 seconds) done.
kernel: Freezing remaining freezable tasks ... (elapsed 0.01 seconds) done.
kernel: Suspending console(s) (use no_console_suspend to debug)
kernel: sd 0:0:0:0: [sda] Synchronizing SCSI cache
kernel: sd 0:0:0:0: [sda] Stopping disk
kernel: [drm] nouveau 0000:01:00.0: Disabling fbcon acceleration...
kernel: [drm] nouveau 0000:01:00.0: Unpinning framebuffer(s)...
kernel: [drm] nouveau 0000:01:00.0: Evicting buffers...
kernel: pci 0000:00:1f.3: PCI INT C disabled
kernel: ehci_hcd 0000:00:1d.7: PCI INT A disabled
kernel: uhci_hcd 0000:00:1d.2: PCI INT C disabled
kernel: uhci_hcd 0000:00:1d.1: PCI INT B disabled
kernel: uhci_hcd 0000:00:1d.0: PCI INT A disabled
kernel: ehci_hcd 0000:00:1a.7: PCI INT C disabled
kernel: uhci_hcd 0000:00:1a.2: PCI INT C disabled
kernel: uhci_hcd 0000:00:1a.1: PCI INT B disabled
kernel: uhci_hcd 0000:00:1a.0: PCI INT A disabled
kernel: [drm] nouveau 0000:01:00.0: Idling channels...
kernel: [drm] nouveau 0000:01:00.0: Suspending GPU objects...
kernel: HDA Intel 0000:00:1b.0: PCI INT A disabled
kernel: HDA Intel 0000:00:1b.0: power state changed by ACPI to D3
kernel: HDA Intel 0000:01:00.1: PCI INT A disabled
kernel: [drm] nouveau 0000:01:00.0: And we're gone!
kernel: nouveau 0000:01:00.0: PCI INT A disabled
kernel: nouveau 0000:01:00.0: power state changed by ACPI to D3
kernel: PM: suspend of devices complete after 854.583 msecs
kernel: PM: late suspend of devices complete after 22.012 msecs
...
kernel: nouveau 0000:01:00.0: power state changed by ACPI to D0
kernel: nouveau 0000:01:00.0: power state changed by ACPI to D0
kernel: nouveau 0000:01:00.0: power state changed by ACPI to D0
kernel: nouveau 0000:01:00.0: power state changed by ACPI to D0
kernel: nouveau 0000:01:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
kernel: [drm] nouveau 0000:01:00.0: POSTing device...
kernel: [drm] nouveau 0000:01:00.0: Parsing VBIOS init table 0 at offset 0xD359
kernel: sd 0:0:0:0: [sda] Starting disk
kernel: [drm] nouveau 0000:01:00.0: 0xD62F: Failed parsing init table opcode: INIT_ZM_I2C_BYTE -6
kernel: [drm] nouveau 0000:01:00.0: Parsing VBIOS init table 1 at offset 0xD8A4
kernel: [drm] nouveau 0000:01:00.0: Parsing VBIOS init table 2 at offset 0xE3D3
kernel: [drm] nouveau 0000:01:00.0: Parsing VBIOS init table 3 at offset 0xE408
kernel: [drm] nouveau 0000:01:00.0: Parsing VBIOS init table 4 at offset 0xE59C
kernel: [drm] nouveau 0000:01:00.0: Parsing VBIOS init table at offset 0xE601
kernel: [drm] nouveau 0000:01:00.0: 0xBC89: parsing output script 0
kernel: [drm] nouveau 0000:01:00.0: 0xC078: parsing output script 0
kernel: [drm] nouveau 0000:01:00.0: Reinitialising engines...
kernel: [drm] nouveau 0000:01:00.0: Restoring GPU objects...
kernel: usb 2-3: reset high speed USB device using ehci_hcd and address 2
kernel: ata5: SATA link down (SStatus 0 SControl 300)
kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
kernel: ata2.00: configured for UDMA/133
kernel: [drm] nouveau 0000:01:00.0: Restoring mode...
kernel: [drm] nouveau 0000:01:00.0: 0xBC8D: parsing output script 1
kernel: [drm] nouveau 0000:01:00.0: 0xBB00: parsing clock script 0
kernel: [drm] nouveau 0000:01:00.0: 0xBC84: parsing clock script 1
kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
kernel: ata1.00: configured for UDMA/133
kernel: PM: resume of devices complete after 3247.071 msecs
kernel: Restarting tasks ... done.
kernel: video LNXVIDEO:00: Restoring backlight state
(In reply to comment #6)
> (In reply to comment #5)
>
> Bad news is the display still stays blank after resume. Here are the details
> from the log:
>
> kernel: [drm] nouveau 0000:01:00.0: 0xD62F: Failed parsing init table opcode:
> INIT_ZM_I2C_BYTE -6

This is likely the reason. I'd say there's more setup for your card in that init table that we're skipping because INIT_ZM_I2C_BYTE fails.

Can you please file a new bug report against Rawhide to track this issue? It would be great if you could include your dmesg output after a suspend/resume cycle with "drm.debug=14 log_buf_len=1M" appended to your boot options, as well as a VBIOS image. (I may want vbtracetool traces later too, but we'll discuss that when you open the new bug.) Thanks!
(In reply to comment #6)
> (In reply to comment #5)
> > Cc'ing Jerome given his recent ttm rework that is now upstream in 2.6.35. The
> > following commits remove ttm_bo_pci_offset entirely:
> > http://git.kernel.org/linus/0c321c7962718
> > http://git.kernel.org/linus/82c5da6bf8b55
> >
> > I'll see if I can test these bits (via rawhide or applying by hand) relative to
> > this bug.
>
> Good news is rawhide's 2.6.34-20.fc14.x86_64 (which I understand has
> 2.6.35-rc1's drm) no longer hits a NULL pointer in TTM.

Unfortunately, it seems a comparable NULL pointer dereference still exists with 2.6.34-20.fc14.x86_64, see:
https://bugzilla.redhat.com/show_bug.cgi?id=601002#c4

However, it appears harder to hit: about 1 out of 3 resumes hit it with Rawhide's 2.6.34-20.fc14.x86_64, whereas F13's kernels hit it on every resume.
Mike, can you give the kernel at http://koji.fedoraproject.org/koji/buildinfo?buildID=183346 a try and see if there's any change?
kernel-2.6.34.2-34.fc13 has been submitted as an update for Fedora 13. http://admin.fedoraproject.org/updates/kernel-2.6.34.2-34.fc13
kernel-2.6.34.2-34.fc13 has been pushed to the Fedora 13 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with su -c 'yum --enablerepo=updates-testing update kernel'. You can provide feedback for this update here: http://admin.fedoraproject.org/updates/kernel-2.6.34.2-34.fc13
kernel-2.6.34.3-37.fc13 has been submitted as an update for Fedora 13. http://admin.fedoraproject.org/updates/kernel-2.6.34.3-37.fc13
kernel-2.6.34.3-37.fc13 has been pushed to the Fedora 13 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with su -c 'yum --enablerepo=updates-testing update kernel'. You can provide feedback for this update here: http://admin.fedoraproject.org/updates/kernel-2.6.34.3-37.fc13
The 2.6.34 kernel has been withdrawn.
kernel-2.6.34.6-47.fc13 has been submitted as an update for Fedora 13. https://admin.fedoraproject.org/updates/kernel-2.6.34.6-47.fc13
kernel-2.6.34.6-47.fc13 has been pushed to the Fedora 13 stable repository. If problems still persist, please make note of it in this bug report.