Bug 527874

Summary: KMS:RV515:X1400 Thinkpad T60 resume fails
Product: [Fedora] Fedora
Reporter: Peng Huang <phuang>
Component: xorg-x11-drv-ati
Assignee: Jérôme Glisse <jglisse>
Status: CLOSED RAWHIDE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: medium
Version: 12
CC: airlied, a.meganov, christophe, david, dougsland, dwagelaar, ehabkost, gansalmon, itamar, jglisse, kernel-maint, mailings, matthias_haase, mcepl, mcepl, mkgunstrom, phuang, rderooy, renard, shamardin, steven, tim, vedran, xgl-maint
Hardware: All
OS: Linux
Doc Type: Bug Fix
Clone Of: 473195
Last Closed: 2009-11-16 15:33:08 UTC
Bug Depends On: 473195    
Attachments (description, flags):
- dmesg after pm-suspend (none)
- Photo of my screen after suspend/resume in X (none)
- x1300-bios-post (none)
- x1300-asicinit-asm (none)
- x1300-asicinit-reg (none)
- clean boot without kms, init level 3 (none)
- without kms, after resume (none)
- without kms, after resume and vbetool post (none)
- with kms, init level 3, before resume (none)
- with kms, init level 3, after resume (none)
- output of dmesg (none)
- dump for new kernel with kms (none)
- dump for new kernel with kms, after resume (none)
- outputs of lspci (none)
- output of lspci for nokms (none)
- Stop mc at suspend and reset it at resume (none)
- Reset HostDataPath at resume (none)
- X1400 restore mc+hdp before asic_init and put vram at 0x10000000 (none)
- X1400 restore mc+hdp before asic_init and put vram at 0x10000000 + VGA HDP (none)
- Shutdown lvds, force mc to be on, dump regs (none)
- dmesg output (none)

Description Peng Huang 2009-10-08 02:46:07 UTC
+++ This bug was initially created as a clone of Bug #473195 +++

Description of problem:
Resuming a Thinkpad T60 with a Radeon Mobility X1400 after suspend or hibernate fails. The display backlight turns on, but the screen remains black. In /var/log/messages I have the following lines:

kernel: [drm:radeon_resume] *ERROR* 
kernel: [drm] Loading R500 Microcode
kernel: [drm] Num pipes: 1
kernel: [drm] writeback test failed
kernel: [drm:drm_ttm_bind] *ERROR* Couldn't bind backend.
kernel: executing set pll
kernel: executing set crtc timing
kernel: [drm] LVDS-8: set mode 1400x1050 11
kernel: executing set LVDS encoder

When booting with nomodeset, suspend/resume works just fine, but without the nice new eye candy... The machine was upgraded from F-9 to F-10 via yum upgrade.

Version-Release number of selected component (if applicable):
kernel-2.6.27.5-117.fc10.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Boot laptop (without nomodeset)
2. Suspend
3. Resume, see black screen
  
Actual results:
The machine cannot be used. A hard power down and power up is required.

Expected results:
The screensaver password prompt should appear.

Additional info:
The smolt profile of the machine can be found at http://www.smolts.org/client/show/pub_d3521300-de3d-40ee-be30-5c99bb593c3b

--- Additional comment from a.meganov on 2008-11-27 11:02:46 EDT ---

Same hardware, same problem.

--- Additional comment from matthias_haase on 2008-11-30 05:31:03 EDT ---

Suspend/resume also fails on a Thinkpad T40 (Radeon Mobility 7500) after upgrading from FC9 to FC10, without any error messages.

#
Nov 30 10:59:12 thinkpad kernel: [drm] Loading R100 Microcode
Nov 30 10:59:12 thinkpad kernel: [drm] writeback test succeeded in 2 usecs
Nov 30 11:07:46 thinkpad kernel: [drm] Initialized drm 1.1.0 20060810
Nov 30 11:07:46 thinkpad kernel: [drm] Initialized radeon 1.29.0 20080528 on minor 0
Nov 30 11:07:54 thinkpad kernel: [drm] Setting GART location based on new memory map
#

Machine is unusable... black screen after resume.

kernel-2.6.27.5-117.fc10.i686
rhgb/plymouth is enabled using vga=0x318 as kernel boot arg.

--- Additional comment from matthias_haase on 2008-11-30 05:48:16 EDT ---

pm-suspend --quirk-none doesn't help. The problem is related to Xorg; I can find countless related messages in the previous xorg.log:

...
(II) Macintosh mouse button emulation: Device reopened after 1 attempts.
(II) USB Optical Mouse: Device reopened after 1 attempts.
[mi] EQ overflowing. The server is probably stuck in an infinite loop.

Backtrace:
0: /usr/bin/Xorg(xorg_backtrace+0x3b) [0x812bc5b]
1: /usr/bin/Xorg(mieqEnqueue+0x289) [0x810b379]
2: /usr/bin/Xorg(xf86PostMotionEventP+0xc2) [0x80d4262]
3: /usr/bin/Xorg(xf86PostMotionEvent+0x68) [0x80d43c8]
4: /usr/lib/xorg/modules/input//evdev_drv.so [0x355a8d]
5: /usr/bin/Xorg [0x80bcdb7]
6: /usr/bin/Xorg [0x80ac91e]
7: [0x110400]
8: [0x110416]
9: /lib/libc.so.6(ioctl+0x19) [0x484949]
10: /usr/lib/libdrm.so.2 [0x20026cf]
11: /usr/lib/libdrm.so.2(drmCommandWriteRead+0x34) [0x2002934]
12: /usr/lib/dri/radeon_dri.so [0x3089b2]
13: /usr/lib/dri/radeon_dri.so [0x308b38]
14: /usr/lib/dri/radeon_dri.so(radeonCopyBuffer+0x102) [0x30a960]
15: /usr/lib/dri/radeon_dri.so(radeonCopySubBuffer+0x6d) [0x307b95]
16: /usr/lib/dri/radeon_dri.so [0x304af8]
17: /usr/lib/xorg/modules/extensions//libglx.so [0x1824c4]
18: /usr/lib/xorg/modules/extensions//libglx.so [0x174a55]
19: /usr/lib/xorg/modules/extensions//libglx.so [0x173ae7]
20: /usr/lib/xorg/modules/extensions//libglx.so [0x17863a]
21: /usr/bin/Xorg(Dispatch+0x34f) [0x8085e9f]
22: /usr/bin/Xorg(main+0x47d) [0x806b71d]
23: /lib/libc.so.6(__libc_start_main+0xe5) [0x3bf6d5]
24: /usr/bin/Xorg [0x806ab01]
[mi] mieqEnequeue: out-of-order valuator event; dropping.
[mi] EQ overflowing. The server is probably stuck in an infinite loop.
[mi] mieqEnequeue: out-of-order valuator event; dropping.
[mi] EQ overflowing. The server is probably stuck in an infinite loop.
[mi] mieqEnequeue: out-of-order valuator event; dropping.
[mi] EQ overflowing. The server is probably stuck in an infinite loop.
[mi] mieqEnequeue: out-of-order valuator event; dropping.
...

--- Additional comment from matthias_haase on 2008-11-30 08:20:15 EDT ---

Maybe my problem is different... Booting with nomodeset doesn't fix the broken suspend/hibernate/resume. These functions were always stable on FC9.

--- Additional comment from matthias_haase on 2008-11-30 10:06:34 EDT ---

A few minutes ago I tried Dave Airlie's koji build kernel-2.6.27.7-132.fc10.i686. Unfortunately it doesn't fix the error described above on the IBM Thinkpad T40 using R100 (up to T43).

My xorg.conf contains
Section "Device"
    Identifier "Videocard0"
    Driver "radeon"
    Option "RenderAccel" "true"
    Option "AGPMode" "4"
    Option "GARTSize" "128"
    Option "AGPSize" "128"
    #Option "AGPFastWrite" "true"
    Option "EnableDepthMoves" "true"
    Option "AccelDFS" "true"
    Option "AccelMethod" "XAA"
    Option "XAANoOffscreenPixmaps" "true"
    Option "ColorTiling" "on"
    Option "DynamicClocks" "on"
    Option "SWcursor" "off"
EndSection

(3d is working fast enough).

--- Additional comment from shamardin on 2008-12-03 01:41:55 EDT ---

I can confirm the problem. The smolt profile is here:
http://www.smolts.org/client/show/pub_d33f4595-a01e-49c9-9ba8-e363b8ffccfa

I've got these messages in logs:

Dec  2 12:25:35 lopeptoid kernel: Suspending console(s) (use no_console_suspend to debug)
Dec  2 12:25:35 lopeptoid kernel: [drm:drm_bo_evict_mm] *ERROR* lru empty
Dec  2 12:25:35 lopeptoid kernel: [drm] Num pipes: 1
...
Dec  2 12:25:35 lopeptoid kernel: pci 0000:01:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
Dec  2 12:25:35 lopeptoid kernel: [drm:radeon_resume] *ERROR* 
Dec  2 12:25:35 lopeptoid kernel: [drm] Loading R500 Microcode
Dec  2 12:25:35 lopeptoid kernel: [drm] Num pipes: 1
Dec  2 12:25:35 lopeptoid kernel: [drm] writeback test failed
Dec  2 12:25:35 lopeptoid kernel: [drm:drm_ttm_bind] *ERROR* Couldn't bind backend.
Dec  2 12:25:35 lopeptoid kernel: executing set pll
Dec  2 12:25:35 lopeptoid kernel: executing set crtc timing
Dec  2 12:25:35 lopeptoid kernel: [drm] LVDS-8: set mode 1280x800 10
Dec  2 12:25:35 lopeptoid kernel: executing set LVDS encoder
Dec  2 12:25:35 lopeptoid kernel: Restarting tasks ... done.
Dec  2 12:25:35 lopeptoid kernel: Restarting tasks ... done.

I've also found a workaround: I disabled that fancy boot stuff by adding the
"nomodeset" option to the kernel line in /etc/grub.conf, and suspend has already
worked without problems 6 times.

Some additional remarks:

1. I've discovered that the machine is not completely dead; it just pretends to
be. At least if you have a second machine around, you can still ssh into the
semi-dead host and reboot it remotely.

2. Switching to the radeonhd driver partially fixes the problem: the machine comes
back from suspend, but the picture on the screen is covered with dotty garbage.
It is still usable enough to do a clean reboot and maybe even save some files
before that. I switched back to radeon.

--- Additional comment from shamardin on 2008-12-03 01:44:37 EDT ---

*** Bug 473340 has been marked as a duplicate of this bug. ***

--- Additional comment from airlied on 2008-12-03 01:55:45 EDT ---

Can someone boot with drm.debug=1, then suspend/resume, ssh in afterwards, get the logs and attach the full log here?

--- Additional comment from shamardin on 2008-12-03 02:21:17 EDT ---

Created an attachment (id=325492)
/var/log/messages for boot with drm.debug=1 with failed suspend

Here you go.

A strange thing: when I booted with drm.debug=1, after resume I got a garbled screen and no wireless (not sure about the last part; maybe I should have waited a bit longer). Booting without nomodeset and without drm.debug gives an empty screen and working wireless on my system.

--- Additional comment from matthias_haase on 2008-12-03 02:55:58 EDT ---

Booting with drm.debug=1 as a grub kernel argument results in "unknown kernel option" for me. There are error messages:

Dec  3 08:35:21 thinkpad kernel: [drm:drm_ioctl] pid=2061, cmd=0xc0086451, nr=0x51, dev 0xe200, auth=1
Dec  3 08:35:22 thinkpad kernel: [drm:drm_ioctl] pid=2061, cmd=0xc0086451, nr=0x51, dev 0xe200, auth=1
Dec  3 08:35:22 thinkpad ntpd[1956]: Listening on interface #7 eth1, 192.168.3.121#123 Enabled
Dec  3 08:35:22 thinkpad kernel: [drm:drm_ioctl] pid=2061, cmd=0xc0086451, nr=0x51, dev 0xe200, auth=1
...
Dec  3 08:35:51 thinkpad kernel: [drm:drm_ioctl] pid=2061, cmd=0xc0086451, nr=0x51, dev 0xe200

--- Additional comment from matthias_haase on 2008-12-03 03:09:44 EDT ---

Because I have no other machine available at home, I'll post /var/log/messages here tomorrow after logging in via ssh. The "thinkpad kernel: _ioctl] pid=2061, cmd=0xc0086451, nr=0x51, dev 0xe200, auth=1" lines were written while the failed suspend happened, but they are useless info (for me).

--- Additional comment from shamardin on 2008-12-04 02:35:17 EDT ---

*** Bug 474035 has been marked as a duplicate of this bug. ***

--- Additional comment from matthias_haase on 2008-12-04 03:34:02 EDT ---

Created an attachment (id=325650)
/var/log/messages via ssh during a failed suspend on IBM T40

--- Additional comment from matthias_haase on 2008-12-04 03:36:30 EDT ---

Created an attachment (id=325651)
related xorg.log (failed suspend)

--- Additional comment from matthias_haase on 2008-12-04 03:37:26 EDT ---

Created an attachment (id=325652)
dmesg terminal output (failed suspend)

--- Additional comment from matthias_haase on 2008-12-04 03:43:45 EDT ---

The requested files are attached now...
I'm not sure about the drm.debug=1 "unknown command... ignoring" message.

dmesg: 
[drm:drm_ioctl] pid=2043, cmd=0xc0086451, nr=0x51, dev 0xe200, auth=1
[drm:radeon_cp_getparam] pid=2043
..

(see the last attachment)

--- Additional comment from david.au on 2008-12-07 07:43:10 EDT ---

My Clevo D870P notebook ( http://smolts.org/show?uuid=pub_483d83b6-cdb4-456c-bebb-42f8c704839a ), which works fine with F8 and F9, has an ATI Mobility Radeon 9700 chipset and exhibits this behaviour; its dmesg details are in the duplicate bug https://bugzilla.redhat.com/show_bug.cgi?id=473340...

Without the "nomodeset" option, I find that once the computer has been suspended and then resumed, all windows are rendered without the symbols (minimise, maximise, close) in their top right corner, and instead I get corruption: it seems that the minimise, maximise and close buttons are being rendered wrongly, corrupting the display. An easy way to reproduce the corruption is to open a gnome-terminal window, grab its bottom edge and repeatedly resize it vertically, increasing and decreasing its size... the more I do this, the more corruption I see on screen. It might be that the resize causes the minimise, maximise and close buttons to be re-rendered, hence the corruption.

With the "nomodeset" kernel option, I don't see any screen corruption on resume, but I do see mouse-pointer corruption, for example the mouse-pointer that is supposed to show when resizing a window becomes a dotty mess.

--- Additional comment from airlied on 2008-12-08 22:47:21 EDT ---

So I need another try: I need to see the dmesg output after the resume with drm.debug=1, not /var/log/messages,

as /var/log/messages appears to lose stuff.

--- Additional comment from david.au on 2008-12-09 00:51:39 EDT ---

Dave Airlie, does this link provide the info you need?

https://bugzilla.redhat.com/show_bug.cgi?id=473340

[the above bug is now a duplicate of this one]

--- Additional comment from airlied on 2008-12-09 01:05:55 EDT ---

nope, it only has the dmesg, I need it with drm debugging enabled.

btw enabling debugging before suspend might produce less crap..

echo 1 > /sys/module/drm/parameters/debug
pm-suspend --quirk-none
echo 0 > /sys/module/drm/parameters/debug
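The three commands above can be wrapped in a small shell function so that the debug flag is switched back off even when the suspend command fails. This is a minimal sketch, not something from the thread: `with_drm_debug` and the `DRM_DEBUG_PARAM` override are hypothetical names, and the override exists only so the helper can be tried against a scratch file instead of the real sysfs parameter.

```shell
#!/bin/sh
# Hypothetical setting: path of the drm module's debug parameter. Override
# DRM_DEBUG_PARAM to point at a scratch file when trying this out.
DRM_DEBUG_PARAM="${DRM_DEBUG_PARAM:-/sys/module/drm/parameters/debug}"

# Run the given command with DRM debugging enabled, then turn debugging off
# again whether or not the command succeeded, and return its exit status.
with_drm_debug() {
    echo 1 > "$DRM_DEBUG_PARAM" || return 1
    wdd_status=0
    "$@" || wdd_status=$?      # remember a failure instead of stopping here
    echo 0 > "$DRM_DEBUG_PARAM"
    return "$wdd_status"
}
```

As root, `with_drm_debug pm-suspend --quirk-none` followed by `dmesg > dmesg-after-resume.txt` right after resume would capture the dmesg output being asked for here.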

--- Additional comment from matthias_haase on 2008-12-11 04:08:22 EDT ---

(In reply to comment #20)
> nope, it only has the dmesg, I need it drm debugging enabled.
> 
> btw enabling debugging before suspend might produce less crap..
> 
> echo 1 > /sys/module/drm/parameters/debug
> pm-suspend --quirk-none
> echo 0 > /sys/module/drm/parameters/debug

Done, dmesg after resume.. attached... same crap I think.
(updated to 2.6.27.7-134.fc10.i686 kernel). 

I see the desktop and can move the mouse after resume... but desktop is frozen and unusable.

--- Additional comment from matthias_haase on 2008-12-11 04:11:34 EDT ---

Created an attachment (id=326596)
dmesg after suspend on kernel 2.6.27.7-134.fc10.i686

--- Additional comment from tim on 2008-12-11 05:17:46 EDT ---

Created an attachment (id=326599)
dmesg after suspend on kernel 2.6.27.7-134.fc10.x86_64

dmesg output after suspend. Executed immediately after pm-suspend returned. It's truncated at the beginning, but the resume output is complete. Sufficient?

--- Additional comment from tim on 2008-12-17 18:05:12 EDT ---

Problem persists with kernel-2.6.27.9-159.fc10.x86_64 and xorg-x11-drv-ati-6.9.0-62.fc10.x86_64. Recently I've seen the following in the log:

During suspend, immediately after "Suspending console"
[drm:drm_bo_evict_mm] *ERROR* lru empty

During resume:
[drm:radeon_resume] *ERROR* 
[drm] Loading R500 Microcode
[drm] Num pipes: 1
[drm] writeback test failed

Hope that helps.

--- Additional comment from matthias_haase on 2009-01-26 13:21:01 EDT ---

Same as described earlier for 2.6.27.12-2.5.fc10.i686 from updates-testing on IBM Thinkpad T40 :-(  [ATI Technologies Inc Radeon Mobility M7 LW [Radeon Mobility 7500]] 
No problem before with FC9 on T40, kernel 2.6.26.*-.

@Dave: Can I do some more precise debugging for you?

-- 
Regards from Germany
                   Matthias

--- Additional comment from matthias_haase on 2009-01-26 13:33:58 EDT ---

(In reply to comment #25)
Reply to myself:
Suspend/resume works without problems when triggered from gdm without DRI.

--- Additional comment from matthias_haase on 2009-03-04 15:58:42 EDT ---

The latest update to 2.6.27.19-170.2.35.fc10 fixes it!
This bug can be closed now.

--- Additional comment from a.meganov on 2009-03-05 07:31:59 EDT ---

Not for me. Still the same effect. Dark screen, but the OS is functional.

--- Additional comment from tim on 2009-03-12 13:58:47 EDT ---

Hasn't fixed it for me either. Still the same.

--- Additional comment from tim on 2009-06-11 18:52:56 EDT ---

kernel-2.6.27.24-170.2.68.fc10.x86_64 is still broken. Haven't tried F-11, yet. Has anyone else, is it fixed?

--- Additional comment from steven on 2009-06-11 20:13:49 EDT ---

(In reply to comment #30)
> kernel-2.6.27.24-170.2.68.fc10.x86_64 is still broken. Haven't tried F-11, yet.
> Has anyone else, is it fixed?  

I'm seeing the problem in a stock install of Fedora 11 (T60 with X1400).

--- Additional comment from rivanvx on 2009-09-06 05:29:48 EDT ---



*** This bug has been marked as a duplicate of 464866 ***

--- Additional comment from rivanvx on 2009-09-06 05:39:48 EDT ---

Sorry, wrong tab.

--- Additional comment from tim on 2009-09-29 05:55:24 EDT ---

Still a problem with F-11. Any progress on this?

--- Additional comment from phuang on 2009-10-07 22:00:00 EDT ---

This problem still happens in rawhide.

kernel-PAE-2.6.31.1-56.fc12.i686
xorg-x11-drv-ati-6.13.0-0.7.20091006git457646d73.fc12.i686
xorg-x11-server-Xorg-1.7.0-1.fc12.i686

Comment 1 Peng Huang 2009-10-08 02:51:15 UTC
Created attachment 364054 [details]
dmesg after pm-suspend

[phuang@phuang-notebook ~]$ lspci |grep VGA
01:00.0 VGA compatible controller: ATI Technologies Inc M52 [Mobility Radeon X1300]

I captured the dmesg output after running the following commands:
1> switch vt from X to text console
2> echo 1 > /sys/module/drm/parameters/debug
3> pm-suspend --quirk-none

Comment 2 Peng Huang 2009-10-08 03:02:56 UTC
Created attachment 364056 [details]
Photo of my screen after suspend/resume in X

Comment 3 Tim Niemueller 2009-10-08 12:48:39 UTC
Exactly the same here.

Comment 4 Dave Airlie 2009-10-09 04:33:37 UTC
can you retry with the 2.6.31.1-65 or higher kernel from koji?

Comment 5 Peng Huang 2009-10-09 05:53:44 UTC
Just tried 2.6.31.1-65.fc12.i686.PAE.
The problem still happens.

But the dmesg output is different. The old error was
*ERROR* radeon: couldn't schedule IB(11).; after updating the kernel, IB(11) changed to IB(13).


ADDRCONF(NETDEV_UP): eth0: link is not ready
Registered led device: iwl-phy0::radio
Registered led device: iwl-phy0::assoc
Registered led device: iwl-phy0::RX
Registered led device: iwl-phy0::TX
ADDRCONF(NETDEV_UP): wlan0: link is not ready
e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(13).
[drm:radeon_cs_ioctl] *ERROR* Faild to schedule IB !
[drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(13).
[drm:radeon_cs_ioctl] *ERROR* Faild to schedule IB !
[drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(13).
[drm:radeon_cs_ioctl] *ERROR* Faild to schedule IB !
[drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(13).
[drm:radeon_cs_ioctl] *ERROR* Faild to schedule IB !
[drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(13).
[drm:radeon_cs_ioctl] *ERROR* Faild to schedule IB !
[drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(13).
[drm:radeon_cs_ioctl] *ERROR* Faild to schedule IB !
[drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(13).
[drm:radeon_cs_ioctl] *ERROR* Faild to schedule IB !
[drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(13).
[drm:radeon_cs_ioctl] *ERROR* Faild to schedule IB !
[drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(13).
[drm:radeon_cs_ioctl] *ERROR* Faild to schedule IB !
[drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(13).
[drm:radeon_cs_ioctl] *ERROR* Faild to schedule IB !
[drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(13).
[drm:radeon_cs_ioctl] *ERROR* Faild to schedule IB !
[drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(13).
[drm:radeon_cs_ioctl] *ERROR* Faild to schedule IB !
[drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(13).
[drm:radeon_cs_ioctl] *ERROR* Faild to schedule IB !
[drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(13).
[drm:radeon_cs_ioctl] *ERROR* Faild to schedule IB !
[drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(13).
[drm:radeon_cs_ioctl] *ERROR* Faild to schedule IB !
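A flood like the one above is easier to compare across kernels (IB(11) vs IB(13) here, and other indexes in later comments) when it is collapsed to one count per IB index. A small sketch using grep, sort, uniq and awk; `summarize_ib_errors` is a hypothetical name, and it only assumes the "couldn't schedule IB(n)" message text seen in these logs.

```shell
#!/bin/sh
# Hypothetical helper: read a dmesg capture on stdin and print one line per
# distinct "couldn't schedule IB(n)" index, formatted as "<count> IB(n)".
# Uses grep -o, which GNU and BSD grep both provide.
summarize_ib_errors() {
    grep -o "couldn't schedule IB([0-9]*)" |
        sort | uniq -c |
        awk '{ print $1, $NF }'   # the count, then the IB(n) token
}
```

For example, `dmesg | summarize_ib_errors` turns hundreds of repeated error lines into a short per-index summary.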

Comment 6 Jérôme Glisse 2009-10-16 15:35:38 UTC
Hello Peng

I have discussed your issue with Dave; it seems VRAM is not properly restored after suspend. We need you to run 3 tools to get information:
http://people.freedesktop.org/~glisse/atomtools.tar.bz2
http://people.freedesktop.org/~glisse/vbetool-0.7.tar.bz2

You will need to install some dependencies before building each tool
(for vbetool, make should be enough; for atomtools: cmake . then make).

Once you have the tools built, we will need you to run each of them in init 3 without KMS (add "radeon.modeset=0 3" to your kernel boot parameters).

Then do the following as root:
./vbetool post 2> x1400-bios-post
./atomtools > x1400-asicinit-asm
./atomtools2 > x1400-asicinit-reg

Then attach the 3 files x1400-bios-post, x1400-asicinit-asm and x1400-asicinit-reg to the bug report. Thanks for your help.
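Because these dumps must come from a boot with KMS disabled, it may help to check the running kernel's command line before collecting them. A minimal sketch: `kms_disabled_on_cmdline` is a hypothetical helper that only looks for the two options mentioned in this bug (nomodeset and radeon.modeset=0); it does not query the driver itself.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) if the kernel command line asks
# for KMS to be disabled, either globally ("nomodeset") or for radeon only
# ("radeon.modeset=0"). With no argument it reads the running kernel's
# /proc/cmdline; pass a string to check some other command line.
kms_disabled_on_cmdline() {
    kdc_cmdline=${1:-$(cat /proc/cmdline)}
    for kdc_arg in $kdc_cmdline; do   # unquoted on purpose: split into words
        case $kdc_arg in
            nomodeset|radeon.modeset=0) return 0 ;;
        esac
    done
    return 1
}
```

For example, `kms_disabled_on_cmdline || echo "reboot with radeon.modeset=0 3 first"` warns before dumps are taken from the wrong kind of boot.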

Comment 7 Jérôme Glisse 2009-10-16 15:36:04 UTC
Sorry, you also need the libpciaccess-devel.i686 package.

Comment 8 Peng Huang 2009-10-19 02:28:52 UTC
Created attachment 365188 [details]
x1300-bios-post

Comment 9 Peng Huang 2009-10-19 02:31:39 UTC
Created attachment 365190 [details]
x1300-asicinit-asm

After executing atomtools, my screen cannot display text correctly.

Comment 10 Peng Huang 2009-10-19 02:32:22 UTC
Created attachment 365191 [details]
x1300-asicinit-reg

Comment 11 Peng Huang 2009-10-19 02:40:44 UTC
After executing ./atomtools > x1400-asicinit-asm, my screen cannot display correctly.
Running `vbetool post` again recovers my display, and then everything is OK.

Comment 12 Jérôme Glisse 2009-10-19 18:03:16 UTC
I need more dumps; can you please download:
http://people.freedesktop.org/~glisse/radeondump.tar.bz2

And do the following:
After a cold boot in init 3 without KMS (add radeon.modeset=0 to kernel cmdline):
sudo ./radeondump -d x1400.regs x1400-init3

Then suspend/resume, run vbetool post, and then:
sudo ./radeondump -d x1400.regs x1400-init3-resume

Then reboot with kms enabled, suspend/resume and do through ssh:
sudo ./radeondump -d x1400.regs x1400-kms-resume

Hopefully it should give us enough information to address this bug :)

Comment 13 Peng Huang 2009-10-20 02:07:00 UTC
Created attachment 365296 [details]
clean boot without kms, init level 3

Comment 14 Peng Huang 2009-10-20 02:08:12 UTC
Created attachment 365297 [details]
without kms, after resume

Comment 15 Peng Huang 2009-10-20 02:09:01 UTC
Created attachment 365298 [details]
without kms, after resume and vbetool post

Comment 16 Peng Huang 2009-10-20 02:09:49 UTC
Created attachment 365299 [details]
with kms, init level 3, before resume

Comment 17 Peng Huang 2009-10-20 02:10:18 UTC
Created attachment 365300 [details]
with kms, init level 3, after resume

Comment 18 Peng Huang 2009-10-20 02:11:22 UTC
I hope these files help you resolve this issue.

Comment 19 Jérôme Glisse 2009-10-21 16:52:34 UTC
Can you test whether the kernel at:
http://people.freedesktop.org/~glisse/

fixes the issue? Install it with rpm -ivh --nodeps; don't worry about a few warnings.

Comment 20 Peng Huang 2009-10-21 23:42:58 UTC
Created attachment 365622 [details]
output of dmesg

The problem still happens with the new kernel.

Comment 21 Peng Huang 2009-10-21 23:48:03 UTC
BTW, I have a git clone of the upstream Linux kernel. I can help you test patches too.

Comment 22 Peng Huang 2009-10-21 23:54:48 UTC
Created attachment 365625 [details]
dump for new kernel with kms

Comment 23 Peng Huang 2009-10-21 23:55:23 UTC
Created attachment 365626 [details]
dump for new kernel with kms, after resume

Comment 24 Jérôme Glisse 2009-10-27 16:28:21 UTC
Peng please run :
sudo lspci -vvv -nn -xxxx -s 01:00.0
Run it before and after resume with KMS (replace 01:00.0 with the correct bus ID of your GPU; a simple lspci should give it to you). Attach the output as 2 separate files. Thanks.
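Comparing the two lspci outputs byte by byte can make a changed config-space register easier to spot than a plain diff of the whole dump. A sketch in POSIX sh and awk: `diff_pci_config` is a hypothetical helper, and it assumes only the usual `lspci -xxxx` hex line layout ("00: 02 10 c5 71 ..."), ignoring the verbose text lines.

```shell
#!/bin/sh
# Hypothetical helper: compare the hex config-space lines of two saved
# `lspci -xxxx` outputs (e.g. taken before and after resume) and print
# "offset: before -> after" for every byte that differs.
diff_pci_config() {
    awk '
        # Convert a lowercase hex string such as "1f0" to a number.
        function hex2dec(h,    i, n) {
            n = 0
            for (i = 1; i <= length(h); i++)
                n = n * 16 + index("0123456789abcdef", substr(h, i, 1)) - 1
            return n
        }
        # Hex dump lines look like "00: 02 10 c5 71 ...": offset, colon, space.
        /^[0-9a-f]+: / {
            base = hex2dec(substr($1, 1, length($1) - 1))
            for (i = 2; i <= NF; i++) {
                if (FILENAME == ARGV[1]) before[base + i - 2] = $i
                else                     after[base + i - 2] = $i
            }
        }
        END {
            # -xxxx dumps can cover up to 4096 bytes of extended config space.
            for (k = 0; k < 4096; k++)
                if ((k in before) && (k in after) && before[k] != after[k])
                    printf "%03x: %s -> %s\n", k, before[k], after[k]
        }
    ' "$1" "$2"
}
```

Usage: save `sudo lspci -vvv -nn -xxxx -s 01:00.0` to lspci-before.txt, suspend/resume, save it again to lspci-after.txt, then run `diff_pci_config lspci-before.txt lspci-after.txt`.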

Comment 25 Peng Huang 2009-10-27 23:48:18 UTC
Should I test it on the kernel from comment #19, or use the latest Fedora 12 kernel?

Comment 26 Peng Huang 2009-10-28 03:26:51 UTC
Created attachment 366381 [details]
outputs of lspci

Comment 27 Peng Huang 2009-10-28 03:32:51 UTC
Hi Jerome,

This problem also happens in runlevel 3 without the X server. Why file this bug against xorg-x11-drv-ati?

Comment 28 Dave Airlie 2009-10-28 06:01:46 UTC
Peng we assign kms bugs to X drivers because the kernel gets too many bugs, hopefully we can separate kernel stuff out later.

Comment 29 Jérôme Glisse 2009-10-28 09:48:28 UTC
Can you run the same lspci command after resume and vbetool post, with KMS disabled, in init 3? I put the bug under ati because it's easier for us to find ATI hardware-related bugs there.

Comment 30 Peng Huang 2009-10-28 10:13:33 UTC
Created attachment 366412 [details]
output of lspci for nokms

Comment 31 Jérôme Glisse 2009-10-28 19:33:49 UTC
Here are my observations before I forget them:

The register dumps I asked for cover every register that vbetool post ever reads
or writes on this specific hw. Thus, if there is any difference in the way vbetool
and KMS restore the card, it should be reflected in the various dumps. It's not
the case. The dumps show that with KMS, VGA is disabled and the PLLs are different
too (because the video modes set up by KMS and vbetool are different); of course,
the video-mode-related registers are different.

The interesting things the diff between the dumps shows are that the MC is idle
with KMS and that the 3D pipe configuration isn't restored. The MC being idle
could either be the source of the bug or just reflect the fact that VRAM is not
working. If the MC is not properly restored, or is in a bogus state, it could
report IDLE because it doesn't answer any memory request from the GPU. Or, if VRAM
is not properly restored, the MC can simply fail to execute requests from the GPU
and thus report IDLE.

Otherwise, all other registers have similar values.

My first attempt to fix the issue reset the MC at resume, but I found a bug in
that patch. I am working on a new one which will do the following (order matters):
-stop MC at suspend
-reset MC at resume
-restore MC
-ASIC_Init

I am looking at the AtomBIOS dump again, since I was looking at the wrong disassembly from the atom bios tools, to check whether vbetool post takes a different path than ASIC_Init.

Comment 32 Robert de Rooy 2009-10-29 09:25:13 UTC
I think I am also seeing this problem on resume on an old ThinkPad T41 with an RV250. I first see this:
radeon 0000:01:00.0: PCI INT A -> Link[LNKA] -> GSI 11 (level, low) -> IRQ 11
Oct 29 10:02:08 t41 kernel: [drm] GPU reset succeed (RBBM_STATUS=0x00000140)
Oct 29 10:02:08 t41 kernel: [drm] radeon: cp idle (0x02000000)
Oct 29 10:02:08 t41 kernel: [drm] radeon: ring at 0x00000000D0000000
Oct 29 10:02:08 t41 kernel: [drm:r100_ring_test] *ERROR* radeon: ring test failed (sracth(0x15E4)=0xCAFEDEAD)
Oct 29 10:02:08 t41 kernel: [drm:r100_cp_init] *ERROR* radeon: cp isn't working (-22).
Oct 29 10:02:08 t41 kernel: radeon 0000:01:00.0: failled initializing CP (-22).
Oct 29 10:02:08 t41 kernel: [drm] LVDS-13: set mode 1400x1050 1e

After which I get a continuous stream of these errors:
[drm:radeon_cs_ioctl] *ERROR* Faild to schedule IB !
[drm:radeon_ib_schedule] *ERROR* radeon: couldn't schedule IB(7).

My display is garbled, but I can switch to a VT. Would it help if I collected the debug data as well?

kernel-2.6.31.5-97.fc12.i686
xorg-x11-drv-ati-6.13.0-0.10.20091006git457646d73.fc12.i686
xorg-x11-server-Xorg-1.7.1-1.fc12.i686

Comment 33 Robert de Rooy 2009-10-29 09:35:23 UTC
Not sure if it helps, but after installing the -104 kernel from Koji, IB(7) changed to IB(11).

Comment 34 Jérôme Glisse 2009-10-29 14:04:53 UTC
Robert, I am confident your issue is different. Please open a new bug with the following title:
RADEON:RV250:KMS Suspend/Resume fails (ThinkPad T41)

Attach the full output of lspci -v and the full dmesg after resume. Thanks.

Comment 35 Jérôme Glisse 2009-10-29 15:47:15 UTC
Created attachment 366649
Stop mc at suspend and reset it at resume

Please try the attached patch; it applies on top of the latest drm-next branch of Dave's repo.

Comment 36 Christophe Saout 2009-11-02 12:25:38 UTC
I'm seeing the same issue on my T60 with the X1400 as in comment #5. Everything works until I suspend/resume, after which the whole screen is garbled and blinking, but otherwise responsive. This is with 2.6.32-rc5-git4. Unfortunately the drm-next branch didn't work (unclear why; it produced a hard lockup), so I stuck with drm-linus, which seemed to contain a few safe bug fixes.

After applying your proposed workaround patch, things are still not working, and the error message you added is triggered: "[drm] (rv370_pcie_gart_set_page 78) VRAM seems to not work properly !". It seems to get emitted every time I switch between a VT and the X server (neither of which puts the graphics card into a sane state), surrounded by tons of "couldn't schedule IB(15)" messages.

Comment 37 Jérôme Glisse 2009-11-02 13:58:07 UTC
Peng, Christophe, can you try to build:
http://people.freedesktop.org/~glisse/radeonvram.tar.bz2

You will need libpciaccess-dev (if I recall the name correctly). Then boot with KMS enabled in init 3 (add 3 to the kernel boot command line). Suspend/resume, and on resume, when you get the garbled screen, run the radeondump program (which is in radeonvram.tar.bz2) as root and report its output. You will likely need to do all this through ssh from another computer. Thanks.

Comment 38 Christophe Saout 2009-11-02 14:23:50 UTC
Before suspending:

Found card 1002:7145 RV515
  region: (base: 0x0000000000000000, bus: 0x00000000D8000000, size: 134217728, is_io: 0)
  region: (base: 0x0000000000000000, bus: 0x0000000000002000, size: 256, is_io: 1)
  region: (base: 0x0000000000000000, bus: 0x00000000EE100000, size: 65536, is_io: 0)
  region: (base: 0x0000000000000000, bus: 0x0000000000000000, size: 0, is_io: 0)
  region: (base: 0x0000000000000000, bus: 0x0000000000000000, size: 0, is_io: 0)
  region: (base: 0x0000000000000000, bus: 0x0000000000000000, size: 0, is_io: 0)
BUS_CNTL:       0x00000001
CONFIG_CNTL:    0x00020100
CONFIG_MEMSIZE: 0x08000000
COMMAND|STATUS: 0x00100107
vram_test_hdp succeed

After resuming:

Found card 1002:7145 RV515
  region: (base: 0x0000000000000000, bus: 0x00000000D8000000, size: 134217728, is_io: 0)
  region: (base: 0x0000000000000000, bus: 0x0000000000002000, size: 256, is_io: 1)
  region: (base: 0x0000000000000000, bus: 0x00000000EE100000, size: 65536, is_io: 0)
  region: (base: 0x0000000000000000, bus: 0x0000000000000000, size: 0, is_io: 0)
  region: (base: 0x0000000000000000, bus: 0x0000000000000000, size: 0, is_io: 0)
  region: (base: 0x0000000000000000, bus: 0x0000000000000000, size: 0, is_io: 0)
BUS_CNTL:       0x00000001
CONFIG_CNTL:    0x00020000
CONFIG_MEMSIZE: 0x08000000
COMMAND|STATUS: 0x20100107
vram_test_hdp failed
vram_test_gpu succeed

Comment 39 Christophe Saout 2009-11-02 14:26:58 UTC
Sorry, I cut off the last line, "vram_test_gpu succeed", from the first output before suspending. This is, BTW, with the last patch you posted (where you attempt to reset the part that supposedly broke).

Comment 40 Jérôme Glisse 2009-11-02 15:06:46 UTC
*** Bug 522253 has been marked as a duplicate of this bug. ***

Comment 41 Christophe Saout 2009-11-02 18:57:39 UTC
An interesting observation from just now:

I put my laptop into suspend out of habit, and when I resumed it about an hour later (as opposed to my tests, where I always suspended it for only about 5 seconds), it surprisingly came back correctly.

It shows:

BUS_CNTL:       0x00000001
CONFIG_CNTL:    0x00020000
CONFIG_MEMSIZE: 0x08000000
COMMAND|STATUS: 0x00100107
vram_test_hdp succeed
vram_test_gpu succeed

While CONFIG_CNTL is 0x00020000, as in the non-working case after resuming, COMMAND|STATUS has 0x00 instead of 0x20 in its upper byte, just like before suspending.
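The COMMAND|STATUS values above can be decoded against the standard PCI configuration header. This sketch assumes, based only on the label, that radeondump prints the 32-bit dword at config offset 0x04, so the low 16 bits are the Command register and the high 16 bits are the Status register. Under that assumption, the 0x20 in the upper byte after a failed resume is Status bit 13, Received Master Abort, which would be consistent with the MAbort+ that lspci shows for the GPU in comment 56. Only single-bit flags are listed; the two-bit DEVSEL timing field is ignored.

```python
# PCI Command register (config offset 0x04): single-bit flags of interest.
COMMAND_BITS = {
    0: "I/O space",
    1: "memory space",
    2: "bus master",
    8: "SERR# enable",
    10: "interrupt disable",
}

# PCI Status register (config offset 0x06): single-bit flags.
STATUS_BITS = {
    3: "interrupt status",
    4: "capabilities list",
    5: "66 MHz capable",
    7: "fast back-to-back capable",
    8: "master data parity error",
    11: "signaled target abort",
    12: "received target abort",
    13: "received master abort",
    14: "signaled system error",
    15: "detected parity error",
}


def set_flags(value, table):
    """Return the names of the bits from `table` that are set in `value`."""
    return [name for bit, name in sorted(table.items()) if (value >> bit) & 1]


def decode_command_status(dword):
    """Split a dword read at config offset 0x04 into Command and Status flags."""
    command = dword & 0xFFFF
    status = (dword >> 16) & 0xFFFF
    return set_flags(command, COMMAND_BITS), set_flags(status, STATUS_BITS)


def status_changes(before, after):
    """Status flags set in `after` that were not set in `before`."""
    _, old = decode_command_status(before)
    _, new = decode_command_status(after)
    return [flag for flag in new if flag not in old]


# Values printed by radeondump before suspend and after a failed resume.
print(decode_command_status(0x00100107))
print(status_changes(0x00100107, 0x20100107))
```

With the values from comment 38, the only new Status flag is "received master abort"; for the good resume in this comment (0x00100107 again), the list of changes is empty.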

Comment 42 Jérôme Glisse 2009-11-03 10:33:49 UTC
A quick comment: it seems VRAM is working properly, but we can't access it through the HDP (PCI aperture). I will write a patch to reset the HDP after resume to see if it helps. (BTW, this is kind of good news, as it means VRAM is likely being properly restored by AtomBIOS, which was my feeling.)

Comment 43 Jérôme Glisse 2009-11-04 12:43:35 UTC
Created attachment 367460
Reset HostDataPath at resume

Please test this patch, which applies on top of the drm-next branch of Dave's repo:
git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6.git

I can generate an RPM if you want, but this will take me some time.

Comment 44 Christophe Saout 2009-11-04 14:53:44 UTC
I'm afraid this patch doesn't make any difference. If the power is plugged in, it always comes back in a broken state (and if it is not, chances are good that it comes back fine, as before). So I guess it must be something else. If I had the slightest idea how modern graphics hardware works, I could try to help you figure things out, but unfortunately I don't...

Comment 45 Matěj Cepl 2009-11-05 17:17:33 UTC
Since this bugzilla report was filed, there have been several major updates in various components of the Xorg system, which may have resolved this issue. Users who have experienced this problem are encouraged to upgrade their system to the latest version of their packages (at least F12 Beta, but preferably the very latest versions).

If you still experience this problem on an up-to-date system, please let us know in a comment on this bug, or tell us whether the upgraded system works for you.

If you are not able to reply within one month, I will have to close this bug as INSUFFICIENT_DATA. Thank you.

[This is a bulk message for all open Fedora Rawhide Xorg-related bugs. I'm adding myself to the CC list for each bug, so I'll see any comments you make after this and do my best to make sure every issue gets proper attention.]

Comment 46 David Campbell 2009-11-05 21:55:55 UTC
This problem still happens in F12 beta on a Clevo D870P with Mobility Radeon 9700.

Comment 47 Jérôme Glisse 2009-11-06 08:35:36 UTC
David, you more than likely have a different issue; this one is specific to the X1400 on the T60. Please open a new bug with dmesg, Xorg.log and a description of what you are seeing on resume. Also, if it's an AGP GPU, try booting with radeon.agpmode=-1 and report in the bug whether that works. Thanks.

Comment 48 Jérôme Glisse 2009-11-06 09:25:10 UTC
Created attachment 367798
X1400 restore mc+hdp before asic_init and put vram at 0x10000000

Please try this patch (on top of drm-next again). Hope it works.

Comment 49 Peng Huang 2009-11-09 07:37:12 UTC
The problem persists with the last two patches.
BTW, I am on a trip this week, so I cannot get more logs.

Comment 50 Jérôme Glisse 2009-11-09 14:55:47 UTC
Created attachment 368232
X1400 restore mc+hdp before asic_init and put vram at 0x10000000 + VGA HDP

Please test the new patch. In this version I program the VGA HDP, in case some VGA activity happens at some point. Fingers crossed, but I don't think this one will help much.

Comment 51 Ferry Huberts 2009-11-09 16:37:37 UTC
OK, a step back :-(
With today's updates my laptop doesn't even boot with KMS anymore. I get a hard hang during modeset. Booting with nomodeset allows the laptop to boot.

kernel.x86_64                 2.6.31.5-127.fc12
xorg-x11-drv-ati.x86_64       6.13.0-0.10.20091006git457646d73.fc12
xorg-x11-server-Xorg.x86_64   1.7.1-7.fc12
mesa-dri-drivers.x86_64       7.6-0.13.fc12

Comment 52 Christophe Saout 2009-11-09 16:43:06 UTC
Yes, I saw this too with drm-next at some point; since then I have stuck with drm-linus. Anyhow, no luck with the latest patch either. :(

Can any of you confirm the strange effect that 90% of the time everything comes back as it should if the notebook is unplugged when resuming?

Comment 53 Peng Huang 2009-11-09 23:40:54 UTC
(In reply to comment #50)
> Created an attachment (id=368232)
> X1400 restore mc+hdp before asic_init and put vram at 0x10000000 + VGA HDP
> 
> Please test new patch. In this version i program the VGA HDP, maybe some VGA
> stuff happens at one point. Crossing fingers, but i don't think this one will
> help much.  

Hi Jerome,
Which version is the patch based on? I cannot apply it successfully to drm-next of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6.git.

Comment 54 Jérôme Glisse 2009-11-10 15:04:57 UTC
Created attachment 368412
Shutdown lvds, force mc to be on, dump regs

Please try the new patch; it should apply cleanly on top of drm-next. This one takes a different approach: I try to shut things down at suspend and reactivate them at resume. It also dumps a few registers which might be helpful to further debug the issue. Please try it and attach the full dmesg after a suspend/resume cycle. Thanks.

Comment 55 Peng Huang 2009-11-10 22:45:20 UTC
Created attachment 368963
dmesg output

The problem and the NMI still happen with the last patch.

Comment 56 Christophe Saout 2009-11-11 10:40:12 UTC
I see the same effect. Also, nothing useful in the debug output.

I was wondering what could trigger the NMI. I read somewhere (not a very reliable citation, I know; it was in a forum) that the source doesn't need to be the device itself; it might also be the bus. I looked at the lspci output and also checked the PCIe bridge. It is some sort of memory access from the CPU through a PCIe "aperture" that isn't working, right? Could the bridge be at fault?

Here are the diffs for the bridge (00:01.0) and the GPU (01:00.0) between a successful resume (with power unplugged) and a failed resume:

Here is the full output of lspci -vvv 00:01.0 in the working case:

00:01.0 PCI bridge: Intel Corporation Mobile 945GM/PM/GMS, 943/940GML and 945GT Express PCI Express Root Port (rev 03) (prog-if 00 [Normal decode])
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
	I/O behind bridge: 00002000-00002fff
	Memory behind bridge: ee100000-ee1fffff
	Prefetchable memory behind bridge: 00000000d8000000-00000000dfffffff
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
	BridgeCtl: Parity- SERR- NoISA+ VGA+ MAbort- >Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: [88] Subsystem: Lenovo Device 2014
	Capabilities: [80] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [90] MSI: Enable- Count=1/1 Maskable- 64bit-
		Address: 00000000  Data: 0000
	Capabilities: [a0] Express (v1) Root Port (Slot+), MSI 00
		DevCap:	MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
			ExtTag- RBE- FLReset-
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
			MaxPayload 128 bytes, MaxReadReq 128 bytes
		DevSta:	CorrErr+ UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
		LnkCap:	Port #2, Speed 2.5GT/s, Width x16, ASPM L0s L1, Latency L0 <1us, L1 <4us
			ClockPM- Surprise- LLActRep- BwNot-
		LnkCtl:	ASPM L0s L1 Enabled; RCB 64 bytes Disabled- Retrain- CommClk-
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		SltCap:	AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surpise-
			Slot #  1, PowerLimit 75.000000; Interlock- NoCompl-
		SltCtl:	Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
			Control: AttnInd Off, PwrInd On, Power- Interlock-
		SltSta:	Status: AttnBtn- PowerFlt- MRL- CmdCplt+ PresDet+ Interlock-
			Changed: MRL- PresDet- LinkState-
		RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible-
		RootCap: CRSVisible-
		RootSta: PME ReqID 0000, PMEStatus- PMEPending-
	Capabilities: [100] Virtual Channel <?>
	Capabilities: [140] Root Complex Link <?>
	Kernel driver in use: pcieport

And the diff against the output after a failed resume:

@@ -1,6 +1,6 @@
 00:01.0 PCI bridge: Intel Corporation Mobile 945GM/PM/GMS, 943/940GML and 945GT Express PCI Express Root Port (rev 03) (prog-if 00 [Normal decode])
 	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
-	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
+	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR+ <PERR- INTx-
 	Latency: 0, Cache Line Size: 64 bytes
 	Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
 	I/O behind bridge: 00002000-00002fff
@@ -21,7 +21,7 @@
 		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
 			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
 			MaxPayload 128 bytes, MaxReadReq 128 bytes
-		DevSta:	CorrErr+ UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
+		DevSta:	CorrErr+ UncorrErr+ FatalErr- UnsuppReq+ AuxPwr- TransPend-
 		LnkCap:	Port #2, Speed 2.5GT/s, Width x16, ASPM L0s L1, Latency L0 <1us, L1 <4us
 			ClockPM- Surprise- LLActRep- BwNot-
 		LnkCtl:	ASPM L0s L1 Enabled; RCB 64 bytes Disabled- Retrain- CommClk-

I am just wondering about this: "UncorrErr" and "UnsuppReq" went to +. Also, for the GPU, MAbort went to + as well.
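Comparing two lspci -vvv dumps by eye is error-prone; a small script can list every flag that flipped from - to +. This is a sketch that assumes lspci's human-readable format (boolean fields printed as Name+ or Name-, optionally prefixed with < or >), which is not a stable machine interface, and that both dumps come from the same device so their lines align one-to-one. The sample input is the two changed lines from the diff above.

```python
import re

# A boolean lspci field: optional '<' or '>' prefix, a name, then '+' or '-'.
FLAG_RE = re.compile(r"([<>]?[\w-]+?)([+-])")


def flags(line):
    """Map each 'Name+' / 'Name-' token on one line to True / False."""
    out = {}
    for token in line.split():
        m = FLAG_RE.fullmatch(token)
        if m:
            out[m.group(1)] = m.group(2) == "+"
    return out


def flags_turned_on(before, after):
    """(line label, flag) pairs that read '-' before and '+' after.

    Assumes both dumps come from the same device, so their lines align.
    """
    changed = []
    for old_line, new_line in zip(before.splitlines(), after.splitlines()):
        old, new = flags(old_line), flags(new_line)
        label = new_line.strip().split(":", 1)[0]
        for name, is_set in new.items():
            if is_set and old.get(name) is False:
                changed.append((label, name))
    return changed


BEFORE = "\n".join([
    "\tStatus: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-",
    "\t\tDevSta:\tCorrErr+ UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-",
])
AFTER = "\n".join([
    "\tStatus: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR+ <PERR- INTx-",
    "\t\tDevSta:\tCorrErr+ UncorrErr+ FatalErr- UnsuppReq+ AuxPwr- TransPend-",
])

for label, name in flags_turned_on(BEFORE, AFTER):
    print(f"{label}: {name} went to +")
```

To compare complete dumps, capture `lspci -vvv -s 00:01.0` (as root, so the capabilities are shown) once after a good resume and once after a bad one, and pass the two texts to flags_turned_on; the same works for the GPU at 01:00.0.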

Comment 57 Dave Airlie 2009-11-12 06:01:23 UTC
Two patches here:

http://people.freedesktop.org/~airlied/scratch/0001-drm-radeon-kms-fix-handling-of-d1-d2-vga.patch
http://people.freedesktop.org/~airlied/scratch/0002-drm-radeon-kms-read-back-register-before-writing-in-.patch

Can you guys please try them? I'll try to make a Fedora kernel with them ASAP; they are also on the drm-radeon-testing branch of my drm-2.6 tree.

I've tested them on Peng's laptop with my USB disk; hopefully they also work when he tests them with his normal install.

Comment 58 Christophe Saout 2009-11-12 10:17:35 UTC
I almost feel bad telling you this, but I am now running 2.6.32-rc6 + drm-radeon-testing, and I made sure these two patches are in it, yet it still gives the same results as before. If the notebook is unplugged on resume, there's a 90% chance of it coming back correctly; otherwise not.

Comment 59 Christophe Saout 2009-11-12 11:04:46 UTC
To avoid any confusion: by "unplugged" I am referring to the power, not an external monitor (as in the description of patch no. 1).

Comment 60 Christophe Saout 2009-11-14 13:37:37 UTC
OMG, I'm so stupid. I was actually booting the wrong kernel when doing my last test (I made sure the modules contained the patch, but I never checked whether I was actually loading it).

I can confirm that this patch fixes the issue for me. Great. :-)

Sorry for the confusion; I hope I didn't cause any additional work.

Comment 61 Mark Gunstrom 2009-11-14 17:13:31 UTC
Dave's patches also fixed the suspend/resume problem on a KMS-enabled T60/x1400 for me.

Comment 62 Peng Huang 2009-11-16 05:25:58 UTC
The patch fixes this S/R problem. Thanks.

Comment 63 Bug Zapper 2009-11-16 13:24:00 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 12 development cycle.
Changing version to '12'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 64 Matěj Cepl 2009-11-16 15:33:08 UTC
Thank you for letting us know.

Comment 65 Tim Niemueller 2009-11-16 17:17:18 UTC
Will this update be available for F-12 (closed as rawhide)?