65384 – Radeon Mobility M6 and IBM Thinkpad X22 lockup on apm resume

Summary: Radeon Mobility M6 and IBM Thinkpad X22 lockup on apm resume

Keywords:
Status:	CLOSED CURRENTRELEASE
Alias:	None
Product:	Red Hat Linux
Classification:	Retired
Component:	XFree86
Sub Component:
Version:	7.3
Hardware:	i686
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	---
Assignee:	Mike A. Harris
QA Contact:	David Lawrence
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	62067
Depends On:
Blocks:

Reported:	2002-05-22 22:24 UTC by Robert Spier
Modified:	2007-04-18 16:42 UTC
CC List:	5 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2005-04-20 14:49:28 UTC
Embargoed:

Attachments
To help trace the problem, here is the lspci output for my X22 Laptop. (8.16 KB, text/plain) 2002-05-28 07:41 UTC, Need Real Name

Description Robert Spier 2002-05-22 22:24:50 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0rc3) Gecko/20020519

Description of problem:
After loading the radeon and agpgart modules and starting X, I can not resume
from an apm suspend.  It starts - and then hangs attempting to resume. 
(Somewhere in the BIOS resume function.)

There are no problems (except for speed) if the standard 'vesa' XFree86 driver
is used.  (But it would be really really nice to have the accelerated driver.)

This may be related to bug #37694.


Additional comment by sahai 2002-05-16 14:28:45:
I can confirm the problem on the IBM Thinkpad X22 under RH7.3 (both 2.4.18-3 and
2.4.18-4). The hardware is an ATI Radeon Mobility M6 LY and the problem is
related in some way to the use of the radeon and agpgart modules and using X.

If I start X using a configuration that doesn't have 3d acceleration, then the
radeon and agpgart modules are not loaded and there is no trouble with suspend
and resume. It works regardless of whether I suspend while in a VT or in X. Just
"modprobe agpgart; modprobe radeon" doesn't make the system un-suspendable either. 

But if I run X with 3d enabled, then the suspend apparently succeeds, but the
subsequent resume hangs forever with a flashing crescent light on the laptop.
Exiting X before trying to suspend does not help and neither does exiting X AND
"rmmod radeon; rmmod agpgart" The modules unload but the system will still hang
on suspend-and-resume. Weirdly enough, this happens even if I start X up again
after removing the modules without 3d enabled. 

It is as though there is something about starting X in 3d mode on the Radeon
that puts the system in a state from which it does not want to return.

Comment 1 Robert Spier 2002-05-23 03:39:03 UTC

I played around with this a little more tonight.

If I disable agpgart with 'alias agpgart off' in /etc/modules.conf, then we get
some interesting results.  
  1- the first time I run startx, the kernel oopses, as below.
  2- The text console is horked, but I can still type, and if I run startx
again, X starts fine, albeit unaccelerated.  I can suspend and resume just fine.  

  Of course, during my playing, I was also able to get the machine to
spontaneously reboot on starting the X server, with various combinations of
installing agpgart, removing it, and various versions of radeon.o.  (Not that
any of that is useful, but something is definitely odd.)


[drm:radeon_do_init_cp] *ERROR* PCI GART not yet supported for Radeon!
Unable to handle kernel NULL pointer dereference at virtual address 0000001c
 printing eip:
d88d4fff
*pde = 17934067
*pte = 00000000
Oops: 0000
radeon ds yenta_socket pcmcia_core eepro100 ipchains usb-uhci usbcore ext3 jbd
CPU:    0
EIP:    0010:[<d88d4fff>]    Not tainted
EFLAGS: 00013246

EIP is at radeon_do_cp_idle [radeon] 0x1f (2.4.18-3)
eax: 00000000   ebx: 00000000   ecx: 00000000   edx: 00000001
esi: 00000000   edi: 00000001   ebp: 00000000   esp: d4b5ff44
ds: 0018   es: 0018   ss: 0018
Process X (pid: 1112, stackpage=d4b5f000)
Stack: d4b5ff58 d88d5f85 00000000 00000000 d5297800 00000001 00000001 d5297800 
       d52ad3a0 bffff940 40086442 d88d0e14 d42a65e0 d52ad3a0 40086442 bffff940 
       40086442 ffffffe7 bffff940 d52ad3a0 c0146547 d42a65e0 d52ad3a0 40086442 
Call Trace: [<d88d5f85>] radeon_cp_stop [radeon] 0xf5 
[<d88d0e14>] radeon_ioctl [radeon] 0xe4 
[<c0146547>] sys_ioctl [kernel] 0x217 
[<c0108923>] system_call [kernel] 0x33 


Code: 8b 43 1c 83 f8 18 77 19 6a 18 53 e8 01 15 00 00 59 58 8b 43
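
The oops above can be cross-checked by hand. The following is a minimal Python sketch, not a general disassembler: it decodes only the first instruction of the "Code:" line, assuming 32-bit x86 and the single encoding seen here (opcode 0x8b with a register base plus 8-bit displacement). The helper name decode_mov_disp8 is made up for illustration.

```python
# Decode the first instruction of the oops "Code:" bytes (32-bit x86).
# Only one form is handled: opcode 0x8b (MOV r32, r/m32) whose ModRM byte
# has mod=01, i.e. a [register + 8-bit displacement] memory operand.

REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_mov_disp8(code):
    """Return (destination register, base register, displacement)."""
    opcode, modrm, disp = code[0], code[1], code[2]
    if opcode != 0x8B:
        raise ValueError("not MOV r32, r/m32")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 1 or rm == 4:
        raise ValueError("only [reg + disp8] without a SIB byte is handled")
    if disp >= 0x80:  # the 8-bit displacement is sign-extended
        disp -= 0x100
    return REG32[reg], REG32[rm], disp


# First bytes of the "Code:" line from the oops.
code = bytes.fromhex("8b 43 1c 83 f8 18 77 19".replace(" ", ""))

# Register value taken from the oops register dump (ebx: 00000000).
registers = {"ebx": 0x00000000}

dest, base, disp = decode_mov_disp8(code)
fault_address = (registers[base] + disp) & 0xFFFFFFFF

print(f"mov {dest}, [{base}+{disp:#x}]")
print(f"faulting address: {fault_address:#010x}")
```

Under those assumptions the instruction reads a field at offset 0x1c through ebx, and ebx is zero in the register dump, which matches the reported NULL pointer dereference at virtual address 0000001c in radeon_do_cp_idle.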

Comment 2 Arjan van de Ven 2002-05-27 14:42:26 UTC

Ok interesting; my laptop also has an M6 and resume has never failed for me.
(but it's no IBM)

Comment 3 Need Real Name 2002-05-28 07:41:18 UTC

Created attachment 58724
To help trace the problem, here is the lspci output for my X22 Laptop.

Comment 4 Derrien 2002-05-28 20:15:43 UTC

We have the same problem with a Compaq N600c (ATI Technologies Inc Radeon
Mobility M6 LY) when we are switching from console to console:

startx
CTRL+ALT+F1
CTRL+ALT+F7

We get back to a corrupted X-windows screen and locked keyboard (the
pointer moves but doesn't do anything.)

If you can remote login on the machine you see :

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
 1287 root      24  -1 60424  16M  3004 R <  99.4  3.2   9:17 X

Comment 5 Robert Spier 2002-05-29 05:38:57 UTC

My lspci configuration is almost identical to sahai's, except for a different
location for the IDE controller's memory.

--- /tmp/x22.lspci.txt  Tue May 28 20:42:49 2002
+++ /tmp/robert.lspci.txt       Tue May 28 22:35:03 2002
@@ -69,7 +69,7 @@
        Region 2: I/O ports at 0170 [size=8]
        Region 3: I/O ports at 0374
        Region 4: I/O ports at 1860 [size=16]
-       Region 5: Memory at 28000000 (32-bit, non-prefetchable) [size=1K]
+       Region 5: Memory at 18000000 (32-bit, non-prefetchable) [size=1K]
 
 00:1f.3 SMBus: Intel Corp. 82801CA/CAM SMBus (rev 01)
        Subsystem: IBM: Unknown device 0220

Comment 6 Mike A. Harris 2002-06-18 18:33:42 UTC

I wonder if busmastering gets disabled upon APM resume.  This has happened
with other hardware in the past.  If the laptop has busmastering enabled
for video and suspends, then comes back with it disabled, the machine most
likely will hang.

Comment 7 Arjan van de Ven 2002-06-18 18:35:35 UTC

The kernel ought to detect that and printk something... not that you can see it in X.

Passing resume=force on the kernel command line will force the busmaster bit back on.

Comment 8 Robert Spier 2002-06-19 02:19:01 UTC

Updated to 2.4.18-4
Added resume=force to kernel command line.
Updated to latest BIOS.

Had conversation with mharris:

<mharris> Rbrt: It depends on what the real problem turns out to be.  If it is
indeed bus mastering, it is a BIOS flaw on your machine.
<mharris> In that case you may need to switch to a VT before suspending, and
after resuming, run a shell script which enables busmastering again, then switch
back to X.

<mharris> setpci <somereallyobscureoptions>
<mharris> setpci -s 1:0.0 4.L=0187
<mharris> Something like that is supposed to do it.  Replace 1:0.0 with the bus
ID of your video card.

The lspci -s 1:0.0 -x output has 0x87 at offset 4 already, so I believe bus
mastering is on.
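
That reading of the register can be checked with a short Python sketch. The bit names follow the standard PCI command register layout (bus master is bit 2). The sample lspci -x line is illustrative only and is not copied from the attachment; the helper names are made up for this example.

```python
# Decode a PCI command register value (the 16-bit register at config
# offset 4) and report whether the bus master bit is set.

PCI_COMMAND_BITS = {
    0: "I/O space",
    1: "memory space",
    2: "bus master",
    3: "special cycles",
    4: "memory write and invalidate",
    5: "VGA palette snoop",
    6: "parity error response",
    7: "stepping / wait cycle control",
    8: "SERR# enable",
    9: "fast back-to-back",
    10: "interrupt disable",
}


def decode_command(value):
    """Return the names of the bits set in a command register value."""
    return [name for bit, name in sorted(PCI_COMMAND_BITS.items())
            if value & (1 << bit)]


def bus_master_enabled(value):
    """True if bit 2 (bus master) is set."""
    return bool(value & 0x04)


def command_from_lspci_x(line):
    """Extract the command register from the first row of `lspci -x` output."""
    offset, _, rest = line.partition(":")
    if int(offset, 16) != 0:
        raise ValueError("expected the row that starts at offset 00")
    data = [int(byte, 16) for byte in rest.split()]
    return data[4] | (data[5] << 8)  # little-endian 16-bit value at offset 4


# Hypothetical first row of `lspci -s 1:0.0 -x` output (illustrative only).
SAMPLE_LINE = "00: 02 10 59 4c 87 01 b0 02 00 00 00 03 08 42 00 00"

command = command_from_lspci_x(SAMPLE_LINE)
print(f"command = {command:#06x}: {', '.join(decode_command(command))}")
print("bus master enabled:", bus_master_enabled(command))
```

With a low byte of 0x87 at offset 4, bit 2 is set, so bus mastering is already enabled regardless of what the high byte holds.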

But - this doesn't help with the problem.

Even if I switch to a text VT before suspending, the machine never comes back
up.  It just sits in "trying to resume" mode, with the suspend light blinking.

Should mharris be added as a CC to this ticket?
Do you (RH) have an IBM contact?

Comment 9 Robert Spier 2002-06-19 06:47:06 UTC

Suspend/Resume works fine if I do not enable DRI.  (i.e. comment out Load "dri"
in XF86Config-4)
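
For anyone who wants to script this workaround, here is a small Python sketch that comments out the Load "dri" line in a configuration text. It assumes the usual XF86Config syntax, where # starts a comment. The sample Module section and the function name disable_dri are hypothetical, and the sketch works on a string rather than editing /etc/X11/XF86Config-4 in place.

```python
import re

# Matches an uncommented `Load "dri"` line, keeping its indentation.
DRI_LOAD = re.compile(r'^([ \t]*)(Load[ \t]+"dri")', re.MULTILINE | re.IGNORECASE)


def disable_dri(config_text):
    """Return config_text with any active `Load "dri"` line commented out."""
    return DRI_LOAD.sub(r"\1# \2", config_text)


# Hypothetical Module section, for illustration only.
SAMPLE = '''Section "Module"
    Load "dbe"
    Load "dri"
    Load "glx"
EndSection
'''

print(disable_dri(SAMPLE))
```

Lines that are already commented out are left alone, so running it twice gives the same result as running it once.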

Comment 10 Mike A. Harris 2002-06-19 13:10:11 UTC

Another user has reported this also, and it seems his AGP chipset is unsupported
so agpgart won't load.  The Radeon driver then tries to use pcigart and fails.

pcigart support is enabled in X, but "unsupported".  The idea being, if it
works, great; if not, no harm done.  Apparently our kernel may have missed
getting the kernel side of this support though, so it might not work anyway.

Comment 11 Kostas Georgiou 2002-06-19 13:29:01 UTC

The unknown AGP chipset is an Ali one (device id: 1671)
options agpgart agp_try_unsupported=1 fixes the startup crash
but X still get stuck in a loop somewhere (99% cpu) after a switch to
console and back.
The laptop is an hp xt6200 btw.

Comment 12 Robert Spier 2002-06-19 17:39:17 UTC

The ix86 kernels don't have PCIGART enabled for the RADEON driver. 
PCIGART_ENABLED is undef except on alphas.  (drivers/char/drm/radeon_cp.c)

The thinkpads have i830 flavor AGP GARTs, which is mostly supported (iirc.)

Comment 13 Robert Spier 2002-06-21 21:05:56 UTC

Related to or possible duplicate of bug 62067.

Comment 14 Mike A. Harris 2002-07-20 13:26:23 UTC

*** Bug 62067 has been marked as a duplicate of this bug. ***

Comment 15 Need Real Name 2003-09-06 08:56:16 UTC

Just wanted to report that this bug is still here in the latest redhat 9. I am
using kernel-2.4.20-20.9 and the XFree86 that comes with RH9 with my IBM X22.

The same workaround, disabling 3d acceleration, continues to avoid the lockups
on suspend.

Comment 16 Mike A. Harris 2003-09-06 10:13:45 UTC

Our kernels do have PCIGART enabled for Radeon, unless someone disabled it
without letting me know about it.  I've CC'd some of our kernel guys for
them to comment on it.  PCIGART on radeon should be enabled in Red Hat Linux 9
and later I believe.  Keep in mind this means "enabled" and not "supported":
it is provided as-is.  If it happens to work for someone, that's great, but if
it does not work, we don't consider it a bug.  However, if someone debugs the
problems they have, solves them, and submits a patch for review, we might apply
their patch to a future kernel build if it doesn't risk any regression.

Back to this particular bug/issue though..  This problem is quite possibly
something that Charl Botha's DRI-resume patches would resolve; as I understand
it they are a workaround for some broken BIOSes out there.  The dri-resume
patches are both XFree86 and kernel intrusive however, and there is no
intention of applying them to our XFree86 4.3.0 or kernel.  XFree86 4.4.0,
once released, will support Charl's dri-resume patch however, and so this
problem will likely be resolved automatically in a future Red Hat Linux
release when 4.4.0 gets integrated.

Deferring until 4.4.0 is released, or developmental builds are available in
our rawhide tree for future OS development.

Comment 17 Mike A. Harris 2005-04-20 14:49:28 UTC

Since this bugzilla report was filed, there have been several major
updates to the X Window System, which may resolve this issue.  Users
who have experienced this problem are encouraged to upgrade to the
latest version of Fedora Core, which can be obtained from:

        http://fedora.redhat.com/download

If this issue turns out to still be reproducible in the latest
version of Fedora Core, please file a bug report in the X.Org
bugzilla located at http://bugs.freedesktop.org in the "xorg"
component.

Once you've filed your bug report to X.Org, if you paste the new
bug URL here, Red Hat will continue to track the issue in the
centralized X.Org bug tracker, and will review any bug fixes that
become available for consideration in future updates.

Setting status to "CURRENTRELEASE".
