194105 – xorg-x11 radeon driver has troubles with setting AGP and locks up

Keywords:
Status:	CLOSED RAWHIDE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	kernel
Sub Component:
Version:	rawhide
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Dave Jones
QA Contact:	Brian Brock
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2006-06-05 20:09 UTC by Michal Jaegermann
Modified:	2015-01-04 22:27 UTC
CC List:	5 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2006-10-12 00:14:45 UTC
Type:	---
Embargoed:
Dependent Products:

Description Michal Jaegermann 2006-06-05 20:09:38 UTC

Description of problem:

I happen to have an ATI Technologies Inc R300 AD [Radeon 9500 Pro]
card in my test machine.  It works well and reliably as long
as I put

        Option      "AGPMode" "8"

in the "Device" section of my /etc/X11/xorg.conf.  With this line
omitted, an attempt to start X immediately leaves the machine with a
blank screen and a locked keyboard, and even the SysRq key stops working
very quickly, although initially it is possible to use it a bit "blind".

Checking logs afterwards one can find there:

[drm] Initialized drm 1.0.1 20051102
ACPI: PCI Interrupt 0000:01:00.0[A] -> GSI 16 (level, low) -> IRQ 185
[drm] Initialized radeon 1.24.0 20060225 on minor 0
agpgart: Found an AGP 3.5 compliant device at 0000:00:00.0.
agpgart: Badness. Don't know which AGP mode to set. [bridge_agpstat:1f000a0a
vga_agpstat:ff00021b fell back to:- bridge_agpstat:1f000208 vga_agpstat:ff00021b]
agpgart: Bridge couldn't do AGP x4.
agpgart: Putting AGP V3 device at 0000:00:00.0 into 0x mode
agpgart: Putting AGP V3 device at 0000:01:00.0 into 0x mode
[drm] Setting GART location based on new memory map
[drm] Loading R300 Microcode
[drm] writeback test failed

and we are ready to pull the plug.

With "AGPMode" explicitly set the picture is different:

[drm] Initialized drm 1.0.1 20051102
ACPI: PCI Interrupt 0000:01:00.0[A] -> GSI 16 (level, low) -> IRQ 201
[drm] Initialized radeon 1.24.0 20060225 on minor 0
agpgart: Found an AGP 3.5 compliant device at 0000:00:00.0.
agpgart: Xorg tried to set rate=x12. Setting to AGP3 x8 mode.
agpgart: Putting AGP V3 device at 0000:00:00.0 into 8x mode
agpgart: Putting AGP V3 device at 0000:01:00.0 into 8x mode
[drm] Setting GART location based on new memory map
[drm] Loading R300 Microcode
[drm] writeback test succeeded in 1 usecs 

and I have a picture, no complaints, no lockups or anything of that sort.

As far as I can tell, from what I managed to collect with SysRq,
with "AGPMode" not given we are sitting here:

Call Trace: <ffffffff885529df>{:radeon:radeon_do_wait_for_idle+113}
       <ffffffff8855304f>{:radeon:radeon_cp_idle+0} <ffffffff88534d74>{:drm:drm_ioctl+371}
       <ffffffff80245975>{do_ioctl+85} <ffffffff8023260f>{vfs_ioctl+598}
       <ffffffff8025047b>{sys_ioctl+89} <ffffffff80261bc1>{tracesys+209}

There does not seem to be anything special in the list of blocking locks:

S          startx: 2521 [ffff81001fe9d0c0, 125] (not blocked on mutex)
S           xinit: 2537 [ffff810018580780, 123] (not blocked on mutex)
R               X: 2538 [ffff810018588080, 124] (not blocked on mutex)

We are simply not going anywhere.

Version-Release number of selected component (if applicable):
xorg-x11-drv-ati-6.6.0-3, but it has really been like this "all the time".

How reproducible:
always - unfortunately

Additional info:
AFAIK I am not the only one with similar observations although possibly
a value for "AGPMode" is not always 8.  I know that 4 did not work in
my case.  12 is also for me a "killer value".

It seems to be a different problem than bug #182196 although I am not
sure.

Comment 1 Mike A. Harris 2006-06-07 07:27:04 UTC

From the above log snippets, it appears this is an agpgart issue and not
an X server issue.  In general you should _not_ use the AgpMode setting
in the X server config file, and instead set the AGP rate in your BIOS
and the X server should "just work" with it.

> my case.  12 is also for me a "killer value".

Because it is invalid.  Valid modes are 1/2/4/8.  You can only use the
modes the hardware was designed for, which is the overlap of what the
video card can do, combined with what the motherboard can do.  In some
cases some modes might not work with certain hardware combinations due
to hardware flaws in the motherboard chipset, video card, or both.
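To illustrate the "overlap" idea with the values from the log above, here is a minimal Python sketch that decodes the two agpstat words quoted in the report. It assumes the usual AGP 3.0 status register layout (bit 3 marks AGP 3.0 mode; in that mode bit 0 means 4x and bit 1 means 8x; otherwise bits 0 to 2 mean 1x, 2x and 4x). The function names are made up for this example and this is not the kernel's code.

```python
# Illustrative decode of the AGP status words quoted in the report.
# Assumed bit layout (AGP 3.0 status register): bit 3 = AGP 3.0 mode;
# in that mode bit 0 = 4x and bit 1 = 8x, otherwise bits 0..2 = 1x/2x/4x.
# supported_rates() and common_rates() are hypothetical helper names.

AGP3_MODE = 1 << 3
AGP3_RATES = {0x1: 4, 0x2: 8}
AGP2_RATES = {0x1: 1, 0x2: 2, 0x4: 4}


def supported_rates(agpstat):
    """Return the set of transfer rates advertised by one status word."""
    table = AGP3_RATES if agpstat & AGP3_MODE else AGP2_RATES
    return {rate for bit, rate in table.items() if agpstat & bit}


def common_rates(bridge_stat, card_stat):
    """Rates that both the bridge and the video card advertise."""
    return supported_rates(bridge_stat) & supported_rates(card_stat)


bridge = 0x1F000A0A  # bridge_agpstat from the log
card = 0xFF00021B    # vga_agpstat from the log

print(sorted(supported_rates(bridge)))     # [8]
print(sorted(supported_rates(card)))       # [4, 8]
print(sorted(common_rates(bridge, card)))  # [8]
```

Under these assumptions the bridge advertises only 8x while the card advertises 4x and 8x, so 8x is the only common rate. That would be consistent with the "Bridge couldn't do AGP x4" message and with 8 being the value that works for the reporter.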

Reassigning this to the kernel for now, although it isn't clear if this
is really a kernel bug or not.  If the kernel folk think it is not a
kernel bug, then you should probably file this directly in Xorg bugzilla.

Comment 2 Ian Kent 2006-06-07 12:38:48 UTC

(In reply to comment #1)
> From the above log snippets, it appears this is an agpgart issue and not
> an X server issue.  In general you should _not_ use the AgpMode setting
> in the X server config file, and instead set the AGP rate in your BIOS
> and the X server should "just work" with it.
> 

That's an interesting statement.

I set the AGPMode to 8 and my card started working as well. At least
until I tried 1.2244_FC6.

I'll check to see if the BIOS setting makes any difference.

Ian

Comment 3 Mike A. Harris 2006-06-07 15:37:54 UTC

(In reply to comment #2)
> (In reply to comment #1)
> > From the above log snippets, it appears this is an agpgart issue and not
> > an X server issue.  In general you should _not_ use the AgpMode setting
> > in the X server config file, and instead set the AGP rate in your BIOS
> > and the X server should "just work" with it.
> > 
> 
> That's an interesting statement.

Yeah, to further clarify...  What I mean by "in general" is that it is
*supposed* to autodetect the AGP capabilities of the hardware, and
automatically set the highest AGP rate that the hardware can do, if it
isn't blacklisted for some reason.  When the hardware isn't broken, and
the kernel and X are working right, this generally happens.  When the
hardware is broken or having a bad day, or if the kernel AGP drivers
aren't up to scratch, then it doesn't always work that way.

Dave Jones can probably provide more accurate, up-to-date info though.

 
> I set the AGPMode to 8 and my card started working as well. At least
> until I tried 1.2244_FC6.

That suggests to me 2 things:

1) The hardware is capable of AGP 8x, but the kernel is not setting that
   by default for whatever reason.  Possibly kernel bug, or just bad
   assumptions or something like that.

2) The hardware combination might not work at AGP 4x due to quirks, or
   perhaps the kernel AGP support has a glitch at 4x, and so 4x hangs.

To be clear though, this is just an educated hypothesis of what is happening,
but not conclusive.

 
> I'll check to see if the BIOS setting makes any difference.

Setting the AGP mode in the BIOS usually makes things work right, however
some systems do not have an AGP mode setting in the CMOS unfortunately.

It is probably a good idea to attach /var/log/messages from a problematic
startup, as well as a working one, for comparison too.
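For comparing a failing and a working startup, a small script can pull out the relevant lines. The following is only a sketch: it assumes the message formats are exactly those quoted earlier in this report, the sample text is abridged from those snippets, and summarize() is a name invented for this example.

```python
import re

# Patterns match the agpgart/drm lines as they are quoted in this report.
MODE_RE = re.compile(r"agpgart: Putting AGP V\d device at (\S+) into (\d+)x mode")
WRITEBACK_RE = re.compile(r"\[drm\] writeback test (failed|succeeded)")


def summarize(log_text):
    """Return the AGP rate set per PCI device and the writeback test result."""
    modes = {}
    writeback = None
    for line in log_text.splitlines():
        mode = MODE_RE.search(line)
        if mode:
            modes[mode.group(1)] = int(mode.group(2))
        result = WRITEBACK_RE.search(line)
        if result:
            writeback = result.group(1)
    return {
        "modes": modes,
        "writeback": writeback,
        "badness": "agpgart: Badness" in log_text,
    }


# Abridged from the log snippets in the original description.
FAILING = """\
agpgart: Found an AGP 3.5 compliant device at 0000:00:00.0.
agpgart: Badness. Don't know which AGP mode to set.
agpgart: Putting AGP V3 device at 0000:00:00.0 into 0x mode
agpgart: Putting AGP V3 device at 0000:01:00.0 into 0x mode
[drm] writeback test failed
"""

WORKING = """\
agpgart: Found an AGP 3.5 compliant device at 0000:00:00.0.
agpgart: Putting AGP V3 device at 0000:00:00.0 into 8x mode
agpgart: Putting AGP V3 device at 0000:01:00.0 into 8x mode
[drm] writeback test succeeded in 1 usecs
"""

print(summarize(FAILING))
print(summarize(WORKING))
```

In practice the text would come from /var/log/messages or dmesg output rather than from embedded strings.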

HTH

Comment 4 Michal Jaegermann 2006-06-07 16:13:49 UTC

> In general you should _not_ use the AgpMode setting
> in the X server config file, and instead set the AGP rate
> in your BIOS and the X server should "just work" with it.

That would indeed be nice, as long as you have such settings
in the BIOS, which is very far from certain, and you know what
the correct value is.  Moreover, I happen to have such a BIOS knob,
and it has said "8x" for a long time.  I just checked to be sure.
The catch is that it does not help at all.

As for 12 being invalid: I did that only out of sheer curiosity,
to see what would happen, because I found "Xorg tried to set
rate=x12" in the logs.  Well, if it tried then let it and see ...
Still, I wonder why Xorg did that, or at least why something claims
that it did, if it is known in advance that this is an invalid rate.

AFAICT there is really nothing on the subject in /var/log/messages
beyond those two snippets I quoted in my original report.

BTW - in the past I tried to turn off drm.  No changes.

Comment 5 Dave Jones 2006-09-17 05:47:42 UTC

I think I've fixed this in the current rawhide kernel.

Comment 6 Michal Jaegermann 2006-09-17 17:36:02 UTC

> I think I've fixed this in the current rawhide kernel.

That depends somewhat on what "current" means.  If you are thinking
of 2.6.17-1.2630.fc6, which is now the latest available, then
nothing has really changed: "agpgart: Badness....", and so on, is
still there, and a display without an explicit "AGPMode"
configuration is dead.

OTOH if you are talking about something newer then I will see what
happens when it shows up on the servers.

Comment 7 Michal Jaegermann 2006-09-21 22:11:10 UTC

If the fix mentioned in comment #5 was supposed to be present in
2.6.18-1.2679.fc6 then, I am afraid, I have bad news.  With this
kernel and the '"AGPMode" "8"' line missing, after an attempt
to start X I see in dmesg (after logging in remotely):

[drm] Initialized drm 1.0.1 20051102
ACPI: PCI Interrupt 0000:01:00.0[A] -> GSI 16 (level, low) -> IRQ 201
[drm] Initialized radeon 1.25.0 20060524 on minor 0
agpgart: Found an AGP 3.5 compliant device at 0000:00:00.0.
agpgart: Badness. Don't know which AGP mode to set. [bridge_agpstat:1f000a0a
vga_agpstat:ff00021b fell back to:- bridge_agpstat:1f000208 vga_agpstat:ff00021b]
agpgart: Bridge couldn't do AGP x4.
agpgart: Putting AGP V3 device at 0000:00:00.0 into 0x mode
agpgart: Putting AGP V3 device at 0000:01:00.0 into 0x mode
[drm] Setting GART location based on new memory map
[drm] Loading R300 Microcode
[drm] writeback test failed

Process X apparently loops with the following line in 'top':

 2524 root      24  -1  237m 5800 4008 R 99.7  1.2   2:25.32 X

and there is no apparent way to kill it short of a reboot.  Black
screen and no response from the keyboard, as reported previously.

With an explicit "AGPMode" set I see now this:

[drm] Initialized drm 1.0.1 20051102
ACPI: PCI Interrupt 0000:01:00.0[A] -> GSI 16 (level, low) -> IRQ 201
[drm] Initialized radeon 1.25.0 20060524 on minor 0
agpgart: Found an AGP 3.5 compliant device at 0000:00:00.0.
agpgart: Putting AGP V3 device at 0000:00:00.0 into 8x mode
agpgart: Putting AGP V3 device at 0000:01:00.0 into 8x mode
[drm] Setting GART location based on new memory map
[drm] Loading R300 Microcode
[drm] writeback test succeeded in 1 usecs

and X starts and works fine.

BTW - booting with 2.6.17-1.2647.fc6 does not change anything.

Comment 8 Dave Jones 2006-09-28 22:44:43 UTC

2679 was the last kernel that didn't have the fix :)
The AGP 'Badness' messages you quote shouldn't be in the fixed kernel.

Comment 9 Michal Jaegermann 2006-09-28 23:30:40 UTC

> 2679 was the last kernel that didn't have the fix :)

Hm, talk about "current" from comment #5.

With 2.6.18-1.2699.fc6 and AGPMode not explicitly specified
I indeed got:

agpgart: Found an AGP 3.5 compliant device at 0000:00:00.0.
agpgart: Putting AGP V3 device at 0000:00:00.0 into 4x mode
agpgart: Putting AGP V3 device at 0000:01:00.0 into 4x mode
[drm] Setting GART location based on new memory map
[drm] Loading R300 Microcode
[drm] writeback test succeeded in 1 usecs

Kind of ironic in the face of previous claims
"agpgart: Bridge couldn't do AGP x4".

OTOH the whole setup still works fine in 8x mode if asked.

Comment 10 Dave Jones 2006-09-28 23:48:22 UTC

Hmm, those messages should only be displayed if you explicitly asked for x4.
The behaviour should be: if you ask for nothing, it'll try to do x8, and if the
hardware supports it, you'll get it. If the hardware doesn't support it, you'll
fall back to x4 mode with a warning (which you don't get).

You seem to have silently fallen back to x4 mode, which shouldn't be possible.

I'll look at the code some more.
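The intended behaviour described in comment 10 can be written down as a short decision function. This is a hypothetical Python sketch of that description only, not the agpgart or X driver code; choose_rate() and its warning strings are invented for illustration.

```python
def choose_rate(requested, common):
    """Pick an AGP 3.0 rate following the behaviour described in comment 10.

    requested: None (nothing asked for), 4 or 8.
    common:    set of rates supported by both bridge and card.
    Returns (rate, warning); rate is None when nothing usable was found.
    """
    # Asking for nothing means "try x8 first".
    wanted = 8 if requested is None else requested
    if wanted in common:
        return wanted, None
    # x8 unavailable: fall back to x4, but say so.
    if wanted == 8 and 4 in common:
        return 4, "x8 not available, falling back to x4"
    return None, "requested rate x%d not available" % wanted


print(choose_rate(None, {4, 8}))  # (8, None)
print(choose_rate(None, {4}))     # (4, 'x8 not available, falling back to x4')
print(choose_rate(4, {8}))        # (None, 'requested rate x4 not available')
```

Under this description a fallback to x4 always comes with a warning, which is why a silent x4 result looked wrong at first.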

Comment 11 Dave Jones 2006-09-28 23:54:07 UTC

ah, actually this makes sense.  X is being smart and realising that you didn't
specify '8', it's falling back to the only other thing it can do in this
situation - x4.

So it sounds like this is all fixed up?

Comment 12 Michal Jaegermann 2006-09-29 03:25:47 UTC

> ah, actually this makes sense.  X is being smart ....

Frankly, I am not entirely sure what X behaviour is really
expected here.  Why is x4 the default here and not x8, for example?
In any case, it works without a need to guess how to fix what
in practice looks like a crashed machine (even if it did not
entirely crash :-).

Comment 13 Dave Jones 2006-10-12 00:14:45 UTC

X plays it safe and goes with the more conservative setting because I believe
that there are some cards that can do x8, but in certain boards, they exhibit
problems.
So if you wanted x8 on such a system, you'd need to specify it explicitly.
