Bug 31917 - (G200) Too many GL instances hang Xserver.
Summary: (G200) Too many GL instances hang Xserver.
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: XFree86
Version: 7.1
Hardware: i386
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: Mike A. Harris
QA Contact: David Lawrence
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2001-03-15 23:28 UTC by Ed McKenzie
Modified: 2005-10-31 22:00 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2001-06-28 14:55:41 UTC
Embargoed:


Attachments
XFree86.0.log (22.15 KB, text/plain)
2001-06-28 10:57 UTC, Bruce Garlock

Description Ed McKenzie 2001-03-15 23:28:15 UTC
Steps to reproduce:

1. Open several instances of 'gears' and 'gloss' (for me, seven or eight
does it pretty quickly)
2. Let them all run for a few seconds
3. The Xserver hangs. The cursor still moves, but the keyboard is dead and
open applications stop responding to mouse events. The system is still
running, and the following is in /var/log/messages:

Mar 13 20:08:50 eem12 kernel: [drm:mga_fire_primary] *ERROR* num_dwords ==
0 when dispatched

Pertinent hardware specs: Matrox G200, AMD K6/2-400. I suspect this is a
race of some sort, because it happens basically at random (I got it to
happen once with one instance of gears running while previewing GLPlanet in
the Control Center.)

Possibly related is the fact that running a GL app locks up the Xserver if
it's occluded completely behind another window. That could explain all of
the above.

Comment 1 Mike A. Harris 2001-03-16 01:43:56 UTC
Mismatched XFree/Mesa RPMs.  Upgrade to the latest packages:

XFree86-4.0.2a-1 & Mesa-3.4-12 from:

ftp://people.redhat.com/mharris

Comment 2 Ed McKenzie 2001-03-16 02:54:10 UTC
Hm. This one still locks up, with the added bonus of not being able to sync my
monitor in any mode besides 640x480 (and a rather funky 640x480 at that :-/)

Some oddness -- when the Xserver is killed, it says something about an
unresolved function being called, and "this shouldn't happen." I've never seen
that one before!

Comment 3 Mike A. Harris 2001-03-16 15:03:20 UTC
I completely agree with you.  We are testing a new experimental driver from
Matrox, and it is possible that it breaks things.  I am reverting the new
driver right now, and will be releasing a 4.0.2a release in a few hours
that uses the old driver, and should fix the problem you're having.

Sorry for the inconvenience.

Comment 4 Mike A. Harris 2001-03-18 19:18:12 UTC
Ok, I had released 4.0.2a-2 that reverted the driver, and now I have
4.0.3-1 out, and Mesa-3.4-13.  Please try them and let me know if
it works for you.  I can run many of these GL apps simultaneously
and it works ok.  I've left them running overnight, and they're running
while I type too.  This is on a G400 singlehead configuration.

I have a G200 here I'll test out as well, but I might not get to it right
away, so if you could try out 4.0.3 that would be cool.

Take care, and hope it works for ya.  If not, reopen.

Comment 5 Ed McKenzie 2001-03-25 20:04:16 UTC
I still get hangs running multiple instances of gloss from Mesa-demos:

$ for i in 1 2 3 4 5 6 7 8 9 10 11 12 ; do sh -c "gloss &" ; done

...and the Xserver hangs after a bit, though the mouse cursor still moves. I
haven't tried this with gears in a while, but I don't think the particular
application makes a difference. (I've also seen this with just two instances,
which is why I think it's a race of some sort.) 

Covering a window that contains a GL app still hangs the Xserver after not too
long (uncovering the window usually triggers it.)
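
The loop above can be generalized into a small script. This is only a sketch: `spawn_instances` is a hypothetical helper name, not something shipped with Mesa-demos, and it assumes the program to launch (for example `gloss`) is on the PATH.

```shell
#!/bin/sh
# spawn_instances COUNT COMMAND [ARGS...]
# Start COUNT background copies of COMMAND and print the PID of each one.
# (Hypothetical helper for illustration; not part of Mesa-demos.)
spawn_instances() {
    count=$1
    shift
    i=0
    while [ "$i" -lt "$count" ]; do
        "$@" &
        echo "$!"
        i=$((i + 1))
    done
}

# Example, matching the loop above:
# spawn_instances 12 gloss
```

Printing the PIDs makes it easier to kill the instances from another terminal or an ssh session if the display stops responding.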


Comment 6 Ed McKenzie 2001-03-25 20:06:46 UTC
BTW, this was tested on XFree86 4.0.3-1 and Mesa-3.4-13 (running on a G200, as
mentioned before.)

Comment 7 Mike A. Harris 2001-04-03 06:28:26 UTC
There is a new Matrox patch in 4.0.3-5.  It might solve the problem
for you.  If not, attach your X server log from after a crash, plus your config.

Comment 8 Ed McKenzie 2001-04-03 07:54:30 UTC
No difference. Xserver log and config attached, as well as tail of dmesg.

Comment 9 Ed McKenzie 2001-04-03 08:21:35 UTC
Go here instead: http://nebulascape.dyndns.org/bugzilla-31917

Comment 10 Ed McKenzie 2001-04-08 17:03:12 UTC
After updating from 04/07 rawhide, it seems things are more unstable than ever.
Previewing GL screensavers in the Control Center is now enough to crash the
machine reproducibly. :-/

Comment 11 Alexei Podtelezhnikov 2001-04-09 19:41:39 UTC
I assume that you have an 8MB card. Your top resolution, 1152x864x16bpp, may be too
large for 3d-acceleration which indeed requires 3-4 times more memory than 2d 
rendering. See if you can reproduce your behavior with a little smaller 
resolution 1024x768x16bpp or even smaller. That fixed my 3d-accelerated freezes 
caused by insufficient memory. Report your findings here.
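
A rough way to see the memory pressure is to add up the full-screen buffers. The sketch below assumes three buffers (front, back, and depth), each at 2 bytes per pixel; that layout is an assumption for illustration, and the driver's actual allocation may differ. `buffer_bytes` is a hypothetical helper name.

```shell
#!/bin/sh
# buffer_bytes WIDTH HEIGHT BYTES_PER_PIXEL BUFFER_COUNT
# Print the total bytes needed for BUFFER_COUNT full-screen buffers.
# (Hypothetical helper; the buffer count and depth are assumptions.)
buffer_bytes() {
    echo $(( $1 * $2 * $3 * $4 ))
}

# Assumed layout: front + back + depth buffers at 16bpp (2 bytes/pixel).
buffer_bytes 1152 864 2 3   # 5971968 bytes of an 8388608-byte (8MB) card
buffer_bytes 1024 768 2 3   # 4718592 bytes
```

Under these assumptions the higher resolution leaves roughly 2.3MB for textures and everything else, which is why dropping the resolution is a reasonable thing to test.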

Comment 12 Ed McKenzie 2001-04-10 14:40:59 UTC
The Xserver also crashes at 640x480x16bpp, in both cases (run a GL app and
switch desktops, or open too many gears windows.) gdb shows the Xserver stuck in
an __ioctl call, if that's at all relevant.

Quake3 et al. don't render to a hidden window, or to more than one window at a
time, and it's my cynical feeling that it'll be some time before the target
audience for Linux OpenGL consists of more than FPS players. For now, I don't
think the bleeding edge is for me. DRI does show promise, though, and I'll be
sure to keep an eye on things from time to time. Thanks, all.

Comment 13 Bruce Garlock 2001-06-28 10:57:42 UTC
Created attachment 22027
XFree86.0.log

Comment 14 Bruce Garlock 2001-06-28 10:58:19 UTC
I am having the same problem, but with a G400.  I noticed this problem when
running 'gears', and moving a window over the gears window.  X locked up, but I
could ssh into the box.  Killing X didn't do much, and the screen stayed
"frozen" on the console.  I had to use Alt-SysRq-S,U to sync the disks, and do a
clean unmount.

Here is the output from /var/log/messages when the lockup occurred:

kernel: [drm:mga_fire_primary] *ERROR* num_dwords == 0 when dispatched

I am attaching my XFree86.0.log
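
For anyone comparing notes, a quick check for the same kernel message might look like the sketch below. `count_drm_errors` is a hypothetical helper name, and the pattern is taken only from the log line quoted in this report.

```shell
#!/bin/sh
# count_drm_errors LOGFILE
# Print how many lines in LOGFILE contain the mga_fire_primary error
# quoted in this bug report. (Hypothetical helper for illustration.)
count_drm_errors() {
    grep -c 'drm:mga_fire_primary.*num_dwords == 0' "$1"
}

# Example (reading /var/log/messages usually requires root):
# count_drm_errors /var/log/messages
```

Since the machine stays reachable over ssh during the hang, this can be run remotely right after a lockup.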


Comment 15 Ed McKenzie 2001-06-28 12:26:16 UTC
This particular bug is apparently fixed in XFree86 4.1.0.  You'll need to
build your own kernel to use drm w/ that release, but I haven't had the time
or inclination to do so.  I'll likely try again once kernel-* gets updated in
rawhide.

Comment 16 Bruce Garlock 2001-06-28 13:15:03 UTC
What in the kernel config do I need to change to build with drm?  I'm thinking about grabbing the XFree86 from Rawhide,
and giving it a try.

Comment 17 Ed McKenzie 2001-06-28 13:33:04 UTC
It's not a question of kernel options, but matching the correct kernel drm 
version with the right XFree86 version.  In my experience, XFree86 tends 
to do really bad things if run with the wrong kernel version.

Comment 18 Bruce Garlock 2001-06-28 14:42:49 UTC
Ok, so how do you make sure the versions are correct?  Compile the whole deal (XFree, Mesa, kernel) from
source?

Comment 19 Ed McKenzie 2001-06-28 14:55:37 UTC
The packager usually has this responsibility.  Rawhide packages for 
XFree86 have been good about not installing without all the correct 
versions, at least recently.  I haven't tried with 2.4.5-0.4 for other reasons 
(USB issues), but I expect it'll be fixed soon.  The mailing lists (xfree-xpert, 
etc.) are a good resource if you're going to do this by hand (not that I 
recommend it. :-)
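
When reporting a problem like this, it helps to list the versions actually installed. The sketch below only prints them; it does not decide whether they match, which still depends on the packaging. `report_versions` is a hypothetical helper name, and the package names `XFree86` and `Mesa` are assumed from the package versions mentioned earlier in this report.

```shell
#!/bin/sh
# report_versions
# Print the running kernel release and, where rpm is available, the
# installed XFree86 and Mesa package versions.
# (Hypothetical helper; package names are assumed from this report.)
report_versions() {
    echo "kernel: $(uname -r)"
    if command -v rpm >/dev/null 2>&1; then
        rpm -q XFree86 Mesa || echo "one or more packages are not installed"
    else
        echo "rpm not found; package versions not checked"
    fi
}

# Example:
# report_versions
```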

I'm still waiting for 2.4.5-1 to show up in rawhide, as another bug report 
I've filed depends on it ...

Comment 20 Ed McKenzie 2001-08-17 16:14:04 UTC
Fixed in 4.1.0

