Bug 76603

Summary: (AUDIO VIA82CXXX_AUDIO)via82cxxx_audio + esd == bad news
Product: [Retired] Red Hat Linux
Reporter: Michal Jaegermann <michal>
Component: kernel
Assignee: Jeff Garzik <jgarzik>
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: medium
Version: 8.0
CC: ckjohnson, noa, peterm, rhult
Hardware: i386
OS: Linux
Doc Type: Bug Fix
Last Closed: 2004-09-30 15:40:06 UTC
Attachments:
Description                      Flags
strace outputs before crashes    none
A trace from a stuck gdm         none

Description Michal Jaegermann 2002-10-24 00:23:48 UTC
Description of Problem:

An attempt to start a GNOME session from gdm results in a display of the
"Blue curve" splash and then the machine dies.  There is no reaction to the
keyboard, mouse or network; just a dead picture on the screen.  Turning on the
sysrq key is of no help.

This is VIA KX133 board with 750 MHz Athlon; Matrox G400 AGP video.

The same hardware does not have any problems with any of the earlier
Red Hat releases, and it was also used during the beta period for 8.0,
when such drastic effects were not observed.  A KDE session from the same
gdm screen starts just fine.  The crash happens both on a "normal
installation" and after applying all available updates.

Modifying /usr/share/gnome/default.session to contain only

[Default]
num_clients=1
0,id=default0
0,Priority=10
0,RestartCommand=/root/bin/mywm --default-wm gnome-wm --sm-client-id default0

does not change things at all.  '/root/bin/mywm' is a modified 'gnome-wm'
script.  Giving preference in it to other window managers like sawfish and twm
also did not help.  By hacking this script I captured some strace
output from attempts to start a window manager.  I am not sure if it is of any
help, but attached is an archive with some traces (whatever I got before
everything died):
'gdm.trace' - strace attached to the top gdm process while trying to log in
'metacity.try1' - an attempt to start metacity before updates
'metacity.try2' - as above but after current updates to 8.0 were applied

This is a test "clean slate" installation and not an update.  Currently
it has only a root account and no "regular users" accounts.

Version-Release number of selected component (if applicable):
gnome-session-2.0.5-7.i386.rpm

How Reproducible:
I did not manage to log in even once, and every attempt requires a hard reboot.

Comment 1 Michal Jaegermann 2002-10-24 00:25:57 UTC
Created attachment 81844 [details]
strace outputs before crashes

Comment 2 Havoc Pennington 2002-10-24 02:26:11 UTC
It's sticking on unix domain sockets: in one case the ORBit socket, in
another the ICE (session management) socket.

This happened to me yesterday with a version of GNOME built from CVS, just after
upgrading the machine to 8.0. I figured it was some random bad CVS snap and 
recompiled, haven't tried the new build yet.

It's possible metacity is just stuck because gnome-session or something is
stuck, i.e. the metacity trace may not mean much other than "blocking on a
socket for another process"


Comment 3 Michal Jaegermann 2002-10-24 05:50:58 UTC
I also think that metacity traces are not very important here.  I included
them lacking really any better information.  Still effects are not very
nice and any other session type (I tried whatever was handy - KDE, windowmaker,
twm and even failsafe) just works from gdm.  With gnome-session twm and sawfish
do not make any difference.  Things are getting stuck in exactly the same
manner as with metacity.

I know that there are installations where this does not happen (or at least
not with a 100% hit rate :-).


Comment 4 Havoc Pennington 2002-10-24 14:39:33 UTC
Right, there are very few reports of this that I know of.
I would expect a number of reports if it was happening to lots 
of people. I guess that's a happy thought.

It's possible what we need is a backtrace from gnome-session to see where 
it's stuck.



Comment 5 Michal Jaegermann 2002-10-24 20:59:09 UTC
> It's possible what we need is a backtrace from gnome-session to see where 
> it's stuck.

Hm, any ideas where to hook-up to get that?

I tried with /etc/X11/gdm/gnomerc which has only these two lines

#!/bin/sh
/tmp/gnomesess.sh

where /tmp/gnomessess.sh is an executable shell script running this:

( strace -o /tmp/gnomess.$$ /usr/bin/gnome-session ) > /dev/ttyS0 2>&1 &

and a serial console on /dev/ttyS0.  On a machine hooked on the other end
of a serial cable I captured only this:

Assertion failed! buffer != NULL,via82cxxx_audio.c,via_dsp_write,line=2308

The same message on two tries.  Possibly just a kernel bug; if yes then
it affects at least two kernels.

OTOH by attaching strace to "one-level-down" gdm process (storing on a local
disk) I got something.  It also basically shows "yes, things are stuck" but
maybe you will find there some new information.
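
The wrapper described in this comment can be written as a small reusable function. This is only a sketch under stated assumptions: the names TRACE_DIR, LOG_SINK, TRACER and trace_session are illustrative and do not appear in the report, and the -f flag (follow child processes) is an addition to the original strace call.

```shell
#!/bin/sh
# Sketch of a session wrapper: trace a command and redirect its output.
# TRACE_DIR, LOG_SINK, TRACER and trace_session are hypothetical names.
TRACE_DIR=${TRACE_DIR:-/tmp}        # where the trace file is written
LOG_SINK=${LOG_SINK:-/dev/ttyS0}    # serial console used in the report
TRACER=${TRACER:-strace}

trace_session() {
    # -f follows forked children (an addition, not in the original command);
    # -o writes the trace to a file named after this shell's PID.
    "$TRACER" -f -o "$TRACE_DIR/gnomess.$$" "$@" > "$LOG_SINK" 2>&1
}

# Usage, from the session startup script:
#   trace_session /usr/bin/gnome-session &
```

Writing the trace to a local file only helps if the file survives the lockup, which is why the report also sends stdout and stderr to a serial console.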

Comment 6 Michal Jaegermann 2002-10-24 21:00:13 UTC
Created attachment 81985 [details]
A trace from a stuck gdm

Comment 7 Michal Jaegermann 2002-10-24 21:17:30 UTC
Oops!  I had to reread what I posted to notice that cut-and-paste is
detrimental to your health.  But dropping "-o /tmp/gnomess.$$" from the strace
call above did not change anything on the receiver side.  It does not look
like gnomerc is in use at all.  "Assertion failed" is persistent.

Comment 8 Havoc Pennington 2002-10-25 05:17:32 UTC
The gdm trace just shows X traffic. :-/ (fd 13 is the X connection apparently)

To get a gnome-session backtrace you might do:

 - in .Xclients put "twm & exec xterm" or something
 - from the xterm run "gdb gnome-session"
 - when it hangs, ctrl+c in gdb, then bt
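
The first step above can be sketched as a helper that writes the minimal session file. The function name write_xclients is hypothetical; the file contents are taken from the suggestion itself.

```shell
#!/bin/sh
# Sketch: create the minimal session file suggested in this comment.
# write_xclients is an illustrative helper name, not an existing tool.
write_xclients() {
    target=${1:-$HOME/.Xclients}
    # twm runs in the background as the window manager; xterm replaces the
    # script's shell, so the session ends when the xterm exits.
    printf '%s\n' '#!/bin/sh' 'twm &' 'exec xterm' > "$target"
    chmod +x "$target"
}
```

After logging in with this file in place, running "gdb gnome-session" in the xterm, typing "run" at the gdb prompt, and then pressing ctrl+c followed by "bt" once it hangs gives the backtrace. This only works if the machine stays responsive enough to reach the debugger.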


Comment 9 Michal Jaegermann 2002-10-25 15:37:09 UTC
This may be a hard trick to accomplish.  As I wrote, I can start just fine,
say, "Failsafe" or even a richer environment up to KDE, but when the described
bug hits, my machine dies completely.  In particular even sysrq does not react,
and more mundane keyboard actions even less. :-)   Network connections go away
too.

Comment 10 Havoc Pennington 2002-10-25 16:59:51 UTC
Oh, doh. This I have not seen, I read too quickly and was thinking just the
gnome login was hanging.

Presumably it's an X server or kernel bug then; those should be robust against 
anything gnome-session can do. gnome-session is just happening to trigger the
bug by using whatever X feature is buggy, etc.

I'm not sure how to go about debugging.



Comment 11 Michal Jaegermann 2002-10-25 17:24:42 UTC
Possibly kernel and/or X server.  Triggered assertions quoted earlier seem
to suggest kernel.  But keep in mind that an updated 7.3 installation, which
has pretty close kernel and X to those in question, does not show this bug.
Also only gnome-session seems to tickle that and this was NOT happening
in betas.

What really displays a splash logo?  Everything is frozen just after this is
shown with a mouse pointer stuck in the middle of a screen.

Comment 12 Havoc Pennington 2002-10-25 18:34:29 UTC
gnome-session displays the splash.

The reason I say kernel/X is that by definition apps are not supposed to be
able to crash kernel/X. If there's a bug in say the "draw circle" function in X,
then only a specific app or app version may actually call "draw circle" with the
buggy arguments, but that doesn't mean it's a bug in the app.

Comment 13 Michal Jaegermann 2002-10-25 23:28:13 UTC
Well, I found a "solution", tipped off by the asserts quoted earlier.  If I turn
off the sound modules in my /etc/modules.conf then the whole thing starts.
Are you trying to play some annoying sounds on startup?  Apparently this
is a killer.

OTOH if I use 'play' or 'timidity' from a command line with some sound files
lying around there are no ill effects (the sound modules are, obviously,
loaded at that time).  The sound card is "VIA Technologies, Inc. VT82C686
AC97 Audio Controller"; nothing very exotic.  Should I file a separate bug
report about that?

Still, I can lock up the whole session, keyboard and all but with the machine
accessible over a network, in no time flat by toying a bit, say, with the
sawfish configurator.  The whole thing is so fragile that this is not even funny.
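
Turning off the sound modules in /etc/modules.conf can be sketched as commenting out the sound alias lines. This assumes the file uses lines of the form "alias sound-slot-0 via82cxxx_audio", which is not quoted anywhere in this report; the function name disable_sound_aliases is hypothetical. The sketch prints the modified text and leaves the file itself untouched.

```shell
#!/bin/sh
# Sketch: print a modules.conf with its sound-slot aliases commented out.
# Assumes lines like "alias sound-slot-0 via82cxxx_audio" (not taken from
# the report). disable_sound_aliases is an illustrative name.
disable_sound_aliases() {
    # Prefix every "alias sound-slot-N ..." line with '#'; other lines pass
    # through unchanged.
    sed 's/^\([[:space:]]*alias[[:space:]][[:space:]]*sound-slot-[0-9]\)/#\1/' "$1"
}

# Usage: review the result before replacing the real file.
#   disable_sound_aliases /etc/modules.conf > /tmp/modules.conf.nosound
```

Printing to stdout instead of editing in place keeps the original configuration available for switching sound back on.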

Comment 14 Mike A. Harris 2002-10-26 02:17:25 UTC
XFree86-Servers doesn't exist in RHL 8.0.

Comment 15 Mike A. Harris 2002-10-26 02:23:45 UTC
Oops, the component reassign didn't take...

I doubt this is an XFree86 bug if it didn't show up in the beta.  Sounds
to me more like either a kernel or hardware issue.  In any case, the
bug report contains only gnome-session and other bug report info and
nothing useful for troubleshooting the problem from a kernel or XFree86
angle, so it is hard to even guess without more info.  I also don't have
a VIA VT82C686 with onboard AC97 sound....

Suggestion:  Disable sound on the machine completely.  If the problem
goes away, it is a kernel sound driver issue. If it doesn't go away,
then you need to supply details of the problem from an XFree86 and
kernel angle.  Also, switch to runlevel 3 instead of 5, and try
"startx".  Does "startx" work?  If it does work, then something in
the GNOME startup files used in runlevel 5 is triggering a kernel
or hardware bug IMHO, the likely candidate being your onboard audio.

Please provide information.

Comment 16 Michal Jaegermann 2002-10-26 03:29:36 UTC
> Suggestion:  Disable sound on the machine completely.  If the problem
> goes away, it is a kernel sound driver issue.

If I am not loading sound modules then sound is disabled.  What else do
you have in mind?  Then a lockup is avoided; see above.

As for this being purely a sound driver issue, keep in mind
that with the 2.4.18-17.7.x kernel nothing bad happens, and also sound does not
seem to mind if gnome-session is not around.  KDE does start without any
tricks.  Could be a compiler bug, I guess, as 2.4.18-17.8.0 is compiled
with a different compiler.  Once again - whatever was in the betas did not
have this behaviour.

X servers are also similar.  Both 4.2.0 with different subversions.
(I do not mean a server package here but X does supply a server.)

Comment 17 Mike A. Harris 2002-10-26 04:34:59 UTC
You need to determine whether it is the X server, or the kernel at this
point, since I don't have the hardware to reproduce.  Once that is
determined, if it is the X server, I will need a backtrace of a coredump,
or the results of an interactive gdb session of the X server.

If it is the kernel at fault, which I am assuming since you claim it
works with the latest 7.3 kernel (presumably the 7.3 kernel on an 8.0
system), reassign to the kernel component, as that is the variable
that is showing it working or not working (if this is the case).

Comment 18 Michal Jaegermann 2002-10-26 16:26:48 UTC
> You need to determine whether it is the X server, or the kernel at this
> point,

I "need to"?  I hope that this is only an unfortunate choice of words on your
part.  I already sunk much more time than I can really afford in that issue.
I "need to" report that gnome-session in RH 8.0 kills a machine cold while
this was not happening in earlier versions; and which I did.

Putting that aside how do you propose to do the above without me spending
a week on debugging?  Keep in mind that in a lockup situation I can get
only

Assertion failed! buffer != NULL,via82cxxx_audio.c,via_dsp_write,line=2308

and this only with the help of a serial console.  If sound is turned off
then I can get all kinds of traces, but then things work normally.

All the information I provided earlier seems to me to indicate that X is not
the culprit here.  Windowmaker, KDE or even GNOME sessions (but the last one
only if sound modules are not loaded) just work.  Looks to me like X is
healthy enough.

> If it is the kernel at fault, which I am assuming since you claim it
> works with the latest 7.3 kernel (presumably the 7.3 kernel on an 8.0
> system)

Of course NOT 7.3 kernel on 8.0 system.  ABI is different.  Still versions
2.4.18-17.7.x and 2.4.18-17.8.0 on a source level seem to be pretty close. :-)


Comment 19 Mike A. Harris 2002-11-01 01:37:43 UTC
> I "need to"?  I hope that this is only an unfortunate choice of words
> on your part

No, it was not an unfortunate choice of words, it was quite explicit.
You are describing a problem which I can not reproduce.  That means
unless you can provide information that is conducive to finding a
solution, then this bug sits here and rots, or I close it WONTFIX
now or at some other point.  I can't read people's minds or pull
bugfixes out of a hat really, not even a red one.

>Putting that aside how do you propose to do the above without me
>spending a week on debugging?  Keep in mind that in a lockup
>situation I can get only

Putting that aside, how do you propose I fix a bug that occurs only
on some hardware which I do not have, which I cannot reproduce, and for which
I do not have enough information to begin to look into it?  I could sit in a
debugger for 10 years and it won't help find a solution to the
problem.

>Assertion failed! buffer != NULL,via82cxxx_audio.c,via_dsp_write,line=2308

That is _definitely_ not an X assertion.  That is a kernel error message
as best I can tell, and has nothing at all whatsoever to do with X.

I don't know if this is a kernel bug or not although that is my
best guess.  It isn't an X issue though, so rather than closing as
NOTABUG for XFree86, I'm reassigning to the kernel for now.

Comment 20 Michal Jaegermann 2002-11-01 01:55:53 UTC
> That is _definitely_ not an X assertion.

I do not recall ever suggesting that this is bug in X. Quite to the contrary.
Hints of that sort came from somebody else.  Still between the last beta
and a final release somebody managed to introduce a killer bug.

Comment 21 Michal Jaegermann 2003-01-02 21:03:09 UTC
That problem is absent with the 2.4.20-2.2 kernel from "Phoebe" (although the
latter has other creative ways to lock things up :-).  It may be that
the gnome startup is different from the one in the configuration which was
causing trouble.  Unfortunately my new "Phoebe" installation _replaced_ the
8.0 one.  The resources I can put aside for testing are limited.


Comment 22 Jeff Garzik 2003-03-24 16:42:34 UTC
*** Bug 80968 has been marked as a duplicate of this bug. ***

Comment 23 Jeff Garzik 2003-03-24 16:46:12 UTC
Alan Cox has identified that the Via audio kernel driver works fine for all
cases he has tested, except one:  esd.  It seems that esd does _something_ that
triggers bad behavior in either the hardware or the Via audio driver.

This problem has not been tracked down further (AFAIK), so this is merely a
status update.


Comment 24 Michal Jaegermann 2003-03-24 23:51:55 UTC
Indeed, after I renamed /usr/bin/esd to /usr/bin/broken_esd I can 
logout without killing the whole machine in the process.  No problems
also when artsd runs.

Apparently gnome-session now brings up esd on its own while it was
not doing that with gnome-1.4 (RH 7.3) so the trouble failed to materialize
(or maybe esd was not _that_ broken).


Comment 25 Tom Wood 2003-04-07 18:46:09 UTC
I can also confirm that this problem exists under RH9 with an Abit AT7 MAX mobo
with Via sound.  esd won't die when exiting X.  Restarting X without killing the
old esd hangs startx before the Gnome splash screen appears.  Killing esd
immediately allows Gnome to continue.

Comment 26 Noa Resare 2003-05-03 19:09:27 UTC
An awful lot of noise on this bug. Perhaps it's time to open a new one?

I'm running rh9 and the system doesn't hang, the only problem is "denial of
sound service" and that esd doesn't do its job.

To me this bug is pretty easy to isolate and reproduce. esd locks up when any
program is trying to play a sound. Killing esd and letting the program access
/dev/dsp manually works around the problem. When killing esd, the following line
gets appended to the output of 'dmesg':

via_audio: ignoring drain playback error -512

I'd be more than willing to help esd or sound driver people debugging this.
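
The check described above, looking for driver messages after killing esd, can be sketched as a filter over kernel log text. The function name via_audio_messages is hypothetical; the two patterns are the driver names that appear in messages quoted in this report.

```shell
#!/bin/sh
# Sketch: keep only kernel log lines that mention the VIA audio driver.
# via_audio_messages is an illustrative name; it reads from stdin.
via_audio_messages() {
    # Match either spelling seen in this report: the "via_audio:" prefix and
    # the via82cxxx_audio.c source file named in the assertion.
    grep -E 'via_audio|via82cxxx_audio'
}

# Usage: kill esd first, then inspect the kernel log.
#   dmesg | via_audio_messages
```

Matching both names means the same filter catches the "ignoring drain playback error" line and the earlier "Assertion failed!" line from via_dsp_write.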

Comment 27 Alan Cox 2003-06-05 12:25:22 UTC
It's on my todo list. I've spent some time working on it and can duplicate it
locally to order, but still don't understand why.


Comment 28 Bugzilla owner 2004-09-30 15:40:06 UTC
Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
persists.

The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, 
and if you believe this bug is interesting to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/