Red Hat Bugzilla – Bug 76603
(AUDIO VIA82CXXX_AUDIO)via82cxxx_audio + esd == bad news
Last modified: 2013-07-02 22:07:31 EDT
Description of Problem:
An attempt to start a GNOME session from gdm results in the display of the
"Blue curve" splash, and then the machine dies. There is no reaction to the keyboard,
mouse or network, just a dead picture on the screen. Turning on the sysrq key
is of no help.
This is a VIA KX133 board with a 750 MHz Athlon and Matrox G400 AGP video.
The same hardware does not have
any problems with any earlier Red Hat release, and it was also used
during the beta period for 8.0, when such drastic effects were not observed.
KDE session from the same gdm screen starts just fine. This happens both
from a "normal installation" and after applying all available updates.
Modifying /usr/share/gnome/default.session to contain only
0,RestartCommand=/root/bin/mywm --default-wm gnome-wm --sm-client-id default0
does not change things at all. '/root/bin/mywm' is a modified 'gnome-wm'
script. Giving preference in it to other window managers like sawfish and twm
also did not help. By hacking this script I captured some strace
output from attempts to start a window manager. I am not sure if it is of any help,
but attached is an archive with some traces (whatever I got before everything
'gdm.trace' - strace attached to the top gdm process while trying to log in
'metacity.try1' - an attempt to start metacity before updates
'metacity.try2' - as above but after current updates to 8.0 were applied
This is a test "clean slate" installation and not an update. Currently
it has only a root account and no "regular users" accounts.
Version-Release number of selected component (if applicable):
I did not manage to log in even once, and every attempt requires a hard reboot.
Created attachment 81844
strace outputs before crashes
It's sticking on unix domain sockets, in one case the ORBit socket in
another the ICE (session management) socket.
This happened to me yesterday with a version of GNOME built from CVS, just after
upgrading the machine to 8.0. I figured it was some random bad CVS snap and
recompiled, haven't tried the new build yet.
It's possible metacity is just stuck because gnome-session or something is
stuck, i.e. the metacity trace may not mean much other than "blocking on a
socket for another process"
I also think that metacity traces are not very important here. I included
them lacking really any better information. Still effects are not very
nice and any other session type (I tried whatever was handy - KDE, windowmaker,
twm and even failsafe) just works from gdm. With gnome-session twm and sawfish
do not make any difference. Things are getting stuck in exactly the same
manner as with metacity.
I know that there are installations where this does not happen (or at least
not with a 100% hit rate :-).
Right, there are very few reports of this that I know of.
I would expect a number of reports if it was happening to lots
of people. I guess that's a happy thought.
It's possible what we need is a backtrace from gnome-session to see where
it's stuck.
> It's possible what we need is a backtrace from gnome-session to see where
> it's stuck.
Hm, any ideas where to hook-up to get that?
I tried with /etc/X11/gdm/gnomerc which has only these two lines
where /tmp/gnomessess.sh is an executable shell script running this:
( strace -o /tmp/gnomess.$$ /usr/bin/gnome-session ) > /dev/ttyS0 2>&1 &
and a serial console on /dev/ttyS0. On a machine hooked on the other end
of a serial cable I captured only this:
Assertion failed! buffer != NULL,via82cxxx_audio.c,via_dsp_write,line=2308
The same message on two tries. Possibly just a kernel bug; if yes then
it affects at least two kernels.
OTOH by attaching strace to "one-level-down" gdm process (storing on a local
disk) I got something. It also basically shows "yes, things are stuck" but
maybe you will find there some new information.
Created attachment 81985
A trace from a stuck gdm
Oops! I had to reread what I posted to notice that cut-and-paste is
detrimental to your health. But losing "-o /tmp/gnomess.$$" from the strace
call above did not change anything on the receiver side. It does not look
like gnomerc is in use at all. "Assertion failed" is persistent.
The gdm trace just shows X traffic. :-/ (fd 13 is the X connection apparently)
To get a gnome-session backtrace you might do:
- in .Xclients put "twm & exec xterm" or something
- from the xterm run "gdb gnome-session"
- when it hangs, ctrl+c in gdb, then bt
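A minimal sketch of the first step above, assuming a POSIX shell. The helper name write_xclients and its optional path argument are made up for illustration and are not part of gdm or GNOME; the gdb steps are repeated as comments, with "run" added because gdb has to start the program before it can be interrupted.

```shell
# Hypothetical helper (not part of any package): write a minimal .Xclients.
# The optional first argument is the file to write; default is ~/.Xclients.
write_xclients() {
    target="${1:-$HOME/.Xclients}"
    cat > "$target" <<'EOF'
#!/bin/sh
# Bare window manager in the background, xterm as the session's main process.
twm &
exec xterm
EOF
    chmod +x "$target"
}

# After logging in through this session, from the xterm:
#   gdb gnome-session
#   (gdb) run
#   ... wait for the hang, press Ctrl+C ...
#   (gdb) bt
```

Calling write_xclients with no argument writes ~/.Xclients; pass another path to try it out without touching the real file.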
This may be a hard trick to accomplish. As I wrote, I can start just fine,
say, "Failsafe" or even richer environment up to KDE, but when described bug
hits my machine dies completely. In particular even sysrq is not reacting and
more mundane keyboard actions even less. :-) Network connections go away
Oh, doh. This I have not seen, I read too quickly and was thinking just the
gnome login was hanging.
Presumably it's an X server or kernel bug then; those should be robust against
anything gnome-session can do. gnome-session is just happening to trigger the
bug by using whatever X feature is buggy, etc.
I'm not sure how to go about debugging.
Possibly kernel and/or X server. Triggered assertions quoted earlier seem
to suggest kernel. But keep in mind that an updated 7.3 installation, which
has pretty close kernel and X to those in question, does not show this bug.
Also only gnome-session seems to tickle that and this was NOT happening
What really displays a splash logo? Everything is frozen just after this is
shown with a mouse pointer stuck in the middle of a screen.
gnome-session displays the splash.
The reason I say kernel/X is that by definition apps are not supposed to be
able to crash kernel/X. If there's a bug in say the "draw circle" function in X,
then only a specific app or app version may actually call "draw circle" with the
buggy arguments, but that doesn't mean it's a bug in the app.
Well, I found a "solution", tipped off by the asserts quoted earlier. If I turn
off the sound modules in my /etc/modules.conf, then the whole thing starts.
Are you trying to play some annoying sounds on startup? Apparently this
is a killer.
OTOH if I use 'play' or 'timidity' from a command line with some sound files
lying around there are no ill effects (sound modules are, obviously, configured
on at that time). The sound card is "VIA Technologies, Inc. VT82C686
AC97 Audio Controller"; nothing very exotic. Should I file a separate bug
report about that?
Still, I can lock up the whole session, keyboard and all but with the machine accessible
over a network, in no time flat by toying a bit with, say, the sawfish configurator.
The whole thing is so fragile that this is not even funny.
XFree86-Servers doesn't exist in RHL 8.0.
Oops, the component reassign didn't take...
I doubt this is an XFree86 bug if it didn't show up in the beta. Sounds
to me more like either a kernel or hardware issue. In any case, the
bug report contains only gnome-session and other bug report info and
nothing useful to troubleshooting the problem from a kernel or XFree86
issue, so it is hard to even guess without more info. I also don't have
a VIA VT82C686 with onboard AC97 sound....
Suggestion: Disable sound on the machine completely. If the problem
goes away, it is a kernel sound driver issue. If it doesn't go away,
then you need to supply details of the problem from an XFree86 and
kernel angle. Also, switch to runlevel 3 instead of 5, and try
"startx". Does "startx" work? If it does work, then something in
the GNOME startup files used in runlevel 5 is triggering a kernel
or hardware bug IMHO, the likely candidate being your onboard audio.
Please provide information.
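The isolation step suggested above could be sketched roughly as follows. This assumes the sound driver is loaded through "alias sound-slot-N ..." lines in /etc/modules.conf, which the thread does not actually show; the function name disable_sound_aliases is hypothetical.

```shell
# Hypothetical helper: comment out "alias sound-slot-N ..." lines in a
# modules.conf-style file, keeping a backup copy next to it as FILE.bak.
# Assumption: the sound driver is configured through such alias lines.
disable_sound_aliases() {
    conf="$1"
    cp "$conf" "$conf.bak" &&
    sed 's/^\(alias[[:space:]]\{1,\}sound-slot-[0-9]\)/# \1/' "$conf.bak" > "$conf"
}

# Intended use, as root (not executed here):
#   disable_sound_aliases /etc/modules.conf
#   reboot, then:
#   telinit 3     # leave runlevel 5, so gdm is not running
#   startx        # start X by hand, as suggested above
```

If the lockup disappears with the aliases commented out, that points at the sound driver; restoring the .bak file undoes the change.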
> Suggestion: Disable sound on the machine completely. If the problem
> goes away, it is a kernel sound driver issue.
If I am not loading sound modules then sound is disabled. What else do
you have in mind? Then a lockup is avoided; see above.
As for being completely a sound driver issue, keep in mind
that with 2.4.18-17.7.x kernel nothing bad happens and also sound does not
seem to mind if gnome-session is not around. KDE does start without any
tricks. Could be a compiler bug, I guess, as 2.4.18-17.8.0 is compiled
with a different compiler. Once again - whatever was in betas did not
have this behaviour.
X servers are also similar. Both 4.2.0 with different subversions.
(I do not mean a server package here but X does supply a server.)
You need to determine whether it is the X server, or the kernel at this
point, since I don't have the hardware to reproduce. Once that is
determined, if it is the X server, I will need a backtrace of a coredump,
or the results of an interactive gdb session of the X server.
If it is the kernel at fault, which I am assuming since you claim it
works with the latest 7.3 kernel (presumably the 7.3 kernel on an 8.0
system), reassign to the kernel component, as that is the variable
that is showing it working or not working (if this is the case).
> You need to determine whether it is the X server, or the kernel at this
I "need to"? I hope that this is only an unfortunate choice of words on your
part. I already sunk much more time than I can really afford in that issue.
I "need to" report that gnome-session in RH 8.0 kills a machine cold while
this was not happening in earlier versions; and which I did.
Putting that aside how do you propose to do the above without me spending
a week on debugging? Keep in mind that in a lockup situation I can get only
Assertion failed! buffer != NULL,via82cxxx_audio.c,via_dsp_write,line=2308
and this only with the help of a serial console. If sound is turned off
then I can get all kinds of traces, but then things work normally.
All information I provided earlier to me seems to indicate that X is not
a culprit here. Windowmaker, KDE or even GNOME sessions (but the last one
only if sound modules are not loaded) just work. Looks to me like X is
> If it is the kernel at fault, which I am assuming since you claim it
> works with the latest 7.3 kernel (presumably the 7.3 kernel on an 8.0
Of course NOT 7.3 kernel on 8.0 system. ABI is different. Still versions
2.4.18-17.7.x and 2.4.18-17.8.0 on a source level seem to be pretty close. :-)
> I "need to"? I hope that this is only an unfortunate choice of words
> on your part
No, it was not an unfortunate choice of words, it was quite explicit.
You are describing a problem which I can not reproduce. That means
unless you can provide information that is conducive to finding a
solution, then this bug sits here and rots, or I close it WONTFIX
now or at some other point. I can't read people's minds or pull
bugfixes out of a hat really, not even a red one.
>Putting that aside how do you propose to do the above without me
>spending a week on debugging? Keep in mind that in a lockup
>situation I can get only
Putting that aside, how do you propose I fix a bug that occurs only
on some hardware which I do not have, cannot reproduce, do not have
enough information to begin to look into it? I could sit in a
debugger for 10 years and it won't help find a solution to the
>Assertion failed! buffer != NULL,via82cxxx_audio.c,via_dsp_write,line=2308
That is _definitely_ not an X assertion. That is a kernel error message
as best I can tell, and has nothing at all whatsoever to do with X.
I don't know if this is a kernel bug or not although that is my
best guess. It isn't an X issue though, so rather than closing as
NOTABUG for XFree86, I'm reassigning to the kernel for now.
> That is _definitely_ not an X assertion.
I do not recall ever suggesting that this is bug in X. Quite to the contrary.
Hints of that sort came from somebody else. Still between the last beta
and a final release somebody managed to introduce a killer bug.
That problem is absent for 2.4.20-2.2 kernel from "Phoebe" (although the
latter one has other creative ways to lock things :-). It may be that
a gnome startup is different than in a configuration which was creating
troubles. Unfortunately my new "Phoebe" installation _replaced_ 8.0 one.
My resources I can put aside for testing are limited.
*** Bug 80968 has been marked as a duplicate of this bug. ***
Alan Cox has identified that the Via audio kernel driver works fine for all
cases he has tested, except one: esd. It seems that esd does _something_ that
triggers bad behavior in either the hardware or the Via audio driver.
This problem has not been tracked down further (AFAIK), so this is merely a
Indeed, after I renamed /usr/bin/esd to /usr/bin/broken_esd I can
logout without killing the whole machine in the process. No problems
also when artsd runs.
Apparently gnome-session now brings up esd on its own while it was
not doing that with gnome-1.4 (RH 7.3) so the trouble failed to materialize
(or maybe esd was not _that_ broken).
I can also confirm that this problem exists under RH9 with an Abit AT7 MAX mobo
with Via sound. esd won't die when exiting X. Restarting X without killing the
old esd hangs startx before the Gnome splash screen appears. Killing esd
immediately allows Gnome to continue.
An awful lot of noise on this bug. Perhaps it's time to open a new one?
I'm running rh9 and the system doesn't hang, the only problem is "denial of
sound service" and that esd doesn't do its job.
To me this bug is pretty easy to isolate and reproduce. esd locks up when any
program is trying to play a sound. Killing esd and letting the program access
/dev/dsp manually works around the problem. When killing esd, the following line
gets appended to the output of 'dmesg':
via_audio: ignoring drain playback error -512
I'd be more than willing to help esd or sound driver people debugging this.
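A small sketch of the check described above, assuming a POSIX shell with grep. The function name via_audio_messages is made up; the two patterns are taken from the messages quoted in this report.

```shell
# Hypothetical filter: keep only kernel log lines that come from the VIA
# audio driver, matching the two message styles quoted in this report.
via_audio_messages() {
    grep -E 'via_audio:|via82cxxx_audio\.c'
}

# Intended use (not executed here):
#   killall esd
#   dmesg | via_audio_messages
```

Running it before and after killing esd shows whether the "ignoring drain playback error" line is new.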
It's on my todo list. I've spent some time working on it and can duplicate it
order but still don't understand why.
Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases,
and if you believe this bug is interesting to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/