Bug 76603
| Field | Value |
|---|---|
| Summary | (AUDIO VIA82CXXX_AUDIO)via82cxxx_audio + esd == bad news |
| Product | [Retired] Red Hat Linux |
| Component | kernel |
| Version | 8.0 |
| Hardware | i386 |
| OS | Linux |
| Status | CLOSED CURRENTRELEASE |
| Severity | medium |
| Priority | medium |
| Reporter | Michal Jaegermann <michal> |
| Assignee | Jeff Garzik <jgarzik> |
| CC | ckjohnson, noa, peterm, rhult |
| Doc Type | Bug Fix |
| Last Closed | 2004-09-30 15:40:06 UTC |
Description
Michal Jaegermann
2002-10-24 00:23:48 UTC
Created attachment 81844 [details]
strace outputs before crashes
It's sticking on unix domain sockets, in one case the ORBit socket, in another the ICE (session management) socket. This happened to me yesterday with a version of GNOME built from CVS, just after upgrading the machine to 8.0. I figured it was some random bad CVS snap and recompiled, haven't tried the new build yet. It's possible metacity is just stuck because gnome-session or something is stuck, i.e. the metacity trace may not mean much other than "blocking on a socket for another process".

I also think that metacity traces are not very important here. I included them lacking really any better information. Still effects are not very nice and any other session type (I tried whatever was handy - KDE, windowmaker, twm and even failsafe) just works from gdm. With gnome-session twm and sawfish do not make any difference. Things are getting stuck in exactly the same manner as with metacity. I know that there are installations where this does not happen (or at least not with a 100% hit rate :-).

Right, there are very few reports of this that I know of. I would expect a number of reports if it was happening to lots of people. I guess that's a happy thought. It's possible what we need is a backtrace from gnome-session to see where it's stuck.

> It's possible what we need is a backtrace from gnome-session to see where
> it's stuck.
Hm, any ideas where to hook-up to get that?
I tried with /etc/X11/gdm/gnomerc which has only these two lines
#!/bin/sh
/tmp/gnomesess.sh
where /tmp/gnomessess.sh is an executable shell script running this:
( strace -o /tmp/gnomess.$$ /usr/bin/gnome-session ) > /dev/ttyS0 2>&1 &
and a serial console on /dev/ttyS0. On a machine hooked on the other end
of a serial cable I captured only this:
Assertion failed! buffer != NULL,via82cxxx_audio.c,via_dsp_write,line=2308
The same message on two tries. Possibly just a kernel bug; if yes then
it affects at least two kernels.
OTOH by attaching strace to "one-level-down" gdm process (storing on a local
disk) I got something. It also basically shows "yes, things are stuck" but
maybe you will find there some new information.
Created attachment 81985 [details]
A trace from a stuck gdm
Oops! I had to reread what I posted to notice that cut-and-paste is detrimental to your health. But losing "-o /tmp/gnomess.$$" from the strace call above did not change anything on the receiver side. It does not look like gnomerc is in use at all. "Assertion failed" is persistent.

The gdm trace just shows X traffic. :-/ (fd 13 is the X connection apparently) To get a gnome-session backtrace you might do:
- in .Xclients put "twm & exec xterm" or something
- from the xterm run "gdb gnome-session"
- when it hangs, ctrl+c in gdb, then bt

This may be a hard trick to accomplish. As I wrote, I can just start fine, say, "Failsafe" or an even richer environment up to KDE, but when the described bug hits my machine dies completely. In particular even sysrq is not reacting and more mundane keyboard actions even less. :-) Network connections go away too.

Oh, doh. This I have not seen, I read too quickly and was thinking just the gnome login was hanging. Presumably it's an X server or kernel bug then; those should be robust against anything gnome-session can do. gnome-session is just happening to trigger the bug by using whatever X feature is buggy, etc. I'm not sure how to go about debugging.

Possibly kernel and/or X server. The triggered assertions quoted earlier seem to suggest kernel. But keep in mind that an updated 7.3 installation, which has pretty close kernel and X to those in question, does not show this bug. Also only gnome-session seems to tickle that and this was NOT happening in betas. What really displays a splash logo? Everything is frozen just after this is shown, with a mouse pointer stuck in the middle of the screen.

gnome-session displays the splash. The reason I say kernel/X is that by definition apps are not supposed to be able to crash kernel/X. If there's a bug in say the "draw circle" function in X, then only a specific app or app version may actually call "draw circle" with the buggy arguments, but that doesn't mean it's a bug in the app.
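The gdb procedure suggested earlier in this thread (twm plus an xterm from .Xclients, then gdb on gnome-session) might look like the sketch below. It only helps when the machine stays responsive after the hang. The function name `write_xclients` and its optional path argument are hypothetical helpers for illustration, not part of any Red Hat tool.

```shell
#!/bin/sh
# Sketch only: write a minimal .Xclients that starts twm and an xterm
# instead of the full GNOME session. write_xclients and its argument are
# hypothetical names used for this example.
write_xclients() {
    target="${1:-$HOME/.Xclients}"
    cat > "$target" <<'EOF'
#!/bin/sh
# Minimal session: a window manager plus a terminal to run gdb from.
twm &
exec xterm
EOF
    chmod +x "$target"
}

# Then, inside the xterm:
#   gdb gnome-session
#   (gdb) run
#   ... wait for the hang, press Ctrl+C ...
#   (gdb) bt
```

Calling `write_xclients` with no argument would overwrite `$HOME/.Xclients`, so back up any existing file first.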
Well, I found a "solution" tipped by the asserts quoted earlier. If I turn off the sound modules in my /etc/modules.conf then the whole thing starts. Are you trying to play some annoying sounds on startup? Apparently this is a killer. OTOH if I use 'play' or 'timidity' from a command line with some sound files lying around there are no ill effects (sound modules are, obviously, configured on at that time). The sound card is "VIA Technologies, Inc. VT82C686 AC97 Audio Controller"; nothing very exotic. Should I file a separate bug report about that?

Still I can lock up the whole session, keyboard and all but machine accessible over a network, in no time flat toying a bit, say, with a sawfish configurator. The whole thing is so fragile that this is not even funny.

XFree86-Servers doesn't exist in RHL 8.0.

Oops, the component reassign didn't take...

I doubt this is an XFree86 bug if it didn't show up in the beta. Sounds to me more like either a kernel or hardware issue. In any case, the bug report contains only gnome-session and other bug report info and nothing useful for troubleshooting the problem as a kernel or XFree86 issue, so it is hard to even guess without more info. I also don't have a VIA VT82C686 with onboard AC97 sound.... Suggestion: Disable sound on the machine completely. If the problem goes away, it is a kernel sound driver issue. If it doesn't go away, then you need to supply details of the problem from an XFree86 and kernel angle. Also, switch to runlevel 3 instead of 5, and try "startx". Does "startx" work? If it does work, then something in the GNOME startup files used in runlevel 5 is triggering a kernel or hardware bug IMHO, the likely candidate being your onboard audio. Please provide information.

> Suggestion: Disable sound on the machine completely. If the problem
> goes away, it is a kernel sound driver issue.
If I am not loading sound modules then sound is disabled. What else do
you have in mind? Then a lockup is avoided; see above.
As for this being purely a sound driver issue, keep in mind
that with 2.4.18-17.7.x kernel nothing bad happens and also sound does not
seem to mind if gnome-session is not around. KDE does start without any
tricks. Could be a compiler bug, I guess, as 2.4.18-17.8.0 is compiled
with a different compiler. Once again - whatever was in betas did not
have this behaviour.
X servers are also similar. Both 4.2.0 with different subversions.
(I do not mean a server package here but X does supply a server.)
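When testing with sound "disabled", it can help to confirm that the driver really is not loaded. A minimal sketch, assuming the module is named via82cxxx_audio as in the assertion message; the function name `module_loaded` is hypothetical, and the modules.conf lines in the comments are an assumption about a typical 2.4-era setup, not taken from the reporter's machine.

```shell
#!/bin/sh
# Sketch: report whether a kernel module appears in a /proc/modules-style
# listing. module_loaded and its optional second argument are illustrative.
module_loaded() {
    module="$1"
    listing="${2:-/proc/modules}"
    # The first whitespace-separated field of each line is the module name.
    awk -v m="$module" '$1 == m { found = 1 } END { exit !found }' "$listing"
}

# Example:
#   if module_loaded via82cxxx_audio; then echo "sound driver loaded"; fi
#
# Assumption: with modutils on a 2.4 kernel, autoloading is typically
# controlled by an alias in /etc/modules.conf, for example
#   alias sound-slot-0 via82cxxx_audio    # driver enabled
#   alias sound-slot-0 off                # driver autoload disabled
```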
You need to determine whether it is the X server, or the kernel at this point, since I don't have the hardware to reproduce. Once that is determined, if it is the X server, I will need a backtrace of a coredump, or the results of an interactive gdb session of the X server. If it is the kernel at fault, which I am assuming since you claim it works with the latest 7.3 kernel (presumably the 7.3 kernel on an 8.0 system), reassign to the kernel component, as that is the variable that is showing it working or not working (if this is the case).

> You need to determine whether it is the X server, or the kernel at this
> point,
I "need to"? I hope that this is only an unfortunate choice of words on your part. I already sunk much more time than I can really afford in that issue. I "need to" report that gnome-session in RH 8.0 kills a machine cold while this was not happening in earlier versions; and which I did.
Putting that aside how do you propose to do the above without me spending a week on debugging? Keep in mind that in a lockup situation I can get only
Assertion failed! buffer != NULL,via82cxxx_audio.c,via_dsp_write,line=2308
and this only with the help of a serial console. If sound is turned off then I can get all kinds of traces but then things work normally. All the information I provided earlier seems to me to indicate that X is not the culprit here. Windowmaker, KDE or even GNOME sessions (but the last one only if sound modules are not loaded) just work. Looks to me like X is healthy enough.
> If it is the kernel at fault, which I am assuming since you claim it
> works with the latest 7.3 kernel (presumably the 7.3 kernel on an 8.0
> system)
Of course NOT 7.3 kernel on 8.0 system. ABI is different. Still versions 2.4.18-17.7.x and 2.4.18-17.8.0 on a source level seem to be pretty close. :-)

> I "need to"? I hope that this is only an unfortunate choice of words
> on your part
No, it was not an unfortunate choice of words, it was quite explicit. You are describing a problem which I can not reproduce. That means unless you can provide information that is conducive to finding a solution, then this bug sits here and rots, or I close it WONTFIX now or at some other point. I can't read people's minds or pull bugfixes out of a hat really, not even a red one.
> Putting that aside how do you propose to do the above without me
> spending a week on debugging? Keep in mind that in a lockup
> situation I can get only
Putting that aside, how do you propose I fix a bug that occurs only on some hardware which I do not have, cannot reproduce, and do not have enough information to begin to look into? I could sit in a debugger for 10 years and it won't help find a solution to the problem.
> Assertion failed! buffer != NULL,via82cxxx_audio.c,via_dsp_write,line=2308
That is _definitely_ not an X assertion. That is a kernel error message as best I can tell, and has nothing at all whatsoever to do with X. I don't know if this is a kernel bug or not although that is my best guess. It isn't an X issue though, so rather than closing as NOTABUG for XFree86, I'm reassigning to the kernel for now.

> That is _definitely_ not an X assertion.
I do not recall ever suggesting that this is bug in X. Quite to the contrary.
Hints of that sort came from somebody else. Still between the last beta
and a final release somebody managed to introduce a killer bug.
That problem is absent for the 2.4.20-2.2 kernel from "Phoebe" (although the latter has other creative ways to lock things :-). It may be that the gnome startup is different than in the configuration which was creating troubles. Unfortunately my new "Phoebe" installation _replaced_ the 8.0 one. The resources I can put aside for testing are limited.

*** Bug 80968 has been marked as a duplicate of this bug. ***

Alan Cox has identified that the Via audio kernel driver works fine for all cases he has tested, except one: esd. It seems that esd does _something_ that triggers bad behavior in either the hardware or the Via audio driver. This problem has not been tracked down further (AFAIK), so this is merely a status update.

Indeed, after I renamed /usr/bin/esd to /usr/bin/broken_esd I can logout without killing the whole machine in the process. No problems also when artsd runs. Apparently gnome-session now brings up esd on its own while it was not doing that with gnome-1.4 (RH 7.3) so the trouble failed to materialize (or maybe esd was not _that_ broken).

I can also confirm that this problem exists under RH9 with an Abit AT7 MAX mobo with Via sound. esd won't die when exiting X. Restarting X without killing the old esd hangs startx before the Gnome splash screen appears. Killing esd immediately allows Gnome to continue.

An awful lot of noise on this bug. Perhaps it's time to open a new one? I'm running rh9 and the system doesn't hang, the only problem is "denial of sound service" and that esd doesn't do its job. To me this bug is pretty easy to isolate and reproduce. esd locks up when any program is trying to play a sound. Killing esd and letting the program access /dev/dsp manually works around the problem. When killing esd, the following line gets appended to the output of 'dmesg':
via_audio: ignoring drain playback error -512
I'd be more than willing to help esd or sound driver people debugging this.

It's on my todo list.
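The workarounds reported in the comments above (renaming the esd binary, and checking dmesg for the drain error after killing esd) can be sketched in shell as follows. This is a rough illustration, not a fix: the function names `sideline_esd` and `count_drain_errors` are hypothetical, and the real rename needs root.

```shell
#!/bin/sh
# Sketch of the two checks reported in this thread. Function names are
# hypothetical; run the real commands as root and at your own risk.

# Rename the esd binary so gnome-session cannot start it. The thread used
# /usr/bin/esd -> /usr/bin/broken_esd.
sideline_esd() {
    esd_path="${1:-/usr/bin/esd}"
    dir=$(dirname "$esd_path")
    [ -e "$esd_path" ] || return 1
    mv "$esd_path" "$dir/broken_esd"
}

# Count occurrences of the driver message seen after killing esd, reading
# kernel log text from stdin (for example: dmesg | count_drain_errors).
count_drain_errors() {
    # grep -c exits non-zero when the count is 0; keep the function's
    # exit status at 0 either way.
    grep -c 'via_audio: ignoring drain playback error' || true
}
```

To undo the rename, move `broken_esd` back to `esd` in the same directory.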
I've spent some time working on it and can duplicate it locally to order but still don't understand why.

Thanks for the bug report. However, Red Hat no longer maintains this version of the product. Please upgrade to the latest version and open a new bug if the problem persists. The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, and if you believe this bug is interesting to them, please report the problem in the bug tracker at: http://bugzilla.fedora.us/