Bug 157868
Summary: | X server crashes randomly with signal 11 on epia M10000 board
---|---
Product: | [Fedora] Fedora
Component: | xorg-x11
Version: | 4
Hardware: | i686
OS: | Linux
Status: | CLOSED WONTFIX
Severity: | high
Priority: | medium
Keywords: | Triaged
Reporter: | Lucas Maneos <redhat>
Assignee: | X/OpenGL Maintenance List <xgl-maint>
QA Contact: | David Lawrence <dkl>
CC: | olivier.baudron
Doc Type: | Bug Fix
Last Closed: | 2006-03-07 11:59:46 UTC
Description
Lucas Maneos
2005-05-16 16:11:23 UTC
Please try to narrow the problem down, using a single video driver (via), to specific steps to reproduce. If there is a particular web page that triggers this, or a specific action in firefox (e.g. moving the scroll bars), this would be useful to know.

Please attach sysreport, X server log, X config file, /var/log/messages, and the output of lsmod from after the X server is started, as individual file attachments using the link below.

Setting status to "NEEDINFO".

Also, I notice this is reported for "i386", but bugzilla says your web browser is PPC. Is it just a different machine you're reporting it from, or should it be marked as a "ppc" problem?

Created attachment 114430 [details]
xorg.conf
I've commented out modules fbdevhw, glx, record and dri but it hasn't made a
difference.
Created attachment 114431 [details]
Xorg.0.log.old - X server log after crash
Created attachment 114432 [details]
Xorg.0.log - X server log while server is still running
Not sure if this is significant, but note that 'Frame buffer start' is
different every time the server starts.
Created attachment 114433 [details]
/var/log/messages contents
All syslog output (*.*) between X server startup and crash.
Created attachment 114434 [details]
/var/log/dmesg
Sysreport output to follow later (it seems to be running an rpm -Va, which seems to be running prelink, which takes ages).

I am indeed reporting from another machine; the problem is on i386. I still haven't managed to isolate a specific thing that will trigger the crash, but using firefox/mozilla for a few minutes seems to do it every time.

Created attachment 114438 [details]
Sysreport output
The problem appears to be, at least in part, window manager related. I had been running twm since yesterday afternoon without incident, whereas after switching to metacity or WindowMaker the X server crashes again after a few minutes (the crash frequency is actually much higher with wmaker). Now running a kde session (for the last half hour or so) and it seems stable so far.

Managed to get a core file, but a stack backtrace isn't very illuminating:

    (gdb) bt
    #0 0x007cab04 in malloc_consolidate () from /lib/libc.so.6
    #1 0x007cbe4d in _int_malloc () from /lib/libc.so.6
    #2 0x007cd552 in malloc () from /lib/libc.so.6
    #3 0x080e523b in Xalloc ()
    #4 0x080e5e3d in Xcalloc ()
    #5 0xb7f98541 in ?? ()
    #6 0x00001000 in ?? ()
    #7 0x00000000 in ?? ()

Is there an xorg-x11-debuginfo RPM somewhere?

The driver might not be relevant actually. Just stuck an old s3 virge dx card in the box, and it still segfaulted after a few minutes of firefox use. May or may not be relevant: display was corrupted in all modes above 640x480 no matter what driver options I tried, and kudzu probing the card produced a kernel oops.

Can you try with the latest FC4 update (6.8.2-37.FC4.45)? Also, the backtrace would look better if you had linked with libefence. Can you try it?

Ran for ~ 1 hour before crashing with signal 11, so definitely an improvement. How would I go about linking with libefence? If someone could provide an appropriate RPM it would be a great help, as building xorg on this box takes quite a while.

Created attachment 118436 [details]
xorg log from 6.8.2-37.FC4.45
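A quick way to see why the backtrace quoted earlier in this report is not very illuminating is to count the frames gdb could not resolve to a symbol. The sketch below is only an illustration: the function names `count_frames` and `count_unresolved_frames` are made up for this example, and the sample input is the backtrace exactly as posted above.

```shell
#!/bin/sh
# Hypothetical helpers (not part of gdb or X): summarise a saved "bt" listing.

# Count all frames, i.e. lines that start with "#<number>".
count_frames() {
    grep -c '^#[0-9]' "$1"
}

# Count frames gdb printed as "in ?? ()", meaning no symbol was found.
count_unresolved_frames() {
    grep -F -c 'in ?? ()' "$1"
}

# Sample input: the backtrace posted in this bug report.
bt_file=$(mktemp)
cat > "$bt_file" <<'EOF'
#0 0x007cab04 in malloc_consolidate () from /lib/libc.so.6
#1 0x007cbe4d in _int_malloc () from /lib/libc.so.6
#2 0x007cd552 in malloc () from /lib/libc.so.6
#3 0x080e523b in Xalloc ()
#4 0x080e5e3d in Xcalloc ()
#5 0xb7f98541 in ?? ()
#6 0x00001000 in ?? ()
#7 0x00000000 in ?? ()
EOF

# prints: 3 of 8 frames unresolved
echo "$(count_unresolved_frames "$bt_file") of $(count_frames "$bt_file") frames unresolved"
rm -f "$bt_file"
```

Every frame above Xcalloc is unresolved, so the caller that requested the allocation cannot be identified from this core file alone.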
(In reply to comment #14) You don't need to recompile. Here are the steps to follow:

1. Install the ElectricFence package.
2. Boot in runlevel 3.
3. Run:
       $ export LD_PRELOAD=/usr/lib/libefence.so
       $ startx

Then try to crash xorg and take a backtrace from the core dump.

Thanks for testing and posting the results. Reviewing the log file unfortunately doesn't show any clues as to what the problem might be. Also, unfortunately, creating a useful backtrace from the X server is a bit more complicated than comment #16. You have to:

1) Rebuild the src.rpm and enable DebugBuild so that symbols are not stripped from the server during rpm packaging. Add ".debug" to the Release field so you know it is a debug build, and also not an official Red Hat build. X does not have a debuginfo package for reasons that I won't go into in the bug report, other than to say it is not easily possible due to the X ELF loader and other ugly factors.
2) Install the newly built debuggable X packages.
3) Edit the config file and add option NoTrapSignals to the ServerFlags section (see the Xorg/xorg.conf manpages for details).
4) Run the server as root, because a SUID process will not produce core files.
5) Make sure ulimit is set to allow core files.
6) Trigger a crash.

At this point things get fun and exciting. Since the X server dynamically loads its modules using its own custom ELF loader, and gdb doesn't have any clue about that loader, normal GNU gdb cannot debug a running X server or make much useful sense of most core dumps in practice, although it is sometimes worthwhile trying. If nothing useful can be obtained that way, then there are 2 options:

1) Compile a statically linked X server with debugging enabled, and try debugging that, or backtracing a core file generated by the static server.
2) Use the customized version of gdb at ftp://people.redhat.com/mharris/hacks, which I no longer maintain or support, and which may or may not be useful in trying to debug the modular X server. It used to work in the RHL 8.0 days or thereabouts, but I stopped using it ages ago. xf86Msg() and friends are what I use mostly nowadays.

At this point, it seems like this is possibly a driver specific issue, or that it at least requires having the hardware in order to do further diagnosis. Unfortunately we do not have this via hardware, and are thus unable to troubleshoot or debug the problem directly any further.

If this issue turns out to still be reproducible in the latest updates for this Fedora Core release, please file a bug report in the X.Org bugzilla located at http://bugs.freedesktop.org in the "xorg" component. Once you've filed your bug report to X.Org, if you paste the new bug URL here, Red Hat will continue to track the issue in the centralized X.Org bug tracker, and will review any bug fixes that become available for consideration in future updates.

Setting status to "NEEDINFO_REPORTER", and awaiting upstream bug report URL for tracking. Thanks in advance.

This problem sounds hardware specific, and we do not have this hardware to attempt to reproduce. Please file a bug in X.Org bugzilla for this issue, at http://bugs.freedesktop.org in the "xorg" component, and attach all relevant details to the X.Org bug report. Once you've filed your bug report to X.Org, if you paste the new bug URL here, Red Hat will continue to track the issue in the centralized X.Org bug tracker, and will review any bug fixes that become available for consideration in future updates.

It's been over 4 months since we've had any feedback on this issue. Unfortunately, we do not have this hardware available in our lab for direct diagnosis, so for any progress to be made we require two-way communication with the reporter, or with someone else who has the hardware directly available and is willing to spend some time troubleshooting.

Has the problem vanished in a more recent Fedora xorg update? If the problem is no longer present, or if there is no longer any interest in tracking this issue, please update the report to indicate the current state of the issue, so we can proceed.

If the issue is still present in the latest Fedora Core 4 updates, I would strongly encourage testing the latest rawhide X, which is most easily done by installing Fedora Core 5 test2. If the problem still exists in the latest X.Org X11 builds in Fedora development, it is probably going to require direct investigation by the upstream via driver maintainers, as they have direct access to the hardware in question. In this case, please file a bug report in X.Org bugzilla, at http://bugs.freedesktop.org in the "xorg" component, detailing the issue and attaching your X server log and config file as individual file attachments.

If there is an X.Org bug for this issue already, or if you file one, please paste the URL here so we can track the issue. If a fix is available from X.Org, we will consider including it in future updates. Thanks in advance.

Sorry for the lack of communication, things are pretty hectic here at the moment :-( The problem is still present with the xorg-x11-6.8.2-37.FC4.49.2 RPMs, but I don't think it's an Xorg issue to be honest: up-to-date FC3 on identical hardware works fine. Maybe a compiler issue?

If this issue still occurs in Fedora Core 5 development, please file a bug report in X.Org bugzilla upstream at http://bugs.freedesktop.org and paste the URL here, and we will track the issue in X.Org bugzilla. Closing "WONTFIX" for FC4.
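For anyone picking this up later, the core-file prerequisites listed earlier in the thread (run as root, core dumps allowed by ulimit, NoTrapSignals set, ElectricFence installed) can be checked before starting the server. This is a minimal sketch, not a tested procedure. The function names are hypothetical, `/etc/X11/xorg.conf` is assumed as the config location, and the NoTrapSignals check is a plain line match that neither verifies the option sits in the ServerFlags section nor inspects its value. It assumes a shell whose `ulimit` accepts `-c` (bash and dash do).

```shell
#!/bin/sh
# Pre-flight checks before trying to capture an X server core file.
# Names and default paths below are illustrative assumptions.

XORG_CONF=${XORG_CONF:-/etc/X11/xorg.conf}        # assumed config location
EFENCE_LIB=${EFENCE_LIB:-/usr/lib/libefence.so}   # path quoted in this thread

# Succeeds if the file has an uncommented Option "NoTrapSignals" line.
# Simple line match only: section and option value are not checked.
has_notrapsignals() {
    grep -i -q '^[[:space:]]*Option[[:space:]]*"NoTrapSignals"' "$1" 2>/dev/null
}

# Succeeds if the current core file size limit is non-zero.
core_dumps_allowed() {
    [ "$(ulimit -c)" != "0" ]
}

# Report every unmet prerequisite; return non-zero if any check failed.
preflight() {
    rc=0
    [ "$(id -u)" -eq 0 ] || { echo "not root: a SUID server will not produce core files"; rc=1; }
    core_dumps_allowed || { echo "core dumps disabled: try 'ulimit -c unlimited'"; rc=1; }
    has_notrapsignals "$XORG_CONF" || { echo "NoTrapSignals not found in $XORG_CONF"; rc=1; }
    [ -r "$EFENCE_LIB" ] || { echo "ElectricFence library not found at $EFENCE_LIB"; rc=1; }
    return $rc
}
```

A possible use from runlevel 3 would be `preflight && export LD_PRELOAD="$EFENCE_LIB" && startx`, which only starts the server once all four checks pass.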