Bug 157868

Summary: X server crashes randomly with signal 11 on epia M10000 board
Product: [Fedora] Fedora
Reporter: Lucas Maneos <redhat>
Component: xorg-x11
Assignee: X/OpenGL Maintenance List <xgl-maint>
Status: CLOSED WONTFIX
QA Contact: David Lawrence <dkl>
Severity: high
Priority: medium
Version: 4
CC: olivier.baudron
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: i686
OS: Linux
Doc Type: Bug Fix
Last Closed: 2006-03-07 11:59:46 UTC
Attachments:
Description | Flags
xorg.conf | none
Xorg.0.log.old - X server log after crash | none
Xorg.0.log - X server log while server is still running | none
/var/log/messages contents | none
/var/log/dmesg | none
Sysreport output | none
xorg log from 6.8.2-37.FC4.45 | none

Description Lucas Maneos 2005-05-16 16:11:23 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux ppc; en-US; rv:1.7.7) Gecko/20050421 Firefox/1.0.3 (Debian package 1.0.3-2)

Description of problem:
The X server is crashing randomly on this machine (epia M10000 motherboard, VIA CLE266 chip).  I'm not entirely sure whether the hardware is relevant, since the problem occurs with both the 'via' and 'vesa' drivers.  Previously this box ran RH9 for years with no such problem.  An identical machine with FC3 also works fine.

I haven't managed to identify any particular circumstances that will always trigger the crash, although firefox seems to trigger it often.  Also, running twm seems much more stable than metacity - perhaps it's something to do with pixmaps?


Version-Release number of selected component (if applicable):
xorg-x11-6.8.2-31

How reproducible:
Sometimes

Steps to Reproduce:
1. Start an X session (runlevel 5 or startx from the console, doesn't make a difference)
2. Run some programs and wait for it to crash.  Firefox seems to trigger it often.

Actual Results:  X server crashes, console is stuck in graphics mode and unusable.

Expected Results:  Stable X operation.

Additional info:

Xorg.0.log shows the following:

Fatal server error:
Caught signal 11.  Server aborting

(and then suggests checking itself for additional information)

The kernel doesn't log anything, so the problem appears to be entirely in user space.
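
As a rough sketch of how the log check described above could be scripted, the function below pulls the signal number out of the "Caught signal N." line quoted in this report. The function name x_crash_signal is made up for illustration, and the log location in the usage note is only the usual default, not something confirmed for this machine.

```shell
# Print the signal number from an X server log's "Caught signal N." line.
# Prints nothing if the log contains no such line.
# (x_crash_signal is a hypothetical helper name, not part of any X tool.)
x_crash_signal() {
    sed -n 's/^Caught signal \([0-9][0-9]*\)\..*/\1/p' "$1"
}
```

Possible usage, assuming the default log location: x_crash_signal /var/log/Xorg.0.log.old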

Comment 1 Mike A. Harris 2005-05-16 16:32:20 UTC
Please try to narrow the problem down using a single video driver (via)
to specific steps to reproduce.  If there is a particular web page that
triggers this, or specific action in firefox, ie: moving the scroll bars,
or somesuch, this would be useful to know.

Please attach sysreport, X server log, X config file, /var/log/messages,
output of lsmod from after the X server is started - as individual
file attachments using the link below.

Setting status to "NEEDINFO"

Comment 2 Mike A. Harris 2005-05-16 16:34:00 UTC
Also, I notice this is reported for "i386", but bugzilla says your
web browser is PPC.  Is it just a different machine you're reporting it
from, or should it be marked as a "ppc" problem?

Comment 3 Lucas Maneos 2005-05-16 17:36:20 UTC
Created attachment 114430
xorg.conf

I've commented out modules fbdevhw, glx, record and dri but it hasn't made a
difference.

Comment 4 Lucas Maneos 2005-05-16 17:37:07 UTC
Created attachment 114431
Xorg.0.log.old - X server log after crash

Comment 5 Lucas Maneos 2005-05-16 17:40:13 UTC
Created attachment 114432
Xorg.0.log - X server log while server is still running

Not sure if this is significant, but note that 'Frame buffer start' is
different every time the server starts.

Comment 6 Lucas Maneos 2005-05-16 17:41:39 UTC
Created attachment 114433
/var/log/messages contents

All syslog output (*.*) between X server startup and crash.

Comment 7 Lucas Maneos 2005-05-16 17:42:12 UTC
Created attachment 114434
/var/log/dmesg

Comment 8 Lucas Maneos 2005-05-16 17:47:54 UTC
Sysreport output to follow later (it seems to be running an rpm -Va, which seems
to be running prelink which takes ages).

I am indeed reporting from another machine, the problem is on i386.

I still haven't managed to isolate a specific thing that will trigger the crash,
but using firefox/mozilla for a few minutes seems to do it every time.

Comment 9 Lucas Maneos 2005-05-16 19:45:58 UTC
Created attachment 114438
Sysreport output

Comment 10 Lucas Maneos 2005-05-17 14:09:33 UTC
The problem appears to be, at least in part, windowmanager-related.  I had been
running twm since yesterday afternoon without incident, whereas after switching
to metacity or WindowMaker the X server crashes again after a few minutes (the
crash frequency is actually much higher with wmaker).

Now running a kde session (for the last half hour or so) and it seems stable so far.

Comment 11 Lucas Maneos 2005-05-17 19:52:37 UTC
Managed to get a core file, but a stack backtrace isn't very illuminating:

(gdb) bt
#0  0x007cab04 in malloc_consolidate () from /lib/libc.so.6
#1  0x007cbe4d in _int_malloc () from /lib/libc.so.6
#2  0x007cd552 in malloc () from /lib/libc.so.6
#3  0x080e523b in Xalloc ()
#4  0x080e5e3d in Xcalloc ()
#5  0xb7f98541 in ?? ()
#6  0x00001000 in ?? ()
#7  0x00000000 in ?? ()
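
One way to quantify how little a backtrace like the one above resolves is to count the frames gdb prints as "?? ()". The sketch below does only that; the helper name unresolved_frames is hypothetical, and it assumes the backtrace text has been saved to a file or is piped in. A crash inside malloc_consolidate often points at heap corruption that happened earlier, so the unresolved frames above Xcalloc are likely the ones that matter, though that is an inference rather than something this backtrace proves.

```shell
# Count backtrace frames that gdb could not resolve to a symbol ("?? ()").
# Reads gdb "bt" output from the named file, or from stdin if none is given.
# (unresolved_frames is a hypothetical helper name.)
unresolved_frames() {
    grep -cF 'in ?? ()' "$@"
}
```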


Is there a xorg-x11-debuginfo RPM somewhere?


Comment 12 Lucas Maneos 2005-05-22 11:05:24 UTC
The driver might not be relevant actually.  Just stuck an old s3 virge dx card
in the box, and it still segfaulted after a few minutes of firefox use.

May or may not be relevant: display was corrupted in all modes above 640x480 no
matter what driver options I tried, and kudzu probing the card produced a kernel
oops.


Comment 13 Olivier Baudron 2005-09-01 12:06:06 UTC
Can you try with the latest FC4 update (6.8.2-37.FC4.45)?

Also, the backtrace would look better if you had linked with libefence.
Can you try it?

Comment 14 Lucas Maneos 2005-09-04 09:34:27 UTC
Ran for ~ 1 hour before crashing with signal 11, so definitely an improvement.

How would I go about linking with libefence?  If someone could provide an
appropriate RPM it would be a great help as building xorg on this box takes
quite a while.

Comment 15 Lucas Maneos 2005-09-04 09:37:26 UTC
Created attachment 118436
xorg log from 6.8.2-37.FC4.45

Comment 16 Olivier Baudron 2005-09-04 18:41:09 UTC
(In reply to comment #14)

You don't need to recompile. Here are the steps to follow:
1. Install the ElectricFence package.
2. Boot in runlevel 3
3. $ export LD_PRELOAD=/usr/lib/libefence.so
   $ startx
Then try to crash xorg and backtrace in the coredump.
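
A caveat on the LD_PRELOAD step, sketched under assumptions: on Linux with glibc, a setuid program started by a non-root user runs in secure-execution mode, where the dynamic linker ignores LD_PRELOAD entries whose path contains a slash. If the X server binary on the affected system is setuid root, the preload would therefore only take effect when the session is started as root. The helper below is a simplified check of that condition; the name preload_applies and the optional uid argument are made up for illustration, and it ignores edge cases such as setuid binaries owned by the calling user.

```shell
# Succeed (return 0) if LD_PRELOAD is expected to reach the given binary.
# Usage: preload_applies /path/to/binary [uid]   (uid defaults to the caller's)
# (preload_applies is a hypothetical helper; this is a simplified check.)
preload_applies() {
    bin=$1
    uid=${2:-$(id -u)}
    # A setuid binary started by a non-root user runs in secure-execution
    # mode, where LD_PRELOAD paths containing "/" are ignored.
    if [ -u "$bin" ] && [ "$uid" -ne 0 ]; then
        return 1
    fi
    return 0
}
```

The binary path to pass in depends on the installation and is not stated in this report.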

Thanks for testing and posting the results.

Comment 17 Mike A. Harris 2005-09-14 09:31:01 UTC
Reviewing the log file unfortunately doesn't show any clues as to what
the problem might be.

Also, unfortunately...  creating a useful backtrace from the X server is
a bit more complicated than comment #16.  You have to:

1) Rebuild the src.rpm and enable DebugBuild so that symbols are not
   stripped from the server during rpm packaging.  Add ".debug" to the
   Release field so you know it is a debug build, and also not an
   official Red Hat build.  X does not have a debuginfo package for
   reasons that I won't go into in the bug report other than to say
   it is not easily possible due to the X ELF loader, and other ugly
   factors.

2) Install the newly built debuggable x packages.

3) Edit the config file and add option NoTrapSignals to the serverflags
   section (see Xorg/xorg.conf manpages for details)

4) Run the server as root, because SUID process will not produce core
   files.

5) Make sure ulimit is set to allow corefiles.

6) Trigger a crash.
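
Steps 3 and 5 above can be sanity-checked before triggering the crash. The following is a minimal sketch, not an official procedure: the helper names has_notrapsignals and cores_allowed are hypothetical, the first only looks for an uncommented Option "NoTrapSignals" line and does not verify that it sits inside Section "ServerFlags", and the second only inspects the current shell's limit.

```shell
# Step 3: succeed if the given xorg.conf has an uncommented
# Option "NoTrapSignals" line (it belongs in Section "ServerFlags").
# (has_notrapsignals is a hypothetical helper name.)
has_notrapsignals() {
    grep -Eqi '^[[:space:]]*Option[[:space:]]+"NoTrapSignals"' "$1"
}

# Step 5: succeed if the current shell's core file size limit is non-zero.
# "ulimit -c unlimited" raises it for this shell and the processes it starts.
# (cores_allowed is a hypothetical helper name.)
cores_allowed() {
    [ "$(ulimit -c)" != "0" ]
}
```

Possible usage from the root shell that will start the server, assuming the config file lives at /etc/X11/xorg.conf: has_notrapsignals /etc/X11/xorg.conf && cores_allowed && echo "ready to reproduce"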


At this point things get fun and exciting.  Since the X server dynamically
loads its modules using its own custom ELF loader, and gdb doesn't have
any clue about the X server's custom ELF loader, normal GNU gdb does not
have the ability to debug a running X server or make much useful sense
out of most core dumps in practice, although it is sometimes worthwhile
trying.

If nothing useful can be obtained that way, then there are 2 options:

1) Compile a statically linked X server with debugging enabled, and try
   debugging that, or backtracing a corefile generated by the static server.

or

2) ftp://people.redhat.com/mharris/hacks has a customized version of gdb
   which I no longer maintain or support, which may or may not be useful
   in trying to debug the modular X server.  It used to work in the RHL 8.0
   days or thereabouts, but I stopped using it ages ago.  xf86Msg() and
   friends are what I use mostly nowadays.


At this point, it seems like this is possibly a driver specific issue,
or that it at least requires having the hardware in order to do further
diagnosis.  Unfortunately we do not have this via hardware, and are thus
unable to troubleshoot or debug the problem directly any further.

If this issue turns out to still be reproducible in the latest
updates for this Fedora Core release, please file a bug report
in the X.Org bugzilla located at http://bugs.freedesktop.org in
the "xorg" component.

Once you've filed your bug report to X.Org, if you paste the new
bug URL here, Red Hat will continue to track the issue in the
centralized X.Org bug tracker, and will review any bug fixes
that become available for consideration in future updates.

Setting status to "NEEDINFO_REPORTER", and awaiting upstream
bug report URL for tracking.

Thanks in advance.


Comment 18 Mike A. Harris 2005-09-26 20:51:38 UTC
This problem sounds hardware specific, and we do not have this hardware
to attempt to reproduce.  Please file a bug in X.Org bugzilla for this
issue, and attach all relevant details to the X.org bug report.

http://bugs.freedesktop.org in the "xorg" component.

Once you've filed your bug report to X.Org, if you paste the new
bug URL here, Red Hat will continue to track the issue in the
centralized X.Org bug tracker, and will review any bug fixes that
become available for consideration in future updates.

Comment 19 Mike A. Harris 2006-01-18 02:01:30 UTC
It's been over 4 months since we've had any feedback on this issue.
Unfortunately, we do not have this hardware available in our lab for
direct diagnosis, so we require a two-way communication link with the
reporter, or someone else who has the hardware directly available to
them to diagnose, who is willing to spend some time troubleshooting
in order for any progress to be made.

Has the problem vanished in a more recent Fedora xorg update?  If the
problem is no longer present, or if there is no longer any interest in
tracking this issue, please update the report to indicate the current
state of the issue, so we can proceed.

If the issue is still present in the latest Fedora Core 4 updates, I
would strongly encourage testing of the latest rawhide X, which is most
easily done by installing Fedora Core 5 test2.

If the problem still exists in the latest X.Org X11 builds in Fedora
development, it is probably going to require direct investigation by
the upstream via driver maintainers, as they have access directly to
the hardware in question.  In this case, please file a bug report
in X.Org bugzilla, at http://bugs.freedesktop.org in the "xorg" component,
detailing the issue, and attaching your X server log and config file
as individual file attachments.

If there is an X.Org bug for this issue already, or if you file one,
please paste the URL here so we can track the issue.  If a fix is
available from X.org, we will consider including it in future updates.

Thanks in advance.

Comment 20 Lucas Maneos 2006-01-22 17:45:03 UTC
Sorry for the lack of communication, things are pretty hectic here at the moment :-(

The problem is still present with xorg-x11-6.8.2-37.FC4.49.2 RPMs, but I don't
think it's an Xorg issue to be honest - up-to-date FC3 on identical hardware
works fine.  Maybe a compiler issue?

Comment 21 Mike A. Harris 2006-03-07 11:59:46 UTC
If this issue still occurs in Fedora Core 5 development, please file a bug
report in X.Org bugzilla upstream at http://bugs.freedesktop.org and paste
the URL here, and we will track the issue in X.Org bugzilla.

Closing "WONTFIX" for FC4.