Bug 90615

Summary: X spontaneously exits
Product: [Retired] Red Hat Linux
Reporter: Need Real Name <dgl>
Component: XFree86
Assignee: X/OpenGL Maintenance List <xgl-maint>
Status: CLOSED CURRENTRELEASE
QA Contact: David Lawrence <dkl>
Severity: high
Priority: low
Version: 9
CC: ivan.makfinsky, menscher, plazonic
Target Milestone: ---   
Target Release: ---   
Hardware: i686   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2004-10-01 05:02:15 UTC
Attachments:
Description                                             Flags
The beginning and end of XFree86.0.log after the crash  none
XF86Config file                                         none
XFree86.0.log                                           none

Description Need Real Name 2003-05-11 03:42:25 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20030225

Description of problem:
Twice under redhat 9, I've left my computer with X windows running and returned
to find it at the console prompt.

Version-Release number of selected component (if applicable):


How reproducible:
Didn't try

Steps to Reproduce:
1.startx
2.???
3.
    

Actual Results:  X dies

Expected Results:  X doesn't die

Additional info:

Comment 1 Need Real Name 2003-05-11 03:44:18 UTC
Created attachment 91603
The beginning and end of XFree86.0.log after the crash

Comment 2 Need Real Name 2003-05-11 03:46:40 UTC
Created attachment 91604
XF86Config file

Comment 3 Mike A. Harris 2003-05-11 04:18:22 UTC
Your bug report doesn't contain any useful information that could be used
to determine what the problem is.  Also, you are not using a Red Hat supplied
kernel.

Please install the latest Red Hat official kernel for Red Hat Linux 9 from
Red Hat Network by using "up2date -f kernel".  While using this kernel,
please reproduce the problem, and provide detailed step by step instructions
on how someone else can easily reproduce this.

Then please attach the new X server log file from the failure case running
under the official Red Hat kernel, and we can try to investigate the problem
further.

Thanks in advance.

Comment 4 Mike A. Harris 2003-05-11 04:19:23 UTC
Also, please attach your /var/log/messages from when this problem occurs, as
well as the output of "lsmod" from while X is running.

Comment 5 Need Real Name 2003-05-11 14:34:24 UTC
I'm sure the bug has nothing to do with the kernel ... the first time the
problem happened, I was using the official redhat kernel.  Since then, I found
what I believe is a bug in that kernel (see bug 90462), so I reverted to the
2.4.20 kernel.  I am changing back to the official redhat kernel using up2date
-f kernel (although I got an error when I did that "error: db4 error(-30989)
from dbcursor->c_get: DB_PAGE_NOTFOUND: Requested page not found", I'm hoping it
still worked).

Sorry the info I sent you wasn't useful... I was hoping the
"bufferglEnable(GL_STENCIL_TEST) but no stencil" message, which repeated 12852
times, might be a clue.  I didn't realize the debugging content of the
XFree86.0.log file was a function of the kernel.  Whenever I'm not using
pthreads, I'll use the redhat kernel and see if this problem repeats.
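
For anyone who wants to check their own log for a flood of identical messages like the one mentioned above, here is a minimal sketch in Python. The function name and the `min_count` threshold are made up for illustration; it only counts identical lines and says nothing about what caused them.

```python
from collections import Counter


def count_repeated_lines(lines, min_count=2):
    """Return (line, count) pairs for lines that occur at least min_count times.

    Blank lines are ignored; the most frequent line comes first.
    """
    counts = Counter(line.strip() for line in lines if line.strip())
    return [(text, n) for text, n in counts.most_common() if n >= min_count]


# Hypothetical usage against the server log:
#   with open("/var/log/XFree86.0.log") as log:
#       for text, n in count_repeated_lines(log, min_count=100):
#           print(n, text)
```

A single line repeated thousands of times shows up at the top of the result, which makes it easy to quote the exact count in a report.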

Both times I saw the problem, I was not at my computer when it happened.  The
second time, I know I had enabled the screen saver... I can't remember if I had
enabled it before the first crash, but I think I did.  Is it possible for a
problem with the screensaver to cause X to crash?  If so, is there anything I
can do to provide debugging info for the screensaver?

Comment 6 Need Real Name 2003-05-11 15:35:21 UTC
The /var/log/messages file has no more useful info from the time of the crash. 
The syslogd restarted at 4am.  :(  The lsmod output (still from the 2.4.20
kernel) follows:

[root@cartman dgl]# /sbin/lsmod 
Module                  Size  Used by    Not tainted
ide-cd                 33668   0  (autoclean)
cdrom                  33696   0  (autoclean) [ide-cd]
parport_pc             19044   1  (autoclean)
lp                      8996   0  (autoclean)
parport                37056   1  (autoclean) [parport_pc lp]
autofs                 13364   0  (autoclean) (unused)
via-rhine              15760   1 
mii                     3912   0  [via-rhine]
ohci1394               20136   0  (unused)
ieee1394               47020   0  [ohci1394]
nls_iso8859-1           3516   1  (autoclean)
nls_cp437               5116   1  (autoclean)
vfat                   13068   1  (autoclean)
fat                    38840   0  (autoclean) [vfat]
keybdev                 2944   0  (unused)
mousedev                5492   1 
hid                    22148   0  (unused)
input                   5728   0  [keybdev mousedev hid]
usb-uhci               26316   0  (unused)
ehci-hcd               17480   0  (unused)
usbcore                77600   1  [hid usb-uhci ehci-hcd]
ext3                   70144   3 
jbd                    51540   3  [ext3]

I will send the output from the 2.4.20-9 kernel after I reboot.  If and when I
see the problem again, I'll capture all of the requested information.  Perhaps
you should change the error output of xfree86 to request /var/log/messages in
addition to /var/log/XFree86.0.log if you need that info.

Comment 7 Mike A. Harris 2003-05-11 15:47:05 UTC
>I'm sure the bug has nothing to do with the kernel ...

The problem you are experiencing may very well have nothing to do with the
kernel at all, and it may theoretically occur under all kernels.  However,
Red Hat only provides support for systems which are running the official
Red Hat kernel, and there is a real possibility that a custom built kernel,
a kernel obtained elsewhere, or a kernel which has loaded proprietary or 3rd
party modules may be causing system problems of all sorts.

As such, when problems occur on a system, users who are not running the
official Red Hat kernel at the time may, depending on the nature of the
problem, be asked and/or required to reproduce it with an official Red Hat
kernel.  This is very important, in order to narrow the problem domain down
to the officially supported operating system components.

>(although I got an error when I did that "error: db4 error(-30989)
>from dbcursor->c_get: DB_PAGE_NOTFOUND: Requested page not found", I'm hoping
>it still worked).

That sounds like either a corrupt rpm database, or a bug in rpm.  You can
obtain help/technical support via the Red Hat mailing lists for that problem
if you require assistance, or possibly from your support representative if
you have a support contract with us.

>Sorry the info I sent you wasn't useful... I was hoping the
>"bufferglEnable(GL_STENCIL_TEST) but no stencil" message, which repeated 12852
>times, might be a clue. 

It may turn out to be useful perhaps, but without additional information
it is too early to say for sure.  I do not have access to any S3 Savage
hardware, so unfortunately I can't just fire up X myself on a Savage and
hope to be able to reproduce this problem.  Assuming there is an XFree86
bug occurring here, troubleshooting it effectively will likely require
someone with physical access to the hardware and the ability to debug X
related problems to investigate directly.  It will also require the problem
to be narrowed down to an easily reproducible test case first.

I might be able to help narrow it down, but I can't debug the hardware or
the driver personally.  Someone else will have to do that.

>I didn't realize the debugging content of the
>XFree86.0.log file was a function of the kernel.

I'm not sure what led you to believe that, but the XFree86 log file
has nothing to do with the kernel.  If you're referring to my request for
you to attach /var/log/messages, this is because debugging problems on
hardware to which you have no physical access is very highly dependent on
receiving as much information about the problematic system as possible.  Every
piece of data received contributes to a useful pool of information in which
some clues may be provided that can solve the problem.

A bug report with little or no concrete information, no debugging, and no
specific details on how to reproduce the problem or a 100% reproducible
test case, filed against hardware which the developers investigating the
matter don't have access to, is unfortunately next to impossible to even
investigate.

It's a process of gathering data, refining that data, offering suggestions
to the user having the problem on how to further narrow it down, and
repeating that process until a hypothesis can be made as to what the real
problem is, and perhaps attempt a code change somewhere to try to resolve
the issue.

If there is something missing from that process, then sometimes all we
can really do, is wait and hope a future release of XFree86, or of a given
video driver, just happens to fix the problem someone is experiencing.


>Is is possible for a problem with the screensaver to cause X to crash?

Theoretically, a bug can occur anywhere in a piece of code such as the
X server, so it is definitely possible for some codepath to be buggy enough
to crash if the right sequence of events happens.  A screensaver could thus
theoretically trigger some codepath that other software doesn't trigger
perhaps, and thus cause a crash to happen.  In the context of screensavers,
in problems that have been reported in the past, this occurs most frequently
with OpenGL screensavers on video hardware which XFree86 has DRI 3D
acceleration support for.  The simple test for such problems is to disable
DRI support and see if 3D crash problems go away.  The savage driver does not
contain 3D acceleration support however so that isn't a possibility here,
although your error messages do show 3D related errors.

>If so, is there anything I can do provide debugging info for the
>screensaver?

If the X server is crashing, then the problem isn't in the screensaver
itself.  It could be a number of things, possibly 2D acceleration problems,
possibly hardware problems, or a variety of other things.  More information
is required to really be able to make a solid assessment of the problem.

Again, without physical hardware access, as many other possibilities which
could be causing the problem need to be ruled out first.

Since these types of bug reports can often end up remaining open for very
long periods of time due to the various difficulties mentioned above in trying
to find a solution, you may wish to try and find a workaround in the interim
which may be adequate enough for now until the specific nature of this problem
you are experiencing is understood and can be explored more deeply.

Here are some suggestions which you can try out which may or may not be
adequate workarounds.  They may provide valuable clues as to what the problem
might be also:

- Try disabling 2D acceleration by using Option "noaccel" and/or by
  experimenting with the various XaaNo options described on the XF86Config
  manpage.

- Disable all OpenGL screen savers completely, or even disable the screensaver
  itself entirely.  Or, pick a single screensaver out of the bunch, instead
  of "random".

- Try using the "vesa" driver, which is unaccelerated and slow.

Those are some options you can try which, if this is really an XFree86
bug, may work around it.  Some of the options may also work around
hardware flaws, bad video memory, and other possible problems.

Please provide any updated info you can over time, and hopefully we can
narrow things down and come up with a better assessment of the problem and
possible fixes.

Thanks.

Comment 8 Mike A. Harris 2003-05-11 15:51:06 UTC
>The /var/log/messages file has no more useful info from the time of the crash. 

You don't know what I am looking for in the output of these files.  It is easier
for you to attach them and let me determine if they contain information useful
to me for troubleshooting purposes, than it is for me to explain the many
things that I might be looking for.  When in doubt, attach more information
and let developers work it out.  ;o)

Also, when I'm requesting such information, I want only information such as
lsmod and /var/log/messages obtained while you are booted into a Red Hat
supplied official kernel.  This is very important.



Comment 9 Need Real Name 2003-05-11 17:04:51 UTC
Here is the output from /sbin/lsmod with the redhat kernel running:

[dgl@cartman dgl]$ /sbin/lsmod 
Module                  Size  Used by    Not tainted
ide-cd                 35708   0 (autoclean)
cdrom                  33728   0 (autoclean) [ide-cd]
parport_pc             19076   1 (autoclean)
lp                      8996   0 (autoclean)
parport                37056   1 (autoclean) [parport_pc lp]
autofs                 13268   0 (autoclean) (unused)
via-rhine              15856   1
mii                     3976   0 [via-rhine]
microcode               4668   0 (autoclean)
ohci1394               20136   0 (unused)
ieee1394               48780   0 [ohci1394]
nls_iso8859-1           3516   1 (autoclean)
nls_cp437               5116   1 (autoclean)
vfat                   13004   1 (autoclean)
fat                    38808   0 (autoclean) [vfat]
keybdev                 2944   0 (unused)
mousedev                5492   1
hid                    22148   0 (unused)
input                   5856   0 [keybdev mousedev hid]
usb-uhci               26348   0 (unused)
ehci-hcd               19976   0 (unused)
usbcore                78816   1 [hid usb-uhci ehci-hcd]
ext3                   70784   2
jbd                    51892   2 [ext3]
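
To compare this listing with the one in comment 6 without reading both by eye, a minimal sketch in Python follows. The function names are made up for illustration, and the parsing assumes the 2.4-era lsmod layout shown here (module name first, size second). Applied to the two listings in this report it should show microcode as the only module present under the Red Hat kernel but not under 2.4.20.

```python
def module_names(lsmod_output):
    """Return the set of module names found in lsmod output text."""
    names = set()
    for line in lsmod_output.splitlines():
        fields = line.split()
        # Keep only rows of the form "<name> <size> ...". This skips the
        # "Module Size Used by" header and any wrapped continuation lines.
        if len(fields) >= 2 and fields[1].isdigit():
            names.add(fields[0])
    return names


def compare_modules(before, after):
    """Return (added, removed) module names going from before to after."""
    old, new = module_names(before), module_names(after)
    return sorted(new - old), sorted(old - new)
```

Only module names are compared; size differences between kernels are ignored on purpose, since they are expected.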

As for the /var/log/messages file, it may very well have had useful information
after the crash, but by the time I got your email this morning, the syslogd had
restarted overnight and /var/log/messages was empty.

As for your lack of access to S3 Savage hardware, I purchased this computer from
walmart.com.  I did so because they allow you to buy semi-custom computers
without a M$ operating system.  Therefore, I suspect there are many others in
the Linux community that also have this Via motherboard which has onboard video
in the form of S3 Savage.  Perhaps redhat could get a free motherboard from Via
or Walmart?  Just a thought.

OK... I'll stop bugging you until I've repeated the problem.  ;)


Comment 11 Josko Plazonic 2003-05-29 14:58:14 UTC
Actually, this seems like a persistent problem one of my people is seeing on a
Radeon VE used with 2 monitors (hence Xinerama is used).  It seems to be triggered
by xscreensaver and I strongly suspect it is 3D; in fact, after turning off GLX
he can't get it to crash anymore.  Symptoms are very similar: leave the desk
and quite often come back to a logged out state.  The only thing logged in the
XFree86.0.log file is signal 11.  Mysteriously, it happens often when logging on
via gdm and takes a lot longer to happen when using startx.

Now, I know this is pretty much useless for debugging but I'd like some pointers
on how to do it effectively.  I am currently trying to trigger the bug on
another machine after setting ulimit to unlimited for crash dumps, as a common
user.  Any hope something will get dumped if I can manage to trigger it?  Any
other suggestions on how to do it properly, like flags to tell xfree to dump
core or anything like it?  I am far from afraid to dig into X source code or
anything to help, it's just that a beast like X is not as easy as some other
software to debug (e.g. can't imagine how to run it under gdb?)....  May I s

I should say - tried 4.3.0-10 with no improvement.  Would using the -12 (that
seems to have debugging symbols) help in debugging? 
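
For reference, the "ulimit to unlimited" step described above can be sketched in Python as follows. This is only an illustration of raising the core-file limit and passing it on to a child process; the function names are made up, and it is an open question whether a setuid-root X server started by an ordinary user will actually write a core file even with the limit raised.

```python
import resource
import subprocess


def raise_core_limit():
    """Raise the soft core-file size limit to the current hard limit.

    Returns the (soft, hard) pair in effect afterwards.
    """
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


def run_with_core_limit(argv):
    """Run argv as a child process; children inherit this process's limits."""
    raise_core_limit()
    return subprocess.call(argv)


# Hypothetical usage: run_with_core_limit(["startx"])
```

This matches "ulimit -c unlimited" in a shell only when the hard limit is itself unlimited; otherwise the soft limit is raised as far as the hard limit allows.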

btw, thanks for great work Mike

Comment 12 Ivan Makfinsky 2003-06-11 19:13:46 UTC
I can also confirm that this happens on two sets of hardware I have running.
Both exit with signal 11 randomly. One is a P4 with a Matrox G450 dual-headed
card running Xinerama and the other a P4 with an Nvidia TNT card. Both are
running KDE with gdm login and both have screensavers enabled. On the Matrox
dual-headed machine I can successfully get it to crash by running the molecule
OpenGL screensavers; however, I cannot reproduce it on the Nvidia card. As a matter of
fact, I have had the Nvidia machine crash while I was working on it, running
xmms, mozilla, terminal, kde, kmix and several applets. Both times nothing was
logged in either /var/log/messages or /var/log/XFree86.0.log.old except the
following:

(**) Mouse0: ZAxisMapping: buttons 4 and 5
(**) Mouse0: Buttons: 5
(II) Keyboard "Keyboard0" handled by legacy driver
(II) XINPUT: Adding extended input device "Mouse0" (type: MOUSE)
(II) Mouse0: ps2EnableDataReporting: succeeded

   *** If unresolved symbols were reported above, they might not
   *** be the reason for the server aborting.

Fatal server error:
Caught signal 11.  Server aborting


When reporting a problem related to a server crash, please send
the full server output, not just the last messages.
This can be found in the log file "/var/log/XFree86.0.log".
Please report problems to xfree86.

I can provide more log files, just let me know what to provide. Also, I am going
to disable xscreensaver completely on both machines and see if that helps any, as I
suspect it might.

Comment 13 Damian Menscher 2004-01-14 19:15:30 UTC
One of my users has been plagued by this bug.  About one X crash a 
week for the past 2-3 months.  I finally got lucky and saw 
the "bufferglEnable(GL_STENCIL_TEST) but no stencil" message which 
led me here.

Like everyone else, I suspect that XScreenSaver is the trigger (since 
it always fails when nobody is around).  I'm running a fairly 
standard hardware/software configuration: Intel motherboard, dual P4 
(with HT enabled), NVIDIA GeForce2 graphics, RH stock kernel, default 
XF86Config file.

My lsmod is even pretty generic:
# lsmod                
Module                  Size  Used by    Not tainted
es1371                 34952   0  (autoclean)
gameport                3508   0  (autoclean) [es1371]
ac97_codec             14696   0  (autoclean) [es1371] 
soundcore               7044   4  (autoclean) [es1371]
ide-cd                 35808   0  (autoclean)
cdrom                  34176   0  (autoclean) [ide-cd]
parport_pc             19204   1  (autoclean)
lp                      9188   0  (autoclean)
parport                39072   1  (autoclean) [parport_pc lp]
nfsd                   81104   8  (autoclean)
lockd                  59536   1  (autoclean) [nfsd]
sunrpc                 87516   1  (autoclean) [nfsd lockd]
e100                   56356   1 
ipt_REJECT              3992   2  (autoclean)
ipt_LOG                 4280   1  (autoclean)
ipt_limit               1688   5  (autoclean)
ipt_mac                 1208  16  (autoclean)
ipt_state               1080   1  (autoclean)
ip_conntrack           29896   1  (autoclean) [ipt_state]
iptable_filter          2412   1  (autoclean)
ip_tables              15864   6  [ipt_REJECT ipt_LOG ipt_limit ipt_mac ipt_state iptable_filter]
loop                   12888   0  (autoclean)
keybdev                 2976   0  (unused)
mousedev                5688   1
hid                    22404   0  (unused)
input                   6208   0  [keybdev mousedev hid]
usb-uhci               27468   0  (unused)
ehci-hcd               20584   0  (unused)
usbcore                82816   1  [hid usb-uhci ehci-hcd]
ext3                   73376  11  
jbd                    56368  11  [ext3]
lvm-mod                64544  21
3w-xxxx                40128   3
sd_mod                 13452   6  
scsi_mod              110872   2  [3w-xxxx sd_mod]

Here are the potentially-relevant lines from /var/log/messages.  To 
make it easier to follow, the name/IP of the machine are 
astro/130.126.8.170 and the affected user is shapiro.  He reports it 
was hung when he returned to his office at 9:13.  He fixed it via a 
<Ctrl><Alt>-<Backspace> at 9:15.

Jan 13 12:28:03 astro ypserv[805]: refused connect from 130.126.8.170:36784 to procedure ypproc_match (astro-theory,shadow.byname;-1)
Jan 13 13:26:12 astro ypserv[805]: refused connect from 130.126.8.170:36784 to procedure ypproc_match (astro-theory,shadow.byname;-1)
Jan 13 13:31:55 astro ypserv[805]: refused connect from 130.126.8.170:36807 to procedure ypproc_match (astro-theory,shadow.byname;-1)
Jan 13 13:54:52 astro ypserv[805]: refused connect from 130.126.8.170:36821 to procedure ypproc_match (astro-theory,shadow.byname;-1)
Jan 13 15:23:03 astro ypserv[805]: refused connect from 130.126.8.170:36838 to procedure ypproc_match (astro-theory,shadow.byname;-1)
Jan 13 17:18:49 astro ypserv[805]: refused connect from 130.126.8.170:36863 to procedure ypproc_match (astro-theory,shadow.byname;-1)
Jan 13 17:59:02 astro ypserv[805]: refused connect from 130.126.8.170:36863 to procedure ypproc_match (astro-theory,shadow.byname;-1)
Jan 13 23:26:14 astro ypserv[805]: refused connect from 130.126.8.170:36894 to procedure ypproc_match (astro-theory,shadow.byname;-1)
Jan 14 04:06:25 ontario kernel: nfs: server astro not responding, still trying
Jan 14 04:06:27 ontario kernel: nfs: server astro OK
Jan 14 08:44:36 astro ypserv[805]: refused connect from 130.126.8.170:36906 to procedure ypproc_match (astro-theory,shadow.byname;-1)
Jan 14 09:12:13 astro ypserv[805]: refused connect from 130.126.8.170:36928 to procedure ypproc_match (astro-theory,shadow.byname;-1)
Jan 14 09:15:29 astro gdm(pam_unix)[1145]: session closed for user shapiro
Jan 14 09:15:30 astro gdm[1145]: gdm_slave_xioerror_handler: Fatal X error - Restarting :0
Jan 14 09:15:30 astro modprobe: modprobe: Can't locate module char-major-10-134
Jan 14 09:15:42 astro gdm(pam_unix)[15106]: session opened for user shapiro by (uid=0)
Jan 14 09:15:45 astro ypserv[805]: refused connect from 130.126.8.170:36928 to procedure ypproc_match (astro-theory,shadow.byname;-1)
Jan 14 09:15:47 astro kernel: cdrom: This disc doesn't have any tracks I recognize!

Interestingly, I have 8 identical machines, and only 1 causes this 
problem.  Maybe other users just enable different screensavers.

Please let me know if there's any other information I can provide.


Comment 14 Damian Menscher 2004-01-14 19:20:40 UTC
Created attachment 96987
XFree86.0.log

Log file showing the problem.  Note the large section of repeated
"bufferglEnable(GL_STENCIL_TEST) but no stencil" errors.

Comment 15 Damian Menscher 2004-01-26 00:11:59 UTC
Thought I'd provide a little more insight here:

Unchecking the "Power Management Enabled" box in XScreenSaver (or 
setting "dpmsEnabled: False" in the ~/.xscreensaver) caused the 
problem to go away for my user.

I tried testing with my own account by enabling DPMS support, but was 
unable to come up with a reliable testcase to reproduce the problem.  
Still, it would be interesting to find out if others experiencing the 
problem have power management enabled in their screensavers.
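
For others who want to check this without opening the XScreenSaver dialog, here is a minimal sketch in Python that reads the "dpmsEnabled" line from the text of a ~/.xscreensaver file. The function names are made up for illustration, and the parsing is a simplification that assumes one "name: value" setting per line and "#" for comments.

```python
def read_xscreensaver_setting(text, key):
    """Return the value of the first "key: value" line in the text, or None."""
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#") or ":" not in line:
            continue
        name, _, value = line.partition(":")
        if name.strip() == key:
            return value.strip()
    return None


def dpms_enabled(text):
    """True only if the text contains dpmsEnabled set to True."""
    value = read_xscreensaver_setting(text, "dpmsEnabled")
    return value is not None and value.lower() == "true"


# Hypothetical usage:
#   import os
#   with open(os.path.expanduser("~/.xscreensaver")) as f:
#       print(dpms_enabled(f.read()))
```

A missing line is reported as not enabled here, which may not match what XScreenSaver itself does when the setting is absent.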

Mike, should I open my findings as a new bug?

Comment 16 Mike A. Harris 2004-10-01 05:02:15 UTC
Since this bugzilla report was filed, there have been several major
updates to the X Window System, which may resolve this issue.  Users
who have experienced this problem are encouraged to upgrade to the
latest version of Fedora Core, which can be obtained from:

    http://fedora.redhat.com

If this issue turns out to still be reproducible in the latest
version of Fedora Core, please file a bug report in the X.Org
bugzilla located at http://bugs.freedesktop.org in the "xorg"
component.

Once you've filed your bug report to X.Org, if you paste the new
bug URL here, Red Hat will continue to track the issue in the
centralized X.Org bug tracker, and will review any bug fixes
that become available for consideration in future updates.