Bug 116738

Summary: System Freezes when operating under Xwindows
Product: Red Hat Enterprise Linux 2.1
Component: XFree86
Version: 2.1
Hardware: i686
OS: Linux
Status: CLOSED WONTFIX
Severity: high
Priority: medium
Reporter: Raul Pingarron <raul.pingarron>
Assignee: X/OpenGL Maintenance List <xgl-maint>
QA Contact: David Lawrence <dkl>
CC: martin.wilck, raimondi
Doc Type: Bug Fix
Last Closed: 2005-05-12 05:47:15 UTC
Bug Depends On: 131672
Bug Blocks: 143573
Attachments:
Description                           Flags
/var/log/messages                     none
Output from lspci                     none
Output from lsmod                     none
XF86 config file                      none
boot.log                              none
/var/log/messages                     none
XFree86 log file                      none
XFree config file (with ati driver)   none
Output from "lsmod"                   none
Output from "lspci -vv"               none
SYSREPORT output file                 none

Description Raul Pingarron 2004-02-24 20:18:20 UTC
Description of problem:

RH AS 2.1 is installed on a Fujitsu Siemens PRIMERGY TX200. The server
freezes randomly; this is happening on several machines (over 50)
with the same HW and SW configuration. We are unable to reproduce the
problem on demand.

One user starts XFree86 (startx 2>>/dev/null in .bashrc) with IceWM
as the window manager. The user works with Mozilla 1.4 and opens
several PHP pages which query a local Oracle 9.2 instance.
XFree86 version: XF86-4.10-46
Kernel 2.4.9-e.24smp
VESA framebuffer device @800x600x16 (/dev/fb0); we also set up
the ATI driver [autodetected] (the system has an ATI Rage onboard
graphics adapter) but the problem still arises: the machine freezes,
with no response from keyboard or mouse (nor to ping, etc.). The user
can still see the screen contents (so the last window displayed was
working).
sar was running, but the sar log files contain no data for the time
when the system freezes. We also do not see any error
in /var/log/messages. The server also has a HW management
processor and we obtained the HW logs as well, but we do not see
anything that points to a HW-related issue.

The system works without problems when:
 - System is left in runlevel 3 and no X session is launched

Version-Release number of selected component (if applicable):
XF86 4.10-46
Kernel 2.4.9-e.24smp
Mozilla 1.4
ICEWM
apache 1.3.28
Oracle9.2


How reproducible:

We could not reproduce the failure. It appears sporadically, but
we cannot force it to appear...

Comment 1 Mike A. Harris 2004-02-25 10:58:07 UTC
When the next problem occurs, please boot directly into runlevel 3
from the bootloader, and make a backup copy of your X server log.
Please attach that, along with a copy of your XFree86 config file,
and complete copy of your /var/log/messages which goes back at
least to the reboot in which the system hung.  Use bugzilla's
file attachment feature below to attach each file individually
and uncompressed.

Also please let the system run for a while normally as you would,
and then run "lsmod", and "lspci -vv" and attach the output
to the bug report.

Be sure first that your system is 100% updated with all updates
available for this OS release, and that you are using the "ati"
driver which is included with the OS.  We do not support the
"vesa" driver, nor X running on the kernel framebuffer devices.

I do not know what "SAR" is above, could you clarify that as well
please?

Thanks in advance.

Comment 2 Raul Pingarron 2004-02-25 16:10:40 UTC
Created attachment 98041 [details]
/var/log/messages

Comment 3 Raul Pingarron 2004-02-25 16:11:44 UTC
Created attachment 98042 [details]
Output from lspci

Comment 4 Raul Pingarron 2004-02-25 16:12:24 UTC
Created attachment 98043 [details]
Output from lsmod

Comment 5 Raul Pingarron 2004-02-25 16:12:59 UTC
Created attachment 98044 [details]
XF86 config file

Comment 6 Raul Pingarron 2004-02-25 16:17:23 UTC
Hi,
We have also configured the machine to use the ati display driver but 
it still continues freezing.
About "SAR": sar accounting is active on the system and therefore
writes its logs to /var/log/sa. If we display the sar log file for the
day when the machine froze (sar -A -f /var/log/sa/sa25), sar did not
register any activity for that period of time; what I meant is that we
really do not have any further debug info...
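
The missing sar records can at least bound the time of the freeze: the last sample before the gap and the first sample after the reboot bracket it. A minimal sketch in Python, assuming the sample times have already been read off the first column of the sar report; the function name find_sampling_gaps, the 10-minute interval, and the sample times below are all hypothetical:

```python
from datetime import datetime, timedelta


def find_sampling_gaps(timestamps, expected_interval, tolerance=1.5):
    """Return (last_seen, next_seen) pairs where two consecutive samples
    are further apart than tolerance * expected_interval."""
    limit = expected_interval * tolerance
    ordered = sorted(timestamps)
    gaps = []
    for earlier, later in zip(ordered, ordered[1:]):
        if later - earlier > limit:
            gaps.append((earlier, later))
    return gaps


# Hypothetical sample times (not taken from the attached logs).
samples = [
    datetime(2004, 1, 1, 14, 40),
    datetime(2004, 1, 1, 14, 50),
    datetime(2004, 1, 1, 15, 0),
    datetime(2004, 1, 1, 16, 30),  # first sample after the reboot
]
for last_seen, next_seen in find_sampling_gaps(samples, timedelta(minutes=10)):
    print("no samples between", last_seen, "and", next_seen)
```

This only narrows the window in which the machine stopped; it says nothing about the cause.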

I have attached /var/log/messages, the output from "lspci -vv"
and lsmod, as well as the XF86 config file for the framebuffer setup
(note that we also tried with ati but the machine froze again; sorry, but
this was yesterday before receiving your mail and I did not save the
logs...).

Thanks in advance.
Best Regards

Comment 7 Mike A. Harris 2004-02-25 18:31:30 UTC
Ok, I've analyzed the various comments above as well as the file
attachments that you have provided.

Red Hat only supports the XFree86 native video drivers.  All other
drivers are provided only for end user convenience and are not
supported by Red Hat.  If you encounter problems using any non-native
driver, we require that you reproduce the problem using a supported
driver supplied with the OS.  All config files, log files, etc. that
are requested in order to investigate any XFree86 related bug, must
be configured to use a supported driver, or they are not useful
in troubleshooting a problem that is encountered.

Additionally, you are not using the latest official Red Hat
supplied kernel update, which is required for support.  Your
system is also using unsupported 3rd party kernel modules.

Red Hat does not support systems that are using XFree86 on
the kernel framebuffer, which is indicated in your log file.  We
only support systems which are using the latest updated rpm
packages that have been released for the OS, and are using only
kernel modules and XFree86 drivers supplied by Red Hat in binary
form.  The "vesa" driver, as well as the XFree86 kernel framebuffer
support is included only for convenience for users, and is not
supported by Red Hat.

The following procedures would need to be done in order to get
the system into a more supported state:

- Upgrade your kernel to the latest official Red Hat kernel
  release for AS 2.1.  You must be using the binary compiled kernel
  supplied by Red Hat, and not a recompiled one.

- Upgrade all other rpm packages on your system including XFree86
  to the latest officially released versions.

- You must not load any 3rd party kernel modules (proprietary or
  otherwise) after the system has booted.

- You must use the official native driver for the video hardware
  in question, in this case the "ati" driver.

After doing the above, if you can reproduce the same problems
without using any 3rd party kernel modules, then please feel free
to reopen this bug report, and please attach to the bug report the
following specific items:

1) The X server configuration file that was used at the time the
   problem occurred

2) The X server log file that was generated at the time the problem
   occurred.  Keep in mind that when X starts up, it wipes out the
   previous log file and creates a new one, so you need to back up
   the log from the failed session, prior to starting a new session.

3) The complete contents of /var/log/messages from the time of boot
   which resulted in the system hanging.

4) The output of "lsmod" and "lspci -vvxxx" from after a successful
   X server startup, preferably after the system has been running
   for a while doing normal tasks, but prior to any lockups.
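
Since the log from the failed session is lost once X starts again, the copy in item 2 has to be made before the next X startup. A minimal sketch of such a backup step in Python, assuming the log is at /var/log/XFree86.0.log (the usual default for XFree86 4.x; check your own setup). The helper name backup_x_log and the destination directory /root/xlogs are hypothetical:

```python
import shutil
import time
from pathlib import Path

# Assumption: default XFree86 4.x log location; adjust if yours differs.
DEFAULT_X_LOG = "/var/log/XFree86.0.log"


def backup_x_log(log_path=DEFAULT_X_LOG, dest_dir="/root/xlogs"):
    """Copy the current X server log to dest_dir under a timestamped name.

    Returns the path of the copy, or None if there is no log to copy.
    """
    source = Path(log_path)
    if not source.is_file():
        return None
    target_dir = Path(dest_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = target_dir / f"{source.name}.{stamp}"
    shutil.copy2(source, target)  # copy2 also keeps the modification time
    return target
```

Running this after booting into runlevel 3, and before startx is invoked again, would keep one copy per session, so the log from the hung session can still be attached afterwards.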

Due to the level of unsupported aspects of your configuration,
I'm closing the bug report as "NOTABUG" for the time being, however
if you can meet the above criteria and still reproduce the problem,
feel free to reopen the report with all of the above details,
including detailed steps of how to reproduce the issue, and I
will review the updated information.

Thanks in advance.




Comment 8 Raul Pingarron 2004-03-02 14:49:48 UTC
Created attachment 98188 [details]
boot.log

Comment 9 Raul Pingarron 2004-03-02 14:56:00 UTC
Hello,

we reconfigured the machine and we're using the ATI display driver;
the AS 2.1 R2 errata have been applied to the system.
The problem still persists!


Comment 10 Raul Pingarron 2004-03-02 14:57:14 UTC
Created attachment 98189 [details]
/var/log/messages

The system froze on 01 March 2004 at 15:01

Comment 11 Raul Pingarron 2004-03-02 14:59:02 UTC
Created attachment 98190 [details]
XFree86 log file

Comment 12 Raul Pingarron 2004-03-02 14:59:59 UTC
Created attachment 98191 [details]
XFree config file (with ati driver)

Comment 13 Raul Pingarron 2004-03-02 15:02:06 UTC
Created attachment 98192 [details]
Output from "lsmod"

Comment 14 Raul Pingarron 2004-03-02 15:03:04 UTC
Created attachment 98193 [details]
Output from "lspci -vv"

Comment 15 Mike A. Harris 2004-03-02 18:35:51 UTC
As indicated in my last comment above, you are not using the latest
updates that are available from Red Hat.  Also, as indicated above,
Red Hat explicitly does not support systems that are using 3rd party
kernel modules.

Re-closing as NOTABUG.

Comment 16 Mike A. Harris 2004-03-02 19:40:08 UTC
(as a side note, you are using XFree86 4.1.0, not 3.3.6, however
you have filed this bug report against XFree86-Servers 3.3.6, and
included your 4.1.0 X server log, and the 3.3.6 config file.)

Comment 18 Raul Pingarron 2004-04-19 11:30:18 UTC
Hello,

Now the systems have been updated to RH AS 2.1 U3 and they still freeze.
I am attaching the SYSREPORT file.

Comment 19 Raul Pingarron 2004-04-19 11:31:54 UTC
Created attachment 99530 [details]
SYSREPORT output file

Comment 20 Raul Pingarron 2004-04-19 11:33:51 UTC
Also, the window manager has been changed; we are now using GNOME as
it comes with RH AS 2.1 U3, as well as XFree86 release 4.1.0-50el with
the ATIMACH64 (ati) graphics driver.


Comment 21 Martin Wilck 2004-04-19 12:15:49 UTC
Here is some additional information:

We cannot use a newer update kernel because of the release strategy of
third-party software involved, and because these are production
systems that the customer doesn't want to be updated often.

Unfortunately, we have not yet been able to reproduce the problem
under laboratory conditions. We have seen the problem also on HP
hardware, therefore we do not think our hardware is responsible.

We have installed the netdump debugging facility on several systems,
but the freezes are sporadic, and we haven't seen any on those systems
so far.

We have considered activating the NMI watchdog on the customer
systems, but we are a bit afraid of the possibility that the watchdog
may kill an alive system which is under heavy load. Could you please
comment on your experiences with the watchdog in this respect?

It is unclear to which extent X is actually involved in the problem.

The systems are normally in runlevel 3 (no X), and the freeze only
occurs if a certain user is logged in who runs an X session (startx)
from his login scripts. The only application running in the X session
is Mozilla, which is used as a UI for the database application
running on the system.

The systems are automatically rebooted every night, and usually in the
morning the mentioned user logs in and starts X. The systems have
never frozen between the nightly reboot and the login in the morning.
OTOH, in the morning, general system load starts to increase, too, so
the X server isn't necessarily responsible.

Thus: perhaps the affected component for this BUG should be changed to
"kernel". At least the kernel guys should have a look at the problem.

One more piece of information: we suspected that the X screensaver / screen blanking
may have something to do with the problem. So we used "xset noblank"
in the startup scripts (the attached monitors are not DPMS-capable, so
that DPMS blanking is off anyway). Problem persists.




Comment 22 Martin Wilck 2004-04-19 12:17:15 UTC
Added myself to cc list.


Comment 23 Mike A. Harris 2005-03-06 18:32:35 UTC
Does this problem still occur on the latest RHEL update release,
with the latest kernel and XFree86 errata applied, and no
3rd party kernel modules loaded?

Comment 24 Martin Wilck 2005-03-07 09:02:16 UTC
It probably will unless bug #131762 is fixed (see also #122729 which
is related to this one). 

I'll inquire with Raul anyway.


Comment 25 Martin Wilck 2005-03-07 09:22:27 UTC
Sorry I meant #131672 above.


Comment 26 Mike A. Harris 2005-03-07 15:08:41 UTC
bug #131672

Comment 27 Mike A. Harris 2005-03-07 15:21:11 UTC
Ok, with your system fully updated to all of the latest RHEL 2.1
errata, I need you to do the following:

1) disable all proprietary and other 3rd party kernel modules from
   loading.

2) Configure the XFree86 4.1.0 X server to use the "ati" driver.

3) Reproduce the problem

4) Attach the X server log file, config file, /var/log/messages,
   and /var/log/dmesg from after the problem occurs.  Also needed
   are the output of "lsmod" from just after the X server is started,
   and again from just after the problem occurs if possible.
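
Comparing the two "lsmod" snapshots from item 4 shows which modules were loaded or unloaded in between. A minimal sketch in Python, assuming the usual lsmod layout (one header line starting with "Module", then one module per line with its name in the first column); the function names and the sample output are hypothetical:

```python
def parse_lsmod(output):
    """Return the set of module names found in lsmod output."""
    modules = set()
    for line in output.splitlines():
        fields = line.split()
        # Skip blank lines and the "Module  Size  Used by" header.
        if not fields or fields[0] == "Module":
            continue
        modules.add(fields[0])
    return modules


def compare_lsmod(before, after):
    """Return (loaded, unloaded) module names between two snapshots."""
    old, new = parse_lsmod(before), parse_lsmod(after)
    return sorted(new - old), sorted(old - new)


# Hypothetical snapshots, for illustration only.
before = """Module                  Size  Used by
ext3                   70368   2
jbd                    47860   2 [ext3]
"""
after = before + "agpgart                43072   0 (unused)\n"

loaded, unloaded = compare_lsmod(before, after)
print("loaded since first snapshot:", loaded)
print("unloaded since first snapshot:", unloaded)
```

Please still attach both raw outputs; the comparison is only a quick way to spot differences.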

Also, please indicate how bug #131672 is related to this one as
I do not see the connection.

After re-reviewing this bug report, it appears to me that you are
having a system hang, and that it might be totally unrelated to
X.  I'd like to establish whether X is at all involved in the
problem or not.  If your kernel or hardware hangs, then X will
also be unresponsive, however that does not automatically indicate
that it is an X server problem.

I'm heavily suspecting that the 3rd party proprietary
modules you are using (which are unsupported) are the problem, or
that this is a kernel issue.

Please update the report with the requested info once you've
had time to test.

Thanks in advance.

Comment 28 Mike A. Harris 2005-05-12 05:47:15 UTC
The initial report here speaks of a problem that occurs only sporadically,
and is not reproducible on demand.  By reviewing the log files and other
attachments I determined that there were 3rd party kernel modules loaded
on the system.  I've subsequently requested that the system be updated
to the latest RHEL 2.1 updates released by Red Hat.  Further attachments
still show that unsupported modules are being used, and an older kernel
is being used that we do not support.

In order to investigate problems of this nature, customers must be using
our latest updates, even if it is just temporarily to diagnose the problem.
If you absolutely must use an older unsupported kernel for regular
operation, we still need to have our latest kernel installed, and the
problem reproduced under a supported configuration with no 3rd party
kernel modules loaded.

We've requested information a few times now, and have not been given
sufficient information with which we can work.  As originally
stated in the initial bug report the problem is hard to reproduce, so
we need to make sure that it is not caused by the system having
unsupported components being used before we can investigate the issue.

Currently however, Red Hat Enterprise Linux 2.1 is now in security fix
only mode, and we are no longer providing fixes for non-security issues.

Since RHEL 2.1 is now in maintenance only mode, the problem is
non-reproducible, and there's insufficient information present in
this bug report to diagnose the problem, I'm closing this as "WONTFIX"
for RHEL 2.1.

Please upgrade to Red Hat Enterprise Linux 4 (or RHEL 3), and if this
problem can be reproduced using our latest official updates including
the kernel and X for the OS release installed, contact Red Hat Global
Support Services directly to open an official support ticket.  You can
do this by logging into the Red Hat support website at:

    http://www.redhat.com/support

or by calling Global Support Services directly for phone support, at
1-888-RED-HAT1