Bug 143761

Summary: X frozen or hung; hard reboot required
Product: [Fedora] Fedora
Component: xorg-x11
Version: 3
Hardware: i686
OS: Linux
Status: CLOSED RAWHIDE
Severity: high
Priority: medium
Keywords: Triaged
Reporter: Vladimir Ivanovic <vladimir>
Assignee: X/OpenGL Maintenance List <xgl-maint>
CC: gczarcinski
Doc Type: Bug Fix
Last Closed: 2005-04-11 02:54:01 UTC
Bug Blocks: 136451
Attachments:
Description                                      Flags
(desktop) Xorg log file of session that hung.    none
(desktop) Xorg configuration file                none
(laptop) Xorg log file of session that hung.     none
(laptop) Xorg configuration file                 none

Description Vladimir Ivanovic 2004-12-27 07:01:43 UTC
Description of problem: X freezes (a hard reboot is required) after an
indeterminate amount of time. The keyboard is unresponsive,
Ctrl-Alt-Delete doesn't work, and ssh/rlogin doesn't work.


Version-Release number of selected component (if applicable):
xorg-x11-6.8.1-12.FC3.21, but also with previous releases

How reproducible:
Always, it seems.

Steps to Reproduce:
1. Start X.
2. Wait.
3. System will freeze eventually.

This bug occurs both with a dual-processor Pentium III-based
motherboard + ATI Radeon AGP, and with a single CPU x86 + ATI Radeon
M9 (laptop).
  
Actual results: System (keyboard, LCD display) frozen.


Expected results: Normal processing.


Additional info:
If someone could give me a cookbook approach, I can try to set up an X
session so that it provides usable debugging information.

Comment 1 Sitsofe Wheeler 2004-12-27 10:20:34 UTC
If you have a network card, could you see whether you can still ssh into the machine when it is frozen? (If you can't, that suggests a kernel bug.)

Could you also *attach* your /var/log/Xorg.0.log and /etc/X11/xorg.conf files?
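
The ssh check can also be run from a second machine as a small script. The following is only a sketch in Python, assuming the frozen host normally runs sshd on the default port 22; the function name ssh_port_reachable is made up for illustration. A successful TCP connection only shows that the kernel's network stack is still answering, which is the distinction being asked about here.

```python
import socket


def ssh_port_reachable(host, port=22, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper: a True result means the remote kernel still
    completes TCP handshakes, so the hang is less likely to be a full
    kernel lockup. A False result means the host did not answer in time
    or refused the connection.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name lookup failures.
        return False
```

Calling ssh_port_reachable("192.0.2.10") from another machine while X is frozen would be one way to use it; the address is a placeholder for the hung machine.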

Comment 2 Vladimir Ivanovic 2004-12-27 18:56:17 UTC
Created attachment 109140
(desktop) Xorg log file of session that hung.

The only significant difference between Xorg.0.log and Xorg.0.log.old is:

(II) Mouse0: ps2EnableDataReporting: succeeded
(WW) Open APM failed (/dev/apm_bios) (No such file or directory)
(II) RADEON(0): [RESUME] Attempting to re-init Radeon hardware.
(II) RADEON(0): [agp] Mode 0x1f000201 [AGP 0x1106/0x0691; Card 0x1002/0x5144]
(II) DevInputMice: ps2EnableDataReporting: succeeded
(II) Mouse0: ps2EnableDataReporting: succeeded
(WW) Open APM failed (/dev/apm_bios) (No such file or directory)
(II) RADEON(0): [RESUME] Attempting to re-init Radeon hardware.
(II) RADEON(0): [agp] Mode 0x1f000201 [AGP 0x1106/0x0691; Card 0x1002/0x5144]
(II) DevInputMice: ps2EnableDataReporting: succeeded
(II) Mouse0: ps2EnableDataReporting: succeeded
(II) RADEON(0): [drm] removed 1 reserved context for kernel
(II) RADEON(0): [drm] unmapping 8192 bytes of SAREA 0xf8b1b000 at 0xb3f26000

These lines appear at the very end of the log file.
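
A comparison like this can be automated. The following is a minimal sketch, not part of any X.Org tooling: it uses Python's difflib to list the lines that are present in one log but not in the other. The function name added_lines is made up for illustration.

```python
from difflib import SequenceMatcher


def added_lines(old, new):
    """Return the lines of `new` that have no counterpart in `old`.

    `old` and `new` are lists of log lines. Lines are compared
    verbatim, so this assumes the log lines carry no timestamps.
    """
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    extra = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # "insert" and "replace" both contribute lines found only in `new`.
        if tag in ("insert", "replace"):
            extra.extend(new[j1:j2])
    return extra
```

To use it on the two logs, read each file with open(path).read().splitlines() and pass the resulting lists in; the usual locations are /var/log/Xorg.0.log.old and /var/log/Xorg.0.log, though that may differ per system.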

Comment 3 Vladimir Ivanovic 2004-12-27 18:57:18 UTC
Created attachment 109141
(desktop) Xorg configuration file

Comment 4 Vladimir Ivanovic 2004-12-27 19:04:30 UTC
Created attachment 109142
(laptop) Xorg log file of session that hung.

Note that this invocation did not use DRI. (My attempt at changing things to
see if they would help.)

Comment 5 Vladimir Ivanovic 2004-12-27 19:05:59 UTC
Created attachment 109143
(laptop) Xorg configuration file

Comment 6 Mike A. Harris 2005-02-01 01:57:48 UTC
Please upgrade to the latest FC3 updates, including the kernel, and
make sure you are rebooted into the most recent Fedora Core kernel.

If the problem still exists, upgrade to xorg-x11 from rawhide
(currently 6.8.1.903-2), and try to reproduce it.  If the problem
still persists, indicate that in a status update.

If you are using ACPI, please disable it, and see whether that also
prevents the problem.

Comment 7 Mike A. Harris 2005-02-01 01:59:04 UTC
Setting status to "NEEDINFO", awaiting status update from reporter
after testing recommendations in comment #6.

Comment 9 Vladimir Ivanovic 2005-02-04 02:53:52 UTC
I had been running with MPS 1.4 enabled in the BIOS, and when I fell
back to MPS 1.1, I noticed a big improvement. Instead of having to
reboot twice (or more) a day, I was able to go for 3 days without
needing a reboot. 

Regardless, I have installed and I am running 2.6.10-1.1124_FC4smp,
but without 6.8.1.903-2. (I am at 6.8.1-12.FC3.21.) If I have to
reboot at all, then I will install the latest Rawhide Xorg-X11 RPMs
and try again.

Upon re-reading comment #6, I see that I need not have installed the
latest Rawhide kernel. I was already running the latest FC3 kernels, as
I use the updates and updates-testing repositories with up2date and
yum, and I run the nightly yum service, so I'm always up to date.

So, to summarize a long-winded comment: if I hang, I'll install the
latest xorg-x11 RPMs and try again. If that doesn't help, I'll try
disabling ACPI completely ("acpi=off" boot option; service acpid off).
 
If I successfully last a week without needing a reboot, I will
reenable MPS 1.4 in my BIOS and see how that works.

Comment 10 Mike A. Harris 2005-02-08 01:18:24 UTC
OK, thanks for the update.  I'm resolving the bug as fixed in
"RAWHIDE".  However, if you still experience the problem with
xorg-x11 from rawhide, feel free to update the bug report with
the latest testing status and reopen it, and we'll review the
issue again.

Thanks.

Comment 11 Vladimir Ivanovic 2005-02-08 04:53:46 UTC
Unfortunately, the bug still exists and my system hangs, even with
6.8.1.904 from Rawhide, but it seems to take a while to manifest
itself. I now only seem to hang overnight (after several hours of X
inactivity).

Also unfortunately, I cannot use a system that is booted with
"acpi=off" because I then have no networking. (Don't ask me why; all I
know is that I get a "SIOCSIFFLAGS: Device or resource busy" error
message whenever I try to do an "ifup eth0" with an "acpi=off" boot
parameter.)

I do get (regularly) two error messages that seem benign:

I get frequent "APIC error on CPU{0 or 1}: 40(40)" messages. This is
fairly recent, say Fedora Core 3++. I have never found any answer to
how to fix this by googling around, or by posting to newsgroups,
although I have seen opinions that indicate that it's harmless.

I also regularly get SCSI error messages like:

   kernel: sym0:6:0: ABORT operation started.
   kernel: sym0:6:0: ABORT operation timed-out.
   kernel: sym0:6:0: DEVICE RESET operation started.
   kernel: sym0:6:0: DEVICE RESET operation timed-out.
   kernel: sym0:6:0: BUS RESET operation started.
   kernel: sym0: SCSI BUS reset detected.
   kernel: sym0: SCSI BUS has been reset.
   kernel: sym0:6:0: BUS RESET operation complete.

and likewise, no amount of googling or newsgroup or email postings
have given me any help in figuring out what the problem is. These SCSI
errors pre-date Fedora Core 3.
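
To see how often each of these messages occurs, the kernel log can be tallied with a short script. This is only a sketch in Python: the two patterns are inferred from the messages quoted in this comment, real syslog lines may be formatted differently, and the function name summarize is made up for illustration.

```python
import re
from collections import Counter

# Patterns inferred from the messages quoted above (an assumption,
# not a complete description of what the kernel can print).
APIC_RE = re.compile(r"APIC error on CPU(\d+): [0-9a-fA-F]+\([0-9a-fA-F]+\)")
SYM_RE = re.compile(r"kernel: (sym\d+)(?::\d+:\d+)?: (.+?)\.?$")


def summarize(lines):
    """Count APIC error lines per CPU and sym SCSI messages per message text."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        match = APIC_RE.search(line)
        if match:
            counts["APIC error on CPU" + match.group(1)] += 1
            continue
        match = SYM_RE.search(line)
        if match:
            # Key on controller plus message, dropping the target/LUN part.
            counts[match.group(1) + ": " + match.group(2)] += 1
    return counts
```

Feeding it the lines of the system log (for example /var/log/messages, if that is where kernel messages go on the machine) gives a per-message count that can be compared across days with and without a hang.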

If there's something I can do to help debug this... I may not be able
to get to it during the week, but I surely can on the weekend if I'm
not away.

Comment 12 Mike A. Harris 2005-03-22 18:53:02 UTC
This sounds more and more like a hardware problem to me: either a hardware
bug (such as an IO-APIC bug), a broken BIOS, an issue specific to this
exact motherboard or motherboard/video card combination, or perhaps something else.

I'd recommend reporting the issue in X.Org bugzilla to maximize the number
of developers aware of the issue, as someone else may have additional ideas.

http://bugs.freedesktop.org in the "xorg" component.

Once you've reported your bug in X.Org bugzilla, paste the URL here, and
we will track the issue there as well.

The SCSI bus reset issues indicate a problem with your SCSI devices or bus,
unrelated to X.  This may indicate disk failure or some other hardware
problem perhaps.



Comment 13 Gene Czarcinski 2005-03-25 15:47:58 UTC
OK, I am still testing to make sure that the problem occurs.

I have updated to 6.8.2-1.FC3.10test and commented out loading "dri" in
/etc/X11/xorg.conf.  This has been running for a couple of days with no problems,
so I have now uncommented that line in /etc/X11/xorg.conf and am loading "dri"
again.  Now waiting for something to happen.

Comment 14 Mike A. Harris 2005-04-11 02:54:01 UTC
Please upgrade to the latest Fedora development packages, and if this
problem persists, file a bug report in X.Org bugzilla, located at
http://bugs.freedesktop.org in the "xorg" component.  Attach your
X server log and config file to the report, along with all relevant
details for reproducibility.  If you paste the URL here, we will
track the bug in upstream bugzilla also.

Setting status to "RAWHIDE"