Red Hat Bugzilla – Bug 57509
Frequent Crashes under X, simple lockup to full reboot
Last modified: 2007-04-18 12:38:43 EDT
Description of problem:
Hardware: ASUS A7M266 Motherboard, Athlon processor with AMD 761 Chipset
BIOS upgraded from v1004a to v1005
256 MB RAM
ATI XPERT 2000 (Rage 128 RL) Video card
Software: OS: RH7.2 with current patches, Kernel 2.4.9-13.athlon
Possibly related bugs? : #49586, #56426
This system is very unstable under X. Problems range from simple X crashes
to unannounced warm reboots. The full system lockups/reboots make it
difficult to capture any useful debugging info.
I have been able to consistently lock up X by running some of the mesa
demos. By lockup, I mean the system will not accept keyboard input and
the display is frozen. I have to ssh in from another system to gracefully
reboot as ctrl-alt-del does not work. Please note that this is NOT limited
to the mesa demos. I used them because:
1) They exercise (what I believe to be) the bug
2) They reproduce the problem on demand
3) They don't wipe out the system to the point that I can't capture any
In addition to info generated by running the mesa demos, attached is
ksymoops output for an X crash that occurred while
trying to submit this bug report.
Version-Release number of selected component (if applicable):
Red Hat Linux 7.2
Steps to Reproduce:
1. Create generic test acct using unmodified configs from /etc/skel
3. Log in using generic acct and gnome
4. Run /usr/bin/fire or /usr/bin/gears
(The following steps may require logging in from
another system as X may have locked up to the extent
that even switching to a virtual console is not possible.)
5. Run ksymoops, capture output for bug report
Actual Results: At the very least, an "Oops" is generated.
Expected Results: System should not generate an "Oops". It should
DEFINITELY NOT bring down the kernel, i.e. warm reboot
The output of ksymoops for three different events is attached.
Argh is the result of running aspell while spell checking this bug report.
(X crashed and dumped me back to the login screen.)
CrashC are the results of running "fire"
CrashD is the result of running "gears"
Full logs, symbol and module lists for each "oops" event are available on
Created attachment 40648
ksymoops output after running mesa fire
Created attachment 40649
ksymoops output after running mesa gears
Created attachment 40650
ksymoops output of unplanned oops while submitting report
Try our latest kernel 2.4.9-21 out.
Also, try using the non-athlon kernel just to see if that fixes it
or not. Try also booting with the option "nopentium" which bypasses
a bug related to Athlon CPUs, 4MB pages and AGP. Many 3D lockup
problems are believed to be the result of this bug.
a) tried the i686 kernel without success. (Not 2.4.9-21.i686.)
b) tried the test kernel at
a) Clean install with bad block check
b) update to kernel 2.4.9-21
Crashed and trashed the root file system. Had to do a rescue/manual fsck to
get it to a usable state.
a) Clean install
b) update to kernel 2.4.9-21
c) boot with "nopentium" option
Crashed with the now familiar "Unable to handle kernel paging request at
virtual address ..."
For what it's worth:
RAM checks out with two different test programs. The system runs games like
Quake II and u$soft Flight Simulator 2000 just fine under win98 so I don't think
it's broken hardware. (Buggy chip-sets excluded.)
I've seen threads on the linux-kernel mailing list archived at
http://marc.theaimsgroup.com/ that there are problems with the AMD-761 and AGP.
Are there any known issues with the AMD-761 and ATI XPERT 2000, Rage 128 based
Appears to be a R128 DRM problem if I read the oops reports correctly.
Arjan, can you make heads or tails of it? I'm not well versed when
it comes to debugging kernel oops. ;o)
Yes, there are problems reported with some AMD chipsets and some ATI video
cards; however, I'm not aware of any with the particular combination you mention.
This is something the kernel guys can answer better I believe. Stephen?
AMD-761 and Radeon has been a known bad combination, but I can't recall seeing
similar reports of problems with the r128.
The first of the r128 oopses is *really* weird: the kernel is trying to execute
code with %EIP in the middle of an assembler instruction. No wonder it oopses
--- it's essentially trying to execute garbage.
The second looks semi-sane --- there's an oops accessing memory at %d08b1da3,
which is _just_ above the 256MB boundary, so it could actually be physical
memory if the e820 map is doing weird stuff; it could also be AGP memory,
although the CPU should probably not be trying to access that. What do the
"BIOS-e820" entries in your kernel boot log look like?
The other three oopses are just random memory corruption most likely triggered
by the initial corruption.
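As a rough check of the "just above the 256MB boundary" remark, the arithmetic can be sketched as follows. This assumes the default i386 3G/1G split, in which the kernel maps physical RAM linearly starting at virtual address 0xC0000000 (PAGE_OFFSET). That value is an assumption and is not stated anywhere in this report.

```python
# Assumption: default i386 PAGE_OFFSET of 0xC0000000 (not stated in the report).
PAGE_OFFSET = 0xC0000000
RAM_SIZE = 256 * 1024 * 1024      # 256 MB installed, per the bug description

fault_addr = 0xD08B1DA3           # faulting address quoted in the oops analysis

# Virtual address at which a linear mapping of 256 MB of RAM would end.
ram_end_virtual = PAGE_OFFSET + RAM_SIZE

# How far the faulting address lies past that boundary.
overshoot = fault_addr - ram_end_virtual

print(hex(ram_end_virtual))                  # 0xd0000000
print(hex(overshoot))                        # 0x8b1da3
print(round(overshoot / (1024 * 1024), 1))   # 8.7 (MB)
```

Under that assumption the address sits roughly 8.7 MB beyond the end of a 256 MB linear mapping, which is why the e820 map matters for deciding whether it could be real memory.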
Is the system stable under Linux when not using the r128 drm (ie. when not
running accelerated 3d apps)?
1) Re: BIOS-e820
The following is a partial capture of the system boot:
Linux version 2.4.9-21 (firstname.lastname@example.org) (gcc version 2.96
20000731 (Red Hat Linux 7.1 2.96-98)) #1 Thu Jan 17 13:35:37 EST 2002
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 000000000ffec000 (usable)
BIOS-e820: 000000000ffec000 - 000000000ffef000 (ACPI data)
BIOS-e820: 000000000ffef000 - 000000000ffff000 (reserved)
BIOS-e820: 000000000ffff000 - 0000000010000000 (ACPI NVS)
BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved)
BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
BIOS-e820: 00000000ffff0000 - 0000000100000000 (reserved)
On node 0 totalpages: 65516
zone(0): 4096 pages.
zone(1): 61420 pages.
zone(2): 0 pages.
Found and enabled local APIC!
Kernel command line: auto BOOT_IMAGE=sconsole ro root=305
BOOT_FILE=/boot/vmlinuz-2.4.9-21 hdd=ide-scsi mem=nopentium console=ttyS0
Feb 11 16:30:42 sam kernel: Symbols match kernel version 2.4.9.
Feb 11 16:30:42 sam kernel: Loaded 222 symbols from 7 modules.
Feb 11 16:30:42 sam kernel: Linux version 2.4.9-21
(email@example.com) (gcc version 2.96 20000731 (Red Hat Linux
7.1 2.96-98)) #1 Thu Jan 17 13:35:37 EST 2002
Feb 11 16:30:42 sam kernel: BIOS-provided physical RAM map:
Feb 11 16:30:42 sam kernel: BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
Feb 11 16:30:42 sam kernel: BIOS-e820: 000000000009fc00 - 00000000000a0000
Feb 11 16:30:42 sam kernel: BIOS-e820: 00000000000f0000 - 0000000000100000
Feb 11 16:30:42 sam kernel: BIOS-e820: 0000000000100000 - 000000000ffec000 (usable)
Feb 11 16:30:42 sam kernel: BIOS-e820: 000000000ffec000 - 000000000ffef000
Feb 11 16:30:42 sam kernel: BIOS-e820: 000000000ffef000 - 000000000ffff000
Feb 11 16:30:42 sam kernel: BIOS-e820: 000000000ffff000 - 0000000010000000
Feb 11 16:30:42 sam kernel: BIOS-e820: 00000000fec00000 - 00000000fec01000
Feb 11 16:30:42 sam kernel: BIOS-e820: 00000000fee00000 - 00000000fee01000
Feb 11 16:30:42 sam kernel: BIOS-e820: 00000000ffff0000 - 0000000100000000
Feb 11 16:30:42 sam kernel: On node 0 totalpages: 65516
Feb 11 16:30:42 sam kernel: zone(0): 4096 pages.
Feb 11 16:30:42 sam kernel: zone(1): 61420 pages.
Feb 11 16:30:42 sam kernel: zone(2): 0 pages.
Feb 11 16:30:42 sam kernel: Found and enabled local APIC!
Feb 11 16:30:42 sam kernel: Kernel command line: auto BOOT_IMAGE=sconsole ro
root=305 BOOT_FILE=/boot/vmlinuz-2.4.9-21 hdd=ide-scsi mem=nopentium console=ttyS0
Feb 11 16:30:42 sam kernel: ide_setup: hdd=ide-scsi
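The page counts in the boot log above can be cross-checked against the BIOS-e820 map. The following is a minimal sketch, assuming 4 KB pages, a 16 MB DMA zone, and that the kernel's "totalpages" figure corresponds to the end of the highest usable range. That last point is an inference from the numbers here, not something the log states. The parse_e820 helper is hypothetical.

```python
import re

# BIOS-e820 lines copied from the boot capture above.
E820_LOG = """\
BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 000000000ffec000 (usable)
BIOS-e820: 000000000ffec000 - 000000000ffef000 (ACPI data)
BIOS-e820: 000000000ffef000 - 000000000ffff000 (reserved)
BIOS-e820: 000000000ffff000 - 0000000010000000 (ACPI NVS)
BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved)
BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
BIOS-e820: 00000000ffff0000 - 0000000100000000 (reserved)
"""

PAGE_SIZE = 4096                      # assumption: 4 KB pages on i386
DMA_ZONE_BYTES = 16 * 1024 * 1024     # assumption: zone(0) covers the first 16 MB


def parse_e820(text):
    """Return (start, end, type) tuples for each BIOS-e820 line in text."""
    pattern = re.compile(
        r"BIOS-e820:\s+([0-9a-f]+)\s+-\s+([0-9a-f]+)\s+\((.+?)\)"
    )
    return [(int(s, 16), int(e, 16), kind) for s, e, kind in pattern.findall(text)]


entries = parse_e820(E820_LOG)

# End address of the highest range the BIOS marks as usable.
top_of_ram = max(end for _, end, kind in entries if kind == "usable")

total_pages = top_of_ram // PAGE_SIZE
dma_pages = DMA_ZONE_BYTES // PAGE_SIZE

print(hex(top_of_ram))            # 0xffec000
print(total_pages)                # 65516, matches "totalpages"
print(dma_pages)                  # 4096, matches zone(0)
print(total_pages - dma_pages)    # 61420, matches zone(1)
```

On these numbers the map shows no usable range above 0x0ffec000; the last 80 KB below the 256 MB mark is ACPI data, reserved, or ACPI NVS.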
2) Re: stability when not using r128 drm
The reason I used gears and fire to illustrate the problem is they were fairly
consistent at causing a crash in a reasonable amount of time. The system has
crashed while doing something as simple as moving a KDE "Konsole" window or
running top. However, I can't say for sure that something 3D related was not
run between system boot and execution of the command that triggered the crash.
I would be happy to set up any specific tests you would like. The system is
useless for any real work in its current state so it's not a problem for me to
rebuild it, try new kernels etc.
It would be useful to know if (a) you can reproduce the problem without running
X at all; or (b) whether simple cpu-intensive tasks (such as rebuilding a
kernel) can provoke the problems.
However, at this point it really looks like hardware. It may be a peculiarity of
the way Linux is driving the hardware, or it may be hardware problems for which
Windows has a workaround --- it's really impossible to tell right now.
The system does NOT appear to have any problems with strictly CPU intensive
tasks. I can build test kernels etc without problems, provided I do it from one
of the vtty's. I am very open to suggestions as to how to stress test the
hardware without X, especially the AGP and video hardware. (It feels like
something on the video or AGP side is trashing sections of memory.) Is there a
benchmarking or acceptance suite anyone would care to suggest?
I tend to agree with the last comment suggesting that the root of the problem is
hardware related, i.e. "works as designed but the hardware design is flawed".
For what it's worth all the major components are on AMD's approved list.
Does anyone else out there have this hardware combo: Asus A7M266 Main board with
AMD 761 Chipset, ATI XPERT 2000 AGP, Rage 128 based video? If so, are you
experiencing similar problems?
For what it's worth:
Re: Is the system stable under Linux when not using the r128 drm (ie. when not
running accelerated 3d apps)?
I swapped out the Xpert2000 Rage 128 AGP card for an Xpert98 Rage Pro PCI. With
the AGP card the system would die almost immediately after starting gears. With
the PCI card it would run for several hours but eventually failed with the now
familiar "Unable to handle kernel paging request ..."
One thing I do find interesting is lspci reports AMD-760 but this is an AMD-761
system. This is true even with kernel v2.4.17 which does have AMD-761 AGP support.
00:00.0 Host bridge: Advanced Micro Devices [AMD] AMD-760 [Irongate] System
Controller (rev 13)
00:01.0 PCI bridge: Advanced Micro Devices [AMD] AMD-760 [Irongate] AGP Bridge
Finally, I don't plan on spending much more time on this issue. If anyone has
any specific test they would like run, please let me know. Otherwise, unless
there are others with this problem, you all may want to drop it as well.
I believe this issue is just bad hardware or a bad hardware combination.
Also, the error "Unable to handle kernel paging request"
is a kernel crash, not an XFree86 one. If you determine any more info that you
think might be helpful however, please add it to the report.