Bug 203324 - Kernel 2.6.17-1.2174_FC5 crashes hard with radeon DRI when exiting X
Summary: Kernel 2.6.17-1.2174_FC5 crashes hard with radeon DRI when exiting X
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 5
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Brian Brock
URL:
Whiteboard: bzcl34nup
Depends On:
Blocks:
Reported: 2006-08-21 04:49 UTC by Darryl Bond
Modified: 2008-05-06 16:15 UTC
CC List: 3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-05-06 16:15:02 UTC
Type: ---
Embargoed:


Attachments
oops trace for crashed kernel when exiting X with dri enabled (11.76 KB, application/octet-stream)
2006-11-05 04:17 UTC, Darryl Bond
Output of dmidecode (16.82 KB, text/plain)
2006-11-15 09:44 UTC, Darryl Bond
Output of dmesg (22.47 KB, text/plain)
2006-11-15 09:47 UTC, Darryl Bond

Description Darryl Bond 2006-08-21 04:49:41 UTC
Description of problem:
The kernel crashes hard when logging out. It is not possible to do a clean
reboot once Xorg has started (i.e. runlevel 5), or to kill the X server.
The oops message points to a problem in the radeon module. It also reports
that the kernel is tainted, even though the ATI proprietary driver is not
being used; the machine runs a standard FC5 install updated only with yum.

The problem only seems to have started with the latest kernel and ATI Xorg driver updates.

The lockup is total and the only way out is a hardware reset. The oops
message cannot be read after a normal logout from GNOME because the monitor
immediately goes into power-off mode. If you switch to a text console while
the X server is running and reboot the box from there, the oops message is
displayed.

I have done a few tests.
1. Machine reboots normally if X is not started.
2. Machine reboots normally if the Load "dri" line is removed from xorg.conf.
3. Machine runs normally until Xorg exits.

Version-Release number of selected component (if applicable):
xorg-x11-drv-ati-6.5.8.0-1
Kernel 2.6.17-1.2174_FC5

How reproducible: Very


Steps to Reproduce:
1. Boot to runlevel 5.
2. Log into X and log out, or
3. press Ctrl+Alt+Backspace to exit the X server, or
4. reboot from a text console with Xorg still running.

Actual results:
The kernel crashes instantly with errors in the radeon driver.

Expected results:
Normal logout or reboot

Additional info:

Comment 1 Hans de Goede 2006-08-23 05:09:20 UTC
You say the proprietary driver is not being used, but has it been installed?
That alone might cause problems.


Comment 2 Hans de Goede 2006-08-23 05:22:08 UTC
I just tried this on my AMD (32-bit) system with a Radeon 9250 and it works fine.


Comment 3 Darryl Bond 2006-08-23 06:55:26 UTC
I am absolutely certain that no proprietary driver was installed. I did a
fresh install (not an upgrade), and the problem did not appear with the
default install. I simply ran a yum update on the box; no extra repos were
set up and no other software was installed.
As soon as the update completed and the box rebooted, the crashes started
again. BTW, it was already crashing before the reinstall; I reinstalled just
in case I had installed something that tainted the kernel.

The only change that I can recall after installing was to configure my 
xorg.conf for my widescreen LCD. I don't think that this should affect it.

Comment 4 Hans de Goede 2006-08-23 07:07:17 UTC
Ok, so then it most probably is a real bug :|

For starters, it would help tremendously if you could post (write it down if
necessary) the oops that is printed before the crash.

Also this could be a hardware problem (even if it works fine in an older kernel,
the new kernel might just rub your hardware the wrong way), you could try some
of the steps in:
http://fedoraproject.org/wiki/HardwareProblems

A memory test in particular is always a good idea, and if you have an FC-5
install CD handy it costs very little human time (but lots of PC time).




Comment 5 Darryl Bond 2006-08-26 03:49:29 UTC
OK,
I ran memcheck for 15 hours without a fault. I have reset my BIOS to defaults
etc., as described on the HardwareProblems page.

I set up a serial console to capture the panic. Note that the messages are never
the same and differ in how much gets printed. Sometimes only the first
module is printed before the machine dies. The crash below was triggered by
Ctrl+Alt+Backspace to exit the X server. It printed a lot of
information before the machine locked. Note that there were several oopses
before the full panic.

Unable to handle kernel paging request at ffff81002ea076f0 RIP:
<ffffffff80312a79>{file_has_perm+58}
PGD 8063 PUD 9063 PMD 800000002ea001e3 BAD
Oops: 0009 [1] SMP
last sysfs file: /class/drm/card0/dev
CPU 1
Modules linked in: radeon drm ipv6 autofs4 hidp l2cap bluetooth sunrpc
ip_conntrack_netbios_ns ipt_REJECT xt_state ip_conntrack nfnetlink xt_tcpudp
iptable_filter ip_tables x_tables dm_mirror dm_mod video button battery
acpi_memhotplug ac lp parport_pc parport uhci_hcd ehci_hcd floppy snd_via82xx
gameport snd_ac97_codec snd_hda_intel snd_ac97_bus snd_hda_codec snd_seq_dummy
snd_seq_oss snd_seq_midi_event snd_bt87x bt878 snd_seq tuner bttv video_buf
ir_common compat_ioctl32 i2c_algo_bit v4l2_common btcx_risc snd_pcm_oss
snd_mixer_oss snd_pcm tveeprom i2c_core videodev snd_timer snd_page_alloc
via_rhine mii snd_mpu401_uart snd_rawmidi snd_seq_device snd soundcore serio_raw
ext3 jbd
Pid: 2484, comm: Xorg Tainted: G   M  2.6.17-1.2174_FC5 #1
RIP: 0010:[<ffffffff80312a79>] <ffffffff80312a79>{file_has_perm+58}
RSP: 0018:ffff81002f2ffea8  EFLAGS: 00010246
RAX: ffffffff80561960 RBX: ffff81002ea076d0 RCX: ffff81002ef58d40
RDX: 0000000000000048 RSI: 0000000000000000 RDI: ffff81002f2ffeb8
RBP: ffff81002f2ffeb8 R08: 0000000000000000 R09: 00000000006d2c60
R10: 0000000000000000 R11: 0000000000003246 R12: ffff81003e9bb2c0
R13: ffff81003027afc0 R14: ffff8100322d8e40 R15: 0000000000000001
FS:  00002aaaaaad3360(0000) GS:ffff810037e836c0(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff81002ea00038 CR3: 000000002f732000 CR4: 00000000000006e0
Process Xorg (pid: 2484, threadinfo ffff81002f2fe000, task ffff81002fb417e0)
Stack: ffff81002fb417e0 ffff81002ef58d40 00007fff559ff100 0000000040206435
       0000000000000006 0000000000000006 0000000000000008 ffffffff80244c47
       ffff81002ef58d40 00007fff559ff100
Call Trace: <ffffffff80244c47>{do_ioctl+92} <ffffffff80231fba>{vfs_ioctl+598}
       <ffffffff8024fc85>{sys_ioctl+66} <ffffffff80262b0e>{system_call+126}

Code: 48 8b 43 20 48 89 44 24 08 e8 59 3d f5 ff c6 44 24 10 01 4c
RIP <ffffffff80312a79>{file_has_perm+58} RSP <ffff81002f2ffea8>
CR2: ffff81002ea076f0
 <3>BUG: sleeping function called from invalid context at include/linux/rwsem.h:43
Unable to handle kernel paging request at ffff81002ea03000 RIP:
<ffffffff802c6695>{free_block+134}
PGD 8063 PUD 9063 PMD 800000002ea001e3 BAD
Oops: 0009 [2] SMP
last sysfs file: /class/drm/card0/dev
CPU 0
Modules linked in: radeon drm ipv6 autofs4 hidp l2cap bluetooth sunrpc
ip_conntrack_netbios_ns ipt_REJECT xt_state ip_conntrack nfnetlink xt_tcpudp
iptable_filter ip_tables x_tables dm_mirror dm_mod video button battery
acpi_memhotplug ac lp parport_pc parport uhci_hcd ehci_hcd floppy snd_via82xx
gameport snd_ac97_codec snd_hda_intel snd_ac97_bus snd_hda_codec snd_seq_dummy
snd_seq_oss snd_seq_midi_event snd_bt87x bt878 snd_seq tuner bttv video_buf
ir_common compat_ioctl32 i2c_algo_bit v4l2_common btcx_risc snd_pcm_oss
snd_mixer_oss snd_pcm tveeprom i2c_core videodev snd_timer snd_page_alloc
via_rhine mii snd_mpu401_uart snd_rawmidi snd_seq_device snd soundcore serio_raw
ext3 jbd
Pid: 8, comm: events/0 Tainted: G   M  2.6.17-1.2174_FC5 #1
RIP: 0010:[<ffffffff802c6695>] <ffffffff802c6695>{free_block+134}
RSP: 0018:ffff810037f49d48  EFLAGS: 00010046
RAX: ffff81002ea03000 RBX: ffff810032beb000 RCX: 000000000000000e
RDX: ffff81003ef85c40 RSI: ffff81003fc60ac0 RDI: 000000000000000e
RBP: ffff81003ef85c40 R08: 0000000000000000 R09: ffff810037f49e18
R10: ffffffff80630078 R11: 0000000000000002 R12: ffff810032bebb58
R13: ffff81003ef85c80 R14: ffff81003ef900c0 R15: ffff81003ef8b448
FS:  0000000000000000(0000) GS:ffffffff8069d000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: ffff81002ea00018 CR3: 0000000033af8000 CR4: 00000000000006e0
Process events/0 (pid: 8, threadinfo ffff810037f48000, task ffff810037fef7a0)
Stack: 0000000000000000 0000000000000100 0000000400000018 ffff81003ef8b428
       0000000000000018 ffff81003ef8b400 ffff81003ef85c80 0000000000000000
       ffff81003ef900c0 ffffffff802c689f
Call Trace: <ffffffff802c689f>{drain_array+139} <ffffffff802c8071>{cache_reap+194}
       <ffffffff802c7faf>{cache_reap+0} <ffffffff80250f56>{run_workqueue+159}
       <ffffffff8024d6de>{worker_thread+0} <ffffffff8024d7ce>{worker_thread+240}
      <ffffffff8028b81b>{default_wake_function+0} <ffffffff80234e36>{kthread+246}
       <ffffffff80263b9e>{child_rip+8} <ffffffff80234d40>{kthread+0}
       <ffffffff80263b96>{child_rip+0}

Code: 48 8b 10 48 39 da 74 1b 48 89 de 48 c7 c7 4f 9f 47 80 31 c0
RIP <ffffffff802c6695>{free_block+134} RSP <ffff810037f49d48>
CR2: ffff81002ea03000
 BUG: events/0/8, lock held at task exit time!
 [ffffffff80551a80] {cache_chain_mutex}
.. held by:          events/0:    8 [ffff810037fef7a0, 110]
... acquired at:               cache_reap+0x26/0x2fd
Unable to handle kernel paging request at ffff81002ebcecb0 RIP:
<ffffffff8041f9bd>{rt_check_expire+175}
PGD 8063 PUD 9063 PMD 800000002ea001e3 BAD
Oops: 0009 [3] SMP
last sysfs file: /class/drm/card0/dev
CPU 0
Modules linked in: radeon drm ipv6 autofs4 hidp l2cap bluetooth sunrpc
ip_conntrack_netbios_ns ipt_REJECT xt_state ip_conntrack nfnetlink xt_tcpudp
iptable_filter ip_tables x_tables dm_mirror dm_mod video button battery
acpi_memhotplug ac lp parport_pc parport uhci_hcd ehci_hcd floppy snd_via82xx
gameport snd_ac97_codec snd_hda_intel snd_ac97_bus snd_hda_codec snd_seq_dummy
snd_seq_oss snd_seq_midi_event snd_bt87x bt878 snd_seq tuner bttv video_buf
ir_common compat_ioctl32 i2c_algo_bit v4l2_common btcx_risc snd_pcm_oss
snd_mixer_oss snd_pcm tveeprom i2c_core videodev snd_timer snd_page_alloc
via_rhine mii snd_mpu401_uart snd_rawmidi snd_seq_device snd soundcore serio_raw
ext3 jbd
Pid: 0, comm: swapper Tainted: G   M  2.6.17-1.2174_FC5 #1
RIP: 0010:[<ffffffff8041f9bd>] <ffffffff8041f9bd>{rt_check_expire+175}
RSP: 0018:ffffffff80602ee8  EFLAGS: 00010282
RAX: ffffffff80542dc0 RBX: ffff81002ebcec80 RCX: 0000000000000001
RDX: 00000000ffffffff RSI: 00000000000124f8 RDI: ffff81003ee11af0
RBP: ffff81003eec5e50 R08: ffffffff80654db0 R09: 0000000000000002
R10: ffffffff80630078 R11: 0000000000000001 R12: 00000000000124f8
R13: 0000000000011af0 R14: 0000000000000dd0 R15: 0000000000000bca
FS:  0000000000000000(0000) GS:ffffffff8069d000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: ffff81002ea00e70 CR3: 0000000033af8000 CR4: 00000000000006e0
Process swapper (pid: 0, threadinfo ffffffff806c8000, task ffffffff80542dc0)
Stack: 00000000ffff4203 0000000000000000 ffffffff80654d80 0000000000000100
       ffffffff8041f90e 0000000000000000 0000000000000000 ffffffff802967b9
       ffffffff80602f28 ffffffff80602f28
Call Trace: <IRQ> <ffffffff8041f90e>{rt_check_expire+0}
       <ffffffff802967b9>{run_timer_softirq+342} <ffffffff80211a20>{__do_softirq+85}
       <ffffffff80263eee>{call_softirq+30} <ffffffff80271239>{do_softirq+44}
       <ffffffff8025b98b>{mwait_idle+0}
<ffffffff8026384b>{apic_timer_interrupt+135} <EOI>
       <ffffffff8025b98b>{mwait_idle+0} <ffffffff802678f6>{thread_return+0}
       <ffffffff8025b9c1>{mwait_idle+54} <ffffffff8024c90d>{cpu_idle+151}
       <ffffffff806cb817>{start_kernel+502} <ffffffff806cb28a>{_sinittext+650}

Code: 48 8b 43 30 48 85 c0 74 08 48 3b 04 24 78 20 eb 16 48 63 15
RIP <ffffffff8041f9bd>{rt_check_expire+175} RSP <ffffffff80602ee8>
CR2: ffff81002ebcecb0
 <3>BUG: sleeping function called from invalid context at include/linux/rwsem.h:43
in_atomic():1, irqs_disabled():1

Call Trace: <IRQ> <ffffffff802998f5>{blocking_notifier_call_chain+31}
       <ffffffff80215c67>{do_exit+32} <ffffffff8026c048>{do_page_fault+1895}
       <ffffffff8028a1eb>{__wake_up_common+62} <ffffffff8028c423>{complete+56}
       <ffffffff802639e5>{error_exit+0} <ffffffff8041f9bd>{rt_check_expire+175}
       <ffffffff8041f9bb>{rt_check_expire+173} <ffffffff8041f90e>{rt_check_expire+0}
       <ffffffff802967b9>{run_timer_softirq+342} <ffffffff80211a20>{__do_softirq+85}
       <ffffffff80263eee>{call_softirq+30} <ffffffff80271239>{do_softirq+44}
       <ffffffff8025b98b>{mwait_idle+0}
<ffffffff8026384b>{apic_timer_interrupt+135} <EOI>
       <ffffffff8025b98b>{mwait_idle+0} <ffffffff802678f6>{thread_return+0}
       <ffffffff8025b9c1>{mwait_idle+54} <ffffffff8024c90d>{cpu_idle+151}
       <ffffffff806cb817>{start_kernel+502} <ffffffff806cb28a>{_sinittext+650}
Kernel panic - not syncing: Aiee, killing interrupt handler!
 in_atomic():0, irqs_disabled():1

Call Trace: <ffffffff802998f5>{blocking_notifier_call_chain+31}
       <ffffffff80215c67>{do_exit+32} <ffffffff8026c048>{do_page_fault+1895}
       <ffffffff80278c14>{do_flush_tlb_all+0}
<ffffffff80278e20>{smp_call_function+62}
       <ffffffff802639e5>{error_exit+0} <ffffffff80312a79>{file_has_perm+58}
       <ffffffff80244c47>{do_ioctl+92} <ffffffff80231fba>{vfs_ioctl+598}
       <ffffffff8024fc85>{sys_ioctl+66} <ffffffff80262b0e>{system_call+126}
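Every oops above is reported as "Oops: 0009". On x86_64 that hex value is the page-fault error code, whose low bits are defined by the CPU. The sketch below decodes it, assuming the standard x86 bit layout (present/protection, write, user, reserved bit, instruction fetch); the table and function names are hypothetical, chosen for illustration. Read this way, 0009 is a kernel-mode read that hit a page-table entry with a reserved bit set, which is consistent with (but does not by itself prove) corrupted page tables or a hardware fault.

```python
# x86 page-fault error code bits: (mask, meaning if set, meaning if clear).
# Assumed layout: bit 0 P, bit 1 W/R, bit 2 U/S, bit 3 RSVD, bit 4 I/D.
PF_ERROR_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "kernel mode"),
    (0x08, "reserved bit set in a page-table entry", None),
    (0x10, "instruction fetch", None),
]


def decode_pf_error(code: int) -> list:
    """Describe a page-fault error code such as the 0009 in 'Oops: 0009'."""
    meanings = []
    for mask, if_set, if_clear in PF_ERROR_BITS:
        if code & mask:
            meanings.append(if_set)
        elif if_clear is not None:
            meanings.append(if_clear)  # only bits 0-2 carry a meaning when clear
    return meanings


if __name__ == "__main__":
    # The oops header prints the code in hex, so parse it as base 16.
    print(decode_pf_error(int("0009", 16)))
```

The same decoding applies to the later "Oops: 0009" in the syslog excerpt from the DDR2 test, so all the captured faults have the same shape even though they occur in different functions.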


Comment 6 Darryl Bond 2006-09-23 10:28:27 UTC
Just did a yum update to 2.6.17-1.2187_FC5. Same fault.

Comment 7 Dave Jones 2006-10-17 00:09:57 UTC
A new kernel update has been released (Version: 2.6.18-1.2200.fc5)
based upon a new upstream kernel release.

Please retest against this new kernel, as a large number of patches
go into each upstream release, possibly including changes that
may address this problem.

This bug has been placed in NEEDINFO state.
Due to the large volume of inactive bugs in bugzilla, if this bug is
still in this state in two weeks' time, it will be closed.

Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.

In the last few updates, some users upgrading from FC4->FC5
have reported that installing a kernel update has left their
systems unbootable. If you have been affected by this problem
please check you only have one version of device-mapper & lvm2
installed.  See bug 207474 for further details.

If this bug is a problem preventing you from installing the
release this version is filed against, please see bug 169613.

If this bug has been fixed, but you are now experiencing a different
problem, please file a separate bug for the new problem.

Thank you.

Comment 8 Darryl Bond 2006-10-21 04:49:27 UTC
Upgraded to 2.6.18-1.2200.fc5; same fault once DRI is enabled.
I also tried downgrading to the original xorg-x11-drv-ati-6.5.7.3-4.x86_64.rpm,
since that package was also updated around the time the fault began. That did
not fix it either.

The box has run perfectly with the dri option disabled in xorg.conf.



Comment 9 Darryl Bond 2006-10-28 07:35:46 UTC
Upgraded box to FC6. Fault remains.
Pity, because compiz is so pretty.

Comment 10 Darryl Bond 2006-11-05 04:17:59 UTC
Created attachment 140378
oops trace for crashed kernel when exiting X with dri enabled

Comment 11 Darryl Bond 2006-11-05 04:21:20 UTC
I did some more checking with the serial terminal attached. I downloaded the
2.6.18.1 kernel from kernel.org and built it with the same .config as FC6
(running make oldconfig first).
The upstream kernel consistently oopsed at the same RIP (__change_page_attr).
The default FC6 kernel also regularly oopsed at the same place, but sometimes
oopsed elsewhere too. Attached is the oops trace for Fedora Core 6.

Note that the trace was generated in single-user mode after running the X command.


Comment 12 Darryl Bond 2006-11-14 09:51:36 UTC
I conclude that I have some obscure hardware problem.

I tried a few more things.
1. Installed FC6 from scratch: same problem.
2. Used the Xen kernel, which does not allow AGP, so the DRM/radeon
module is not loaded: machine is reliable, albeit with slow graphics.
3. Installed FC6 x86_64 on an Athlon64 system with an ATI 9200: works perfectly.
4. Put the ATI 9250 from the problem machine into the Athlon64: also works perfectly.
5. Upgraded to the new kernel 2.6.18-1.2849.fc6: does not help.
6. Replaced the memory: no change.
7. Ran memcheck again: no errors.

Some machine details:
* ASUS P5VDC-MX with onboard VIA UniChrome Pro IGP (rev 01). The onboard video
is turned off by the installation of the ATI card.
* Intel 805D (dual-core, 2.6 GHz) CPU
* 1 GB DDR1 PC3200 400 MHz RAM (motherboard supports DDR2 but I don't have any)

Does that help at all?




Comment 13 Dan Carpenter 2006-11-14 10:10:22 UTC
> Modules linked in: radeon 

The Radeon driver is proprietary.  This is clearly a Radeon bug as well based on
the stack trace.

There really isn't anything anyone can do.  :/



Comment 14 Hans de Goede 2006-11-14 12:01:17 UTC
(In reply to comment #13)
> > Modules linked in: radeon 
> 
> The Radeon driver is proprietary.  This is clearly a Radeon bug as well based on
> the stack trace.
> 
> There really isn't anything anyone can do.  :/
> 

<sigh> He is using a 9250, which is an R200-class chip with excellent open-source
drivers. He is also using the open-source drivers: the open-source module is
called radeon, the closed one fglrx!

Next time please do not comment unless you actually know what you're talking about.



Comment 15 Dan Carpenter 2006-11-14 17:21:58 UTC
Blast, sorry... I apologize for that. You're right: the G taint flag means no proprietary modules are loaded.

Pid: 1688, comm: X Tainted: G   M  2.6.18-1.2798.fc6 #1
But it does have the 'M' taint, which means a Machine Check Exception (MCE) was raised.



Comment 16 Darryl Bond 2006-11-14 23:41:51 UTC
I looked up MCE taint:

# M: A Machine Check Exception (MCE) has been raised while the kernel was
running. MCEs are triggered by the hardware to indicate a hardware related
problem, for example the CPU's temperature exceeding a treshold or a memory bank
signaling an uncorrectable error.

How can I find out what raised the MCE? It is strange that it only shows up when
I exit X with the radeon module loaded. Of course, it may have happened
earlier. As I described before, I have replaced everything except the motherboard and CPU.
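The flags after "Tainted:" in the oops header can be checked mechanically. A minimal sketch, assuming the flags are the single-letter tokens between "Tainted:" and the kernel version string, and covering only the letters discussed in this thread (P versus G for proprietary modules, M for a machine check); any other letter is deliberately reported as unknown rather than guessed. The function name is hypothetical.

```python
# Only the taint letters discussed in this bug; anything else is left undecoded.
TAINT_MEANINGS = {
    "P": "proprietary (non-GPL) module loaded",
    "G": "no proprietary module loaded (printed in place of P)",
    "M": "Machine Check Exception (MCE) raised by the hardware",
}


def decode_taint(line: str) -> list:
    """Return (flag, meaning) pairs for the 'Tainted:' field of an oops header."""
    marker = "Tainted:"
    idx = line.find(marker)
    if idx < 0:
        return []  # e.g. "Not tainted", or a line without a taint field
    flags = []
    for token in line[idx + len(marker):].split():
        # Flags are single letters; the first longer token is the kernel version.
        if len(token) != 1 or not token.isalpha():
            break
        flags.append(token)
    return [(f, TAINT_MEANINGS.get(f, "unknown (not covered here)")) for f in flags]


if __name__ == "__main__":
    header = "Pid: 2484, comm: Xorg Tainted: G   M  2.6.17-1.2174_FC5 #1"
    for flag, meaning in decode_taint(header):
        print(flag, "-", meaning)
```

Applied to the header from comment 5, this yields G and M: no proprietary driver, but a machine check, which is what steered the discussion toward hardware.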

Comment 17 Dan Carpenter 2006-11-15 02:35:19 UTC
Post your entire dmesg. To find out what's causing the MCE, run "mcelog | tee
saved.txt" and upload saved.txt.

I've seen a couple of BIOS bugs that caused reproducible MCEs, so it's not
impossible. Heck, post the output from "dmidecode" as well.

Comment 18 Darryl Bond 2006-11-15 09:44:03 UTC
Created attachment 141242
Output of dmidecode

Here is the output of dmidecode

Comment 19 Darryl Bond 2006-11-15 09:47:51 UTC
Created attachment 141244
Output of dmesg

Output of mcelog
Memory error?
 

MCE 0
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 BANK 0 MISC 214300005c0010e ADDR cc000000 
MCG status:EIPV 
MCi status:
Uncorrected error
Error enabled
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA:BUS Generic Observed-error-as-third-party Generic Memory-access
Request-timeout Error
Model:Response hard fail

STATUS be00030010020c03 MCGSTATUS 2
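The flag lines mcelog prints are a decoding of the raw STATUS and MCGSTATUS values at the bottom of each record. A sketch, assuming the architectural x86 bit positions for MCi_STATUS (bit 63 VAL, 62 OVER, 61 UC, 60 EN, 59 MISCV, 58 ADDRV, 57 PCC) and MCG_STATUS (bit 0 RIPV, 1 EIPV, 2 MCIP). The labels are chosen to match the lines shown above where possible, but this is not mcelog's code, and it ignores the lower error-code bits that produce the MCA:BUS and Model lines. Function names are hypothetical.

```python
# Assumed architectural flag bits in IA32_MCi_STATUS (error-code bits ignored).
MCI_STATUS_FLAGS = [
    (62, "Error overflow"),
    (61, "Uncorrected error"),
    (60, "Error enabled"),
    (59, "MCi_MISC register valid"),
    (58, "MCi_ADDR register valid"),
    (57, "Processor context corrupt"),
]
# Assumed architectural flag bits in IA32_MCG_STATUS.
MCG_STATUS_FLAGS = [(0, "RIPV"), (1, "EIPV"), (2, "MCIP")]


def decode_mci_status(status: int) -> list:
    """List the flag bits set in an MCi_STATUS value (empty if VAL is clear)."""
    if not (status >> 63) & 1:
        return []  # VAL clear: the bank holds no valid error
    return [name for bit, name in MCI_STATUS_FLAGS if (status >> bit) & 1]


def decode_mcg_status(status: int) -> list:
    """List the flag bits set in an MCG_STATUS value."""
    return [name for bit, name in MCG_STATUS_FLAGS if (status >> bit) & 1]


if __name__ == "__main__":
    # Raw values copied from the mcelog record above.
    print(decode_mci_status(0xBE00030010020C03))
    print(decode_mcg_status(0x2))
```

Both values reproduce the flag lines mcelog printed. The same check on the STATUS a20000001080080f records from the later run in this bug yields only "Uncorrected error" and "Processor context corrupt", which again matches.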

Comment 20 Dan Carpenter 2006-11-15 10:11:39 UTC
It says you only have one stick of RAM and it's in BANK3. You probably want it
in BANK0.
----
	Size: 1024 MB
	Form Factor: DIMM
	Set: None
	Locator: DIMM3
	Bank Locator: BANK3
-----

Try that first.

> ADDR cc000000 

Normally you can tell from the address which DIMM is bad, but that's not a
normal address. Plus, you already replaced the RAM. I don't think it's bad
RAM.



Comment 21 Darryl Bond 2006-11-15 10:22:16 UTC
Bank 0 for this M/B is for DDR2. I don't (yet) have DDR2.
Bank 3 is the first of the DDR1 banks.

Comment 22 Darryl Bond 2006-11-15 10:37:59 UTC
Aagh!!
My mistake: Bank 3 is the last DDR1 bank. I moved the stick to the first one
(Bank 2) and tried again.
mcelog produced no output, so I thought we might be getting somewhere. Then I
logged out of X and it panicked again.

I reset and rebooted. The mcelog now looks like this:
MCE 0
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 BANK 0 MCG status:EIPV 
MCi status:
Uncorrected error
Processor context corrupt
MCA:BUS Generic Originated-request Generic Other-transaction Request-timeout Error
Model:Pad address glitch

STATUS a20000001080080f MCGSTATUS 2
MCE 1
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 1 BANK 0 MCG status:EIPV 
MCi status:
Uncorrected error
Processor context corrupt
MCA:BUS Generic Originated-request Generic Other-transaction Request-timeout Error
Model:Pad address glitch

STATUS a20000001080080f MCGSTATUS 2


Comment 23 Dan Carpenter 2006-11-15 11:18:02 UTC
You're using the 709 BIOS.  They released the 712 BIOS for that mobo last
week...  It might be worth it to test that.

I'm not really sure...  Sorry.


Comment 24 Darryl Bond 2006-11-16 08:07:49 UTC
Tried a few more things:

Updated to 712 bios, no change.

Borrowed some DDR2 RAM and tried it in BANKS 0 & 1 (512 MB sticks).
It still crashed, but interestingly not quite as hard. Syslog
recorded the oops:

Nov 14 19:31:02 gold kernel: Unable to handle kernel paging request at ffff81002c620000 RIP:
Nov 14 19:31:02 gold kernel:  [<ffffffff80207b42>] unmap_vmas+0x35f/0x791
Nov 14 19:31:02 gold kernel: PGD 8063 PUD 9063 PMD 800000002c6001e3 BAD
Nov 14 19:31:02 gold kernel: Oops: 0009 [1] SMP

When using DDR1 the machine locks solid as soon as the oops is printed to the
console.

Of note, mcelog was empty until the oops. After the reset, mcelog had
similar contents to those above.

Should I be working with ASUS to fix the problem, rather than here?


Comment 25 Dan Carpenter 2006-11-16 08:32:30 UTC
Yeah.  I'm afraid it kinda looks that way.  :/

They may have seen something like that before and have a work around.



Comment 26 Darryl Bond 2006-11-23 09:56:28 UTC
I wonder?
http://airlied.livejournal.com/36055.html

Dave Airlie: days away + problems with radeon 9200 on 64-bit..

We've had a fair few reports on Radeon 9200 on 64-bit with AGP being a bit
unstable, Linus just committed an AGP change to the kernel for some i965 issues
with allocating AGP pages from the DMA32 pool, I'd be interested if someone can
run that latest Linux kernel from git, with a known unstable 9200 system and see
if this fixes it or helps at all...



Comment 27 Dave Jones 2006-11-24 21:32:17 UTC
Irrelevant unless you a) have >=4GB of RAM and b) have an i965 motherboard chipset.


Comment 28 Darryl Bond 2006-11-25 07:49:07 UTC
WooHoo, 2.6.19-rc6-git8 fixed it!
I have working compiz and the box doesn't oops on logout.

I still have this, though:

MCE 0
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 0 BANK 0 MISC 4000 ADDR cc000000 
MCG status:EIPV 
MCi status:
Uncorrected error
Error enabled
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA:BUS Generic Observed-error-as-third-party Generic Memory-access
Request-timeout Error
Model:Response hard fail

STATUS be00020010020c03 MCGSTATUS 2


Comment 29 Darryl Bond 2006-11-26 05:22:01 UTC
davej,
I took your comment about >=4GB etc. on board and assumed it had been fixed
elsewhere. I tested rc6, which was also OK, and rc1, which was broken.

rc6-git8 - Good
rc6 - Good
rc1 - Bad
rc5 - Good
rc4 - Good
rc3 - Good
rc2 - Bad
I ended up trying all of them and found that it was fixed in rc3. Ah well.
I have a fix until the Fedora 2.6.19 kernel is available.
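Comment 29 builds every candidate. If one assumes the fix, once merged, stays in (every earlier rc bad, every later one good), a binary search over the ordered release candidates finds the first good one with fewer builds; git's own git bisect applies the same idea at the commit level. A sketch, using the results reported above as a stand-in for actually booting each kernel; the function name is hypothetical.

```python
def first_good(versions, is_good):
    """Binary-search an ordered list for the first version where is_good() holds.

    Assumes monotonic results: all versions before the fix are bad and all
    versions from the fix onward are good. Returns (first_good, tested_order),
    with first_good set to None if no version is good.
    """
    lo, hi = 0, len(versions)  # invariant: the first good index lies in [lo, hi]
    tested = []
    while lo < hi:
        mid = (lo + hi) // 2
        tested.append(versions[mid])
        if is_good(versions[mid]):
            hi = mid  # mid is good, so the first good one is at or before it
        else:
            lo = mid + 1  # mid is bad, so the fix comes strictly after it
    return (versions[lo] if lo < len(versions) else None), tested


if __name__ == "__main__":
    # Results as reported in comment 29, in release order.
    results = {"rc1": False, "rc2": False, "rc3": True, "rc4": True,
               "rc5": True, "rc6": True, "rc6-git8": True}
    order = ["rc1", "rc2", "rc3", "rc4", "rc5", "rc6", "rc6-git8"]
    found, tested = first_good(order, results.get)
    print(found, tested)
```

With seven candidates this needs three builds (rc4, rc2, rc3) rather than seven. The answer is only trustworthy if the monotonic assumption holds, which an intermittent fault like this one can undermine, so a "good" result may deserve more than one boot.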



Comment 30 Bug Zapper 2008-04-04 03:33:49 UTC
Fedora apologizes that these issues have not been resolved yet. We're
sorry it's taken so long for your bug to be properly triaged and acted
on. We appreciate the time you took to report this issue and want to
make sure no important bugs slip through the cracks.

If you're currently running a version of Fedora Core between 1 and 6,
please note that Fedora no longer maintains these releases. We strongly
encourage you to upgrade to a current Fedora release. In order to
refocus our efforts as a project we are flagging all of the open bugs
for releases which are no longer maintained and closing them.
http://fedoraproject.org/wiki/LifeCycle/EOL

If this bug is still open against Fedora Core 1 through 6, thirty days
from now, it will be closed 'WONTFIX'. If you can reproduce this bug in
the latest Fedora version, please change to the respective version. If
you are unable to do this, please add a comment to this bug requesting
the change.

Thanks for your help, and we apologize again that we haven't handled
these issues to this point.

The process we are following is outlined here:
http://fedoraproject.org/wiki/BugZappers/F9CleanUp

We will be following the process here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping to ensure this
doesn't happen again.

And if you'd like to join the bug triage team to help make things
better, check out http://fedoraproject.org/wiki/BugZappers

Comment 31 Bug Zapper 2008-05-06 16:15:00 UTC
This bug is open for a Fedora version that is no longer maintained and
will not be fixed by Fedora. Therefore we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

