Bug 207254

Summary: Bug message on boot, freeze or crash on load
Product: [Fedora] Fedora
Reporter: Daniel Tschan <tschan+redhat.com>
Component: kernel-xen
Assignee: Xen Maintainance List <xen-maint>
Status: CLOSED WONTFIX
QA Contact: Virtualization Bugs <virt-bugs>
Severity: high
Priority: medium
Version: 6
CC: bstein, mattdm
Target Milestone: ---   
Target Release: ---   
Hardware: i686   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2008-02-26 23:22:11 UTC
Bug Depends On: 206757    
Bug Blocks:    
Attachments:
Description                             Flags
Complete kernel and lspci -vv output    none
5 additional kernel backtraces          none

Description Daniel Tschan 2006-09-20 08:12:20 UTC
Description of problem:
kernel-xen-2.6.17-1.2647.fc6 issues the following bug report when booting:
BUG: warning at kernel/lockdep.c:1814/trace_hardirqs_on() (Not tainted)
 [<c0405666>] show_trace_log_lvl+0x58/0x177
 [<c0405c6b>] show_trace+0xd/0x10
 [<c0405ca9>] dump_stack+0x19/0x1b
 [<c0436442>] trace_hardirqs_on+0xa4/0x120
 [<c0404e5f>] restore_all+0x37/0x3a
DWARF2 unwinder stuck at restore_all+0x37/0x3a
Leftover inexact backtrace:
Inexact backtrace:
 [<c0405c6b>] show_trace+0xd/0x10
 [<c0405ca9>] dump_stack+0x19/0x1b
 [<c0436442>] trace_hardirqs_on+0xa4/0x120
 [<c0404e5f>] restore_all+0x37/0x3a


The kernel then either freezes or crashes when the system is under load:
BUG: unable to handle kernel paging request at virtual address 6f6c6700
 printing eip:
c048321c
294d9000 -> *pde = 00000000:c674d001
2714d000 -> *pme = 00000000:00000000
Oops: 0000 [#1]
SMP 
last sysfs file: /devices/system/cpu/cpu1/cpufreq/scaling_setspeed

Please see attachment for complete backtrace!
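
One quick sanity check on a faulting address like the one above is whether it reads as ASCII text, which can hint at string data having overwritten a pointer. The Python sketch below is not part of the original report: the helper name `address_as_ascii` is hypothetical, and it assumes the little-endian byte order of an i686 machine.

```python
import struct


def address_as_ascii(addr: int) -> str:
    """Show the 4 bytes a 32-bit address occupies in little-endian memory.

    Printable ASCII bytes are rendered as characters, all others as '.'.
    """
    raw = struct.pack("<I", addr)  # little-endian, unsigned 32-bit
    return "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in raw)


if __name__ == "__main__":
    # Faulting address taken from the oops quoted above.
    print(address_as_ascii(0x6F6C6700))
```

For 6f6c6700 this prints `.glo`, that is, a NUL byte followed by three printable characters. This is only a hint that the pointer may have been overwritten with string data; it does not identify the cause.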

Configuration:
Gigabyte GA-965P-DS3 mainboard with Intel P965, ICH8 (not ICH8R), JMicron JMB363
Intel Core 2 Duo E6600
4x 1GB DDR2-675
Software RAID 5 with 3x 250 GB SATA2 disks
Gigabyte GV-NX76G256D-RH PCIe graphics card with nVidia GeForce 7600 GS
LG GSA-H10N IDE DVD writer connected to JMB363


Version-Release number of selected component (if applicable):
2.6.17-1.2647.fc6xen

How reproducible:
Always

Steps to Reproduce:
1. Boot kernel-xen-2.6.17-1.2647.fc6
2. Execute rpm -Va to generate CPU and disk load
  
Actual results:
System either freezes completely or crashes

Expected results:
System runs stably under high load

Additional info:
Attachment with complete kernel and lspci -vv output

Comment 1 Daniel Tschan 2006-09-20 08:12:21 UTC
Created attachment 136715
Complete kernel and lspci -vv output

Comment 2 Stephen Tweedie 2006-09-20 08:21:25 UTC
Hmm, the oops here is not obviously related to xen.  Does the non-xen
2.6.17-1.2647.fc6 kernel run reliably for you?  Does the crash always look the
same or is the backtrace different each time?

Comment 3 Daniel Tschan 2006-09-20 09:06:37 UTC
The non-xen 2.6.17-1.2647.fc6 kernel has not shown any of these symptoms so far:
neither the message on boot nor a freeze or crash. The PAE kernels, however, do
not boot (see bug #206757), so the problem may be related to PAE. The backtrace
is different each time.


Comment 4 Stephen Tweedie 2006-09-20 10:06:10 UTC
Could you please supply several example backtraces?  Without that it's
impossible to look for any sort of pattern here.  Thanks!
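
As a rough illustration of the kind of pattern search meant here, the Python sketch below counts how many backtraces each symbol appears in. It is not a tool used in this bug: the function names are hypothetical, and it assumes frames are formatted like the ` [<c0405666>] show_trace_log_lvl+0x58/0x177` lines quoted in the description.

```python
import re
from collections import Counter

# Matches frames such as " [<c0405666>] show_trace_log_lvl+0x58/0x177".
FRAME_RE = re.compile(r"\[<([0-9a-f]+)>\]\s+(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def symbols_in(backtrace: str) -> set:
    """Return the set of symbol names found in one backtrace."""
    return {m.group(2) for m in FRAME_RE.finditer(backtrace)}


def common_symbols(backtraces):
    """Count, for each symbol, the number of backtraces that contain it."""
    counts = Counter()
    for bt in backtraces:
        counts.update(symbols_in(bt))
    return counts.most_common()


if __name__ == "__main__":
    sample = """
 [<c0405c6b>] show_trace+0xd/0x10
 [<c0405ca9>] dump_stack+0x19/0x1b
 [<c0436442>] trace_hardirqs_on+0xa4/0x120
"""
    for name, n in common_symbols([sample, sample]):
        print(n, name)
```

Symbols that show up in most or all crashes are the natural starting point; if no symbol recurs, that points more toward random memory corruption than toward one faulty code path.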

Comment 5 Daniel Tschan 2006-09-20 18:43:59 UTC
Created attachment 136764
5 additional kernel backtraces

Sure. I attached 5 additional backtraces. While collecting these 5 crashes I
also observed 2 freezes. I just remembered that I might be able to get
information out of the freezes with magic SysRq. Please tell me if that would be
useful or if you need anything else.

Comment 6 Stephen Tweedie 2006-09-20 20:31:38 UTC
Hmm, it definitely does look like it could be related to the PAE problem:
kernel-xen is built with PAE enabled by default, but it uses highmem (i.e. >4GB)
memory in different ways due to the way the hypervisor parcels memory out to the
kernel.

If PAE is not working, then the -xen kernel is unlikely to do any better,
although it may fail in different ways.  We'd really need to get the underlying
PAE problem fixed in order to be able to test the -xen case.

Comment 7 Daniel Tschan 2006-10-30 07:24:07 UTC
The problem is still present in kernel-xen-2.6.18-1.2798.fc6.i686 but can no
longer be reproduced as easily. It seems to be caused by incorrect
initialization of agpgart. Please see the new comments on bug #206757.


Comment 8 Matthew Miller 2007-04-06 18:06:55 UTC
Fedora Core 5 and Fedora Core 6 are, as we're sure you've noticed, no longer
test releases. We're cleaning up the bug database and making sure important bug
reports filed against these test releases don't get lost. It would be helpful if
you could test this issue with a released version of Fedora or with the latest
development / test release. Thanks for your help and for your patience.

[This is a bulk message for all open FC5/FC6 test release bugs. I'm adding
myself to the CC list for each bug, so I'll see any comments you make after this
and do my best to make sure every issue gets proper attention.]


Comment 9 Red Hat Bugzilla 2007-07-25 01:33:40 UTC
change QA contact

Comment 10 Chris Lalancette 2008-02-26 23:22:11 UTC
This report targets FC6, which is now end-of-life.

Please re-test against Fedora 7 or later, and if the issue persists, open a new bug.

Thanks