Bug 184378

Summary: Xen kernel BUG crash on x86_64
Product: [Fedora] Fedora
Reporter: Aleksander Adamowski <bugs-redhat>
Component: kernel-xen
Assignee: Juan Quintela <quintela>
Status: CLOSED CURRENTRELEASE
QA Contact: Brian Brock <bbrock>
Severity: high
Docs Contact:
Priority: medium
Version: rawhide
CC: bstein, fedora, xen-maint
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: 2.6.20-1.2307.fc5
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2007-04-17 13:21:30 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 179599
Attachments:
Crash screenshot including debug info (flags: none)

Description Aleksander Adamowski 2006-03-08 11:05:36 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.12) Gecko/20060202 Fedora/1.7.12-1.5.2

Description of problem:
When booting the latest Xen kernel for x86_64 on an HP ProLiant DL385 with two dual-core Opteron processors (AMD Opteron(tm) Processor 265, 1800 MHz), the kernel crashes with the message:

Kernel BUG at arch/x86_64/mm/fault-xen.c:292
invalid opcode: 0000 [1]  SMP
CPU 1
... (more info is in the screenshot I'm attaching)

Reproducibility: sometimes (in about half of the cases the system boots up fine).


Contents of /proc/cpuinfo:

processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 33
model name      : AMD Opteron(tm) Processor 265
stepping        : 2
cpu MHz         : 1804.116
cache size      : 1024 KB
physical id     : 0
siblings        : 1
core id         : 0
cpu cores       : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow pni cmp_legacy
bogomips        : 4512.23
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp

processor       : 1
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 33
model name      : AMD Opteron(tm) Processor 265
stepping        : 2
cpu MHz         : 1804.116
cache size      : 1024 KB
physical id     : 0
siblings        : 1
core id         : 0
cpu cores       : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow pni cmp_legacy
bogomips        : 4512.23
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp
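
When comparing /proc/cpuinfo dumps like the one above between machines or kernels, it can help to split the text into one record per processor and list only the fields that differ. The following is a minimal sketch in Python; the helper names parse_cpuinfo and differing_fields are hypothetical and not part of any existing tool, and SAMPLE is a shortened copy of a few fields from the listing above.

```python
def parse_cpuinfo(text):
    """Split /proc/cpuinfo-style text into one dict per processor block."""
    cpus = []
    current = {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current processor block.
            if current:
                cpus.append(current)
                current = {}
            continue
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep:
            current[key.strip()] = value.strip()
    if current:
        cpus.append(current)
    return cpus


def differing_fields(cpus):
    """Return the sorted field names whose values differ between blocks."""
    keys = set()
    for cpu in cpus:
        keys.update(cpu.keys())
    return sorted(k for k in keys if len({cpu.get(k) for cpu in cpus}) > 1)


# Shortened excerpt of the listing in this report (illustration only).
SAMPLE = """\
processor       : 0
physical id     : 0
siblings        : 1
core id         : 0
cpu cores       : 0

processor       : 1
physical id     : 0
siblings        : 1
core id         : 0
cpu cores       : 0
"""

if __name__ == "__main__":
    cpus = parse_cpuinfo(SAMPLE)
    print(len(cpus), "processor blocks")
    print("fields that differ:", differing_fields(cpus))
```

On a live system the same functions could be fed the result of open("/proc/cpuinfo").read(). For the two blocks pasted in this report, the only field that differs is "processor".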


Version-Release number of selected component (if applicable):
2.6.15-1.2025_FC5xen0

How reproducible:
Sometimes

Steps to Reproduce:
1. Boot the latest xen0 kernel on an HP ProLiant DL385 with two dual-core Opteron 265 processors


Additional info:

Comment 1 Aleksander Adamowski 2006-03-08 11:07:16 UTC
Created attachment 125792 [details]
Crash screenshot including debug info

Comment 2 Aleksander Adamowski 2006-03-08 11:09:48 UTC
See also bug 183221, which covers a hard system hang of the same machine under
earlier Xen kernels for x86_64.

Comment 3 Aleksander Adamowski 2006-03-14 09:56:45 UTC
Strangely, if the system boots successfully to domain 0, it works fine
afterwards. I'm currently running 2.6.15-1.2038_FC5xen0 and have accumulated 3
days of uptime.

The problem is that guest domains have no network connectivity (maybe this is
related to the fact that the crash stack trace I've attached shows that the
crash occurred in the tg3 driver?).

The NIC in the machine is a Broadcom Corporation NetXtreme BCM5704 Gigabit
Ethernet (rev 10).


Comment 4 Andy Burns 2006-03-15 21:07:17 UTC
This is very similar to crashes I've experienced with 2041_FC5xen0 and
2054_FC5xen0. The crash happens intermittently at boot time; if the crash
doesn't happen, the system is then stable.

Full log attached to https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=185235
but the pertinent bit seems the same as above:

Kernel BUG at arch/x86_64/mm/fault-xen.c:292
invalid opcode: 0000 [1] SMP
CPU 0
Modules linked in: snd_mixer_oss snd_pcm snd_timer snd soundcore snd_page_alloc
hw_random i2c_i801 i2c_core dm_snapshot dm_zero dm_mirror dm_mod ext3 jbd ahci
libata sd_mod scsi_mod
Pid: 615, comm: udevd Not tainted 2.6.15-1.2041_FC5xen0 #1
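
Deciding whether two oops reports hit the same BUG site, as with the two excerpts quoted in this report, comes down to comparing the file and line in the "Kernel BUG at" header. The following is a minimal sketch in Python; the function names crash_signature and same_bug_site are hypothetical, and the regular expressions only cover the header format shown in this report, not every kernel oops layout.

```python
import re

# Patterns for the oops header lines quoted in this report (assumed format).
BUG_RE = re.compile(r"Kernel BUG at (?P<file>\S+):(?P<line>\d+)")
CPU_RE = re.compile(r"^CPU (?P<cpu>\d+)\s*$", re.MULTILINE)
PID_RE = re.compile(
    r"Pid: (?P<pid>\d+), comm: (?P<comm>\S+) "
    r"(?:Not tainted|Tainted: \S+) (?P<kernel>\S+)"
)


def crash_signature(text):
    """Extract BUG location, CPU, process name and kernel from an oops header.

    Fields that are absent from the text are returned as None.
    """
    bug = BUG_RE.search(text)
    cpu = CPU_RE.search(text)
    pid = PID_RE.search(text)
    return {
        "location": f"{bug['file']}:{bug['line']}" if bug else None,
        "cpu": int(cpu["cpu"]) if cpu else None,
        "comm": pid["comm"] if pid else None,
        "kernel": pid["kernel"] if pid else None,
    }


def same_bug_site(a, b):
    """True when both reports name the same, non-empty BUG file:line."""
    loc_a = crash_signature(a)["location"]
    loc_b = crash_signature(b)["location"]
    return loc_a is not None and loc_a == loc_b


# Header lines as quoted in the description and in comment 4.
REPORT_DESCRIPTION = """\
Kernel BUG at arch/x86_64/mm/fault-xen.c:292
invalid opcode: 0000 [1]  SMP
CPU 1
"""

REPORT_COMMENT_4 = """\
Kernel BUG at arch/x86_64/mm/fault-xen.c:292
invalid opcode: 0000 [1] SMP
CPU 0
Pid: 615, comm: udevd Not tainted 2.6.15-1.2041_FC5xen0 #1
"""

if __name__ == "__main__":
    print(crash_signature(REPORT_COMMENT_4))
    print(same_bug_site(REPORT_DESCRIPTION, REPORT_COMMENT_4))
```

Matching on file and line alone is deliberately loose: the CPU number and the faulting process differ between the two reports, yet both name arch/x86_64/mm/fault-xen.c:292.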


Comment 5 Stephen Tweedie 2006-03-16 19:58:30 UTC
*** Bug 185235 has been marked as a duplicate of this bug. ***

Comment 6 Stephen Tweedie 2007-03-16 15:08:09 UTC
Is this still reproducible on the latest stable release+updates?  Thanks.


Comment 7 Aleksander Adamowski 2007-04-16 15:19:42 UTC
I no longer experience crashes with 2.6.20-1.2307.fc5xen0 or
2.6.18-1.2239.fc5xen0.