Bug 233983

Summary: System crashes when running single paravirtualized guest w/AIM7
Product: Red Hat Enterprise Linux 5
Reporter: Joseph Szczypek <joseph.szczypek>
Component: kernel-xen
Assignee: Aron Griffis <agriffis>
Status: CLOSED CURRENTRELEASE
QA Contact: Martin Jenner <mjenner>
Severity: urgent
Priority: medium
Version: 5.0
CC: martine.silbermann, xen-maint
Hardware: ia64
OS: Linux
Fixed In Version: 5.1
Doc Type: Bug Fix
Last Closed: 2007-08-06 19:14:10 UTC
Bug Blocks: 223107

Description Joseph Szczypek 2007-03-26 14:18:47 UTC
Description of problem:
While running the AIM7 dbase workload on a DomU with one VCPU, 8 GB of memory,
and 13 drives (one physical system disk and 12 physical target disks, all 12 in
one MSA1000), the system crashes. Swap is a partition on the DomU system disk.
The console output shows 'swapper' as the running task; the problem appears to
occur as load gets high.

The problem was also encountered when running other workloads (e.g. fserver).

The system hardware includes 96 GB of memory, 8 CPUs (4 dual-core processors),
and 80 physical hard drives. Dom0 is configured with one VCPU and 1 GB of memory.
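
For reference, below is a minimal sketch of what the DomU configuration might
have looked like in the classic xm/xend config format; the guest name, device
paths, and backing devices are hypothetical and not taken from the actual setup.

  # /etc/xen/aim7-guest -- hypothetical paravirtualized guest config
  name       = "aim7-guest"                # hypothetical guest name
  vcpus      = 1                           # one VCPU, as described above
  memory     = 8192                        # 8 GB of guest memory (MB)
  bootloader = "/usr/bin/pygrub"           # boot the guest's own RHEL5 kernel
  vif        = [ "bridge=xenbr0" ]         # default Xen bridge
  disk       = [ "phy:/dev/sdb,xvda,w",    # system disk (swap is a partition here)
                 "phy:/dev/sdc,xvdb,w",    # first of 12 target disks in the MSA1000
                 # ... remaining 11 target disks, one phy: entry each
               ]

The guest would then be started with xm create (e.g. "xm create aim7-guest")
before running the AIM7 workloads inside it.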

Version-Release number of selected component (if applicable):
2.6.18-8.1.1.el5xen

How reproducible:
Easily.   

Steps to Reproduce:
1.  Install RHEL5 with ia64 Tech Preview
2.  Install guest and AIM7
3.  Run AIM7 on the guest (dbase workload) to crossover
  
Actual results:
System panics.

Expected results:
AIM7 completes successfully on the guest.

Additional info:

Output from console:
Pid: 0, CPU 0, comm:              swapper
psr : 0000001008026010 ifs : 8000000000000183 ip  : [<a0000001002ac1a0>]    Not 
tainted
ip is at memset+0x240/0x420
unat: 0000000000000000 pfs : 8010000000000309 rsc : 0000000000000008
rnat: 0000000000000000 bsps: e000000038681d60 pr  : 0000000000010525
ldrs: 0000000000000000 ccv : 0000000000000401 fpsr: 0009804c0270033f
csd : 0000000000000000 ssd : 0000000000000000
b0  : a0000001002abf30 b6  : a000000200183100 b7  : a0000001003c1ba0
f6  : 1003e0000000000000000 f7  : 0ffe380c7000000000000
f8  : 10008cb80000000000000 f9  : 10002a000000000000000
f10 : 10005a2ccccccc994a000 f11 : 1003e0000000000000051
r1  : a000000100bc9cb0 r2  : e00000000e478008 r3  : e00000000e478480
r8  : e0000000222e0000 r9  : 000000000000006f r10 : 0000000000000000
r11 : 00000000000109e5 r12 : a00000010073fb80 r13 : a000000100738000
r14 : e0000000222e0000 r15 : e00000000e478000 r16 : 0000000000004000
r17 : e0000000222e0000 r18 : e0000000222e0000 r19 : 0000000000000000
r20 : 0000000000000000 r21 : 0000000000000010 r22 : 0000000000000080
r23 : 0000000000000007 r24 : e0000000222e0000 r25 : 00000000001de840
r26 : 00000000001de840 r27 : e0000000222e0010 r28 : e0000000222e4000
r29 : 0000000000000000 r30 : 0000000000000000 r31 : 000000000000007f

Call Trace:
 [<a00000010001c8a0>] show_stack+0x40/0xa0
                                sp=a00000010073f710 bsp=a000000100739570
 [<a00000010001d1a0>] show_regs+0x840/0x880
                                sp=a00000010073f8e0 bsp=a000000100739518
 [<a0000001000413e0>] die+0x1c0/0x380
                                sp=a00000010073f8e0 bsp=a0000001007394d0
 [<a0000001005ee1c0>] ia64_do_page_fault+0x8a0/0x9e0
                                sp=a00000010073f900 bsp=a000000100739480
 [<a000000100065340>] xen_leave_kernel+0x0/0x3b0
                                sp=a00000010073f9b0 bsp=a000000100739480
 [<a0000001002ac1a0>] memset+0x240/0x420
                                sp=a00000010073fb80 bsp=a000000100739468
 [<a0000001002abf30>] __copy_user+0x930/0x960
                                sp=a00000010073fb80 bsp=a000000100739438
 <0>Kernel panic - not syncing: Fatal exception

Comment 1 Red Hat Bugzilla 2007-07-25 00:44:33 UTC
change QA contact

Comment 2 Ronald Pacheco 2007-07-26 15:51:43 UTC
Joe,

You reported in BZ 234325 a crash with a guest that is 90 GB in size.  That is a
tad larger guest.  Shall we close this as a duplicate of 234325?

Comment 3 Brian Stein 2007-07-26 17:19:21 UTC
Does this issue remain in 5.1 beta?

Comment 4 Joseph Szczypek 2007-07-26 20:35:17 UTC
(In reply to comment #2)
> Joe,
> 
> You reported in BZ 234325 a crash with a guest that is 90 GB in size.  That is a
> tad larger guest.  Shall we close this as a duplicate of 234325?

I believe they are different.  The guest in this bugzilla is 8 GB in size, which
is much smaller than the one in 234325.

Comment 5 Joseph Szczypek 2007-07-26 21:46:19 UTC
(In reply to comment #3)
> Does this issue remain in 5.1 beta?

At this time, I do not have access to the storage configuration I used when I
ran the AIM7 workloads.  I will try to get some appropriate storage and repeat
this experiment.

Comment 6 Brian Stein 2007-07-27 14:19:23 UTC
Changed to NEEDINFO; waiting for additional testing feedback.

Comment 7 Joseph Szczypek 2007-08-06 16:00:35 UTC
(In reply to comment #6)
> Changed to NEEDINFO, waiting for add'l testing feedback.

I reran this experiment, but had to use a somewhat different configuration.  The
system configuration when I reported this problem had six MSA1000 arrays, each
with one MSA30 attached to spread the disks across multiple buses.  For the
original experiment in which I encountered the problem, I used only one of those
arrays to provide target drives (it had 12 drives configured).

This time around, I tracked down one MSA1000 and set it up with 12 drives to use
with AIM7 workloads.

I also have a different FC switch this time around.

Dom0 is set up with 1 GB of memory and one VCPU.  DomU has one VCPU, 8 GB of
memory, and 13 drives (one system disk and 12 target disks, all 12 in the MSA1000).

The system hardware includes 96 GB of memory, 8 CPUs (4 dual-core processors),
and 28 physical hard drives.

I reran the AIM7 dbase and fserver workloads.  They ran to crossover
successfully - no panic, no crash.

Comment 8 Brian Stein 2007-08-06 19:14:10 UTC
Closing per comment #7 against the current 5.1 beta.  Please reopen if this issue
appears in future tests.