Bug 445504

Summary: kernel crash when three virtualized guests are running at the same time
Product: Red Hat Enterprise Linux 5
Reporter: Michal Nowak <mnowak>
Component: kernel-xen
Assignee: Xen Maintainance List <xen-maint>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Martin Jenner <mjenner>
Severity: high
Docs Contact:
Priority: medium
Version: 5.2
CC: ohudlick
Target Milestone: rc
Keywords: Reopened
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2008-07-22 12:16:46 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description         Flags
xend.log            none
xend.log old one    none

Description Michal Nowak 2008-05-07 09:05:40 UTC
Description of problem:

Boot a physical system with kernel-xen and prepare at least three guests in
virt-manager. In my case: rawhide + rhel5.2 (both full virt) + rhel4.7
(para). Start them one after another; the crash happens within a few seconds
after the third guest is started.

Version-Release number of selected component (if applicable):

kernel-xen-2.6.18-92.el5
virt-manager-0.5.3-8.el5

How reproducible:
always

Not tested with kernel-2.6.18-*.el5

Comment 1 Bill Burns 2008-05-07 12:03:27 UTC
Can you provide the console output from the crash?


Comment 2 Michal Nowak 2008-05-09 15:50:49 UTC
Nothing much of interest in the log, from my POV:

Before the first crash:

May  7 09:37:30 dhcp-lab-198 kernel: virbr0: port 7(vif13.0) entering disabled state
May  7 09:37:30 dhcp-lab-198 kernel: device vif13.0 left promiscuous mode
May  7 09:37:30 dhcp-lab-198 kernel: virbr0: port 7(vif13.0) entering disabled state
May  7 09:37:31 dhcp-lab-198 logger: /etc/xen/scripts/blktap: xenstore-read /local/domain/13/vm failed.
May  7 09:37:31 dhcp-lab-198 logger: /etc/xen/scripts/blktap: /etc/xen/scripts/blktap failed; error detected.
May  7 09:38:19 dhcp-lab-198 kernel: tap tap-14-51712: 2 getting info
May  7 09:38:19 dhcp-lab-198 kernel: device vif14.0 entered promiscuous mode
May  7 09:38:19 dhcp-lab-198 kernel: ADDRCONF(NETDEV_UP): vif14.0: link is not ready
May  7 09:38:20 dhcp-lab-198 kernel: ADDRCONF(NETDEV_CHANGE): vif14.0: link becomes ready
May  7 09:38:20 dhcp-lab-198 kernel: blktap: ring-ref 522, event-channel 9, protocol 1 (x86_32-abi)
May  7 09:38:20 dhcp-lab-198 kernel: virbr0: port 7(vif14.0) entering listening state
May  7 09:38:35 dhcp-lab-198 kernel: virbr0: port 7(vif14.0) entering learning state
May  7 09:38:50 dhcp-lab-198 kernel: virbr0: topology change detected, propagating
May  7 09:38:50 dhcp-lab-198 kernel: virbr0: port 7(vif14.0) entering forwarding state

...

Then it worked for 10 minutes; I started the third guest and the host crashed/restarted.

Before the second crash:

May  7 10:25:45 dhcp-lab-198 kernel: device tap5 entered promiscuous mode
May  7 10:25:45 dhcp-lab-198 kernel: xenbr0: port 7(tap5) entering learning state
May  7 10:25:45 dhcp-lab-198 kernel: xenbr0: topology change detected, propagating
May  7 10:25:45 dhcp-lab-198 kernel: xenbr0: port 7(tap5) entering forwarding state
May  7 10:25:46 dhcp-lab-198 kernel: device vif7.0 entered promiscuous mode

<< immediate crash here

Attaching /var/log/xen/xend.log and /var/log/xen/xend.log.1; the interesting
parts seem to be the end of xend.log.1 and the beginning of xend.log.

I am not able to provide the output from right after the crash because the
machine simply restarted. If you know of a place where the console output is
redirected, let me know.

Comment 3 Michal Nowak 2008-05-09 15:50:59 UTC
Created attachment 304961 [details]
xend.log

Comment 4 Michal Nowak 2008-05-09 15:51:20 UTC
Created attachment 304962 [details]
xend.log old one

Comment 5 Bill Burns 2008-05-09 17:24:58 UTC
Sorry for not being clear. Can you run with serial console and capture that
output? That will show us the fault/panic info we need. Also is this on any
specific hardware? How much memory is on the system and how much are you giving
to each guest?

Thanks



Comment 6 Michal Nowak 2008-05-12 15:06:06 UTC
Nothing special: Dell Precision 490, dual 64-bit Xeon, 2 GB memory; 512 MB per
guest. I am not able to reproduce it again, although I could the week
before.

Let's re-open it when it crashes again, but for now...

Thanks for your time, Bill.

Comment 7 Bill Burns 2008-05-12 15:30:13 UTC
Ok, thanks.


Comment 8 Michal Nowak 2008-05-12 19:38:36 UTC
Umm, I've been hit by this again when messing with several guests... Could you
please point me to or guide me on how to set up a serial console?

The thing is that when it crashes it immediately restarts, so I am unable to
catch any error message, if there is one.

Is it even possible to have the Dom0 shut down but the hypervisor still running
and possibly providing the error output? The crashes are irregular but frequent,
so I should be able to catch one eventually.

Comment 9 Bill Burns 2008-05-12 20:32:39 UTC
In the grub configuration for the "kernel" line (which loads xen.gz) you usually
add:
  console=com1 com1=115200,8n1
then on the next module line, where the dom 0 kernel gets loaded, you add
 console=ttyS0,115200

Then you need a serial cable to the box from another system and
use the ttywatch program to monitor it.
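
As a rough illustration of the grub edit described above, here is a minimal Python sketch that appends those options to a Xen boot stanza. The function name `add_serial_console` and the sample stanza (paths, root device, initrd name) are hypothetical placeholders, not taken from the affected system; only the console options themselves come from this comment.

```python
# Sketch: append serial-console options to a Xen grub.conf stanza.
# Only the option strings come from the comment above; everything else
# (function name, sample stanza) is an illustrative assumption.

XEN_ARGS = "console=com1 com1=115200,8n1"   # for the "kernel" line that loads xen.gz
DOM0_ARGS = "console=ttyS0,115200"          # for the module line that loads the dom0 kernel


def add_serial_console(stanza: str) -> str:
    """Return the stanza with the serial-console options appended.

    The "kernel" line that loads xen.gz gets XEN_ARGS; the first
    "module" line after it (the dom0 kernel) gets DOM0_ARGS.
    Lines that already carry the options are left unchanged.
    """
    out = []
    seen_xen = False    # have we passed the xen.gz kernel line?
    dom0_done = False   # has the dom0 module line been handled?
    for line in stanza.splitlines():
        words = line.split()
        if words and words[0] == "kernel" and "xen.gz" in line:
            seen_xen = True
            dom0_done = False
            if XEN_ARGS not in line:
                line = line.rstrip() + " " + XEN_ARGS
        elif words and words[0] == "module" and seen_xen and not dom0_done:
            dom0_done = True
            if DOM0_ARGS not in line:
                line = line.rstrip() + " " + DOM0_ARGS
        out.append(line)
    return "\n".join(out)


# Hypothetical stanza; paths, root device and initrd name are placeholders.
SAMPLE = """title Red Hat Enterprise Linux Server (2.6.18-92.el5xen)
\troot (hd0,0)
\tkernel /xen.gz-2.6.18-92.el5
\tmodule /vmlinuz-2.6.18-92.el5xen ro root=/dev/VolGroup00/LogVol00
\tmodule /initrd-2.6.18-92.el5xen.img"""

if __name__ == "__main__":
    print(add_serial_console(SAMPLE))
```

Only the xen.gz line and the first module line are touched; the initrd module line is left alone. Check the result against your real grub.conf by hand before rebooting.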



Comment 10 Michal Nowak 2008-07-22 12:16:46 UTC
Can't provide more data; I haven't been hit by this for a long time. Thanks for
your time and the nice guidance, Bill.