Bug 204321

Summary: random crashes
Product: [Fedora] Fedora
Reporter: Thomas Hutterer <thu>
Component: xen
Assignee: Xen Maintainance List <xen-maint>
Status: CLOSED NEXTRELEASE
QA Contact: Brian Brock <bbrock>
Severity: medium
Priority: medium
Version: 5
CC: atontti+rh, bstein, katzj
Hardware: All   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2007-03-16 14:27:13 EDT

Description Thomas Hutterer 2006-08-28 10:56:49 EDT
Description of problem:
My FC5 guest crashes at random.

Version-Release number of selected component (if applicable):
HOST: 
kernel-xen0-2.6.17-1.2174_FC5 (also tried kernel-xen0-2.6.17-1.2157_FC5 and
kernel-xen-2.6.17-1.2174_FC5, with the same result)
xen-3.0.2-3.FC5

GUEST: 
kernel-xenU-2.6.17-1.2174_FC5 (also tried kernel-xenU-2.6.17-1.2157_FC5 and
kernel-xen-2.6.17-1.2174_FC5, with the same result)

How reproducible:
Crashes from time to time, about once a day, but neither at the same time of
day nor under the same I/O or the same load.
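
One way to check whether the crashes cluster around a particular time is to pull the crash timestamps out of xend.log. The following is a minimal Python sketch, assuming each log entry sits on a single line in the format quoted under "Actual results" below. The function name crash_events and the regular expression are illustrative assumptions, not part of xend.

```python
import re
from datetime import datetime

# Matches entries such as:
# [2006-08-28 05:01:43 xend.XendDomainInfo] WARNING (XendDomainInfo:861) Domain has crashed: name=cms2 id=1.
CRASH_RE = re.compile(
    r"^\[(?P<when>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) xend\.XendDomainInfo\] "
    r"WARNING \(XendDomainInfo:\d+\) Domain has crashed: "
    r"name=(?P<name>\S+) id=(?P<domid>\d+)\.$"
)


def crash_events(lines):
    """Return a list of (timestamp, domain name, domain id) for each crash entry."""
    events = []
    for line in lines:
        match = CRASH_RE.match(line.strip())
        if match:
            when = datetime.strptime(match.group("when"), "%Y-%m-%d %H:%M:%S")
            events.append((when, match.group("name"), int(match.group("domid"))))
    return events


# Sample entries copied from the log in this report, rejoined onto single lines.
sample = [
    "[2006-08-28 05:01:43 xend.XendDomainInfo] WARNING (XendDomainInfo:861) Domain has crashed: name=cms2 id=1.",
    "[2006-08-28 05:01:44 xend.XendDomainInfo] DEBUG (XendDomainInfo:1405) XendDomainInfo.destroyDomain(1)",
    "[2006-08-28 05:02:02 xend.XendDomainInfo] WARNING (XendDomainInfo:861) Domain has crashed: name=cms2 id=2.",
]

for when, name, domid in crash_events(sample):
    print(when.isoformat(" "), name, domid)
```

Running the same function over several days of xend.log would list every crash time, which makes it easier to see whether any time-of-day pattern exists.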

Steps to Reproduce:
1. Install FC5 on the host and run yum update.
2. Install FC5 on the guest and run yum update.
3. Start the guest (xm create guest).
4. Wait for some random amount of time.
  
Actual results:
The guest crashes, with the following messages in xend.log:

[2006-08-28 05:01:43 xend.XendDomainInfo] WARNING (XendDomainInfo:861) Domain has crashed: name=cms2 id=1.
[2006-08-28 05:01:43 xend.XendDomainInfo] ERROR (XendDomainInfo:977) XendDomainInfo.dumpCore failed: id = 1 name = cms2
Traceback (most recent call last):
  File "/usr/lib/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 973, in dumpCore
    xc.domain_dumpcore(self.domid, corefile)
Error: (2, 'No such file or directory')
[2006-08-28 05:01:44 xend.XendDomainInfo] DEBUG (XendDomainInfo:1405) XendDomainInfo.destroyDomain(1)
[2006-08-28 05:01:45 xend.XendDomainInfo] DEBUG (XendDomainInfo:185) XendDomainInfo.create(['domain', ['domid', 1], ['uuid', 'b1584e45-7539-3b6e-dc55-9a72d92053da'], ['vcpus', 1], ['vcpu_avail', 1], ['cpu_weight', 1.0], ['memory', 3000], ['maxmem', 3000], ['bootloader', '/usr/bin/pygrub'], ['features', ''], ['name', 'cms2'], ['on_poweroff', 'destroy'], ['on_reboot', 'restart'], ['on_crash', 'restart'], ['image', ['linux', ['ramdisk', '/var/lib/xen/initrd.NMlsAT'], ['kernel', '/var/lib/xen/vmlinuz.J0d3Ij'], ['args', 'ro root=LABEL=/']]], ['device', ['vif', ['backend', 0], ['script', 'vif-bridge'], ['mac', '00:16:3e:19:49:d0']]], ['device', ['vbd', ['backend', 0], ['dev', 'xvda'], ['uname', 'file:/xen/cms2/cms2.disk'], ['mode', 'w']]], ['state', '----c-'], ['shutdown_reason', 'crash'], ['cpu_time', 2394.316973941], ['online_vcpus', 1], ['up_time', '23072.2200789'], ['start_time', '1156711032.22'], ['store_mfn', 880309], ['console_mfn', 880308]])
[2006-08-28 05:01:45 xend.XendDomainInfo] DEBUG (XendDomainInfo:291) parseConfig: config is ['domain', ['domid', 1], ['uuid', 'b1584e45-7539-3b6e-dc55-9a72d92053da'], ['vcpus', 1], ['vcpu_avail', 1], ['cpu_weight', 1.0], ['memory', 3000], ['maxmem', 3000], ['bootloader', '/usr/bin/pygrub'], ['features', ''], ['name', 'cms2'], ['on_poweroff', 'destroy'], ['on_reboot', 'restart'], ['on_crash', 'restart'], ['image', ['linux', ['ramdisk', '/var/lib/xen/initrd.NMlsAT'], ['kernel', '/var/lib/xen/vmlinuz.J0d3Ij'], ['args', 'ro root=LABEL=/']]], ['device', ['vif', ['backend', 0], ['script', 'vif-bridge'], ['mac', '00:16:3e:19:49:d0']]], ['device', ['vbd', ['backend', 0], ['dev', 'xvda'], ['uname', 'file:/xen/cms2/cms2.disk'], ['mode', 'w']]], ['state', '----c-'], ['shutdown_reason', 'crash'], ['cpu_time', 2394.316973941], ['online_vcpus', 1], ['up_time', '23072.2200789'], ['start_time', '1156711032.22'], ['store_mfn', 880309], ['console_mfn', 880308]]
[2006-08-28 05:01:45 xend.XendDomainInfo] DEBUG (XendDomainInfo:390) parseConfig: result is {'uuid': 'b1584e45-7539-3b6e-dc55-9a72d92053da', 'on_crash': 'restart', 'on_reboot': 'restart', 'image': ['linux', ['ramdisk', '/var/lib/xen/initrd.NMlsAT'], ['kernel', '/var/lib/xen/vmlinuz.J0d3Ij'], ['args', 'ro root=LABEL=/']], 'on_poweroff': 'destroy', 'bootloader_args': None, 'cpus': None, 'name': 'cms2', 'backend': [], 'vcpus': 1, 'cpu_weight': 1.0, 'features': '', 'vcpu_avail': 1, 'memory': 3000, 'device': [('vif', ['vif', ['backend', 0], ['script', 'vif-bridge'], ['mac', '00:16:3e:19:49:d0']]), ('vbd', ['vbd', ['backend', 0], ['dev', 'xvda'], ['uname', 'file:/xen/cms2/cms2.disk'], ['mode', 'w']])], 'bootloader': '/usr/bin/pygrub', 'cpu': None, 'maxmem': 3000}
[2006-08-28 05:01:45 xend.XendDomainInfo] DEBUG (XendDomainInfo:1216) XendDomainInfo.construct: None
[2006-08-28 05:01:45 xend.XendDomainInfo] DEBUG (XendDomainInfo:1248) XendDomainInfo.initDomain: 2 1.0
[2006-08-28 05:01:45 xend] DEBUG (balloon:134) Balloon: free 1203; need 3001; retries: 15.
[2006-08-28 05:01:47 xend] DEBUG (balloon:143) Balloon: setting dom0 target to 257.
[2006-08-28 05:01:47 xend.XendDomainInfo] DEBUG (XendDomainInfo:987) Setting memory target of domain Domain-0 (0) to 257 MiB.
[2006-08-28 05:01:47 xend] DEBUG (balloon:128) Balloon: free 3001; need 3001; done.
[2006-08-28 05:01:48 xend] INFO (image:134) buildDomain os=linux dom=2 vcpus=1
[2006-08-28 05:01:48 xend] DEBUG (image:177) dom            = 2
[2006-08-28 05:01:48 xend] DEBUG (image:178) image          = /var/lib/xen/vmlinuz.J0d3Ij
[2006-08-28 05:01:48 xend] DEBUG (image:179) store_evtchn   = 1
[2006-08-28 05:01:48 xend] DEBUG (image:180) console_evtchn = 2
[2006-08-28 05:01:48 xend] DEBUG (image:181) cmdline        =  ro root=LABEL=/
[2006-08-28 05:01:48 xend] DEBUG (image:182) ramdisk        = /var/lib/xen/initrd.NMlsAT
[2006-08-28 05:01:48 xend] DEBUG (image:183) vcpus          = 1
[2006-08-28 05:01:48 xend] DEBUG (image:184) features       =
[2006-08-28 05:01:48 xend] DEBUG (DevController:110) DevController: writing {'backend-id': '0', 'mac': '00:16:3e:19:49:d0', 'handle': '0', 'state': '1', 'backend': '/local/domain/0/backend/vif/2/0'} to /local/domain/2/device/vif/0.
[2006-08-28 05:01:48 xend] DEBUG (DevController:112) DevController: writing {'mac': '00:16:3e:19:49:d0', 'state': '1', 'handle': '0', 'script': '/etc/xen/scripts/vif-bridge', 'frontend-id': '2', 'domain': 'cms2', 'frontend': '/local/domain/2/device/vif/0'} to /local/domain/0/backend/vif/2/0.
[2006-08-28 05:01:48 xend] DEBUG (blkif:24) exception looking up device number for xvda: [Errno 2] No such file or directory: '/dev/xvda'
[2006-08-28 05:01:48 xend] DEBUG (DevController:110) DevController: writing {'virtual-device': '51712', 'backend-id': '0', 'state': '1', 'backend': '/local/domain/0/backend/vbd/2/51712'} to /local/domain/2/device/vbd/51712.
[2006-08-28 05:01:48 xend] DEBUG (DevController:112) DevController: writing {'domain': 'cms2', 'frontend': '/local/domain/2/device/vbd/51712', 'dev': 'xvda', 'state': '1', 'params': '/xen/cms2/cms2.disk', 'mode': 'w', 'frontend-id': '2', 'type': 'file'} to /local/domain/0/backend/vbd/2/51712.
[2006-08-28 05:01:48 xend.XendDomainInfo] DEBUG (XendDomainInfo:701) Storing VM details: {'uuid': 'b1584e45-7539-3b6e-dc55-9a72d92053da', 'on_reboot': 'restart', 'start_time': '1156734108.64', 'on_poweroff': 'destroy', 'name': 'cms2', 'vcpus': '1', 'vcpu_avail': '1', 'memory': '3000', 'on_crash': 'restart', 'image': "(linux (ramdisk /var/lib/xen/initrd.NMlsAT) (kernel /var/lib/xen/vmlinuz.J0d3Ij) (args 'ro root=LABEL=/'))", 'maxmem': '3000'}
[2006-08-28 05:01:48 xend.XendDomainInfo] DEBUG (XendDomainInfo:736) Storing domain details: {'console/ring-ref': '681226', 'console/port': '2', 'name': 'cms2', 'console/limit': '1048576', 'vm': '/vm/b1584e45-7539-3b6e-dc55-9a72d92053da', 'domid': '2', 'cpu/0/availability': 'online', 'memory/target': '3072000', 'store/ring-ref': '681227', 'store/port': '1'}
[2006-08-28 05:01:48 xend.XendDomainInfo] DEBUG (XendDomainInfo:919) XendDomainInfo.handleShutdownWatch
[2006-08-28 05:02:02 xend.XendDomainInfo] WARNING (XendDomainInfo:861) Domain has crashed: name=cms2 id=2.
[2006-08-28 05:02:02 xend.XendDomainInfo] ERROR (XendDomainInfo:977) XendDomainInfo.dumpCore failed: id = 2 name = cms2
Traceback (most recent call last):
  File "/usr/lib/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 973, in dumpCore
    xc.domain_dumpcore(self.domid, corefile)
Error: (2, 'No such file or directory')
[2006-08-28 05:02:02 xend.XendDomainInfo] ERROR (XendDomainInfo:1577) VM cms2 restarting too fast (18.377953 seconds since the last restart).  Refusing to restart to avoid loops.
[2006-08-28 05:02:02 xend.XendDomainInfo] DEBUG (XendDomainInfo:1397) XendDomainInfo.destroy: domid=2
[2006-08-28 05:02:02 xend.XendDomainInfo] DEBUG (XendDomainInfo:1405) XendDomainInfo.destroyDomain(2)
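
The last ERROR entry shows xend refusing to restart cms2 because only 18.377953 seconds had passed since the previous restart, which is why the guest stays down after the second crash even though on_crash is set to restart. A guard of that kind can be sketched in Python as follows. This is an illustration only, not xend's actual code: the function name should_restart and the 20-second threshold are assumptions, since the log shows only that 18.38 seconds was treated as too fast.

```python
# Assumed threshold in seconds; the log does not state the real value.
MINIMUM_RESTART_INTERVAL = 20.0


def should_restart(last_restart, now, minimum=MINIMUM_RESTART_INTERVAL):
    """Return True if a crashed domain may be restarted.

    last_restart: time of the previous restart in seconds, or None if the
                  domain has not been restarted before.
    now:          current time in the same units.
    """
    if last_restart is None:
        return True
    # Refuse when the domain crashed again too soon, to avoid a restart loop.
    return (now - last_restart) >= minimum


# First crash in the log: the domain had been up for about 23072 seconds.
print(should_restart(0.0, 23072.22))
# Second crash: 18.377953 seconds after the restart.
print(should_restart(0.0, 18.377953))
```

Under these assumptions the first crash leads to a restart and the second does not, which matches the sequence in the log.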

Expected results:
A stable, running Xen guest.

Additional info:
HOST: 
=============================================

lsmod

Module                  Size  Used by
xt_physdev              6481  1 
bridge                 51673  0 
drbd                  139092  3 
ipv6                  245985  20 
autofs4                24773  1 
ip_conntrack_netbios_ns     7105  0 
ipt_REJECT              9281  1 
xt_state                6209  2 
ip_conntrack           54177  2 ip_conntrack_netbios_ns,xt_state
nfnetlink              10841  1 ip_conntrack
xt_tcpudp               7233  4 
iptable_filter          7105  1 
ip_tables              17157  1 iptable_filter
x_tables               18117  5 xt_physdev,ipt_REJECT,xt_state,xt_tcpudp,ip_tables
loop                   19273  2 
dm_mirror              25361  0 
dm_multipath           23113  0 
dm_mod                 58841  2 dm_mirror,dm_multipath
video                  19525  0 
button                 10705  0 
battery                13381  0 
ac                      8901  0 
lp                     16393  0 
parport_pc             29669  0 
parport                38409  2 lp,parport_pc
sg                     36445  0 
e1000                 109881  0 
hw_random               9817  0 
e752x_edac             14405  0 
edac_mc                17925  1 e752x_edac
ext3                  125385  3 
jbd                    57813  1 ext3
megaraid_mbox          35793  7 
megaraid_mm            15085  1 megaraid_mbox
sd_mod                 22977  8 
scsi_mod              132073  3 sg,megaraid_mbox,sd_mod


GUEST: 
=============================================================

lsmod

Module                  Size  Used by
nfsd                  211281  5 
exportfs                9793  1 nfsd
lockd                  60233  1 nfsd
nfs_acl                 7617  1 nfsd
autofs4                24773  1 
ipv6                  245985  47 
sunrpc                149117  7 nfsd,lockd,nfs_acl
xennet                 25153  0 
ip_conntrack_netbios_ns     7105  0 
ipt_REJECT              9281  1 
xt_state                6209  10 
ip_conntrack           54177  2 ip_conntrack_netbios_ns,xt_state
nfnetlink              10841  1 ip_conntrack
xt_tcpudp               7233  12 
iptable_filter          7105  1 
ip_tables              17157  1 iptable_filter
x_tables               18117  4 ipt_REJECT,xt_state,xt_tcpudp,ip_tables
dm_mirror              25361  0 
dm_mod                 58841  1 dm_mirror
Comment 1 Stephen Tweedie 2006-10-11 12:28:43 EDT
Can you capture kernel output around the crash?  I.e., "xm console" output
showing the most recent kernel logs.
Comment 2 Asko Tontti 2006-10-11 15:16:34 EDT
I have the same problems as Thomas, and in my case "xm console 1" says
something like no console information available (I don't remember the exact
words). "xm list" shows "Zombie-foobar ... ----cd". The logs of the guest host
don't contain anything about the crashes.
Comment 3 Asko Tontti 2006-10-17 08:14:03 EDT
# xm console 1
xenconsole: Could not read tty from store: No such file or directory
Comment 4 Stephen Tweedie 2007-03-16 10:47:15 EDT
You need to run "xm console $DOMID" while the domain is already running,
_before_ it crashes, to capture the crash output.  
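
Because the crash may come hours after the console is attached, it helps to keep a timestamped copy of everything the console prints. Below is a minimal Python sketch that stamps lines read from a stream and flushes after each one, so the last kernel messages survive a sudden crash. The names stamp_lines and log_console are hypothetical, and whether "xm console" behaves well when its output is piped rather than attached to a terminal is an assumption this report does not confirm.

```python
import time


def stamp_lines(lines, clock=time.time):
    """Yield each console line prefixed with the local wall-clock time."""
    for line in lines:
        stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(clock()))
        yield "[%s] %s" % (stamp, line.rstrip("\n"))


def log_console(source, sink):
    """Copy console output from source to sink, one stamped line at a time."""
    for stamped in stamp_lines(source):
        sink.write(stamped + "\n")
        # Flush per line so nothing is lost if the domain dies abruptly.
        sink.flush()


# Hypothetical usage, with this file saved as console_log.py and a final line
# calling log_console(sys.stdin, sys.stdout) after "import sys":
#   xm console cms2 | python console_log.py >> cms2-console.log
```

If piping does not work in a given setup, running "xm console" inside a terminal logger is an alternative way to keep the same output.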

Can you reproduce on current kernels?
Comment 5 Asko Tontti 2007-03-16 11:02:48 EDT
In the spring I compiled my own xen package from Fedora CVS for the new
kernels. That solved the problem for me.

Later I went back to the normal Fedora RPMs once the problems were fixed in
them. So this is no longer a problem for me. I don't know about Thomas.
Comment 6 Stephen Tweedie 2007-03-16 14:27:13 EDT
OK, closing for now.  Thomas, please reopen if you can still reproduce problems
on the current packages.