Description of problem:

We have a Xen host running RHEL 5.2 with up to 7 Xen guests. The host has been running for several months, and the VMs typically run for several days at a time, perhaps as long as a month.

We tried stopping and starting a single Xen guest, and it wouldn't start again. No errors or output were reported on the command line; the `xm create` command simply didn't respond.

After a day of debugging, someone noticed that there were tens of thousands of "xenbl*" files in /var/lib/xen. We cleared out those files and restarted xend. Now we are able to stop and start VMs without issue.

I see this block in our /var/log/xen/xend.log file:

  [2008-11-17 12:07:50 xend.XendDomainInfo 15074] INFO (XendDomainInfo:234) Recreating domain 8, UUID 4bd90068-c83b-e964-7323-e7eeae01a0b2.
  [2008-11-17 12:07:50 xend 15074] ERROR (XendDomain:221) Failed to recreate information for domain 8. Destroying it in the hope of recovery.
  Traceback (most recent call last):
    File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomain.py", line 215, in refresh
      self._add_domain(
    File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 258, in recreate
      vm = XendDomainInfo(xeninfo, domid, dompath, True, priv)
    File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 468, in __init__
      self.validateInfo()
    File "/usr/lib64/python2.4/site-packages/xen/xend/XendDomainInfo.py", line 618, in validateInfo
      raise VmError('Invalid memory size')
  VmError: Invalid memory size

The Xen guest in question seemed to have correct memory settings for maxmem and memory, but it's possible that this error was leaving behind the "xenbl*" files in /var/lib/xen/.

Based on https://bugzilla.redhat.com/show_bug.cgi?id=182328, it looks like code was added to deal with errors that resulted in dangling xenbl files. However, the "finally" block suggested by the reporter of that bug was never integrated into the XendBootloader code, and a finally block definitely seems to be in order there.

Version-Release number of selected component (if applicable):
xen 3.0.3-64

How reproducible:
Two of our hosts have experienced this problem so far. Now that we have a workaround (see below), we shouldn't have any serious problems going forward due to this bug.

Steps to Reproduce:
1. Run a Xen host for a long period with VMs that produce errors on startup.

Actual results:
The Xen host can use up all of the 32,000 xenbl files allowed by the bootloader function in XendBootloader.

Expected results:
The bootloader function cleans up all xenbl files even if the guest errors on startup.

Additional info:
Workaround: Remove the xenbl* files from /var/lib/xen/ and restart the xend service.
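A minimal sketch of the try/finally cleanup being requested, written in modern Python 3 rather than the Python 2.4 that xend ships with. The function name `run_bootloader`, the `boot` callback, and the `xenbl.N` naming loop are hypothetical illustrations, not the actual XendBootloader code; the 32,000-name cap only mirrors the figure given in this report.

```python
import os
import tempfile

MAX_FIFOS = 32000  # assumption: mirrors the 32,000-file limit mentioned in this report


def run_bootloader(fifo_dir, boot):
    """Hypothetical sketch: create a uniquely named FIFO, pass it to `boot`,
    and always remove it afterwards, even if `boot` raises."""
    # Find an unused name such as xenbl.0, xenbl.1, ... (naming is illustrative).
    for i in range(MAX_FIFOS):
        fifo = os.path.join(fifo_dir, "xenbl.%d" % i)
        try:
            os.mkfifo(fifo, 0o600)
            break
        except FileExistsError:
            continue  # name already taken, e.g. by a stale file; try the next one
    else:
        raise RuntimeError("no free xenbl FIFO name in %s" % fifo_dir)

    try:
        return boot(fifo)
    finally:
        # Runs on success and on any exception, so no xenbl.* file is left behind.
        try:
            os.unlink(fifo)
        except FileNotFoundError:
            pass  # already removed by someone else; nothing to clean up


if __name__ == "__main__":
    scratch = tempfile.mkdtemp()  # demo directory instead of /var/lib/xen

    def failing_boot(fifo):
        raise ValueError("simulated boot failure")

    try:
        run_bootloader(scratch, failing_boot)
    except ValueError:
        pass
    print(os.listdir(scratch))  # prints [] because the finally block removed the FIFO
```

Because the unlink sits in `finally`, it runs whether `boot` returns normally or raises, so a failed guest boot would not leave a stale FIFO that counts against the name limit.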
(In reply to comment #0)

Could you please try using the latest versions of the xen and kernel-xen packages?

Thanks,
Michal
Well, I did investigate this in the XendBootloader.py code, and there is a working os.unlink(fifo) call, so the /var/lib/xen/xenbl.* files are no longer left behind. I tried booting a PV guest properly (i.e. with the correct files set up in pygrub) and also simulated a boot failure (by pointing pygrub at a wrong file). In both cases the /var/lib/xen/xenbl.* files were created and then deleted (unlinked).

Greg, could you please try the xen packages available in RHEL-5.5? I was unable to reproduce the problem there, since those files are deleted automatically.

Michal
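Checking for leftovers after a successful or failed boot, or applying the reporter's workaround, comes down to looking for xenbl* entries in /var/lib/xen. The helper below is a hypothetical sketch (its name and the `remove` flag are illustrative, not part of xend). It only lists files by default; deleting them assumes no guest is booting at the time, since an in-use bootloader FIFO could otherwise be removed.

```python
import glob
import os
import tempfile


def stale_bootloader_files(xen_dir="/var/lib/xen", remove=False):
    """Hypothetical helper: return the sorted xenbl* paths in xen_dir and,
    if remove=True, delete them (the workaround from the original report;
    xend should then be restarted)."""
    paths = sorted(glob.glob(os.path.join(xen_dir, "xenbl*")))
    if remove:
        for path in paths:
            os.unlink(path)
    return paths


if __name__ == "__main__":
    # Demo on a scratch directory rather than the real /var/lib/xen.
    scratch = tempfile.mkdtemp()
    for name in ("xenbl.1", "xenbl.2", "unrelated.cfg"):
        open(os.path.join(scratch, name), "w").close()
    print(len(stale_bootloader_files(scratch)))  # prints 2
    stale_bootloader_files(scratch, remove=True)
    print(os.listdir(scratch))  # prints ['unrelated.cfg']
```

Listing first and deleting only on request keeps the check safe to run on a live host.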
Hi Michal, Sorry I didn't get back to you sooner. To be honest, we're not really running Xen on many hosts anymore. Those that do run Xen are older RHEL5 boxes that aren't really accessible to me to be upgraded. Pretty much everything we're running is KVM-based. Honestly, if no one else is running into this problem and you've verified that it's not an issue, I'd mark this as closed. Sorry I couldn't be of more help. :(
Since I'm unable to reproduce this with the RHEL-5.5 packages, I'm closing it as CURRENTRELEASE.

Michal