Description of problem: The summary says it all: virt-install doesn't use O_DIRECT when you pass --non-sparse (which is now recommended), so it fills the page cache with zeros. Besides being useless, this also exacerbates another Xen bug.
BZ 222467 describes a bug where the user gets a "Cannot allocate memory" error from the hypervisor when trying to start a fully virtualized domain. To get out of this state, the user generally has to reboot the system before fully virtualized guests will boot again. That bug boils down to the hypervisor reading uninitialized memory as valid page tables; most of the time you get lucky and nothing bad comes of it, but sometimes it fails.

However, *this* bug exacerbates the problem in BZ 222467. I'm not 100% sure what is going on, but filling the page cache (and hence, possibly, moving memory structures around or zeroing out certain pages) makes it *far* more likely to hit 222467. So, because we are recommending that everyone use non-sparse files (for performance and to avoid ENOSPC issues), anyone who follows our recommendation while using virt-install has a better chance of failing to start a guest with "Cannot allocate memory". Fixing this bug will only make the "Cannot allocate memory" error less likely; the real fix (as discussed in 222467) is another upstream hypervisor patch.

Chris Lalancette
Created attachment 146255 Make virt-install use a dd with O_DIRECT to avoid filling the page cache
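The attachment itself is not reproduced in this thread. As a rough illustration of the approach its title describes, the sketch below builds a dd command that writes zeros with O_DIRECT (GNU dd's oflag=direct), so the written data bypasses the page cache. The function names, the 1 MiB block size, and the use of subprocess are assumptions made for illustration, not the actual python-virtinst code.

```python
import subprocess


def build_dd_command(path, size_mb, block_mb=1, direct=True):
    """Return the dd argv that writes size_mb MiB of zeros to path.

    Hypothetical helper for illustration; not taken from python-virtinst.
    """
    if size_mb <= 0 or block_mb <= 0:
        raise ValueError("sizes must be positive")
    count, remainder = divmod(size_mb, block_mb)
    if remainder:
        raise ValueError("size_mb must be a multiple of block_mb")
    cmd = [
        "dd",
        "if=/dev/zero",
        "of=%s" % path,
        "bs=%dM" % block_mb,
        "count=%d" % count,
    ]
    if direct:
        # GNU dd opens the output file with O_DIRECT, so the zeros being
        # written do not accumulate in the page cache.
        cmd.append("oflag=direct")
    return cmd


def create_nonsparse_file(path, size_mb):
    """Fully allocate a disk image by running dd (raises on failure)."""
    subprocess.check_call(build_dd_command(path, size_mb))
```

For example, create_nonsparse_file("/var/lib/xen/images/guest.img", 4096) would allocate a 4 GiB image (the path is a placeholder). Note that O_DIRECT is not supported on every filesystem; on tmpfs, for instance, the open can fail with EINVAL.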
I'm fine including the patch from comment #4 in the RHEL-5 GA build of python-virtinst, but only with the understanding that we will fix the HV properly at the soonest opportunity. This patch isn't something we can carry long-term because it will not play nicely with forthcoming UI improvements, such as progress feedback while creating the files.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux major release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux major release. This request is not yet committed for inclusion.
QE ack for RHEL5. This creates lots of problems with our hardware cert suite.
Chris Lalancette has tested the patch in comment #4 and seen it dramatically reduce the incidence of the 'Cannot allocate memory' bug described in comment #1. The patch has also been reviewed by me and Hugh Brock. Finally, the QA guest installation tests all use the virt-install tool. Currently they always create sparse files, and they will shortly also create non-sparse files. Thus the QA guest install tests will be able to quickly validate that this patch does not introduce any regressions.
Fix committed to CVS:

* Wed Jan 24 2007 Daniel Berrange <berrange> - 0.99.0-2.el5
- use dd with o_direct to create non-sparse files (bz #223491)

And built in brew in the dist-5E-qu-candidate collection:

$ brew latest-pkg dist-5E-qu-candidate python-virtinst
Build                                    Tag                  Built by
---------------------------------------- -------------------- ----------------
python-virtinst-0.99.0-2.el5             dist-5E-qu-candidate berrange

I have tested the following:

virt-install using sparse file
virt-install using non-sparse file
virt-manager using sparse file
virt-manager using non-sparse file

All four cases successfully completed file creation, and the non-sparse cases correctly used O_DIRECT.
I'm reopening this issue based on the information in comment 13 and comment 14. Marking as candidate for 5.1. This was marked as a blocker for RHEL5 GA, so proposing for 5.1 blocker. QE ack for RHEL5.1.
The comments in #13 & #14 are not related to the problem reported in this bug - in fact they confirm that virt-install is using O_DIRECT correctly. A new BZ should be opened to track the different problem reported in comment #13/14 since it is a hypervisor/kernel issue, rather than a tools issue.