| Field | Value |
|---|---|
| Summary: | qemu should detect logical_sector_size value needed to make cache=none just work |
| Product: | Red Hat Enterprise Linux 6 |
| Component: | qemu-kvm |
| Version: | 6.1 |
| Status: | CLOSED DUPLICATE |
| Severity: | unspecified |
| Priority: | unspecified |
| Hardware: | Unspecified |
| OS: | Unspecified |
| Target Milestone: | rc |
| Keywords: | Reopened |
| Doc Type: | Bug Fix |
| Reporter: | Jeff Moyer <jmoyer> |
| Assignee: | Minchan Kim <minchan> |
| QA Contact: | Virtualization Bugs <virt-bugs> |
| CC: | chellwig, crobinso, juzhang, minchan, mkenneth, msnitzer, pbonzini, rhod, tburke, virt-maint |
| Last Closed: | 2011-12-12 08:50:21 UTC |

Description
Jeff Moyer
2011-07-15 21:17:23 UTC
You need to specify the logical_sector_size=4096 attribute for images that sit on 4k devices on the qemu command line. Without that you can't do I/O due to the direct I/O alignment restrictions. Note that this has huge implications for migration scenarios, and I don't think we have fully hashed out our story for it yet.

This BZ:

1) is really just a real-world instance of the more generic (ignored) bug I reported against RHEV long ago: bug#597402
2) reinforces that we cannot ignore this problem any more.

Any layer that starts qemu-kvm needs to pass the appropriate virtio attribute(s) on the command line so that a guest with topology support (e.g. RHEL6) can automatically pick up the storage's I/O topology limits (logical_block_size, etc).

Maybe we should have qemu once and for all learn to set a sensible default cache mode depending on the backing storage, and have our tools not forcibly set any cache mode by default. Solving this in all the tools is fragile and a waste of time. Since options like cache=none and aio=native have multiple caveats, qemu is the only app that can authoritatively determine the optimal cache option, and the qemu developers are the most knowledgeable people to identify the caveats and work around them if more are discovered.

Provided storage sector size is actually detectable, options here are:

1) Expose that logical_sector_size option in the libvirt XML. Leave it up to virt-manager + RHEV to detect the storage sector size and specify the needed value when creating the libvirt XML. Not sure how RHEV or virt-manager gets that info from storage on a remote host?
2) Have libvirt detect the storage sector size and specify the needed logical_sector_size on the qemu command line.
3) Same as 2, but do it in qemu if no logical_sector_size was passed on the cli.

From my naive view 3 sounds optimal. Reassigning to qemu.

No, it's not a qemu bug.
Changing the logical sector size is a guest visible ABI and needs to be chosen by the management tools for the initial run and can't be changed later.

(In reply to comment #6)
> No, it's not a qemu bug. Changing the logical sector size is a guest visible
> ABI and needs to be chosen by the management tools for the initial run and
> can't be changed later.

Um, all well and good that it isn't a qemu bug but... it certainly is a bug that the management tools should take on. Closing this CANTFIX is just odd...

(In reply to comment #6)
> No, it's not a qemu bug. Changing the logical sector size is a guest visible
> ABI and needs to be chosen by the management tools for the initial run and
> can't be changed later.

Then for -M RHEL6.1 and earlier, don't auto set a sector size. For -M RHEL6.2 and later, detect the sector size. Guests created with libvirt on 6.0 or 6.1 can use the newer qemu binary. Certainly that can cause issues with migration if the backing storage changes sector size, but as you said migration is a known issue here.

Even if this is handled in the tools, qemu should detect sector size and explicitly disallow cache=none for 4K sectors. Or better yet, print to stderr and gracefully fallback to a working cache mode. Reopening

(In reply to comment #8)
> (In reply to comment #6)
> > No, it's not a qemu bug. Changing the logical sector size is a guest visible
> > ABI and needs to be chosen by the management tools for the initial run and
> > can't be changed later.
>
> Then for -M RHEL6.1 and earlier, don't auto set a sector size. For -M RHEL6.2
> and later, detect the sector size.

I'm not sure of all the virt particulars of distinguishing between 6.1 and 6.2 but:

RHEL 6.1 has all the needed kernel support to be able to propagate the logical_block_size up the stack and then allow the virt management tools to detect it.

(In reply to comment #8)
> (In reply to comment #6)
> > No, it's not a qemu bug.
> > Changing the logical sector size is a guest visible
> > ABI and needs to be chosen by the management tools for the initial run and
> > can't be changed later.
>
> Then for -M RHEL6.1 and earlier, don't auto set a sector size. For -M RHEL6.2
> and later, detect the sector size. Guests created with libvirt on 6.0 or 6.1
> can use the newer qemu binary. Certainly that can cause issues with migration
> if the backing storage changes sector size, but as you said migration is a
> known issue here.

It has nothing to do with the machine. An image with logical_sector_size=512 vs one with logical_sector_size=4096 is a huge visible difference to the guest. It basically means you can't use it as file systems rely on the block size. E.g. if you do that to your root fs it simply won't boot. The sector_size needs to be set once when creating an image, and never changed after that.

> Even if this is handled in the tools, qemu should detect sector size and
> explicitly disallow cache=none for 4K sectors. Or better yet, print to stderr
> and gracefully fallback to a working cache mode. Reopening

cache=none works perfectly fine with 4k sectors. The only thing you must not do is switch between different ones.

(In reply to comment #9)
>
> I'm not sure of all the virt particulars of distinguishing between 6.1 and 6.2
> but:
>
> RHEL 6.1 has all the needed kernel support to be able to propagate the
> logical_block_size up the stack and then allow the virt management tools to
> detect it.

The qemu -machine value isn't about the capabilities of RHEL6.0 vs 6.1 vs 6.2; those values are just a way for management tools to maintain a stable guest hw arrangement when a qemu update changes hw defaults. -M rhel6.0 means "use the defaults qemu had in rhel6.0", so any guests created by libvirt on 6.0 don't see hardware moving around when booted on a newer qemu.

(In reply to comment #10)
>
> It has nothing to do with the machine.
> An image with logical_sector_size=512
> vs one with logical_sector_size=4096 is a huge visible difference to the guest.
> It basically means you can't use it as file systems rely on the block size.
> E.g. if you do that to your root fs it simply won't boot.
>

Thanks, I understand now why qemu can't sync this value.

> The sector_size needs to be set once when creating an image, and never changed
> after that.
>
> > Even if this is handled in the tools, qemu should detect sector size and
> > explicitly disallow cache=none for 4K sectors. Or better yet, print to stderr
> > and gracefully fallback to a working cache mode. Reopening
>
> cache=none works perfectly fine with 4k sectors. The only thing you must not
> do is switch between different ones.

Sorry, I was confused. But it sounds like qemu could still detect the situation you mention in comment #2:

> You need to specify the logical_sector_size=4096 attribute for images that sit
> on 4k devices on the qemu command line. Without that you can't do I/O due to
> the direct I/O alignment restrictions.

So explicitly error or fallback to another cache mode if cache == None and host sector size != logical sector size.

(In reply to comment #12)
> > You need to specify the logical_sector_size=4096 attribute for images that sit
> > on 4k devices on the qemu command line. Without that you can't do I/O due to
> > the direct I/O alignment restrictions.
>
> So explicitly error or fallback to another cache mode if cache == None and host
> sector size != logical sector size.

Unfortunately qemu has no way to remember what sector size a guest expects from a given image. We could store it with qcow2 images, which might make some sense as a safety net, but neither raw images, which are a plain pass-through, nor foreign images would be able to make use of it.
I'd also argue that fallback is something that we should leave to the management tools as an absolute last resort, and only after telling users about it, given that it has huge performance implications.

It's a dup of bug 748906. The info in the two bugs is overlapping. Since the other is assigned to me I'm closing this one.

*** This bug has been marked as a duplicate of bug 748906 ***
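
For readers following the thread, the check discussed above (complain when cache=none is requested and the guest-visible sector size does not fit the host device's sector size) could be sketched roughly as below. This is only an illustration in Python, not qemu or libvirt code. The function names `host_logical_block_size`, `direct_io_aligned` and `cache_none_problem` are hypothetical; `BLKSSZGET` is the Linux ioctl that reports a block device's logical block size. One assumption differs from the wording in the thread: instead of requiring the two sizes to be equal, the sketch only requires the guest sector size to be a multiple of the host's, on the reasoning that such requests would still be aligned for direct I/O.

```python
import fcntl
import os
import stat
import struct

# Linux ioctl number for "get logical block size" (_IO(0x12, 104)).
BLKSSZGET = 0x1268


def host_logical_block_size(path):
    """Return the logical block size of a block device, or None if `path`
    is not a block device (e.g. an image file; the caller would then have
    to look up the device backing the filesystem, which is not done here)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        if not stat.S_ISBLK(os.fstat(fd).st_mode):
            return None
        buf = fcntl.ioctl(fd, BLKSSZGET, struct.pack("i", 0))
        return struct.unpack("i", buf)[0]
    finally:
        os.close(fd)


def direct_io_aligned(offset, length, host_block_size):
    """True if a request's offset and length are both multiples of the
    host logical block size (buffer address alignment is ignored here)."""
    return offset % host_block_size == 0 and length % host_block_size == 0


def cache_none_problem(guest_sector_size, host_block_size):
    """Return a message if cache=none looks unsafe, else None.

    Assumption: a guest issues I/O in units of its own sector size, so the
    requests stay aligned only if that size is a multiple of the host's."""
    if host_block_size is None:
        return None  # unknown host size: nothing to report
    if guest_sector_size % host_block_size != 0:
        return ("guest sector size %d is not a multiple of host logical "
                "block size %d; direct I/O may fail"
                % (guest_sector_size, host_block_size))
    return None
```

A management tool could call `cache_none_problem(512, host_logical_block_size("/dev/sdX"))` (device path purely illustrative) before starting a guest and tell the user about any mismatch, in line with the point above that a silent fallback is not desirable.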