Bug 722615

Summary: qemu should detect logical_sector_size value needed to make cache=none just work
Product: Red Hat Enterprise Linux 6 Reporter: Jeff Moyer <jmoyer>
Component: qemu-kvm Assignee: Minchan Kim <minchan>
Status: CLOSED DUPLICATE QA Contact: Virtualization Bugs <virt-bugs>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 6.1 CC: chellwig, crobinso, juzhang, minchan, mkenneth, msnitzer, pbonzini, rhod, tburke, virt-maint
Target Milestone: rc Keywords: Reopened
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2011-12-12 08:50:21 UTC Type: ---

Description Jeff Moyer 2011-07-15 21:17:23 UTC
Description of problem:
Attempting to install a guest using an image file on a file system that is backed by a 4k logical sector size disk fails.  Anaconda finishes the mkfs, but fails the mount.  The reason is likely to be that the cache=none option is specified, and qemu-kvm is attempting to perform 512 byte O_DIRECT I/O to the backing file.

In such instances, it would be desirable for virt-manager to detect the 4k device and pass the proper options to qemu-kvm to align the I/O.

Version-Release number of selected component (if applicable):
virt-manager-0.8.6-4.el6.noarch

How reproducible:
100%

Steps to Reproduce:
See description

Comment 2 chellwig@redhat.com 2011-07-15 22:00:09 UTC
You need to specify the logical_sector_size=4096 attribute for images that sit on 4k devices on the qemu command line.  Without that you can't do I/O due to the direct I/O alignment restrictions.
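
To make the alignment restriction concrete, here is a minimal sketch of the check involved. It assumes the usual Linux O_DIRECT rule that the file offset and the transfer length must both be multiples of the backing device's logical block size (the buffer address is normally subject to the same rule, which this sketch does not model). The function name is hypothetical and is not part of qemu.

```python
def is_direct_io_aligned(offset: int, length: int, logical_block_size: int) -> bool:
    """Return True if offset and length are both multiples of the
    logical block size, as O_DIRECT I/O generally requires."""
    if logical_block_size <= 0:
        raise ValueError("logical_block_size must be positive")
    return offset % logical_block_size == 0 and length % logical_block_size == 0


# A single 512-byte sector, as a guest that believes in 512-byte sectors may issue.
print(is_direct_io_aligned(512, 512, 512))     # True: fine on a 512-byte device
print(is_direct_io_aligned(512, 512, 4096))    # False: rejected on a 4k device
# The same guest told about 4k sectors only issues 4k-multiple requests.
print(is_direct_io_aligned(4096, 8192, 4096))  # True
```

The second call is the situation in this report: the guest issues sector-sized requests that the 4k host device cannot accept under O_DIRECT.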

Comment 3 chellwig@redhat.com 2011-07-15 22:01:09 UTC
Note that this has huge implications for migration scenarios, and I don't think we have fully hashed out our story for it yet.

Comment 4 Mike Snitzer 2011-07-17 12:51:45 UTC
This BZ
1) is really just a real-world instance of the more generic (ignored) bug I reported against RHEV long ago: bug#597402
2) reinforces that we cannot ignore this problem any more.

Any layer that starts qemu-kvm needs to pass the appropriate virtio attribute(s) on the command line so that a guest with topology support (e.g. RHEL6) can automatically pick up the storage's I/O topology limits (logical_block_size, etc).
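
As an illustration of what "picking up the I/O topology limits" could look like on the host side, the sketch below reads the block queue attributes that Linux exposes under /sys/block/<device>/queue/. The function name and the sysfs_root parameter are assumptions made for this example (the parameter exists only so the code can be pointed at a test directory); how the values are then passed to qemu-kvm is not shown.

```python
from pathlib import Path

# Topology attributes published by the Linux block layer for each device.
TOPOLOGY_ATTRS = (
    "logical_block_size",
    "physical_block_size",
    "minimum_io_size",
    "optimal_io_size",
)


def read_io_topology(device: str, sysfs_root: str = "/sys/block") -> dict:
    """Return the topology limits found for a block device such as 'sda'.

    Attributes that are missing are simply left out of the result.
    """
    queue = Path(sysfs_root) / device / "queue"
    topology = {}
    for attr in TOPOLOGY_ATTRS:
        path = queue / attr
        if path.is_file():
            topology[attr] = int(path.read_text().strip())
    return topology
```

A caller would use something like read_io_topology("sda") and forward the values it cares about to whatever starts the guest.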

Comment 5 Cole Robinson 2011-07-19 15:21:58 UTC
Maybe we should have qemu once and for all learn to set a sensible default cache mode depending on the backing storage, and have our tools not forcibly set any cache mode by default.

Solving this in all the tools is fragile and a waste of time. Since options like cache=none and aio=native have multiple caveats, qemu is the only app that can authoritatively determine the optimal cache option, and the qemu developers are best placed to identify the caveats and work around them if more are discovered.

Provided storage sector size is actually detectable, options here are:

1) Expose that logical_sector_size option in the libvirt XML. Leave it up to virt-manager + RHEV to detect the storage sector size and specify the needed value when creating the libvirt XML. Not sure how RHEV or virt-manager gets that info from storage on a remote host?

2) Have libvirt detect the storage sector size and specify the needed logical_sector_size on the qemu command line.

3) Same as 2, but do it in qemu if no logical_sector_size was passed on the cli.

From my naive view, option 3 sounds optimal. Reassigning to qemu.
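
For options 2 and 3, the detection step might look roughly like the following sketch. It assumes a local Linux host where the image's file system sits directly on a block device that appears under /sys/dev/block/<major>:<minor>; for a partition the queue directory lives on the parent disk, hence the second candidate. The function name and the sysfs_root parameter are hypothetical. Network or multi-device file systems will not resolve this way, which is the remote-storage concern raised in option 1, so the function returns None instead of guessing.

```python
import os
from pathlib import Path
from typing import Optional


def backing_logical_block_size(image_path, sysfs_root="/sys") -> Optional[int]:
    """Best-effort lookup of the logical block size of the device that
    holds image_path. Returns None when it cannot be determined."""
    st = os.stat(image_path)
    dev = f"{os.major(st.st_dev)}:{os.minor(st.st_dev)}"
    # /sys/dev/block/<maj>:<min> is a symlink to the device's sysfs node.
    node = (Path(sysfs_root) / "dev" / "block" / dev).resolve()
    # Whole disks carry queue/ themselves; partitions inherit the parent's.
    for candidate in (node, node.parent):
        attr = candidate / "queue" / "logical_block_size"
        if attr.is_file():
            return int(attr.read_text().strip())
    return None
```

Whether the result is then used by libvirt to build the command line (option 2) or by qemu itself (option 3) is a policy question this sketch does not answer.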

Comment 6 chellwig@redhat.com 2011-07-19 15:36:27 UTC
No, it's not a qemu bug.  Changing the logical sector size is a guest visible ABI and needs to be chosen by the management tools for the initial run and can't be changed later.

Comment 7 Mike Snitzer 2011-07-19 15:44:54 UTC
(In reply to comment #6)
> No, it's not a qemu bug.  Changing the logical sector size is a guest visible
> ABI and needs to be chosen by the management tools for the initial run and
> can't be changed later.

Um, all well and good that it isn't a qemu bug but... it certainly is a bug that the management tools should take on.

Closing this CANTFIX is just odd...

Comment 8 Cole Robinson 2011-07-19 15:55:21 UTC
(In reply to comment #6)
> No, it's not a qemu bug.  Changing the logical sector size is a guest visible
> ABI and needs to be chosen by the management tools for the initial run and
> can't be changed later.

Then for -M RHEL6.1 and earlier, don't auto set a sector size. For -M RHEL6.2 and later, detect the sector size. Guests created with libvirt on 6.0 or 6.1 can use the newer qemu binary. Certainly that can cause issues with migration if the backing storage changes sector size, but as you said migration is a known issue here.

Even if this is handled in the tools, qemu should detect sector size and explicitly disallow cache=none for 4K sectors. Or better yet, print to stderr and gracefully fall back to a working cache mode. Reopening.

Comment 9 Mike Snitzer 2011-07-19 17:00:29 UTC
(In reply to comment #8)
> (In reply to comment #6)
> > No, it's not a qemu bug.  Changing the logical sector size is a guest visible
> > ABI and needs to be chosen by the management tools for the initial run and
> > can't be changed later.
> 
> Then for -M RHEL6.1 and earlier, don't auto set a sector size. For -M RHEL6.2
> and later, detect the sector size.

I'm not sure of all the virt particulars of distinguishing between 6.1 and 6.2 but:

RHEL 6.1 has all the needed kernel support to be able to propagate the logical_block_size up the stack and then allow the virt management tools to detect it.

Comment 10 chellwig@redhat.com 2011-07-19 17:07:35 UTC
(In reply to comment #8)
> (In reply to comment #6)
> > No, it's not a qemu bug.  Changing the logical sector size is a guest visible
> > ABI and needs to be chosen by the management tools for the initial run and
> > can't be changed later.
> 
> Then for -M RHEL6.1 and earlier, don't auto set a sector size. For -M RHEL6.2
> and later, detect the sector size. Guests created with libvirt on 6.0 or 6.1
> can use the newer qemu binary. Certainly that can cause issues with migration
> if the backing storage changes sector size, but as you said migration is a
> known issue here.

It has nothing to do with the machine.  An image with logical_sector_size=512 vs one with logical_sector_size=4096 is a huge visible difference to the guest.  It basically means you can't use it, as file systems rely on the block size.  E.g. if you do that to your root fs it simply won't boot.

The sector_size needs to be set once when creating an image, and never changed after that.
 
> Even if this is handled in the tools, qemu should detect sector size and
> explicitly disallow cache=none for 4K sectors. Or better yet, print to stderr
> and gracefully fall back to a working cache mode. Reopening

cache=none works perfectly fine with 4k sectors.  The only thing you must not do is switch between different sector sizes.

Comment 11 Cole Robinson 2011-07-19 19:06:56 UTC
(In reply to comment #9)
> 
> I'm not sure of all the virt particulars of distinguishing between 6.1 and 6.2
> but:
> 
> RHEL 6.1 has all the needed kernel support to be able to propagate the
> logical_block_size up the stack and then allow the virt management tools to
> detect it.

The qemu -machine value isn't about the capabilities of RHEL6.0 vs 6.1 vs 6.2; those values are just a way for management tools to maintain a stable guest hw arrangement when a qemu update changes hw defaults. -M rhel6.0 means "use the defaults qemu had in rhel6.0", so any guests created by libvirt on 6.0 don't see hardware moving around when booted on a newer qemu.

Comment 12 Cole Robinson 2011-07-19 19:19:39 UTC
(In reply to comment #10)
> 
> It has nothing to do with the machine.  An image with logical_sector_size=512
> vs one with logical_sector_size=4096 is a huge visible difference to the guest. 
> It basically means you can't use it, as file systems rely on the block size. 
> E.g. if you do that to your root fs it simply won't boot.
> 

Thanks, I understand now why qemu can't sync this value.

> The sector_size needs to be set once when creating an image, and never changed
> after that.
> 
> > Even if this is handled in the tools, qemu should detect sector size and
> > explicitly disallow cache=none for 4K sectors. Or better yet, print to stderr
> > and gracefully fall back to a working cache mode. Reopening
> 
> cache=none works perfectly fine with 4k sectors.  The only thing you must not
> do is switch between different sector sizes.

Sorry, I was confused, but it sounds like qemu could still detect the situation you mention in comment #2:

> You need to specify the logical_sector_size=4096 attribute for images that sit
> on 4k devices on the qemu command line.  Without that you can't do I/O due to
> the direct I/O alignment restrictions.

So explicitly error or fall back to another cache mode if cache == None and host sector size != logical sector size.
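
A minimal sketch of that proposal, taken literally, could read as follows. The function name and its parameters are hypothetical and this is not qemu code; "none" and "writethrough" are existing qemu cache mode names, but picking writethrough as the fallback is only an assumption for the example.

```python
from typing import Optional


def choose_cache_mode(requested: str,
                      host_sector_size: int,
                      guest_sector_size: int,
                      fallback: Optional[str] = None) -> str:
    """Apply the proposed rule: with cache=none and mismatched sector
    sizes, either fail loudly or switch to the given fallback mode."""
    if requested != "none" or host_sector_size == guest_sector_size:
        return requested
    if fallback is None:
        raise ValueError(
            f"cache=none requested, but host sector size {host_sector_size} "
            f"!= guest logical sector size {guest_sector_size}"
        )
    return fallback


# The case from this bug: 4k host device, guest configured for 512-byte sectors.
print(choose_cache_mode("none", 4096, 512, fallback="writethrough"))  # writethrough
print(choose_cache_mode("none", 4096, 4096))                          # none
```

Leaving fallback unset gives the "explicitly error" behaviour; passing a mode gives the "fall back" behaviour.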

Comment 13 chellwig@redhat.com 2011-07-22 15:04:22 UTC
(In reply to comment #12)
> > You need to specify the logical_sector_size=4096 attribute for images that sit
> > on 4k devices on the qemu command line.  Without that you can't do I/O due to
> > the direct I/O alignment restrictions.
> 
> So explicitly error or fall back to another cache mode if cache == None and host
> sector size != logical sector size.

Unfortunately qemu has no way to remember what sector size a guest expects from a given image.
We could store it with qcow2 images, which might make some sense as a safety net, but neither
raw images which are a plain pass-through nor foreign images would be able to make use of it.

I'd also argue that fallback is something that we should leave to the management tools as an
absolute last resort, and only after telling users about it given that it has huge performance implications.
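
Since raw and foreign images cannot carry the value themselves, one conceivable way for a management tool to remember it is a small record kept next to the image. The sketch below is purely illustrative: the sidecar file name, the JSON layout, and both function names are assumptions, not anything libvirt or qemu actually does.

```python
import json
from pathlib import Path


def _sidecar(image_path) -> Path:
    # Hypothetical naming scheme: <image>.sector_size.json beside the image.
    return Path(str(image_path) + ".sector_size.json")


def record_sector_size(image_path, sector_size: int) -> None:
    """Store the sector size once, at image creation time."""
    sidecar = _sidecar(image_path)
    if sidecar.exists():
        raise FileExistsError(f"sector size already recorded in {sidecar}")
    sidecar.write_text(json.dumps({"logical_sector_size": sector_size}))


def check_sector_size(image_path, requested: int) -> None:
    """Refuse to start a guest with a sector size other than the recorded one."""
    sidecar = _sidecar(image_path)
    if not sidecar.exists():
        return  # nothing recorded, nothing to enforce
    recorded = json.loads(sidecar.read_text())["logical_sector_size"]
    if recorded != requested:
        raise ValueError(
            f"image was created with sector size {recorded}, "
            f"refusing to expose it with {requested}"
        )
```

The tool would call record_sector_size() when it creates the image and check_sector_size() before every later start, which enforces the "set once, never changed" rule outside of qemu.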

Comment 17 Paolo Bonzini 2011-12-12 08:50:21 UTC
It's a dup of bug 748906.  The info in the two bugs is overlapping.  Since the other is assigned to me I'm closing this one.

*** This bug has been marked as a duplicate of bug 748906 ***