Bug 1585798

Summary: [RFE] A single default (QEMU) machine type won't work for all guest images
Product: Red Hat OpenStack Reporter: Eduardo Habkost <ehabkost>
Component: openstack-nova Assignee: OSP DFG:Compute <osp-dfg-compute>
Status: CLOSED WONTFIX QA Contact: OSP DFG:Compute <osp-dfg-compute>
Severity: medium Docs Contact:
Priority: medium    
Version: unspecified CC: dasmith, egallen, eglynn, fjin, fj-lsoft-ofuku, jhakimra, kchamart, lyarwood, mbooth, mst, rjones, sbauza, sgordon, srevivo, vromanso
Target Milestone: --- Keywords: FutureFeature, Triaged
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2020-09-29 09:35:05 UTC Type: Bug

Description Eduardo Habkost 2018-06-04 18:52:25 UTC
Description of problem:

1) When importing preexisting guest images that expect a "pc" machine, OpenStack can't use "q35" by default.

2) When importing new guest images that expect a "q35" machine, OpenStack can't use "pc" by default.

This means we can't assume a single default will be good for everybody.


Possible solutions:

There are multiple ways this can be addressed in the virtualization stack.  The ones that are being discussed are:

1) Doing nothing and requiring the user to select the machine type explicitly in at least one of the cases above.

2) Encoding a recommended machine-type family inside the guest image file (e.g. using another container format for guest images, or encoding additional data in qcow2 images). See the qemu-devel discussion: <https://lists.gnu.org/archive/html/qemu-devel/2018-05/msg04494.html>.

3) Choosing a smarter default based on analysis of the guest image (e.g. using virt-inspector and a database similar to libosinfo).

The implementation of these solutions will necessarily involve multiple components and require additional BZs.  I'm creating this BZ so we can track the tasks related to the problem.

Comment 1 Kashyap Chamarthy 2018-06-07 15:56:28 UTC
For context, this bug from Eduardo builds on top of this other one he 
filed:

    https://bugzilla.redhat.com/show_bug.cgi?id=1581414#c9 -- OpenStack
    shouldn't break if the default machine-type in QEMU is "q35"  


I am following the 'qemu-devel' 15 KM-long thread ("storing machine data 
in qcow images").  The emerging consensus appears[*] to be to _not_
commit to a QCOW2 specification straight away, but to first come up with
a VM description format that can work with existing formats such as "tar".

[*] Based on your proposal here:
<https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg01223.html>.
And DanPB's response here:
<https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg01231.html>.
Quoting that thread below for convenience:
--------------------------------------------------------------------------------
On Wed, Jun 06, 2018 at 03:24:50PM +0100, Daniel P. Berrangé wrote:
> On Wed, Jun 06, 2018 at 11:14:32AM -0300, Eduardo Habkost wrote:
> > On Wed, Jun 06, 2018 at 02:50:10PM +0100, Daniel P. Berrangé wrote:
> > > On Wed, Jun 06, 2018 at 03:45:10PM +0200, Michal Suchánek wrote:
> > > > 
> > > > I think that *if* we want an 'appliance' format that stores a whole VM
> > > > in a single file to ease VM distribution then the logical place to look
> > > > in qemu is qcow. The reasons have been explained at length.
> > > 
> > > I rather disagree. This is a common problem beyond just QEMU and everyone
> > > just uses an existing archive format (TAR, ZIP) for bundling together
> > > one or more disk images, metadata for config, and whatever other resources
> > > are applicable for the vendor.  This works with any disk format (raw,
> > > qcow2, vmdk, vpc, etc) so is preferable to inventing something that is
> > > specific to qcow2 IMHO.
> > 
> > Now we have N+1 appliance file formats.  :)
> > 
> > (We like it or not, qcow2 is already used as an appliance format
> > for single-disk VMs in practice.)
> > 
> > But I agree this must not be specific to qcow2.  The same VM
> > description format we agree upon should work with other disk
> > formats or with multi-disk appliances.
> > 
> > If we specify a reasonable VM description format for appliances
> > and make it work inside (e.g.) tar files, we will still have the
> > option of allowing the description be placed inside qcow2 if we
> > really want to.  I don't think we need to finish this qcow2
> > bikeshedding exercise right now.
> 
> Yes, I think that is sensible, as once we actually try it out in real
> world cases, we might then find a tar/zip is sufficient after all and
> we don't need to do something extra for qcow2. Also means we can do
> experiments without committing to a qcow2 format spec change right
> away.
--------------------------------------------------------------------------------
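
To make the "VM description inside a tar file" idea from the quoted thread concrete, here is a minimal sketch.  Everything in it is an assumption made for illustration: the thread did not agree on any format, so the member name "vm-description.json", the JSON encoding, and the "machine_type_family" key are all hypothetical.

```python
import io
import json
import tarfile

# Hypothetical member name; no such convention was agreed in the thread.
DESCRIPTION_MEMBER = "vm-description.json"


def build_appliance(disk_name: str, disk_bytes: bytes, description: dict) -> bytes:
    """Bundle one disk image and a VM description into an in-memory tar archive."""
    members = (
        (DESCRIPTION_MEMBER, json.dumps(description).encode("utf-8")),
        (disk_name, disk_bytes),
    )
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, payload in members:
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def read_description(appliance: bytes) -> dict:
    """Read the VM description back without touching the disk image."""
    with tarfile.open(fileobj=io.BytesIO(appliance), mode="r") as tar:
        member = tar.extractfile(DESCRIPTION_MEMBER)
        return json.load(member)


# Usage: the disk payload is a placeholder, not a real image.
blob = build_appliance("disk.raw", b"\0" * 16, {"machine_type_family": "q35"})
print(read_description(blob))
```

Because the description is a separate archive member, the same layout would work for raw, qcow2, or multi-disk appliances, which is the property the thread asks for.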

Comment 2 Kashyap Chamarthy 2018-08-28 17:12:16 UTC
[Based on an IRC discussion with Dan Berrangé and Matt Booth.]

The preferred order in which to select an appropriate machine type for
Nova instances:

(1) Use the Nova metadata property: 'hw_machine_type' to set the machine
    type on the guest. 

(2) Ask libosinfo, and pick 'q35' if it says the guest can do both 'pc'
    and 'q35'

(3) Use 'q35' (this doesn't necessarily need code changes, and it can be
    forced via nova.conf if desired).

Related info
------------

(a) Note that upstream libvirt completely ignores QEMU's default:

        https://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=26cfb1a
        "qemu: ensure default machine types don't change if QEMU
        changes"

    Where the commit message says:

        [...] 
        "Libvirt promises to isolate applications from hypervisor
        changes that may cause incompatibilities, so we must ensure that
        we always use the "pc" machine type if it is available. Only use
        QEMU's own reported default machine type if "pc" does not exist.

        "This issue is not x86-only, other arches are liable to change
        their default machine, while some arches don't report any
        default at all causing libvirt to pick the first machine in the
        list. Thus to guarantee stability to applications, declare a
        preferred default machine for all architectures we currently
        support with QEMU."
        [...]

(b) Dan Berrangé writes: libosinfo would only ever report 'q35' if we
    knew it to work -- so for example, we wouldn't report 'q35' support
    for RHEL6.  I bet it would impact NFV folks in particular, because
    they need to figure out which device to use from the role device
    tagging metadata Nova exposes, and with q35 the PCI topology they
    need to traverse is totally different.
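
For reference, the default-selection behaviour described in the commit message quoted under (a) can be sketched like this.  It is a paraphrase in Python, not libvirt's actual code; the function name and the contents of the preference table are assumptions (only the x86 preference for "pc" comes from the quoted text).

```python
from typing import Mapping, Optional, Sequence

# Illustrative preference table; the real per-architecture list lives in libvirt.
PREFERRED_DEFAULTS = {"x86_64": "pc"}


def default_machine(
    arch: str,
    machines: Sequence[str],
    qemu_default: Optional[str] = None,
    preferred: Mapping[str, str] = PREFERRED_DEFAULTS,
) -> Optional[str]:
    """Return the machine type used when the guest config names none."""
    # Prefer the declared per-architecture default if QEMU offers it.
    want = preferred.get(arch)
    if want in machines:
        return want
    # Otherwise fall back to the default QEMU itself reports, if any.
    if qemu_default in machines:
        return qemu_default
    # Some arches report no default at all: take the first machine listed.
    return machines[0] if machines else None


# Even if QEMU reports "q35" as its default, "pc" is chosen when available.
print(default_machine("x86_64", ["q35", "pc"], qemu_default="q35"))
```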

Comment 3 Lee Yarwood 2018-09-04 11:39:16 UTC
*** Bug 1340726 has been marked as a duplicate of this bug. ***