Bug 2076013

Summary:

RHEL9.1 guest can't boot into OS after v2v conversion

Product:

Red Hat Enterprise Linux 9

Reporter:

mxie <mxie>

Component:

virt-v2v

Assignee:

Laszlo Ersek <lersek>

Status:

CLOSED ERRATA

QA Contact:

Vera <vwu>

Severity:

medium

Docs Contact:

Priority:

medium

Version:

9.1

CC:

ahadas, berrange, chhu, ddutile, fweimer, hongzliu, juzhou, kchamart, kkiwi, lersek, lrotenbe, rjones, tyan, tzheng, vtosodec, vwu, xiaodwan

Target Milestone:

Target Release:

---

Hardware:

x86_64

OS:

Unspecified

Whiteboard:

Fixed In Version:

virt-v2v-2.0.7-1.el9

Doc Type:

If docs needed, set a value

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

2022-11-15 09:56:05 UTC

Type:

Bug

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

---

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Attachments:

Description	Flags
rhel9.1-guest-cannot-boot-into-os-after-v2v-conversion.png	none

Description mxie@redhat.com 2022-04-17 02:43:46 UTC

Created attachment 1873040 [details]
rhel9.1-guest-cannot-boot-into-os-after-v2v-conversion.png

Description of problem:
RHEL9.1 guest can't boot into OS after v2v conversion

Version-Release number of selected component (if applicable):
virt-v2v-2.0.3-1.el9.x86_64
libguestfs-1.48.1-1.el9.x86_64
guestfs-tools-1.48.0-1.el9.x86_64
libvirt-libs-8.2.0-1.el9.x86_64
qemu-img-6.2.0-13.el9.x86_64
nbdkit-server-1.30.2-1.el9.x86_64
libnbd-1.12.2-1.el9.x86_64


How reproducible:
100%

Steps to Reproduce:
1. Convert a rhel9.1 guest from VMware to local libvirt by v2v
#virt-v2v -ic vpx://root.198.169/data/10.73.199.217/?no_verify=1 -it vddk -io vddk-libdir=/home/vddk7.0.3 -io  vddk-thumbprint=B5:52:1F:B4:21:09:45:24:51:32:56:F6:63:6A:93:5D:54:08:2D:78  esx7.0-rhel9.1-x86_64 -ip /home/passwd 
[   0.0] Setting up the source: -i libvirt -ic vpx://root.198.169/data/10.73.199.217/?no_verify=1 -it vddk esx7.0-rhel9.1-x86_64
[   1.9] Opening the source
[   7.6] Inspecting the source
[  14.4] Checking for sufficient free disk space in the guest
[  14.4] Converting Red Hat Enterprise Linux 9.1 Beta (Plow) to run on KVM
virt-v2v: This guest has virtio drivers installed.
[ 131.1] Mapping filesystem data to avoid copying unused and blank areas
[ 132.0] Closing the overlay
[ 132.3] Assigning disks to buses
[ 132.3] Checking if the guest needs BIOS or UEFI to boot
[ 132.3] Setting up the destination: -o libvirt
[ 134.0] Copying disk 1/1
█ 100% [****************************************]
[ 274.0] Creating output metadata
[ 274.1] Finishing off

2.Power on guest after v2v conversion, found guest can't boot into OS because of kernel panic error, pls refer to screenshot'rhel9.1-guest-cannot-boot-into-os-after-v2v-conversion.png'

Actual results:
As above description

Expected results:
RHEL9.1 guest can boot into OS after v2v conversion

Additional info:

Comment 2 Laszlo Ersek 2022-04-17 07:57:15 UTC

(CC Daniel)

Hi mxie,

my immediate suspicion is that this occurs because, when moving the guest from vmware to local libvirt, the VCPU flags (features) exposed to the guest change.

This is a difficult problem; for example in normal QEMU/KVM -> QEMU/KVM migrations, a lot of care is taken to make sure the VCPU flags never change. Such a VCPU model is exposed to the guest on the source QEMU/KVM host that the destination QEMU/KVM host can also support.

A similar case described here: <https://access.redhat.com/solutions/6833751>. That case is about Hyper-V, but the issue is the same. Related RHBZ: bug 2058770.

Thus far, virt-v2v does not care about VCPU models at all -- and in fact, it would be incredibly hard to deal with specific VCPU models, as in QEMU and libvirt, an entire infrastructure exists just for this. With QEMU/KVM -> QEMU/KVM migrations, it is at least technically possible to find a "least common denominator" VCPU model across a cluster of hosts (so that guests can then be freely migrated between hosts). But when converting between different hypervisors, I don't think there's any way to automate this.

Note that we've seen a similar issue with virt-v2v before: please refer to <https://bugzilla.redhat.com/show_bug.cgi?id=1658126#c21>.

Can you please try the following: when the conversion completes, but *before* launching the converted domain, please set the VCPU model to "host-model". References:
- https://libvirt.org/formatdomain.html#cpu-model-and-topology
- https://gitlab.com/qemu-project/qemu/-/blob/master/docs/system/cpu-models-x86.rst.inc

... I think the only possible improvement for virt-v2v here is to just explicitly hardwire the "host-model" VCPU model in the conversion output (at least in the QEMU and libvirt output modules). Because, that will expose a hardware CPU-like VCPU to the guest on the destination hypervisor, so the guest should have no reason to complain about an inconsistent set of VCPU features. Without this, the guest currently sees the "qemu64" VCPU model, which is the worst possible choice in a sense, AIUI. (You can read about "qemu64" in the above-referenced qemu documentation.) To recap, precisely the "qemu64" model caused the issue described in <https://bugzilla.redhat.com/show_bug.cgi?id=1658126#c21> -- the VCPU flags changed, so the guest needed a different set of modules, but those modules were never available before, nor explicitly installed by virt-v2v during conversion.

Comment 3 mxie@redhat.com 2022-04-18 03:32:33 UTC

(In reply to Laszlo Ersek from comment #2) 
> Can you please try the following: when the conversion completes, but
> *before* launching the converted domain, please set the VCPU model to
> "host-model". 

Yes, rhel9.1 guest which is converted by v2v can boot into OS after setting cpu mode as 'host-model'

BTW, rhel9.1 guest also can't boot into OS after converting to rhv by v2v but rhel9.0 guest which is converted by v2v can boot into OS normally when cpu mode is not 'host-model'

Comment 4 Laszlo Ersek 2022-04-18 10:56:49 UTC

Arik, Liran, how do you determine the VCPU model for imported guests? Thanks.

Comment 5 Arik 2022-04-18 11:17:41 UTC

It is generally determined by the CPU that the cluster is set with
There are two exceptions though:
1. When the VM is set with CPU passthrough
2. When the VM is set with a custom CPU (in the OVF it appears as 'CustomCpuName')

Comment 6 Laszlo Ersek 2022-04-18 12:27:17 UTC

Thanks.

In this case, the source domain information comes from VMWare via libvirt (see near the top of the conversion log in comment 1):

<domain type='vmware' xmlns:vmware='http://libvirt.org/schemas/domain/vmware/1.0'>
  <name>esx7.0-rhel9.1-x86_64</name>
  <uuid>420315b4-3f4e-b15e-808e-6f775115fb88</uuid>
  <memory unit='KiB'>2097152</memory>
  <currentMemory unit='KiB'>2097152</currentMemory>
  <vcpu placement='static'>1</vcpu>
  <cputune>
    <shares>1000</shares>
  </cputune>
  <os>
    <type arch='x86_64'>hvm</type>
  </os>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <disk type='file' device='disk'>
      <source file='[esx7.0-matrix] esx7.0-rhel9.1-x86_64/esx7.0-rhel9.1-x86_64.vmdk'/>
      <target dev='sda' bus='scsi'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <controller type='scsi' index='0' model='vmpvscsi'/>
    <interface type='bridge'>
      <mac address='00:50:56:83:8e:f6' type='generated'/>
      <source bridge='VM Network'/>
      <model type='vmxnet3'/>
    </interface>
    <video>
      <model type='vmvga' vram='8192' primary='yes'/>
    </video>
  </devices>
  <vmware:datacenterpath>data</vmware:datacenterpath>
  <vmware:moref>vm-7011</vmware:moref>
</domain>

this domain XML has no <cpu> element, therefore the following definitions in "input/parse_libvirt_xml.ml":

  let cpu_vendor = xpath_string "/domain/cpu/vendor/text()" in
  let cpu_model = xpath_string "/domain/cpu/model/text()" in
  let cpu_sockets = xpath_int "/domain/cpu/topology/@sockets" in
  let cpu_cores = xpath_int "/domain/cpu/topology/@cores" in
  let cpu_threads = xpath_int "/domain/cpu/topology/@threads" in

all result in None values; carried over as None values into:

    s_cpu_vendor = cpu_vendor;
    s_cpu_model = cpu_model;
    s_cpu_topology = cpu_topology;

In turn, in the "output/create_libvirt_xml.ml" output module, we don't produce the /domain/cpu element either, because that would depend on:

  if source.s_cpu_vendor <> None || source.s_cpu_model <> None ||
     source.s_cpu_topology <> None then (

And in the rhv output module ("lib/create_ovf.ml"):

      (match source.s_cpu_model with
        | None -> ()
        | Some model ->
           List.push_back content_subnodes (e "CustomCpuName" [] [PCData model])
      );

So I guess what we need to do is:

(1) cover the VCPU model in the QEMU output in the first place (it is not covered at all, AFAICT)

(2) in all outputs where we currently don't specify a VCPU model at all upon seeing "None" for the source VCPU model, we should specify

    <cpu mode='host-passthrough'/>

(NB previously I suggested "host-model", but I've revised that, based on <https://developers.redhat.com/blog/2021/01/05/building-red-hat-enterprise-linux-9-for-the-x86-64-v2-microarchitecture-level>.)


Arik, another question please: how do we express "CPU passthrough" in the OVF? Thank you.

Comment 8 Arik 2022-04-18 12:43:38 UTC

(In reply to Laszlo Ersek from comment #6)
> Arik, another question please: how do we express "CPU passthrough" in the
> OVF? Thank you.

This can be set using the 'UseHostCpu' tag that includes a boolean value
But I wouldn't go this way - the idea behind setting the VM's CPU type according to the cluster's setting is to ensure the cluster serves as a "migration domain"
If you set the imported VM with CPU passthrough then if the VM starts on a host with newer CPU than that of the other hosts in the cluster, it won't be able to migrate elsewhere

Comment 9 Laszlo Ersek 2022-04-18 13:44:24 UTC

OK, but then we have a deeper problem (showcased by the OVF / RHV output mode): assuming the source hypervisor does not present a VCPU model, and so virt-v2v just don't say anything about the VCPU model in the OVF output (per your comment 5), then the domain, once uploaded to RHV, will inherit the cluster-level VCPU model -- but then *THAT* VCPU model (namely, the cluster-level VCPU model) is unsuitable for booting the RHEL-9.1 kernel!

If we run with this approach, then the best we can do here is another article on access.redhat.com, saying "if your converted RHEL-9.1 guest crashes during boot, please manually adjust the VCPU model in the libvirt domain XML *or* the RHV cluster settings, according to <https://developers.redhat.com/blog/2021/01/05/building-red-hat-enterprise-linux-9-for-the-x86-64-v2-microarchitecture-level>".

Comment 10 Don Dutile (Red Hat) 2022-04-19 01:53:05 UTC

(In reply to Laszlo Ersek from comment #9)
> OK, but then we have a deeper problem (showcased by the OVF / RHV output
> mode): assuming the source hypervisor does not present a VCPU model, and so
> virt-v2v just don't say anything about the VCPU model in the OVF output (per
> your comment 5), then the domain, once uploaded to RHV, will inherit the
> cluster-level VCPU model -- but then *THAT* VCPU model (namely, the
> cluster-level VCPU model) is unsuitable for booting the RHEL-9.1 kernel!
> 
> If we run with this approach, then the best we can do here is another
> article on access.redhat.com, saying "if your converted RHEL-9.1 guest
> crashes during boot, please manually adjust the VCPU model in the libvirt
> domain XML *or* the RHV cluster settings, according to
> <https://developers.redhat.com/blog/2021/01/05/building-red-hat-enterprise-
> linux-9-for-the-x86-64-v2-microarchitecture-level>".

The above link does not tell a v2v-user what/how to adjust the vcpu model.
I'd suggest another link that has well-known, simple/common denominator for rhel9-x86-64.
The above link sends the reader fetching rocks for 'microarchitecture-v2' ... 
pick the (min) features, give the reader what they need to boot.  
... looks like this situation gives a clear test case to verify against.

Comment 11 Laszlo Ersek 2022-04-19 10:19:59 UTC

Kashyap's explanation here:
http://lists.openstack.org/pipermail/openstack-discuss/2021-October/025500.html

This is better than my host-passthrough (and before that, host-model) idea, for two reasons:
- Nehalem is a very low-level model that will work on both AMD and Intel hosts, so it's great for migrateability in RHV,
- we don't need to introduce any new *format* in the output modules, it's just a new output *value* for the existing structures.

Comment 12 Laszlo Ersek 2022-04-19 10:58:32 UTC

Victor,

is there a way to express "minimum CPU model requirement" in libosinfo?

Thanks!
Laszlo

Comment 13 Daniel Berrangé 2022-04-19 11:09:38 UTC

> 
> this domain XML has no <cpu> element, therefore the following definitions in "input/parse_libvirt_xml.ml":
>
>  let cpu_vendor = xpath_string "/domain/cpu/vendor/text()" in
>  let cpu_model = xpath_string "/domain/cpu/model/text()" in
>  let cpu_sockets = xpath_int "/domain/cpu/topology/@sockets" in
>  let cpu_cores = xpath_int "/domain/cpu/topology/@cores" in
>  let cpu_threads = xpath_int "/domain/cpu/topology/@threads" in
>
> all result in None values; carried over as None values into:
>
>    s_cpu_vendor = cpu_vendor;
>    s_cpu_model = cpu_model;
>    s_cpu_topology = cpu_topology;
>
> In turn, in the "output/create_libvirt_xml.ml" output module, we don't produce the /domain/cpu element either, because that would depend on:
>
>  if source.s_cpu_vendor <> None || source.s_cpu_model <> None ||
>     source.s_cpu_topology <> None then (

This is really undesirable. It is essentially always wrong to use a  'None' CPU model, because that gets you something really awful (qemu64).

If there's no existing CPU model you can carry over, then there needs to be some sensible default. I'd suggest 'host-model' is best choice because that is live migratable while still exposing as many features as possible.

(In reply to Laszlo Ersek from comment #12)
> Victor,
> 
> is there a way to express "minimum CPU model requirement" in libosinfo?

There's no CPU model concept in libosinfo. What RHEL-9 really wants is an 'x86_64-v2' ABI, of which there are many suitable CPUs that match. 

But really, I think it is better to ignore the guest OS, and pick 'host-model' unconditionally.

Comment 14 Laszlo Ersek 2022-04-19 13:06:43 UTC

Yes, host-model was my first thought (comment 2). However...

(1) "host-model" seems inexpressible in the OVF for ovirt-engine.
(Comment 5.

My (very superficial) browsing of the ovirt-engine source @ 00a89aae3bbb
suggests that we could perhaps pass "hostModel" in CustomCpuName, but
I'm unsure.

(2) "host-model" is inexpressible for the QEMU cmdline output.
"host-model" is really some special libvirtd sauce, according to both
<https://gitlab.com/qemu-project/qemu/-/blob/master/docs/system/cpu-models-x86.rst.inc>
and <https://libvirt.org/formatdomain.html>.

(3) The documentation at <https://libvirt.org/formatdomain.html> says

(a)

> During migration, complete CPU model definition is transferred to the
> destination host so the migrated guest will see exactly the same CPU
> model for the running instance of the guest, even if the destination
> host contains more capable CPUs or newer kernel;

(b)

> but shutting down and restarting the guest may present different
> hardware to the guest according to the capabilities of the new host.

I think (b) is a problem for RHV.

Furthermore, regarding (a), it seems to remain possible that the
destination host, while satisfying the "cluster VCPU" requirements,
can't provide all the individual, enumerated CPU features that originate
from the source host. Paragraph (a) discusses a *more capable*
destination host, but in RHV (AIUI) the destination host could be *less*
capable, as long as it satisfies the "cluster VCPU" baseline.

I'm concerned about using even Nehalem if we do that without regard to
the guest OS. Nehalem is from 2008 (according to the QEMU docs), but
virt-v2v specifically considers pre-2007 OSes. qemu64 has been a "very
compatible" default thus far, so using Nehalem only with RHEL-9.1 seems
the less intrusive change... I cannot regression-test tens of old OSes
on a Nehalem VCPU.

I think I'll attempt to introduce a new guest capability called
"gcaps_default_cpu_model", set it to None by default, and set it to
(Some "Nehalem") for RHEL >= 9.1. Then in the output modules, if the
source does not provide a CPU model, honor gcaps_default_cpu_model (if
any). I'm really worried about regressions.

Thanks!

Comment 16 Daniel Berrangé 2022-04-19 14:24:51 UTC

(In reply to Laszlo Ersek from comment #14)
> Yes, host-model was my first thought (comment 2). However...
> 
> (1) "host-model" seems inexpressible in the OVF for ovirt-engine.
> (Comment 5.
> 
> My (very superficial) browsing of the ovirt-engine source @ 00a89aae3bbb
> suggests that we could perhaps pass "hostModel" in CustomCpuName, but
> I'm unsure.

I'd be surprised as 'host-model' maps to different XML in libvirt
than a CPU model name does.

> (2) "host-model" is inexpressible for the QEMU cmdline output.
> "host-model" is really some special libvirtd sauce, according to both
> <https://gitlab.com/qemu-project/qemu/-/blob/master/docs/system/cpu-models-
> x86.rst.inc>
> and <https://libvirt.org/formatdomain.html>.

Yep, it just expands to an alias of a QEMU model name + flags, so
didn't need representing at the QEMU level directly.

> (3) The documentation at <https://libvirt.org/formatdomain.html> says
> 
> (a)
> 
> > During migration, complete CPU model definition is transferred to the
> > destination host so the migrated guest will see exactly the same CPU
> > model for the running instance of the guest, even if the destination
> > host contains more capable CPUs or newer kernel;
> 
> (b)
> 
> > but shutting down and restarting the guest may present different
> > hardware to the guest according to the capabilities of the new host.
> 
> I think (b) is a problem for RHV.

It is basically saying that the CPU model is only fixed for the duration of the cold boot attempt - it is the same as host-passthrough in this respect, which potentially expands differently each time you boot the guest if hardware/software has changed since last boot.

> Furthermore, regarding (a), it seems to remain possible that the
> destination host, while satisfying the "cluster VCPU" requirements,
> can't provide all the individual, enumerated CPU features that originate
> from the source host. Paragraph (a) discusses a *more capable*
> destination host, but in RHV (AIUI) the destination host could be *less*
> capable, as long as it satisfies the "cluster VCPU" baseline.

Right, so from the RHV POV it is probably desirable to NOT specify any CPU at all. Then the guest should get created using the cluster VCPU baseline that's configured.

On re-reading I see the initial bug report here only raised a problem with libvirt, not RHV.

> I'm concerned about using even Nehalem if we do that without regard to
> the guest OS. Nehalem is from 2008 (according to the QEMU docs), but
> virt-v2v specifically considers pre-2007 OSes. qemu64 has been a "very
> compatible" default thus far, so using Nehalem only with RHEL-9.1 seems
> the less intrusive change... I cannot regression-test tens of old OSes
> on a Nehalem VCPU.

Well x86 CPUs are backwards compatible. Existing software is generally expected "just work" on any new CPU model, as existing features don't change & you have to opt-in to new features explicitly at compile time.
There's always a risk of bugs, but I'd be surprised if it was a problem in general, since otherwise the original VM would likely have an explicit CPU model set in the VMX config file.

> I think I'll attempt to introduce a new guest capability called
> "gcaps_default_cpu_model", set it to None by default, and set it to
> (Some "Nehalem") for RHEL >= 9.1. Then in the output modules, if the
> source does not provide a CPU model, honor gcaps_default_cpu_model (if
> any). I'm really worried about regressions.

Note, Nehalem as a baseline may change again in RHEL >= 10, so isn't entirely future proof over the long term.

Does v2v make any statement around live migration support of its output ? 

I suggested 'host-model' as its live migration friendly.

In the absence of any CPU model listed in the src guest, it could be reasonable to argue that the user has not expressed a preference around live migration. Thus it could be possible to just decide on 'host-passthrough'  when no CPU is given. This will live migrate, but only if the src + dst are 100% identical.  If user really wants to mix/match hosts, then could re-configure the CPU after v2v is done.

Comment 17 Laszlo Ersek 2022-04-19 15:05:38 UTC

(In reply to Daniel Berrangé from comment #16)
> (In reply to Laszlo Ersek from comment #14)

> > My (very superficial) browsing of the ovirt-engine source @
> > 00a89aae3bbb suggests that we could perhaps pass "hostModel" in
> > CustomCpuName, but I'm unsure.
>
> I'd be surprised as 'host-model' maps to different XML in libvirt
> than a CPU model name does.

(on a tangent)

ovirt-engine creates a different node structure. Quoting
"backend/manager/modules/vdsbroker/src/main/java/org/ovirt/engine/core/vdsbroker/builder/vminfo/LibvirtVmXmlBuilder.java":

>     void writeCpu(boolean addVmNumaNodes) {
>         writer.writeStartElement("cpu");
>
>         String cpuType = vm.getCpuName();
>         if (vm.isUseHostCpuFlags()){
>             cpuType = "hostPassthrough";
>         }
>         if (vm.getUseTscFrequency() && tscFrequencySupplier.get() != null) {
>             cpuType += ",+invtsc";
>         }
>
>         // Work around for https://bugzilla.redhat.com/1689362
>         // If it is a nested VM on AMD EPYC, monitor feature must be
>         // disabled manually, until libvirt is fixed.
>         if (cpuModelSupplier.get() != null && cpuModelSupplier.get().contains("AMD EPYC") &&
>                 cpuFlagsSupplier.get() != null && !cpuFlagsSupplier.get().contains("monitor")) {
>             cpuType += ",-monitor";
>         }
>
>         String cpuFlagsProperty = vmCustomProperties.get("extra_cpu_flags");
>         if (StringUtils.isNotEmpty(cpuFlagsProperty)) {
>             cpuType += "," + cpuFlagsProperty;
>         }
>
>         String[] typeAndFlags = cpuType.split(",");
>
>         switch(vm.getClusterArch().getFamily()) {
>         case x86:
>         case s390x:
>             writer.writeAttributeString("match", "exact");
>
>             // is this a list of strings??..
>             switch(typeAndFlags[0]) {
>             case "hostPassthrough":
>                 writer.writeAttributeString("mode", "host-passthrough");
>                 writeCpuFlags(typeAndFlags);
>                 break;
>             case "hostModel":
>                 writer.writeAttributeString("mode", "host-model");
>                 writeCpuFlags(typeAndFlags);
>                 break;
>             default:
>                 writer.writeElement("model", typeAndFlags[0]);
>                 writeCpuFlags(typeAndFlags);
>                 break;
>             }
>             break;

We initialize "cpuType" from vm.getCpuName(), which is where *I assume*
we could pass in "hostModel" through the OVF. Provided we don't set the
"UseHostCpu" flag (see comment 8), "cpuType" does not get overwritten,
just (optionally) expanded. The we split "cpuType" into the
"typeAndFlags" array, and so if "hostModel" survived from the start of
the function, we'll match the branch where we output the "mode"
attribute, and not the "model" child element.

> Right, so from the RHV POV it is probably desirable to NOT specify any
> CPU at all. Then the guest should get created using the cluster VCPU
> baseline that's configured.
>
> On re-reading I see the initial bug report here only raised a problem
> with libvirt, not RHV.

Perhaps mxie didn't test this for the RHV output... I'm not sure.

Anyway, this is a great point! RHV (and I assume OpenStack too?) always
have a default to fall back to, while that's not available with the QEMU
and libvirt outputs.

> > I'm concerned about using even Nehalem if we do that without regard
> > to the guest OS. Nehalem is from 2008 (according to the QEMU docs),
> > but virt-v2v specifically considers pre-2007 OSes. qemu64 has been a
> > "very compatible" default thus far, so using Nehalem only with
> > RHEL-9.1 seems the less intrusive change... I cannot regression-test
> > tens of old OSes on a Nehalem VCPU.
>
> Well x86 CPUs are backwards compatible. Existing software is generally
> expected "just work" on any new CPU model, as existing features don't
> change & you have to opt-in to new features explicitly at compile
> time.

The case I have in mind is from my work on SMM enablement in OVMF. For
32-bit guests, we had to disable "nx" explicitly; otherwise things would
break hard. So the "OvmfPkg/README" file says, for 32-bit guests:

  $ qemu-system-i386 -cpu coreduo,-nx \

> There's always a risk of bugs, but I'd be surprised if it was a
> problem in general, since otherwise the original VM would likely have
> an explicit CPU model set in the VMX config file.

I'm unsure if we can trust the VMX file on this, as in the absence of an
explicit VCPU model, ESXi could pick whatever model the VMWare
developers thought sensible (similarly how the QEMU developers decided
"qemu64" was sensible).

> > I think I'll attempt to introduce a new guest capability called
> > "gcaps_default_cpu_model", set it to None by default, and set it to
> > (Some "Nehalem") for RHEL >= 9.1. Then in the output modules, if the
> > source does not provide a CPU model, honor gcaps_default_cpu_model
> > (if any). I'm really worried about regressions.
>
> Note, Nehalem as a baseline may change again in RHEL >= 10, so isn't
> entirely future proof over the long term.

Yes, that's expected. (Also, from the virt-devel discussion, I really
need to make this RHEL>=9.0, not 9.1.)

> Does v2v make any statement around live migration support of its
> output ?

This is the big question indeed.

And, recalling Rich's earlier explanation, I *think* the answer is "no".
Virt-v2v is supposed to convert a guest from a foreign hypervisor well
enough that it can be started up on the destination (QEMU/KVM), enabling
the admin to further tweak the guest on the destination, from the inside
and the outside both, as necessary.

> I suggested 'host-model' as its live migration friendly.
>
> In the absence of any CPU model listed in the src guest, it could be
> reasonable to argue that the user has not expressed a preference
> around live migration. Thus it could be possible to just decide on
> 'host-passthrough' when no CPU is given. This will live migrate, but
> only if the src + dst are 100% identical.  If user really wants to
> mix/match hosts, then could re-configure the CPU after v2v is done.

OK. I'll do "host-passthrough" in the libvirt and QEMU outputs in case
the source does not specify a VCPU model, and the guest is RHEL-9.0+.

Thanks!

Comment 18 Laszlo Ersek 2022-04-20 16:11:53 UTC

*** Test guests:

(1) RHEL-8.5 guest on ESXi, with no VCPU model specification:

$ r-virt-v2v ./run virt-v2v \
      -ic 'esx://root@esxi/?no_verify=1' \
      -it vddk \
      -io vddk-libdir=$HOME/src/v2v/vddk-7.0.3/vmware-vix-disklib-distrib \
      -io vddk-thumbprint=B2:95:32:52:6E:3A:12:7B:BB:E0:A0:D2:BA:1A:F2:7A:64:99:D5:13 \
      -ip $HOME/src/v2v/esxi.passwd \
      --print-source \
      rhel8-without-vcpu-model

>     source name: rhel8-without-vcpu-model
> hypervisor type: vmware
>        VM genid:
>          memory: 2147483648 (bytes)
>        nr vCPUs: 1
>      CPU vendor:
>       CPU model:
>    CPU topology:
>    CPU features:
>        firmware: unknown
>         display:
>           sound:
> disks:
>         0 [ide]
> removable media:
>         CD-ROM [sata] in slot 0
> NICs:
>         Bridge "VM Network" mac: 00:0c:29:d4:02:3c [vmxnet3]

(2) RHEL-9.1 Beta guest on ESXi, with no VCPU model specification:

$ r-virt-v2v ./run virt-v2v \
      -ic 'esx://root@esxi/?no_verify=1' \
      -it vddk \
      -io vddk-libdir=$HOME/src/v2v/vddk-7.0.3/vmware-vix-disklib-distrib \
      -io vddk-thumbprint=B2:95:32:52:6E:3A:12:7B:BB:E0:A0:D2:BA:1A:F2:7A:64:99:D5:13 \
      -ip $HOME/src/v2v/esxi.passwd \
      --print-source \
      rhel9-without-vcpu-model

>     source name: rhel9-without-vcpu-model
> hypervisor type: vmware
>        VM genid:
>          memory: 2147483648 (bytes)
>        nr vCPUs: 1
>      CPU vendor:
>       CPU model:
>    CPU topology:
>    CPU features:
>        firmware: unknown
>         display:
>           sound:
> disks:
>         0 [ide]
> removable media:
>         CD-ROM [sata] in slot 0
> NICs:
>         Bridge "VM Network" mac: 00:0c:29:35:e4:17 [vmxnet3]

(3) Exact same RHEL-8.5 guest disk image on libvirt, with VCPU model
specified:

$ r-virt-v2v ./run virt-v2v \
      --print-source \
      rhel8-with-vcpu-model

>     source name: rhel8-with-vcpu-model
> hypervisor type: kvm
>        VM genid:
>          memory: 1610612736 (bytes)
>        nr vCPUs: 1
>      CPU vendor:
>       CPU model: Skylake-Server-IBRS
>    CPU topology:
>    CPU features: acpi,apic,vmport
>        firmware: unknown
>         display: spice
>           sound: ich6
> disks:
>         0 [virtio-blk]
> removable media:
>
> NICs:
>         Bridge "virbr0" mac: 52:54:00:46:6b:dc [virtio]

(4) Exact same RHEL-9.1 Beta guest disk image on libvirt, with VCPU
model specified (topology specified too):

$ r-virt-v2v ./run virt-v2v \
      --print-source \
      rhel9-with-vcpu-model

>     source name: rhel9-with-vcpu-model
> hypervisor type: kvm
>        VM genid:
>          memory: 4294967296 (bytes)
>        nr vCPUs: 4
>      CPU vendor:
>       CPU model: Skylake-Server-IBRS
>    CPU topology: sockets: 1 cores/socket: 4 threads/core: 1
>    CPU features: acpi,apic,vmport
>        firmware: unknown
>         display: spice
>           sound: ich6
> disks:
>         0 [virtio-blk]
> removable media:
>         CD-ROM [ide] in slot 0
> NICs:
>         Bridge "virbr0" mac: 52:54:00:c8:45:42 [virtio]


*** Performing the actual tests is excruciating; there are 16
conversions to do:

- before applying the patches, convert all 4 guests to both libvirt and
  qemu (8 conversions)

- after applying the patches, convert all 4 guests to both libvirt and
  qemu (8 conversions)

- compare the QEMU shell scripts before vs. after

- compare the domain XMLs before vs. after

- boot the RHEL9 guest with the "after" QEMU script

- boot the RHEL9 guest with the "after" libvirt domain


*** Here's the QEMU script comparison, with comments:

> diff -r -u before/rhel8-with-vcpu-model/rhel8-with-vcpu-model.sh after/rhel8-with-vcpu-model/rhel8-with-vcpu-model.sh
> --- before/rhel8-with-vcpu-model/rhel8-with-vcpu-model.sh       2022-04-20 17:30:01.000000000 +0200
> +++ after/rhel8-with-vcpu-model/rhel8-with-vcpu-model.sh        2022-04-20 17:42:13.000000000 +0200
> @@ -6,7 +6,8 @@
>      -name rhel8-with-vcpu-model \
>      -machine q35,accel=kvm:tcg \
>      -m 1536 \
> -    -drive file=/var/tmp/before/rhel8-with-vcpu-model/rhel8-with-vcpu-model-sda,format=raw,if=none,id=drive-vblk-0,media=disk \
> +    -cpu Skylake-Server-IBRS \
> +    -drive file=/var/tmp/after/rhel8-with-vcpu-model/rhel8-with-vcpu-model-sda,format=raw,if=none,id=drive-vblk-0,media=disk \
>      -device virtio-blk-pci,drive=drive-vblk-0 \
>      -netdev user,id=net0 \
>      -device virtio-net-pci,netdev=net0,mac=52:54:00:46:6b:dc \

This is an independent fix that I've included in the series: even if the
source hypervisor specified a VCPU model, the QEMU output ignored it. So
this is intentional.

> @@ -18,5 +19,5 @@
>      -device virtio-rng-pci,rng=rng0 \
>      -device virtio-balloon \
>      -device pvpanic,ioport=0x505 \
> -    -device vhost-vsock-pci,guest-cid=141315 \
> +    -device vhost-vsock-pci,guest-cid=162151 \
>      -serial stdio

Irrelevant here (the CIDs are semi-randomly generated), good.

> diff -r -u before/rhel8-without-vcpu-model/rhel8-without-vcpu-model.sh after/rhel8-without-vcpu-model/rhel8-without-vcpu-model.sh
> --- before/rhel8-without-vcpu-model/rhel8-without-vcpu-model.sh 2022-04-20 17:27:38.000000000 +0200
> +++ after/rhel8-without-vcpu-model/rhel8-without-vcpu-model.sh  2022-04-20 17:40:41.000000000 +0200
> @@ -7,7 +7,7 @@
>      -machine q35,accel=kvm:tcg \
>      -m 2048 \
>      -device virtio-scsi-pci,id=scsi0 \
> -    -drive file=/var/tmp/before/rhel8-without-vcpu-model/rhel8-without-vcpu-model-sda,format=raw,if=none,id=drive-vblk-0,media=disk \
> +    -drive file=/var/tmp/after/rhel8-without-vcpu-model/rhel8-without-vcpu-model-sda,format=raw,if=none,id=drive-vblk-0,media=disk \
>      -device virtio-blk-pci,drive=drive-vblk-0 \
>      -drive if=none,id=drive-scsi-0,media=cdrom \
>      -device scsi-cd,bus=scsi0.0,lun=0,drive=drive-scsi-0 \
> @@ -17,5 +17,5 @@
>      -device virtio-rng-pci,rng=rng0 \
>      -device virtio-balloon \
>      -device pvpanic,ioport=0x505 \
> -    -device vhost-vsock-pci,guest-cid=140366 \
> +    -device vhost-vsock-pci,guest-cid=161663 \
>      -serial stdio

No relevant change, good.

> diff -r -u before/rhel9-with-vcpu-model/rhel9-with-vcpu-model.sh after/rhel9-with-vcpu-model/rhel9-with-vcpu-model.sh
> --- before/rhel9-with-vcpu-model/rhel9-with-vcpu-model.sh       2022-04-20 17:30:40.000000000 +0200
> +++ after/rhel9-with-vcpu-model/rhel9-with-vcpu-model.sh        2022-04-20 17:42:30.000000000 +0200
> @@ -6,8 +6,9 @@
>      -name rhel9-with-vcpu-model \
>      -machine q35,accel=kvm:tcg \
>      -m 4096 \
> +    -cpu Skylake-Server-IBRS \
>      -smp cpus=4,sockets=1,cores=4,threads=1 \
> -    -drive file=/var/tmp/before/rhel9-with-vcpu-model/rhel9-with-vcpu-model-sda,format=raw,if=none,id=drive-vblk-0,media=disk \
> +    -drive file=/var/tmp/after/rhel9-with-vcpu-model/rhel9-with-vcpu-model-sda,format=raw,if=none,id=drive-vblk-0,media=disk \
>      -device virtio-blk-pci,drive=drive-vblk-0 \
>      -drive if=none,id=drive-ide-0,media=cdrom \
>      -device ide-cd,bus=ide.0,unit=0,drive=drive-ide-0 \
> @@ -21,5 +22,5 @@
>      -device virtio-rng-pci,rng=rng0 \
>      -device virtio-balloon \
>      -device pvpanic,ioport=0x505 \
> -    -device vhost-vsock-pci,guest-cid=141635 \
> +    -device vhost-vsock-pci,guest-cid=162424 \
>      -serial stdio

Same comments as for "rhel8-with-vcpu-model", good.

> diff -r -u before/rhel9-without-vcpu-model/rhel9-without-vcpu-model.sh after/rhel9-without-vcpu-model/rhel9-without-vcpu-model.sh
> --- before/rhel9-without-vcpu-model/rhel9-without-vcpu-model.sh 2022-04-20 17:29:08.000000000 +0200
> +++ after/rhel9-without-vcpu-model/rhel9-without-vcpu-model.sh  2022-04-20 17:40:42.000000000 +0200
> @@ -6,8 +6,9 @@
>      -name rhel9-without-vcpu-model \
>      -machine q35,accel=kvm:tcg \
>      -m 2048 \
> +    -cpu host \
>      -device virtio-scsi-pci,id=scsi0 \
> -    -drive file=/var/tmp/before/rhel9-without-vcpu-model/rhel9-without-vcpu-model-sda,format=raw,if=none,id=drive-vblk-0,media=disk \
> +    -drive file=/var/tmp/after/rhel9-without-vcpu-model/rhel9-without-vcpu-model-sda,format=raw,if=none,id=drive-vblk-0,media=disk \
>      -device virtio-blk-pci,drive=drive-vblk-0 \
>      -drive if=none,id=drive-scsi-0,media=cdrom \
>      -device scsi-cd,bus=scsi0.0,lun=0,drive=drive-scsi-0 \
> @@ -17,5 +18,5 @@
>      -device virtio-rng-pci,rng=rng0 \
>      -device virtio-balloon \
>      -device pvpanic,ioport=0x505 \
> -    -device vhost-vsock-pci,guest-cid=140799 \
> +    -device vhost-vsock-pci,guest-cid=160036 \
>      -serial stdio

This is the actual fix; good.

I've booted all 4 "after" scripts; everything works fine.

I've also reproduced the RHEL9 guest boot failure with *both* of the
"before" scripts.


*** The libvirt domain XML comparison, with comments:

> --- before-converted-rhel8-with-vcpu-model.xml  2022-04-20 17:49:10.355563419 +0200
> +++ after-converted-rhel8-with-vcpu-model.xml   2022-04-20 17:49:08.158600356 +0200
> @@ -1,6 +1,6 @@
>  <domain type='kvm'>
> -  <name>before-converted-rhel8-with-vcpu-model</name>
> -  <uuid>fea19b8f-034b-44d2-a8c0-cf3a76a1510f</uuid>
> +  <name>after-converted-rhel8-with-vcpu-model</name>
> +  <uuid>68f74799-5fc2-4d93-8dc2-6c9558d3d6bf</uuid>
>    <metadata>
>      <libosinfo:libosinfo xmlns:libosinfo="http://libosinfo.org/xmlns/libvirt/domain/1.0">
>        <libosinfo:os id="http://redhat.com/rhel/8.5"/>
> @@ -28,7 +28,7 @@
>      <emulator>/usr/bin/qemu-system-x86_64</emulator>
>      <disk type='volume' device='disk'>
>        <driver name='qemu' type='raw'/>
> -      <source pool='default' volume='before-converted-rhel8-with-vcpu-model-sda'/>
> +      <source pool='default' volume='after-converted-rhel8-with-vcpu-model-sda'/>
>        <target dev='vda' bus='virtio'/>
>        <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
>      </disk>

No relevant change, good.

> --- before-converted-rhel8-without-vcpu-model.xml       2022-04-20 17:49:10.980552907 +0200
> +++ after-converted-rhel8-without-vcpu-model.xml        2022-04-20 17:49:08.757590282 +0200
> @@ -1,6 +1,6 @@
>  <domain type='kvm'>
> -  <name>before-converted-rhel8-without-vcpu-model</name>
> -  <uuid>0e8326d0-2626-4303-9ae9-c2d8b68fd4e5</uuid>
> +  <name>after-converted-rhel8-without-vcpu-model</name>
> +  <uuid>24f83081-dcdd-41f2-bb20-7f915cbfaeaf</uuid>
>    <metadata>
>      <libosinfo:libosinfo xmlns:libosinfo="http://libosinfo.org/xmlns/libvirt/domain/1.0">
>        <libosinfo:os id="http://redhat.com/rhel/8.5"/>
> @@ -28,7 +28,7 @@
>      <emulator>/usr/bin/qemu-system-x86_64</emulator>
>      <disk type='volume' device='disk'>
>        <driver name='qemu' type='raw'/>
> -      <source pool='default' volume='before-converted-rhel8-without-vcpu-model-sda'/>
> +      <source pool='default' volume='after-converted-rhel8-without-vcpu-model-sda'/>
>        <target dev='vda' bus='virtio'/>
>        <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
>      </disk>

No relevant change, good.

> --- before-converted-rhel9-with-vcpu-model.xml  2022-04-20 17:49:11.525543743 +0200
> +++ after-converted-rhel9-with-vcpu-model.xml   2022-04-20 17:49:09.284581421 +0200
> @@ -1,6 +1,6 @@
>  <domain type='kvm'>
> -  <name>before-converted-rhel9-with-vcpu-model</name>
> -  <uuid>191ac9f4-ac85-4ee5-a74c-3efc1278a8c5</uuid>
> +  <name>after-converted-rhel9-with-vcpu-model</name>
> +  <uuid>ce38f9ed-aa22-4fb1-84e2-025733135ae7</uuid>
>    <metadata>
>      <libosinfo:libosinfo xmlns:libosinfo="http://libosinfo.org/xmlns/libvirt/domain/1.0">
>        <libosinfo:os id="http://redhat.com/rhel/9.1"/>
> @@ -29,7 +29,7 @@
>      <emulator>/usr/bin/qemu-system-x86_64</emulator>
>      <disk type='volume' device='disk'>
>        <driver name='qemu' type='raw'/>
> -      <source pool='default' volume='before-converted-rhel9-with-vcpu-model-sda'/>
> +      <source pool='default' volume='after-converted-rhel9-with-vcpu-model-sda'/>
>        <target dev='vda' bus='virtio'/>
>        <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
>      </disk>

No relevant change, good.

> --- before-converted-rhel9-without-vcpu-model.xml       2022-04-20 17:49:12.055534833 +0200
> +++ after-converted-rhel9-without-vcpu-model.xml        2022-04-20 17:49:09.809572594 +0200
> @@ -1,6 +1,6 @@
>  <domain type='kvm'>
> -  <name>before-converted-rhel9-without-vcpu-model</name>
> -  <uuid>b142ce89-8044-490a-ab22-1e328a425ceb</uuid>
> +  <name>after-converted-rhel9-without-vcpu-model</name>
> +  <uuid>417e55e2-27f1-4340-b0e8-e2c59a769277</uuid>
>    <metadata>
>      <libosinfo:libosinfo xmlns:libosinfo="http://libosinfo.org/xmlns/libvirt/domain/1.0">
>        <libosinfo:os id="http://redhat.com/rhel/9.1"/>

No relevant change, good.

> @@ -17,9 +17,7 @@
>      <acpi/>
>      <apic/>
>    </features>
> -  <cpu mode='custom' match='exact' check='none'>
> -    <model fallback='forbid'>qemu64</model>
> -  </cpu>
> +  <cpu mode='host-passthrough' check='none' migratable='on'/>
>    <clock offset='utc'/>
>    <on_poweroff>destroy</on_poweroff>
>    <on_reboot>restart</on_reboot>

This is the fix.

> @@ -28,7 +26,7 @@
>      <emulator>/usr/bin/qemu-system-x86_64</emulator>
>      <disk type='volume' device='disk'>
>        <driver name='qemu' type='raw'/>
> -      <source pool='default' volume='before-converted-rhel9-without-vcpu-model-sda'/>
> +      <source pool='default' volume='after-converted-rhel9-without-vcpu-model-sda'/>
>        <target dev='vda' bus='virtio'/>
>        <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
>      </disk>

No relevant change, good.

I've booted all 4 "after" domains; everything works fine.

I've also reproduced the RHEL9 guest boot failure with the
"before-converted-rhel9-without-vcpu-model" domain.

Comment 19 Laszlo Ersek 2022-04-20 16:28:41 UTC

[v2v PATCH 0/9] ensure x86-64-v2 uarch level for RHEL-9.0+ guests
Message-Id: <20220420162333.24069-1-lersek>
https://listman.redhat.com/archives/libguestfs/2022-April/028711.html

Comment 20 Laszlo Ersek 2022-04-22 07:21:39 UTC

(In reply to Laszlo Ersek from comment #19)
> [v2v PATCH 0/9] ensure x86-64-v2 uarch level for RHEL-9.0+ guests
> Message-Id: <20220420162333.24069-1-lersek>
> https://listman.redhat.com/archives/libguestfs/2022-April/028711.html

Merged upstream as commit range e7539dc6f6d1..f28757c6d100.

Comment 23 Vera 2022-05-05 07:20:12 UTC

Verified with virt-v2v-2.0.4-1.el9.x86_64

Steps:
1. Convert rhel9.1 from esx to libvirt
# virt-v2v -ic vpx://root.198.169/data/10.73.199.217/?no_verify=1 -it vddk -io vddk-libdir=/home/vddk7.0.3 -io  vddk-thumbprint=B5:52:1F:B4:21:09:45:24:51:32:56:F6:63:6A:93:5D:54:08:2D:78  esx7.0-rhel9.1-x86_64 -ip /v2v-ops/esxpw
[   0.9] Setting up the source: -i libvirt -ic vpx://root.198.169/data/10.73.199.217/?no_verify=1 -it vddk esx7.0-rhel9.1-x86_64
[   2.7] Opening the source
[  11.0] Inspecting the source
[  18.9] Checking for sufficient free disk space in the guest
[  18.9] Converting Red Hat Enterprise Linux 9.1 Beta (Plow) to run on KVM
virt-v2v: This guest has virtio drivers installed.
[ 147.6] Mapping filesystem data to avoid copying unused and blank areas
[ 148.6] Closing the overlay
[ 148.8] Assigning disks to buses
[ 148.8] Checking if the guest needs BIOS or UEFI to boot
[ 148.8] Setting up the destination: -o libvirt
[ 150.6] Copying disk 1/1
█ 100% [****************************************]
[ 263.2] Creating output metadata
[ 263.2] Finishing off

2. start VM and check the checkpoints
# virsh list --all |grep rhel9.1
 2    esx7.0-rhel9.1-x86_64                      running

Moving to Verified.

Comment 24 Vera 2022-05-12 08:32:16 UTC

Also verified "to rhv" scenario with virt-v2v-2.0.4-1.el9.x86_64.

# virt-v2v -ic vpx://root.227.27/data/10.73.199.217/?no_verify=1 -it vddk -io vddk-libdir=/home/vddk7.0.3 -io  vddk-thumbprint=76:75:59:0E:32:F5:1E:58:69:93:75:5A:7B:51:32:C5:D1:6D:F1:21  esx7.0-rhel9.1-x86_64 -ip /v2v-ops/esxpw -o rhv-upload -oc https://dell-per740-22.lab.eng.pek2.redhat.com/ovirt-engine/api  -op /v2v-ops/rhvpasswd -os nfs_data -b ovirtmgmt 
[   0.0] Setting up the source: -i libvirt -ic vpx://root.227.27/data/10.73.199.217/?no_verify=1 -it vddk esx7.0-rhel9.1-x86_64
[   1.8] Opening the source
[   6.7] Inspecting the source
[  13.1] Checking for sufficient free disk space in the guest
[  13.1] Converting Red Hat Enterprise Linux 9.1 Beta (Plow) to run on KVM
virt-v2v: This guest has virtio drivers installed.
[ 119.6] Mapping filesystem data to avoid copying unused and blank areas
[ 120.4] Closing the overlay
[ 120.5] Assigning disks to buses
[ 120.5] Checking if the guest needs BIOS or UEFI to boot
[ 120.5] Setting up the destination: -o rhv-upload -oc https://dell-per740-22.lab.eng.pek2.redhat.com/ovirt-engine/api -os nfs_data
[ 132.8] Copying disk 1/1
█ 100% [****************************************]
[ 249.8] Creating output metadata
[ 268.7] Finishing off

Comment 32 errata-xmlrpc 2022-11-15 09:56:05 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Low: virt-v2v security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:7968