Bug 1975384 - rhv upload precheck failed when rhv cluster's cpu architecture is not set
Summary: rhv upload precheck failed when rhv cluster's cpu architecture is not set
Keywords:
Status: ASSIGNED
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: virt-v2v
Version: 9.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: beta
Target Release: ---
Assignee: Richard W.M. Jones
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2021-06-23 14:46 UTC by Xiaodai Wang
Modified: 2023-06-30 22:43 UTC (History)
11 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: Bug
Target Upstream Version:
Embargoed:



Description Xiaodai Wang 2021-06-23 14:46:42 UTC
Description of problem:
rhv upload precheck failed when rhv cluster's cpu architecture is not set

Version-Release number of selected component (if applicable):
rhv: 4.4.6.8-0.1.el8ev
v2v: any version

How reproducible:
100%

Steps to Reproduce:
1. Make sure the rhv cluster's cpu architecture is not set.
2. Convert a guest to the RHV server with virt-v2v and '-o rhv-upload'.

Actual results:
virt-v2v: virt-v2v 1.42.0rhel=8,release=12.module+el8.5.0+10976+d40a93e9 (x86_64)
libvirt version: 7.4.0
/usr/libexec/platform-python '-c' 'import ovirtsdk4'
nbdkit --dump-config
nbdkit version: 1.24.0
nbdkit python '/tmp/v2v.ocOeak/rhv-upload-plugin.py' --dump-plugin >/dev/null
/usr/libexec/platform-python '/tmp/v2v.Bh1fQk/rhv-upload-precheck.py' '/tmp/v2v.Bh1fQk/params1.json'
Traceback (most recent call last):
  File "/tmp/v2v.Bh1fQk/rhv-upload-precheck.py", line 97, in <module>
    if cpu.architecture == types.Architecture.UNDEFINED:
AttributeError: 'NoneType' object has no attribute 'architecture'
virt-v2v: error: failed server prechecks, see earlier errors
rm -rf '/tmp/v2v.FyK65k'
rm -rf '/tmp/v2v.v0o1bj'
rm -rf '/tmp/v2v.ocOeak'
rm -rf '/tmp/v2v.3rKIcn'
rm -rf '/tmp/v2v.Bh1fQk'
rm -rf '/tmp/rhvupload.LyrAZi'
rm -rf '/var/tmp/null.pFUzyl'

Expected results:
v2v command should run successfully.

Additional info:
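To illustrate the failure mode, here is a minimal sketch using hypothetical stand-ins for the ovirtsdk4 objects (the real SDK classes differ): when the cluster's CPU type is unset, `cluster.cpu` comes back as `None`, so the attribute access in the precheck script raises.

```python
# Hypothetical stand-ins for the ovirtsdk4 objects; not the real SDK
# classes, but the failure mode is the same.
class Cluster:
    def __init__(self, cpu=None):
        # the SDK returns None here when the cluster's CPU type is unset
        self.cpu = cpu

cluster = Cluster()   # CPU architecture never configured
cpu = cluster.cpu     # -> None
try:
    cpu.architecture  # the access that fails in rhv-upload-precheck.py
except AttributeError as e:
    print(e)          # 'NoneType' object has no attribute 'architecture'
```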

Comment 1 Richard W.M. Jones 2021-06-24 14:06:25 UTC
Tomas or Nir, would you know about changes here?  It seems like
this code stopped working:

https://github.com/libguestfs/virt-v2v/blob/d4ecd15eca883f1e25cc2113e7501f4d19ef833a/v2v/rhv-upload-precheck.py#L96

Specifically it seems as if cluster.cpu == None, whereas before
it must have been a dict containing a field called "architecture".
Or perhaps it is not defined in this particular RHV cluster for some
reason (does it need to be set somewhere?)

Comment 2 Nir Soffer 2021-06-24 14:16:39 UTC
This is not my area, adding Arik.

Comment 3 Arik 2021-06-24 16:09:46 UTC
No, this part didn't change in the last 6 years [1] :)

I've just played a bit with that and I have a possible explanation:
The "Default" cluster is initially created with an empty cpu_name, so even with an undefined architecture you'll get back a map/dictionary containing the 'architecture' key.
Other clusters are initially created with a NULL cpu_name, so the CPU part is not set on them at all.
Maybe this test was previously executed against the "Default" cluster and now runs against a DC with a cluster that is not "Default"?
Another possible reason, though less likely, is that the cpu_name of the "Default" cluster was set to NULL after its creation.

[1] https://github.com/oVirt/ovirt-engine/blame/master/backend/manager/modules/restapi/types/src/main/java/org/ovirt/engine/api/restapi/types/ClusterMapper.java#L207-L214
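The asymmetry described above can be sketched like this (a hedged Python model only; the real mapping lives in the Java ClusterMapper linked at [1]):

```python
# Hedged model of the mapping behaviour described above, not the actual
# ClusterMapper code (which is Java, in ovirt-engine).
def map_cluster_cpu(cpu_name):
    if cpu_name is None:
        # non-Default clusters start with a NULL cpu_name:
        # the CPU part is omitted from the API result entirely
        return None
    # the Default cluster starts with an empty cpu_name:
    # a CPU dict with an 'architecture' key is still produced
    return {"architecture": "undefined"}

print(map_cluster_cpu(""))    # {'architecture': 'undefined'}
print(map_cluster_cpu(None))  # None
```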

Comment 4 Arik 2021-06-24 16:19:11 UTC
> Steps to Reproduce:
> 1. Make sure the rhv cluster's cpu architecture is not set.
> 2. Convert a guest to the RHV server with virt-v2v and '-o rhv-upload'.
...
> Expected results:
> v2v command should run successfully.

But I wonder about this part - who cares about importing to a cluster that never had a host running on it and was not explicitly set with an architecture/bios-type?
We now block the creation of VMs from the 'Blank' template on such clusters. It's really not an interesting case..

Comment 5 Xiaodai Wang 2021-06-25 02:02:34 UTC
> But I wonder about this part - who cares about importing to a cluster that
> never had a host running on it and was not explicitly set with an
> architecture/bios-type?

A host can be registered to such a cluster; that is how we found the issue.
mxie knows the environment better. If this scenario is invalid, maybe
the host should not be allowed to register to such a cluster?

Comment 6 mxie@redhat.com 2021-06-25 02:35:28 UTC
> But I wonder about this part - who cares about importing to a cluster that
> never had a host running on it and was not explicitly set with an
> architecture/bios-type?
> We now block the creation of VMs from the 'Blank' template on such clusters.
> It's really not an interesting case..

It's strange that the CPU type of the RHV node shows as empty after registering a RHEL 8.4 host on RHV 4.4.6.8-0.1; please refer to the screenshot "rhv-node-cpu-type-empty.png".

Comment 8 Arik 2021-06-25 06:17:33 UTC
OK, my assumption that the cluster lacked cpu_name because there was no host in that cluster was incorrect - you've added a host but something was wrong with its reported CPU types.
I see in your setup that it was sorted out and now the cluster is set with "Intel SandyBridge Family" so 'cpu.architecture' should now be included in the API, right?

Comment 9 mxie@redhat.com 2021-06-25 13:28:40 UTC
(In reply to Arik from comment #8)
> OK, my assumption that the cluster lacked cpu_name because there was no host
> in that cluster was incorrect - you've added a host but something was wrong
> with its reported CPU types.
> I see in your setup that it was sorted out and now the cluster is set with
> "Intel SandyBridge Family" so 'cpu.architecture' should now be included in
> the API, right?

I found the bug can be reproduced this way:

1. Put the RHV node into maintenance mode.
2. Create a new cluster 'test' in the data center; make sure the cluster's CPU Architecture is undefined and its CPU Type is auto detect.
3. Change the RHV node's cluster to 'test', but do not activate the node at this point.
4. Convert a guest to RHV with virt-v2v:
# virt-v2v -ic vpx://root.73.141/data/10.73.75.219/?no_verify=1 -it vddk -io vddk-libdir=/home/vddk7 -io vddk-thumbprint=1F:97:34:5F:B6:C2:BA:66:46:CB:1A:71:76:7D:6B:50:1E:03:00:EA  -o rhv-upload -of qcow2 -oc https://dell-per740-22.lab.eng.pek2.redhat.com/ovirt-engine/api -ip /home/passwd -op /home/rhvpasswd  -os nfs_data -n ovirtmgmt  esx6.7-rhel8.3-x86_64  -oo rhv-cluster=test
Traceback (most recent call last):
  File "/tmp/v2v.CFHeMd/rhv-upload-precheck.py", line 97, in <module>
    if cpu.architecture == types.Architecture.UNDEFINED:
AttributeError: 'NoneType' object has no attribute 'architecture'
virt-v2v: error: failed server prechecks, see earlier errors

If reporting bugs, run virt-v2v with debugging enabled and include the 
complete output:

  virt-v2v -v -x [...]


When I first hit this problem, the RHV node was active but the cluster's CPU type was auto detect; I then found that the cluster's CPU type was auto detect because the node's CPU type showed as empty.

Comment 10 Arik 2021-06-27 07:24:04 UTC
(In reply to mxie from comment #9)
> I found the bug can be reproduced this way:
> 
> 1. Put the RHV node into maintenance mode.
> 2. Create a new cluster 'test' in the data center; make sure the cluster's
> CPU Architecture is undefined and its CPU Type is auto detect.
> 3. Change the RHV node's cluster to 'test', but do not activate the node at
> this point.
> 4. Convert a guest to RHV with virt-v2v:
> # virt-v2v -ic vpx://root.73.141/data/10.73.75.219/?no_verify=1 -it
> vddk -io vddk-libdir=/home/vddk7 -io
> vddk-thumbprint=1F:97:34:5F:B6:C2:BA:66:46:CB:1A:71:76:7D:6B:50:1E:03:00:EA 
> -o rhv-upload -of qcow2 -oc
> https://dell-per740-22.lab.eng.pek2.redhat.com/ovirt-engine/api -ip
> /home/passwd -op /home/rhvpasswd  -os nfs_data -n ovirtmgmt 
> esx6.7-rhel8.3-x86_64  -oo rhv-cluster=test
> Traceback (most recent call last):
>   File "/tmp/v2v.CFHeMd/rhv-upload-precheck.py", line 97, in <module>
>     if cpu.architecture == types.Architecture.UNDEFINED:
> AttributeError: 'NoneType' object has no attribute 'architecture'
> virt-v2v: error: failed server prechecks, see earlier errors
> 
> If reporting bugs, run virt-v2v with debugging enabled and include the 
> complete output:
> 
>   virt-v2v -v -x [...]
> 
> 
> When I first hit this problem, the RHV node was active but the cluster's CPU
> type was auto detect; I then found that the cluster's CPU type was auto
> detect because the node's CPU type showed as empty.

OK, so let's separate this into two cases:

1. The flow that is described above - that is the flow I previously referred to of having a cluster that had no active host and was not set with a CPU type explicitly.
The only problem I see here is the inconsistency between the Default cluster and new clusters - we should set the initial CPU type to NULL (rather than to an empty string) and then the CPU dictionary will always be None for clusters that are not set with CPU type.
You wrote in the description that the expected result is that the v2v command should run successfully, but in this case it's OK for it to fail, as the target cluster is not well-defined yet.

2. The flow you've mentioned at the end, in which the cluster was not set with a CPU type although there was an active host in that cluster.
This shouldn't happen, but we'll need to get the CPU models reported by libvirt on that host to figure out what happened.

Comment 11 mxie@redhat.com 2021-06-28 14:05:41 UTC
(In reply to Arik from comment #10)

> 2. The flow you've mentioned at the end in which the cluster was not set
> with CPU type although there was an active host in that cluster.
> This shouldn't happen but we'll need to get the reported CPUs by libvirt on
> that host to figure out what happened.

I have installed and registered another RHV node to reproduce the problem. I will send the environment details by mail so you can check the CPU info yourself.

Comment 12 Arik 2021-06-28 20:15:39 UTC
(In reply to mxie from comment #11)
> (In reply to Arik from comment #10)
> 
> > 2. The flow you've mentioned at the end in which the cluster was not set
> > with CPU type although there was an active host in that cluster.
> > This shouldn't happen but we'll need to get the reported CPUs by libvirt on
> > that host to figure out what happened.
> 
> I have installed and registered another RHV node to reproduce the problem.
> I will send the environment details by mail so you can check the CPU info
> yourself.

It seems there's something wrong with libvirt/qemu on that host:
> virsh domcapabilities 
error: failed to get emulator capabilities
error: internal error: unknown feature amd-sev-es

That can explain why VDSM doesn't get any model in the 'cpuFlags' (in the output of 'vdsm-client Host getCapabilities'):
"cpuFlags": "pge,ibrs,ss,rdtscp,sse4_1,ept,vpid,tsc,acpi,fxsr,monitor,msr,de,arat,pti,md_clear,pclmulqdq,avx,pdpe1gb,aperfmperf,syscall,mmx,pat,xsaveopt,sse2,xtopology,tsc_deadline_timer,mce,tm2,constant_tsc,aes,ssbd,tpr_shadow,mtrr,dts,pae,dtes64,tm,sse4_2,pln,xsave,nonstop_tsc,vme,cmov,flexpriority,clflush,pni,pdcm,pse,x2apic,vnmi,ds_cpl,cpuid,ht,dtherm,stibp,apic,nx,lm,pebs,arch_perfmon,ssse3,dca,ibpb,pcid,mca,flush_l1d,est,sep,cx16,pts,bts,pse36,pbe,sse,lahf_lm,vmx,fpu,rep_good,xtpr,smx,nopl,popcnt,epb,cx8"

Comment 13 Richard W.M. Jones 2021-06-28 21:31:49 UTC
> error: internal error: unknown feature amd-sev-es

https://bugzilla.redhat.com/show_bug.cgi?id=1961558

You need to update libvirt to at least libvirt-7.4.0-1.el8.

Comment 14 Richard W.M. Jones 2021-06-28 21:32:55 UTC
I should read more closely.  Are you implying that this libvirt
bug is actually causing this bug in virt-v2v?

Comment 15 mxie@redhat.com 2021-06-29 01:37:19 UTC
> It seems there's something wrong with libvirt/qemu on that host:
> > virsh domcapabilities 
> error: failed to get emulator capabilities
> error: internal error: unknown feature amd-sev-es
> 
> That can explain why VDSM doesn't get any model in the 'cpuFlags' (in the
> output of 'vdsm-client Host getCapabilities'):
> "cpuFlags":
> "pge,ibrs,ss,rdtscp,sse4_1,ept,vpid,tsc,acpi,fxsr,monitor,msr,de,arat,pti,
> md_clear,pclmulqdq,avx,pdpe1gb,aperfmperf,syscall,mmx,pat,xsaveopt,sse2,
> xtopology,tsc_deadline_timer,mce,tm2,constant_tsc,aes,ssbd,tpr_shadow,mtrr,
> dts,pae,dtes64,tm,sse4_2,pln,xsave,nonstop_tsc,vme,cmov,flexpriority,clflush,
> pni,pdcm,pse,x2apic,vnmi,ds_cpl,cpuid,ht,dtherm,stibp,apic,nx,lm,pebs,
> arch_perfmon,ssse3,dca,ibpb,pcid,mca,flush_l1d,est,sep,cx16,pts,bts,pse36,
> pbe,sse,lahf_lm,vmx,fpu,rep_good,xtpr,smx,nopl,popcnt,epb,cx8"

I think the bug is related to the vdsm version.

When I first met this problem, I installed vdsm from the rhv-4.4.7-4 repo on the RHV node; the bug disappeared after reinstalling vdsm from the rhv-4.4.6-9 repo.

You can see that the NFS node and the iSCSI node of our RHV 4.4 environment have the same versions of libvirt and qemu, but the NFS node shows its CPU type normally.

NFS node:
vdsm-4.40.60.7-1.el8ev.x86_64
libvirt-7.0.0-14.1.module+el8.4.0+11095+d46acebf.x86_64
qemu-kvm-5.2.0-16.module+el8.4.0+10806+b7d97207.x86_64


iSCSI node:
vdsm-4.40.70.4-1.el8ev.x86_64
libvirt-7.0.0-14.1.module+el8.4.0+11095+d46acebf.x86_64
qemu-kvm-5.2.0-16.module+el8.4.0+10806+b7d97207.x86_64

Comment 16 Arik 2021-06-29 07:41:40 UTC
(In reply to Richard W.M. Jones from comment #14)
> I should read more closely.  Are you implying that this libvirt
> bug is actually causing this bug in virt-v2v?

Nope, it causes a bug in RHV, as it leads to a cluster with an active host that doesn't match any of the supported CPU types.
Lucia, had this cluster already been set with a CPU type, I assume this host would have turned non-operational, but here it's the first host that gets activated in the cluster - and it looks like an issue on our side, as that host should not be activated without a proper CPU type. Can you please take a look?

From the virt-v2v side, I think it would be handy to check whether the target cluster is set with a CPU type, to cover the case of clusters that are set to auto-detect the CPU but had no active host to retrieve that information from. I don't know how likely this particular issue with the NFS node is; libvirt-7.0.0-14.1.module+el8.4.0+11095+d46acebf has also been used by RHV QE for quite some time and this issue wasn't reported.

Comment 18 Richard W.M. Jones 2021-06-29 07:53:25 UTC
How about this (untested) change:

--- a/v2v/rhv-upload-precheck.py
+++ b/v2v/rhv-upload-precheck.py
@@ -93,7 +93,7 @@ if len(clusters) == 0:
                         params['output_storage']))
 cluster = clusters[0]
 cpu = cluster.cpu
-if cpu.architecture == types.Architecture.UNDEFINED:
+if cpu is not None and cpu.architecture == types.Architecture.UNDEFINED:
     raise RuntimeError("The cluster ‘%s’ has an unknown architecture" %
                        (params['rhv_cluster']))

Comment 19 Arik 2021-06-29 08:05:52 UTC
(In reply to Richard W.M. Jones from comment #18)
> How about this (untested) change:
> 
> --- a/v2v/rhv-upload-precheck.py
> +++ b/v2v/rhv-upload-precheck.py
> @@ -93,7 +93,7 @@ if len(clusters) == 0:
>                          params['output_storage']))
>  cluster = clusters[0]
>  cpu = cluster.cpu
> -if cpu.architecture == types.Architecture.UNDEFINED:
> +if cpu is not None and cpu.architecture == types.Architecture.UNDEFINED:
>      raise RuntimeError("The cluster ‘%s’ has an unknown architecture" %
>                         (params['rhv_cluster']))

That's better, but it still leaves some room for the import to fail when the architecture is defined and the bios type is undefined.
Lucia, another question please: I see that we can have a cluster with architecture=X86_64 and yet an undefined bios type (with or without a CPU type), and I see we'll fail the import due to the missing bios type.
We can't ask virt-v2v to check the bios type property of the cluster, since I believe the bios type won't be reported by all RHV 4.X versions, right?
Shouldn't we set the bios type automatically based on architecture + compatibility version? That way the suggested check would make sure the bios type validation doesn't fail.

Comment 20 Lucia Jelinkova 2021-06-29 10:48:26 UTC
Arik, you are correct that if the cluster already has its CPU and architecture set, the host becomes non-operational if it does not support any CPU model. Also, if it is the first host in the cluster and it fails to report any CPU flags at all, it becomes non-operational too. However, this is a third case: it is the first host in a cluster and it reports only CPU flags (no CPU models). This special case is not covered. I wonder if we should cover it in the engine, since vdsm should report the models and, if it cannot, it should handle the error from libvirt itself.


As for the cluster and bios type, I can see that we do set a default bios type in add/update cluster commands:
https://github.com/oVirt/ovirt-engine/blob/7c1f9ff4b98ae82f0660ee7205e350fa61d25076/backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/AddClusterCommand.java#L70
https://github.com/oVirt/ovirt-engine/blob/7c1f9ff4b98ae82f0660ee7205e350fa61d25076/backend/manager/modules/bll/src/main/java/org/ovirt/engine/core/bll/UpdateClusterCommand.java#L343

I do not see how it could be possible to have a cluster with an architecture but without a bios type. Can you share details?

Comment 21 Arik 2021-06-29 14:58:35 UTC
(In reply to Lucia Jelinkova from comment #20)
> Arik, you are correct that if the cluster has already set the cpu and
> architecture, the host becomes non operational if it does not support any
> cpu model. Also, if it is the first host in the cluster and the host fails
> to report any cpu flags at all, it becomes non operational too. However,
> this is a third case, when it is a first host in a cluster and reports only
> cpu flags (no cpu models). This special case is not covered. I wonder if we
> should cover it in the engine as vdsm should report the models and if not it
> should handle the error from libvirt itself.

Right, the third case is indeed different.
I think it would be simpler to handle it on the engine side and let the engine make the host non-operational

> As for the cluster and bios type, I can see that we do set a default bios
> type in add/update cluster commands:
> https://github.com/oVirt/ovirt-engine/blob/
> 7c1f9ff4b98ae82f0660ee7205e350fa61d25076/backend/manager/modules/bll/src/
> main/java/org/ovirt/engine/core/bll/AddClusterCommand.java#L70
> https://github.com/oVirt/ovirt-engine/blob/
> 7c1f9ff4b98ae82f0660ee7205e350fa61d25076/backend/manager/modules/bll/src/
> main/java/org/ovirt/engine/core/bll/UpdateClusterCommand.java#L343
> 
> I do not see how that could be possible to have a cluster with architecture
> and without bios type. Can you share details?

Oh my bad - I didn't press "ok" in the dialog and assumed what the webadmin shows me is what the cluster is going to have. Indeed, the backend sets the bios type then.
So Richard, yes - the change you've proposed looks good to me

Comment 24 Klaus Heinrich Kiwi 2021-08-09 20:29:33 UTC
(In reply to Arik from comment #21)
> (In reply to Lucia Jelinkova from comment #20)
> > Arik, you are correct that if the cluster has already set the cpu and
> > architecture, the host becomes non operational if it does not support any
> > cpu model. Also, if it is the first host in the cluster and the host fails
> > to report any cpu flags at all, it becomes non operational too. However,
> > this is a third case, when it is a first host in a cluster and reports only
> > cpu flags (no cpu models). This special case is not covered. I wonder if we
> > should cover it in the engine as vdsm should report the models and if not it
> > should handle the error from libvirt itself.
> 
> Right, the third case is indeed different.
> I think it would be simpler to handle it on the engine side and let the
> engine make the host non-operational
> 
> > As for the cluster and bios type, I can see that we do set a default bios
> > type in add/update cluster commands:
> > https://github.com/oVirt/ovirt-engine/blob/
> > 7c1f9ff4b98ae82f0660ee7205e350fa61d25076/backend/manager/modules/bll/src/
> > main/java/org/ovirt/engine/core/bll/AddClusterCommand.java#L70
> > https://github.com/oVirt/ovirt-engine/blob/
> > 7c1f9ff4b98ae82f0660ee7205e350fa61d25076/backend/manager/modules/bll/src/
> > main/java/org/ovirt/engine/core/bll/UpdateClusterCommand.java#L343
> > 
> > I do not see how that could be possible to have a cluster with architecture
> > and without bios type. Can you share details?
> 
> Oh my bad - I didn't press "ok" in the dialog and assumed what the webadmin
> shows me is what the cluster is going to have. Indeed, the backend sets the
> bios type then.
> So Richard, yes - the change you've proposed looks good to me

I'm a bit confused as to what the next steps here are... do we need a fix in v2v, a test case, or somewhere else?

Comment 25 Laszlo Ersek 2022-07-07 09:27:42 UTC
Based on some git history digging and some code analysis, I don't think the work-around patch from Rich's comment 18 is sufficient.

That patch indeed takes care of "cpu" being None, and so the invalid access below is avoided:

    if cpu.architecture == types.Architecture.UNDEFINED:

But if "cpu" is None, we have a larger problem than that!


Namely, this code was added to virt-v2v in commit 2d54b54e9a9e ("-o rhv-upload: check guest arch with cluster", 2020-02-14). What that commit does (expressed in today's -- i.e., modular -- virt-v2v terms) is the following:

- In the setup phase, when we run the precheck python script and save whatever information it produces in a JSON file, *extend* this JSON file with the CPU architecture info. Still in the setup phase, *parse* this information from the JSON and remember it.

- In the finalization phase, recall the CPU architecture of the cluster, and check it against "gcaps_arch".


To this day (as of virt-v2v commit 4368b94ee172), we have the following in place; function "finalize", file "output/output_rhv_upload.ml":

    (* Check the cluster CPU arch matches what we derived about the
     * guest during conversion.
     *)
    (match rhv_cluster_cpu_architecture with
     | None -> assert false
     | Some arch ->
        if arch <> target_meta.guestcaps.gcaps_arch then
          error (f_"the cluster ‘%s’ does not support the architecture %s but %s")
            rhv_cluster_name target_meta.guestcaps.gcaps_arch arch
    );

If we simply avoid printing the "rhv_cluster_cpu_architecture" key to the JSON in the precheck python script, then the above-quoted match will fire *at the latest*. So conversion will fail either way.

There's simply no way around the *intent* of virt-v2v commit 2d54b54e9a9e: that is, enforcing that the cluster's CPU arch match the guest's CPU arch.

If there are VALID cases when RHV cannot provide this descriptive information about the target cluster, then we must *relax* commit 2d54b54e9a9e. Currently we fail if the cluster arch is unknown, or different from the guest arch. We'd have to replace that with failure on *explicit difference* only, and assume "unknown" to be a successful architecture match.

Lucia, Arik, please advise: will this be fixed in the RHV data model (in which case there's nothing to do in virt-v2v), or should virt-v2v anticipate cases where RHV lacks this information by design (in which case we need to relax the existing virt-v2v logic)?

BTW: if "cpu" is None in the python precheck script, the proposed workaround from comment 18 would still lead to an identical problem, just lower down in the file:

results = {
    "rhv_storagedomain_uuid": storage_domain.id,
    "rhv_cluster_uuid": cluster.id,
    "rhv_cluster_cpu_architecture": cpu.architecture.value,
}

In other words, the formatting of the value into the JSON file would blow up then.
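The two failure sites can be sketched together (hypothetical stand-ins for the SDK objects, simplified names): even with the comment 18 guard in place, building the results dict still dereferences a None "cpu".

```python
# Hypothetical stand-ins for the SDK objects, to show that the comment 18
# guard alone still leaves the results dict dereferencing a None cpu.
class Cluster:
    def __init__(self, id, cpu=None):
        self.id = id
        self.cpu = cpu

cluster = Cluster(id="1234", cpu=None)
cpu = cluster.cpu

# Patched precheck from comment 18: passes silently when cpu is None.
if cpu is not None and cpu.architecture == "undefined":
    raise RuntimeError("unknown architecture")

try:
    results = {
        "rhv_cluster_uuid": cluster.id,
        "rhv_cluster_cpu_architecture": cpu.architecture.value,
    }
except AttributeError as e:
    print(e)  # same 'NoneType' error, just lower down in the file
```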

Thanks
Laszlo

Comment 26 Lucia Jelinkova 2022-07-12 08:38:39 UTC
It is hard to reply without knowing the details of your tests and what you are trying to solve right now. 

As for the RHV logic, when you create a new cluster, it is set with an UNDEFINED architecture (unless you specify one explicitly). When you add the first host and it is activated, the architecture and the CPU are detected automatically. From that point on, the architecture cannot be UNDEFINED unless there is a bug (there is no known bug in that area right now). I think it is worth checking early in the test that they are set, and failing quickly if they are not. At the end of your tests, you can then compare the cluster's architecture to the guest's architecture, if that is part of your test case.

Comment 27 Richard W.M. Jones 2022-07-12 10:22:32 UTC
(In reply to Lucia Jelinkova from comment #26)
> It is hard to reply without knowing the details of your tests and what you
> are trying to solve right now. 

At a minimum we're trying to make it so this code never breaks, particularly line 96:

https://github.com/libguestfs/virt-v2v/blob/af4a0454cdd21bb5e86f2dbfaa153e83afca3988/output/rhv-upload-precheck.py#L86-L98

The bigger issue is having virt-v2v ensure that the cluster architecture matches
the architecture of the guest, which we find out after we have done the conversion.
If we cannot get the cpu architecture then we cannot do that.  See Laszlo's comment 25.

Comment 28 Laszlo Ersek 2022-07-12 10:27:18 UTC
(In reply to Lucia Jelinkova from comment #26)

> I think it is worth checking
> if they're set early in the test and fail quickly if they're not. At the end
> of your tests, you can then compare cluster's architecture to guest's
> architecture if that is part of your test case.

Thanks for this answer; it seems quite comfortable for us. It confirms that we should continue failing on "unknown arch"; we just have to do it in a better way.

Namely, in this case, we only need to update Rich's proposal from comment#18 a little bit. It was:

--- a/v2v/rhv-upload-precheck.py
+++ b/v2v/rhv-upload-precheck.py
@@ -93,7 +93,7 @@ if len(clusters) == 0:
                         params['output_storage']))
 cluster = clusters[0]
 cpu = cluster.cpu
-if cpu.architecture == types.Architecture.UNDEFINED:
+if cpu is not None and cpu.architecture == types.Architecture.UNDEFINED:
     raise RuntimeError("The cluster ‘%s’ has an unknown architecture" %
                        (params['rhv_cluster']))

We only need replace

  (cpu is not None) and ...

with

  (cpu is None) or ...

That is, if we can't get the CPU type at all, report a user-friendly error message.

If the modified check passes, that's going to mean:

  cpu is not None and cpu.architecture != types.Architecture.UNDEFINED

therefore the rest of the code (both in Python and OCaml) will function as intended.

Comment 29 Lucia Jelinkova 2022-07-12 13:03:26 UTC
Now that I am looking into it, the condition could be even simpler: check only the CPU, because if the CPU is defined, the architecture is always defined too.
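Putting comments 28 and 29 together, the precheck condition could look like this sketch (simplified names, not the actual patch): fail with a clear message when the CPU is missing, and skip the now-redundant architecture comparison.

```python
# Sketch of the agreed check (simplified names, not the actual patch):
# fail with a clear message when cpu is None; per comment 29 the explicit
# UNDEFINED comparison is then redundant, since a defined CPU always
# carries a defined architecture.
def check_cluster_cpu(cluster, rhv_cluster_name):
    cpu = cluster.cpu
    if cpu is None:
        raise RuntimeError("The cluster '%s' has an unknown architecture"
                           % rhv_cluster_name)
    return cpu.architecture
```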

