Description of problem:

Error Code 8]: invalid argument: the machine 'pc-q35-rhel9.0.0' is not supported by emulator

Tempest test failure:

tempest.exceptions.BuildErrorException: Server 8fb4a195-beeb-4b94-950e-0575c1357aca failed to build and is in ERROR status
Details: Fault: {'code': 500, 'created': '2022-07-25T13:32:05Z', 'message': 'Exceeded maximum number of retries. Exhausted all hosts available for retrying build failures for instance 8fb4a195-beeb-4b94-950e-0575c1357aca.'}.
Server boot request ID: req-d7750704-0691-469f-8808-d050feed16e7.

Version-Release number of selected component (if applicable):

How reproducible:
Always

Steps to Reproduce:

Actual results:

2022-07-25 13:32:02.505 7 ERROR nova.compute.manager [req-d7750704-0691-469f-8808-d050feed16e7 445ee2254bfc4febb6d4adbc07ac2408 95364adc38cf4bd69bd1035cac35b60f - default default] [instance: 8fb4a195-beeb-4b94-950e-0575c1357aca] Instance failed to spawn: libvirt.libvirtError: unsupported configuration: Emulator '/usr/libexec/qemu-kvm' does not support machine type 'pc-q35-rhel9.0.0'

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2640, in _build_resources
    yield resources
  File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 2413, in _build_and_run_instance
    accel_info=accel_info)
  File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 4198, in spawn
    cleanup_instance_disks=created_disks)
  File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 7271, in _create_guest_with_network
    cleanup_instance_disks=cleanup_instance_disks)
  File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 227, in __exit__
    self.force_reraise()
  File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 200, in force_reraise
    raise self.value
  File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 7239, in _create_guest_with_network
    post_xml_callback=post_xml_callback)
  File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 7171, in _create_guest
    guest = libvirt_guest.Guest.create(xml, self._host)
  File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/guest.py", line 155, in create
    encodeutils.safe_decode(xml))
  File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 227, in __exit__
    self.force_reraise()
  File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 200, in force_reraise
    raise self.value
  File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/guest.py", line 151, in create
    guest = host.write_instance_config(xml)
  File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/host.py", line 1203, in write_instance_config
    domain = self.get_connection().defineXML(xml)
  File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 190, in doit
    result = proxy_call(self._autowrap, f, *args, **kwargs)
  File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 148, in proxy_call
    rv = execute(f, *args, **kwargs)
  File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 129, in execute
    six.reraise(c, e, tb)
  File "/usr/lib/python3.6/site-packages/six.py", line 703, in reraise
    raise value
  File "/usr/lib/python3.6/site-packages/eventlet/tpool.py", line 83, in tworker
    rv = meth(*args, **kwargs)
  File "/usr/lib64/python3.6/site-packages/libvirt.py", line 4380, in defineXML
    raise libvirtError('virDomainDefineXML() failed')
libvirt.libvirtError: unsupported configuration: Emulator '/usr/libexec/qemu-kvm' does not support machine type 'pc-q35-rhel9.0.0'

2022-07-25 13:32:02.531 7 INFO nova.compute.manager [req-d7750704-0691-469f-8808-d050feed16e7 445ee2254bfc4febb6d4adbc07ac2408 95364adc38cf4bd69bd1035cac35b60f - default default] [instance: 8fb4a195-beeb-4b94-950e-0575c1357aca] Terminating instance

Expected results:

Additional info:
After the workaround was merged, we are no longer seeing this issue. Closing this bug.
It is a bug regardless of whether you have a workaround. When the package is built for el8, it should not suggest non-existent machine types by default.
When building the el8 package, how come a heat parameter default gets into play? See https://code.engineering.redhat.com/gerrit/c/openstack-tripleo-heat-templates/+/417310/5/deployment/nova/nova-compute-container-puppet.yaml . If it does, I don't think it should.
(In reply to Bogdan Dobrelya from comment #8)
> When building el8 package, how come a heat parameter default gets into play?
> See https://code.engineering.redhat.com/gerrit/c/openstack-tripleo-heat-templates/+/417310/5/deployment/nova/nova-compute-container-puppet.yaml .
> If it does, I don't think it should?

Yeah, putting needinfo on Attila, based on Bogdan's question.

Hi, Attila: OSP 17 has the RHEL9 versioned machine type hardcoded in it by default.
I'm closing this bug: it is _not_ a bug because RHEL8 hosts will not have the RHEL9 versioned machine types. However, if you use plain "q35", it _will_ work on RHEL8 — that's because it (RHEL8) still has RHEL8-based versioned "q35" machine types.
Adding to what Kashyap said - mixed mode during the 16 -> 17.1 FFU will have to handle both pc and q35 machine types somehow, but greenfield deployments of 17.1 need to be q35, so the default here makes sense, and will not change.
*** Bug 2115778 has been marked as a duplicate of this bug. ***
q35 should work, so set q35 instead of 'pc-q35-rhel9.0.0'. Switching to the openstack-tripleo-heat-templates component since it defaulted to pc-q35-rhel9.0.0 even though it has to support el8 nodes.
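For reference, overriding the default suggested above could be sketched as a TripleO environment file. This assumes the NovaHWMachineType parameter from the template linked earlier in this bug; the value shown is this comment's suggestion, not a verified product default:

```yaml
# Hypothetical environment file (e.g. machine-type-override.yaml), passed to
# the deploy command with -e. Sets the unversioned q35 alias so that el8 and
# el9 compute nodes each resolve it to their own local versioned type.
parameter_defaults:
  NovaHWMachineType: x86_64=q35
```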
17.0 fresh deployments are *only* on RHEL 9.0.
17.1 fresh deployments are *only* on RHEL 9.2.

With that in mind, it makes sense to have a TripleO default machine type of `q35` for both (more on that later).

However, for the CI job that seeks to approximate mixed RHEL upgrades testing, we need a manual override of the default, done either via Infrared or otherwise, to set it to the machine type used by most customers on OSP 16.2 / RHEL 8.4. This would be `pc`, and the unversioned variant of that is probably good enough, though to be as close to our customers as possible, it should ideally be the versioned variant of `pc`, which I believe would be the old RHEL 7.6 one, as no new variants were added in RHEL 8.4 for `pc`.

The other requirement here is that for *all* 17.x releases, the machine type has to be the versioned lowest common denominator that's available on all underlying versions of RHEL 9.x. This is because during updates from 17.0 to 17.1, live migrations have to work in both directions, from RHEL 9.0 to 9.2 hosts, as well as RHEL 9.2 to 9.0 hosts. The only way to achieve that is to set the `pc-q35-rhel9.0.0` machine type as the default for *all* 17.x releases, including 17.1 on RHEL 9.2.

For mixed RHEL upgrades from 16.2 to 17.1, the update tooling needs to preserve whatever machine type is set in the 16.2 environment, whether that's the default `pc` type, or something custom that the customer has set. @Attila do you know if a tracker for this request has been filed with the Upgrades team?
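At the nova level, the pinning described above corresponds to the `[libvirt] hw_machine_type` option; a sketch of what the rendered nova.conf would contain under that proposal (the value comes from this discussion, not from a verified deployment):

```ini
[libvirt]
# Pin all 17.x compute hosts to the lowest-common-denominator versioned type
# available on every underlying RHEL 9.x minor, so live migration works in
# both directions during a 17.0 -> 17.1 update.
hw_machine_type = x86_64=pc-q35-rhel9.0.0
```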
"This is because during updates from 17.0 to 17.1, live migrations have to work in both directions"

Can you show the scenario where you do a migration from a new machine to an old one during the upgrade?

"@Attila do you know if a tracker for this request has been filed with the Upgrades team?"

No, and I am also not aware whether the pinning request is really coming from actual upgrade needs, and not just from running tempest with an admin-only test case in the meantime. Tempest has an (admin-only) live-migrate test case which might start a VM on a "random" (new) machine and migrate it to a "random" other (old) one; in a small, 2-machine, half-updated compute deployment it would "randomly" fail. But otherwise, does the upgrade really need a pinned q35?

What is the point in changing the tripleo defaults at each rhel version switch to the pinned/versioned q35 when it actually points to the same place as the q35 alias anyway?

BTW, el9 considers all versions of pc (pc-i440fx-rhel7.6.0) deprecated:

/usr/libexec/qemu-kvm -machine ?
Supported machines are:
pc                   RHEL 7.6.0 PC (i440FX + PIIX, 1996) (alias of pc-i440fx-rhel7.6.0)
pc-i440fx-rhel7.6.0  RHEL 7.6.0 PC (i440FX + PIIX, 1996) (default) (deprecated)
q35                  RHEL-9.0.0 PC (Q35 + ICH9, 2009) (alias of pc-q35-rhel9.0.0)
pc-q35-rhel9.0.0     RHEL-9.0.0 PC (Q35 + ICH9, 2009)
pc-q35-rhel8.6.0     RHEL-8.6.0 PC (Q35 + ICH9, 2009) (deprecated)
pc-q35-rhel8.5.0     RHEL-8.5.0 PC (Q35 + ICH9, 2009) (deprecated)
pc-q35-rhel8.4.0     RHEL-8.4.0 PC (Q35 + ICH9, 2009) (deprecated)
pc-q35-rhel8.3.0     RHEL-8.3.0 PC (Q35 + ICH9, 2009) (deprecated)
pc-q35-rhel8.2.0     RHEL-8.2.0 PC (Q35 + ICH9, 2009) (deprecated)
pc-q35-rhel8.1.0     RHEL-8.1.0 PC (Q35 + ICH9, 2009) (deprecated)
pc-q35-rhel8.0.0     RHEL-8.0.0 PC (Q35 + ICH9, 2009) (deprecated)
pc-q35-rhel7.6.0     RHEL-7.6.0 PC (Q35 + ICH9, 2009) (deprecated)
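The alias and deprecation relationships in a listing like the one above can be checked mechanically. A small sketch, with sample lines taken from the output quoted in this bug; the parsing helper is illustrative, not a qemu or libvirt API:

```python
# Parse a few lines of `/usr/libexec/qemu-kvm -machine ?` output to see which
# machine types are aliases and which are deprecated. SAMPLE is copied from
# the listing quoted in this report.
import re

SAMPLE = """\
pc                   RHEL 7.6.0 PC (i440FX + PIIX, 1996) (alias of pc-i440fx-rhel7.6.0)
pc-i440fx-rhel7.6.0  RHEL 7.6.0 PC (i440FX + PIIX, 1996) (default) (deprecated)
q35                  RHEL-9.0.0 PC (Q35 + ICH9, 2009) (alias of pc-q35-rhel9.0.0)
pc-q35-rhel9.0.0     RHEL-9.0.0 PC (Q35 + ICH9, 2009)
pc-q35-rhel8.6.0     RHEL-8.6.0 PC (Q35 + ICH9, 2009) (deprecated)
"""

def parse_machines(text):
    """Map each machine-type name to its alias target (if any) and flags."""
    machines = {}
    for line in text.splitlines():
        name, desc = line.split(None, 1)
        alias = re.search(r"\(alias of (\S+)\)", desc)
        machines[name] = {
            "alias_of": alias.group(1) if alias else None,
            "deprecated": "(deprecated)" in desc,
        }
    return machines

machines = parse_machines(SAMPLE)
print(machines["q35"]["alias_of"])                 # pc-q35-rhel9.0.0
print(machines["pc-i440fx-rhel7.6.0"]["deprecated"])  # True
```

On el9 the unversioned `q35` resolves to `pc-q35-rhel9.0.0` while every `pc` variant is deprecated, which is the point the comment above is making.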
Nova does not seem to handle the supported chipset as a trait; therefore, in a mixed environment, if you have settings pinned to a new-only chipset, the scheduler will not know to select only the new nodes. The unversioned chipset (the q35 alias) is supposed to work in this case as well.
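The scheduling gap described above can be sketched with a toy model. This is not Nova code; the host names and per-host machine-type sets are illustrative assumptions:

```python
# Toy model: the scheduler picks a host with no awareness of which machine
# types the host's emulator supports, so a request pinned to a RHEL9-only
# versioned type can land on an el8 host and fail at spawn time.
hosts = {
    "compute-el8": {"q35", "pc-q35-rhel8.4.0", "pc-i440fx-rhel7.6.0"},
    "compute-el9": {"q35", "pc-q35-rhel9.0.0", "pc-i440fx-rhel7.6.0"},
}

def schedule(machine_type):
    """Pick the first host; machine-type support is not a scheduling trait."""
    return next(iter(hosts))

def spawn(host, machine_type):
    """Spawning fails if the host's emulator lacks the requested type."""
    if machine_type not in hosts[host]:
        raise RuntimeError(
            f"Emulator on {host} does not support machine type '{machine_type}'")
    return "ACTIVE"

# Pinned to the RHEL9-only versioned type: may land on the el8 host and fail.
try:
    spawn(schedule("pc-q35-rhel9.0.0"), "pc-q35-rhel9.0.0")
except RuntimeError as exc:
    print(exc)

# The unversioned alias exists on every host, so the same request succeeds.
print(spawn(schedule("q35"), "q35"))
```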
(In reply to Artom Lifshitz from comment #21)
> 17.0 fresh deployments are *only* on RHEL 9.0
> 17.1 fresh deployments are *only* on RHEL 9.2.
>
> With that in mind, it makes sense to have a TripleO default machine type of
> `q35` for both (more on that later).

(Right, implicit in your message is, when you say `q35` here, you're implying it is the versioned `q35`.)

> However, for the CI job that seeks to approximate mixed RHEL upgrades
> testing, we need a manual override of the default, done either via Infrared
> or otherwise, to set it to the machine type used by most customers on OSP
> 16.2 / RHEL 8.4. This would be `pc`, and the unversioned variant of that is
> probably good enough, though to be as close to our customers as possible, it
> should ideally be the versioned variant of `pc`, which I believe would be
> the old RHEL 7.6 one, as no new variants were added in RHEL 8.4 for `pc`.

Yes, on 16.2 / RHEL 8.4, the default machine type set by TripleO is still "pc-i440fx-rhel7.6.0" (which will go away in RHEL9). And I agree: for the CI job that's dealing with mixed RHEL upgrades only, the unversioned `pc` might suffice, if they're doing only forward migrations.

> The other requirement here is that for *all* 17.x releases, the machine type
> has to be the versioned lowest common denominator that's available on all
> underlying versions of RHEL 9.x. This is because during updates from 17.0 to
> 17.1, live migrations have to work in both directions, from RHEL 9.0 to 9.2
> hosts, as well as RHEL 9.2 to 9.0 hosts. The only way to achieve that is to
> set the `pc-q35-rhel9.0.0` machine type as the default for *all* 17.x
> releases, including 17.1 on RHEL 9.2.

If live migrations from 17.0 to 17.1 must work in _both_ directions, then the machine-type choice is clear for 17.2: `pc-q35-rhel9.0.0`.
> For mixed RHEL upgrades from 16.2 to 17.1, the update tooling needs to
> preserve whatever machine type is set in the 16.2 environment, whether
> that's the default `pc` type, or something custom that the customer has set.
> @Attila do you know if a tracker for this request has been filed with the
> Upgrades team?
If I am guessing right and the only reason to pin is passing an admin-only tempest test case in a half-updated system, you can also consider adding/implementing an option in tempest/tempest.conf to provide an exact hypervisor (scheduler hint) for starting the migration. During the test case the machine type/chipset is not expected to change, so you should be able to move the machine old->new->old; but without pinning, the new->old(->new) direction does not work reliably.
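If such an option were added, it might look like the fragment below. Note that these option names are hypothetical, sketched from this comment's proposal; they do not exist in tempest today:

```ini
[compute]
# Hypothetical options for a half-updated cloud: force the live-migration
# test to start the VM on a known (old) host and target a known (new) host,
# instead of letting the scheduler pick "random" ones.
migration_source_host = compute-old-0
migration_dest_host = compute-new-0
```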
(In reply to Kashyap Chamarthy from comment #24) [...] > Yes, on 16.2 / RHEL 8.4, the default machine-type set by TripleO is still > "pc-i440fx-rhel7.6.0" (which will go away in RHEL9) Correction: "pc-i440fx-rhel7.6.0" will go away in *RHEL10*; it is still supported on RHEL9, for OSP's use-case. [...]
(In reply to Artom Lifshitz from comment #21) > 17.0 fresh deployments are *only* on RHEL 9.0 This is true. RHEL 9.0 is the only supported RHEL version for OSP 17.0. > 17.1 fresh deployments are *only* on RHEL 9.2. This is not necessarily true. This may be what we end up supporting for customers, but for the sake of being able to test mixed environment functionality we may want to deploy with both RHEL 8.4 and RHEL 9.2 compute nodes. This allows us to test compute functionality in the mixed environment without being forced to start with 16.2 and upgrade before we can test.
(In reply to Jesse Pretorius from comment #27)
> (In reply to Artom Lifshitz from comment #21)
> > 17.0 fresh deployments are *only* on RHEL 9.0
>
> This is true. RHEL 9.0 is the only supported RHEL version for OSP 17.0.
>
> > 17.1 fresh deployments are *only* on RHEL 9.2.
>
> This is not necessarily true. This may be what we end up supporting for
> customers, but for the sake of being able to test mixed environment
> functionality we may want to deploy with both RHEL 8.4 and RHEL 9.2 compute
> nodes. This allows us to test compute functionality in the mixed environment
> without being forced to start with 16.2 and upgrade before we can test.

DF and/or Upgrades might have a better sense of whether our concern here makes sense or not, but Compute thinks that because such a CI "approximation" of mixed RHEL upgrades would be OSP 17 TripleO deploying on RHEL 8.4, it's not a valid test. In a real mixed RHEL FFU, the RHEL 8.4 compute hosts will have been deployed by OSP 16 TripleO.
(In reply to Attila Fazekas from comment #22)
> "This is because during updates from 17.0 to 17.1, live migrations have to
> work in both directions"
> Can you show the scenario when you do migration from new machine to old
> during upgrade ?

Because during an update from 17.0 to 17.1 that's something that customers are allowed to do, and might even be constrained to do, depending on how full their cloud is and how complex their compute node update slide puzzle is. This is not an artificial situation that can happen only during Tempest runs.

> "@Attila do you know if a tracker for this request has been filed with the
> Upgrades team?"
> No, I am also not aware if the pining request really coming for actual
> upgrade needs.
> Not just for running tempest with an admin only test case in the meantime.
> Tempest has (admin only) live migrate test case which might try to start a
> VM on a "random" machine (new) and migrate it to "random" other (old) in a
> small 2 machine half updated compute deployment it would "randomly" fail,
> but otherwise does upgrade really needs a pinned q35 ?
>
> What is the point in changing the tripleo defaults at each rhel version
> switch to the pinned/versioned q35 when it is actually pointing to the same
> place as the q35 alias anyway ?
>
> BTW, el9 considers all versions of the pc (pc-i440fx-rhel7.6.0) deprecated.
> /usr/libexec/qemu-kvm -machine ?
> Supported machines are:
> pc                   RHEL 7.6.0 PC (i440FX + PIIX, 1996) (alias of pc-i440fx-rhel7.6.0)
> pc-i440fx-rhel7.6.0  RHEL 7.6.0 PC (i440FX + PIIX, 1996) (default) (deprecated)
> q35                  RHEL-9.0.0 PC (Q35 + ICH9, 2009) (alias of pc-q35-rhel9.0.0)
> pc-q35-rhel9.0.0     RHEL-9.0.0 PC (Q35 + ICH9, 2009)
> pc-q35-rhel8.6.0     RHEL-8.6.0 PC (Q35 + ICH9, 2009) (deprecated)
> pc-q35-rhel8.5.0     RHEL-8.5.0 PC (Q35 + ICH9, 2009) (deprecated)
> pc-q35-rhel8.4.0     RHEL-8.4.0 PC (Q35 + ICH9, 2009) (deprecated)
> pc-q35-rhel8.3.0     RHEL-8.3.0 PC (Q35 + ICH9, 2009) (deprecated)
> pc-q35-rhel8.2.0     RHEL-8.2.0 PC (Q35 + ICH9, 2009) (deprecated)
> pc-q35-rhel8.1.0     RHEL-8.1.0 PC (Q35 + ICH9, 2009) (deprecated)
> pc-q35-rhel8.0.0     RHEL-8.0.0 PC (Q35 + ICH9, 2009) (deprecated)
> pc-q35-rhel7.6.0     RHEL-7.6.0 PC (Q35 + ICH9, 2009) (deprecated)
> This allows us to test compute functionality in the mixed environment
> without being forced to start with 16.2 and upgrade before we can test

Right, firstly, such a testing case could be accomplished by overriding the pinned value as needed. Secondly, as we have a goal of jobs unification, perhaps such a test case could and should be merged with the 16.2 -> 17.1 FFU "mixed rhel" upgrades testing scenario.
(In reply to Bogdan Dobrelya from comment #31)
> > This allows us to test compute functionality in the mixed environment
> > without being forced to start with 16.2 and upgrade before we can test
>
> Right, firstly, such a testing case could be accomplished by overriding the
> pinned value as needed.
> Secondly, as we have a goal of jobs unification, perhaps such a test case
> could and should be merged with 16.2 17.1 FFU "mixed rhel" upgrades testing
> scenario.

So, given the above and the rest of the discussion, can we close this bug, folks?
(In reply to Kashyap Chamarthy from comment #32)
> (In reply to Bogdan Dobrelya from comment #31)
> > > This allows us to test compute functionality in the mixed environment
> > > without being forced to start with 16.2 and upgrade before we can test
> >
> > Right, firstly, such a testing case could be accomplished by overriding the
> > pinned value as needed.
> > Secondly, as we have a goal of jobs unification, perhaps such a test case
> > could and should be merged with 16.2 17.1 FFU "mixed rhel" upgrades testing
> > scenario.
>
> So, given the above and the rest of the discussion, can we close this bug,
> folks?

Okay, I'm closing this, based on the above.
*** Bug 2129284 has been marked as a duplicate of this bug. ***
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days