Bug 2076884 - [OpenStack 16.1.5] Live migration fails in a homogeneous environment
Summary: [OpenStack 16.1.5] Live migration fails in a homogeneous environment
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 16.1 (Train)
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: z9
Target Release: 16.1 (Train on RHEL 8.2)
Assignee: Kashyap Chamarthy
QA Contact: OSP DFG:Compute
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-04-20 07:18 UTC by Luca Davidde
Modified: 2023-09-18 04:35 UTC (History)

Fixed In Version: openstack-nova-20.4.1-1.20220429124637.1ee93b9.el8ost
Doc Type: Known Issue
Doc Text:
There is currently a known issue when live migrating instances with CPUs that are incompatible with the destination host CPUs. As a workaround, you can skip the Compute service CPU comparison check on the destination host before migrating an instance, because libvirt (QEMU >= 2.9 and libvirt >= 4.4.0) correctly handles the CPU compatibility checks on the destination host during live migration.
+
Workaround: Before performing instance live migration, add the following configuration in the `nova.conf` file of each affected Compute node:
+
----
[workarounds]
skip_cpu_compare_on_dest = True
----
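The workaround above amounts to setting one option in the `[workarounds]` section. A minimal Python sketch using the standard-library `configparser`, assuming a plain INI-style `nova.conf`; the helper name `enable_skip_cpu_compare` and the sample input are hypothetical, and in a director-based deployment the file is normally managed by the deployment tooling rather than edited by hand:

```python
import configparser
import io


def enable_skip_cpu_compare(conf_text: str) -> str:
    """Return conf_text with [workarounds] skip_cpu_compare_on_dest = True set."""
    # interpolation=None leaves any '%' characters in values untouched.
    # strict=False tolerates repeated options; only the last value survives the
    # rewrite, so this sketch is not safe for files with multi-valued options.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(conf_text)
    if not parser.has_section("workarounds"):
        parser.add_section("workarounds")
    parser.set("workarounds", "skip_cpu_compare_on_dest", "True")
    out = io.StringIO()
    parser.write(out)  # comments in the input are not preserved
    return out.getvalue()


# Hypothetical input resembling the reporter's setting (cpu_mode=host-model).
sample = "[libvirt]\ncpu_mode = host-model\n"
updated = enable_skip_cpu_compare(sample)
print(updated)
```

Because `configparser` drops comments when it rewrites a file, this is better suited to checking or generating a fragment than to editing a production `nova.conf` in place. The Compute service has to be restarted to pick up the change.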
Clone Of:
Environment:
Last Closed: 2022-12-07 20:26:21 UTC
Target Upstream Version:
Embargoed:


Links
System                  ID              Private  Priority  Status  Summary  Last Updated
Red Hat Issue Tracker   OSP-14768       0        None      None    None     2022-04-20 07:21:13 UTC
Red Hat Product Errata  RHBA-2022:8795  0        None      None    None     2022-12-07 20:26:48 UTC

Description Luca Davidde 2022-04-20 07:18:30 UTC
Description of problem:

A customer has a 16.1.5 environment with this setup:


- All Compute nodes boot with tsx=off on the kernel command line (all have Intel(R) Xeon(R) Gold 6254 CPU @ 3.10GHz, stepping=7)
- The capabilities file reports the host CPU model as Cascadelake-Server-noTSX
- The host process of every running instance shows the CPU as: -cpu Cascadelake-Server
- nova.conf sets cpu_mode=host-model
- libvirt 6.0.0-25.5

In this situation, live migration fails with the error:
"Unacceptable CPU info: CPU doesn't have compatibility"


Version-Release number of selected component (if applicable):
rhosp 16.1.5
openstack-nova-libvirt:16.1.5-1 container
libvirt 6.0.0-25.5


How reproducible:
Always


Steps to Reproduce:
1. Create a test instance on the source Compute node
2. Live migrate the instance to the destination

Actual results:

Live migration fails with the error:
"Unacceptable CPU info: CPU doesn't have compatibility.

0

Refer to http://libvirt.org/html/libvirt-libvirt-host.html#virCPUCompareResult (HTTP 400)"
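
The `0` in the message above is the value returned by libvirt's CPU comparison. A small sketch of how that value maps onto the `virCPUCompareResult` enum referenced in the error; the `is_acceptable` helper is hypothetical and assumes Nova rejects any result of 0 or less:

```python
from enum import IntEnum


class CPUCompareResult(IntEnum):
    """Values of libvirt's virCPUCompareResult enum."""
    ERROR = -1
    INCOMPATIBLE = 0
    IDENTICAL = 1
    SUPERSET = 2


def is_acceptable(ret: int) -> bool:
    # Assumption: mirrors a "ret <= 0 means failure" check on the destination.
    return ret > 0


ret = 0  # the value printed in the error message
print(CPUCompareResult(ret).name, is_acceptable(ret))
```

Read this way, the error says that libvirt on the destination judged the CPU definition it was given to be incompatible with the host CPU, not that the comparison call itself failed.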

Expected results:

Live migration succeeds.

Additional info:
The customer has another environment with the same setup, but on 16.2.1; there, live migration works out of the box.
Is there a patch that we can apply to the current environment to avoid forcing the CPU model in Nova? Forcing the model works, but it is not practical because every instance must be rebooted before it can be live migrated.
Linking the case.
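
The mismatch described above (host model `Cascadelake-Server-noTSX`, guests started with `-cpu Cascadelake-Server`) can be checked with a short script. A sketch, assuming the host model is read from the output of `virsh capabilities` and the guest model from the QEMU process command line; the helper names and both sample strings are illustrative and not taken from the customer's environment:

```python
import xml.etree.ElementTree as ET


def host_cpu_model(capabilities_xml: str) -> str:
    """Return the text of /capabilities/host/cpu/model."""
    root = ET.fromstring(capabilities_xml)
    model = root.find("./host/cpu/model")
    if model is None or not model.text:
        raise ValueError("no host CPU model in capabilities XML")
    return model.text.strip()


def guest_cpu_model(qemu_cmdline: str) -> str:
    """Return the model name from the '-cpu MODEL[,feature,...]' argument."""
    args = qemu_cmdline.split()
    value = args[args.index("-cpu") + 1]  # raises ValueError if -cpu is absent
    return value.split(",")[0]


# Made-up inputs shaped like the data described in this report.
caps = (
    "<capabilities><host><cpu><arch>x86_64</arch>"
    "<model>Cascadelake-Server-noTSX</model></cpu></host></capabilities>"
)
cmdline = "/usr/libexec/qemu-kvm -name guest=example -cpu Cascadelake-Server,ss=on -m 4096"

host = host_cpu_model(caps)
guest = guest_cpu_model(cmdline)
print(host, guest, host == guest)
```

Comparing model names is only a quick diagnostic. libvirt's own comparison works on the full CPU definition, including individual features, so matching names do not guarantee compatibility and differing names do not always rule it out.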

Comment 1 Artom Lifshitz 2022-04-20 15:08:11 UTC
Hi Luca,

To my knowledge, nothing has been added in 16.2 that would address this kind of issue. Could you provide an example live migration request UUID (from the output of `openstack server event list --long`), as well as sosreports from the source and destination nodes? That will help us understand what is going on here.

Thanks!

Comment 2 Luca Davidde 2022-04-21 10:56:51 UTC
Hello Artom,
The customer has just attached the files to the case after reproducing the issue.
Below is the output of the event show command:

{
  "action": "live-migration",
  "events": [
    {
      "event": "compute_check_can_live_migrate_destination",
      "start_time": "2022-04-21T09:44:09.000000",
      "finish_time": "2022-04-21T09:44:09.000000",
      "result": "Error",
      "traceback": "  File \"/usr/lib/python3.6/site-packages/nova/compute/utils.py\", line 1372, in decorated_function\n    return function(self, context, *args, **kwargs)\n  File \"/usr/lib/python3.6/site-packages/nova/compute/manager.py\", line 219, in decorated_function\n    kwargs['instance'], e, sys.exc_info())\n  File \"/usr/lib/python3.6/site-packages/oslo_utils/excutils.py\", line 220, in __exit__\n    self.force_reraise()\n  File \"/usr/lib/python3.6/site-packages/oslo_utils/excutils.py\", line 196, in force_reraise\n    six.reraise(self.type_, self.value, self.tb)\n  File \"/usr/lib/python3.6/site-packages/six.py\", line 675, in reraise\n    raise value\n  File \"/usr/lib/python3.6/site-packages/nova/compute/manager.py\", line 207, in decorated_function\n    return function(self, context, *args, **kwargs)\n  File \"/usr/lib/python3.6/site-packages/nova/compute/manager.py\", line 6789, in check_can_live_migrate_destination\n    block_migration, disk_over_commit)\n  File \"/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py\", line 8249, in check_can_live_migrate_destination\n    self._compare_cpu(None, source_cpu_info, instance)\n  File \"/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py\", line 8566, in _compare_cpu\n    raise exception.InvalidCPUInfo(reason=m % {'ret': ret, 'u': u})\n"
    },
    {
      "event": "conductor_migrate_server",
      "start_time": "2022-04-21T09:44:08.000000",
      "finish_time": "2022-04-21T09:44:11.000000",
      "result": "Error",
      "traceback": "  File \"/usr/lib/python3.6/site-packages/nova/compute/utils.py\", line 1372, in decorated_function\n    return function(self, context, *args, **kwargs)\n  File \"/usr/lib/python3.6/site-packages/nova/conductor/manager.py\", line 291, in migrate_server\n    block_migration, disk_over_commit, request_spec)\n  File \"/usr/lib/python3.6/site-packages/nova/conductor/manager.py\", line 474, in _live_migrate\n    migration.save()\n  File \"/usr/lib/python3.6/site-packages/oslo_utils/excutils.py\", line 220, in __exit__\n    self.force_reraise()\n  File \"/usr/lib/python3.6/site-packages/oslo_utils/excutils.py\", line 196, in force_reraise\n    six.reraise(self.type_, self.value, self.tb)\n  File \"/usr/lib/python3.6/site-packages/six.py\", line 675, in reraise\n    raise value\n  File \"/usr/lib/python3.6/site-packages/nova/conductor/manager.py\", line 456, in _live_migrate\n    task.execute()\n  File \"/usr/lib/python3.6/site-packages/nova/conductor/tasks/base.py\", line 27, in wrap\n    self.rollback()\n  File \"/usr/lib/python3.6/site-packages/oslo_utils/excutils.py\", line 220, in __exit__\n    self.force_reraise()\n  File \"/usr/lib/python3.6/site-packages/oslo_utils/excutils.py\", line 196, in force_reraise\n    six.reraise(self.type_, self.value, self.tb)\n  File \"/usr/lib/python3.6/site-packages/six.py\", line 675, in reraise\n    raise value\n  File \"/usr/lib/python3.6/site-packages/nova/conductor/tasks/base.py\", line 24, in wrap\n    return original(self)\n  File \"/usr/lib/python3.6/site-packages/nova/conductor/tasks/base.py\", line 42, in execute\n    return self._execute()\n  File \"/usr/lib/python3.6/site-packages/nova/conductor/tasks/live_migrate.py\", line 103, in _execute\n    self.destination, dest_node, self.limits = self._find_destination()\n  File \"/usr/lib/python3.6/site-packages/nova/conductor/tasks/live_migrate.py\", line 503, in _find_destination\n    return_objects=True, return_alternates=False)\n  File \"/usr/lib/python3.6/site-packages/nova/scheduler/client/query.py\", line 42, in select_destinations\n    instance_uuids, return_objects, return_alternates)\n  File \"/usr/lib/python3.6/site-packages/nova/scheduler/rpcapi.py\", line 160, in select_destinations\n    return cctxt.call(ctxt, 'select_destinations', **msg_args)\n  File \"/usr/lib/python3.6/site-packages/oslo_messaging/rpc/client.py\", line 181, in call\n    transport_options=self.transport_options)\n  File \"/usr/lib/python3.6/site-packages/oslo_messaging/transport.py\", line 129, in _send\n    transport_options=transport_options)\n  File \"/usr/lib/python3.6/site-packages/oslo_messaging/_drivers/amqpdriver.py\", line 646, in send\n    transport_options=transport_options)\n  File \"/usr/lib/python3.6/site-packages/oslo_messaging/_drivers/amqpdriver.py\", line 636, in _send\n    raise result\n"
    }
  ],
  "instance_uuid": "1669a92d-75f5-4c11-b254-85553e917cad",
  "message": "Error",
  "project_id": "2ed016f0f951492391dc3448f6124bce",
  "request_id": "req-745b155f-7469-4dfe-beb7-b5a2f978cd1b",
  "start_time": "2022-04-21T09:44:08.000000",
  "user_id": "a9337a5539953fe05571c3e3c1108a92108683cdb622919357087b9278be47d3"
}
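
The tracebacks in this output are long, and the useful part is usually the last frame of each failed event. A sketch of how that can be pulled out in Python, assuming the output was saved as JSON; the `failed_events` helper is hypothetical, and the sample is trimmed to the final frame of the first event shown above:

```python
import json


def failed_events(action: dict) -> list:
    """Return (event name, last traceback line) for each event with result 'Error'."""
    failures = []
    for event in action.get("events", []):
        if event.get("result") != "Error":
            continue
        text = event.get("traceback") or ""
        lines = [line.strip() for line in text.splitlines() if line.strip()]
        failures.append((event["event"], lines[-1] if lines else ""))
    return failures


# Raw string so that the JSON escapes (\" and \n) reach json.loads unchanged.
sample = r'''
{
  "action": "live-migration",
  "events": [
    {
      "event": "compute_check_can_live_migrate_destination",
      "result": "Error",
      "traceback": "  File \"/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py\", line 8566, in _compare_cpu\n    raise exception.InvalidCPUInfo(reason=m % {'ret': ret, 'u': u})\n"
    }
  ]
}
'''

summary = failed_events(json.loads(sample))
for name, last_line in summary:
    print(name, "->", last_line)
```

For the output above, this points at `_compare_cpu` in the libvirt driver on the destination, which is where the CPU comparison check raises `InvalidCPUInfo`.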

Thank you!

Comment 5 Luca Davidde 2022-04-22 14:56:42 UTC
Hi Kashyap,
Thank you.
Could you build a hotfix containing the patch so that we can install it?

Thanks!

Comment 26 errata-xmlrpc 2022-12-07 20:26:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1.9 bug fix and enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:8795

Comment 27 Red Hat Bugzilla 2023-09-18 04:35:49 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days

