Bug 1222414 - [RFE] Enable live migration for pinned instances
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 7.0 (Kilo)
Hardware: Unspecified
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: beta
Target Release: 16.0 (Train on RHEL 8.1)
Assignee: Artom Lifshitz
QA Contact: James Parker
URL:
Whiteboard:
Duplicates: 1319385 1559314 1565129 1585068 1703734
Depends On:
Blocks: 1281573 1500557 1595325 1339866 1414999 1431627 1442136 1478186 1500145 1669579 1732913 1756916 1769425 1780366
 
Reported: 2015-05-18 07:47 UTC by Itzik Brown
Modified: 2022-03-13 14:20 UTC
CC: 49 users

Fixed In Version: openstack-nova-20.0.1-0.20191025043858.390db63.el8ost
Doc Type: Enhancement
Doc Text:
With this enhancement, support for live migration of instances with a NUMA topology has been added. Previously, this action was disabled by default. It could be enabled using the '[workarounds] enable_numa_live_migration' config option, but this defaulted to False because live migrating such instances resulted in them being moved to the destination host without updating any of the underlying NUMA guest-to-host mappings or the resource usage. With the new NUMA-aware live migration feature, if the instance cannot fit on the destination, the live migration will be attempted on an alternate destination if the request is set up to have alternates. If the instance can fit on the destination, the NUMA guest-to-host mappings will be re-calculated to reflect its new host, and its resource usage updated.
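For context, the pre-Train workaround mentioned above was a nova.conf flag. A minimal illustrative fragment (only the option name comes from the text above; the rest of a real nova.conf is elided):

```ini
[workarounds]
# Pre-Train releases only: opt in to live migration of instances with a
# NUMA topology. Unsafe, because the guest-to-host NUMA mappings and
# resource usage were NOT updated on the destination host.
enable_numa_live_migration = True
```

With the Train (OSP 16) feature described above, this workaround is no longer needed.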
Clone Of:
Clones: 1780366
Environment:
Last Closed: 2020-02-06 14:37:21 UTC
Target Upstream Version:
Embargoed:


Attachments
NUMALiveMigrationTest Whitebox Tempest results (478.07 KB, text/plain)
2019-12-04 18:08 UTC, James Parker


Links
System ID Private Priority Status Summary Last Updated
Launchpad 1289064 0 None None None 2018-12-19 11:38:58 UTC
OpenStack gerrit 575179 0 None ABANDONED WIP: Libvirt live migration: update NUMA XML for dest 2021-02-10 15:45:07 UTC
OpenStack gerrit 599587 0 None MERGED Re-propose numa-aware-live-migration spec 2021-02-10 15:45:08 UTC
Red Hat Issue Tracker OSP-4525 0 None None None 2022-03-13 14:20:06 UTC
Red Hat Knowledge Base (Solution) 2191071 0 None None None 2018-05-30 15:49:15 UTC
Red Hat Product Errata RHEA-2020:0283 0 None None None 2020-02-06 14:39:51 UTC

Description Itzik Brown 2015-05-18 07:47:59 UTC
Description of problem:

When live migrating an instance from a hypervisor with more CPUs than the destination, the migration fails with an error in the Nova log:
instance: 44ab86db-e529-4a73-8a51-d3a59fdc90c5] Live Migration failure: Invalid value '0-5,12-17' for 'cpuset.cpus': Invalid argument
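To illustrate why libvirt rejects this, here is a hypothetical sketch (not Nova or libvirt code; the host CPU counts below are illustrative): a cpuset string such as '0-5,12-17' is only valid if every CPU id in it exists on the host, which fails when the destination has fewer CPUs than the source.

```python
def parse_cpuset(spec: str) -> set[int]:
    """Expand a libvirt-style cpuset string like '0-5,12-17' into CPU ids."""
    cpus: set[int] = set()
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def fits_on_host(spec: str, host_cpus: int) -> bool:
    """True only if every pinned CPU id exists on a host with host_cpus CPUs."""
    return max(parse_cpuset(spec)) < host_cpus


# '0-5,12-17' can be valid on a larger source host (e.g. 24 CPUs) but not on
# a 12-CPU destination, hence "Invalid value ... for 'cpuset.cpus'".
```

The bug is that pre-fix Nova carried the source host's pinning over verbatim instead of recomputing it for the destination.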


Version-Release number of selected component (if applicable):
python-nova-2015.1.0-3.el7ost.noarch
libvirt-1.2.8-16.el7_1.2.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Set up an environment where one machine has more CPUs than the other

3. Launch an instance on the machine with more CPUs
   # nova boot --flavor m1.small --image fedora --nic net-id=8ddecf6b-7dc9-4899-961b-da7ff778f2c8 vm1

3. Verify the host of the instance
   # nova show <instance id>

4. Live migrate the instance
   # nova live-migration --block-migrate 523becab-ed4d-413e-8bda-dc1b698bd1e9

5. Verify the instance stays on the source machine
   # nova show <instance id>
6. Look for errors in /var/log/nova/nova-compute.log

Actual results:
The migration fails and the instance remains on the source host.

Expected results:
The instance is live migrated to the destination host.

Additional info:

Comment 3 Nikola Dipanov 2015-06-26 14:34:52 UTC
This is a well-known issue upstream. There is a blueprint proposed to fix it (not approved for the Liberty release at this point, but likely to be).

The fix is (as is described on https://review.openstack.org/#/c/193576/) very invasive and unlikely to be easily backportable.

We should probably add a release note for this saying that live migration is not supported for instances with CPU pinning (in addition, we might want to outright disable it).

If we decide to disable it, then it makes sense to keep this as a blocker and do it for GA; otherwise we should not block on this, add a release note, and clone the bug for the next release of RHOS, where it will get properly fixed (upstream in Liberty).

Comment 5 Jon Schlueter 2015-07-31 14:50:04 UTC
Moving out to A1 as it's not a regression and not a blocker for GA

Comment 7 Stephen Gordon 2015-11-26 17:10:50 UTC
(In reply to Nikola Dipanov from comment #3)
> This is a well-known issue upstream. There is a blueprint proposed to fix
> it (not approved for the Liberty release at this point, but likely to be).
> 
> The fix is (as is described on https://review.openstack.org/#/c/193576/)
> very invasive and unlikely to be easily backportable.
> 
> We should probably add a release note for this saying that live migration
> is not supported for instances with CPU pinning (in addition, we might
> want to outright disable it).
> 
> If we decide to disable it, then it makes sense to keep this as a blocker
> and do it for GA; otherwise we should not block on this, add a release
> note, and clone the bug for the next release of RHOS, where it will get
> properly fixed (upstream in Liberty).

Based on the above, and my understanding that this was not in fact fixed in Liberty, I am moving the flags to rhos-9.0, Mitaka. Let me know if my interpretation is incorrect...

Comment 16 Eoghan Glynn 2017-01-09 20:15:45 UTC
The patch is well-developed, but dependent on review traction to land.

Comment 17 Stephen Gordon 2017-01-18 19:47:28 UTC
*** Bug 1319385 has been marked as a duplicate of this bug. ***

Comment 18 Stephen Gordon 2017-01-29 23:27:12 UTC
Hi Sahid,

Is there any chance of this being accepted in the rc-* phase given it's treated as a bug upstream, or should I move this to Pike?

Thanks,

Steve

Comment 19 Sahid Ferdjaoui 2017-01-30 09:03:27 UTC
(In reply to Stephen Gordon from comment #18)
> Hi Sahid,
> 
> Is there any chance of this being accepted in the rc-* phase given it's
> treated as a bug upstream, or should I move this to Pike?
> 
> Thanks,
> 
> Steve

Nothing is really moving upstream. I guess you should move it to Pike.

Comment 33 Stephen Finucane 2018-02-07 16:58:16 UTC
*** Bug 1360970 has been marked as a duplicate of this bug. ***

Comment 35 Stephen Finucane 2018-03-23 12:13:55 UTC
*** Bug 1559314 has been marked as a duplicate of this bug. ***

Comment 49 Artom Lifshitz 2019-01-16 16:54:08 UTC
*** Bug 1585068 has been marked as a duplicate of this bug. ***

Comment 50 michaelor 2019-01-29 07:33:10 UTC
Any estimate of which version this issue will be resolved in?

Comment 51 Artom Lifshitz 2019-05-17 13:48:02 UTC
*** Bug 1703734 has been marked as a duplicate of this bug. ***

Comment 61 Artom Lifshitz 2019-08-30 13:00:37 UTC
Feature freeze upstream is September 12th. The series is under active review, and has a decent chance of landing before then. If it lands, it'll be in the OSP16 release, but is not backportable to previous releases.

Comment 64 Artom Lifshitz 2019-10-15 16:01:43 UTC
*** Bug 1565129 has been marked as a duplicate of this bug. ***

Comment 66 Artom Lifshitz 2019-11-26 15:46:42 UTC
I'm going to set HasTestAutomation, since we have test cases in upstream whitebox [1]. We could probably add more, but what we currently have at least tests the happy path. We also have functional tests [2] up for review that cover the Nova-specific bits (rollback, rolling upgrade, etc).

[1] https://opendev.org/x/whitebox-tempest-plugin/src/branch/master/whitebox_tempest_plugin/api/compute/test_cpu_pinning.py#L419
[2] https://review.opendev.org/#/c/672595/

Comment 67 James Parker 2019-12-04 18:08:15 UTC
Created attachment 1642181 [details]
NUMALiveMigrationTest Whitebox Tempest results

Comment 70 errata-xmlrpc 2020-02-06 14:37:21 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:0283

