Description of problem:
When live migrating an instance from a hypervisor with more CPUs than the destination, the migration fails with the following error in the Nova log:

instance: 44ab86db-e529-4a73-8a51-d3a59fdc90c5] Live Migration failure: Invalid value '0-5,12-17' for 'cpuset.cpus': Invalid argument

Version-Release number of selected component (if applicable):
python-nova-2015.1.0-3.el7ost.noarch
libvirt-1.2.8-16.el7_1.2.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Set up an environment where one machine has more CPUs than the other
2. Launch an instance on the machine with the most CPUs
   # nova boot --flavor m1.small --image fedora --nic net-id=8ddecf6b-7dc9-4899-961b-da7ff778f2c8 vm1
3. Verify the host of the instance
   # nova show <instance id>
4. Live migrate the instance
   # nova live-migration --block-migrate 523becab-ed4d-413e-8bda-dc1b698bd1e9
5. Verify the instance stays on the source machine
   # nova show <instance id>
6. Look for errors in /var/log/nova/nova-compute.log

Actual results:
Migration fails

Expected results:

Additional info:
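To illustrate why a cpuset such as '0-5,12-17' is likely rejected on a smaller host, here is a minimal Python sketch that expands the cpuset string and lists the CPU ids that would not exist on a destination. The helper names (`parse_cpuset`, `missing_on_host`) and the destination CPU count of 12 are assumptions made for illustration only; this is not Nova or libvirt code.

```python
from typing import Set


def parse_cpuset(spec: str) -> Set[int]:
    """Expand a cpuset string such as '0-5,12-17' into a set of CPU ids."""
    cpus: Set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # A range like '12-17' is inclusive on both ends.
            start, end = part.split("-", 1)
            cpus.update(range(int(start), int(end) + 1))
        else:
            cpus.add(int(part))
    return cpus


def missing_on_host(spec: str, host_cpu_count: int) -> Set[int]:
    """Return the CPU ids in `spec` that are absent on a host whose CPUs
    are numbered 0..host_cpu_count-1 (a simplifying assumption)."""
    return {cpu for cpu in parse_cpuset(spec) if cpu >= host_cpu_count}


if __name__ == "__main__":
    # Hypothetical destination with 12 CPUs; the count is illustrative only.
    print(sorted(missing_on_host("0-5,12-17", 12)))
```

If the returned set is non-empty, the source's pinning cannot be applied unchanged on that destination, which matches the 'Invalid argument' failure reported above.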
This is a well-known issue upstream. There is a blueprint proposed to fix this (not yet approved for the Liberty release, but likely to be).

The fix (as described on https://review.openstack.org/#/c/193576/) is very invasive and unlikely to be easily backportable.

We should probably add a release note saying that live migration is not supported for instances with CPU pinning (in addition, we might want to disable it outright).

If we decide to disable it, then it makes sense to keep this as a blocker and do it for GA. Otherwise we should not block on this; we should add a release note and clone the bug for the next release of RHOS, where it will get properly fixed (upstream in Liberty).
Moving out to A1 as it's not a regression and not a blocker for GA
(In reply to Nikola Dipanov from comment #3)
> This is a well know issue upstream. There is a blueprint proposed (not
> approved for Liberty release at this point, but likely to get) to fix this.
>
> The fix is (as is described on https://review.openstack.org/#/c/193576/)
> very invasive and unlikely to be easily backportable.
>
> We should probably add a release note for this saying that live migration is
> not supported for instances with CPU pinning, (in addition we might want to
> outright disable it).
>
> If we decide to disable it - then it makes sense to keep this as a blocker
> and do it for GA, otherwise we should not block on this, relnote, and clone
> the bug for the next release of RHOS where it will get properly fixed
> (upstream in Liberty).

Based on the above and my understanding that this was not in fact fixed in Liberty I am moving the flags to rhos-9.0, Mitaka. Let me know if my interpretation is incorrect...
The patch is well-developed, but dependent on review traction to land.
*** Bug 1319385 has been marked as a duplicate of this bug. ***
Hi Sahid,

Is there any chance of this being accepted in the rc-* phase given it's treated as a bug upstream, or should I move this to Pike?

Thanks,

Steve
(In reply to Stephen Gordon from comment #18)
> Hi Sahid,
>
> Is there any chance of this being accepted in the rc-* phase given it's
> treated as a bug upstream, or should I move this to Pike?
>
> Thanks,
>
> Steve

Nothing is really moving upstream. I guess you should move it to Pike.
*** Bug 1360970 has been marked as a duplicate of this bug. ***
*** Bug 1559314 has been marked as a duplicate of this bug. ***
*** Bug 1585068 has been marked as a duplicate of this bug. ***
Is there any estimate of which version this issue will be resolved in?
*** Bug 1703734 has been marked as a duplicate of this bug. ***
Feature freeze upstream is September 12th. The series is under active review, and has a decent chance of landing before then. If it lands, it'll be in the OSP16 release, but is not backportable to previous releases.
*** Bug 1565129 has been marked as a duplicate of this bug. ***
I'm going to set HasTestAutomation, since we have test cases in upstream whitebox [1]. We could probably add more, but what we currently have at least tests the happy path. We also have functional tests [2] up for review that cover the Nova-specific bits (rollback, rolling upgrade, etc).

[1] https://opendev.org/x/whitebox-tempest-plugin/src/branch/master/whitebox_tempest_plugin/api/compute/test_cpu_pinning.py#L419
[2] https://review.opendev.org/#/c/672595/
Created attachment 1642181 [details] NUMALiveMigrationTest Whitebox Tempest results
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2020:0283