Bug 1149494
Summary: | Migration between NUMA and non-NUMA hosts does not work | |
---|---|---|---
Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Artyom <alukiano>
Component: | vdsm | Assignee: | Martin Sivák <msivak>
Status: | CLOSED CURRENTRELEASE | QA Contact: | Artyom <alukiano>
Severity: | urgent | Docs Contact: |
Priority: | urgent | |
Version: | 3.5.0 | CC: | alukiano, bazulay, dfediuck, eblake, ecohen, gklein, iheim, lpeer, lsurette, mavital, mprivozn, msivak, rbalakri, rgolan, Rhev-m-bugs, sherold, yeylon
Target Milestone: | --- | Keywords: | Triaged
Target Release: | 3.5.0 | |
Hardware: | x86_64 | |
OS: | Linux | |
Whiteboard: | sla | |
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2014-11-19 09:41:19 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | SLA | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | |
This may actually be at the libvirt level, as we can see in the vdsm log:

Thread-1879::DEBUG::2014-10-05 16:16:22,576::migration::409::vm.Vm::(monitor_migration) vmId=`06a52b59-9bb3-446d-a173-66e91942121c`::starting migration monitor thread
Thread-1877::DEBUG::2014-10-05 16:16:22,843::libvirtconnection::143::root::(wrapper) Unknown libvirterror: ecode: 38 edom: 10 level: 2 message: Unable to set cpuset.mems for domain test_numa: Invalid argument
Thread-1877::DEBUG::2014-10-05 16:16:22,843::migration::375::vm.Vm::(cancel) vmId=`06a52b59-9bb3-446d-a173-66e91942121c`::canceling migration downtime thread

Can you please provide the libvirt log?

Created attachment 944873 [details]
libvirt log
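
The "Invalid argument" in the log above is the kernel rejecting a cpuset.mems write that names NUMA nodes the host does not have. As a minimal sketch of that destination-side precondition (illustrative code, not vdsm's), one can check the nodes the kernel actually exposes:

```python
# Minimal sketch (illustrative, not vdsm code): a <numatune> request can only
# be satisfied if every requested node exists on this host; otherwise the
# kernel rejects the cpuset.mems write with EINVAL ("Invalid argument").
import glob
import os

def available_numa_nodes():
    """Set of NUMA node ids the kernel exposes on this host."""
    return {int(os.path.basename(path)[len("node"):])
            for path in glob.glob("/sys/devices/system/node/node[0-9]*")}

def numatune_is_satisfiable(requested_nodes):
    """True if every node named in the domain's <numatune> exists here."""
    return set(requested_nodes) <= available_numa_nodes()

# On a non-NUMA destination only node 0 exists, so a guest tuned to
# nodes 0 and 1 fails exactly the way the log above shows.
print(numatune_is_satisfiable([0, 1]))
```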
(In reply to Artyom from comment #2)
> Created attachment 944873 [details]
> libvirt log

Eric, looking at the attached libvirt log shows:

2014-10-08 07:58:39.712+0000: 11922: error : virNetClientProgramDispatchError:174 : Unable to set cpuset.mems for domain test_numa: Invalid argument

Is this something that goes deeper into qemu, or something we're doing wrong here?

(In reply to Doron Fediuck from comment #3)
> Is this something that goes deeper into qemu, or something we're doing wrong here?

Well, the problem as I see it is that libvirt is trying to honour the NUMA settings on the destination (the provided logs are from the source, by the way), and libvirt does this via cgroups too (relying on numa_*() alone is not enough; a malicious guest could change it). Since the requested NUMA nodes are not there, the kernel gives us an error, which is then transferred to the source side, and the migration is aborted.

This is something that should be resolved in vdsm, though. Libvirt must honour all the settings requested in the domain XML; we don't want libvirt to have logic like "NUMA is not available, so I'll just ignore that". It is the management application's responsibility to request only an available configuration. Fortunately, there is an option to start a guest on a NUMA machine and later, when doing a migration (and vdsm must in fact know that the destination is a UMA host), provide a modified XML to libvirt; the modified XML is then used to start the domain on the destination. By the way, vdsm is already doing that. So I think this is a vdsm bug after all.

Martin, are we sure we need that filter? I think Gilad already solved this bug by excluding the numaTune element from the create details. Artyom, can you reproduce and verify? From your host1.log I see that the VM has the <numatune> element that the engine sends, and this is what Gilad fixed: if the VM is migratable, it means it has no NUMA specification and can therefore migrate. Martin, the patch as is will not solve the bug, since it checks whether the VM has NUMA nodes, which will always be 0 if the VM is migrating. The patch as is makes sense though, to select NUMA hosts for NUMA VMs.

I do not see any patch from Gilad referenced in this bugzilla. Removing numaTune elements from the create details when migrating is probably not a good idea: will the VM get its NUMA behaviour back when it migrates to a NUMA-aware host? Also, I tried to discuss this over email but got no response at all from Gilad, so I am surprised that he has a "solution" for it.

(In reply to Martin Sivák from comment #6)
> I do not see any patch from Gilad referenced in this bugzilla.

Have a look at Bug 1147261.

Artyom, Bug 1147261 may have solved this already. Can you verify that a VM with no pinning migrates properly now?

Ah, well, the referenced bug fixes something close, but not exactly this. It deals with VMs that do not require NUMA. The bug here could be updated to deal with NUMA VMs. We have to handle the migration logic for those as well.
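
A minimal sketch of the mechanism Michal describes above, assuming the libvirt-python bindings: strip <numatune> from the XML that is handed to libvirt as the destination XML, so a non-NUMA destination never sees the NUMA request. The function name is illustrative; this is not vdsm's actual migration code.

```python
# Illustrative sketch (not vdsm's migration code): drop <numatune> from the
# domain XML and pass the result as the destination XML, so the non-NUMA
# destination host is never asked to honour the NUMA tuning.
import xml.etree.ElementTree as ET

import libvirt

def migrate_without_numatune(dom, dest_conn):
    """Live-migrate `dom` to `dest_conn` with <numatune> removed on arrival."""
    root = ET.fromstring(dom.XMLDesc(0))
    for numatune in root.findall("numatune"):
        root.remove(numatune)
    params = {libvirt.VIR_MIGRATE_PARAM_DEST_XML:
              ET.tostring(root, encoding="unicode")}
    dom.migrate3(dest_conn,
                 params,
                 libvirt.VIR_MIGRATE_LIVE | libvirt.VIR_MIGRATE_PEER2PEER)
```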
Looks like the patch for https://bugzilla.redhat.com/show_bug.cgi?id=1147261 solves this problem too, because a VM without NUMA pinning simply has no NUMA parameters, and migration works fine. Checked on rhevm-3.5.0-0.18.beta.el6ev.noarch: migration between a NUMA host and a non-NUMA host works fine.

(In reply to Martin Sivák from comment #9)
> The bug here could be updated to deal with NUMA VMs. We have to handle the
> migration logic for those as well.

NUMA VMs are non-migratable by design at the moment; we should probably handle that when it becomes supported. Doron, Martin - unless I'm missing something, we can close this.

Closing based on comment 10. Going forward we should look at migration between RHEL 7 hosts.
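
For the record, the behaviour the thread converges on - migratable VMs carry no NUMA tuning, while pinned NUMA VMs are non-migratable by design - could be summarized engine-side roughly as follows (hypothetical names only, not the actual ovirt-engine code):

```python
# Hypothetical sketch of the engine-side rule (illustrative names only).
def build_create_details(vm):
    """Build the create details sent to vdsm for a (hypothetical) VM object."""
    details = {"vmId": vm.id, "memSize": vm.mem_size}
    # Migratable VMs carry no <numatune>; only a pinned (non-migratable)
    # NUMA VM gets the tuning, so a non-NUMA destination is never asked
    # to honour nodes it does not have.
    if vm.numa_nodes and not vm.migratable:
        details["numaTune"] = {"mode": "strict",
                               "nodeset": ",".join(map(str, vm.numa_nodes))}
    return details
```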
Created attachment 944041 [details]
logs

Description of problem:
Migration between a NUMA and a non-NUMA host does not work

Version-Release number of selected component (if applicable):
rhevm-3.5.0-0.13.beta.el6ev.noarch

How reproducible:
Always

Steps to Reproduce:
1. Have two hosts (one with NUMA support, one without)
2. Run a VM on one of the hosts (no pinning needed)
3. Migrate the VM

Actual results:
Migration fails

Expected results:
Migration succeeds

Additional info: