Bug 1149494 - Migration between NUMA and non-NUMA host does not work
Summary: Migration between NUMA and non-NUMA host does not work
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.5.0
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 3.5.0
Assignee: Martin Sivák
QA Contact: Artyom
URL:
Whiteboard: sla
Depends On:
Blocks:
 
Reported: 2014-10-05 13:21 UTC by Artyom
Modified: 2016-02-10 20:19 UTC
CC: 17 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-11-19 09:41:19 UTC
oVirt Team: SLA
Target Upstream Version:
Embargoed:


Attachments
logs (982.70 KB, application/zip) - 2014-10-05 13:21 UTC, Artyom
libvirt log (7.58 MB, text/plain) - 2014-10-08 08:19 UTC, Artyom


Links
oVirt gerrit 34699 (master, ABANDONED): Add scheduling filter that checks NUMA compatibility

Description Artyom 2014-10-05 13:21:31 UTC
Created attachment 944041 [details]
logs

Description of problem:
Migration between a NUMA and a non-NUMA host does not work

Version-Release number of selected component (if applicable):
rhevm-3.5.0-0.13.beta.el6ev.noarch

How reproducible:
Always

Steps to Reproduce:
1. Have two hosts (one with NUMA support and the second without NUMA support)
2. Run a VM on one of the hosts (no pinning needed)
3. Migrate the VM

Actual results:
Migration fails

Expected results:
Migration succeeds

Additional info:

Comment 1 Doron Fediuck 2014-10-07 15:23:49 UTC
This may actually be at the libvirt level, as we can see in the vdsm log:

Thread-1879::DEBUG::2014-10-05 16:16:22,576::migration::409::vm.Vm::(monitor_migration) vmId=`06a52b59-9bb3-446d-a173-66e91942121c`::starting migration monitor thread
Thread-1877::DEBUG::2014-10-05 16:16:22,843::libvirtconnection::143::root::(wrapper) Unknown libvirterror: ecode: 38 edom: 10 level: 2 message: Unable to set cpuset.mems for domain test_numa: Invalid argument
Thread-1877::DEBUG::2014-10-05 16:16:22,843::migration::375::vm.Vm::(cancel) vmId=`06a52b59-9bb3-446d-a173-66e91942121c`::canceling migration downtime thread

Can you please provide the libvirt log?
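
For reference, a minimal sketch (assuming python-libvirt; illustration only, not vdsm code) that lists the NUMA cells libvirt reports for a host. A non-NUMA (UMA) destination exposes only cell 0, so a <numatune> that references any other node cannot be written into cpuset.mems there and the kernel rejects it with "Invalid argument":

# Illustrative only: print the NUMA cell IDs from the libvirt host capabilities.
import libvirt
from xml.etree import ElementTree

conn = libvirt.open('qemu:///system')
caps = ElementTree.fromstring(conn.getCapabilities())
cells = [cell.get('id') for cell in caps.findall('./host/topology/cells/cell')]
print('host NUMA cells:', cells)   # a UMA host prints ['0']
conn.close()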

Comment 2 Artyom 2014-10-08 08:19:34 UTC
Created attachment 944873 [details]
libvirt log

Comment 3 Doron Fediuck 2014-10-08 08:53:21 UTC
(In reply to Artyom from comment #2)
> Created attachment 944873 [details]
> libvirt log

Eric, looking at the attached libvirt log shows:

2014-10-08 07:58:39.712+0000: 11922: error : virNetClientProgramDispatchError:174 : Unable to set cpuset.mems for domain test_numa: Invalid argument

Is this something that goes deeper into qemu or something we're doing wrong here?

Comment 4 Michal Privoznik 2014-10-15 09:46:32 UTC
(In reply to Doron Fediuck from comment #3)
> (In reply to Artyom from comment #2)
> > Created attachment 944873 [details]
> > libvirt log
> 
> Eric, looking at the attached libvirt log shows:
> 
> 2014-10-08 07:58:39.712+0000: 11922: error :
> virNetClientProgramDispatchError:174 : Unable to set cpuset.mems for domain
> test_numa: Invalid argument
> 
> Is this something that goes deeper into qemu or something we're doing wrong
> here?

Well, the problem as I see it is that libvirt is trying to honour the NUMA setting on the destination (the provided logs are from the source, btw), and libvirt does this via CGroups too (relying on numa_*() is not enough; a malicious guest could change it). And since the requested NUMA nodes are not there, the kernel gives us an error, which is then transferred to the source side and the migration is aborted. This is something which should be resolved in vdsm, though. I mean, libvirt must honour all the settings requested in the domain XML. We don't want libvirt to have any logic like 'yeah, NUMA's not available, so I'll just ignore that'. It's the management application's responsibility to request only an available configuration.

Fortunately, there's an option to start a guest on a NUMA machine and later, when doing a migration (and vdsm must in fact know that the destination is a UMA host), provide a modified XML to libvirt. The modified XML is then started on the destination. BTW, vdsm is already doing that. So I think this is a vdsm bug after all.
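
For illustration, a minimal sketch of the approach described above (hypothetical helper, not actual vdsm code; assumes python-libvirt): strip <numatune> from the migratable XML and hand the result to libvirt as the destination XML.

# Hypothetical sketch: drop <numatune> before migrating to a host that
# does not expose the requested NUMA nodes.
import libvirt
from xml.etree import ElementTree

def migrate_without_numatune(dom, dest_conn):
    xml = dom.XMLDesc(libvirt.VIR_DOMAIN_XML_MIGRATABLE)
    root = ElementTree.fromstring(xml)
    numatune = root.find('numatune')
    if numatune is not None:
        root.remove(numatune)  # the UMA destination cannot honour it
    params = {libvirt.VIR_MIGRATE_PARAM_DEST_XML:
              ElementTree.tostring(root).decode()}
    dom.migrate3(dest_conn, params, libvirt.VIR_MIGRATE_LIVE)

Whether the VM should get the NUMA tuning back when it later migrates to a NUMA-aware host is a separate question (see comment 6 below).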

Comment 5 Roy Golan 2014-11-04 14:00:56 UTC
Martin, are we sure we need that filter? I think Gilad already solved this bug by excluding the numaTune element from the create details.

Artyom, can you reproduce and verify? From your host1.log I see that the vm has the <numatune> element that the engine sends, and this is what Gilad fixed - if the VM is migratable it means it doesn't have a NUMA specification and therefore it can migrate.

Martin, the patch as is will not solve the bug, as it checks whether the vm has NUMA nodes, which will always be 0 if the VM is migrating.

The patch as is makes sense though, to select NUMA hosts for NUMA VMs.
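
For illustration, a rough sketch of the kind of compatibility check such a filter performs (hypothetical names; the actual abandoned patch is on oVirt gerrit 34699): a VM reporting zero virtual NUMA nodes passes every host, which is why the filter alone does not fix this migration failure, while NUMA VMs are restricted to hosts with enough cells.

# Hypothetical filter predicate; names are illustrative, not the real patch.
def numa_compatible_hosts(vm_numa_node_count, hosts):
    if vm_numa_node_count == 0:
        # No NUMA specification (e.g. a migrating VM): any host is acceptable.
        return list(hosts)
    # Otherwise keep only hosts exposing at least as many NUMA cells.
    return [h for h in hosts if h.numa_cell_count >= vm_numa_node_count]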

Comment 6 Martin Sivák 2014-11-04 14:09:50 UTC
I do not see any patch from Gilad referenced in this bugzilla.

Removing numaTune elements from the create details when migrating is probably not a good idea. Will the VM get the NUMA behaviour back when it migrates to a NUMA-aware host?

Also, I tried to discuss this over email, but I got no response at all from Gilad. So I am surprised that he has a "solution" for it.

Comment 7 Roy Golan 2014-11-05 07:19:50 UTC
(In reply to Martin Sivák from comment #6)
> I do not see any Gilad's patch referenced in this bugzilla.
> 
> Removing numaTune elements from create details when migrating is probably
> not a good idea. Will the VM get the NUMA behaviour back when it migrates to
> NUMA aware host?
> 
> Also I tried to discuss this over email, but I got no response at all from
> Gilad. So I am surprised that he has a "solution" for it..

Have a look at Bug 1147261.

Comment 8 Roy Golan 2014-11-05 07:21:33 UTC
Artyom, Bug 1147261 may have solved this already. Can you verify that a VM with no pinning is migrating properly now?

Comment 9 Martin Sivák 2014-11-05 10:59:15 UTC
Ah, well, the referenced bug fixes something close, but not exactly this. It deals with VMs that do not require NUMA. The bug here could be updated to deal with NUMA VMs; we have to handle the migration logic for those as well.

Comment 10 Artyom 2014-11-06 13:40:31 UTC
It looks like the patch for bug https://bugzilla.redhat.com/show_bug.cgi?id=1147261 also solves this problem, because a VM without NUMA pinning simply has no NUMA parameters and migration works fine. Checked on rhevm-3.5.0-0.18.beta.el6ev.noarch: migration between a NUMA host and a non-NUMA host works fine.

Comment 11 Roy Golan 2014-11-09 07:41:11 UTC
(In reply to Martin Sivák from comment #9)
> Ah, well the referenced bug fixes something close, but not exactly it. It
> deals with VMs that do not require NUMA. The bug here could be updated to
> deal with NUMA VMs. We have to handle the migration logic for those as well.

NUMA VMs are non-migratable by design at the moment. We probably should handle that when it is supported.

Doron, Martin - unless I'm missing something, we can close this.

Comment 12 Doron Fediuck 2014-11-19 09:41:19 UTC
Closing based on comment 10.
Going forward we should look at migration between RHEL 7 hosts.

