Bug 1149494

Summary: Migration between NUMA and non-NUMA host does not work
Product: Red Hat Enterprise Virtualization Manager
Component: vdsm
Version: 3.5.0
Hardware: x86_64
OS: Linux
Status: CLOSED CURRENTRELEASE
Severity: urgent
Priority: urgent
Target Milestone: ---
Target Release: 3.5.0
Reporter: Artyom <alukiano>
Assignee: Martin Sivák <msivak>
QA Contact: Artyom <alukiano>
CC: alukiano, bazulay, dfediuck, eblake, ecohen, gklein, iheim, lpeer, lsurette, mavital, mprivozn, msivak, rbalakri, rgolan, Rhev-m-bugs, sherold, yeylon
Keywords: Triaged
Whiteboard: sla
oVirt Team: SLA
Doc Type: Bug Fix
Type: Bug
Last Closed: 2014-11-19 09:41:19 UTC
Attachments:
  logs (flags: none)
  libvirt log (flags: none)

Description Artyom 2014-10-05 13:21:31 UTC
Created attachment 944041 [details]
logs

Description of problem:
Migration between a NUMA host and a non-NUMA host does not work.

Version-Release number of selected component (if applicable):
rhevm-3.5.0-0.13.beta.el6ev.noarch

How reproducible:
Always

Steps to Reproduce:
1. Have two hosts (one with NUMA support and one without)
2. Run a VM on one of the hosts (no pinning needed)
3. Migrate the VM

Actual results:
Migration failed

Expected results:
Migration succeeds

Additional info:

Comment 1 Doron Fediuck 2014-10-07 15:23:49 UTC
This may actually be at the libvirt level, as we can see in the vdsm log:

Thread-1879::DEBUG::2014-10-05 16:16:22,576::migration::409::vm.Vm::(monitor_migration) vmId=`06a52b59-9bb3-446d-a173-66e91942121c`::starting migration monitor thread
Thread-1877::DEBUG::2014-10-05 16:16:22,843::libvirtconnection::143::root::(wrapper) Unknown libvirterror: ecode: 38 edom: 10 level: 2 message: Unable to set cpuset.mems for domain test_numa: Invalid argument
Thread-1877::DEBUG::2014-10-05 16:16:22,843::migration::375::vm.Vm::(cancel) vmId=`06a52b59-9bb3-446d-a173-66e91942121c`::canceling migration downtime thread
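
For context, the "Invalid argument" here is the kernel rejecting a cpuset.mems value that names a memory node the destination host does not have. A minimal sketch of that failure mode, assuming cgroup v1 cpuset and a hypothetical cgroup path (the real path is chosen by libvirt):

import errno

# Hypothetical path; in reality libvirt manages the domain's cgroup.
CPUSET_MEMS = "/sys/fs/cgroup/cpuset/machine/test_numa/cpuset.mems"

try:
    with open(CPUSET_MEMS, "w") as f:
        # Asking for memory node 1 on a single-node (non-NUMA) host:
        # the kernel refuses with EINVAL, which libvirt surfaces as
        # "Unable to set cpuset.mems ...: Invalid argument".
        f.write("0-1")
except OSError as err:
    if err.errno == errno.EINVAL:
        print("Invalid argument: requested NUMA node not present on this host")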

Can you please provide the libvirt log?

Comment 2 Artyom 2014-10-08 08:19:34 UTC
Created attachment 944873 [details]
libvirt log

Comment 3 Doron Fediuck 2014-10-08 08:53:21 UTC
(In reply to Artyom from comment #2)
> Created attachment 944873 [details]
> libvirt log

Eric, looking at the attached libvirt log shows:

2014-10-08 07:58:39.712+0000: 11922: error : virNetClientProgramDispatchError:174 : Unable to set cpuset.mems for domain test_numa: Invalid argument

Is this something that goes deeper into qemu or something we're doing wrong here?

Comment 4 Michal Privoznik 2014-10-15 09:46:32 UTC
(In reply to Doron Fediuck from comment #3)
> (In reply to Artyom from comment #2)
> > Created attachment 944873 [details]
> > libvirt log
> 
> Eric, looking at the attached libvirt log shows:
> 
> 2014-10-08 07:58:39.712+0000: 11922: error :
> virNetClientProgramDispatchError:174 : Unable to set cpuset.mems for domain
> test_numa: Invalid argument
> 
> Is this something that goes deeper into qemu or something we're doing wrong
> here?

Well, the problem as I see it is that libvirt is trying to honour the NUMA setting on the destination (the provided logs are from the source, btw), and libvirt does this via CGroups too (relying on numa_*() alone is not enough, a malicious guest could change it). And since the requested NUMA nodes are not there, the kernel gives us an error, which is then transferred to the source side and the migration is aborted. This is something which should be resolved in vdsm though. I mean, libvirt must honour all the settings requested in the domain XML. We don't want libvirt to have any logic like 'yeah, NUMA's not available, so I'll just ignore that'. It's the management application's responsibility to request only an available configuration.

Fortunately, there's an option to start a guest on a NUMA machine and later, when doing a migration (and vdsm must in fact know that the destination is a non-NUMA host), provide a modified XML to libvirt. The modified XML is then used to start the guest on the destination. BTW vdsm is already doing that. So I think this is a vdsm bug after all.
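
A minimal sketch of that approach (illustrative only, not the actual vdsm code; it assumes the libvirt-python bindings and their VIR_MIGRATE_PARAM_DEST_XML migration parameter): strip the <numatune> element from the migratable XML and pass the result as the destination XML.

import libvirt
import xml.etree.ElementTree as ET

def strip_numatune(domain_xml):
    # Remove the <numatune> element so the destination is not asked to
    # apply NUMA memory placement it cannot satisfy.
    root = ET.fromstring(domain_xml)
    for node in root.findall("numatune"):
        root.remove(node)
    return ET.tostring(root, encoding="unicode")

def migrate_to_non_numa_host(dom, dest_uri):
    dest_xml = strip_numatune(dom.XMLDesc(libvirt.VIR_DOMAIN_XML_MIGRATABLE))
    params = {libvirt.VIR_MIGRATE_PARAM_DEST_XML: dest_xml}
    flags = libvirt.VIR_MIGRATE_LIVE | libvirt.VIR_MIGRATE_PEER2PEER
    dom.migrateToURI3(dest_uri, params, flags)

The point is that the modified XML only affects what gets started on the destination; nothing changes on the source side.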

Comment 5 Roy Golan 2014-11-04 14:00:56 UTC
Martin, are we sure we need that filter? I think Gilad already solved this
bug by excluding the numaTune element from the create details.

Artyom, can you reproduce and verify? From your host1.log I see that the vm has a <numatune> element that the engine
sends, and this is what Gilad fixed - if the VM is migratable it means it doesn't have a numa specification and therefore it can migrate.

Martin, the patch as is will not solve the bug, as it checks whether the vm has numa nodes, which will always be 0 if the VM is migrating.

The patch as is makes sense though, to select NUMA hosts for NUMA VMs.
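
For reference, the behaviour described above (only attach NUMA parameters when the VM is pinned, i.e. not migratable) boils down to a conditional like the following. This is an illustrative sketch with hypothetical field and method names, not the actual engine/vdsm fix:

def build_create_details(vm, guest_numa_nodes, numa_tune):
    # Hypothetical helper: a migratable VM carries no NUMA specification,
    # so NUMA-related parameters are simply left out of the create details.
    details = {"vmId": vm.id, "memSize": vm.mem_size}
    if not vm.is_migratable():
        details["guestNumaNodes"] = guest_numa_nodes
        details["numaTune"] = numa_tune
    return details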

Comment 6 Martin Sivák 2014-11-04 14:09:50 UTC
I do not see any patch from Gilad referenced in this bugzilla.

Removing numaTune elements from the create details when migrating is probably not a good idea. Will the VM get the NUMA behaviour back when it migrates to a NUMA-aware host?

Also, I tried to discuss this over email, but I got no response at all from Gilad, so I am surprised that he has a "solution" for it.

Comment 7 Roy Golan 2014-11-05 07:19:50 UTC
(In reply to Martin Sivák from comment #6)
> I do not see any Gilad's patch referenced in this bugzilla.
> 
> Removing numaTune elements from create details when migrating is probably
> not a good idea. Will the VM get the NUMA behaviour back when it migrates to
> NUMA aware host?
> 
> Also I tried to discuss this over email, but I got no response at all from
> Gilad. So I am surprised that he has a "solution" for it..

Have a look at Bug 1147261.

Comment 8 Roy Golan 2014-11-05 07:21:33 UTC
Artyom, Bug 1147261 may have solved this already. Can you verify that a VM with no pinning migrates properly now?

Comment 9 Martin Sivák 2014-11-05 10:59:15 UTC
Ah, well, the referenced bug fixes something close, but not exactly this one. It deals with VMs that do not require NUMA. The bug here could be updated to cover NUMA VMs; we have to handle the migration logic for those as well.

Comment 10 Artyom 2014-11-06 13:40:31 UTC
Looks like the patch for bug https://bugzilla.redhat.com/show_bug.cgi?id=1147261 also solves this problem, because a VM without NUMA pinning simply has no NUMA parameters, and migration works fine. Checked on rhevm-3.5.0-0.18.beta.el6ev.noarch; migration between a NUMA host and a non-NUMA host works fine.

Comment 11 Roy Golan 2014-11-09 07:41:11 UTC
(In reply to Martin Sivák from comment #9)
> Ah, well the referenced bug fixes something close, but not exactly it. It
> deals with VMs that do not require NUMA. The bug here could be updated to
> deal with NUMA VMs. We have to handle the migration logic for those as well.

NUMA VMs are non-migratable by design at the moment. We should probably
handle that when it is supported.

Doron, Martin - unless I'm missing something, we can close this.

Comment 12 Doron Fediuck 2014-11-19 09:41:19 UTC
Closing based on comment 10.
Going forward we should look at migration between RHEL 7 hosts.