Bug 1147644 - [RHEL7,SElinux] Vm migration fails in case of mixed version cluster (rhel7,rhel6.5) due to security reasons
Summary: [RHEL7,SElinux] Vm migration fails in case of mixed version cluster (rhel7,rh...
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.5.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Francesco Romani
QA Contact:
URL:
Whiteboard: virt
Depends On:
Blocks:
 
Reported: 2014-09-29 16:44 UTC by Ori Gofen
Modified: 2016-05-26 01:51 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-02-03 08:18:15 UTC
oVirt Team: Virt
Target Upstream Version:
Embargoed:


Attachments
vdsm+engine logs + image (1.17 MB, application/x-bzip)
2014-09-29 16:44 UTC, Ori Gofen
vdsm+engine+qemu+messages logs (872.21 KB, application/x-bzip)
2014-10-01 08:48 UTC, Ori Gofen
vdsm+engine+messages logs (1.70 MB, application/x-bzip)
2014-10-07 07:08 UTC, Ori Gofen
vdsm getVdsCaps output (59.41 KB, text/plain)
2016-02-01 15:05 UTC, Francesco Romani

Description Ori Gofen 2014-09-29 16:44:07 UTC
Created attachment 942426 [details]
vdsm+engine logs + image

Description of problem:

This is probably another SELinux compatibility issue. A cluster made of 2 hosts with different RHEL versions (one is RHEL 7) does not support VM migration.

from engine log:
 
libvirtError: operation failed: Failed to connect to remote libvirt URI qemu+tls://10.35.116.2/system: authentication failed: Failed to verify peer's certificate

Version-Release number of selected component (if applicable):
vt4

How reproducible:
100%

Steps to Reproduce:
1. Migrate a VM from a RHEL 7 host to a RHEL 6.5 host

Actual results:
Operation fails due to security reasons

Expected results:
operation should be supported

Additional info:
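A minimal way to check the TLS trust between the hosts independently of the migration flow, assuming the libvirt Python bindings are available on the source host, is to open the destination URI directly (a diagnostic sketch; the URI is taken from the error above):

import libvirt

# Destination host from the error message above; opening the URI directly exercises
# the same TLS handshake and certificate verification that the migration path uses.
DEST_URI = "qemu+tls://10.35.116.2/system"

try:
    conn = libvirt.openReadOnly(DEST_URI)
    print("TLS connection OK, remote libvirt version:", conn.getLibVersion())
    conn.close()
except libvirt.libvirtError as e:
    # "Failed to verify peer's certificate" here points at the CA/server/client
    # certificates deployed on the hosts rather than at the VM being migrated.
    print("TLS connection failed:", e.get_error_message())

If this fails in the same way, the problem is in the certificate deployment rather than in the migration itself.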

Comment 1 Omer Frenkel 2014-09-30 07:50:41 UTC
What is the SELinux policy on both hosts?
Please attach the libvirt log from both hosts.

Comment 2 Ori Gofen 2014-09-30 08:57:48 UTC
The SELinux policy was set to enforcing on both hosts.

Comment 3 Omer Frenkel 2014-09-30 15:29:09 UTC
Still waiting for the libvirt logs.

Comment 4 Francesco Romani 2014-09-30 15:44:12 UTC
(In reply to Ori from comment #0)
> Created attachment 942426 [details]
> vdsm+engine logs + image
> 
> Description of problem:
> 
> This is probably another SElinux compatibility issue,A cluser made of 2
> hosts with different rhel versions(one is rhel7) does not support VM
> migration.
> 
> from engine log:
>  
> libvirtError: operation failed: Failed to connect to remote libvirt URI
> qemu+tls://10.35.116.2/system: authentication failed: Failed to verify
> peer's certificate

Quite likely this means that the host configuration has become invalid.
Has one of the hosts been renamed?
A reinstall through the engine could help.

Anyway, so far this looks like a configuration issue more than a virt bug.

Comment 5 Francesco Romani 2014-10-01 08:44:24 UTC
(In reply to Ori from comment #0)
> Created attachment 942426 [details]
> vdsm+engine logs + image
> 
> Description of problem:
> 
> This is probably another SElinux compatibility issue,A cluser made of 2
> hosts with different rhel versions(one is rhel7) does not support VM
> migration.
> 
> from engine log:
>  
> libvirtError: operation failed: Failed to connect to remote libvirt URI
> qemu+tls://10.35.116.2/system: authentication failed: Failed to verify
> peer's certificate
> 
> Version-Release number of selected component (if applicable):
> vt4
> 
> How reproducible:
> 100%
> 
> Steps to Reproduce:
> 1.migrate VM from host with rhel7 version to 6.5
> 
> Actual results:
> Operation fails due to security reasons
> 
> Expected results:
> operation should be supported
> 
> Additional info:

Just tried with the latest oVirt 3.5.0 RC, which is good enough because I'm not aware of any relevant difference in that area. I added a new RHEL 7 host to an existing 6.5 cluster and successfully did a 7->6.6 migration. Does this really reproduce every time you migrate?

Comment 6 Ori Gofen 2014-10-01 08:48:39 UTC
Created attachment 942969 [details]
vdsm+engine+qemu+messages logs

(In reply to Omer Frenkel from comment #3)
> still waiting for libvirt logs

The libvirtd.log files are missing from the system due to BZ #1141763 (which, for some reason, is closed as NOTABUG; I left a comment there).

Attaching all other relevant logs.

Reproduced on vt4: 11:10:55.

Comment 7 Francesco Romani 2014-10-01 09:00:32 UTC
(In reply to Ori from comment #6)

> reproduced on vt4: 11:10:55.

Indeed, migration still fails, but with a new and different error:

Oct  1 11:10:57 camel-vdsc journal: Group record for user '107' was not found: No such file or directory
Oct  1 11:10:57 camel-vdsc journal: Child quit during startup handshake: Input/output error
Oct  1 11:10:57 camel-vdsc journal: internal error: Process exited prior to exec: libvirt:  error : internal error: NUMA node 1 is out of range

Oct  1 11:10:57 camel-vdsc journal: vdsm vm.Vm ERROR vmId=`de46ea45-e62c-4144-8a66-b4ad435aee00`::Failed to start a migration destination vm
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/vm.py", line 2293, in _startUnderlyingVm
    self._completeIncomingMigration()
  File "/usr/share/vdsm/virt/vm.py", line 4000, in _completeIncomingMigration
    self._incomingMigrationFinished.isSet(), usedTimeout)
  File "/usr/share/vdsm/virt/vm.py", line 4053, in _attachLibvirtDomainAfterMigration
    raise MigrationError(e.get_error_message())
MigrationError: Domain not found: no domain with matching uuid 'de46ea45-e62c-4144-8a66-b4ad435aee00'

Comment 8 Ori Gofen 2014-10-01 09:29:01 UTC
Wow, Francesco Romani, you are right; that's so weird.

Comment 9 Francesco Romani 2014-10-03 09:29:28 UTC
(In reply to Francesco Romani from comment #7)
> (In reply to Ori from comment #6)
> 
> > reproduced on vt4: 11:10:55.
> 
> Indeed migration still fails but with new and different error:
> 
> Oct  1 11:10:57 camel-vdsc journal: Group record for user '107' was not
> found: No such file or directory
> Oct  1 11:10:57 camel-vdsc journal: Child quit during startup handshake:
> Input/output error
> Oct  1 11:10:57 camel-vdsc journal: internal error: Process exited prior to
> exec: libvirt:  error : internal error: NUMA node 1 is out of range
> 
> Oct  1 11:10:57 camel-vdsc journal: vdsm vm.Vm ERROR
> vmId=`de46ea45-e62c-4144-8a66-b4ad435aee00`::Failed to start a migration
> destination vm
> Traceback (most recent call last):
>   File "/usr/share/vdsm/virt/vm.py", line 2293, in _startUnderlyingVm
>     self._completeIncomingMigration()
>   File "/usr/share/vdsm/virt/vm.py", line 4000, in _completeIncomingMigration
>     self._incomingMigrationFinished.isSet(), usedTimeout)
>   File "/usr/share/vdsm/virt/vm.py", line 4053, in
> _attachLibvirtDomainAfterMigration
>     raise MigrationError(e.get_error_message())
> MigrationError: Domain not found: no domain with matching uuid
> 'de46ea45-e62c-4144-8a66-b4ad435aee00'

For this error, please check the engine and/or the engine configuration.
Here VDSM is acting just as a dumb middleman, but libvirt doesn't like what it is given.

The VM definition received by VDSM doesn't look right. I see:

               u'numaTune': {u'mode': u'interleave', u'nodeset': u'0,1'},
               u'smp': u'1',
               u'smpCoresPerSocket': u'1',

Omer, I believe this is validated on the Engine side. Am I wrong?
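For reference, a rough sketch of how that dict maps onto the libvirt domain XML (illustrative only, assuming the standard <numatune> schema; this is not the exact vdsm code):

# Illustrative mapping, not vdsm's actual implementation.
numa_tune = {u'mode': u'interleave', u'nodeset': u'0,1'}

# Roughly equivalent domain XML fragment:
#   <numatune>
#     <memory mode="interleave" nodeset="0,1"/>
#   </numatune>
numatune_xml = '<numatune><memory mode="%s" nodeset="%s"/></numatune>' % (
    numa_tune[u'mode'], numa_tune[u'nodeset'])
print(numatune_xml)

# libvirt validates the nodeset against the NUMA topology of the host starting the
# domain; a destination exposing only node 0 cannot satisfy nodeset "0,1", which is
# consistent with the "NUMA node 1 is out of range" error above.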

Comment 10 Francesco Romani 2014-10-03 09:32:58 UTC
(In reply to Ori from comment #8)
> wow,Francesco Romani, you are right,that's so weird

Please confirm, once the NUMA error is sorted out, whether this really reproduces 100% of the time on RHEL7->RHEL6.5 migrations.

Please try to reinstall the hosts to make sure the SSL certificates are deployed correctly, because so far it looks like just an SSL configuration issue.

Comment 11 Michal Skrivanek 2014-10-03 13:43:13 UTC
Gilad should be able to answer comment #9 about NUMA

Comment 12 Ori Gofen 2014-10-05 10:12:11 UTC
(In reply to Francesco Romani from comment #10)
> (In reply to Ori from comment #8)
> > wow,Francesco Romani, you are right,that's so weird
> 
> Please confirm, once the NUMA error is sorted, if this really reproduces
> 100% of times on RHEL7->RHEL6.5 migrations.
> 
> Please try to reinstall hosts to make sure SSL certs are deployed ok,
> because so far it looks just a SSL configuration issue.

The confirmation is in progress, though it will take a while.
The NUMA error is not what caused this bug to appear in the first place, but it seems I can't reproduce the first behaviour. Once I'm done re-provisioning I'll post my results.

Comment 13 Francesco Romani 2014-10-06 14:41:41 UTC
(In reply to Ori from comment #12)
> (In reply to Francesco Romani from comment #10)
> > (In reply to Ori from comment #8)
> > > wow,Francesco Romani, you are right,that's so weird
> > 
> > Please confirm, once the NUMA error is sorted, if this really reproduces
> > 100% of times on RHEL7->RHEL6.5 migrations.
> > 
> > Please try to reinstall hosts to make sure SSL certs are deployed ok,
> > because so far it looks just a SSL configuration issue.
> 
> The confirmation is in progress,though it will take a while.
> The Numa Error is not what caused this bug to appear at the first place,but
> it seems like I can't reproduce the first behaviour. once I'm done
> re-provisioning I'll post my results.

Yep.
My hypothesis is that both misbehaviours were ephemeral:
I'm not convinced that this really reproduces with each 7.0->6.5 migration.
Unfortunately, in both cases there is not enough data in the logs to understand why they happened.

Comment 14 Ori Gofen 2014-10-07 07:08:06 UTC
Created attachment 944451 [details]
vdsm+engine+messages logs

OK, in my opinion we have two issues here: one is a failure to migrate a VM on a mixed NUMA platform (one host has NUMA support while the other does not), and the other is that, when both hosts have CPUs that support the NUMA feature, migration fails due to security reasons.

from gold vdsc:

ile "/usr/lib64/python2.7/site-packages/libvirt.py", line 1264, in migrateToURI2
    if ret == -1: raise libvirtError ('virDomainMigrateToURI2() failed', dom=self)
libvirtError: operation failed: Failed to connect to remote libvirt URI qemu+tls://10.35.116.2/system: authentication failed: Failed to verify peer's certifica
te

By the way, please let me know if your requirements for additional logs have not been met this time.

Comment 15 Michal Skrivanek 2014-10-07 09:00:37 UTC
Gilad, is this the issue fixed by bug 1147261?

Ori, please be careful about clearing needinfo on other people when you reply.

Comment 16 Francesco Romani 2014-10-07 15:55:44 UTC
(In reply to Francesco Romani from comment #13)

> My hypotesis is both misbehaviours were ephemerals:
> I'm not convinced that this really reproduces with each 7.0->6.5 migration.
> Unfortunately, in both cases there are not enough data in the logs to
> understand why they happened.

I was right, somehow.
It turns out that it is all about randomness (and that I must do deeper verification in my future reproductions), because backward migration from 7.0 to 6.5 is just not supported:

https://bugzilla.redhat.com/show_bug.cgi?id=1150191

Engine should at least warn about that or, better, inhibit it.

The only pending issue is the NUMA misbehaviour.

Comment 17 Gilad Chaplik 2014-10-07 16:12:02 UTC
Michal, should be resolved.

Comment 18 Francesco Romani 2014-10-08 07:09:01 UTC
So, to wrap it up:

* Backward migration RHEL 7.0 -> RHEL 6.5 (https://bugzilla.redhat.com/show_bug.cgi?id=1147644#c0) is unsupported, so anything could happen: behaviour may change across builds or break without notice.
Engine should not allow that for safety reasons: filed https://bugzilla.redhat.com/show_bug.cgi?id=1150191

* NUMA issue found trying to reproduce is known and should be fixed.

So nothing left here. I'm going to close, please reopen if new evidence shows up.

Comment 19 Michal Skrivanek 2014-10-22 16:41:52 UTC
It's not Deferred, it's not a bug. RHEL 7 to RHEL 6 migration doesn't work, and it's not supposed to. That you're allowed to try it until now is a different problem.

Comment 20 Ori Gofen 2014-10-22 17:03:54 UTC
I am hearing this for the first time now. Was it ever documented?

Comment 21 Michal Skrivanek 2014-10-31 11:04:57 UTC
Well, it is going to be documented in bug 1150191.

Comment 22 Nicolas Ecarnot 2016-02-01 10:14:27 UTC
(In reply to Francesco Romani from comment #18)
> * NUMA issue found trying to reproduce is known and should be fixed.
> 
> So nothing left here. I'm going to close, please reopen if new evidence
> shows up.

Hello,

I upgraded a 3-host DC from 3.5.x to 3.6.1 and then 3.6.2, and found out that migrations no longer work.

All 3 hosts are CentOS 7.2 with 3.6.2 oVirt, and the engine is another dedicated CentOS 6.7 host.

The first host is an old Dell Poweredge R610 (Nehalem), and the 2 other hosts are newer Dell Poweredge R430 (Haswell).

Migration between the new R430s is OK and fast.
Migration from any R430 towards the old R610 fails with the NUMA error:

Thread-6341::ERROR::2016-02-01 10:39:25,364::migration::310::virt.vm::(run) vmId=`49bfc23b-79a6-4f6c-840c-95ea04562513`::Failed to migrate
Traceback (most recent call last):
  File "/usr/share/vdsm/virt/migration.py", line 294, in run
    self._startUnderlyingMigration(time.time())
  File "/usr/share/vdsm/virt/migration.py", line 364, in _startUnderlyingMigration
    self._perform_migration(duri, muri)
  File "/usr/share/vdsm/virt/migration.py", line 403, in _perform_migration
    self._vm._dom.migrateToURI3(duri, params, flags)
  File "/usr/share/vdsm/virt/virdomain.py", line 68, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 124, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1836, in migrateToURI3
    if ret == -1: raise libvirtError ('virDomainMigrateToURI3() failed', dom=self)
libvirtError: unsupported configuration: NUMA node 1 is unavailable

So it seems this bug has to be reopened.

Comment 23 Francesco Romani 2016-02-01 10:26:49 UTC
(In reply to Nicolas Ecarnot from comment #22)
> (In reply to Francesco Romani from comment #18)
> > * NUMA issue found trying to reproduce is known and should be fixed.
> > 
> > So nothing left here. I'm going to close, please reopen if new evidence
> > shows up.
> 
> Hello,
> 
> I upgraded a 3-hosts DC from 3.5.x to 3.6.1 then 3.6.2, and found out that
> migrations did not work anymore.
> 
> All 3 hosts are CentOS 7.2 with 3.6.2 oVirt, and the engine is another
> dedicated CentOS 6.7 host.
> 
> The first host is an old Dell Poweredge R610 (Nehalem), and the 2 other
> hosts are newer Dell Poweredge R430 (Haswell).
> 
> Migration between new R430 is OK and fast.
> Migration from any R430 towards the old R610 is failing with the NUMA error :
> 
> Thread-6341::ERROR::2016-02-01 10:39:25,364::migration::310::virt.vm::(run)
> vmId=`49bfc23b-79a6-4f6c-840c-95ea04562513`::Failed to migrate
> Traceback (most recent call last):
>   File "/usr/share/vdsm/virt/migration.py", line 294, in run
>     self._startUnderlyingMigration(time.time())
>   File "/usr/share/vdsm/virt/migration.py", line 364, in
> _startUnderlyingMigration
>     self._perform_migration(duri, muri)
>   File "/usr/share/vdsm/virt/migration.py", line 403, in _perform_migration
>     self._vm._dom.migrateToURI3(duri, params, flags)
>   File "/usr/share/vdsm/virt/virdomain.py", line 68, in f
>     ret = attr(*args, **kwargs)
>   File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line
> 124, in wrapper
>     ret = f(*args, **kwargs)
>   File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1836, in
> migrateToURI3
>     if ret == -1: raise libvirtError ('virDomainMigrateToURI3() failed',
> dom=self)
> libvirtError: unsupported configuration: NUMA node 1 is unavailable
> 
> So it seems this bug has to be reopened.

I agree this needs to be investigated. I'm not sure this bug is the best place to do this investigation. Does it happen when migrating from 3.5.x to 3.6.x?
Please share the Vdsm logs from both the source and the destination side during the migrations.
The Vdsm logs should also include the creation of the VM, so I can inspect its configuration.

Comment 24 Nicolas Ecarnot 2016-02-01 10:47:49 UTC
(In reply to Francesco Romani from comment #23)
> I agree this needs to be investigated. I'm not sure this bug is the best
> place to do this investigation. Does it happen when migrating from 3.5.x to
> 3.6.x?

If your question is: does it crash when migrating from a 3.5.x cluster to a 3.6.x cluster? Then the answer is: I'm not doing that; I have one and only one cluster, with 3.6.2 hosts.

If your question is: did it work when this DC was on 3.5.x, and is it now crashing when migrating, once everything has been migrated to 3.6.2? The answer is yes.

> Please share the Vdsm logs on both source and dest side during the
> migrations.

I already did on the source side.
I could on the target side, but I did not find anything obvious.

> The Vdsm logs should also include the creation of the Vm, so I can inspect
> its configuration.

The VM was created too long ago to be able to retrieve the logs.

Do you advise me to create a new VM and test again?

Comment 25 Francesco Romani 2016-02-01 11:33:54 UTC
(In reply to Nicolas Ecarnot from comment #24)
> (In reply to Francesco Romani from comment #23)
> > I agree this needs to be investigated. I'm not sure this bug is the best
> > place to do this investigation. Does it happen when migrating from 3.5.x to
> > 3.6.x?
> 
> If your question is : does it crash when migrating from a 3.5.x cluster to a
> 3.6.x cluster? Then answer is : I'm not doing that, I have one and only
> cluster in 3.6.2 hosts.
> 
> If your question is : did it work when this DC was in 3.5.x, and is it now
> crashing when migrating, once everything has been migrating to 3.6.2, the
> answer is yes.

Yep, this is what I meant.

> > Please share the Vdsm logs on both source and dest side during the
> > migrations.
> 
> I already did on the source side.
> I could on the target side, but I did not find anything obvious.

I will start from the source side. Please also provide the destination side; worst case, it will be useful for cross-checking.
 
> > The Vdsm logs should also include the creation of the Vm, so I can inspect
> > its configuration.
> 
> The VM has been creating too long ago to be able to retrieve the logs.

No big deal, let's have at least the domain XML. Instructions:

1. On the host running the VM, please run
virsh -r list
2. Identify the VM in the list obtained in item #1
3. With the retrieved libvirt domain ID, run
virsh -r dumpxml $DOMAIN_ID
4. Please provide the dumped XML
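If more convenient, the same information can be obtained with the libvirt Python bindings (a sketch equivalent to the virsh steps above; "my-vm" is a placeholder name):

import libvirt

# Read-only connection, like `virsh -r`.
conn = libvirt.openReadOnly("qemu:///system")

# Steps 1-2: list the domains and identify the VM.
for dom in conn.listAllDomains():
    print(dom.ID(), dom.name(), dom.UUIDString())

# Steps 3-4: dump the domain XML (replace "my-vm" with the name found above).
dom = conn.lookupByName("my-vm")
print(dom.XMLDesc(0))

conn.close()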

> Do you advice me to try to create a new VM and tests again?

That would help, yes.

Comment 26 Francesco Romani 2016-02-01 12:04:06 UTC
Reopening to investigate NUMA issues in the upgrade path.

Comment 28 Francesco Romani 2016-02-01 12:10:55 UTC
(In reply to Francesco Romani from comment #25)
> (In reply to Nicolas Ecarnot from comment #24)
> > (In reply to Francesco Romani from comment #23)
> > > I agree this needs to be investigated. I'm not sure this bug is the best
> > > place to do this investigation. Does it happen when migrating from 3.5.x to
> > > 3.6.x?
> > 
> > If your question is : does it crash when migrating from a 3.5.x cluster to a
> > 3.6.x cluster? Then answer is : I'm not doing that, I have one and only
> > cluster in 3.6.2 hosts.
> > 
> > If your question is : did it work when this DC was in 3.5.x, and is it now
> > crashing when migrating, once everything has been migrating to 3.6.2, the
> > answer is yes.
> 
> Yep, this is what I meant.
> 
> > > Please share the Vdsm logs on both source and dest side during the
> > > migrations.
> > 
> > I already did on the source side.
> > I could on the target side, but I did not find anything obvious.
> 
> Will start from the source side. Please also do in the dest side, worst case
> it will be useful to crosscheck.

I cannot find a new attachment. The snippet provided in https://bugzilla.redhat.com/show_bug.cgi?id=1147644#c22 is of course relevant, but I need more context to perform a meaningful investigation. Please attach the full log.

Comment 29 Francesco Romani 2016-02-01 12:52:07 UTC
(In reply to Nicolas Ecarnot from comment #22)
> (In reply to Francesco Romani from comment #18)
> > * NUMA issue found trying to reproduce is known and should be fixed.
> > 
> > So nothing left here. I'm going to close, please reopen if new evidence
> > shows up.
> 
> Hello,
> 
> I upgraded a 3-hosts DC from 3.5.x to 3.6.1 then 3.6.2, and found out that
> migrations did not work anymore.
> 
> All 3 hosts are CentOS 7.2 with 3.6.2 oVirt, and the engine is another
> dedicated CentOS 6.7 host.
> 
> The first host is an old Dell Poweredge R610 (Nehalem), and the 2 other
> hosts are newer Dell Poweredge R430 (Haswell).
> 
> Migration between new R430 is OK and fast.
> Migration from any R430 towards the old R610 is failing with the NUMA error :
> 
> Thread-6341::ERROR::2016-02-01 10:39:25,364::migration::310::virt.vm::(run)
> vmId=`49bfc23b-79a6-4f6c-840c-95ea04562513`::Failed to migrate
> Traceback (most recent call last):
>   File "/usr/share/vdsm/virt/migration.py", line 294, in run
>     self._startUnderlyingMigration(time.time())
>   File "/usr/share/vdsm/virt/migration.py", line 364, in
> _startUnderlyingMigration
>     self._perform_migration(duri, muri)
>   File "/usr/share/vdsm/virt/migration.py", line 403, in _perform_migration
>     self._vm._dom.migrateToURI3(duri, params, flags)
>   File "/usr/share/vdsm/virt/virdomain.py", line 68, in f
>     ret = attr(*args, **kwargs)
>   File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line
> 124, in wrapper
>     ret = f(*args, **kwargs)
>   File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1836, in
> migrateToURI3
>     if ret == -1: raise libvirtError ('virDomainMigrateToURI3() failed',
> dom=self)
> libvirtError: unsupported configuration: NUMA node 1 is unavailable
> 
> So it seems this bug has to be reopened.

After investigating in the libvirt area, to better understand this issue, I'd also need:
- the output of `lscpu' on all the nodes
- the output of `vdsClient -s 0 getVdsCaps' on all the nodes

Comment 30 Nicolas Ecarnot 2016-02-01 13:16:36 UTC
(In reply to Francesco Romani from comment #25)
> > > Please share the Vdsm logs on both source and dest side during the
> > > migrations.
> > 
> > I already did on the source side.
> > I could on the target side, but I did not find anything obvious.
> 
> Will start from the source side. Please also do in the dest side, worst case
> it will be useful to crosscheck.

See below.

>  
> > > The Vdsm logs should also include the creation of the Vm, so I can inspect
> > > its configuration.
> > 
> > The VM has been creating too long ago to be able to retrieve the logs.
> 
> No big deal, let's have at least the domain XML. Instructions:
> 
> 1. on the host running the VM, please run
> virsh -r list
> 2. identify the VM from the list obtained in the item #1
> 3. with the retrieved libvirt domain ID, run
> virsh -r dumpxml $DOMAIN_ID
> 4. please provide the dumped xml

I tried to prune and serve you only what is relevant :)

- dumpxml : http://ur1.ca/ogvx9

- source : http://ur1.ca/ogvy7
- target : http://ur1.ca/ogvy9
- engine : http://ur1.ca/ogvz9

> 
> > Do you advice me to try to create a new VM and tests again?
> 
> That would help, yes.

I'm on my way doing that.

Comment 31 Nicolas Ecarnot 2016-02-01 13:18:37 UTC
(In reply to Francesco Romani from comment #29)

> After investigation in libvirt area, to better understand this issue, I'd
> also need:
> - output of `lscpu' on all the nodes
> - output of `vdsClient -s 0 getVdsCaps' on all the nodes

* ===   lscpu   ===

[root@serv-vm-adms02 etc]# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                16
On-line CPU(s) list:   0-15
Thread(s) per core:    2
Core(s) per socket:    4
Socket(s):             2
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 26
Model name:            Intel(R) Xeon(R) CPU           E5540  @ 2.53GHz
Stepping:              5
CPU MHz:               2527.020
BogoMIPS:              5053.19
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              8192K
NUMA node0 CPU(s):     0-15

[root@serv-vm-adms04 etc]# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                32
On-line CPU(s) list:   0-31
Thread(s) per core:    2
Core(s) per socket:    8
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 63
Model name:            Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz
Stepping:              2
CPU MHz:               2758.687
BogoMIPS:              4798.47
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              20480K
NUMA node0 CPU(s):     0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30
NUMA node1 CPU(s):     1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31

[root@serv-vm-adms05 etc]# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                32
On-line CPU(s) list:   0-31
Thread(s) per core:    2
Core(s) per socket:    8
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 63
Model name:            Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz
Stepping:              2
CPU MHz:               3017.250
BogoMIPS:              4798.83
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              20480K
NUMA node0 CPU(s):     0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30
NUMA node1 CPU(s):     1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31

Comment 32 Nicolas Ecarnot 2016-02-01 13:22:37 UTC
(In reply to Nicolas Ecarnot from comment #31)
> - output of `vdsClient -s 0 getVdsCaps' on all the nodes

http://ur1.ca/ogw0v

Comment 33 Nicolas Ecarnot 2016-02-01 14:00:24 UTC
(In reply to Nicolas Ecarnot from comment #30)

> > > Do you advice me to try to create a new VM and tests again?
> > 
> > That would help, yes.
> 
> I'm on my way doing that.

Been there, tried that:
When migrating a brand new, pristine VM, the error is exactly the same:

libvirtError: unsupported configuration: NUMA node 1 is unavailable

(which is understandable, as my previous postings show there is NO NUMA node 1 on the target host).

Now, the questions are:
- should we solve the NUMA issue on the target node?
- should we fix oVirt to prevent it from failing in such a case?
- both?
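One possible shape for such a pre-migration guard, sketched with the libvirt Python bindings (illustrative only; the nodeset and URI are placeholders, and this is not the actual engine/vdsm logic):

import libvirt

def nodeset_fits_host(nodeset, dest_uri):
    # Parse a numatune nodeset such as "0,1" or "0-3" into a set of node indices.
    requested = set()
    for part in nodeset.split(","):
        if "-" in part:
            lo, hi = part.split("-")
            requested.update(range(int(lo), int(hi) + 1))
        else:
            requested.add(int(part))
    conn = libvirt.openReadOnly(dest_uri)
    try:
        available_nodes = conn.getInfo()[4]  # getInfo()[4] is the number of NUMA cells
    finally:
        conn.close()
    return max(requested) < available_nodes

# Example: nodeset "0,1" against a single-node host (like the R610) yields False,
# so the migration could be refused up front instead of failing inside libvirt.
print(nodeset_fits_host("0,1", "qemu+tls://destination.example/system"))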

Comment 34 Francesco Romani 2016-02-01 15:05:08 UTC
Created attachment 1120138 [details]
vdsm getVdsCaps output

Comment 35 Francesco Romani 2016-02-01 15:06:19 UTC
(In reply to Nicolas Ecarnot from comment #32)
> (In reply to Nicolas Ecarnot from comment #31)
> > - output of `vdsClient -s 0 getVdsCaps' on all the nodes
> 
> http://ur1.ca/ogw0v

Thanks, but it is better to either add the output inline as a comment (like you did in comment 31) or attach a file; I just did that in comment 34.

The reason is that the data in the bugzilla should be as self-sufficient as possible for future reference.

Comment 36 Nicolas Ecarnot 2016-02-01 16:01:56 UTC
Francesco,

Do you prefer I open a specific BZ to deal with this NUMA issue?

Comment 37 Francesco Romani 2016-02-01 16:14:41 UTC
(In reply to Nicolas Ecarnot from comment #36)
> Francesco,
> 
> Do you prefer I open a specific BZ to deal with this NUMA issue?

Yes please, this is different enough to deserve its own bug. Please make sure to copy all the information from comment 22 onwards to the new bug.

I will close this one again afterwards.

Comment 38 Nicolas Ecarnot 2016-02-02 08:39:51 UTC
(In reply to Francesco Romani from comment #37)
> (In reply to Nicolas Ecarnot from comment #36)
> > Francesco,
> > 
> > Do you prefer I open a specific BZ to deal with this NUMA issue?
> 
> Yes please, this is different enough to deserve its own bug. Please make
> sure to copy all the information from comment 22 onwards on the new bug.
> 
> Will close this one again afterwards.

Done:
https://bugzilla.redhat.com/show_bug.cgi?id=1303842

Comment 39 Francesco Romani 2016-02-03 08:18:15 UTC
(In reply to Nicolas Ecarnot from comment #38)
> (In reply to Francesco Romani from comment #37)
> > (In reply to Nicolas Ecarnot from comment #36)
> > > Francesco,
> > > 
> > > Do you prefer I open a specific BZ to deal with this NUMA issue?
> > 
> > Yes please, this is different enough to deserve its own bug. Please make
> > sure to copy all the information from comment 22 onwards on the new bug.
> > 
> > Will close this one again afterwards.
> 
> Done :
> https://bugzilla.redhat.com/show_bug.cgi?id=1303842

Thanks, let's continue there. Re-closing this bug.

