Bug 1653556

Summary: Unable to migrate the HE VM from one host to another
Product: Red Hat Enterprise Linux 7
Reporter: SATHEESARAN <sasundar>
Component: libvirt
Assignee: Michal Privoznik <mprivozn>
Status: CLOSED WORKSFORME
QA Contact: Fangge Jin <fjin>
Severity: urgent
Docs Contact:
Priority: unspecified
Version: 7.6
CC: amukherj, dyuan, fjin, hhan, lhuang, mprivozn, sasundar, xuzhang, yafu, yalzhang
Target Milestone: rc
Keywords: Regression
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-01-09 09:01:26 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Description        Flags
HostedEngine.xml   none
HostedEngine.log   none

Description SATHEESARAN 2018-11-27 06:18:39 UTC
Description of problem:
-----------------------
In an RHHI-V environment (RHV + RHGS hyperconverged), the HE VM cannot be migrated from one host to another.

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
RHEL 7.6 batch update1
libvirt-4.5.0-10.el7_6.3.x86_64
RHV-4.2.7-2

How reproducible:
-----------------
Always

Steps to Reproduce:
-------------------
1. Deploy RHHI-V
2. Migrate the HE VM from one host to another


Actual results:
---------------
VM migration fails

Expected results:
-----------------
VM migration should happen successfully

Additional info:
----------------
HE VM migration was a regression in RHEL 7.6 that was fixed, for a different issue, in https://bugzilla.redhat.com/show_bug.cgi?id=1641798. Another issue still seems to be blocking the functionality.

Comment 3 SATHEESARAN 2018-11-27 06:24:43 UTC
I see the following in vdsm.log

<snip>

2018-11-27 11:37:37,781+0530 ERROR (migsrc/cb6274a4) [virt.vm] (vmId='cb6274a4-5678-40f7-88a2-874fe8f2a169') internal error: qemu unexpectedly closed the monitor: 2018-11-27T06:07:37.355665Z qemu-kvm: warning: All CPU(s) up to maxcpus should be described in NUMA config, ability to start up with partial NUMA mappings is obsoleted and will be removed in future
2018-11-27T06:07:37.396771Z qemu-kvm: cannot set up guest memory 'pc.ram': Cannot allocate memory (migration:290)
2018-11-27 11:37:38,109+0530 ERROR (migsrc/cb6274a4) [virt.vm] (vmId='cb6274a4-5678-40f7-88a2-874fe8f2a169') Failed to migrate (migration:455)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/migration.py", line 437, in _regular_run
    self._startUnderlyingMigration(time.time())
  File "/usr/lib/python2.7/site-packages/vdsm/virt/migration.py", line 509, in _startUnderlyingMigration
    self._perform_with_conv_schedule(duri, muri)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/migration.py", line 587, in _perform_with_conv_schedule
    self._perform_migration(duri, muri)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/migration.py", line 529, in _perform_migration
    self._migration_flags)
  File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 98, in f
    ret = attr(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py", line 130, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line 92, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1779, in migrateToURI3
    if ret == -1: raise libvirtError ('virDomainMigrateToURI3() failed', dom=self)
libvirtError: internal error: qemu unexpectedly closed the monitor: 2018-11-27T06:07:37.355665Z qemu-kvm: warning: All CPU(s) up to maxcpus should be described in NUMA config, ability to start up with partial NUMA mappings is obsoleted and will be removed in future
2018-11-27T06:07:37.396771Z qemu-kvm: cannot set up guest memory 'pc.ram': Cannot allocate memory

</snip>

@Michal, could you help root-cause the issue?
Let me know if you need any logs.

Comment 4 Han Han 2018-11-27 07:06:13 UTC
(In reply to SATHEESARAN from comment #3)
> I see the following in vdsm.log
> 
> <snip>
> 
> 2018-11-27 11:37:37,781+0530 ERROR (migsrc/cb6274a4) [virt.vm]
> (vmId='cb6274a4-5678-40f7-88a2-874fe8f2a169') internal error: qemu
> unexpectedly closed the monitor: 2018-11-27T06:07:37.35
> 5665Z qemu-kvm: warning: All CPU(s) up to maxcpus should be described in
> NUMA config, ability to start up with partial NUMA mappings is obsoleted and
> will be removed in future
> 2018-11-27T06:07:37.396771Z qemu-kvm: cannot set up guest memory 'pc.ram':
> Cannot allocate memory (migration:290)
It seems to be a CPU/NUMA-related issue.
Could you please provide the host cpu type of both sides, and the vm xml?
> 2018-11-27 11:37:38,109+0530 ERROR (migsrc/cb6274a4) [virt.vm]
> (vmId='cb6274a4-5678-40f7-88a2-874fe8f2a169') Failed to migrate
> (migration:455)
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/site-packages/vdsm/virt/migration.py", line 437,
> in _regular_run
>     self._startUnderlyingMigration(time.time())
>   File "/usr/lib/python2.7/site-packages/vdsm/virt/migration.py", line 509,
> in _startUnderlyingMigration
>     self._perform_with_conv_schedule(duri, muri)
>   File "/usr/lib/python2.7/site-packages/vdsm/virt/migration.py", line 587,
> in _perform_with_conv_schedule
>     self._perform_migration(duri, muri)
>   File "/usr/lib/python2.7/site-packages/vdsm/virt/migration.py", line 529,
> in _perform_migration
>     self._migration_flags)
>   File "/usr/lib/python2.7/site-packages/vdsm/virt/virdomain.py", line 98,
> in f
>     ret = attr(*args, **kwargs)
>   File "/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py",
> line 130, in wrapper
>     ret = f(*args, **kwargs)
>   File "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line 92,
> in wrapper
>     return func(inst, *args, **kwargs)
>   File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1779, in
> migrateToURI3
>     if ret == -1: raise libvirtError ('virDomainMigrateToURI3() failed',
> dom=self)
> libvirtError: internal error: qemu unexpectedly closed the monitor:
> 2018-11-27T06:07:37.355665Z qemu-kvm: warning: All CPU(s) up to maxcpus
> should be described in NUMA config, ability to start up with partial NUMA
> mappings is obsoleted and will be removed in future
> 2018-11-27T06:07:37.396771Z qemu-kvm: cannot set up guest memory 'pc.ram':
> Cannot allocate memory
> 
> </snip>
> 
> @Michal, could you help root-cause the issue?
> Let me know if you need any logs.

Comment 5 Fangge Jin 2018-11-27 07:14:53 UTC
Are there hugepage settings for the VM? If yes, did you configure hugepages on both the src and target hosts?

Comment 6 Yaniv Kaul 2018-11-27 10:37:01 UTC
I assume the libvirt team will need complete logs, mainly the libvirt XML definition (which you should find in VDSM, I hope!)

Comment 7 SATHEESARAN 2018-11-28 11:22:33 UTC
(In reply to Han Han from comment #4)
> It seems to be a CPU/NUMA-related issue.
> Could you please provide the host cpu type of both sides, and the vm xml?

CPU Type on the source host:
  <cpu>
      <arch>x86_64</arch>
      <model>Haswell-noTSX</model>
      <vendor>Intel</vendor>
      <microcode version='58'/>
      <topology sockets='1' cores='14' threads='2'/>


CPU Type on the destination host:
<cpu>
      <arch>x86_64</arch>
      <model>Haswell-noTSX</model>
      <vendor>Intel</vendor>
      <microcode version='58'/>
      <topology sockets='1' cores='14' threads='2'/>

HostedEngine.xml will be attached to the bug

Comment 8 SATHEESARAN 2018-11-28 11:23:12 UTC
Created attachment 1509449 [details]
HostedEngine.xml

VM xml

Comment 9 SATHEESARAN 2018-11-28 11:26:21 UTC
(In reply to Fangge Jin from comment #5)
> Are there hugepage settings for the VM? If yes, did you configure hugepages
> on both the src and target hosts?

No, there are no hugepages set up.

Comment 10 SATHEESARAN 2018-11-28 11:27:31 UTC
I also have another VM on the RHV setup, and I could migrate it between hosts. Only the HostedEngine VM throws errors during migration.

Comment 11 SATHEESARAN 2018-11-28 11:37:33 UTC
Created attachment 1509453 [details]
HostedEngine.log

Attaching the HostedEngine VM's qemu log - /var/lib/libvirt/qemu/HostedEngine.log

Comment 12 Michal Privoznik 2018-11-28 12:00:58 UTC
(In reply to SATHEESARAN from comment #3)
> I see the following in vdsm.log
> 
> <snip>
> 


> libvirtError: internal error: qemu unexpectedly closed the monitor:
> 2018-11-27T06:07:37.355665Z qemu-kvm: warning: All CPU(s) up to maxcpus
> should be described in NUMA config, ability to start up with partial NUMA
> mappings is obsoleted and will be removed in future

This is just a harmless warning. Basically, you are giving the domain 224 vCPUs but placing only 56 into a NUMA node. What about the remaining 168? But this is not what is making the qemu fail.
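
For illustration, the vCPU-to-NUMA coverage behind that warning can be checked from the domain XML. The following is a minimal sketch, not what libvirt or qemu do internally: the sample XML is hypothetical (the real HostedEngine.xml is attached to this bug and may differ), the helper names are made up, and only the plain "a-b,c" CPU list form is handled (no "^" exclusions).

```python
import xml.etree.ElementTree as ET


def expand_cpu_list(spec):
    """Expand a CPU list such as '0-3,8' into a set of ints.

    Assumption: only plain ranges and single ids are handled.
    """
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def unmapped_vcpus(domain_xml):
    """Return the sorted vCPU ids that no <numa><cell> claims."""
    root = ET.fromstring(domain_xml)
    max_vcpus = int(root.findtext("vcpu"))
    mapped = set()
    for cell in root.findall("./cpu/numa/cell"):
        mapped |= expand_cpu_list(cell.get("cpus", ""))
    return sorted(set(range(max_vcpus)) - mapped)


# Hypothetical domain XML mirroring the numbers in this comment:
# 224 vCPUs in total, 56 of them placed into one NUMA cell.
SAMPLE_XML = """
<domain type='kvm'>
  <vcpu placement='static'>224</vcpu>
  <cpu>
    <numa>
      <cell id='0' cpus='0-55' memory='16777216' unit='KiB'/>
    </numa>
  </cpu>
</domain>
"""

missing = unmapped_vcpus(SAMPLE_XML)
print("vCPUs without a NUMA cell:", len(missing))
```

With the sample above, the 168 vCPUs numbered 56 to 223 are reported as unmapped, which matches the arithmetic in this comment.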

> 2018-11-27T06:07:37.396771Z qemu-kvm: cannot set up guest memory 'pc.ram':
> Cannot allocate memory

This is. You are giving the domain 244GB of memory and qemu is telling us that it could not allocate that much memory on the destination. Are you sure there is enough free memory on the destination? What is the output of "virsh freepages --all"?
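
As a rough pre-check on the destination host, the available memory can be compared against the guest size. This is only a sketch under stated assumptions: the function names are hypothetical, the guest size is passed in by the caller (taken from the domain XML), and the check ignores qemu overhead, overcommit settings and per-NUMA-node limits, so it can only hint at a shortage, not prove that allocation will succeed.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field: number} dict.

    Most fields are in KiB; a few (e.g. HugePages_Total) are plain counts.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def can_fit_guest(meminfo_text, guest_kib):
    """Return True if MemAvailable (or MemFree as fallback) covers guest_kib."""
    info = parse_meminfo(meminfo_text)
    available = info.get("MemAvailable", info.get("MemFree", 0))
    return available >= guest_kib


if __name__ == "__main__":
    # Assumption: the 244GB from this comment is treated as 244 GiB.
    guest_kib = 244 * 1024 * 1024
    with open("/proc/meminfo") as f:
        print("enough memory:", can_fit_guest(f.read(), guest_kib))
```

Run on the destination host before retrying the migration; a False result would be consistent with the "Cannot allocate memory" error above.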

Comment 16 SATHEESARAN 2019-01-09 09:01:57 UTC
Closing this bug for now, as we couldn't hit it again