Bug 1372153 - migration failed from rhel7.3 to rhel7.0 when guest with numa setting
Summary: migration failed from rhel7.3 to rhel7.0 when guest with numa setting
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libvirt
Version: 7.3
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Martin Kletzander
QA Contact: zhe peng
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-09-01 04:04 UTC by yafu
Modified: 2017-05-12 14:49 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-05-12 14:49:28 UTC
Target Upstream Version:
Embargoed:


Attachments
libvirtd.log and qemu.log both on source and target host (116.50 KB, application/x-gzip)
2016-09-05 03:04 UTC, yafu
The guest XML (8.64 KB, text/html)
2016-10-08 08:57 UTC, yafu

Description yafu 2016-09-01 04:04:40 UTC
Description of problem:
migration failed from rhel7.3 to rhel7.0 when guest with numa setting


Version-Release number of selected component (if applicable):
Source:
libvirt-2.0.0-6.el7.x86_64
qemu-kvm-rhev-2.6.0-22.el7.x86_64

target:
libvirt-1.1.1-29.el7_0.7.x86_64
qemu-kvm-rhev-1.5.3-60.el7_0.10.x86_64

How reproducible:
100%

Steps to reproduce:
1. Start a guest with a NUMA setting:
  # virsh dumpxml mig1
   ...
   <cpu>
     ...
     <numa>
      <cell id='0' cpus='0-1' memory='512000' unit='KiB'/>
      <cell id='1' cpus='2-3' memory='512000' unit='KiB'/>
    </numa>
    ...
  </cpu>
  ...

2. Migrate to the target host:
# virsh migrate mig1 qemu+ssh://10.66.144.76/system --live --verbose
root@10.66.144.76's password:
error: operation failed: migration job: unexpectedly failed

Actual results:
Migration failed.

Expected results:
Migration completes correctly.

Additional info:
1. Error in the qemu log on the target host:
 # cat /var/log/libvirt/qemu/mig1.log
 ...
 Unknown ramblock "/objects/ram-node0", cannot accept migration
qemu: warning: error while loading state for instance 0x0 of device 'ram'
load of migration failed
 ...
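
A minimal sketch for spotting this failure in a target-side qemu log. The function name `find_ramblock_errors` is hypothetical, and the pattern is derived only from the message quoted above, so it may not match the wording of other qemu versions.

```python
import re

# Pattern based solely on the log line quoted in this report (assumption:
# other qemu versions may word the message differently).
RAMBLOCK_RE = re.compile(r'Unknown ramblock "([^"]+)", cannot accept migration')


def find_ramblock_errors(log_text):
    """Return the names of the RAM blocks that the target qemu rejected."""
    return RAMBLOCK_RE.findall(log_text)


# Sample text copied from the log excerpt above.
sample = (
    'Unknown ramblock "/objects/ram-node0", cannot accept migration\n'
    "qemu: warning: error while loading state for instance 0x0 of device 'ram'\n"
    "load of migration failed\n"
)
print(find_ramblock_errors(sample))
```

A rejected block named `/objects/ram-node0` suggests that the source created guest RAM as a named memory backend object that the target does not know about.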




Comment 2 yafu 2016-09-05 03:03:40 UTC
Correction to the guest XML setting: the error was caused by numatune:
# virsh dumpxml mig1
   ...
   <cpu>
     ...
     <numa>
      <cell id='0' cpus='0-1' memory='512000' unit='KiB'/>
      <cell id='1' cpus='2-3' memory='512000' unit='KiB'/>
    </numa>
    ...
  </cpu>
...
  <numatune>
    <memory mode='strict' nodeset='0'/>
  </numatune>
...

Comment 3 yafu 2016-09-05 03:04:42 UTC
Created attachment 1197748
libvirtd.log and qemu.log both on source and  target host

Comment 4 Martin Kletzander 2016-09-23 13:07:36 UTC
Do you have some matrix of migrations from/to which work and which don't?  I'm guessing if this doesn't work, then 7.0 -> 7.3 doesn't work either, also 7.2 <-> 7.3 is broken both ways, right?  Make sure you have (at minimum):

<memoryBacking>
  <hugepages/>
</memoryBacking>
<cpu>
  <numa>
    <cell .../>
  </numa>
</cpu>

but no <numatune>, neither nodeset= in <hugepages/>.

Comment 5 Martin Kletzander 2016-09-29 13:44:57 UTC
Fixed upstream with commit v2.3.0-rc1-10-gff3112f3dc2c:

commit ff3112f3dc2c276a7e387ff7bb86f4fbbdf7bf2c
Author: Martin Kletzander <mkletzan>
Date:   Fri Sep 23 11:31:30 2016 +0200

    qemu: Only use memory-backend-file with NUMA if needed

Comment 6 yafu 2016-10-08 08:55:26 UTC
(In reply to Martin Kletzander from comment #4)
> Do you have some matrix of migrations from/to which work and which don't? 
> I'm guessing if this doesn't work, then 7.0 -> 7.3 doesn't work either, also
> 7.2 <-> 7.3 is broken both ways, right?  Make sure you have (at minimum):
> 
> <memoryBacking>
>   <hugepages/>
> </memoryBacking>
> <cpu>
>   <numa>
>     <cell .../>
>   </numa>
> </cpu>
> 
> but no <numatune>, neither nodeset= in <hugepages/>.

Sorry for the late reply. I just came back from holiday.

With the following setting, but no <numatune> and no nodeset= in <hugepages/>:
<memoryBacking>
  <hugepages/>
</memoryBacking>
<cpu>
  <numa>
    <cell .../>
  </numa>
</cpu>

Test results are as follows:
1. Migration from rhel7.3 to rhel7.0 failed, since the qemu command line uses "memory-backend-file" on rhel7.3 but "-mem-prealloc -mem-path /dev/hugepages/libvirt/qemu" on rhel7.0.
2. Migration from rhel7.0 to rhel7.3 works well; both source and target hosts use "-mem-prealloc -mem-path /dev/hugepages/libvirt/qemu".
3. Migration between rhel7.2 and rhel7.3 works well, since both rhel7.2 and rhel7.3 use "memory-backend-file".
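
The comparison behind these results can be sketched as a small check of how each host's qemu command line backs guest RAM. This is only a rule of thumb drawn from the three results above, not a statement of how libvirt or qemu decides compatibility. The function names and the sample command-line fragments are illustrative assumptions; real command lines are much longer.

```python
def memory_backing_style(cmdline):
    """Classify a qemu command line by how guest RAM is backed."""
    # Check the object form first: its options may themselves contain
    # a "mem-path=" key.
    if "memory-backend-file" in cmdline:
        return "memory-backend-file"
    if "-mem-path" in cmdline:
        return "mem-path"
    return "unknown"


def same_backing_style(src_cmdline, dst_cmdline):
    """True when both hosts back guest RAM the same (known) way."""
    src = memory_backing_style(src_cmdline)
    return src != "unknown" and src == memory_backing_style(dst_cmdline)


# Hypothetical, shortened command-line fragments for illustration only.
NEWER_STYLE = "qemu-kvm -object memory-backend-file,id=ram-node0"
OLDER_STYLE = "qemu-kvm -mem-prealloc -mem-path /dev/hugepages/libvirt/qemu"

print(same_backing_style(NEWER_STYLE, OLDER_STYLE))
print(same_backing_style(OLDER_STYLE, OLDER_STYLE))
```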

Comment 7 yafu 2016-10-08 08:57:06 UTC
Created attachment 1208308
The guest XML

Comment 8 yafu 2016-10-08 08:57:44 UTC
Please see the guest XML in the attachment.

Comment 9 Martin Kletzander 2016-10-10 13:47:02 UTC
(In reply to yafu from comment #6)
You are saying that 7.0 <-> 7.2 doesn't work either?  Would you mind checking 7.0 <-> 7.1 as well?  Thanks a lot in advance.

Comment 10 yafu 2016-10-11 06:23:14 UTC
(In reply to Martin Kletzander from comment #9)
> (In reply to yafu from comment #6)
> You are saying that 7.0 <-> 7.2 doesn't work either?  Would you mind
> checking 7.0 <-> 7.1 as well?  Thanks a lot in advance.

1. rhel7.2->rhel7.0 works well now, since Bug 1266856 - Migration from 7.0 to 7.2 failed with numa+hugepage settings is fixed.
2. rhel7.1->rhel7.0 failed with the same error as rhel7.3->rhel7.0.

Comment 11 Martin Kletzander 2016-10-11 10:37:17 UTC
(In reply to yafu from comment #10)
Oh, my bad, I just figured it out.  So Bug 1266856 fixed the scenario with:

<memoryBacking>
  <hugepages/>
</memoryBacking>
<cpu>
  <numa>
    <cell .../>
  </numa>
</cpu>

but what we need to fix here is:

<memoryBacking>
  <hugepages size='...'/>
</memoryBacking>
<cpu>
  <numa>
    <cell .../>
  </numa>
</cpu>

It has a different code path, and hence it might be beneficial to test both approaches in the migration matrix, I guess.
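
To keep the two scenarios apart when building a migration matrix, a guest XML can be classified with a short script. This is a rough sketch under stated assumptions: the function name and the labels are hypothetical, and a page size is detected either as a size attribute on <hugepages> (the shorthand used in this comment) or as a <page> child element, which is how libvirt domain XML normally expresses it.

```python
import xml.etree.ElementTree as ET


def hugepage_numa_scenario(domain_xml):
    """Label a guest XML with the NUMA + hugepages scenario it exercises."""
    root = ET.fromstring(domain_xml)
    has_numa = root.find("./cpu/numa/cell") is not None
    hugepages = root.find("./memoryBacking/hugepages")
    if not has_numa or hugepages is None:
        return "not applicable"
    # Assumption: a size may appear as an attribute (shorthand above)
    # or as a <page size=.../> child element.
    sized = hugepages.get("size") is not None or hugepages.find("page") is not None
    label = "hugepages with size" if sized else "hugepages without size"
    if root.find("./numatune") is not None:
        label += " + numatune"
    return label


CELL = "<cell id='0' cpus='0-1' memory='512000' unit='KiB'/>"
PLAIN = (
    "<domain><memoryBacking><hugepages/></memoryBacking>"
    "<cpu><numa>" + CELL + "</numa></cpu></domain>"
)
SIZED = (
    "<domain><memoryBacking><hugepages><page size='2048' unit='KiB'/>"
    "</hugepages></memoryBacking><cpu><numa>" + CELL + "</numa></cpu></domain>"
)

print(hugepage_numa_scenario(PLAIN))
print(hugepage_numa_scenario(SIZED))
```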

Comment 12 Martin Kletzander 2017-05-12 14:49:28 UTC
Any easy fix that could be provided now could actually break newer migration scenarios (rhel7.2 -> rhel7.3).  Since this is a corner case and was not reported by any customer, I'm closing this as WONTFIX.  The reasoning behind it is just that we will have fewer broken things this way than if we "fixed" this particular scenario.  This bug only affects migration from rhel7.0 to newer ones, I believe.

