Bug 1459831 - Migration fails with --rdma-pin-all option
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.4
Hardware: x86_64 Linux
Priority: high  Severity: medium
Target Milestone: rc
Assigned To: Dr. David Alan Gilbert
QA Contact: xianwang
Keywords: Regression
Reported: 2017-06-08 06:08 EDT by Dan Zheng
Modified: 2017-07-06 09:36 EDT (History)

Last Closed: 2017-06-09 05:51:45 EDT
Type: Bug

Attachments
source host libvirtd (980.72 KB, text/plain), 2017-06-08 06:08 EDT, Dan Zheng
remote host libvirtd (1.29 MB, text/plain), 2017-06-08 06:09 EDT, Dan Zheng
remote qemu log (16.16 KB, text/plain), 2017-06-08 06:09 EDT, Dan Zheng
local host qemu (16.15 KB, text/plain), 2017-06-08 06:10 EDT, Dan Zheng
guest xml (3.51 KB, text/plain), 2017-06-08 06:10 EDT, Dan Zheng

Description Dan Zheng 2017-06-08 06:08:37 EDT
Created attachment 1286096 [details]
source host libvirtd

Description of problem:
Migration fails with --rdma-pin-all option.

Version-Release number of selected component (if applicable):
qemu-kvm-rhev-2.9.0-8.el7.x86_64
libvirt-3.2.0-7.el7.x86_64
3.10.0-679.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Start guest and do migration without --rdma-pin-all. 
   Migration succeeds.

2. Start guest and do migration with --rdma-pin-all.

# virsh migrate --live --migrateuri rdma://192.168.0.2 setusertest --listen-address 0 qemu+ssh://192.168.0.2/system --verbose --rdma-pin-all

error: internal error: qemu unexpectedly closed the monitor: 2017-06-08T09:56:14.612417Z qemu-kvm: -chardev pty,id=charserial0: char device redirected to /dev/pts/3 (label charserial0)
dest_init RDMA Device opened: kernel name mlx4_0 uverbs device name uverbs0, infiniband_verbs class device path /sys/class/infiniband_verbs/uverbs0, infiniband class device path /sys/class/infiniband/mlx4_0, transport: (2) Ethernet
Failed to register local dest ram block!
: Cannot allocate memory
2017-06-08T09:56:14.737291Z qemu-kvm: rdma migration: error dest registering ram blocks
2017-06-08T09:56:14.737301Z qemu-kvm: error while loading state for instance 0x0 of device 'ram'
2017-06-08T09:56:14.737461Z qemu-kvm: Early error. Sending error.
2017-06-08T09:56:40.666831Z qemu-kvm: load of migration failed: Operation not permitted


Actual results:
See above

Expected results:
Migration with --rdma-pin-all is successful.

Additional info:

Logs are attached.
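
The "Cannot allocate memory" failure above is the destination QEMU failing to register (pin) the guest RAM blocks. A minimal diagnostic sketch for the destination host, assuming a single qemu-kvm process and the same guest name (these commands are illustrative, not taken from this report):

Per-process locked-memory rlimit of the incoming QEMU:
# grep "Max locked memory" /proc/$(pidof qemu-kvm)/limits

Cgroup memory limits that libvirt applies from the domain's <memtune> element:
# virsh memtune setusertest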
Comment 2 Dan Zheng 2017-06-08 06:09 EDT
Created attachment 1286097 [details]
remote host libvirtd
Comment 3 Dan Zheng 2017-06-08 06:09 EDT
Created attachment 1286098 [details]
remote qemu log
Comment 4 Dan Zheng 2017-06-08 06:10 EDT
Created attachment 1286100 [details]
local host qemu
Comment 5 Dan Zheng 2017-06-08 06:10 EDT
Created attachment 1286101 [details]
guest xml
Comment 6 Dan Zheng 2017-06-08 06:13:14 EDT
This is a regression; the problem does not exist in RHEL 7.3.
Comment 8 Dr. David Alan Gilbert 2017-06-08 06:50:38 EDT
Hi Dan,
  Can you tell me how much RAM your destination host has and whether it's running any other VMs?
  Does increasing the 'hard_limit' value in the XML help?
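
(As an aside, besides editing the <memtune> block in the domain XML, the hard limit can also be raised at runtime with virsh before retrying the migration; a minimal sketch, with a purely illustrative value in KiB:)

# virsh memtune setusertest --hard-limit 3145728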
Comment 9 Dan Zheng 2017-06-09 05:51:04 EDT
OK. I have to apologize; this is a user error caused by too small a hard_limit.
With the section below, the migration succeeds.

  <memory unit='KiB'>1048576</memory>
  <currentMemory unit='KiB'>1048576</currentMemory>
  <memtune>
    <hard_limit unit='KiB'>3145728</hard_limit>
    <swap_hard_limit unit='KiB'>4194304</swap_hard_limit>
  </memtune>

# virsh migrate --live --migrateuri rdma://192.168.100.2 setusertest --listen-address 0 qemu+ssh://192.168.100.2/system --verbose --rdma-pin-all
root@192.168.100.2's password: 
Migration: [100 %]
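
(A possible sanity check on the destination after a --rdma-pin-all migration, assuming a single qemu-kvm process; once all RAM blocks are pinned, VmLck should be roughly the guest RAM size:)

# grep VmLck /proc/$(pidof qemu-kvm)/status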
Comment 10 Dr. David Alan Gilbert 2017-06-09 06:34:01 EDT
Hi Dan,
  Thanks for trying that; can you tell me whether the original hard_limit value worked on 7.3?
Comment 11 Dan Zheng 2017-06-11 23:41:58 EDT
Hi David,
Currently I have no RDMA machines running RHEL 7.3 for testing. I will try once I get them set up.
Comment 12 Dan Zheng 2017-06-26 06:10:55 EDT
Hi, David,

After confirming, the configuration in the original XML, shown below, works on RHEL 7.3. Thanks.

  <memory unit='KiB'>2097152</memory>
  <currentMemory unit='KiB'>2097152</currentMemory>
  <memtune>
    <hard_limit unit='KiB'>2097152</hard_limit>
    <swap_hard_limit unit='KiB'>2097152</swap_hard_limit>
  </memtune>
Comment 13 Dr. David Alan Gilbert 2017-06-29 13:46:11 EDT
(In reply to Dan Zheng from comment #12)
> Hi, David,
> 
> After confirming, the configuration in the original XML, shown below, works
> on RHEL 7.3. Thanks.
> 
>   <memory unit='KiB'>2097152</memory>
>   <currentMemory unit='KiB'>2097152</currentMemory>
>   <memtune>
>     <hard_limit unit='KiB'>2097152</hard_limit>
>     <swap_hard_limit unit='KiB'>2097152</swap_hard_limit>
>   </memtune>


Hi Dan,
  Can you try to figure out which component causes the change? For example, for me it still fails with a 7.4 install and a 7.3 qemu. So can you try a 7.3 kernel with 7.4 qemu, and so on, to see which component is responsible for the change?

Thanks.
Comment 14 yanqzhan@redhat.com 2017-07-06 03:41:27 EDT
Hi David,

This is a known change: the hard_limit requirement differs between RHEL 7.3 and RHEL 7.4. On RHEL 7.4, in our testing experience, the memory hard_limit needs to be about 2G larger than the guest memory. There are bugs related to this issue, such as BZ1373783. By the way, it is not easy for us to prepare the specific machines and set up the environment.

For more detailed information about the changes, please confirm with the QEMU QEs; that should be a faster way.


Thanks.
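
(Illustrating the rule of thumb above with an editorial example, not taken from this report: for the 2 GiB guest in comment 12, that guidance would put the hard limit at roughly 4 GiB, i.e.:)

  <memory unit='KiB'>2097152</memory>
  <memtune>
    <hard_limit unit='KiB'>4194304</hard_limit>
  </memtune>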
Comment 15 Dr. David Alan Gilbert 2017-07-06 04:17:31 EDT
(In reply to yanqzhan@redhat.com from comment #14)
> Hi David,
> 
> This is a known change: the hard_limit requirement differs between RHEL 7.3
> and RHEL 7.4. On RHEL 7.4, in our testing experience, the memory hard_limit
> needs to be about 2G larger than the guest memory. There are bugs related to
> this issue, such as BZ1373783. By the way, it is not easy for us to prepare
> the specific machines and set up the environment.
> 
> For more detailed information about the changes, please confirm with the
> QEMU QEs; that should be a faster way.

I can't find any more details about it; if the requirement has doubled, we need to understand why. BZ 1373783 is just a documentation bug, so it doesn't help. Can you please provide some more information about what is known here?

> 
> 
> Thanks.
Comment 16 yanqzhan@redhat.com 2017-07-06 09:29:49 EDT
Sorry, it was a misunderstanding in our group discussion. The larger hard_limit requirement already existed in previous releases; refer to BZ1160997 and BZ1046833. Maybe the XML configuration in comment 12 (where memory equals the hard limit) needs further confirmation.
Comment 17 Dr. David Alan Gilbert 2017-07-06 09:36:34 EDT
OK, that's fine - I'm only worried if it's a regression where the amount needed suddenly increases a lot.
