Red Hat Bugzilla – Bug 1459831
Migration fails with --rdma-pin-all option
Last modified: 2017-07-06 09:36:34 EDT
Created attachment 1286096 [details]
source host libvirtd
Description of problem:
Migration fails with --rdma-pin-all option.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Start guest and do migration without --rdma-pin-all.
2. Start guest and do migration with --rdma-pin-all.
# virsh migrate --live --migrateuri rdma://192.168.0.2 setusertest --listen-address 0 qemu+ssh://192.168.0.2/system --verbose --rdma-pin-all
error: internal error: qemu unexpectedly closed the monitor: 2017-06-08T09:56:14.612417Z qemu-kvm: -chardev pty,id=charserial0: char device redirected to /dev/pts/3 (label charserial0)
dest_init RDMA Device opened: kernel name mlx4_0 uverbs device name uverbs0, infiniband_verbs class device path /sys/class/infiniband_verbs/uverbs0, infiniband class device path /sys/class/infiniband/mlx4_0, transport: (2) Ethernet
Failed to register local dest ram block!
: Cannot allocate memory
2017-06-08T09:56:14.737291Z qemu-kvm: rdma migration: error dest registering ram blocks
2017-06-08T09:56:14.737301Z qemu-kvm: error while loading state for instance 0x0 of device 'ram'
2017-06-08T09:56:14.737461Z qemu-kvm: Early error. Sending error.
2017-06-08T09:56:40.666831Z qemu-kvm: load of migration failed: Operation not permitted
Expected results:
Migration with --rdma-pin-all is successful.
Logs are attached.
Created attachment 1286097 [details]
remote host libvirtd
Created attachment 1286098 [details]
remote qemu log
Created attachment 1286100 [details]
local host qemu
Created attachment 1286101 [details]
This is a regression; the problem does not exist in RHEL 7.3.
Can you tell me how much RAM your destination host has and whether it's running any other VMs?
Does increasing the 'hard_limit' value in the XML help?
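For reference, that is the <hard_limit> element under <memtune> in the domain XML; something like the following, where the value (in KiB) is only an example to try:
<memtune>
  <hard_limit unit='KiB'>4194304</hard_limit>
</memtune>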
OK. I have to apologize; this is a user error due to a too-small hard_limit.
With the section below, the migration succeeds.
# virsh migrate --live --migrateuri rdma://192.168.100.2 setusertest --listen-address 0 qemu+ssh://192.168.100.2/system --verbose --rdma-pin-all
email@example.com's password:
Migration: [100 %]
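For reference, the limit can presumably also be raised with virsh memtune instead of editing the XML; for example (the value in KiB is only illustrative, and --config makes it take effect on the next guest start):
# virsh memtune setusertest --hard-limit 4194304 --config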
Thanks for trying that; can you tell me whether on 7.3 the original hard_limit value worked?
Currently I have no RDMA machines with RHEL7.3 for testing. I would like to try once I get them ready.
After confirmation, the configuration in the original XML shown below can work in RHEL 7.3. Thanks.
(In reply to Dan Zheng from comment #12)
> Hi, David,
> After confirmation, the configuration in the original XML shown below can
> work in RHEL 7.3. Thanks.
> <memory unit='KiB'>2097152</memory>
> <currentMemory unit='KiB'>2097152</currentMemory>
> <hard_limit unit='KiB'>2097152</hard_limit>
> <swap_hard_limit unit='KiB'>2097152</swap_hard_limit>
Can you try to figure out which component causes the change? For example, for me it still fails with a 7.4 install and a 7.3 qemu. So can you try a 7.3 kernel with a 7.4 qemu, etc., and see which component causes the change.
This is a known change: the hard_limit setting behaves differently between RHEL 7.3 and RHEL 7.4. In our testing experience, on RHEL 7.4 the memory hard_limit needs to be about 2G larger than the guest memory. There are bugs related to this issue, such as Bz1373783. Btw, it is not easy for us to prepare the specific machines and set up the environment.
For more detailed information on the changes, please confirm with the QEMU QEs; that should be a faster way.
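To illustrate with a 2 GiB guest like the one in this bug (the extra ~2G figure is only our rough testing experience, not a documented number), the XML would look something like:
<memory unit='KiB'>2097152</memory>
<currentMemory unit='KiB'>2097152</currentMemory>
<memtune>
  <hard_limit unit='KiB'>4194304</hard_limit>
  <swap_hard_limit unit='KiB'>4194304</swap_hard_limit>
</memtune>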
(In reply to firstname.lastname@example.org from comment #14)
> Hi David,
> This is a known change: the hard_limit setting behaves differently between
> RHEL 7.3 and RHEL 7.4. In our testing experience, on RHEL 7.4 the memory
> hard_limit needs to be about 2G larger than the guest memory. There are
> bugs related to this issue, such as Bz1373783. Btw, it is not easy for us
> to prepare the specific machines and set up the environment.
> For more detailed information on the changes, please confirm with the QEMU
> QEs; that should be a faster way.
I can't find any more details about it; if the requirement has doubled, we need to understand why. bz 1373783 is just a documentation bug, so it doesn't help. Can you please provide some more information about what is known here?
Sorry, that was a misunderstanding in our group discussion. The larger hard_limit requirement also exists in the previous product; refer to BZ1160997 and BZ1046833. Maybe the XML configuration in comment 12 (where memory equals hard_limit) needs further confirmation.
OK, that's fine - I'm only worried if it's a regression where the amount needed suddenly increases a lot.