Bug 1163953

Summary: No way to turn off rdma-pin-all once it was turned on
Product: Red Hat Enterprise Linux 7
Component: libvirt
Version: 7.1
Status: CLOSED ERRATA
Reporter: Jiri Denemark <jdenemar>
Assignee: Jiri Denemark <jdenemar>
QA Contact: Virtualization Bugs <virt-bugs>
CC: dyuan, rbalakri, zhwang, zpeng
Target Milestone: rc
Fixed In Version: libvirt-1.2.8-7.el7
Doc Type: Bug Fix
Type: Bug
Last Closed: 2015-03-05 07:47:32 UTC

Description Jiri Denemark 2014-11-13 19:25:59 UTC
Description of problem:

When the VIR_MIGRATE_RDMA_PIN_ALL flag is used to enable the rdma-pin-all migration capability in QEMU and the migration fails, there is no way to turn rdma-pin-all off and retry without it (even calling a migration API without the VIR_MIGRATE_RDMA_PIN_ALL flag does not help).

Version-Release number of selected component (if applicable):

libvirt-1.2.8-6.el7

How reproducible:

100%

Steps to Reproduce:
1. Misconfigure the domain to be unusable with rdma-pin-all (the hard_limit below equals the guest memory size, so once QEMU tries to pin all guest RAM for RDMA, its total memory use exceeds the limit and registration fails with ENOMEM):
  <memory unit='KiB'>1048576</memory>
  <memtune>
    <hard_limit unit='KiB'>1048576</hard_limit>
    <swap_hard_limit unit='KiB'>1048576</swap_hard_limit>
  </memtune>
2. virsh migrate --live --rdma-pin-all --migrateuri rdma://... $DOM
3. virsh migrate --live --migrateuri rdma://... $DOM

Actual results:

Step 2 fails immediately and QEMU log on destination host shows something like the following:

source_resolve_host RDMA Device opened: kernel name mlx4_0 uverbs device name
uverbs0, infiniband_verbs class device path
/sys/class/infiniband_verbs/uverbs0, infiniband class device path
/sys/class/infiniband/mlx4_0, transport: (2) Ethernet
Failed to register local dest ram block!
: Cannot allocate memory
RDMA ERROR: receiving remote info!


Step 3 fails in exactly the same way and

  virsh qemu-monitor-command $DOM '{"execute":"query-migrate-capabilities"}'

shows rdma-pin-all is still enabled.
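
A possible manual workaround (not mentioned in the original report, so treat it as an assumption): since the capability is stuck at the QEMU level, it can presumably be flipped back directly through the monitor with the QMP migrate-set-capabilities command before retrying:

  virsh qemu-monitor-command $DOM '{"execute":"migrate-set-capabilities","arguments":{"capabilities":[{"capability":"rdma-pin-all","state":false}]}}'

Bypassing libvirt like this is unsupported; it is shown only to illustrate where the stale state lives.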


Expected results:

Step 3 should either succeed or fail later; if it fails, query-migrate-capabilities should report rdma-pin-all as disabled.
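
For reference, the monitor reply is a list of capability/state pairs, roughly of this shape (abridged; the exact capability set depends on the QEMU version):

  {"return": [{"state": false, "capability": "rdma-pin-all"},
              {"state": false, "capability": "xbzrle"}, ...]}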

Additional info:

Comment 1 Jiri Denemark 2014-11-13 20:34:51 UTC
Fixed upstream by v1.2.10-103-gab39338:

commit ab393383c84eb049fc2d75c3e79249ca58062887
Author: Jiri Denemark <jdenemar>
Date:   Mon Nov 10 14:46:26 2014 +0100

    qemu: Always set migration capabilities
    
    We used to set migration capabilities only when a user asked for them in
    flags. This is fine when migration succeeds since the QEMU process is
    killed in the end but in case migration fails or if it's cancelled, some
    capabilities may remain turned on with no way to turn them off. To fix
    that, migration capabilities have to be turned on if requested but
    explicitly turned off in case they were not requested but QEMU supports
    them.
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1163953
    Signed-off-by: Jiri Denemark <jdenemar>
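
In monitor terms (a sketch of the resulting behavior, not the literal patch): libvirt now sends migrate-set-capabilities covering every capability QEMU advertises at the start of each migration, so with VIR_MIGRATE_RDMA_PIN_ALL absent it explicitly issues something like

  {"execute": "migrate-set-capabilities",
   "arguments": {"capabilities": [{"capability": "rdma-pin-all", "state": false}]}}

and a capability left enabled by an earlier failed attempt is reset before the next one starts.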

Comment 4 zhe peng 2014-11-24 09:44:03 UTC
I can reproduce this.

Verified with build: libvirt-1.2.8-8.el7.x86_64

Steps:
1. Misconfigure the domain to be unusable with rdma-pin-all:
  <memory unit='KiB'>1048576</memory>
  <memtune>
    <hard_limit unit='KiB'>1048576</hard_limit>
    <swap_hard_limit unit='KiB'>1048576</swap_hard_limit>
  </memtune>

2. Start the guest and do migration:
# virsh migrate --live --rdma-pin-all --migrateuri rdma://192.168.100.2 rhel7 qemu+ssh://192.168.100.2/system --verbose
error: operation failed: migration job: unexpectedly failed

3. Do migration without --rdma-pin-all:
# virsh migrate --live --migrateuri rdma://192.168.100.2 rhel7 qemu+ssh://192.168.100.2/system --verbose
Migration: [100 %]
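
As an extra check (not part of the original verification notes), the source domain's monitor can be queried before step 3 to inspect the capability state:

  # virsh qemu-monitor-command rhel7 '{"execute":"query-migrate-capabilities"}'

With the fixed build this no longer affects the retry, since unrequested capabilities are explicitly turned off when the next migration starts.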

Comment 6 errata-xmlrpc 2015-03-05 07:47:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0323.html