Bug 2174482

Summary: [s390x] rdma migration fails
Product: Red Hat Enterprise Linux 9
Reporter: smitterl
Component: qemu-kvm
Sub component: Live Migration
Assignee: Virtualization Maintenance <virt-maint>
QA Contact: virt-qe-z
Status: CLOSED WONTFIX
Severity: low
Priority: low
CC: bfu, clegoate, fjin, thuth, virt-maint
Version: 9.2
Keywords: Triaged
Target Milestone: rc
Hardware: s390x   
OS: Linux   
Last Closed: 2023-04-13 09:19:23 UTC
Type: Feature Request
Attachments:
    1: qemu log during migration failure (flags: none)

Description smitterl 2023-03-01 18:28:37 UTC
Created attachment 1947287
1:qemu log during migration failure

Description of problem:

I tried the steps that are known to work for RDMA live migration on x86_64.
On s390x, the migration fails with an error.

Version-Release number of selected component (if applicable):
libvirt-9.0.0-7.el9.s390x

How reproducible:
100%


Steps to Reproduce:

0. Use two systems, each with device
     Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx Virtual Function]
   They must have the same PCHID, which can be checked e.g. via smc_rnics
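   One way to compare the PCHIDs (a sketch; smc_rnics is provided by the smc-tools package):
     # run on both hosts and compare the PCHID column of the ConnectX-4 entry
     smc_rnics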

1. Load the necessary kernel modules: ib_ipoib mlx5_ib ib_umad rpcrdma ib_srpt ib_iser ib_isert
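   For example (a sketch using the module list above):
     for m in ib_ipoib mlx5_ib ib_umad rpcrdma ib_srpt ib_iser ib_isert; do
         modprobe $m
     done
     # confirm they loaded
     lsmod | grep -E 'mlx5_ib|rpcrdma'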

2. Install the necessary packages: libibverbs librdmacm infiniband-diags
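   For example:
     dnf install -y libibverbs librdmacm infiniband-diags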

3. Use ibstat to confirm that the Ethernet controllers report
     State: Active
     Physical state: LinkUp
     Link layer: Ethernet
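   For example (the device name mlx5_0 is an assumption; list devices with ibstat -l):
     ibstat mlx5_0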

4. Assign local addresses to the network interfaces and confirm the hosts can communicate via ping, e.g.
     ip addr add 192.168.100.2/24 dev <interface>; ping 192.168.100.3

5. In /etc/libvirt/qemu.conf, enable all of the following entries for cgroup_device_acl, then restart virtqemud (see the sketch after this list):
	"/dev/null", "/dev/full", "/dev/zero",
	"/dev/random", "/dev/urandom",
	"/dev/ptmx", "/dev/kvm", "/dev/kqemu",
	"/dev/rtc","/dev/hpet", "/dev/vfio/vfio",
	"/dev/infiniband/rdma_cm",
	"/dev/infiniband/issm1",
	"/dev/infiniband/umad1",
	"/dev/infiniband/uverbs1"

6. Create shared storage for the VM qcow2 (e.g. NFS: install nfs-utils, add "/nfs *(rw,no_root_squash,async)" to /etc/exports, then systemctl restart nfs-server)
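   For example (paths and the server address are assumptions):
     # on the NFS server
     dnf install -y nfs-utils
     mkdir -p /nfs
     echo '/nfs *(rw,no_root_squash,async)' >> /etc/exports
     systemctl restart nfs-server
     # on both hosts, mount the share at the path used by the VM's disk
     mount -t nfs <server>:/nfs /var/lib/libvirt/images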

7. Turn off firewall
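   For example:
     systemctl stop firewalld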

8. Set memory limits on VM
   <memtune>
       <hard_limit unit='KiB'>1048576</hard_limit>
       <swap_hard_limit unit='KiB'>2097152</swap_hard_limit>
   </memtune>
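   The same limits can also be set at runtime (a sketch; "rhel7" is the domain name used in step 10, values in KiB):
     virsh memtune rhel7 --hard-limit 1048576 --swap-hard-limit 2097152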

9. Start VM

10. Migrate VM:
   virsh migrate --live --migrateuri rdma://192.168.100.2 rhel7 --listen-address 0 qemu+ssh://192.168.100.2/system --verbose


Actual results:
error: internal error: unable to execute QEMU command 'migrate-incoming': unknown migration protocol: rdma:0:49152

Expected results:
The migration succeeds.

Additional info:
1. The original instructions at hand included more steps that I decided were unnecessary (or that didn't work when I tried them after building the rpms from the srpm):
 a. use opensm to make sure the devices can communicate for step 4; they could already communicate without it
 b. use mstconfig to set the link layer type; I couldn't change it, so I kept the preset Ethernet
2. Attaching the virtqemud debug log.
3. The kernel modules weren't initially built; I'll share a link to the kernel config and my rpms in a private comment.

Comment 1 smitterl 2023-03-01 18:29:55 UTC
I'm not sure this ever worked, so I'm setting severity low.

Comment 7 smitterl 2023-04-13 09:19:23 UTC
RDMA live migration is not supported on s390x.