Description by Dr. David Alan Gilbert, 2016-08-19 11:23:41 UTC
Description of problem:
RDMA migration on a chelsio T520-CR device times out with rdma-pin-all=on
but works with pin-all off
Additional info:
[root@rdma-dev-13 ~]$ ./rdma-test
Starting src
PID TTY TIME CMD
3678 pts/0 00:00:00 qemu-kvm
Starting dst
PID TTY TIME CMD
3689 pts/0 00:00:00 qemu-kvm
QEMU 2.6.0 monitor - type 'help' for more information
(qemu) info status
VM status: running
Found: VM status: running
QEMU 2.6.0 monitor - type 'help' for more information
(qemu) info status
VM status: paused (inmigrate)
Found: VM status: paused (inmigrate)
Good - both qemu's running
(qemu) migrate_set_speed 100G
(qemu) migrate rdma:172.31.50.43:4444
source_resolve_host RDMA Device opened: kernel name cxgb4_0 uverbs device name uverbs0, infiniband_verbs class device path /sys/class/infiniband_verbs/uverbs0, infiniband class device path /sys/class/infiniband/cxgb4_0, transport: (2) Ethernet
dest_init RDMA Device opened: kernel name cxgb4_0 uverbs device name uverbs0, infiniband_verbs class device path /sys/class/infiniband_verbs/uverbs0, infiniband class device path /sys/class/infiniband/cxgb4_0, transport: (2) Ethernet
(qemu) info migrate
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: off
Migration status: completed
Found: Migration status: completed
(qemu) info status
VM status: running
Found: VM status: running
passed pin_all=false
qemu-kvm: terminating on signal 15 from pid 3669
qemu-kvm: terminating on signal 15 from pid 3669
Starting src
PID TTY TIME CMD
3763 pts/0 00:00:00 qemu-kvm
Starting dst
PID TTY TIME CMD
3774 pts/0 00:00:00 qemu-kvm
QEMU 2.6.0 monitor - type 'help' for more information
(qemu) info status
VM status: running
Found: VM status: running
QEMU 2.6.0 monitor - type 'help' for more information
(qemu) info status
VM status: paused (inmigrate)
Found: VM status: paused (inmigrate)
Good - both qemu's running
(qemu) migrate_set_speed 100G
(qemu) migrate_set_capability rdma-pin-all on
source_resolve_host RDMA Device opened: kernel name cxgb4_0 uverbs device name uverbs0, infiniband_verbs class device path /sys/class/infiniband_verbs/uverbs0, infiniband class device path /sys/class/infiniband/cxgb4_0, transport: (2) Ethernet
dest_init RDMA Device opened: kernel name cxgb4_0 uverbs device name uverbs0, infiniband_verbs class device path /sys/class/infiniband_verbs/uverbs0, infiniband class device path /sys/class/infiniband/cxgb4_0, transport: (2) Ethernet
(qemu) migrate rdma:172.31.50.43:4444
Timeout waiting for Migration status: completed
qemu-kvm: terminating on signal 15 from pid 3669
qemu-kvm: terminating on signal 15 from pid 3669
Looks like it is stuck at:
#0 0x00002b3928cc349d in read () from /lib64/libpthread.so.0
No symbol table info available.
#1 0x00002b392757c063 in ibv_get_cq_event () from /lib64/libibverbs.so.1
No symbol table info available.
#2 0x00002b391e26a8f8 in qemu_rdma_block_for_wrid ()
No symbol table info available.
#3 0x00002b391e26cf8f in qemu_rdma_registration_stop ()
No symbol table info available.
#4 0x00002b391e266a4b in ram_control_after_iterate ()
No symbol table info available.
#5 0x00002b391e0dbb0e in ram_save_iterate ()
No symbol table info available.
#6 0x00002b391e0e0733 in qemu_savevm_state_iterate ()
No symbol table info available.
(Not fixed by the old rdma-race-fix.)
Comment 2 by Dr. David Alan Gilbert, 2016-09-05 11:21:36 UTC
(In reply to qianqianzhu from comment #1)
> Hi David,
>
> Just to confirm, is it a device specific issue? Since rdma works well with
> mlx5 card according to QE's test, test env see
> https://bugzilla.redhat.com/show_bug.cgi?id=1356959#c4.
>
> Thanks,
> Qianqian
Yes, this bug is cxgb4 specific.
While you say mlx5 works for QE, I have a reliable test case of mlx5 failing; that is why I'm keeping bz 1356959 open.
Dave
Comment 3 by qianqianzhu
(In reply to Dr. David Alan Gilbert from comment #2)
> (In reply to qianqianzhu from comment #1)
> > Hi David,
> >
> > Just to confirm, is it a device specific issue? Since rdma works well with
> > mlx5 card according to QE's test, test env see
> > https://bugzilla.redhat.com/show_bug.cgi?id=1356959#c4.
> >
> > Thanks,
> > Qianqian
>
> Yes, this bug is cxgb4 specific.
>
> While you say mlx5 works for QE, I have a reliable test case of mlx5
> failing, that is why I'm keeping bz 1356959 open.
>
> Dave
Thanks, David. Sorry that I was not clear: I meant that mlx5 works well on x86; bz1356959 is ppc only.
Comment 4 by qianqianzhu
(In reply to qianqianzhu from comment #3)
> (In reply to Dr. David Alan Gilbert from comment #2)
> > (In reply to qianqianzhu from comment #1)
> > > Hi David,
> > >
> > > Just to confirm, is it a device specific issue? Since rdma works well with
> > > mlx5 card according to QE's test, test env see
> > > https://bugzilla.redhat.com/show_bug.cgi?id=1356959#c4.
> > >
> > > Thanks,
> > > Qianqian
> >
> > Yes, this bug is cxgb4 specific.
> >
> > While you say mlx5 works for QE, I have a reliable test case of mlx5
> > failing, that is why I'm keeping bz 1356959 open.
> >
> > Dave
>
> Thanks David, Sorry that I was not saying it clearly, I mean mlx5 works well
> for x86, bz1356959 is ppc only.
Double-confirmed regarding https://bugzilla.redhat.com/show_bug.cgi?id=1356959#c4: QE's test result is that x86 passed RDMA with mlx4, and ppc failed RDMA with mlx5.
Comment 5 by Dr. David Alan Gilbert, 2016-12-01 13:22:47 UTC
Hmm, the latest test run is showing it failing with various timeouts with rdma-pin-all=on across all cards, but with different errors on different cards.