Bug 1039539 - warning and hang in ib DMA code when doing NFSoRDMA to server
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: kernel
Version: 7.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Sachin Prabhu
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks: 1171868 1255429 1289204
 
Reported: 2013-12-09 11:39 UTC by Jeff Layton
Modified: 2016-11-28 08:33 UTC (History)
CC: 11 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-11-28 08:33:02 UTC
Target Upstream Version:


Attachments
warning messages from v3.13.3 kernel (131.46 KB, text/plain)
2014-02-18 14:27 UTC, Jeff Layton

Description Jeff Layton 2013-12-09 11:39:10 UTC
Easily triggerable warning that's followed by what seems to be a total hang of the system:

[226665.335765] ------------[ cut here ]------------
[226665.340578] WARNING: at drivers/iommu/amd_iommu.c:2692 dma_ops_domain_unmap.part.11+0x62/0x70()
[226665.349496] Modules linked in: svcrdma 8021q garp stp mrp llc mlx4_ib mlx4_en sg ext4 mbcache jbd2 ipmi_devintf dcdbas dm_mirror dm_region_hash dm_log dm_mod ib_iser libiscsi scsi_transport_iscsi mlx4_core ib_srpt target_core_mod ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm kvm_amd ib_uverbs kvm microcode raid0 pcspkr ib_umad amd64_edac_mod rdma_cm sp5100_tco edac_mce_amd serio_raw k10temp edac_core i2c_piix4 ib_cm iw_cm ipmi_si ipmi_msghandler bnx2 ib_addr shpchp ib_sa acpi_power_meter ib_mad acpi_cpufreq ib_core mperf nfsd auth_rpcgss nfs_acl lockd sunrpc xfs libcrc32c sd_mod crc_t10dif crct10dif_common mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit drm_kms_helper ttm drm mpt2sas raid_class i2c_core scsi_transport_sas
[226665.417560] CPU: 7 PID: 2936 Comm: nfsd Not tainted 3.10.0-57.el7.x86_64 #1
[226665.424717] Hardware name: Dell Inc. PowerEdge R515/03X0MN, BIOS 2.0.2 10/22/2012
[226665.432400]  0000000000000009 ffff88081eee3d08 ffffffff815b508d ffff88081eee3d40
[226665.440052]  ffffffff81058c31 ffff8807f039aec8 0000000000000000 0000000000001000
[226665.447702]  000000000ebd9000 ffff880810d8f000 ffff88081eee3d50 ffffffff81058d0a
[226665.455355] Call Trace:
[226665.457927]  <IRQ>  [<ffffffff815b508d>] dump_stack+0x19/0x1b
[226665.463891]  [<ffffffff81058c31>] warn_slowpath_common+0x61/0x80
[226665.470080]  [<ffffffff81058d0a>] warn_slowpath_null+0x1a/0x20
[226665.476094]  [<ffffffff8147f1f2>] dma_ops_domain_unmap.part.11+0x62/0x70
[226665.482990]  [<ffffffff814806d3>] __unmap_single.isra.19+0xa3/0x110
[226665.489445]  [<ffffffff81480e61>] unmap_page+0x51/0x70
[226665.494760]  [<ffffffffa04d0e63>] svc_rdma_put_frmr+0x153/0x180 [svcrdma]
[226665.501746]  [<ffffffffa04d1019>] sq_cq_reap+0x189/0x200 [svcrdma]
[226665.508113]  [<ffffffffa04d10b8>] dto_tasklet_func+0x28/0x1c0 [svcrdma]
[226665.514921]  [<ffffffff810ec161>] ? handle_irq_event_percpu+0xe1/0x1e0
[226665.521642]  [<ffffffff8106168e>] tasklet_action+0x6e/0x110
[226665.527390]  [<ffffffff81061827>] __do_softirq+0xf7/0x290
[226665.532965]  [<ffffffff815c681c>] call_softirq+0x1c/0x30
[226665.538446]  <EOI>  [<ffffffff810149e5>] do_softirq+0x55/0x90
[226665.544401]  [<ffffffff81060de4>] local_bh_enable_ip+0x94/0xa0
[226665.550416]  [<ffffffff815bc35b>] _raw_spin_unlock_bh+0x1b/0x40
[226665.556566]  [<ffffffffa025916f>] svc_get_next_xprt+0x14f/0x2e0 [sunrpc]
[226665.563463]  [<ffffffff8108f670>] ? wake_up_state+0x20/0x20
[226665.569240]  [<ffffffffa0259462>] svc_recv+0xc2/0x5b0 [sunrpc]
[226665.575273]  [<ffffffffa028070d>] nfsd+0xad/0x130 [nfsd]
[226665.580770]  [<ffffffffa0280660>] ? nfsd_destroy+0x80/0x80 [nfsd]
[226665.591739]  [<ffffffff8107eb60>] kthread+0xc0/0xd0
[226665.601444]  [<ffffffff8107eaa0>] ? kthread_create_on_node+0x110/0x110
[226665.612833]  [<ffffffff815c4e2c>] ret_from_fork+0x7c/0xb0
[226665.623082]  [<ffffffff8107eaa0>] ? kthread_create_on_node+0x110/0x110
[226665.634372] ---[ end trace 5ef16c9c5fed171c ]---

The warning comes from dma_ops_domain_unmap.

This machine is exporting via RDMA:

/srv/NFSoRDMA-1	172.31.0.0/16(rw,async,insecure,no_root_squash) rdma-*(rw,async,insecure,no_root_squash)

I have it mounted on the client via this fstab entry:

ib0-storage-01:/srv/NFSoRDMA-1	/srv/NFSoRDMA-1	nfs	defaults,rw,rdma,port=20049,rsize=32768,wsize=32768,hard,intr,nfsvers=3,noauto	1 2

...and then I ran this to generate some read traffic from it:

# dd if=/srv/NFSoRDMA-1/RHEL-7.0-20131016.1-Workstation-x86_64-live.iso of=/tmp/RHEL-7.0-20131016.1-Workstation-x86_64-live.iso bs=512 iflag=direct

...immediately after starting the dd, the warning popped up on the server and its serial console froze.

Comment 1 Jeff Layton 2013-12-09 13:17:40 UTC
FWIW, the exact same problem occurs on -60.el7 as well...

Comment 2 Jeff Layton 2013-12-11 20:35:33 UTC
I'll grab this for now, but it looks like this is happening down in the bowels of the page mgmt code...

Comment 3 Jeff Layton 2013-12-13 13:09:23 UTC
cc'ing Chuck since he's been looking at the NFSoRDMA code too...

Comment 4 Chuck Lever 2013-12-13 17:17:14 UTC
Nothing the client does should cause a server crash (not a predictive statement, but rather a statement of design goal).  So IMO this is a server bug.  In fact, I think this is probably closer to the silicon than it is to NFS.

What HBA are you using?  Can you also add info about IB software stack version?

If you can reproduce this on upstream kernels and there is a known working configuration, try to bisect the server's kernel.

Comment 5 Jeff Layton 2013-12-13 18:02:55 UTC
Bear with me. I'm not at all familiar with this code yet, and it's quite complex...

Agreed, this is definitely a server bug. I just posted the reproducer details since it is reliably reproducible. Note that I'm doing <PAGE_SIZE I/Os here. I need to go back and try page-sized DIOs from the client and see whether that works around the bug. If it does, that may tell us something about it.

It does seem like something in the bowels of the rdma code, but it's also possible that the RPC layer is passing down a bogus pagelist or something.

I think both the client and server are using Mellanox cards in this case.

On the software stack, I'm not sure. This is RHEL7 so it's fairly close to current Fedora. If you know the packages you want versions of, let me know. Doug may be able to tell us more.

Comment 6 Jeff Layton 2013-12-16 11:09:30 UTC
If I do the same dd command and use bs=4096, then there is no oops on the server. That's probably significant...

Comment 7 Jeff Layton 2013-12-16 11:32:42 UTC
...and yeah, confirmed. This IB network is:

    FDR fabric using a Mellanox 36-port 56 Gbit/s SwitchX SX6036 switch

These hosts are multiply connected over different fabrics, so maybe I'll give it a go with a different fabric/card driver when I get a few minutes.

Comment 8 Jeff Layton 2013-12-16 18:12:47 UTC
One thing that looks suspect here:

static void frmr_unmap_dma(struct svcxprt_rdma *xprt,
                           struct svc_rdma_fastreg_mr *frmr)
{
        int page_no;
        for (page_no = 0; page_no < frmr->page_list_len; page_no++) {
                dma_addr_t addr = frmr->page_list->page_list[page_no];
                if (ib_dma_mapping_error(frmr->mr->device, addr))
                        continue;
                atomic_dec(&xprt->sc_dma_used);
                ib_dma_unmap_page(frmr->mr->device, addr, PAGE_SIZE,
                                  frmr->direction);
        }
}


...the hardcoded PAGE_SIZE in there looks suspicious. Do we actually know that every entry was mapped with a full page length?

Comment 9 Jeff Layton 2013-12-16 18:54:19 UTC
So dma_ops_domain_unmap has this:

        pte += PM_LEVEL_INDEX(0, address);

        WARN_ON(!*pte);

        *pte = 0ULL;

...so maybe this means that there was a double unmapping, or that the pte ID'ed in that function was wrong...

Comment 10 Jeff Layton 2014-01-10 15:58:07 UTC
So if I add a "count=1" to the dd command then it all works fine. Same up to about count=3, but once I hit count=4 it fails. My suspicion at this point is that the bug is due to something like this:

We get a 512b READ request on the server. The server maps the corresponding page on the client and starts writing to it. While that's in flight, another 512b request arrives from the client and the server does the same thing, but it's mapping the same page. Eventually the first READ request finishes and the server removes the mapping while the other requests aren't done yet, so the DMA mapping is ripped out from under them.

AFAICT, svc_rdma_get_frmr/svc_rdma_put_frmr don't do any refcounting so there doesn't seem to be any way to know whether more than one request is referencing it. Adding that refcounting doesn't look *too* hard, fortunately...

Comment 11 Jeff Layton 2014-01-10 20:43:47 UTC
I can reproduce the same problem on f20 as well. That kernel however gives some interesting warnings before the WARN pops. Pretty much the same problem though:

[ 4545.538763] ------------[ cut here ]------------
[ 4545.538810] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x000f address=0x00000000111df000 flags=0x0020]
[ 4545.538816] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x000f address=0x00000000111df0e0 flags=0x0020]
[ 4545.538820] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x000f address=0x00000000111df040 flags=0x0020]
[ 4545.538824] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x000f address=0x00000000111df0c0 flags=0x0020]
[ 4545.538828] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x000f address=0x00000000111df080 flags=0x0020]
[ 4545.597252] WARNING: CPU: 0 PID: 0 at drivers/iommu/amd_iommu.c:2694 dma_ops_domain_unmap.part.11+0x62/0x70()
[ 4545.607307] Modules linked in: svcrdma cfg80211 rfkill ib_iser libiscsi scsi_transport_iscsi ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs kvm_amd ipmi_devintf mlx5_ib kvm ib_umad bnx2 ipmi_si mlx5_core amd64_edac_mod dcdbas microcode rdma_cm ipmi_msghandler edac_core sp5100_tco shpchp acpi_power_meter serio_raw i2c_piix4 k10temp ib_cm edac_mce_amd acpi_cpufreq iw_cm ib_addr ib_sa ib_mad ib_core nfsd auth_rpcgss nfs_acl lockd sunrpc mgag200 i2c_algo_bit drm_kms_helper ata_generic ttm pata_acpi drm i2c_core pata_atiixp
[ 4545.656796] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.12.6-300.fc20.x86_64 #1
[ 4545.664210] Hardware name: Dell Inc. PowerEdge R415/0GXH08, BIOS 2.0.2 10/22/2012
[ 4545.671799]  0000000000000009 ffff88081ee03d18 ffffffff816630c1 0000000000000000
[ 4545.679358]  ffff88081ee03d50 ffffffff810691dd ffff880812612ef8 0000000000000000
[ 4545.686916]  0000000000001000 00000000111df000 ffff880810bc4c00 ffff88081ee03d60
[ 4545.694474] Call Trace:
[ 4545.696955]  <IRQ>  [<ffffffff816630c1>] dump_stack+0x45/0x56
[ 4545.702824]  [<ffffffff810691dd>] warn_slowpath_common+0x7d/0xa0
[ 4545.708922]  [<ffffffff810692ba>] warn_slowpath_null+0x1a/0x20
[ 4545.714843]  [<ffffffff81538922>] dma_ops_domain_unmap.part.11+0x62/0x70
[ 4545.721645]  [<ffffffff81539e03>] __unmap_single.isra.19+0xa3/0x110
[ 4545.728005]  [<ffffffff8153a591>] unmap_page+0x51/0x70
[ 4545.733225]  [<ffffffffa0359f3b>] svc_rdma_unmap_dma+0xdb/0xf0 [svcrdma]
[ 4545.740026]  [<ffffffffa035b17e>] dto_tasklet_func+0xee/0x1c0 [svcrdma]
[ 4545.746741]  [<ffffffff81010000>] ? xen_smp_intr_free+0xf0/0x1d0
[ 4545.752836]  [<ffffffff8106e26e>] tasklet_action+0x6e/0x110
[ 4545.758489]  [<ffffffff8106e747>] __do_softirq+0xf7/0x240
[ 4545.763969]  [<ffffffff816739dc>] call_softirq+0x1c/0x30
[ 4545.769362]  [<ffffffff810146a5>] do_softirq+0x55/0x90
[ 4545.774575]  [<ffffffff8106ea25>] irq_exit+0xb5/0xc0
[ 4545.779615]  [<ffffffff816742d6>] do_IRQ+0x56/0xc0
[ 4545.784478]  [<ffffffff81669fad>] common_interrupt+0x6d/0x6d
[ 4545.790219]  <EOI>  [<ffffffff8104e486>] ? native_safe_halt+0x6/0x10
[ 4545.796698]  [<ffffffff8101b0bf>] default_idle+0x1f/0xc0
[ 4545.802088]  [<ffffffff8101b996>] arch_cpu_idle+0x26/0x30
[ 4545.811583]  [<ffffffff810b9845>] cpu_startup_entry+0xe5/0x280
[ 4545.821498]  [<ffffffff816593f7>] rest_init+0x77/0x80
[ 4545.830620]  [<ffffffff81d1cef7>] start_kernel+0x40a/0x415
[ 4545.840161]  [<ffffffff81d1c8f6>] ? repair_env_string+0x5c/0x5c
[ 4545.850048]  [<ffffffff81d1c120>] ? early_idt_handlers+0x120/0x120
[ 4545.860116]  [<ffffffff81d1c5de>] x86_64_start_reservations+0x2a/0x2c
[ 4545.870421]  [<ffffffff81d1c6e8>] x86_64_start_kernel+0x108/0x117
[ 4545.880309] ---[ end trace 24fce75e1a28ae2f ]---

Comment 12 Jeff Layton 2014-01-13 13:06:31 UTC
...and I can't reproduce this against RHEL6, so it seems likely that something crept in after 2.6.32 to cause this regression. I'll try bisecting to track it down.

Comment 13 Jeff Layton 2014-01-13 19:03:59 UTC
Still inconclusive. Apparently I can reproduce the problem only with certain IB drivers. I checked out a third machine and am unable to reproduce the hang on it.

At this point, I think I need help from someone who understands IB better than I do...

Comment 14 Jeff Layton 2014-01-13 19:18:44 UTC
The one where I can reproduce the bug has this card:

    01:00.0 InfiniBand: Mellanox Technologies MT27600 [Connect-IB]

...the other one has this:

    01:00.0 InfiniBand: Mellanox Technologies MT25208 InfiniHost III Ex (Tavor compatibility mode) (rev a0)

Apparently, RHEL6 doesn't have drivers for the Connect-IB card as it's not detected when I do a RHEL6 install on it.

I suspect I can't reproduce it on the other cards because they may not support FRMR.

Comment 15 Jeff Layton 2014-01-15 18:51:14 UTC
Ok, finally started doing some debugging. The difference is that the two hosts are using completely different codepaths. The problematic one is using the "fastreg" codepath, whereas the other is not.

Now to examine the differences and see if I can determine why it's problematic.

Comment 16 Jeff Layton 2014-01-15 19:28:10 UTC
Not directly related to this bug, but the client also doesn't seem to cope well with the server crashing and coming back:

RPC:       xprt_rdma_close: closing
RPC:       disconnected transport ffff880821c25000
RPC:       rpcrdma_conn_upcall: disconnected: 172.31.0.204:20049 (ep 0xffff880821c255e0 event 0xa)
RPC:       rpcrdma_conn_upcall: disconnected
rpcrdma: connection to 172.31.0.204:20049 closed (-103)
RPC:       rpcrdma_ep_disconnect: after wait, disconnected
RPC:       wake_up_first(ffff880821c25190 "xprt_sending")
RPC:       rpcrdma_conn_upcall: unexpected CM event 15

...that event 15 in the last line is (I think) RDMA_CM_EVENT_TIMEWAIT_EXIT, which doesn't seem to be handled properly by rpcrdma_conn_upcall.

Comment 19 Jeff Layton 2014-01-29 15:48:51 UTC
So, my theory on this bug is that we map a page in order to RDMA WRITE it to another client and then it is somehow unmapped more than once.

There is some raciness involved here too, and turning up rpc_debug tends to throw off the timing. So I did a Q&D patch that converts the dfprintk() macro to use trace_printk() to lower the overhead, and added some debugging around ib_dma_map_page and ib_dma_unmap_page (basically I wrapped those functions and added some dprintks to the wrappers).

I then ran the test from a client that had mounted it:

    # dd if=/file/on/nfsordma of=/dev/null bs=512 count=4 iflag=direct

...that eventually gave me the typical WARN pop on the host that uses FRMR:

[  175.144493] ------------[ cut here ]------------
[  175.144540] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x000f address=0x0000000010a35000 flags=0x0020]
[  175.144546] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x000f address=0x0000000010a350c0 flags=0x0020]
[  175.144550] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x000f address=0x0000000010a35080 flags=0x0020]
[  175.144553] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x000f address=0x0000000010a35040 flags=0x0020]
[  175.144557] AMD-Vi: Event logged [IO_PAGE_FAULT device=01:00.0 domain=0x000f address=0x0000000010a350e0 flags=0x0020]
[  175.202993] WARNING: CPU: 0 PID: 0 at drivers/iommu/amd_iommu.c:2694 dma_ops_domain_unmap.part.11+0x62/0x70()
[  175.213048] Modules linked in: svcrdma(OF) cfg80211 rfkill ib_iser libiscsi scsi_transport_iscsi ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm mlx5_ib ib_ucm ib_uverbs ib_umad mlx5_core kvm_amd amd64_edac_mod rdma_cm edac_core kvm sp5100_tco ipmi_devintf dcdbas bnx2 microcode serio_raw k10temp edac_mce_amd ib_cm i2c_piix4 shpchp iw_cm ib_addr ib_sa ipmi_si ipmi_msghandler ib_mad acpi_power_meter ib_core acpi_cpufreq nfsd(OF) auth_rpcgss(OF) nfs_acl lockd(OF) sunrpc(OF) mgag200 i2c_algo_bit drm_kms_helper ata_generic ttm pata_acpi drm i2c_core pata_atiixp
[  175.264433] CPU: 0 PID: 0 Comm: swapper/0 Tainted: GF          O 3.12.7-300.fc20.x86_64 #1
[  175.272817] Hardware name: Dell Inc. PowerEdge R415/0GXH08, BIOS 2.0.2 10/22/2012
[  175.280410]  0000000000000009 ffff88081ee03ce0 ffffffff8166799b 0000000000000000
[  175.287981]  ffff88081ee03d18 ffffffff8106921d ffff8807f58a41a8 0000000000000000
[  175.295550]  0000000000001000 0000000010a35000 ffff880813687800 ffff88081ee03d28
[  175.303120] Call Trace:
[  175.305606]  <IRQ>  [<ffffffff8166799b>] dump_stack+0x45/0x56
[  175.311483]  [<ffffffff8106921d>] warn_slowpath_common+0x7d/0xa0
[  175.317582]  [<ffffffff810692fa>] warn_slowpath_null+0x1a/0x20
[  175.323505]  [<ffffffff8153dcc2>] dma_ops_domain_unmap.part.11+0x62/0x70
[  175.330307]  [<ffffffff8153f1a3>] __unmap_single.isra.19+0xa3/0x110
[  175.336670]  [<ffffffff8153f931>] unmap_page+0x51/0x70
[  175.341893]  [<ffffffffa0437049>] svc_rdma_unmap_page+0x89/0xe0 [svcrdma]
[  175.348784]  [<ffffffffa043712c>] svc_rdma_unmap_dma+0x8c/0xb0 [svcrdma]
[  175.355588]  [<ffffffffa04382fe>] dto_tasklet_func+0xee/0x1c0 [svcrdma]
[  175.362305]  [<ffffffff81010000>] ? xen_smp_intr_free+0xf0/0x1d0
[  175.368403]  [<ffffffff8106e2ae>] tasklet_action+0x6e/0x110
[  175.374059]  [<ffffffff8106e787>] __do_softirq+0xf7/0x240
[  175.379542]  [<ffffffff8167821c>] call_softirq+0x1c/0x30
[  175.384935]  [<ffffffff810146a5>] do_softirq+0x55/0x90
[  175.390153]  [<ffffffff8106ea65>] irq_exit+0xb5/0xc0
[  175.395194]  [<ffffffff81678b16>] do_IRQ+0x56/0xc0
[  175.400060]  [<ffffffff8166e7ed>] common_interrupt+0x6d/0x6d
[  175.405800]  <EOI>  [<ffffffff8104e486>] ? native_safe_halt+0x6/0x10
[  175.412287]  [<ffffffff8101b0bf>] default_idle+0x1f/0xc0
[  175.421714]  [<ffffffff8101b996>] arch_cpu_idle+0x26/0x30
[  175.431200]  [<ffffffff810b9905>] cpu_startup_entry+0xe5/0x280
[  175.441111]  [<ffffffff8165dcf7>] rest_init+0x77/0x80
[  175.450215]  [<ffffffff81d1def7>] start_kernel+0x40a/0x415
[  175.459661]  [<ffffffff81d1d8f6>] ? repair_env_string+0x5c/0x5c
[  175.469468]  [<ffffffff81d1d120>] ? early_idt_handlers+0x120/0x120
[  175.479496]  [<ffffffff81d1d5de>] x86_64_start_reservations+0x2a/0x2c
[  175.489745]  [<ffffffff81d1d6e8>] x86_64_start_kernel+0x108/0x117
[  175.499582] ---[ end trace c61a3ae8cf3c7e75 ]---

...and then 1s later, an oops in svc_rdma_xdr_encode_error():

[  175.508036] general protection fault: 0000 [#1] SMP 
[  175.865830] Modules linked in: svcrdma(OF) cfg80211 rfkill ib_iser libiscsi scsi_transport_iscsi ib_srp scsi_transport_srp scsi_tgt ib_ipoib rdma_ucm mlx5_ib ib_ucm ib_uverbs ib_umad mlx5_core kvm_amd amd64_edac_mod rdma_cm edac_core kvm sp5100_tco ipmi_devintf dcdbas bnx2 microcode serio_raw k10temp edac_mce_amd ib_cm i2c_piix4 shpchp iw_cm ib_addr ib_sa ipmi_si ipmi_msghandler ib_mad acpi_power_meter ib_core acpi_cpufreq nfsd(OF) auth_rpcgss(OF) nfs_acl lockd(OF) sunrpc(OF) mgag200 i2c_algo_bit drm_kms_helper ata_generic ttm pata_acpi drm i2c_core pata_atiixp
[  176.496096] CPU: 2 PID: 1246 Comm: nfsd Tainted: GF       W  O 3.12.7-300.fc20.x86_64 #1
[  176.630502] Hardware name: Dell Inc. PowerEdge R415/0GXH08, BIOS 2.0.2 10/22/2012
[  176.762797] task: ffff8807f5cd0840 ti: ffff8807f5dee000 task.ti: ffff8807f5dee000
[  176.893238] RIP: 0010:[<ffffffffa0438c75>]  [<ffffffffa0438c75>] svc_rdma_xdr_encode_error+0x5/0x50 [svcrdma]
[  177.026320] RSP: 0018:ffff8807f5defd80  EFLAGS: 00010286
[  177.151415] RAX: ffff880000000000 RBX: dead000000200200 RCX: ffff8807f5cc9000
[  177.276665] RDX: 0000000000000001 RSI: dead000000200200 RDI: ffff8807f809e400
[  177.398001] RBP: ffff8807f5defe08 R08: 0000000000000001 R09: 0000000000000000
[  177.517886] R10: 0000000000004afe R11: ffff8808134c2488 R12: ffff8807f809e400
[  177.636805] R13: ffffea001fd73240 R14: 0000000000000001 R15: ffff88080f915800
[  177.754753] FS:  00007fc84e776700(0000) GS:ffff88081ee40000(0000) knlGS:0000000000000000
[  177.875485] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  177.993077] CR2: 00007fc840002e20 CR3: 00000007fda10000 CR4: 00000000000007e0
[  178.112235] Stack:
[  178.224480]  ffffffffa0438715 005ff00000000000 ffffea001fd73240 ffff88081ee579a0
[  178.345609]  0000000000000000 ffff8807f5defde0 0000000000000297 ffffea001fd73240
[  178.468151]  ffffea002046e7c0 ffff8807f66dc000 00000000000000e4 ffff88080f915800
[  178.590292] Call Trace:
[  178.705934]  [<ffffffffa0438715>] ? svc_rdma_send_error+0x55/0x1f0 [svcrdma]
[  178.829480]  [<ffffffffa043b41a>] svc_rdma_recvfrom+0x4ca/0x10b0 [svcrdma]
[  178.952854]  [<ffffffffa00c60d0>] svc_recv+0x490/0x5d0 [sunrpc]
[  179.075332]  [<ffffffff81077fdb>] ? recalc_sigpending+0x1b/0x50
[  179.197764]  [<ffffffffa010570d>] nfsd+0xad/0x130 [nfsd]
[  179.318980]  [<ffffffffa0105660>] ? nfsd_destroy+0x80/0x80 [nfsd]
[  179.441501]  [<ffffffff8108b110>] kthread+0xc0/0xd0
[  179.562652]  [<ffffffff8108b050>] ? insert_kthread_work+0x40/0x40
[  179.686362]  [<ffffffff816768bc>] ret_from_fork+0x7c/0xb0
[  179.809329]  [<ffffffff8108b050>] ? insert_kthread_work+0x40/0x40
[  179.933761] Code: c3 48 8b 35 4e 5a 00 00 48 89 fa 31 c0 48 c7 c7 4b 8c 43 a0 e8 3d 47 ce e0 e9 52 ff ff ff 0f 1f 84 00 00 00 00 00 66 66 66 66 90 <8b> 06 55 83 fa 01 48 89 e5 0f c8 89 01 8b 46 04 0f c8 89 41 04 
[  180.206129] RIP  [<ffffffffa0438c75>] svc_rdma_xdr_encode_error+0x5/0x50 [svcrdma]
[  180.338961]  RSP <ffff8807f5defd80>
[  180.466419] ---[ end trace c61a3ae8cf3c7e76 ]---


...looking at the original warning, we see this value on the stack:

    0000000010a35000

...which is the address shown in the first line of the WARN messages. If I then go back through my debug output, I see this:

            nfsd-1240  [001] ....   137.007907: svc_rdma_map_page: svc_rdma_map_page: dev=0xffff880808660000 addr=0x10a35000 page=0xffffea002046e7c0 offset=0 size=4096 dir=2
          <idle>-0     [000] ..s.   175.144489: svc_rdma_unmap_page: svc_rdma_unmap_page: dev=0xffff880808660000 addr=0x10a35000 size=4096 dir=2

...so the mapping and unmapping have a 1:1 correspondence there, which means it doesn't seem to be the RPC layer unmapping more often than it should. Does FRMR do some sort of "automatic" unmapping of the pages?

I think I need to fix my debugging to prove this point one way or the other...

Comment 20 Jeff Layton 2014-02-18 14:27:28 UTC
Created attachment 864576 [details]
warning messages from v3.13.3 kernel

Updated the box where I'm seeing the panic to 3.13.3-201.fc20.x86_64. I still see the WARN messages, but I didn't see things hang. Note that this output has some extra debug messages. I added some wrappers around ib_dma_map_page and ib_dma_unmap_page called svc_rdma_map_page and svc_rdma_unmap_page, and put some dprintks in there to show the arguments and results.

Comment 21 Jeff Layton 2014-02-18 14:56:30 UTC
One thing looks strange, though again I don't understand this code all that well. Here is the mapping phase for some of the pages:

[  149.426913] svc_rdma_map_page: dev=0xffff880811340000 addr=0x11b4b000 page=0xffffea0000dce800 offset=0 size=4096 dir=1
[  149.426917] svc_rdma_map_page: dev=0xffff880811340000 addr=0x11b4c800 page=0xffffea0002e52a40 offset=2048 size=2048 dir=1
[  149.426921] svc_rdma_map_page: dev=0xffff880811340000 addr=0x11b4d000 page=0xffffea001fcc6480 offset=0 size=512 dir=1
[  149.426956] svc_rdma_map_page: dev=0xffff880811340000 addr=0x11b4f000 page=0xffffea0000dc8600 offset=0 size=64 dir=1

...and here is the unmapping:

[  149.426971] svc_rdma_unmap_page: dev=0xffff880811340000 addr=0x11b4f000 size=64 dir=1
[  149.426974] svc_rdma_unmap_page: dev=0xffff880811340000 addr=0x11b4b000 size=4096 dir=1
[  149.426977] svc_rdma_unmap_page: dev=0xffff880811340000 addr=0x11b4c800 size=4096 dir=1
[  149.426980] svc_rdma_unmap_page: dev=0xffff880811340000 addr=0x11b4d000 size=4096 dir=1

...the last one triggers the WARN. The thing I notice is that the unmap sizes don't match the map sizes: entries that were mapped with 2048 and 512 bytes are unmapped as 4096. That doesn't always seem to trigger the issue, but maybe the unmap is treading into an area that was never mapped?

Comment 23 Ludek Smid 2014-06-26 10:50:04 UTC
This request was resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you have further questions about the request.

Comment 24 Ludek Smid 2014-06-26 11:14:48 UTC
The comment above is incorrect. The correct version is below.
I'm sorry for any inconvenience.
---------------------------------------------------------------

This request was NOT resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you need
to escalate this bug.

Comment 25 Larry Troan 2015-11-06 21:02:15 UTC
Likely a DUP of Bug 1272171. May also be a DUP of Bug 1262728.

