Description of problem:
The client crashes when attempting to read over pNFS when mounted with rsize < 4096. This problem does not occur with NFSv4.1 (non-pNFS), NFSv4.0, NFSv3 or NFSv2. It also occurs only with reads; pNFS writes of any size work fine.

Version-Release number of selected component (if applicable):
RHEL6.2 Beta 2.6.32-220.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Mount with rsize < 4096
   mount -t nfs4 -o minorversion=1,hard,intr,rsize=1024 server:/ /mnt/t
2. Read file
   cat /mnt/t/testfile

Actual results:
The client crashes

Expected results:
Read should succeed

Additional info:
RIP: 0010:[<ffffffffa04928df>] [<ffffffffa04928df>] xdr_set_page_base+0x4f/0xb0 [sunrpc]
RSP: 0018:ffff88003c6e5638 EFLAGS: 00010246
RAX: 0000160000000000 RBX: ffff88003bc76c00 RCX: 0000000000000000
RDX: 0000000000000058 RSI: 0000000000000000 RDI: ffff88003c6e56c8
RBP: ffff88003c6e5638 R08: 0000000000000010 R09: 0000000000000058
R10: 0000000000000000 R11: 00000000000000c0 R12: ffff88003c6e5758
R13: ffff88003c6e5688 R14: 00000000000000d0 R15: 00000000000080d0
FS: 00007fbbb4f12700(0000) GS:ffff880002200000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000010 CR3: 000000003b77e000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process cat (pid: 3146, threadinfo ffff88003c6e4000, task ffff88003b632b00)
Stack:
 ffff88003c6e5648 ffffffffa0492a51 ffff88003c6e5738 ffffffffa00ee049
<0> 0000000000015fc0 ffff88003c2e22c0 00000000ffffffff ffff88003c2e22c0
<0> ffffea0000ca7a58 000000d0814eca40 0000000000000000 0000000000000000
Call Trace:
 [<ffffffffa0492a51>] xdr_init_decode+0x61/0x70 [sunrpc]
 [<ffffffffa00ee049>] filelayout_decode_layout+0x99/0x3f0 [nfs_layout_nfsv41_files]
 [<ffffffffa00ee44d>] filelayout_alloc_lseg+0xad/0x4d0 [nfs_layout_nfsv41_files]
 [<ffffffffa057be15>] pnfs_layout_process+0xa5/0x360 [nfs]
 [<ffffffff81090c30>] ? wake_bit_function+0x0/0x50
 [<ffffffffa056566d>] nfs4_proc_layoutget+0xcd/0x130 [nfs]
 [<ffffffffa057d34b>] pnfs_update_layout+0x55b/0x790 [nfs]
 [<ffffffffa0555597>] nfs_pagein_multi+0xd7/0x1e0 [nfs]
 [<ffffffffa0555780>] ? readpage_async_filler+0x0/0x160 [nfs]
 [<ffffffffa05528e9>] nfs_pageio_doio+0x19/0x60 [nfs]
 [<ffffffffa05529eb>] nfs_pageio_add_request+0x5b/0x130 [nfs]
 [<ffffffffa0555808>] readpage_async_filler+0x88/0x160 [nfs]
 [<ffffffffa0555780>] ? readpage_async_filler+0x0/0x160 [nfs]
 [<ffffffff81126d82>] read_cache_pages+0xa2/0xf0
 [<ffffffffa0554ab1>] nfs_readpages+0x171/0x250 [nfs]
 [<ffffffffa05554c0>] ? nfs_pagein_multi+0x0/0x1e0 [nfs]
 [<ffffffffa057d600>] ? pnfs_read_pg_test+0x0/0x80 [nfs]
 [<ffffffff81126b25>] __do_page_cache_readahead+0x185/0x210
 [<ffffffff81126bd1>] ra_submit+0x21/0x30
 [<ffffffff81126f45>] ondemand_readahead+0x115/0x240
 [<ffffffff81127163>] page_cache_sync_readahead+0x33/0x50
 [<ffffffff81112778>] generic_file_aio_read+0x558/0x700
 [<ffffffff811458fd>] ? page_add_new_anon_rmap+0x9d/0xf0
 [<ffffffffa05470da>] nfs_file_read+0xca/0x130 [nfs]
 [<ffffffff8117641a>] do_sync_read+0xfa/0x140
 [<ffffffff81090bf0>] ? autoremove_wake_function+0x0/0x40
 [<ffffffff8117b6b4>] ? cp_new_stat+0xe4/0x100
 [<ffffffff8121902b>] ? selinux_file_permission+0xfb/0x150
 [<ffffffff8120c3c6>] ? security_file_permission+0x16/0x20
 [<ffffffff81176e15>] vfs_read+0xb5/0x1a0
 [<ffffffff810d46e2>] ? audit_syscall_entry+0x272/0x2a0
 [<ffffffff81176f51>] sys_read+0x51/0x90
 [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
Code: f1 01 f0 41 89 c0 41 c1 e8 0c 45 89 c0 49 c1 e0 03 4c 03 41 20 89 c1 48 b8 00 00 00 00 00 16 00 00 81 e1 ff 0f 00 00 4c 89 47 30 <49> 03 00 49 b8 b7 6d db b6 6d db b6 6d 48 c7 47 18 00 00 00 00
RIP [<ffffffffa04928df>] xdr_set_page_base+0x4f/0xb0 [sunrpc]
 RSP <ffff88003c6e5638>
CR2: 0000000000000010
---[ end trace e6f5483a165c0179 ]---
Kernel panic - not syncing: Fatal exception
Pid: 3146, comm: cat Tainted: G D ---------------- 2.6.32-220.el6.x86_64 #1
Call Trace:
 [<ffffffff814ec341>] ? panic+0x78/0x143
 [<ffffffff814f04d4>] ? oops_end+0xe4/0x100
 [<ffffffff8104230b>] ? no_context+0xfb/0x260
 [<ffffffffa01d38be>] ? e1000_xmit_frame+0x9ce/0x1110 [e1000]
Just curious... What server are you using?
Since RHEL 6.3 External Beta has begun, and this bug remains unresolved, it has been rejected as it is not proposed as exception or blocker. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux.
Created attachment 582994
NFSv4.1 fix page number calculation bug for filelayout decode buffers

Fixes bug.
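The patch body is not quoted in this thread, so the following is only a minimal sketch of the class of error the patch title names, not the patch itself. It assumes (unconfirmed here) that the number of pages for the layout decode buffer was derived from a byte size by truncating to whole pages, which gives zero pages for any size below one page. The oops is at least consistent with that: CR2 and R08 are both 0x10, which is the value the kernel returns for a zero-length kmalloc (ZERO_SIZE_PTR). The macro and function names below are hypothetical and exist only for illustration; 4 KiB pages are assumed, as on x86_64.

```c
#include <stddef.h>

/* Assumption: 4 KiB pages, as on the x86_64 client in this report. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  ((size_t)1 << SKETCH_PAGE_SHIFT)

/*
 * Hypothetical helper: truncating conversion from bytes to pages.
 * Any size smaller than one page yields 0, so a buffer sized this
 * way would have no pages at all when the size is 1024.
 */
static size_t pages_truncating(size_t bytes)
{
    return bytes >> SKETCH_PAGE_SHIFT;
}

/*
 * Hypothetical helper: rounding-up conversion from bytes to pages.
 * Any non-zero size yields at least one page.
 */
static size_t pages_rounded_up(size_t bytes)
{
    return (bytes + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT;
}
```

With a size of 1024 the truncating form returns 0 and the rounding form returns 1; at 4096 and above in whole pages the two agree, which would match the report that only rsize < 4096 triggers the crash.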
Created attachment 583024
A backported version of the upstream patch
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux release for currently deployed products. This request is not yet committed for inclusion in a release.
For the reproducer, refer to comment 0. If no pNFS server is available, try using pynfs.
Patch(es) available on kernel-2.6.32-288.el6
This bug is verified on rhel6.4

[root@dell-p690-01 ~]# mount -o vers=4,minorversion=1,rsize=1024,intr,hard netapp-qe2.lab.bos.redhat.com:/export/qe-test /mnt/test
[root@dell-p690-01 ~]# dd if=/dev/zero of=/mnt/test/img bs=1M count=1
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.30428 s, 3.4 MB/s
[root@dell-p690-01 ~]# umount /mnt/test
[root@dell-p690-01 ~]# mount -o vers=4,minorversion=1,rsize=1024,intr,hard netapp-qe2.lab.bos.redhat.com:/export/qe-test /mnt/test
[root@dell-p690-01 ~]# cat /mnt/test/img > /dev/zero
[root@dell-p690-01 ~]# uname -a
Linux dell-p690-01.rhts.eng.bos.redhat.com 2.6.32-355.el6.x86_64 #1 SMP Tue Jan 15 17:45:38 EST 2013 x86_64 x86_64 x86_64 GNU/Linux
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2013-0496.html