Bug 2028370
| Summary: | [xfstests/nfs generic/476] test never finishes | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 9 | Reporter: | Boyang Xue <bxue> |
| Component: | kernel | Assignee: | Jeff Layton <jlayton> |
| kernel sub component: | NFS | QA Contact: | Boyang Xue <bxue> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | medium | | |
| Priority: | medium | CC: | chuck.lever, hshuai, jlayton, xiagao, xzhou, yoyang |
| Version: | 9.0 | Keywords: | Triaged |
| Target Milestone: | rc | | |
| Target Release: | 9.1 | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | kernel-5.14.0-150.el9 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-11-15 10:51:55 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Deadline: | 2022-08-15 | | |
Description
Boyang Xue
2021-12-02 08:32:25 UTC

---

I also hit this issue on virtiofs with xfstest + nfs.

nfs info:

```
127.0.0.1:/mnt/virtio_fs1_test_nfs on /tmp/virtio_fs1_test type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=127.0.0.1,local_lock=none,addr=127.0.0.1)
127.0.0.1:/mnt/virtio_fs2_test_nfs on /tmp/virtio_fs2_test type nfs4 (rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=127.0.0.1,local_lock=none,addr=127.0.0.1)
```

pkg version:

```
kernel-5.14.0-39.el9.x86_64 (guest & host)
qemu-kvm-6.2.0-1.el9.x86_64
```

Besides the generic/476 test case, generic/531 also never finished on virtiofs.

@bxue, does generic/531 work on your side?

---

(In reply to xiagao from comment #1)
> Besides the generic/476 test case, generic/531 also never finished on virtiofs.
>
> @bxue, does generic/531 work on your side?

Thanks for sharing this information!

There are some runs where generic/531 hangs on nfs, like
https://beaker.engineering.redhat.com/recipes/11180377 # kernel-5.14.0-37.el9, x86_64
but it's not 100%; at least none of my runs on x86_64 VMs hit this problem.

---

(In reply to Boyang Xue from comment #2)
> There are some runs where generic/531 hangs on nfs [...]
> but it's not 100%; at least none of my runs on x86_64 VMs hit this problem.

Boyang, thanks for your quick response. In my test env, generic/650 also seems to have experienced the same issue.

---

Yes, I have a few generic/650 runs that never finished, but it had no problem in other runs. Since generic/476, generic/531 and generic/650 are all stress tests, which depend heavily on the spec/performance of the system, I think we have to take a closer look to confirm whether they're real problems in general. For generic/476, there's an almost 100% chance the test never finishes in my testing, but for generic/531 and generic/650 I still have batches of jobs that finished without problems.

---

That's a test I usually skip upstream, unfortunately, because it takes an hour or so on my test setup.

Are you booting both client and server to the same kernel? If so, one simple way to narrow down the problem would be to try each combination (RHEL8 client, RHEL9 server, etc.).

---

Hi, I tested xfstests/nfs: generic/476 on fx700 with RHEL-9.0.0-20220307.0 (aarch64), and it passed:
`generic/476 1795s ... 1783s`

xfstests/nfs: generic/476 also passed on x86_64 (kernel-5.14.0-70.el9.x86_64):

```
2022-03-15 01:28:32: generic/476
2022-03-15 01:37:01: 509s
```

But generic/531 never finished.

---

generic/476 still hangs in my tests on kernel-5.14.0-72.el9 on all arches, no matter whether in the multi-host or single-host configuration. The local.config for multi-host is like:

```
FSTYP=nfs
TEST_DEV=sweetpig-17.4a2m.lab.eng.bos.redhat.com:/mnt/xfstests/test/nfs-server
TEST_DIR=/mnt/xfstests/test/nfs-client
SCRATCH_DEV=sweetpig-17.4a2m.lab.eng.bos.redhat.com:/mnt/xfstests/scratch/nfs-server
SCRATCH_MNT=/mnt/xfstests/scratch/nfs-client
MOUNT_OPTIONS="-o vers=4.2"
TEST_FS_MOUNT_OPTS="-o vers=4.2"
```
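
The local.config above only describes the client side; the server-side export is not shown anywhere in this report. A minimal sketch of the kind of export it assumes, using the paths from the config above (the export options and service setup are assumptions, not the actual test-harness configuration):

```
# Hypothetical server-side setup matching the multi-host local.config above.
mkdir -p /mnt/xfstests/test/nfs-server /mnt/xfstests/scratch/nfs-server
cat >> /etc/exports <<'EOF'
/mnt/xfstests/test/nfs-server    *(rw,no_root_squash)
/mnt/xfstests/scratch/nfs-server *(rw,no_root_squash)
EOF
exportfs -ra
systemctl enable --now nfs-server
```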

---

generic/476 works just fine for me:

```
[jlayton@centos9 xfstests-dev]$ sudo ./check generic/476
FSTYP         -- nfs
PLATFORM      -- Linux/x86_64 centos9 5.14.0-125.mr1113_220707_1634.el9.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Jul 7 16:48:49 UTC 2022
MKFS_OPTIONS  -- ceph:/export/scratch
MOUNT_OPTIONS -- -o context=system_u:object_r:root_t:s0 ceph:/export/scratch /mnt/scratch

generic/476 1512s
Ran: generic/476
Passed all 1 tests
```

My local.config:

```
export TEST_DIR=/mnt/test
export SCRATCH_MNT=/mnt/scratch
export FSTYP=nfs
export TEST_DEV="ceph:/export/test"
export SCRATCH_DEV="ceph:/export/scratch"
```

I didn't explicitly specify '-o vers=4.2' but that was what the client and server negotiated during the test.

Boyang, are you still seeing problems with this test?

---

Yes. I'm still seeing this problem.

```
[root@kvm103 ~]# cat repo_xfstests-dev/local.config
FSTYP=nfs
TEST_DIR=/test
TEST_DEV=localhost:/test_TMP
SCRATCH_MNT=/scratch
SCRATCH_DEV=localhost:/scratch_TMP
LOGWRITES_MNT=/logwrites
LOGWRITES_DEV=/dev/vda6
#MKFS_OPTIONS="-b 4096"
MOUNT_OPTIONS="-o vers=4.2"
TEST_FS_MOUNT_OPTS="-o vers=4.2"

[root@kvm103 repo_xfstests-dev]# ./check -nfs generic/476
FSTYP         -- nfs
PLATFORM      -- Linux/x86_64 kvm103 5.14.0-127.el9.x86_64 #1 SMP PREEMPT_DYNAMIC Sat Jul 9 08:20:22 EDT 2022
MKFS_OPTIONS  -- localhost:/scratch_TMP
MOUNT_OPTIONS -- -o vers=4.2 -o context=system_u:object_r:root_t:s0 localhost:/scratch_TMP /scratch

generic/476     (...stuck here...)

# ps uxf
root  3722  0.0  0.2  10064  6540 pts/0  S+  04:37  0:00  | \_ /bin/bash ./check -nfs generic/476
root  3980  0.0  0.2   6724  5544 pts/0  S+  04:38  0:00  | \_ /bin/bash /root/repo_xfstests-dev/tests/generic/476
root  4149  0.0  0.0   2728   832 pts/0  S   04:38  0:00  | \_ ./ltp/fsstress -w -d /scratch -n 100000 -p 4
root  4151  0.1  0.2   6784  5320 pts/0  D   04:38  0:44  | \_ ./ltp/fsstress -w -d /scratch -n 100000 -p 4
root  4152  0.1  0.2   6760  5240 pts/0  D   04:38  0:42  | \_ ./ltp/fsstress -w -d /scratch -n 100000 -p 4

[root@kvm103 ~]# cat /proc/4151/stack
[<0>] nfs_direct_wait+0x19/0x40 [nfs]
[<0>] nfs_file_direct_write+0x248/0x340 [nfs]
[<0>] new_sync_write+0x11f/0x1b0
[<0>] vfs_write+0x1eb/0x280
[<0>] ksys_write+0x5f/0xe0
[<0>] do_syscall_64+0x5c/0x80
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xae
```

This was running with localhost exported nfs. I'm running the same test on remote exported nfs now to see if the result changes.

---

That stack trace shows the client waiting on a WRITE reply from the server. What we'd probably need for you to do is track down the nfsd thread that is servicing the request and see why it's stuck.

Running tests on an NFS server exported over localhost can be problematic for anything that stresses the memory on the machine. It's quite possible to hit a deadlock where the server needs memory to satisfy a request, and tries to flush dirty NFS pages to free some up, but the server is still stuck waiting for memory and can't process the request.

In general, running simple functionality tests over localhost that don't stress the client or server is usually fine, but you want to avoid running anything too intensive in this sort of configuration to avoid that sort of deadlock.
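
One straightforward way to follow the suggestion above and find the nfsd thread servicing the stuck request is to dump the kernel stacks of all nfsd threads on the server. A minimal sketch, assuming a root shell on the server:

```
# Dump the kernel stack of every nfsd kernel thread; run it a few times --
# a thread whose stack never changes is the one worth looking at.
for pid in $(pgrep -x nfsd); do
    echo "=== nfsd pid $pid ==="
    cat /proc/"$pid"/stack
done
```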

---

Hi Jeff,

Thanks for the explanation. I'm trying to provide more information here.

First, here's the localhost configuration. When the test gets stuck as described in comment #14, the stack trace of the nfsd is changing, not fixed:

```
[root@kvm103 ~]# cat /proc/3716/stack
[<0>] svc_get_next_xprt+0x10b/0x1d0 [sunrpc]
[<0>] svc_recv+0x8e/0x1b0 [sunrpc]
[<0>] nfsd+0xdb/0x150 [nfsd]
[<0>] kthread+0x149/0x170
[<0>] ret_from_fork+0x22/0x30

[root@kvm103 ~]# cat /proc/3716/stack
[<0>] xlog_wait_on_iclog+0x117/0x180 [xfs]
[<0>] xfs_log_force_seq+0x8f/0x150 [xfs]
[<0>] xfs_file_fsync+0x149/0x210 [xfs]
[<0>] xfs_file_buffered_write+0x25d/0x2a0 [xfs]
[<0>] do_iter_readv_writev+0x152/0x1b0
[<0>] do_iter_write+0x7d/0x150
[<0>] nfsd_vfs_write+0x10b/0x680 [nfsd]
[<0>] nfsd4_write+0x143/0x1d0 [nfsd]
[<0>] nfsd4_proc_compound+0x3b2/0x6c0 [nfsd]
[<0>] nfsd_dispatch+0x149/0x270 [nfsd]
[<0>] svc_process_common+0x3b8/0x5e0 [sunrpc]
[<0>] svc_process+0xb7/0xf0 [sunrpc]
[<0>] nfsd+0xed/0x150 [nfsd]
[<0>] kthread+0x149/0x170
[<0>] ret_from_fork+0x22/0x30

[root@kvm103 ~]# cat /proc/3716/stack
[<0>] __flush_work.isra.0+0x160/0x220
[<0>] xlog_cil_force_seq+0x6d/0x210 [xfs]
[<0>] xfs_log_force_seq+0x72/0x150 [xfs]
[<0>] xfs_file_fsync+0x149/0x210 [xfs]
[<0>] nfsd_commit+0x1a1/0x200 [nfsd]
[<0>] nfsd4_proc_compound+0x3b2/0x6c0 [nfsd]
[<0>] nfsd_dispatch+0x149/0x270 [nfsd]
[<0>] svc_process_common+0x3b8/0x5e0 [sunrpc]
[<0>] svc_process+0xb7/0xf0 [sunrpc]
[<0>] nfsd+0xed/0x150 [nfsd]
[<0>] kthread+0x149/0x170
[<0>] ret_from_fork+0x22/0x30
```

There's a 50% write error rate on the SCRATCH_MNT:

```
[root@kvm103 ~]# nfsiostat

localhost:/test_TMP mounted on /test:

           ops/s       rpc bklog
        2871.623           0.000

read:   ops/s   kB/s    kB/op   retrans   avg RTT (ms)  avg exe (ms)  avg queue (ms)  errors
        0.000   0.000   0.000   0 (0.0%)  0.000         0.000         0.000           0 (0.0%)
write:  ops/s   kB/s    kB/op   retrans   avg RTT (ms)  avg exe (ms)  avg queue (ms)  errors
        0.000   0.000   0.000   0 (0.0%)  0.000         0.000         0.000           0 (0.0%)

localhost:/scratch_TMP mounted on /scratch:

           ops/s       rpc bklog
        2871.623           0.000

read:   ops/s     kB/s        kB/op     retrans   avg RTT (ms)  avg exe (ms)  avg queue (ms)  errors
        0.341     37.267      109.279   0 (0.0%)  1.155         1.236         0.034           0 (0.0%)
write:  ops/s     kB/s        kB/op     retrans   avg RTT (ms)  avg exe (ms)  avg queue (ms)  errors
        1884.265  752077.789  399.136   0 (0.0%)  0.714         0.729         0.00781485438 (50.0%)
```

Second, when I run the test with the multi-host configuration, the test gets stuck in part of the runs:

https://beaker.engineering.redhat.com/jobs/6809647 # stuck on x86_64 and ppc64le
https://beaker.engineering.redhat.com/jobs/6809643 # stuck on aarch64 and ppc
https://beaker.engineering.redhat.com/jobs/6809639 # stuck on x86_64 and ppc64le

The NFSD stack traces and the output of `nfsiostat` are very similar to those in the localhost configuration.

Please let me know if there's any other information I can provide.

---

If the stack trace is changing then progress is likely being made. Perhaps this is just a situation where you have extremely slow underlying storage?

The error count is interesting. It might be worth sniffing NFS traffic in your test rig to see if you can determine what sort of errors you're seeing.

In any case, I'll try to replicate your loopback mount configuration and give this test a go to see if I can reproduce it. It's unlikely we'll be able to make much progress on this w/o more to go on.
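
A sketch of the traffic sniffing suggested above, assuming the default NFS port and that the capture is taken on the client; the interface name and the tshark field name are assumptions that may need adjusting for the actual rig:

```
# Capture NFS traffic while the test runs, then look for non-OK replies.
tcpdump -i eth0 -s 0 -w /tmp/generic476.pcap port 2049
# Later, on a machine with wireshark/tshark installed:
tshark -r /tmp/generic476.pcap -Y 'nfs.status != 0'
```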

---

I find that when it gets stuck and `nfsiostat` shows a 50% error rate for writes on SCRATCH_MNT, `df` shows SCRATCH_MNT is 100% full. Perhaps this is the reason for the 50% error rate? Anyway, when I extend the backing storage of SCRATCH_MNT from 11G to 50G, the test run passed in 10 minutes.

The resulting output of `df` shows:

```
[root@kvm103 repo_xfstests-dev]# df -h
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs        1.1G     0  1.1G   0% /dev
tmpfs           1.2G     0  1.2G   0% /dev/shm
tmpfs           455M  6.3M  449M   2% /run
/dev/vda7        24G  4.7G   19G  20% /
/dev/vda1      1014M  332M  683M  33% /boot
tmpfs           228M  4.0K  228M   1% /run/user/0
/dev/vda2        11G  111M   11G   1% /test_TMP
/dev/vdb         50G   15G   36G  30% /scratch_TMP
```

So it seems SCRATCH_MNT requires at least 15GB.

---

A full disk can certainly cause writeback errors. I'm not sure what the space requirements for xfstests actually are, and the READMEs don't really say. It might be worth sending a note to fstests.org to see what the minimum is (if it's known).

To be clear, this is probably indicative of a bug in the nfs client. It shouldn't hang when there are writeback errors, no matter the size of the underlying filesystem.

cc'ing Chuck since he expressed some interest in this issue.

---

(In reply to Jeff Layton from comment #20)
> To be clear, this is probably indicative of a bug in the nfs client. It
> shouldn't hang when there are writeback errors, no matter the size of the
> underlying filesystem.

Agreed, looks like a client-side issue. Zorro's posted call traces don't have any direct server call paths.

It might help to bisect the client kernel, as the report I saw suggested the failures don't occur on kernels before v5.10.
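
A rough outline of the client-kernel bisect suggested above; treating v5.9 as good and v5.14 as bad is an assumption based on the "before v5.10" remark, not a confirmed result:

```
# Bisect the client kernel between a presumed-good and presumed-bad tag.
git bisect start
git bisect bad v5.14
git bisect good v5.9
# For each step: build and boot the client on the bisected kernel, then run
# the reproducer and mark the result:
#   ./check -nfs generic/476    # completes -> git bisect good
#                               # hangs     -> git bisect bad
```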

---

Here's what I'm seeing in the fsstress process:

```
[jlayton@centos9 ~]$ sudo cat /proc/1951/stack
[<0>] nfs_direct_wait+0x19/0x40 [nfs]
[<0>] nfs_file_direct_write+0x248/0x340 [nfs]
[<0>] new_sync_write+0x11f/0x1b0
[<0>] vfs_write+0x1eb/0x280
[<0>] ksys_write+0x5f/0xe0
[<0>] do_syscall_64+0x5c/0x80
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xae
```

I tried to attach strace to it, but it got stuck, so I'm pretty sure this thing is stuck in the same DIO write syscall for some reason. I turned up some random nfs write tracepoints and saw this:

```
kworker/u32:0-2163 [001] .... 2540.086489: nfs4_write: error=0 (OK) fileid=00:2b:8889676 fhandle=0xb00f143d offset=1048576 count=1048576 res=294912 stateid=1:0xb93c1f29 layoutstateid=0:0x00000000
kworker/u32:0-2163 [001] .... 2540.086496: nfs_writeback_done: fileid=00:2b:8889676 fhandle=0xb00f143d offset=1048576 count=1048576 res=294912 status=294912 stable=FILE_SYNC verifier=0000000000000000
kworker/u32:0-2163 [001] .... 2540.093373: nfs4_write: error=-28 (ENOSPC) fileid=00:2b:8889676 fhandle=0xb00f143d offset=1343488 count=753664 res=0 stateid=1:0xb93c1f29 layoutstateid=0:0x00000000
kworker/u32:0-2163 [001] .... 2540.093378: nfs_writeback_done: fileid=00:2b:8889676 fhandle=0xb00f143d offset=1343488 count=753664 res=0 status=-28 stable=UNSTABLE verifier=0000000000000000
kworker/u32:0-2163 [001] .... 2540.096165: nfs_initiate_write: fileid=00:2b:8889676 fhandle=0xb00f143d offset=1048576 count=1048576 stable=FILE_SYNC
kworker/u32:0-2163 [001] .... 2540.119430: nfs4_write: error=0 (OK) fileid=00:2b:8889676 fhandle=0xb00f143d offset=1048576 count=1048576 res=294912 stateid=1:0xb93c1f29 layoutstateid=0:0x00000000
kworker/u32:0-2163 [001] .... 2540.119439: nfs_writeback_done: fileid=00:2b:8889676 fhandle=0xb00f143d offset=1048576 count=1048576 res=294912 status=294912 stable=FILE_SYNC verifier=0000000000000000
kworker/u32:0-2163 [001] .... 2540.126377: nfs4_write: error=-28 (ENOSPC) fileid=00:2b:8889676 fhandle=0xb00f143d offset=1343488 count=753664 res=0 stateid=1:0xb93c1f29 layoutstateid=0:0x00000000
kworker/u32:0-2163 [001] .... 2540.126382: nfs_writeback_done: fileid=00:2b:8889676 fhandle=0xb00f143d offset=1343488 count=753664 res=0 status=-28 stable=UNSTABLE verifier=0000000000000000
kworker/u32:0-2163 [001] .... 2540.129160: nfs_initiate_write: fileid=00:2b:8889676 fhandle=0xb00f143d offset=1048576 count=1048576 stable=FILE_SYNC
kworker/u32:0-2163 [001] .... 2540.154001: nfs4_write: error=0 (OK) fileid=00:2b:8889676 fhandle=0xb00f143d offset=1048576 count=1048576 res=294912 stateid=1:0xb93c1f29 layoutstateid=0:0x00000000
kworker/u32:0-2163 [001] .... 2540.154009: nfs_writeback_done: fileid=00:2b:8889676 fhandle=0xb00f143d offset=1048576 count=1048576 res=294912 status=294912 stable=FILE_SYNC verifier=0000000000000000
kworker/u32:0-2163 [001] .... 2540.160889: nfs_writeback_done: fileid=00:2b:8889676 fhandle=0xb00f143d offset=1343488 count=753664 res=0 status=-28 stable=UNSTABLE verifier=0000000000000000
kworker/u32:0-2163 [001] .... 2540.163633: nfs_initiate_write: fileid=00:2b:8889676 fhandle=0xb00f143d offset=1048576 count=1048576 stable=FILE_SYNC
```

...so it looks like the client is cycling over and over trying to write the same range. Maybe the DIO error handling is borked? I'm going to go ahead and open this bug up to the public.

Note that the above is running against a small 10g partition on a separate physical NFS server. I think we need to disentangle the problems of running a loopback client and server for now.

The ENOSPC result appears to get lost, which indeed indicates a bug in the error handling. Also, I don't understand why the stable_how flag is flipping between FILE_SYNC and UNSTABLE. That might be related, or it could be a separate misbehavior.

---

Once I have a full partition, I can reproduce this with a command like:

```
[jlayton@centos9 ~]$ sudo dd if=/dev/urandom of=/mnt/scratch/p0/d1/f31dd bs=100k count=1024 oflag=direct
```

...interestingly, 10k I/Os got an immediate error return:

```
[jlayton@centos9 ~]$ sudo dd if=/dev/urandom of=/mnt/scratch/p0/d1/f31dd bs=10k count=1024 oflag=direct
dd: error writing '/mnt/scratch/p0/d1/f31dd': No space left on device
dd: closing output file '/mnt/scratch/p0/d1/f31dd': No space left on device
```

---

I added some tracepoints to the direct write codepaths, and got this:

We start by issuing a 100k write (matching the write from userland):

```
dd-1324 [013] ..... 138.644456: nfs_initiate_write: fileid=00:31:377858 fhandle=0x4161b7a9 offset=1024000 count=102400 stable=FILE_SYNC
```

It comes back short though (only 6 pages worth). Note that it came back FILE_SYNC though:

```
kworker/u32:2-113 [008] ..... 138.652357: nfs4_write: error=0 (OK) fileid=00:31:377858 fhandle=0x4161b7a9 offset=1024000 count=102400 res=24576 stateid=1:0x6582c8e8 layoutstateid=0:0x00000000
kworker/u32:2-113 [008] ..... 138.652376: nfs_writeback_done: fileid=00:31:377858 fhandle=0x4161b7a9 offset=1024000 count=102400 res=24576 status=24576 stable=FILE_SYNC verifier=0000000000000000
kworker/u32:2-113 [008] ..... 138.652380: nfs_writeback_result: fileid=00:31:377858 fhandle=0x4161b7a9 offset=1024000 count=102400 res=24576 status=24576 stable=FILE_SYNC verifier=0000000000000000
```

We fall into the short write logic in nfs_writeback_result, and the RPC gets reissued, starting at the end of the prior short write. That returns -ENOSPC:

```
kworker/u32:2-113 [008] ..... 138.658807: nfs4_write: error=-28 (ENOSPC) fileid=00:31:377858 fhandle=0x4161b7a9 offset=1048576 count=77824 res=0 stateid=1:0x6582c8e8 layoutstateid=0:0x00000000
kworker/u32:2-113 [008] ..... 138.658813: nfs_writeback_done: fileid=00:31:377858 fhandle=0x4161b7a9 offset=1048576 count=77824 res=0 status=-28 stable=UNSTABLE verifier=0000000000000000
kworker/u32:2-113 [008] ..... 138.658945: nfs_direct_write_completion: dreq=000000000d0ecac9 fileid=00:31:377858 fhandle=0x4161b7a9 ref=2 io_start=1024000 count=0 bytes_left=0 error=0 flags=
```

First problem: we decide to issue a commit here -- why?

```
kworker/u32:2-113 [008] ..... 138.659056: nfs_direct_write_complete: dreq=000000000d0ecac9 fileid=00:31:377858 fhandle=0x4161b7a9 ref=2 io_start=1024000 count=24576 bytes_left=0 error=-28 flags=DO_COMMIT
kworker/u32:2-113 [008] ..... 138.677659: nfs_initiate_commit: fileid=00:31:377858 fhandle=0x4161b7a9 offset=0 count=0
kworker/u32:2-113 [008] ..... 138.681016: nfs4_commit: error=0 (OK) fileid=00:31:377858 fhandle=0x4161b7a9 offset=0 count=0 layoutstateid=0:0x00000000
kworker/u32:2-113 [008] ..... 138.681022: nfs_commit_done: fileid=00:31:377858 fhandle=0x4161b7a9 offset=0 status=0 stable=FILE_SYNC verifier=3cb4da627ca64c37
kworker/u32:2-113 [008] ..... 138.681135: nfs_direct_commit_complete: dreq=000000000d0ecac9 fileid=00:31:377858 fhandle=0x4161b7a9 ref=2 io_start=1024000 count=24576 bytes_left=0 error=-28 flags=
```

...and then it reschedules the writes and the cycle begins anew.

```
kworker/u32:2-113 [008] ..... 138.681209: nfs_direct_write_complete: dreq=000000000d0ecac9 fileid=00:31:377858 fhandle=0x4161b7a9 ref=2 io_start=1024000 count=24576 bytes_left=0 error=-28 flags=RESCHED_WRITES
kworker/u32:2-113 [008] ..... 138.681384: nfs_initiate_write: fileid=00:31:377858 fhandle=0x4161b7a9 offset=1024000 count=102400 stable=FILE_SYNC
```

---

Initial patchset posted here. I'm not sure it's the correct approach, but seems to fix the issue.

https://lore.kernel.org/linux-nfs/20220722181220.81636-1-jlayton@kernel.org/T/#t
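
For reference, the stock nfs/nfs4 write and commit tracepoints quoted in the analysis above can be enabled through tracefs roughly as below. The nfs_direct_* events in that output came from tracepoints added locally for this debugging, so they won't exist on an unpatched kernel, and the exact event names vary by kernel version -- check available_events first:

```
cd /sys/kernel/debug/tracing
# See which NFS write/commit events this kernel actually provides.
grep -E '^nfs4?:.*(write|commit)' available_events
echo 1 > events/nfs/nfs_initiate_write/enable
echo 1 > events/nfs/nfs_writeback_done/enable
echo 1 > events/nfs4/nfs4_write/enable
cat trace_pipe | tee /tmp/nfs-write-trace.log
```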

---

TEST PASS. Ran generic/476 3 times within 36 hours on each system. The test didn't finish on kernel-5.14.0-145.el9, and finished on the MR kernel.

Link to Beaker jobs:
- kernel-5.14.0-145.el9: https://url.corp.redhat.com/bz2028370-reproduce
- kernel-5.14.0-145.mr1258_220811_1150.el9: https://url.corp.redhat.com/bz2028370-verify-1

---

TEST PASS. Ran generic/476 3 times within 36 hours on each system. The test didn't finish on kernel-5.14.0-145.el9, and finished on kernel-5.14.0-150.el9.

Link to Beaker jobs:
- kernel-5.14.0-145.el9: https://url.corp.redhat.com/bz2028370-reproduce
- kernel-5.14.0-150.el9: https://url.corp.redhat.com/bz2028370-verify-final

---

It turns out I spoke too soon. More test runs may have uncovered a (perhaps new) bug: xfstests-multihost-nfsv3-over-ext4-ppc64le looks like it hangs in its 3rd run. Please check the Beaker jobs:

https://beaker.engineering.redhat.com/jobs/6932794
https://beaker.engineering.redhat.com/jobs/6932798

Please note that it only happens in the nfsv3-over-ext4, ppc64le multi-host configuration among all the configurations, and the kernel version is kernel-5.14.0-150.el9.

When this problem happens, I find the scratch dir on the server side is 100% full:

```
[root@ibm-p9z-25-lp8 ~]# df -h | grep xfstests
/dev/sda5  9.8G   32K  9.3G    1% /mnt/xfstests/test
/dev/sda3  9.8G  9.8G     0  100% /mnt/xfstests/scratch
```

And nfsiostat shows a 30-40% write error rate:

```
<server side>:/mnt/xfstests/test/nfs-server mounted on /mnt/xfstests/test/nfs-client:

           ops/s       rpc bklog
         109.515           0.000

read:   ops/s   kB/s    kB/op   retrans   avg RTT (ms)  avg exe (ms)  avg queue (ms)  errors
        0.000   0.000   0.000   0 (0.0%)  0.000         0.000         0.000           0 (0.0%)
write:  ops/s   kB/s    kB/op   retrans   avg RTT (ms)  avg exe (ms)  avg queue (ms)  errors
        0.000   0.000   0.000   0 (0.0%)  0.000         0.000         0.000           0 (0.0%)

<server side>:/mnt/xfstests/scratch/nfs-server mounted on /mnt/xfstests/scratch/nfs-client:

           ops/s       rpc bklog
         113.424           0.000

read:   ops/s   kB/s     kB/op     retrans   avg RTT (ms)  avg exe (ms)  avg queue (ms)  errors
        0.780   92.249   118.227   0 (0.0%)  1.017         1.043         0.016           0 (0.0%)
write:  ops/s   kB/s      kB/op     retrans   avg RTT (ms)  avg exe (ms)  avg queue (ms)  errors
        2.403   841.814   350.361   0 (0.0%)  44.552        44.576        0.019           35336 (39.2%)
```

The /proc/<fsstress PID>/stack shows the following, and is not changing:

```
[root@ibm-p9z-25-lp4 ~]# cat /proc/118520/stack
[<0>] 0xc0000001ff990480
[<0>] __switch_to+0x12c/0x220
[<0>] rpc_wait_bit_killable+0x3c/0x110 [sunrpc]
[<0>] __rpc_execute+0x154/0x530 [sunrpc]
[<0>] rpc_execute+0x168/0x1d0 [sunrpc]
[<0>] rpc_run_task+0x1f0/0x2a0 [sunrpc]
[<0>] rpc_call_sync+0x80/0x100 [sunrpc]
[<0>] nfs3_rpc_wrapper+0x64/0x130 [nfsv3]
[<0>] nfs3_proc_access+0xb4/0x1b0 [nfsv3]
[<0>] nfs_do_access+0xd8/0x2c0 [nfs]
[<0>] nfs_permission+0xd8/0x268 [nfs]
[<0>] inode_permission+0x174/0x290
[<0>] may_open+0xa4/0x260
[<0>] do_open+0x1a4/0x500
[<0>] path_openat+0x170/0x380
[<0>] do_filp_open+0xac/0x180
[<0>] do_sys_openat2+0x100/0x220
[<0>] sys_openat+0x74/0xd0
[<0>] system_call_exception+0x160/0x300
[<0>] system_call_vectored_common+0xe8/0x278
```

Unlike comment #26, dd with 10k or 100k both fails with ENOSPC.

Jeff, could you please help check it?

Thanks,
Boyang
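
Not something requested in the thread, but a generic way to get more visibility into a client stuck like this is the sunrpc debug switch from nfs-utils, turned on while the hang is in progress (a sketch; the extra logging is very verbose and goes to the kernel log):

```
rpcdebug -m rpc -s all      # verbose sunrpc state-machine logging
rpcdebug -m nfs -s all      # NFS client logging
dmesg -w | tee /tmp/rpcdebug.log
# ...observe for a while, then switch it back off:
rpcdebug -m rpc -c all
rpcdebug -m nfs -c all
```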

---

Looking at this log:

http://lab-02.hosts.prod.upshift.rdu2.redhat.com/beaker/logs/tasks/149048+/149048770/taskout.log

...which is on the server side, it looks like maybe something screwed up with the test infrastructure:

```
nfs mount will be run on client side, this's server side
nfs mount will be run on client side, this's server side
** init_dev_param PASS Score:
Uploading resultoutputfile.log .done
** pre_param PASS Score:
Uploading resultoutputfile.log done
** Test_completed PASS Score:
Uploading resultoutputfile.log done
_test_online_fsck:nfs on /mnt/xfstests/test/nfs-client haven't implemented a way to scrub nfs
nfs unmount will be run on client side, this's server side
_test_fsck:nfs on ibm-p9z-25-lp8.virt.pnr.lab.eng.rdu2.redhat.com:/mnt/xfstests/test/nfs-server haven't implemented a way to check nfs
/usr/bin/rhts-sync-block -s TEST__nfs_resume_device ibm-p9z-25-lp4.virt.pnr.lab.eng.rdu2.redhat.com -- Blocking state(s) = 4_TEST__nfs_resume_device
Failed to connect to remote server ibm-p9z-25-lp4.virt.pnr.lab.eng.rdu2.redhat.com: Connection refused
Retrying rstrnt-sync-block for state 4_TEST__nfs_resume_device
Failed to connect to remote server ibm-p9z-25-lp4.virt.pnr.lab.eng.rdu2.redhat.com: Connection refused
Retrying rstrnt-sync-block for state 4_TEST__nfs_resume_device
Failed to connect to remote server ibm-p9z-25-lp4.virt.pnr.lab.eng.rdu2.redhat.com: Connection refused
Retrying rstrnt-sync-block for state 4_TEST__nfs_resume_device
Failed to connect to remote server ibm-p9z-25-lp4.virt.pnr.lab.eng.rdu2.redhat.com: Connection refused
Retrying rstrnt-sync-block for state 4_TEST__nfs_resume_device
Failed to connect to remote server ibm-p9z-25-lp4.virt.pnr.lab.eng.rdu2.redhat.com: Connection refused
Retrying rstrnt-sync-block for state 4_TEST__nfs_resume_device
```

...are these Connection refused errors expected? If so, then is there a way for me to log into the client(s), so I can see what they're doing?

---

Ok, you're right that that task is the one stuck. It's waiting on an ACCESS call to the server. Looking at the server though, all of the nfsd threads are sitting idle, waiting for RPCs.

The client task is stuck here in __rpc_execute, waiting for the RPC_TASK_QUEUED bit to clear:

```
		status = out_of_line_wait_on_bit(&task->tk_runstate,
				RPC_TASK_QUEUED, rpc_wait_bit_killable,
				TASK_KILLABLE);
```

That means that it got added to an RPC wait queue and we're waiting for it to come off. Here's the rpc in the debugfs tasks file:

```
25226 8880      0 0x6 0xa442bfd9        0 rpc_default_ops [sunrpc] nfsv3 ACCESS a:call_status [sunrpc] q:none
```

The "q:none" means that the task->tk_waitqueue pointer is NULL. The flags here (0x8800 in field #2) are RPC_TASK_CRED_NOREF|RPC_TASK_SENT. RPC_TASK_QUEUED is not set.

This looks like there was a problem waking up this task -- probably a race of some sort? In any case, this is an entirely different issue from the original problem with O_DIRECT writes. Boyang, can you open a new bug to track this? I don't see a reason to hold up the other valid fixes for this.

---

Ahh, my mistake. QUEUED and RUNNING are tk_runstate flags and we can't see those in debugfs (which we should fix). Maybe I can poke around with the debugger...

---

Oops, RPC_TASK_DYNAMIC is also set in tk_flags (not that it matters much for this problem).
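
For readers unfamiliar with the debugfs output quoted above, the columns follow the seq_printf() format shown in the patch a few comments below (this is the pre-patch layout, i.e. without tk_runstate):

```
# 25226 8880 0 0x6 0xa442bfd9 0 rpc_default_ops [sunrpc] nfsv3 ACCESS a:call_status [sunrpc] q:none
#
#   25226            tk_pid         RPC task id
#   8880             tk_flags       RPC_TASK_* flags (hex)
#   0                tk_status      last RPC status
#   0x6              cl_clid        rpc_clnt id
#   0xa442bfd9       rq_xid         XID of the request
#   0                timeout        rpc_task_timeout()
#   rpc_default_ops  tk_ops         completion callbacks
#   nfsv3 ACCESS     program/proc   protocol, version and procedure
#   a:call_status    tk_action      next state-machine step
#   q:none           wait queue     queue the task is sleeping on, if any
```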

---

Looking at the second example, there are 2 stuck GETATTR requests in a similar state:

```
21458 8880      0 0x6 0xc85e5da8        0 rpc_default_ops [sunrpc] nfsv3 GETATTR a:call_status [sunrpc] q:none
 2341 8880      0 0x6 0x362569a8        0 rpc_default_ops [sunrpc] nfsv3 GETATTR a:call_status [sunrpc] q:none
```

The last field in the output is "q:none", which means that either tk_waitqueue was NULL, or that we were no longer in the RPC_IS_QUEUED state. Probably the latter, since I don't see how we could get a NULL pointer there with rpc_set_queued having been called at least once for this. So, this may still be a problem with wakeups.

What we probably ought to do is roll a patch that adds the tk_runstate to the output there, and then we can see whether the reply was ever received. I'll see what I can spin up.

---

I don't think I can learn much more from the currently-stuck machines, so you can kill those off if you like. For now, I spun up a draft patch that adds the tk_runstate to the debugfs output:

```
diff --git a/net/sunrpc/debugfs.c b/net/sunrpc/debugfs.c
index 7dc9cc929bfd..879f0c618b8c 100644
--- a/net/sunrpc/debugfs.c
+++ b/net/sunrpc/debugfs.c
@@ -30,8 +30,8 @@ tasks_show(struct seq_file *f, void *v)
 	if (task->tk_rqstp)
 		xid = be32_to_cpu(task->tk_rqstp->rq_xid);
 
-	seq_printf(f, "%5u %04x %6d 0x%x 0x%x %8ld %ps %sv%u %s a:%ps q:%s\n",
-		task->tk_pid, task->tk_flags, task->tk_status,
+	seq_printf(f, "%5u %04x %6d 0x%lx 0x%x 0x%x %8ld %ps %sv%u %s a:%ps q:%s\n",
+		task->tk_pid, task->tk_flags, task->tk_status, task->tk_runstate,
 		clnt->cl_clid, xid, rpc_task_timeout(task), task->tk_ops,
 		clnt->cl_program->name, clnt->cl_vers, rpc_proc_name(task),
 		task->tk_action, rpc_waitq);
```

Kernel is building here:

https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=47257987

Boyang, could you try to reproduce the problem with this kernel once it's built? After it gets stuck, please run:

```
# cat /sys/kernel/debug/sunrpc/rpc_clnt/*/tasks
```

...and collect the output. Knowing what the tk_runstate flags are set to might help us rule out some possibilities.

Opening a new bug to track this would be best though since this doesn't look related to the original bug.

---

(In reply to Jeff Layton from comment #43)
> Boyang, could you try to reproduce the problem with this kernel once it's
> built? After it gets stuck, please run:
>
> # cat /sys/kernel/debug/sunrpc/rpc_clnt/*/tasks
>
> ...and collect the output.

No problem.
I'm running the tests now, and will report the result later today.

> Opening a new bug to track this would be best though since this doesn't look
> related to the original bug.

I've opened the BZ as Bug 2120438 - [xfstests/nfs generic/476] nfsv3 over ext4 test gets stuck on multi-host ppc64le.

---

I just got the result. When it gets stuck, running `cat /sys/kernel/debug/sunrpc/rpc_clnt/*/tasks` on the client shows:

ibm-p9z-16-lp5.lab.eng.bos.redhat.com (https://beaker.engineering.redhat.com/jobs/6935876) (the output is changing):

```
[root@ibm-p9z-16-lp5 ~]# cat /sys/kernel/debug/sunrpc/rpc_clnt/*/tasks
 5207 8801      0 0x16 0x6 0x92f92312     5994 nfs_pgio_common_ops [nfs] nfsv3 WRITE a:call_status [sunrpc] q:xprt_pending
 5214 8880      0 0x16 0x6 0x99f92312     5996 rpc_default_ops [sunrpc] nfsv3 SYMLINK a:call_status [sunrpc] q:xprt_pending
 5223 8880      0 0x16 0x6 0xa2f92312     5999 rpc_default_ops [sunrpc] nfsv3 SYMLINK a:call_status [sunrpc] q:xprt_pending
 5226 8880      0 0x16 0x6 0xa5f92312     5999 rpc_default_ops [sunrpc] nfsv3 SYMLINK a:call_status [sunrpc] q:xprt_pending
```

ibm-p9z-20-lp24.fs.lab.eng.bos.redhat.com (https://beaker.engineering.redhat.com/jobs/6935874) (the output is changing):

```
[root@ibm-p9z-20-lp24 ~]# cat /sys/kernel/debug/sunrpc/rpc_clnt/*/tasks
20505 8801      0 0x16 0x6 0x8d3fee6d     5993 nfs_pgio_common_ops [nfs] nfsv3 WRITE a:call_status [sunrpc] q:xprt_pending
20511 8880      0 0x16 0x6 0x933fee6d     5994 rpc_default_ops [sunrpc] nfsv3 CREATE a:call_status [sunrpc] q:xprt_pending
20524 8880      0 0x16 0x6 0xa03fee6d     5995 rpc_default_ops [sunrpc] nfsv3 CREATE a:call_status [sunrpc] q:xprt_pending
20601 8880      0 0x16 0x6 0xed3fee6d     6000 rpc_default_ops [sunrpc] nfsv3 CREATE a:call_status [sunrpc] q:xprt_pending
```

I've reset the root passwords of the ppc64le machines in the above two jobs, in case you would like to log in.

---

I just prepared another two pairs of ppc64le machines for debugging. I've extended the watchdog timer, so they should survive for as long as 10 days.

https://beaker.engineering.redhat.com/jobs/6936603
https://beaker.engineering.redhat.com/jobs/6936604

As always, their root passwords have been reset.

---

Looking at ibm-p9z-16-lp5, I'm not seeing anything that is stuck. As you say, the output is changing, so it seems like progress is being made (even if it is slow).

ibm-p9z-20-lp24 is not up anymore, and it looks like the test completed successfully there. I don't think the same problem was reproduced in this run.

---

Ditto for the others that you kicked off in comment #46. They don't seem to be hung, at least not so far. I'll plan to check on them later today.

We'll pursue the latest issue with this test hanging in bug #2120438 and leave this bug to cover the DIO issue that was originally fixed.

---

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: kernel security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:8267