Bug 2138866

Summary: On the NFS server, umount of a filesystem fails with EBUSY after it is exported and a subdirectory is mounted via NFSv4.0
Product: Red Hat Enterprise Linux 9
Component: kernel
kernel sub component: NFS
Version: 9.2
Status: CLOSED ERRATA
Severity: unspecified
Priority: unspecified
Keywords: Regression, Reproducer, Triaged
Reporter: Yongcheng Yang <yoyang>
Assignee: Jeff Layton <jlayton>
QA Contact: Yongcheng Yang <yoyang>
CC: jlayton, nfs-team, xzhou
Hardware: Unspecified
OS: Unspecified
Target Milestone: rc
Fixed In Version: kernel-5.14.0-202.el9
Last Closed: 2023-05-09 08:05:48 UTC
Type: Bug

Description Yongcheng Yang 2022-10-31 14:20:53 UTC
Description of problem:
After upgrading to kernel-5.14.0-178.el9, unmounting a previously exported file system sometimes fails with EBUSY. The following test scenario hits it 100% of the time.

The scenario comes from Bug 1532786 (now only NFSv4.0 is affected): when an NFSv4.0 client mounts a subdirectory of an exported file system, umount of that file system on the NFS server fails with EBUSY even after stopping nfs.


Version-Release number of selected component (if applicable):
since kernel-5.14.0-178.el9

How reproducible:
always

Steps to Reproduce: (a reproducer script is also included under "Actual results")

1. On the server, export a mounted (ext4 or xfs) filesystem
2. On the client, mount a subdirectory of the export via NFSv4.0
3. Run some tests, then umount the NFS mount
4. Stop the NFS server and try to unmount /export
5. Repeat steps 1-4 once more; this time `umount /export` fails


Actual results:
[root@hp-dl388g8-20 ~]# cat repro.sh
BZ=sepfs
nfsmp=/mnt/nfsmp-$BZ
LOOPIMG=/tmp/$BZ.img
truncate --size 4G $LOOPIMG
losetup -f
LOOPDEV=$(losetup -f)
losetup $LOOPDEV $LOOPIMG
mkdir $nfsmp
cp /etc/exports /etc/exports.back
echo '' > /etc/exports && systemctl restart nfs-server

for VERS in 4.0; do
        #for FSTYPE in ext4 ext4 ; do
        for FSTYPE in ext4 xfs ; do
                echo "{INFO} Prepare a $FSTYPE device"
                wipefs -a $LOOPDEV
                mkfs.${FSTYPE} $LOOPDEV
                expdir=/exportdir-$BZ-$RANDOM
                mkdir $expdir
                mount -t ${FSTYPE} $LOOPDEV $expdir
                mkdir $expdir/subdir
                echo "{INFO} Export the whole device via NFS"
                exportfs -vi -o rw,no_root_squash 127.0.0.1:$expdir
                echo "{INFO} But only mount its subdir"
                mount -t nfs -o vers=$VERS 127.0.0.1:$expdir/subdir $nfsmp
                cat /proc/mounts | grep $BZ
                date > $nfsmp/tfile1
                cat $nfsmp/tfile1
                umount $nfsmp
                echo "{INFO} Recovering it."
                exportfs -ua && sleep 5
                # also need restarting nfs service
                systemctl restart nfs-server && sleep 1
                umount $expdir
                if [ $? -eq 0 ]; then
                        echo -e "Success.\n"
                else
                        echo -e "Failed.\n"
                fi
                rm -rf $expdir
        done
done
rm -rf $nfsmp
losetup -d $LOOPDEV
rm $LOOPIMG
cp /etc/exports.back /etc/exports
[root@hp-dl388g8-20 ~]#
[root@hp-dl388g8-20 ~]# ./repro.sh
...
...
{INFO} Recovering it.
umount: /exportdir-sepfs-5321: target is busy.   <<<<<<<
Failed.

rm: cannot remove '/exportdir-sepfs-5321': Device or resource busy
[root@hp-dl388g8-20 ~]# mount | grep sep
/tmp/sepfs.img (deleted) on /exportdir-sepfs-5321 type ext4 (rw,relatime,seclabel)
[root@hp-dl388g8-20 ~]# umount /exportdir-sepfs-5321
umount: /exportdir-sepfs-5321: target is busy.
[root@hp-dl388g8-20 ~]# lsof /exportdir-sepfs-5321
[root@hp-dl388g8-20 ~]# fuser -vm /exportdir-sepfs-5321
                     USER        PID ACCESS COMMAND
/exportdir-sepfs-5321:
                     root     kernel mount /exportdir-sepfs-5321
[root@hp-dl388g8-20 ~]#
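
In the transcript above, `lsof` prints nothing and `fuser -vm` lists only a `kernel` holder, which suggests that no userspace process is pinning the mount. A minimal sketch of automating that check follows, assuming the `fuser -v` table layout shown above. The name `only_kernel_holds` is a hypothetical helper, not part of psmisc or of the original reproducer.

```shell
# Hypothetical helper: reads `fuser -vm <mountpoint>` output on stdin and
# succeeds only if every holder row has "kernel" in the PID column.
only_kernel_holds() {
    awk '
        $1 == "USER" { next }               # skip the column header
        $1 ~ /:$/ { $1 = ""; $0 = $0 }      # drop a leading "path:" and re-split
        NF == 0 { next }                    # nothing left on this line
        { rows++; if ($2 != "kernel") user++ }
        END { exit !(rows > 0 && user == 0) }
    '
}
```

It could be used as `fuser -vm /exportdir-sepfs-5321 2>&1 | only_kernel_holds && echo "held only by the kernel"`. The `2>&1` is there on the assumption that `fuser` writes part of its verbose table to stderr.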


Expected results:
`umount` succeeds

Additional info:
a. This only happens when mounting with NFSv4.0; v3, v4.1, and v4.2 pass.
b. This started with kernel-5.14.0-178.el9 and also exists upstream, e.g.:
    Pass in kernel-5.14.0-177.el9 https://beaker.engineering.redhat.com/jobs/7180182
    Fail in kernel-5.14.0-178.el9 https://beaker.engineering.redhat.com/jobs/7180023
    Fail in current upstream 6.1.0-rc2+ https://beaker.engineering.redhat.com/jobs/7180026

Comment 1 Jeff Layton 2022-11-08 15:42:18 UTC
Bisect landed here:

commit 876c553cb41026cb6ad3cef970a35e5f69c42a25
Author: Jeff Layton <jlayton>
Date:   Tue Jul 26 16:45:30 2022 +1000

    NFSD: verify the opened dentry after setting a delegation

I think I see the bug. We're not putting the export reference like we should in nfsd4_verify_deleg_dentry.

Comment 2 Jeff Layton 2022-11-08 16:46:41 UTC
Making this bug public. Patch sent upstream:

    https://lore.kernel.org/linux-nfs/20221108162311.320755-1-jlayton@kernel.org/T/#u

I'll probably fold this fix into our larger bugfix MR for 9.2.

Comment 13 Yongcheng Yang 2022-11-29 06:56:36 UTC
Verified in kernel-5.14.0-202.el9:

https://beaker.engineering.redhat.com/jobs/7285842
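
Going by the versions in this report (regression first seen in kernel-5.14.0-178.el9, fix verified in kernel-5.14.0-202.el9), a host can be checked roughly as sketched below. The function name `kernel_in_affected_range` is hypothetical. The sketch only understands main-line RHEL 9 release strings of the form `5.14.0-<build>.el9...`; z-stream and upstream kernels are reported as unrecognized rather than guessed at.

```shell
# Hypothetical helper: classify a kernel release string against the build
# range named in this bug (178 <= build < 202).
# Returns 0 = in the affected range, 1 = outside it, 2 = unrecognized format.
kernel_in_affected_range() {
    case $1 in
        5.14.0-*) ;;
        *) return 2 ;;
    esac
    _build=${1#5.14.0-}        # e.g. "178.el9.x86_64"
    _build=${_build%%.*}       # e.g. "178"
    case $_build in
        ''|*[!0-9]*) return 2 ;;
    esac
    # Require ".el9" right after the build number, so z-stream strings such
    # as 5.14.0-162.6.1.el9_1 are treated as unrecognized.
    case ${1#"5.14.0-$_build"} in
        .el9*) ;;
        *) return 2 ;;
    esac
    [ "$_build" -ge 178 ] && [ "$_build" -lt 202 ]
}
```

For example, `kernel_in_affected_range "$(uname -r)"` followed by a check of `$?` tells whether the running kernel falls inside that range. A return value of 2 means the sketch cannot say either way.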

Comment 16 errata-xmlrpc 2023-05-09 08:05:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: kernel security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2023:2458