Bug 1349647

Summary:	NFS client may keep phantom directory entry in dcache when rename is canceled
Product:	Red Hat Enterprise Linux 7	Reporter:	Frank Sorenson <fsorenso>
Component:	kernel	Assignee:	Benjamin Coddington <bcodding>
kernel sub component:	NFS	QA Contact:	JianHong Yin <jiyin>
Status:	CLOSED ERRATA	Docs Contact:
Severity:	medium
Priority:	medium	CC:	bcodding, dwysocha, eguan, jiyin, jlayton, swhiteho, yoyang
Version:	7.2	Keywords:	Reproducer
Target Milestone:	rc
Target Release:	---
Hardware:	All
OS:	Linux
Whiteboard:
Fixed In Version:	kernel-3.10.0-658.el7	Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2017-08-01 20:12:35 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1298243, 1385242

Description Frank Sorenson 2016-06-23 21:13:57 UTC

Description of problem:

If a file rename is interrupted, it is possible for the nfs client to have a dentry for both the old and new file.  The inode number remains the same, and the new file only exists on that particular nfs client.


Version-Release number of selected component (if applicable):

nfs server and client are both RHEL 7.2

kernel 3.10.0-327.10.1.el7.x86_64


How reproducible:

trivial, see below


Steps to Reproduce:

/proc/self/mountinfo from test nfs server vm:
59 1 253:2 / / rw,relatime shared:1 - xfs /dev/vda2 rw,attr2,inode64,noquota

/etc/exports
/exports *(rw,no_root_squash)


# mkdir -p /mnt/test
# mount server:/export /mnt/test


/proc/self/mountinfo from test nfs client vm:

82 59 0:38 / /mnt/test rw,relatime shared:33 - nfs vm1:/exports rw,vers=3,rsize=32768,wsize=32768,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.122.73,mountvers=3,mountport=20048,mountproto=udp,local_lock=none,addr=192.168.122.73


# cd /mnt/test
# echo testing > f1
# while cp f1 f2 && mv f2 f1 ; do : ; done
let run for a couple seconds, then interrupt with ctrl-c

re-run the while loop and interrupt again (may require several tries), until the cp fails with:

# while cp f1 f2 && mv f2 f1 ; do : ; done
^C
# while cp f1 f2 && mv f2 f1 ; do : ; done
^C
# while cp f1 f2 && mv f2 f1 ; do : ; done
^C
# while cp f1 f2 && mv f2 f1 ; do : ; done
cp: skipping file ‘f1’, as it was replaced while being copied


Actual results:

only the destination file of the mv appears in a directory listing, however a directory entry for the source file (with the same inode number) can also be found on the nfs client.  The nfs server only sees the destination file:


on the nfs client
# ls -l
drwxr-xr-x. 4 root root      4096 Jun 23 15:23 .
drwxr-xr-x. 9 root root       103 May 11 07:22 ..
drwxr-xr-x. 2 root root        19 Feb 17 12:17 dir1
drwxr-xr-x. 2 root root         6 Feb 17 12:17 dir2
----------. 1 root root         8 Jun 23 15:23 f1

# ls -li f1 f2
403435171 ----------. 1 root root 8 Jun 23 15:23 f1
403435171 ----------. 1 root root 8 Jun 23 15:23 f2

# cat f1 f2
testing
testing

# echo hello > f2

# cat f1 f2
hello
hello


on the nfs server:

# ls -li f1 f2
ls: cannot access f2: No such file or directory
403435171 ---------- 1 root root 6 Jun 23 15:23 f1


Expected results:

interrupted mv results in just one directory entry, whichever is consistent with the server


Additional info:

From the customer:
	Our hypothesis is that this is happening when a rename on an NFS-hosted file is interrupted in rpc_wait_for_completion_task(), called from nfs_rename() (fs/nfs/dir.c:2057 in the 3.10.0-327.13.1.el7 kernel), i.e. before nfs_mark_for_revalidate() can be called.
	The NFS client is RHEL7, the NFS server is RHEL7 exporting an xfs filesystem.  The problem can also happen with Netapp NFS servers, but does not happen with RHEL7 NFS servers exporting GPFS.  The problem also does not happen with RHEL6 clients.
	It appears to happen when a rename() call is interrupted.  It's fairly easy to reproduce by interrupting a mv command at the right time, as in the example pasted below.  It seems easier to reproduce the more network latency that exists between the server and client.  In the example below, the client and server are about 500us apart.

Comment 3 JianHong Yin 2016-06-24 09:58:12 UTC

Reproduced by beaker auto test job:

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:: [   LOG    ] :: do-client-Test-3
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

[05:40:13 root@ ~~]# rhts-sync-block -s servReady hp-bl660cgen8-01.rhts.eng.bos.redhat.com
servReady
:: [   PASS   ] :: Running 'rhts-sync-block -s servReady hp-bl660cgen8-01.rhts.eng.bos.redhat.com' (Expected 0, got 0)
--------------------------------------------------------------------------------
[05:40:27 root@ ~~]# mount -o vers=3 hp-bl660cgen8-01.rhts.eng.bos.redhat.com:/exportdir-bz1349647 /mnt/nfsmp-bz1349647
--------------------------------------------------------------------------------
/mnt/nfsmp-bz1349647 /mnt/tests/kernel/filesystems/nfs/regression/bz1349647-NFS-client-may-keep-phantom-directory-entry-in-dcache
[05:40:28 root@ /mnt/nfsmp-bz1349647]# echo testing >f1
:: [   PASS   ] :: Running 'echo testing >f1' (Expected 0, got 0)
--------------------------------------------------------------------------------

MARK-LWD-LOOP -- 2016-06-24 05:42:38 --
cp: skipping file ‘f1’, as it was replaced while being copied
No screen session found.
[05:46:38 root@ /mnt/nfsmp-bz1349647]# test $(stat -c %i f1) -ne $(stat -c %i f2)
:: [   FAIL   ] :: Running 'test $(stat -c %i f1) -ne $(stat -c %i f2)' (Expected 0, got 1)
--------------------------------------------------------------------------------
[05:46:38 root@ /mnt/nfsmp-bz1349647]# ls -l
total 8
----------. 1 root root  8 Jun 24 05:46 f1
-rw-r--r--. 1 root root 66 Jun 24 05:46 stderr.log
--------------------------------------------------------------------------------
[05:46:38 root@ /mnt/nfsmp-bz1349647]# ls -li f1 f2
234981382 ----------. 1 root root 8 Jun 24 05:46 f1
234981382 ----------. 1 root root 8 Jun 24 05:46 f2
--------------------------------------------------------------------------------
[05:46:38 root@ /mnt/nfsmp-bz1349647]# ssh $SERVER ls -li $expdir
total 8
234981382 ----------. 1 root root  8 Jun 24 05:46 f1
234981381 -rw-r--r--. 1 root root 66 Jun 24 05:46 stderr.log

Comment 7 Steve Whitehouse 2016-12-09 12:40:13 UTC

Is it possible to do something about this I wonder? Not sure how much the protocol might be an issue here.

Comment 8 Benjamin Coddington 2016-12-15 14:52:28 UTC

Here's a more reliable single-system reproducer - but it relies on the 'netem' kernel modules to introduce a network delay in order to signal the process while it waits in the rename's __rpc_wait_for_completion_task():


#!/bin/bash
TESTDIR=/mnt/fedora/bz1349647
touch ${TESTDIR}/f1
tc qdisc add dev lo root netem delay 200ms
sleep 1
mv -f ${TESTDIR}/f{1,2} &
pid=$!
sleep .5
kill $pid;
tc qdisc del dev lo root netem

After this, ${TESTDIR}/f1 will be the left-behind dentry.

Proposed fix upstream posting:
http://marc.info/?l=linux-nfs&m=148181328615582&w=2

Comment 9 Benjamin Coddington 2017-02-01 11:02:32 UTC

upstream v2:
http://marc.info/?l=linux-nfs&m=148594600229049&w=2

Comment 12 Rafael Aquini 2017-03-18 00:53:20 UTC

Patch(es) committed on kernel repository and an interim kernel build is undergoing testing

Comment 14 Rafael Aquini 2017-03-20 20:44:16 UTC

Patch(es) available on kernel-3.10.0-616.el7

Comment 16 Benjamin Coddington 2017-04-03 09:54:43 UTC

This one needs a follow-up fix.  The patch that was added to fix this problem will rehash the old dentry after a rename, so an open by the old name will succeed when it should fail.

The fix is upstream:
d4ea7e3c5c0e NFS: Fix old dentry rehash after move

Comment 18 Rafael Aquini 2017-04-27 03:57:00 UTC

Patch(es) committed on kernel repository and an interim kernel build is undergoing testing

Comment 20 Rafael Aquini 2017-05-01 13:17:31 UTC

Patch(es) available on kernel-3.10.0-658.el7

Comment 24 errata-xmlrpc 2017-08-01 20:12:35 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:1842

Comment 25 errata-xmlrpc 2017-08-02 00:38:09 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:1842