Bug 625078

Summary: [NFS]: silly renamed .nfs0000* files can be left on fs forever
Product: Red Hat Enterprise Linux 6
Component: kernel
Version: 6.0
Hardware: All
OS: Linux
Status: CLOSED DUPLICATE
Severity: medium
Priority: medium
Target Milestone: rc
Reporter: Fabio Olive Leite <fleite>
Assignee: Jeff Layton <jlayton>
QA Contact: Red Hat Kernel QE team <kernel-qe>
CC: aveseb, bfields, fleite, jlayton, khorenko, rwheeler, steved, tao, trond.myklebust
Doc Type: Bug Fix
Clone Of: 511901
Bug Depends On: 511901
Last Closed: 2010-11-15 14:03:49 UTC
Attachments:
patch -- make sillyrename an async operation (attachment 446520; flags: none)

Description Fabio Olive Leite 2010-08-18 14:34:43 UTC
+++ This bug was initially created as a clone of Bug #511901 +++

Description of problem:

The Parallels Virtuozzo Containers/OpenVZ Linux kernel team found that NFS temporary files (silly-renamed .nfs0000* files) are sometimes left on the filesystem even though no process has them open.
Kernel versions checked: 2.6.18-8.el5 and 2.6.18-128.2.1.el5 (x86_64); both are affected.

The reproducer source (test_nfs_exit_thr4.c) is attached.
Brief description: it creates a thread that creates and unlinks files in a loop, then exits the main thread without waiting for the child thread.
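
The attached test_nfs_exit_thr4.c is not reproduced in this report. A minimal sketch of the pattern described above might look like the following. It assumes the worker unlinks each file while its descriptor is still open (the condition under which an NFS client silly-renames a file instead of removing it), and it guesses the ashfileXXXXXX template from the leftover file name in the listing below. The function names and the REPRODUCER_MAIN guard are hypothetical and are not taken from the attachment.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/* Create a temp file in the current directory, unlink it while the
 * descriptor is still open, then close it.
 * Returns 0 on success, -1 on error.
 * Assumption: the "ashfile" prefix is inferred from the ls output. */
int create_and_unlink_once(void)
{
    char name[] = "ashfileXXXXXX";
    int fd = mkstemp(name);
    if (fd < 0)
        return -1;
    int rc = unlink(name);   /* on NFS this becomes a rename to .nfsXXXX */
    close(fd);               /* the last close should remove the .nfs file */
    return rc;
}

/* Worker: create and unlink files until an error occurs. */
static void *churn(void *arg)
{
    (void)arg;
    while (create_and_unlink_once() == 0)
        ;
    return NULL;
}

/* Start the worker thread; returns the pthread_create() result. */
int start_churn_thread(pthread_t *tid)
{
    return pthread_create(tid, NULL, churn, NULL);
}

#ifdef REPRODUCER_MAIN
int main(void)
{
    pthread_t tid;
    if (start_churn_thread(&tid) != 0) {
        perror("pthread_create");
        return 1;
    }
    /* Let the worker run briefly, then leave without joining it. */
    struct timespec delay = { 0, 100 * 1000 * 1000 };   /* 100 ms */
    nanosleep(&delay, NULL);
    printf("main: exit ok!\n");
    return 0;   /* process exit kills the worker wherever it happens to be */
}
#endif
```

Built with something like `gcc -pthread -DREPRODUCER_MAIN sketch.c -o /tmp/test` (sketch.c being a placeholder file name) and run from a directory on the NFS mount, this should exercise the same create/unlink/exit race, although the timing may differ from the original reproducer.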

How to reproduce:

[root@tom nfs]# uname -a
Linux HOSTNAME 2.6.18-128.2.1.el5 #1 SMP Wed Jul 8 11:54:47 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
[root@tom ~]# mount NFSSERVERNAME:/vz/export/tom /mnt/nfs/
[root@tom ~]# cd /mnt/nfs/
[root@tom nfs]# mount |grep nfs
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
NFSSERVERNAME:/vz/export/tom on /mnt/nfs type nfs (rw,addr=NFSSERVERIP)
[root@tom nfs]# gcc /tmp/test_nfs_exit_thr4.c -pthread -o /tmp/test
[root@tom nfs]# ls -Flaio
total 12
13403720 drwxr-xr-x  2 root 4096 Jul 15 18:00 ./
 1498497 drwxr-xr-x  3 root 4096 Jul 15 16:40 ../
[root@tom nfs]# /tmp/test
main: exit ok!
[root@tom nfs]# /tmp/test
main: exit ok!
[root@tom nfs]# ls -Flaio
total 12
13403720 drwxr-xr-x  2 root 4096 Jul 15 17:55 ./
 1498497 drwxr-xr-x  3 root 4096 Jul 15 16:40 ../
 13403721 ----------  1 root    0 Jan  5  1970 ashfilecs2QiM
 13403722 -rw-------  1 root    0 Jul 15 17:55 .nfs0000000000cc864a000005a8
[root@tom nfs]# ps axf |grep test
 3147 pts/0    S+     0:00          \_ grep test

Note 1: The leftover ashfile* files are expected: they were created but not yet unlinked when the thread was killed. The leftover .nfs* files, however, appear to be a bug.
Note 2: The reproducer does not leave .nfs* files behind on every run, but it does so very often.

--- Additional comment from jlayton on 2010-06-09 10:37:44 EDT ---

FWIW, I've been able to reproduce this on 2.6.34-ish kernels too. The race window there seems to be slightly smaller for reasons that aren't exactly clear to me, but it's still present there.

--> CLONING FOR RHEL-6

--- Additional comment from jlayton on 2010-06-28 14:26:46 EDT ---

cc'ing Trond in case he has thoughts on this...

We can't easily make this function use an uninterruptible sleep. What we may actually need to do is make the rename asynchronous, and have the sillyrename thread wait on its completion. That way, if the thread is killed, everything still should proceed to completion.

Comment 2 Jeff Layton 2010-08-18 15:15:14 UTC
I own the RHEL-5 one, so I'll grab the RHEL6 one as well. We'll definitely want to fix this in RHEL6, but a fix will need to go upstream first. I sent a note about this problem to the upstream ML along with what I think is the best way to fix it:

    http://www.spinics.net/lists/linux-nfs/msg15082.html

I'll look at this as soon as I have some time to spend on it.

Comment 3 Jeff Layton 2010-09-10 14:03:01 UTC
Created attachment 446520 [details]
patch -- make sillyrename an async operation

This patchset seems to fix the problem for me. The basic approach is to make sillyrename an asynchronous operation. The caller just waits for the task to complete. If a task is interrupted via SIGKILL, the sillyrename operation will still continue in the background.

This still needs more testing and I need to clean up the commitlog contents, but I'll probably post this upstream within the next week or so. Any testing feedback would be appreciated...

Comment 4 Jeff Layton 2010-09-10 17:24:06 UTC
Ran reproducer in a loop over several hours and ended up with 0 leftover .nfs* files. Seems to work as expected. I'll clean up the set a bit and plan to send it upstream soon.
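
For anyone repeating this kind of looped test, the leftover check can be automated. The helper below is a small sketch, not part of the patchset or the original reproducer; the function name is hypothetical. It counts directory entries whose names begin with ".nfs", which is the prefix shown in the listing in the description.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <string.h>

/* Count entries in dir whose names start with ".nfs".
 * Returns -1 if the directory cannot be opened. */
int count_silly_files(const char *dir)
{
    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;

    int count = 0;
    struct dirent *ent;
    while ((ent = readdir(d)) != NULL) {
        if (strncmp(ent->d_name, ".nfs", 4) == 0)
            count++;
    }
    closedir(d);
    return count;
}
```

Calling `count_silly_files("/mnt/nfs")` after each reproducer run, once no process holds files open on the mount, should return 0 on a fixed kernel.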

Comment 5 Jeff Layton 2010-09-19 22:28:46 UTC
Trond is planning to push the patchset for 2.6.37. If all goes well, we should be able to make 6.1 with this.

Comment 6 RHEL Program Management 2010-10-05 01:57:06 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.