Bug 565974

Summary:

[5.6 FEAT] NFSv4 remove does not wait for close. Silly rename

Product:

Red Hat Enterprise Linux 5

Reporter:

IBM Bug Proxy <bugproxy>

Component:

kernel

Assignee:

Jeff Layton <jlayton>

Status:

CLOSED ERRATA

QA Contact:

Red Hat Kernel QE team <kernel-qe>

Severity:

medium

Docs Contact:

Priority:

medium

Version:

5.6

CC:

chrisvogan, dhoward, jburke, jjarvis, jlayton, jpirko, nobody+PNT0273897, rwheeler, sbest, steved, yanwang

Target Milestone:

Keywords:

FutureFeature, ZStream

Target Release:

5.6

Hardware:

All

OS:

All

Whiteboard:

Fixed In Version:

Doc Type:

Enhancement

Doc Text:

Previously, Connectathon test cases performed on a z/OS NFSv4 server were regularly failing. While the file was being closed prior to the unlink call, the client did not wait for the close to complete before proceeding. This caused it to perform an inappropriate rename instead of unlinking the file, even though it was not required. With this update, removals on NFSv4 mounts should wait for outstanding close calls to complete before proceeding.

Last Closed:

2011-01-13 21:06:21 UTC

Bug Depends On:

Bug Blocks:

531800, 557597, 642628, 702355

Attachments:

Description	Flags
Backported patch to resolve issue	none
Patch backported for RHEL 5.5	none

Description IBM Bug Proxy 2010-02-16 20:31:31 UTC

1. Feature Overview:
Feature Id: [60829]
a. Name of Feature: [5.6 FEAT] NFSv4 remove does not wait for close. Silly rename
b. Feature Description
Connectathon test cases to z/OS NFS Server are failing because of Silly rename being used before the
file was removed. This is a failure because we closed the file before we removed it. A network trace
shows the close being sent but before the close reply from the server the client sent a lookup to
rename the file to .nfsxxxxxx.


2. Feature Details:
Sponsor: LTC Filesystems
Architectures:
  zSeries - 31/32 Native

Arch Specificity: purely common code
Affects Kernel Modules: Yes
Delivery Mechanism: Backport
Category: kernel
Request Type: Package - Version Update
d. Upstream Acceptance: Accepted
Sponsor Priority P3
f. Severity: normal
IBM Confidential: No
Code Contribution: unsure
g. Component Version Target:
---

3. Business Case
Needed because this is causing System z errors on removing files.

4. Primary contact at Red Hat:
John Jarvis
jjarvis

5. Primary contacts at Partner:
Project Management Contact:
Stephanie A. Glass, sglass.com

Technical contact(s):
Frank Filz, ffilz.com

=Comment: #1=================================================
   Stephanie A. Glass <sglass@us.ibm.com> - 

---Problem Description---
Connectathon test cases to z/OS NFS Server are failing because of Silly rename being used before the
file was removed. This is a failure because we closed the file before we removed it. A network trace
shows the close being sent but before the close reply from the server the client sent a lookup to
rename the file to .nfsxxxxxx.
It looks like Red Hat has not picked up the patch that is available:
http://git.linux-nfs.org/?p=trondmy/nfs-2.6.git&a=commitdiff&h=a49c3c7736a2e77931dabc5bc4a83fb4b2da013e

 
Contact Information = Chris Vogan/Cvogan.com 
 
---uname output---
Linux reddy 2.6.18-164.11.1.el5 #1 SMP Wed Jan 6 13:26:04 EST 2010 x86_64 x86_64 x86_64 GNU/Linux
 
Machine Type = HS20 Blade 
 
---Debugger---
A debugger is not configured
 
---Steps to Reproduce---
 C program:
open()
close()
stat()
remove()
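
The call sequence listed above can be sketched as a small C helper. This is only an illustration of the sequence, not the reporter's actual test program; the function name open_close_stat_remove and its return convention are assumptions made for this sketch.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Runs open -> close -> stat -> remove on `path`.
 * Returns 0 on success, or the 1-based index of the step that failed. */
int open_close_stat_remove(const char *path)
{
    struct stat st;
    int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);

    if (fd < 0)
        return 1;
    if (close(fd) < 0)       /* the file is closed before it is removed */
        return 2;
    if (stat(path, &st) < 0)
        return 3;
    if (remove(path) < 0)    /* step whose on-the-wire behaviour is at issue */
        return 4;
    return 0;
}
```

Run against a path on the NFSv4 mount while capturing a network trace, the interesting part is whether the client issues the rename-related lookup before the close reply arrives.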

Network trace will show the following in this order
Close call
lookup call for rename
close reply
lookup reply    Failed with NAMETOOLONG (z/OS-exclusive, because of the 8-character name limitation).

Look at packets 122- 128 on the attached network trace.

It also seems that Linux is not obeying the size limits from the server.
 
---NFS Component Data--- 
Client grep nfs /proc/mounts output:
 does not exist on the server
 
NFS local mount: Legacy PDS PS or PDSE 
 
NFS local mount: 
 
-Note: Linux NFS client is not mounting via an automounter.
 
NFS utils version: na
 
Output of nfsstat:
 [root@reddy nfs]# nfsstat
Server rpc stats:
calls      badcalls   badauth    badclnt    xdrcall
0          0          0          0          0

Client rpc stats:
calls      retrans    authrefrsh
1673       0          0

Client nfs v3:
null         getattr      setattr      lookup       access       readlink
0         0% 13       16% 4         5% 28       35% 8        10% 0         0%
read         write        create       mkdir        symlink      mknod
1         1% 5         6% 1         1% 0         0% 0         0% 0         0%
remove       rmdir        rename       link         readdir      readdirplus
0         0% 0         0% 0         0% 0         0% 0         0% 8        10%
fsstat       fsinfo       pathconf     commit
2         2% 8        10% 0         0% 1         1%

Client nfs v4:
null         read         write        commit       open         open_conf
0         0% 48        3% 92        5% 7         0% 185      11% 114       7%
open_noat    open_dgrd    close        setattr      fsinfo       renew
0         0% 0         0% 145       9% 27        1% 50        3% 1         0%
setclntid    confirm      lock         lockt        locku        access
26        1% 26        1% 0         0% 0         0% 0         0% 225      14%
getattr      lookup       lookup_root  remove       rename       link
63        4% 221      14% 25        1% 78        4% 0         0% 0         0%
symlink      create       pathconf     statfs       readlink     readdir
0         0% 0         0% 0         0% 1         0% 0         0% 44        2%
server_caps  delegreturn
75        4% 112       7%

 
 
Output from "rpcinfo -p" on the server machine and client:
  rpcinfo -p sjsqa2
   program vers proto   port
    100000    4   tcp    111  portmapper
    100000    3   tcp    111  portmapper
    100000    2   tcp    111  portmapper
    100000    4   udp    111  portmapper
    100000    3   udp    111  portmapper
    100000    2   udp    111  portmapper
    150001    1   udp   2048  pcnfsd
    150001    2   udp   2048  pcnfsd
    100024    1   udp   2043  status
    100024    1   tcp   2043  status
    100021    1   udp   2044  nlockmgr
    100021    1   tcp   2044  nlockmgr
    100021    3   tcp   2044  nlockmgr
    100021    3   udp   2044  nlockmgr
    100021    4   tcp   2044  nlockmgr
    100021    4   udp   2044  nlockmgr
    100003    2   tcp   2049  nfs
    100003    2   udp   2049  nfs
    100003    3   tcp   2049  nfs
    100003    3   udp   2049  nfs
    100003    4   tcp   2049  nfs
    100059    2   udp   2047
    100059    2   tcp   2047
    100044    1   udp   2046
    100044    1   tcp   2046
    100005    1   udp   2045  mountd
    100005    1   tcp   2045  mountd
    100005    3   tcp   2045  mountd
    100005    3   udp   2045  mountd 
 
Server grep nfs /proc/mounts output:
 
 
Architecture and OS of the NFS client: x86_64
 
Output from "ps -ef | grep rpc" on the server and client:
 [root@reddy nfs]# ps -ef|grep rpc
root      1529    56  0 14:56 ?        00:00:00 [rpciod/0]
root      1531    56  0 14:56 ?        00:00:00 [rpciod/1]
root      1532    56  0 14:56 ?        00:00:00 [rpciod/2]
root      1533    56  0 14:56 ?        00:00:00 [rpciod/3]
rpc       2235     1  0 14:56 ?        00:00:00 portmap
rpcuser   2264     1  0 14:56 ?        00:00:00 rpc.statd
root      2298     1  0 14:56 ?        00:00:00 rpc.idmapd
root      2364     1  0 14:56 ?        00:00:00 rpc.gssd -vvv
root      2389     1  0 14:56 ?        00:00:00 rpc.svcgssd -vvv
root      2725     1  0 14:56 ?        00:00:00 rpc.rquotad
root      2740     1  0 14:56 ?        00:00:00 rpc.mountd
root     10668  4669  0 20:16 pts/6    00:00:00 grep rpc
 
 
exportfs -v output:
 exportfs -v does not exist on the server
 
Architecture and OS of the NFS server: z/OS
 
*Additional Instructions for Chris Vogan/Cvogan.com: 
-Post a private note with access information to the server and client if available.
-Attach /var/log/messages

Comment 1 Peter Staubach 2010-02-19 02:09:05 UTC

If the zOS server can not handle this particular situation, how does it
handle unlinked opened files or open files which are the targets of
rename operations?

Comment 2 IBM Bug Proxy 2010-02-19 02:50:51 UTC

------- Comment From cvogan.com 2010-02-18 21:47 EDT-------
Hello Peter, this is a tricky area of the z/OS NFS server legacy file systems (VSAM, PDS, PDSE and PS type datasets). The dataset name format is ABCDEFGH.12345678.abcdefgh.12345678, so you have multiple qualifiers of 8 characters or less, with a max of 48 chars including the periods.
So it's a known restriction when working with these datasets that you must close files before they are removed or risk an error during remove.

Comment 3 Peter Staubach 2010-02-19 03:16:05 UTC

It seems to me that we can fix the sequencing here, but isn't there
going to be a general problem with the unlinked open files and
open files which are the targets of rename?

What if we changed the name of the silly renamed files to .nfsXXXXX?
Would that work?  It doesn't exactly match the filename pattern listed
in Comment #2...

Comment 4 IBM Bug Proxy 2010-02-19 16:10:52 UTC

------- Comment From cvogan.com 2010-02-19 11:06 EDT-------
I saw your comment about rename in your first post. I do not fully understand how rename or move would be subject to the hidden .nfs file issue. I ran tests with rename and move, which both completed successfully. Granted, the files were closed. As I said, the application must be designed to close files before removing them when working with MVS datasets.
As for the change you proposed, it will not work. The reason is that the name is still too long: I counted 9 characters including the period. I am not looking for Linux to resolve the length of the file name used for a rename. But if we could prevent the client from getting too far ahead of itself and sending ops before getting replies, that would help prevent this issue. This rename issue with open files is not limited to Linux.

Comment 5 Peter Staubach 2010-02-19 16:36:27 UTC

The rename system call will remove the target filename if it exists.
If that file happened to be open already, then removing it would cause
it to be inaccessible.  Therefore, we use the sillyrename support to
keep the target file existing until it is no longer accessible via
an open file.

Sorry, perhaps I wasn't being high-level enough.  Is it possible to
construct a filename, beginning with .nfs, which would work for the
zOS server?

I believe that focusing on fixing this issue in the NFSv4 code is
not good.  There are two problems that I see.  One is that there are
specific tests in the testsuite which explicitly cause the sillyrename
code to be invoked.  I suppose that these test failures could be
ignored, and are probably already being ignored, but that does not
mean that the lack of correct support for sillyrename is a good thing.
Second, NFS clients, as a matter of normal operations, will occasionally
generate these sillyrename files.  If the server can't handle them,
then that server will not be generally useful as an NFS server.
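
For reference, the local semantics that sillyrename exists to preserve can be sketched as follows. On a local POSIX filesystem, data stays reachable through an open descriptor after the name is unlinked; an NFS client has to emulate this by renaming the file to a .nfs name instead. The helper name read_after_unlink is an assumption for this sketch, and the code demonstrates the POSIX behaviour only, not the NFS client code.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Creates `path`, writes `msg`, unlinks the name while the descriptor is
 * still open, then reads the data back through the descriptor.
 * Returns 0 if the data is still readable after the unlink, -1 otherwise. */
int read_after_unlink(const char *path, const char *msg)
{
    char buf[64];
    size_t len = strlen(msg);
    int fd;

    if (len >= sizeof(buf))
        return -1;
    fd = open(path, O_CREAT | O_RDWR | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (write(fd, msg, len) != (ssize_t)len || unlink(path) < 0) {
        close(fd);
        return -1;
    }
    /* The name is gone, but the open descriptor must still work. */
    if (lseek(fd, 0, SEEK_SET) < 0 || read(fd, buf, len) != (ssize_t)len) {
        close(fd);
        return -1;
    }
    close(fd);
    return memcmp(buf, msg, len) == 0 ? 0 : -1;
}
```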

Comment 6 IBM Bug Proxy 2010-02-19 17:21:18 UTC

------- Comment From cvogan.com 2010-02-19 12:13 EDT-------
It is possible to construct a name that would be compatible.
.nfs1234 would be 1 solution or we can follow the z/OS Naming convention.
.nfs1234.A1234567.B1234567

Note: the NFS server takes the leading period and maps it to an @ so that it can store hidden files.

Each qualifier is a 1- to 8-character name. These characters may be alphanumeric, national ($, #, @), or the hyphen. The first character should be either alphabetic or national.

You can join qualifiers with periods. The maximum length of a data set name is as follows:
* Generally, 44 characters, including periods.

If you recall, the z/OS NFS server has 2 sides: a traditional MVS dataset side and a Unix hierarchical file system side. Only the traditional MVS dataset side is affected.
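
The naming rules quoted in this comment can be expressed as a small checker. This is a minimal sketch that encodes only the rules stated here (1 to 8 characters per qualifier, the allowed character classes, the first-character rule, and the 44-character total); the function names are assumptions, and real z/OS validation may apply further rules not mentioned in this thread.

```c
#include <ctype.h>
#include <string.h>

/* National characters as listed in the comment above. */
static int is_national(char c)
{
    return c == '$' || c == '#' || c == '@';
}

/* Returns 1 if `name` satisfies the rules quoted above, 0 otherwise. */
int is_valid_dataset_name(const char *name)
{
    size_t len = strlen(name);
    size_t qlen = 0;    /* length of the qualifier being scanned */
    size_t i;

    if (len == 0 || len > 44)
        return 0;
    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)name[i];

        if (c == '.') {
            if (qlen == 0)          /* empty qualifier */
                return 0;
            qlen = 0;
            continue;
        }
        if (qlen == 0) {
            /* first character: alphabetic or national */
            if (!isalpha(c) && !is_national((char)c))
                return 0;
        } else if (!isalnum(c) && !is_national((char)c) && c != '-') {
            return 0;
        }
        if (++qlen > 8)
            return 0;
    }
    return qlen != 0;               /* reject a trailing period */
}
```

Under these rules, @nfs1234.A1234567.B1234567 is accepted, while .nfs1234 is rejected until its leading period has been mapped to @.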

Comment 7 IBM Bug Proxy 2010-03-02 16:21:03 UTC

------- Comment From cvogan.com 2010-03-02 11:13 EDT-------
I am going to try and recreate with RHEL6.0 Alpha3.

I have another question on the silly rename of a file. What are the minimum and maximum lengths of the name given to a file that is renamed to .nfsxxxxxxxxxxxxxxxxxxxx?
We are looking into ways to accommodate these hidden file renames.
The maximum length of a renamed file name is 36 (all chars/numbers, including the leading period), since it would be masked:
so .nfsxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx(36) would be mapped to
@nfsxxxx.axxxxxxx.axxxxxxx.axxxxxxx.axxxxxxx(44)
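
One way to read the mapping illustrated above is sketched below. The scheme is inferred only from this single example (leading period becomes @, the first qualifier takes 7 more characters, and each further group of 7 characters is prefixed with ".a"); the function map_silly_name is hypothetical and does not describe what the z/OS NFS server actually implements.

```c
#include <string.h>

/* Hypothetical mapping, inferred only from the example above.
 * Writes the mapped name to `out` (at least 45 bytes) and returns 0,
 * or returns -1 if `silly` does not start with '.' or is longer than 36. */
int map_silly_name(const char *silly, char *out)
{
    size_t len = strlen(silly);
    size_t in = 1, o = 0, chunk;

    if (len < 2 || len > 36 || silly[0] != '.')
        return -1;
    out[o++] = '@';                         /* leading period becomes '@' */
    chunk = len - in < 7 ? len - in : 7;    /* rest of the first qualifier */
    memcpy(out + o, silly + in, chunk);
    o += chunk;
    in += chunk;
    while (in < len) {
        chunk = len - in < 7 ? len - in : 7;
        out[o++] = '.';
        out[o++] = 'a';                     /* qualifier starts with a letter */
        memcpy(out + o, silly + in, chunk);
        o += chunk;
        in += chunk;
    }
    out[o] = '\0';
    return 0;
}
```

With this reading, a 36-character input expands to exactly 44 characters: 8 for the first qualifier plus four groups of 9.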

Comment 8 Peter Staubach 2010-03-02 16:43:51 UTC

The limits are as identified.  There are three parts, ".nfs", the
NFS_FILEID converted to hex, and then a static counter converted
to hex as well.  The fileid is used to reduce collisions when
multiple clients are creating sillyrename files in the same
directory.

Sorry, I only count 28 bytes, 4 + 16 + 8 in the current version
of the sillyrename code.  Where did the 36 come from?

A better solution might be use open(O_EXCL) to create the unique
name and then use rename.  By doing this, the NFS_FILEID portion
of the filename could be eliminated.  This would shrink the length
of the name down to 12, 4 + 8.

I suspect that this last solution may have a decent chance of
being accepted upstream.
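
The O_EXCL idea can be illustrated in user space as follows. This is a rough analogue under stated assumptions, not a proposed kernel patch: the helper silly_rename_excl, its caller-owned counter, and the use of the current directory are all invented for the sketch. It reserves a unique 12-character name with an exclusive create and then renames the target over it.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Renames `target` to a fresh 12-character ".nfsXXXXXXXX" name in the
 * current directory. `counter` is caller-owned state for the hex suffix.
 * On success writes the chosen name to `out` (at least 13 bytes) and
 * returns 0; returns -1 on error. */
int silly_rename_excl(const char *target, unsigned int *counter, char *out)
{
    int fd;

    for (;;) {
        snprintf(out, 13, ".nfs%08x", (*counter)++);
        /* O_EXCL makes creation fail with EEXIST if the name is taken. */
        fd = open(out, O_CREAT | O_EXCL | O_WRONLY, 0600);
        if (fd >= 0)
            break;
        if (errno != EEXIST)
            return -1;
    }
    close(fd);
    /* rename() replaces the reserved placeholder with the target. */
    return rename(target, out);
}
```

Because uniqueness comes from the exclusive create rather than from the fileid, the name needs only the 4-byte prefix and the 8-digit counter.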

Comment 9 IBM Bug Proxy 2010-03-02 17:20:49 UTC

------- Comment From cvogan.com 2010-03-02 12:15 EDT-------
The 36 came from the maximum length z/OS can accept if it were to mask the silly rename object. So if you are always going to be close to 28 chars, then z/OS may be able to make a change to handle these renames. Other NFS clients start at length 8 and then increment.
Unfortunately, if NFS changed to length 12 (4+8), z/OS would still have to implement some sort of change to accommodate it. I will talk with development and see how likely it would be to mask these renamed files to something we can handle.

What if the fileID was length 64? Would that increase the size?

Comment 10 Peter Staubach 2010-03-02 17:56:07 UTC

The size is _always_ 28.

The size of the fileid field is 64 bit.  The code to sprintf the
fileid specifies the minimum and maximum size to be printed and
those are 16 and 16, respectively.

The size of the counter is 32 bits.  It is handled exactly the
same way as the fileid.  It always prints 8 hex digits.

I don't see an easy change that would not require the zOS server
to be changed.  I also don't know of any NFS clients which would
not encounter a problem with the current zOS server.  Even Solaris
would eventually encounter a problem if enough open files were
either removed or renamed to.  It uses ".nfs" and then the counter,
although the counter is probably converted to decimal instead of
hex.
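
The fixed length described above follows directly from the format widths. The sketch below is not the kernel source; the function name format_silly_name is an assumption, and it relies on unsigned int being 32 bits, which holds on the platforms discussed in this bug.

```c
#include <stdio.h>

#define SILLY_NAME_LEN 28   /* 4 + 16 + 8, as counted in this thread */

/* Formats ".nfs" + 16 hex digits of fileid + 8 hex digits of counter.
 * `buf` must hold at least SILLY_NAME_LEN + 1 bytes.
 * Returns the length of the formatted name. */
int format_silly_name(char *buf, size_t size,
                      unsigned long long fileid, unsigned int counter)
{
    /* Zero-padded fixed widths keep the length constant for any values. */
    return snprintf(buf, size, ".nfs%016llx%08x", fileid, counter);
}
```

For example, fileid 0x1234 with counter 1 yields .nfs000000000000123400000001, which is 28 characters, the same length as for the largest possible values.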

Comment 11 IBM Bug Proxy 2010-03-02 19:21:49 UTC

------- Comment From cvogan.com 2010-03-02 14:17 EDT-------
You are correct. We have seen both AIX and Sun increment beyond 8 chars.
Since the confidence is high that Linux will not produce a silly rename name longer than 28 characters, I will work on z/OS and see if we can get a change in place.

Comment 13 IBM Bug Proxy 2010-03-25 18:01:11 UTC

------- Comment From cvogan.com 2010-03-25 13:50 EDT-------
After some discussions with the z/OS NFS team, it does not look like Red Hat can change the silly rename behavior to address all instances on the z/OS NFS server. But this is a known issue. One way to prevent silly renames from Linux is to have proper close/remove semantics, which was addressed upstream a few years ago.

Comment 14 IBM Bug Proxy 2010-03-25 18:10:33 UTC

------- Comment From ffilz.com 2010-03-25 14:02 EDT-------
Ok, so it sounds like the way we want to proceed here is to have this patch included in RHEL 5.6:

http://git.linux-nfs.org/?p=trondmy/nfs-2.6.git&a=commitdiff&h=a49c3c7736a2e77931dabc5bc4a83fb4b2da013e

We understand silly renames will still occur in other situations, and that zOS will not be able to handle all silly rename variations.

Comment 15 IBM Bug Proxy 2010-04-06 20:01:22 UTC

------- Comment From ffilz.com 2010-04-06 16:00 EDT-------
I have backported the following patch:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=a49c3c7736a2e77931dabc5bc4a83fb4b2da013e

In the process, to make the backport sane, I also backported this patch
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=b39e625b6e75aa70e26c13f9378756bb5f2af032

Comment 16 IBM Bug Proxy 2010-04-06 20:01:29 UTC

Created attachment 404779 [details]
Backported patch to resolve issue


------- Comment (attachment only) From ffilz.com 2010-04-06 15:52 EDT-------

Comment 17 Jeff Layton 2010-04-12 12:34:11 UTC

Thanks Frank, but the patch doesn't seem to apply cleanly to my RHEL5 tree. On what RHEL version did you base this?

Comment 18 IBM Bug Proxy 2010-04-13 21:01:19 UTC

------- Comment From ffilz.com 2010-04-13 16:57 EDT-------
I had done the patch against RHEL 5.4; we have since updated to RHEL 5.5 and I redid the patch. We will test it later today or tomorrow.

It looks like hunk1 for nfs4proc.c didn't apply because nfs4_call_async was changed to call rpc_new_task_wq instead of rpc_new_task.

Comment 19 IBM Bug Proxy 2010-04-13 21:01:26 UTC

Created attachment 406346 [details]
Patch backported for RHEL 5.5


------- Comment (attachment only) From ffilz.com 2010-04-13 16:52 EDT-------

Comment 20 Jeff Layton 2010-04-14 10:56:17 UTC

Oops, probably should have mentioned that I backported some of the same patches for 2.6.18 and put them into the test kernels on my people page:

    http://people.redhat.com/jlayton/

...unless you see anything wrong with them, I'll probably go with those as they parallel the upstream fixes more closely. It looks like this also fixes an unrelated problem where the open_files list was being modified without holding the inode lock.

Comment 21 IBM Bug Proxy 2010-04-14 14:21:07 UTC

------- Comment From ffilz.com 2010-04-14 10:17 EDT-------
Oh, thanks Jeff.

Chris, could you verify that Jeff's test kernel fixes your problem?

Comment 22 IBM Bug Proxy 2010-04-14 15:20:48 UTC

------- Comment From cvogan.com 2010-04-14 11:18 EDT-------
The latest patched kernel (2.6.18-194.el560829) on a RHEL 5.5 system was successful.

Comment 23 IBM Bug Proxy 2010-04-14 15:41:34 UTC

------- Comment From ffilz.com 2010-04-14 11:34 EDT-------
Great!

Jeff, please let us know when your patch set is accepted into the stream for RHEL 5.6.

Thanks everyone

Comment 24 Jeff Layton 2010-05-03 13:57:33 UTC

I've got a newer version of this patch in my test kernels that fixes a possible regression in the old patch (the close would have been queued to rpciod rather than nfsiod).

Comment 25 Jeff Layton 2010-05-03 14:58:55 UTC

It would be nice if IBM could test that patch too. The differences are minor, but I'd like to confirm that you don't see any problems from it.

Comment 26 IBM Bug Proxy 2010-05-03 18:12:42 UTC

------- Comment From ffilz.com 2010-05-03 14:00 EDT-------
Jeff, your kernels here:


have this update correct?

Comment 27 Jeff Layton 2010-05-04 08:06:50 UTC

That's correct.

Comment 28 IBM Bug Proxy 2010-05-04 13:51:15 UTC

------- Comment From cvogan.com 2010-05-04 09:49 EDT-------
I applied kernel  2.6.18-197.el5.jtltest.101.
I was unable to recreate the silly rename issue with a z/OS NFS server that was under heavy load. It looks like the test kernel from Jeff also resolves the issue.

Comment 31 Jarod Wilson 2010-07-19 21:14:21 UTC

in kernel-2.6.18-207.el5
You can download this test kernel from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.

Comment 33 IBM Bug Proxy 2010-07-20 03:41:12 UTC

------- Comment From cvogan.com 2010-07-19 23:38 EDT-------
Hello I just tried to recreate the failing case with kernel:
Linux reddy1 2.6.18-207.el5 #1 SMP Thu Jul 15 18:22:40 EDT 2010 i686 i686 i386 GNU/Linux

My test completed successfully.

I wrote a simple test case to reproduce the failing condition on kernel:
2.6.18-194.el5PAE

openclose.c
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <unistd.h>    /* chdir() */

int main(int argc, char *argv[])
{
    FILE *fp;                 /* stream returned by fopen() */
    const char *mntpt;
    const char *inputfile;

    if (argc != 3) {
        printf("Usage: %s <mount point> <data file>\n", argv[0]);
        exit(1);
    }
    /* Use argv directly; copying into fixed 256-byte buffers could overflow. */
    mntpt = argv[1];
    inputfile = argv[2];
    if (chdir(mntpt) < 0) {
        printf("Could not Change dir to mount point\n");
        exit(1);
    }
    /* fopen() returns NULL on failure and sets errno. */
    if ((fp = fopen(inputfile, "w+")) == NULL) {
        printf("ERROR NO. %d\n", errno);
        printf("FAILED on open file %s\n", inputfile);
        exit(1);
    }
    /* fclose() returns EOF (non-zero) on failure. */
    if (fclose(fp) != 0) {
        printf("Could Not Close file %s\n", inputfile);
        exit(1);
    }
    printf("I just closed file: %s\n", inputfile);
    if (remove(inputfile) < 0) {
        printf("Failed to remove file %s\n", inputfile);
        exit(1);
    }
    printf("I just removed file: %s\n", inputfile);
    return 0;
}

I then wrapped this C program in a looping shell script to run it many times in quick succession.
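
The looping script itself was not posted. As a rough stand-in, the same repetition can be done in C, together with a check for leftover silly-renamed files. Both function names below are assumptions made for this sketch, and the ".nfs" prefix check simply follows the naming discussed earlier in this bug.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Counts directory entries in `dir` whose names start with ".nfs".
 * Returns -1 if the directory cannot be opened. */
int count_silly_files(const char *dir)
{
    struct dirent *de;
    int count = 0;
    DIR *d = opendir(dir);

    if (d == NULL)
        return -1;
    while ((de = readdir(d)) != NULL)
        if (strncmp(de->d_name, ".nfs", 4) == 0)
            count++;
    closedir(d);
    return count;
}

/* Repeats create/close/remove on `path` `iterations` times.
 * Returns the number of iterations in which any step failed. */
int hammer_open_close_remove(const char *path, int iterations)
{
    int i, failures = 0;

    for (i = 0; i < iterations; i++) {
        FILE *fp = fopen(path, "w+");

        if (fp == NULL || fclose(fp) != 0 || remove(path) != 0)
            failures++;
    }
    return failures;
}
```

On a fixed client, one would expect hammer_open_close_remove() to report no failures against the NFSv4 mount and count_silly_files() on the same directory to stay at zero afterwards.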

Comment 37 Steve Dickson 2010-10-21 13:53:59 UTC

*** Bug 439093 has been marked as a duplicate of this bug. ***

Comment 38 yanfu,wang 2010-10-28 03:05:05 UTC

Tested the reproducer from comment #33 on RHEL5.6-Server-20101014.0 on i386 and x86_64; ran it in a loop and it completed successfully.

Comment 39 Martin Prpič 2010-11-11 13:59:10 UTC

    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Previously, Connectathon test cases performed on a z/OS NFSv4 server were regularly failing. While the file was being closed prior to the unlink call, the client did not wait for the close to complete before proceeding. This caused it to perform an inappropriate rename instead of unlinking the file, even though it was not required. With this update, removals on NFSv4 mounts should wait for outstanding close calls to complete before proceeding.

Comment 40 John Jarvis 2010-11-18 21:02:41 UTC

This enhancement request was evaluated by the full Red Hat Enterprise Linux 
team for inclusion in a Red Hat Enterprise Linux minor release.   As a 
result of this evaluation, Red Hat has tentatively approved inclusion of 
this feature in the next Red Hat Enterprise Linux Update minor release.   
While it is a goal to include this enhancement in the next minor release 
of Red Hat Enterprise Linux, the enhancement is not yet committed for 
inclusion in the next minor release pending the next phase of actual 
code integration and successful Red Hat and partner testing.

Comment 41 IBM Bug Proxy 2010-11-30 20:51:35 UTC

------- Comment From sglass.com 2010-11-30 15:40 EDT-------
This feature has been verified.  Thanks

Comment 43 errata-xmlrpc 2011-01-13 21:06:21 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0017.html