Bug 736394

Summary: Files not visible by nfs-clients
Product: Red Hat Enterprise Linux 5
Reporter: swtest
Component: kernel
Assignee: Jeff Layton <jlayton>
Status: CLOSED NOTABUG
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: high
Priority: unspecified
Version: 5.7
CC: bfields, jlayton, mishu, pasteur, steved, toracat
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2011-12-13 11:51:31 UTC
Attachments:
  Description (flags: none for all)
  NFS:Create a file Reply , Status = OK
  the network trace with empty dir_wcc after attributes
  frame193031_wcc_after_is_empty_ms_netmon
  frame193031_wcc_after_is_empty_wireshark
  a sample testscript we used
  a sample network trace with inner-loop of 1500 files

Description swtest 2011-09-07 15:18:22 UTC
Created attachment 521909
NFS:Create a file Reply , Status = OK

Description of problem:
Sometimes files are not visible to NFS clients (the NFS client cache is active).

Version-Release number of selected component (if applicable):
NFS server: the built-in NFS server in MS Windows Server 2008 R2
NFS clients: any NFS client on RHEL 5.x or 6.x (tested with 32-bit and 64-bit)

How reproducible:
In some cases, after files are created/deleted in a directory,
the NFS client does not show the correct files (one or more files are missing).
After "touch some_new_file_in_the_directory", the cache is updated and the missing file is visible.


Steps to Reproduce:
1. Create some files and check with the "ls" command whether each file is visible.
2. Delete some files and check with the "ls" command whether each file was deleted.
3. Repeat steps 1 and 2 while no error occurs; stop at the first error.

The test script looks like this:
#!/bin/sh

echo "Testing nfs... (Press Ctrl-C to abort)"
dst="testfile.txt"
src="sourcefile"
echo "content" >$src
big=0
bigmax=500000

# Main loop: repeat the create/delete cycle up to $bigmax times.
while [ $big -lt $bigmax ]; do
    big=`expr $big + 1`
    echo -n .
    c=0
    cmax=5

    # Number of $dst* files in the directory before this cycle starts.
    wcstart=`ls -1 $dst* 2>/dev/null|wc |awk '{print $1}'`

    # Inner loop: create $cmax files; after each step the listing must
    # contain exactly wcstart + c files.
    while [ $c -lt $cmax ]; do
        c=`expr $c + 1`
        w=`expr $wcstart + $c`

        ######## copy ########
        /bin/cp $src $dst.$c
        dl=`ls -al $dst* 2>/dev/null`
        l=`ls -1 $dst* 2>/dev/null|wc |awk '{print $1}'`
        if [ $l -eq $w ]; then
            echo -n c >>/dev/null    # success; output discarded
        else
            echo "!!!Create-copy-Error at (`date` - big:$big)! loopcount:$w listcount:$l dirlist:"; echo "$dl"
            exit 1
        fi

        ######## echo ########
        echo $c >>$dst.$c
        dl=`ls -al $dst* 2>/dev/null`
        l=`ls -1 $dst* 2>/dev/null|wc |awk '{print $1}'`
        if [ $l -eq $w ]; then
            echo -n e >>/dev/null    # success; output discarded
        else
            echo "!!!Create-echo-Error at (`date` - big:$big)! loopcount:$w listcount:$l dirlist:"; echo "$dl"
            exit 1
        fi
    done

    ######### delete ###########
    # Delete the same files again; the listing must shrink by one each time.
    c=0
    wcstart=`ls -1 $dst* 2>/dev/null|wc |awk '{print $1}'`

    while [ $c -lt $cmax ]; do
        c=`expr $c + 1`
        w=`expr $wcstart - $c`
        /bin/rm $dst.$c
        dl=`ls -al $dst* 2>/dev/null`
        l=`ls -1 $dst* 2>/dev/null|wc |awk '{print $1}'`
        if [ $l -eq $w ]; then
            echo -n .
        else
            echo "!!!Delete-Error at (`date` - big:$big)! loopcount:$w listcount:$l dirlist:"; echo "$dl"
            exit 1
        fi
    done
done

/bin/rm $src

echo " "
exit 0


  
Actual results:
(see the dumped NFS packet in the attached file)


Expected results:
Either the NFS server should return the directory attributes in the reply packet,
or the NFS client should request a directory listing after such a reply packet.
The problem is that we do not know whether this is an NFS server or an NFS client error.


Additional info:
RFC 1813 states on page 22/23 (depending on page length):

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
post_op_attr
    union post_op_attr switch (bool attributes_follow) {
    case TRUE:
        fattr3 attributes;
    case FALSE:
        void;
    };
This structure is used for returning attributes in those
operations that are not directly involved with manipulating
attributes. One of the principles of this revision of the NFS
protocol is to return the real value from the indicated
operation and not an error from an incidental operation. The
post_op_attr structure was designed to allow the server to
recover from errors encountered while getting attributes.
This appears to make returning attributes optional. However,
server implementors are strongly encouraged to make best effort
to return attributes whenever possible, even when returning an
error.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^


Does this mean that this structure can be void in a successful reply?
In other words, is the reply packet from the NFS server correct or not?
(please look at the attached file “no_attributes.PNG”)
The directory WCC "after" attribute has no value even though the reply is successful.


We opened a case with Microsoft, but they answered us (about the post_op_attr):
// so this is not mandatory. Under heavy load, the server may choose to skip this data

Is this answer from Microsoft really correct?

Comment 1 Steve Dickson 2011-09-28 17:34:28 UTC
Would it be possible to get a binary network trace? Something like:

yum install tshark
tshark -w /tmp/data.pcap host <server>
[in another terminal generate the traffic]
bzip2 /tmp/data.pcap

Then please attach the bzipped file to this bz.

Comment 2 swtest 2011-09-29 08:24:42 UTC
Created attachment 525481
the network trace with empty dir_wcc after attributes

Comment 3 swtest 2011-09-29 08:26:01 UTC
Created attachment 525482
frame193031_wcc_after_is_empty_ms_netmon

Comment 4 swtest 2011-09-29 08:26:50 UTC
Created attachment 525483
frame193031_wcc_after_is_empty_wireshark

Comment 5 swtest 2011-09-29 08:27:56 UTC
Hello Steve,
attached to this comment you'll find an example of a network trace between
nfs-server 192.168.4.15 and nfs-client 192.168.4.199.
We captured the trace on the Microsoft NFS server. The client sends a ping after
the first error (i.e. when the number of files the script expects in the directory does not match what the client sees).

We think frame #193031 in this example is the most interesting.
(see attached *.png files)
The server sends an nfsv3-create-reply-ok packet back to the client, but without
a dir_wcc after attribute (this is RFC 1813 compliant as you can see above). 
The nfs-client-cache follows to this in a "wrong way". As a result of this the file is not visible in the nfs-client cache and our script stops.

If we modify our script to create more than 1000 files in a directory,
some other errors occur. We can see (on the nfs-client) some duplicate filenames.

In all these cases, all files are correct on the NFS server.

Many thanks and best regards
Rüdiger Hartmann

Comment 6 Steve Dickson 2011-09-29 11:51:48 UTC
(In reply to comment #5)
> 
> We think frame #193031 in this example is the most interesting.
> (see attached *.png files)
> The server sends an nfsv3-create-reply-ok packet back to the client, but
> without a dir_wcc after attribute (this is RFC 1813 compliant as you can
> see above).
This should not be a problem since the client will only use these post attrs
if they exist.

> The nfs-client-cache follows to this in a "wrong way". As a result of this the
> file is not visible in the nfs-client cache and our script stops.
This is where you lose me. I don't understand your first sentence.
What do you mean by "nfs-client-cache follows to this in a "wrong way""?

Also, are you saying out of the 500 files created, only testfile.txt.0
(the one with the missing directory post opts) is not seen in the ls -s?

> 
> If we modify our script to create more than 1000 files in a directory,
> some other errors occur. We can see (on the nfs-client) some duplicate
> filenames.
Could you please attach those errors as well?

Comment 7 swtest 2011-09-29 13:02:25 UTC
(In reply to comment #6)
> (In reply to comment #5)
> > 
> > We think frame #193031 in this example is the most interesting.
> > (see attached *.png files)
> > The server sends an nfsv3-create-reply-ok packet back to the client, but
> > without a dir_wcc after attribute (this is RFC 1813 compliant as you can
> > see above).
> This should not be a problem since the client will only use these post attrs
> if they exist.
Yes, that would be fine.

> 
> > The nfs-client-cache follows to this in a "wrong way". As a result of this the
> > file is not visible in the nfs-client cache and our script stops.
> This is where you lose me. I don't understand your first sentence.
> What do you mean by "nfs-client-cache follows to this in a "wrong way""?
All I can see is the result: one or more files in the directory are
not visible on the client (after many main loops of the script).

The main tasks in the script are
o create a file
o check if the file is visible
o write some content in this file
o check if the file is visible
o---o loop for xxx files (this is called the inner loop)
o delete a file
o check if the file was deleted 
o---o loop for xxx files
o if xxx files were created and deleted successfully, go back to the start (this is called a main loop)

(you can find the full version of the script in the attachments)

We started with inner-loops of 500. 

For the file check, we compare the counter in the script with the number of
names listed by ls -1, like "ls -1 $dst* 2>/dev/null|wc |awk '{print $1}'".
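In the script, `$dst*` is expanded by the shell before `ls` runs, so the count really comes from the shell's glob, which reads the directory through the NFS client when run on the mount. A minimal sketch of counting that expansion directly, without parsing `ls` output; `count_entries` is a hypothetical helper, not part of the attached script:

```shell
#!/bin/sh
# count_entries PREFIX: print how many names in the current directory start
# with PREFIX, using the same glob expansion the test script relies on.
# (Hypothetical helper for illustration; not part of the attached script.)
count_entries() {
    set -- "$1"*
    # With no match, the shell leaves the pattern unexpanded as one word.
    if [ "$#" -eq 1 ] && [ ! -e "$1" ] && [ ! -L "$1" ]; then
        echo 0
    else
        echo "$#"
    fi
}

# Example, mirroring the script's check:
# [ "$(count_entries testfile.txt)" -eq "$w" ] || echo "count mismatch"
```

Like the `ls -1 | wc` pipeline, this only counts names, so a listing with one missing file and one duplicated name would still pass the check.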


> 
> Also, are you saying out of the 500 files created, only testfile.txt.0
> (the one with the missing directory post opts) is not seen in the ls -s?
> 
> > 
> > If we modify our script to create more than 1000 files in a directory,
> > some other errors occur. We can see (on the nfs-client) some duplicate
> > filenames.
> Could you please attach those errors as well?

Yes, I'll create a new dump. In a few minutes I'll upload it.

Comment 8 swtest 2011-09-29 13:03:33 UTC
Created attachment 525563
a sample testscript we used

Comment 9 swtest 2011-09-29 14:07:04 UTC
Created attachment 525581
a sample network trace with inner-loop of 1500 files

Here is another network trace with an inner loop of 1500 files.

At the end (before the ping) we saw many duplicate files; for example,
ls -1 gave this listing:
...
testfile.txt.649
testfile.txt.650
testfile.txt.650
testfile.txt.650
testfile.txt.651
...

After a "touch qqq" in that directory (I think the NFS cache was updated),
everything was fine again:
ls -1 gave this listing:
...
testfile.txt.649
testfile.txt.650
testfile.txt.651
...

On the NFS server side, we never saw duplicate or missing files.
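A minimal sketch of flagging repeated names such as testfile.txt.650 automatically, assuming the same sh environment as the test script; `find_duplicate_names` is a hypothetical helper:

```shell
#!/bin/sh
# find_duplicate_names: read one name per line on stdin (e.g. from "ls -1")
# and print each name that occurs more than once, once per name.
find_duplicate_names() {
    sort | uniq -d
}

# Hypothetical use inside the test loop:
# dups=`ls -1 2>/dev/null | find_duplicate_names`
# if [ -n "$dups" ]; then
#     echo "!!!Duplicate-Error: $dups"; exit 1
# fi
```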

Comment 10 swtest 2011-09-29 14:10:50 UTC
Comment on attachment 525581
a sample network trace with inner-loop of 1500 files

Comment 11 Jeff Layton 2011-12-13 10:57:40 UTC
> 
> Does it mean that this structure can be void in case of a true reply?
> Or with other words, is the reply packet from an nfs-server correct or not?
> (please look at the attached file “no_attributes.PNG”)
> The directoryWCC after attribute has no value in case of a positive reply
> answer.
> 
> 
> We opend a case by Microsoft, but they answered us (about the post_op_attr):
> // so this is not mandatory. Under heavy load, the server may choose to skip
> this data
> 
> Is this answer from Microsoft really correct?

Yes, they're correct. Pre and post op attrs are entirely optional. They are strongly recommended, but the client should properly handle the case where they are not present.
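To illustrate what handling missing attributes can mean, here is a toy model in sh. It follows the weak cache consistency idea of RFC 1813: if the pre-op attributes in a directory's wcc_data match what the client has cached, the client may adopt the post-op attributes; otherwise, including when the server omits them, it must revalidate with the server. Tracking only the mtime, and the names `cached_mtime`, `need_revalidate` and `handle_dir_wcc`, are simplifying assumptions; this is not the Linux NFS client's code.

```shell
#!/bin/sh
# Hypothetical client-side state for one directory (illustration only).
cached_mtime=100
need_revalidate=0

# handle_dir_wcc PRE_MTIME POST_MTIME
# An empty argument stands for an omitted attribute (attributes_follow = FALSE).
handle_dir_wcc() {
    if [ -n "$1" ] && [ -n "$2" ] && [ "$1" = "$cached_mtime" ]; then
        # Pre-op value matches the cache: adopt the post-op value.
        cached_mtime=$2
    else
        # Attributes missing or unexpected: don't guess, refresh from the server.
        need_revalidate=1
    fi
}
```

Either way the reply itself stays valid; a missing attribute only means the client has to ask the server again instead of trusting its cache.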

I'll look over the traces.

Comment 12 swtest 2011-12-13 11:31:39 UTC
Thank you for your reply.

I'm looking forward to the results of your trace investigation.

Best regards

Rüdiger Hartmann

Comment 13 Jeff Layton 2011-12-13 11:51:31 UTC
Looking at the capture in comment #9, there are some irregularities.
testfile.txt.650 is created and is given fileid 45270164. The first READDIRPLUS
reply from the server that contains that file gives it cookie #1516706264.
Later READDIRPLUS calls however give that file an entirely different cookie --
289117061.

When an entire directory's contents won't fit in a single READDIRPLUS reply,
the server will set a flag indicating that there are more entries to follow.
The client will then ask for the next set of entries by issuing another
READDIRPLUS request starting at a given cookie.

However, the NFS protocol and servers are generally stateless, so nothing
prevents changes to that directory between the two calls. If the cookies on
files change, then the client can get confused and display duplicate files,
skip others, etc.
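A toy model of the cookie problem described above, in sh with awk. It assumes a hypothetical server that returns entries sorted by cookie and resumes with the first entry whose cookie is greater than the one the client sends; real cookies are opaque values chosen by the server. The cookie values and the names a to d are made up:

```shell
#!/bin/sh
# read_page COOKIE COUNT: from "cookie name" lines on stdin, print up to
# COUNT entries whose cookie is greater than COOKIE, in cookie order.
read_page() {
    sort -n | awk -v resume="$1" -v n="$2" \
        '$1 > resume && shown < n { print; shown++ }'
}

# Directory state during the first READDIRPLUS call.
snapshot1='10 a
20 b
30 c
40 d'
# Same names before the second call, but "b" now has a different cookie.
snapshot2='10 a
25 c
35 b
40 d'

# Call 1: first page of two entries.
page1=$(printf '%s\n' "$snapshot1" | read_page 0 2)
# The client resumes from the cookie of the last entry it received.
last=$(printf '%s\n' "$page1" | tail -n 1 | awk '{print $1}')
# Call 2: the rest of the directory, read from the changed state.
page2=$(printf '%s\n' "$snapshot2" | read_page "$last" 10)

# Names the client assembled, in the order received.
names=$(printf '%s\n%s\n' "$page1" "$page2" | awk '{print $2}')
printf '%s\n' "$names"
```

The client assembles a, b, c, b, d: b appears twice because its cookie moved past the resume point between the two calls. If c had instead moved to a cookie below 20 in the second snapshot, it would have been skipped.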

I think the main problem is that you're expecting consistency here where there
really isn't any guarantee of that.

If you really need consistency in a readdir() call, then you need to use
some sort of locking to ensure it. In this case, you'd want to ensure that no
files are created, deleted or renamed in the directory while the readdir() call
runs.
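One way to apply such locking from the shell, as a minimal sketch: every process that creates, deletes, renames or lists files takes a lock on a shared lock file with flock(1) from util-linux, exclusively for changes and shared for listings. The lock file name `.dirlock`, the `DIR_LOCK` variable and the `dir_locked` helper are hypothetical. The sketch assumes every process on every client cooperates; whether the lock is honored across NFS clients depends on the client's and server's lock support (NLM for NFSv3).

```shell
#!/bin/sh
# Hypothetical lock file shared by all cooperating processes.
DIR_LOCK=${DIR_LOCK:-./.dirlock}

# dir_locked MODE CMD [ARGS...]: run CMD while holding the lock.
# MODE is -x (exclusive, for create/delete/rename) or -s (shared, for listing).
dir_locked() {
    mode=$1
    shift
    (
        flock "$mode" 9 || exit 1
        "$@"
    ) 9>>"$DIR_LOCK"
}

# Writers take the lock exclusively:
# dir_locked -x /bin/cp sourcefile testfile.txt.1
# Readers take it shared, so no change can happen during the listing:
# dir_locked -s ls -1
```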

I see no evidence of a bug here, so I'm going to close this case with a
resolution of NOTABUG. If you want to discuss this further, then I suggest
opening a support case and working with our support people to further narrow the cause.

Comment 14 Akemi Yagi 2013-03-28 17:39:46 UTC
The output in comment #9 looks similar to what is described in:

https://access.redhat.com/knowledge/solutions/306063

except the latter is specific to NFS servers running RHEL 5.9 32-bit.