Bug 194088 - fstat() on NFS filesystem doesn't reflect recent changes
Summary: fstat() on NFS filesystem doesn't reflect recent changes
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
: ---
Assignee: Peter Staubach
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks: 176344
 
Reported: 2006-06-05 18:00 UTC by jas
Modified: 2007-11-30 22:07 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-09-10 12:58:40 UTC
Target Upstream Version:
Embargoed:


Attachments
demonstrate fstat bug (995 bytes, application/octet-stream)
2006-06-05 18:00 UTC, jas
tethereal output running fstat_bug (1.48 KB, application/x-bzip2)
2006-06-06 16:31 UTC, jas
tcpdump output of running fstat_bug on an unknown file (2.38 KB, application/x-bzip2)
2006-10-11 15:25 UTC, jas
tcpdump output of running fstat_bug on a previously nonexistent file (2.38 KB, application/x-bzip2)
2006-10-11 15:25 UTC, jas

Description jas 2006-06-05 18:00:21 UTC
Description of problem:

- open a file on the NFS filesystem
- write a small chunk of data (e.g. 1000 bytes)
- call fstat() on the file descriptor you just wrote to

The "size" field in fstat() should reflect the data you just wrote.  The
attached program does the above operations and prints out how much it wrote and
how big fstat() says the file is.  The output looks something like: 

    Wrote 1000 of 1000 bytes - fstat says 1000

when fstat() works and

    Wrote 1000 of 1000 bytes - fstat says 0

when it doesn't.

The test works on both local and NFS filesystems on Red Hat 7.3, and even on
Sun Solaris.  It also works with the latest Linux kernels.  On Red Hat
Enterprise it works on a local filesystem, but fails on NFS filesystems
(fstat says 0), which is interesting.
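
The attached program itself is not reproduced in this report. As a rough illustration of the three steps described above, here is a minimal sketch in Python, whose os module wraps the same open(), write() and fstat() system calls. The function names write_then_fstat and report are hypothetical, and the fill byte and file mode are arbitrary assumptions; only the output format is taken from the description.

```python
import os


def write_then_fstat(path, nbytes):
    """Write nbytes to a freshly truncated file, then fstat() the same fd.

    Returns (bytes_written, st_size_reported_by_fstat).
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        written = os.write(fd, b"x" * nbytes)   # step 2: write a small chunk
        size = os.fstat(fd).st_size             # step 3: fstat the same descriptor
    finally:
        os.close(fd)
    return written, size


def report(path, nbytes):
    """Format the result the way the report's sample output does."""
    written, size = write_then_fstat(path, nbytes)
    return f"Wrote {written} of {nbytes} bytes - fstat says {size}"
```

Calling print(report(path, 1000)) with a path on the NFS mount would be expected to print "fstat says 1000" on a working client; the failure described here corresponds to "fstat says 0".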

Comment 1 jas 2006-06-05 18:00:21 UTC
Created attachment 130527 [details]
demonstrate fstat bug

Comment 2 Steve Dickson 2006-06-06 15:46:01 UTC
Please post a bzip2-ed binary tethereal network trace of the
traffic between the client and server. Something similar
to:

    tethereal -w /tmp/data.pcap host <server>
    bzip2 /tmp/data.pcap

Also, what mount options are you using?
'cat /proc/mounts | grep <mntpoint>' will show
the options.

Finally, what is the server?

Comment 3 jas 2006-06-06 16:31:14 UTC
Created attachment 130621 [details]
tethereal output running fstat_bug

Comment 4 jas 2006-06-06 16:32:46 UTC
mount options are:
forest-mrpriv:/obj/home9 /cs/home/jas nfs
rw,v3,rsize=8192,wsize=8192,hard,intr,tcp,lock,addr=forest-mrpriv 0 0
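
For readers comparing mount configurations across the clients in this report, the /proc/mounts line above can be split into its fields mechanically. This is only an illustrative sketch; parse_mount_line is a hypothetical helper, not part of any NFS tooling, and it assumes the standard six whitespace-separated fields.

```python
def parse_mount_line(line):
    """Split one /proc/mounts line into its fields and an options dict."""
    device, mountpoint, fstype, options, _dump, _passno = line.split()
    opts = {}
    for item in options.split(","):
        key, sep, value = item.partition("=")
        # Flags such as "hard" have no value; record them as True.
        opts[key] = value if sep else True
    return device, mountpoint, fstype, opts


line = ("forest-mrpriv:/obj/home9 /cs/home/jas nfs "
        "rw,v3,rsize=8192,wsize=8192,hard,intr,tcp,lock,addr=forest-mrpriv 0 0")
device, mountpoint, fstype, opts = parse_mount_line(line)

# Neither "noac" nor "sync" is set on this mount.
uses_noac_or_sync = "noac" in opts or "sync" in opts
```

For this mount, opts["rsize"] and opts["wsize"] are both the string "8192", and uses_noac_or_sync is False.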

Comment 5 jas 2006-06-06 16:34:48 UTC
The server is a Red Hat 7.3 system running a 2.4.32 kernel.
(The problem supposedly does not occur from an FC5 client.)



Comment 6 jas 2006-06-06 16:47:19 UTC
In addition ...

The problem does not occur between Red Hat Enterprise systems, but it does
occur between a Red Hat Enterprise client and a stock 2.4.32 NFS server.

The problem does not occur between a Solaris, FC5, or stock 2.6 kernel client
and the 2.4.32 NFS server.


Comment 7 RHEL Program Management 2006-09-07 19:15:04 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 8 RHEL Program Management 2006-09-07 19:15:34 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 12 jas 2006-09-11 13:01:09 UTC
If more info is needed in order to solve the problem, please let me know what I
need to provide.

Comment 13 Peter Staubach 2006-09-11 13:52:50 UTC
Just to be sure that I understand the situation, what is the client
operating system and what is the server operating system when the
situation is reproducible?

Comment 14 jas 2006-09-11 20:10:06 UTC
The NFS server is RedHat 7.3, but running stock Linux 2.4.32 kernel.

The problem does not occur when the client is stock 2.6 kernel, or the client is
Solaris 8, but it does occur when the client is RedHat Enterprise 4.  I am told
that the problem does not occur when the client is Fedora Core 5, but I cannot
verify that fact since I don't run Fedora Core 5.





Comment 15 Peter Staubach 2006-09-11 20:57:33 UTC
Does this problem occur against any other servers or just the one,
customized, not really RHL 7.3 server?

Are all of these various clients running on the same hardware or on
different hardware?  I assume that the failing client is running on
an i386 system?

Comment 16 jas 2006-09-27 13:35:44 UTC
With Sun (ultrasparc) as server, and RedHat Enterprise as client, fstat works.
With RedHat Enterprise as server and RedHat Enterprise as client, it works.
With stock 2.4.32 kernel as server, and RedHat Enterprise as client, fstat fails.

Can you try a version of Red Hat Enterprise that uses the 2.4 kernel as the
server, with Red Hat Enterprise 4 as the client? You must have that setup in a
test lab.  The result would certainly be interesting since, at this point, it
doesn't seem to me that the server is at fault.

If the above fails, it would also be interesting to see the result of 2.4-based
server, and Fedora Core 5 client.  Again, I don't have this setup, but you must
have a lab where you can try this out.

If the problem doesn't occur with 2.4-based RedHat Linux as server, that's
interesting, but I still wonder why other clients do work in this configuration.

Comment 17 Peter Staubach 2006-09-27 14:02:47 UTC
Actually, I don't think that I have access to all that equipment and
configurations.  We just don't have that much equipment lying around
with very old releases on it, and especially not non-Red Hat releases.

Could you attach a raw tethereal capture file of a failing situation,
please?  The currently attached pcap file does not contain enough
information to be able to tell much of anything, other than the
client generated some GETATTR and ACCESS calls for some directories
and files.  In particular, there was no WRITE call, nor any LOOKUP
operations that would connect file handles to names.

Comment 18 jas 2006-09-27 19:28:24 UTC
Can you provide me with the code to run that will provide the necessary details,
and I will run it and capture the required output?
The tethereal output was captured while running the initial code attached to
this report, which simply did an fstat...




Comment 19 Peter Staubach 2006-09-27 20:05:57 UTC
Unfortunately, that fstat_bug run may have been on an existing file
because there are no LOOKUP operations or any CREATE operations.  I
need this sort of thing in order to be able to tell which file handle
refers to the file which is being opened, written to, and then
fstat'd.  From this, I can look at the attributes that the server
was sending back to see what they look like.

So, if you'd run tethereal or tcpdump as before, but start it first
and run fstat_bug on a new filename and send me the raw capture file,
I would appreciate it.

Just out of curiosity, does this reproduce differently depending upon
whether the target file exists or not first?

Comment 20 jas 2006-10-11 15:25:15 UTC
Created attachment 138244 [details]
tcpdump output of running fstat_bug on an unknown file

Comment 21 jas 2006-10-11 15:25:42 UTC
Created attachment 138245 [details]
tcpdump output of running fstat_bug on a previously nonexistent file

Comment 22 Hansjoerg Maurer 2006-10-18 06:52:43 UTC
Hi

We experienced a problem that may be related after upgrading from RHEL4U3 to RHEL4U4.
After the update, the nastran analysis program fails to run on the RHEL4U4 client
(against a RHEL4U4, RH8, or NetApp NFS server).
The reason is an I/O error:
nastran complains that the data it reads is not reasonable.

We ran strace on nastran in both cases and noticed
that after the upgrade, a read operation returns zero bytes, whereas
before the upgrade, it reads 32768 bytes at the same point of the nastran run.

The interesting part after the upgrade is here:
_llseek(12, 131072, [131072], SEEK_SET) = 0
read(12, "", 32768)               = 0

before the upgrade it looks like:
_llseek(12, 131072, [131072], SEEK_SET) = 0
read(12, "\1\0\0\0\30\0\0\0\0\0\0\0\22\0\0\21\0\0\0\0\0\0\0\0\0\0"..., 32768) = 32768

(longer parts are attached)

As you can see, the read from the file returns zero bytes.
If we mount the filesystem with -o sync or -o noac,
the application starts working again.



cat /proc/mounts
rmcs33:/export /net/rmcs33/export nfs
rw,sync,nosuid,nodev,v3,rsize=8192,wsize=8192,hard,intr,tcp,lock,addr=rmcs33 0 0

But this option slows down all NFS operations.

There must have been a significant change in the NFS client caching behaviour
that causes a write followed by a read to fail in some cases.
My understanding of POSIX (according to the German read manpage) is
that a read operation after a write operation within one program has to
return the newly written data.



Greetings

Hansjörg




strace after upgrade
5602  write(13, "\0\0\0\0\0\0\0\0\0\0\0\0\n\0\0\0\0 \0\0s\0\0\0\n\0\0\0"..., 32768) = 32768
5602  _llseek(14, 0, [0], SEEK_SET)     = 0
5602  write(14, "\0\0\0\0\0\0\0\0\0\0\0\0\n\0\0\0\0 \0\0s\0\0\0\n\0\0\0"..., 32768) = 32768
5602  _llseek(15, 0, [0], SEEK_SET)     = 0
5602  write(15, "\0\0\0\0\0\0\0\0\0\0\0\0\n\0\0\0\0 \0\0s\0\0\0\n\0\0\0"..., 32768) = 32768
5602  _llseek(16, 0, [0], SEEK_SET)     = 0
5602  write(16, "\0\0\0\0\0\0\0\0\0\0\0\0\n\0\0\0\0 \0\0s\0\0\0\n\0\0\0"..., 32768) = 32768
5602  _llseek(12, 98304, [98304], SEEK_SET) = 0
5602  write(12, "\1\0\0\0\35\0\0\0\0\0\0\0\2\0\0\21PROJVERS\4\0\0001\f\0"..., 32768) = 32768
5602  write(12, "\1\0\0\0\30\0\0\0\0\0\0\0\22\0\0\21\0\0\0\0\0\0\0\0\0\0"..., 32768) = 32768
5602  _llseek(12, 131072, [131072], SEEK_SET) = 0
5602  read(12, "", 32768)               = 0

strace before upgrade
29552 write(13, "\0\0\0\0\0\0\0\0\0\0\0\0\n\0\0\0\0 \0\0s\0\0\0\n\0\0\0"..., 32768) = 32768
29552 _llseek(14, 0, [0], SEEK_SET)     = 0
29552 write(14, "\0\0\0\0\0\0\0\0\0\0\0\0\n\0\0\0\0 \0\0s\0\0\0\n\0\0\0"..., 32768) = 32768
29552 _llseek(15, 0, [0], SEEK_SET)     = 0
29552 write(15, "\0\0\0\0\0\0\0\0\0\0\0\0\n\0\0\0\0 \0\0s\0\0\0\n\0\0\0"..., 32768) = 32768
29552 _llseek(16, 0, [0], SEEK_SET)     = 0
29552 write(16, "\0\0\0\0\0\0\0\0\0\0\0\0\n\0\0\0\0 \0\0s\0\0\0\n\0\0\0"..., 32768) = 32768
29552 _llseek(12, 98304, [98304], SEEK_SET) = 0
29552 write(12, "\1\0\0\0\35\0\0\0\0\0\0\0\2\0\0\21PROJVERS\4\0\0001\f\0"..., 32768) = 32768
29552 write(12, "\1\0\0\0\30\0\0\0\0\0\0\0\22\0\0\21\0\0\0\0\0\0\0\0\0\0"..., 32768) = 32768
29552 _llseek(12, 131072, [131072], SEEK_SET) = 0
29552 read(12, "\1\0\0\0\30\0\0\0\0\0\0\0\22\0\0\21\0\0\0\0\0\0\0\0\0\0"..., 32768) = 32768
29552 _llseek(12, 131072, [131072], SEEK_SET) = 0
29552 write(12, "\1\0\0\0\370\37\0\0\0\0\0\0\22\0\0\21\0\0\0\0\0\0\0\0\0"..., 32768) = 32768
29552 _llseek(12, 131072, [131072], SEEK_SET) = 0
29552 read(12, "\1\0\0\0\370\37\0\0\0\0\0\0\22\0\0\21\0\0\0\0\0\0\0\0\0"..., 32768) = 32768
29552 write(12, "\1\0\0\0\256\0\0\0\0\0\0\0\2\0\0\21DBSPACE \4\0\0001\244"..., 32768) = 32768
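
The sequence on descriptor 12 in the traces above (seek to 98304, two 32768-byte writes, seek back to 131072, read 32768 bytes) can be approximated outside nastran. The following is only a sketch under assumptions: read_after_write is a hypothetical name, the file is created fresh rather than reusing nastran's existing database file, and the payload bytes are arbitrary.

```python
import os

BLOCK = 32768  # transfer size seen in the strace output


def read_after_write(path):
    """Replay the write/seek/read pattern and return (bytes_read, data_matches)."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.lseek(fd, 3 * BLOCK, os.SEEK_SET)   # offset 98304
        first = b"\x01" * BLOCK
        second = b"\x02" * BLOCK
        os.write(fd, first)                    # fills [98304, 131072)
        os.write(fd, second)                   # fills [131072, 163840)
        os.lseek(fd, 4 * BLOCK, os.SEEK_SET)   # offset 131072
        data = os.read(fd, BLOCK)              # should return the second block
    finally:
        os.close(fd)
    return len(data), data == second
```

A client behaving like the "before upgrade" trace would return (32768, True); the "after upgrade" trace corresponds to a read length of 0.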

Comment 26 Devin Bougie 2007-04-14 00:38:11 UTC
Hi All,

It looks like Bug 236308 is a duplicate of this.

Thanks,
Devin

Comment 27 RHEL Program Management 2007-05-09 10:15:49 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 29 RHEL Program Management 2007-09-07 19:44:36 UTC
This request was previously evaluated by Red Hat Product Management
for inclusion in the current Red Hat Enterprise Linux release, but
Red Hat was unable to resolve it in time.  This request will be
reviewed for a future Red Hat Enterprise Linux release.

Comment 30 jas 2007-09-10 12:53:43 UTC
The original fstat problem does not occur anymore after upgrading to RedHat
Enterprise 4.5:

hop 310 % ./fstat /cs/home/jas/bugi 1000
Wrote 1000 of 1000 bytes - fstat says 1000

(Previously, this would have displayed "fstat says 0".)




Comment 31 Peter Staubach 2007-09-10 12:58:40 UTC
If the bug reoccurs, then please reopen this report.

