Description of problem:

- open a file on an NFS filesystem
- write a small chunk of data (e.g. 1000 bytes)
- call fstat() on the file descriptor you just wrote to

The "size" field returned by fstat() should reflect the data you just wrote. The attached program performs the above operations and prints how much it wrote and how big fstat() says the file is. The output looks something like:

Wrote 1000 of 1000 bytes - fstat says 1000

when fstat() works, and:

Wrote 1000 of 1000 bytes - fstat says 0

when it doesn't. The test works for both local and NFS filesystems on RedHat 7.3, and even on Sun Solaris. It also works with the latest Linux kernels. On RedHat Enterprise it works for local filesystems, but fails for NFS filesystems (fstat says 0), which is interesting.
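The steps above can be sketched as a small C program. This is a minimal sketch of what the attached fstat_bug program presumably does, not the actual attachment; the path and byte count are arbitrary:

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Write `count` bytes to `path`, then fstat() the same descriptor and
 * return the size it reports (or -1 on error).  On a correct client
 * the result equals `count`; on the failing NFS client it comes back
 * as 0. */
long write_then_fstat(const char *path, size_t count)
{
    struct stat st;
    ssize_t wrote;
    int fd;
    char *buf = malloc(count);

    if (buf == NULL)
        return -1;
    memset(buf, 'x', count);

    fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0644);
    if (fd < 0) {
        free(buf);
        return -1;
    }
    wrote = write(fd, buf, count);
    free(buf);
    if (wrote != (ssize_t)count || fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    close(fd);
    printf("Wrote %zd of %zu bytes - fstat says %ld\n",
           wrote, count, (long)st.st_size);
    return (long)st.st_size;
}
```

Run against a file on the NFS mount, the returned size should equal the byte count written; the bug shows up as a return value of 0.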
Created attachment 130527 [details] demonstrate fstat bug
Please post a bzip2-ed binary tethereal network trace of the traffic between the client and server. Something similar to:

tethereal -w /tmp/data.pcap host <server>
bzip2 /tmp/data.pcap

Also, what mount options are you using? 'cat /proc/mounts | grep <mntpoint>' will show the options. Finally, who is the server?
Created attachment 130621 [details] tethereal output running fstat_bug
mount options are: forest-mrpriv:/obj/home9 /cs/home/jas nfs rw,v3,rsize=8192,wsize=8192,hard,intr,tcp,lock,addr=forest-mrpriv 0 0
The server is a RedHat 7.3 system running a stock 2.4.32 kernel. (The problem supposedly does not occur from FC5.)
In addition: the problem does not occur between two RedHat Enterprise systems, but it does occur between a RedHat Enterprise client and a stock 2.4.32 NFS server. The problem does not occur between a Solaris, FC5, or stock 2.6 kernel client and the 2.4.32 NFS server.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
If more info is needed in order to solve the problem, please let me know what I need to provide.
Just to be sure that I understand the situation, what is the client operating system and what is the server operating system when the situation is reproducible?
The NFS server is RedHat 7.3, but running a stock Linux 2.4.32 kernel. The problem does not occur when the client is a stock 2.6 kernel or Solaris 8, but it does occur when the client is RedHat Enterprise 4. I am told that the problem does not occur when the client is Fedora Core 5, but I cannot verify that since I don't run Fedora Core 5.
Does this problem occur against any other servers or just the one, customized, not really RHL 7.3 server? Are all of these various clients running on the same hardware or on different hardware? I assume that the failing client is running on an i386 system?
With Sun (UltraSPARC) as server and RedHat Enterprise as client, fstat works. With RedHat Enterprise as both server and client, it works. With a stock 2.4.32 kernel as server and RedHat Enterprise as client, fstat fails.

Can you try a different version of RedHat Enterprise that uses the 2.4 kernel as the server, with RedHat Enterprise 4 as the client? You must have that setup in a test lab. The result would certainly be interesting, since at this point it doesn't seem to me like it's the server. If the above fails, it would also be interesting to see the result of a 2.4-based server with a Fedora Core 5 client. Again, I don't have this setup, but you must have a lab where you can try it out. If the problem doesn't occur with a 2.4-based RedHat Linux server, that's interesting, but I still wonder why other clients do work in this configuration.
Actually, I don't think that I have access to all that equipment and those configurations. We just don't have that much equipment lying around with very old releases on it, especially non Red Hat releases. Could you attach a raw tethereal capture file of a failing situation, please? The currently attached pcap file does not contain enough information to tell much of anything, other than that the client generated some GETATTR and ACCESS calls for some directories and files. In particular, there was no WRITE call or any LOOKUP operations to be able to connect file handles to names.
Can you provide me with the code to run that will produce the necessary details? I will run it and capture the required output. The tethereal output captured was from running the initial code attached to this report, which simply did an fstat.
Unfortunately, that fstat_bug run may have been on an existing file, because there are no LOOKUP or CREATE operations in the trace. I need those in order to tell which file handle refers to the file which is being opened, written to, and then fstat'd. From that, I can look at the attributes the server was sending back to see what they look like. So, if you'd run tethereal or tcpdump as before, but start it first and run fstat_bug on a new filename, and send me the raw capture file, I would appreciate it. Just out of curiosity, does this reproduce differently depending upon whether or not the target file already exists?
Created attachment 138244 [details] tcpdump output of running fstat_bug on an unknown file
Created attachment 138245 [details] tcpdump output of running fstat_bug on a previously non-existent file
Hi, we experienced a problem which may be related, after an upgrade from RHEL4U3 to RHEL4U4. After the update, a nastran run on the RHEL4U4 client fails to run the nastran analysis program (against a RHEL4U4, RH8, or NetApp NFS server). The reason is an I/O error: nastran complains that the data it reads is not reasonable.

We ran strace on nastran in both cases and noticed that after the upgrade a read operation reads only zero bytes, whereas before the upgrade it reads 32768 bytes at the same point of the nastran run. The interesting part after the upgrade is here:

_llseek(12, 131072, [131072], SEEK_SET) = 0
read(12, "", 32768) = 0

Before the upgrade it looks like:

_llseek(12, 131072, [131072], SEEK_SET) = 0
read(12, "\1\0\0\0\30\0\0\0\0\0\0\0\22\0\0\21\0\0\0\0\0\0\0\0\0\0"..., 32768) = 32768

(Longer excerpts are attached.) As you can see, the read returns zero bytes. If we mount the filesystem with -o sync or -o noac, the application starts working again:

cat /proc/mounts
rmcs33:/export /net/rmcs33/export nfs rw,sync,nosuid,nodev,v3,rsize=8192,wsize=8192,hard,intr,tcp,lock,addr=rmcs33 0 0

But this option slows down all NFS operations. There must have been a significant change in the NFS client caching behaviour which makes a write with a subsequent read fail in some cases. My understanding of POSIX (according to the German read manpage) is that a read operation following a write operation within one program has to return the new (written) data.
Greetings, Hansjörg

strace after upgrade:

5602  write(13, "\0\0\0\0\0\0\0\0\0\0\0\0\n\0\0\0\0 \0\0s\0\0\0\n\0\0\0"..., 32768) = 32768
5602  _llseek(14, 0, [0], SEEK_SET) = 0
5602  write(14, "\0\0\0\0\0\0\0\0\0\0\0\0\n\0\0\0\0 \0\0s\0\0\0\n\0\0\0"..., 32768) = 32768
5602  _llseek(15, 0, [0], SEEK_SET) = 0
5602  write(15, "\0\0\0\0\0\0\0\0\0\0\0\0\n\0\0\0\0 \0\0s\0\0\0\n\0\0\0"..., 32768) = 32768
5602  _llseek(16, 0, [0], SEEK_SET) = 0
5602  write(16, "\0\0\0\0\0\0\0\0\0\0\0\0\n\0\0\0\0 \0\0s\0\0\0\n\0\0\0"..., 32768) = 32768
5602  _llseek(12, 98304, [98304], SEEK_SET) = 0
5602  write(12, "\1\0\0\0\35\0\0\0\0\0\0\0\2\0\0\21PROJVERS\4\0\0001\f\0"..., 32768) = 32768
5602  write(12, "\1\0\0\0\30\0\0\0\0\0\0\0\22\0\0\21\0\0\0\0\0\0\0\0\0\0"..., 32768) = 32768
5602  _llseek(12, 131072, [131072], SEEK_SET) = 0
5602  read(12, "", 32768) = 0

29552 write(13, "\0\0\0\0\0\0\0\0\0\0\0\0\n\0\0\0\0 \0\0s\0\0\0\n\0\0\0"..., 32768) = 32768
29552 _llseek(14, 0, [0], SEEK_SET) = 0
29552 write(14, "\0\0\0\0\0\0\0\0\0\0\0\0\n\0\0\0\0 \0\0s\0\0\0\n\0\0\0"..., 32768) = 32768
29552 _llseek(15, 0, [0], SEEK_SET) = 0
29552 write(15, "\0\0\0\0\0\0\0\0\0\0\0\0\n\0\0\0\0 \0\0s\0\0\0\n\0\0\0"..., 32768) = 32768
29552 _llseek(16, 0, [0], SEEK_SET) = 0
29552 write(16, "\0\0\0\0\0\0\0\0\0\0\0\0\n\0\0\0\0 \0\0s\0\0\0\n\0\0\0"..., 32768) = 32768
29552 _llseek(12, 98304, [98304], SEEK_SET) = 0
29552 write(12, "\1\0\0\0\35\0\0\0\0\0\0\0\2\0\0\21PROJVERS\4\0\0001\f\0"..., 32768) = 32768
29552 write(12, "\1\0\0\0\30\0\0\0\0\0\0\0\22\0\0\21\0\0\0\0\0\0\0\0\0\0"..., 32768) = 32768
29552 _llseek(12, 131072, [131072], SEEK_SET) = 0
29552 read(12, "\1\0\0\0\30\0\0\0\0\0\0\0\22\0\0\21\0\0\0\0\0\0\0\0\0\0"..., 32768) = 32768
29552 _llseek(12, 131072, [131072], SEEK_SET) = 0
29552 write(12, "\1\0\0\0\370\37\0\0\0\0\0\0\22\0\0\21\0\0\0\0\0\0\0\0\0"..., 32768) = 32768
29552 _llseek(12, 131072, [131072], SEEK_SET) = 0
29552 read(12, "\1\0\0\0\370\37\0\0\0\0\0\0\22\0\0\21\0\0\0\0\0\0\0\0\0"..., 32768) = 32768
29552 write(12, "\1\0\0\0\256\0\0\0\0\0\0\0\2\0\0\21DBSPACE \4\0\0001\244"..., 32768) = 32768
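The failing sequence from the strace above can be reduced to a short C program: write a block at an offset, seek back, and read it again through the same descriptor. POSIX requires the read to return the newly written data. This is a minimal sketch under an assumed test path; to show the bug it would of course need to run against a file on the affected NFS mount:

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Mimic the nastran I/O pattern seen in the strace: write 32768 bytes
 * at offset 131072, seek back, and read the block again.  Returns the
 * byte count the read returned (32768 when read-after-write works, 0
 * when the client serves stale cached state), or -1 on error. */
ssize_t write_seek_read(const char *path)
{
    char out[32768], in[32768];
    ssize_t got;
    int fd;

    memset(out, 0x11, sizeof(out));
    fd = open(path, O_CREAT | O_TRUNC | O_RDWR, 0644);
    if (fd < 0)
        return -1;
    if (lseek(fd, 131072, SEEK_SET) != 131072 ||
        write(fd, out, sizeof(out)) != (ssize_t)sizeof(out)) {
        close(fd);
        return -1;
    }
    if (lseek(fd, 131072, SEEK_SET) != 131072) {
        close(fd);
        return -1;
    }
    got = read(fd, in, sizeof(in));
    close(fd);
    return got;
}
```

On a correct client (or a local filesystem) this returns 32768; the failing RHEL4U4 client returns 0 at the equivalent point, matching the strace.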
Hi All, It looks like Bug 236308 is a duplicate of this. Thanks, Devin
This request was previously evaluated by Red Hat Product Management for inclusion in the current Red Hat Enterprise Linux release, but Red Hat was unable to resolve it in time. This request will be reviewed for a future Red Hat Enterprise Linux release.
The original fstat problem does not occur anymore after upgrading to RedHat Enterprise 4.5:

hop 310 % ./fstat /cs/home/jas/bugi 1000
Wrote 1000 of 1000 bytes - fstat says 1000

(Previously, this would have displayed "fstat says 0".)
If the bug reoccurs, then please reopen this report.