Bug 206954

Summary: "open" calls on nfs file-systems result in file deletion
Product: [Fedora] Fedora
Component: kernel
Version: 5
Hardware: i386
OS: Linux
Status: CLOSED NOTABUG
Severity: high
Priority: medium
Reporter: Joel Eidsath <thras>
Assignee: Steve Dickson <steved>
QA Contact: Brian Brock <bbrock>
CC: davej, esandeen, wtogami
Doc Type: Bug Fix
Last Closed: 2008-02-20 22:20:02 UTC
Attachments: pcap of bug (flags: none)

Description Joel Eidsath 2006-09-18 13:59:25 UTC
Description of problem:

  When any program (and there are a lot of them!) uses "open" from fcntl.h with
the O_WRONLY and O_TRUNC flags on a file that the user does not own but may
write to through the group permissions, the file is truncated on the server
(as the O_TRUNC flag requests). However, the open call fails with an "EPERM"
error, so the program thinks the call has failed. This generally leaves the
file zeroed out.
  This problem does not affect NFS shares exported from FC5 machines, only
those from our FreeBSD server (5.2.1-RELEASE-p9). However, the same server did
not cause this problem for FC2 or RHEL3 clients.

Version-Release number of selected component (if applicable):
uname -r gives 2.6.17-1.2139_FC5smp for our FC5 machines

How reproducible:
Every time.

Steps to Reproduce:
1. Create file "deadmeat.txt" -- compose your graduate thesis there.
2. Set file owner to foo. Set group to bar. Set permissions to 660.
  
(The next step can be done from any application that uses open with the O_TRUNC
flag. I chose vim here.)
3. From user notfoo in group bar, open deadmeat.txt in vim.
4. Run ":wq"
  
Actual results:
vim gives error: "E212: Can't open file for writing"
The contents of deadmeat.txt (your graduate thesis) are zeroed out.

Expected results:
No error from vim, the save should have proceeded normally.

Additional info:
  Fedora Core 2 and all of our RHEL 3 servers do not have this problem with our
FreeBSD server. This points to a Fedora Core 5 problem. 

You can write a custom C program to test this more directly. Just verify that
this call returns -1 on a file owned by someone else but writable by the
user's group:

    open("filename", O_WRONLY|O_TRUNC, 0660);

Make sure to include fcntl.h.
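A minimal sketch of such a test program follows. It makes the call from the description and records the file size before and after. The names trunc_probe, probe_trunc_open() and shows_reported_bug() are hypothetical and exist only for this illustration. The test assumes the file already exists: O_CREAT is not passed, so open() ignores the 0660 mode argument.

    #include <errno.h>
    #include <fcntl.h>
    #include <sys/stat.h>
    #include <unistd.h>

    /* What happened to one open(O_WRONLY|O_TRUNC) attempt on an existing file. */
    struct trunc_probe {
        int open_errno;   /* 0 if open() succeeded, otherwise the errno it set */
        long size_before; /* st_size before the call, or -1 if stat() failed */
        long size_after;  /* st_size after the call, or -1 if stat() failed */
    };

    /* Returns st_size for path, or -1 if stat() fails. */
    static long file_size(const char *path)
    {
        struct stat st;
        if (stat(path, &st) != 0)
            return -1;
        return (long)st.st_size;
    }

    /* Issue the same call as in the report and record the size around it. */
    static struct trunc_probe probe_trunc_open(const char *path)
    {
        struct trunc_probe r;
        r.size_before = file_size(path);
        /* The mode argument is ignored because O_CREAT is not set. */
        int fd = open(path, O_WRONLY | O_TRUNC, 0660);
        r.open_errno = (fd < 0) ? errno : 0; /* capture errno before close() */
        if (fd >= 0)
            close(fd);
        r.size_after = file_size(path);
        return r;
    }

    /* The reported symptom: open() fails with EPERM, yet the file is now empty. */
    static int shows_reported_bug(const struct trunc_probe *r)
    {
        return r->open_errno == EPERM && r->size_before > 0 && r->size_after == 0;
    }

The probe is destructive, because a successful open truncates the file, so point it at a scratch copy. Run it as a user who is in the file's group but is not its owner, against a file on the FreeBSD-exported mount. A main() that passes argv[1] to probe_trunc_open() and prints the three fields is enough. The NFS client caches attributes, so the size seen from the client may lag behind the server. Checking the file on the server, as in comment 8, is the definitive test.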

Comment 1 Joel Eidsath 2006-09-18 14:09:36 UTC
Sorry. Step 1 of "Steps to Reproduce" should have said to create deadmeat.txt
in a directory exported over NFS from the BSD server.

Comment 2 Eric Sandeen 2006-10-04 19:12:09 UTC
Steve, are you already looking into this?  Someone here offered to set up a bsd
server for me to investigate with if not.

Comment 3 Steve Dickson 2006-10-04 20:26:18 UTC
No... I have not looked into this...

But let's take up the offer of a server and get a
tethereal trace of the problem, with something similar to:

    tethereal -w /tmp/data.pcap  host <server> ;  bzip2 /tmp/data.pcap

 

Comment 4 Joel Eidsath 2006-10-04 20:46:58 UTC
Created attachment 137772 [details]
pcap of bug

Here's a pcap of the bug.

The file in question was ~/thras/test

The contents (before truncation) were something along the lines of:
hello
world
good

The server is "userhost" 

The client (from which tethereal was being run) is speare5-1-17

Comment 5 Eric Sandeen 2006-10-05 15:24:24 UTC
Joel, I'll try this here too, but is there any chance you can do a test with a
newer FBSD server (6.1)?  It seems like it's the server's responsibility to get
this right, in the end...

Comment 6 Joel Eidsath 2006-10-05 15:41:05 UTC
I'd have to set up a 6.1 server; I don't have one around. If the other person
mentioned in comment #2 already has one up, that might be the easiest way.

I don't see how it can be blamed on the server, though, since Ubuntu (6.06 LTS),
RHEL 3, and Fedora Core 2 clients all get this right. And the server seems to be
responding to the O_TRUNC call correctly by zeroing out the file.

Comment 7 Eric Sandeen 2006-10-05 15:58:39 UTC
That's fine, we have a 6.1 server set up here now, I'll give it a shot soon (but
something urgent came up, so this needs to wait just a little bit).

Out of curiosity, is the file zeroed out when viewed on the server as well?

Comment 8 Joel Eidsath 2006-10-05 16:17:05 UTC
>Out of curiosity, is the file zeroed out when viewed on the server as well?

Yes.

Comment 9 Dave Jones 2006-10-16 17:25:32 UTC
A new kernel update has been released (Version: 2.6.18-1.2200.fc5)
based upon a new upstream kernel release.

Please retest against this new kernel, as a large number of patches
go into each upstream release, possibly including changes that
may address this problem.

This bug has been placed in NEEDINFO state.
Due to the large volume of inactive bugs in bugzilla, if this bug is
still in this state in two weeks time, it will be closed.

Should this bug still be relevant after this period, the reporter
can reopen the bug at any time. Any other users on the Cc: list
of this bug can request that the bug be reopened by adding a
comment to the bug.

In the last few updates, some users upgrading from FC4->FC5
have reported that installing a kernel update has left their
systems unbootable. If you have been affected by this problem
please check you only have one version of device-mapper & lvm2
installed.  See bug 207474 for further details.

If this bug is a problem preventing you from installing the
release this version is filed against, please see bug 169613.

If this bug has been fixed, but you are now experiencing a different
problem, please file a separate bug for the new problem.

Thank you.

Comment 10 Eric Sandeen 2006-10-16 17:32:16 UTC
One other note on this one: I tested against FreeBSD 6.2, and I did not see the
problem.

Steve, have you had a chance to look over the network traces yet?

Comment 11 Joel Eidsath 2006-10-16 17:54:09 UTC
We've tested 2.6.18-1.2200.fc5 and verified that the problem still exists.

Comment 12 Steve Dickson 2006-12-07 00:45:11 UTC
First of all... sorry for taking so long to get back to this...

In the network trace, looking at packets 60 and 61, it appears
the client is truncating the file and the server is returning
EPERM. So it appears the server is zeroing out the file
but also returning EPERM. Looking at the before and
after attributes (which are part of the NFS SETATTR reply),
the size is 21 before the SETATTR and 0 after the
SETATTR....
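The check made on packets 60 and 61 can be written as a small predicate. This sketch assumes the mount is NFSv3, whose SETATTR replies carry wcc_data with before and after attributes. It also assumes the sizes have already been read out of the decoded trace by hand. The struct and function names are hypothetical and not part of any NFS library; only the status values come from RFC 1813.

    #include <stdint.h>

    /* NFSv3 status codes from RFC 1813 (only the ones used here). */
    enum nfsstat3 { NFS3_OK = 0, NFS3ERR_PERM = 1, NFS3ERR_ACCES = 13 };

    /* Fields of a SETATTR reply's wcc_data that matter here, copied from a trace. */
    struct setattr_reply {
        enum nfsstat3 status;
        int has_pre;        /* pre-operation attributes present in wcc_data */
        int has_post;       /* post-operation attributes present in wcc_data */
        uint64_t pre_size;  /* file size before the SETATTR */
        uint64_t post_size; /* file size after the SETATTR */
    };

    /*
     * Returns 1 if the reply reports an error while its own before/after
     * attributes show that the size changed, i.e. the server modified the
     * file and still failed the call. Returns 0 otherwise, including when
     * either set of attributes is missing and nothing can be concluded.
     */
    static int setattr_failed_with_side_effect(const struct setattr_reply *r)
    {
        if (r->status == NFS3_OK || !r->has_pre || !r->has_post)
            return 0;
        return r->pre_size != r->post_size;
    }

With the values from the trace (status NFS3ERR_PERM, size 21 before, 0 after), the predicate returns 1. A clean refusal leaves the two sizes equal and returns 0.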

So it looks like a server issue, because the server should either
do the truncation and return success, or not do the
truncation and return EPERM... not both...
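The all-or-nothing behaviour described above can be sketched as a hypothetical server-side handler. This is not FreeBSD's nfsd code. The in-memory "inode", the caller struct and the simplified owner/group/other check (no supplementary groups, no root handling) are assumptions made for illustration. Returning NFS3ERR_ACCES on refusal is also an illustrative choice.

    #include <stdint.h>
    #include <sys/stat.h>
    #include <sys/types.h>

    /* NFSv3 status codes from RFC 1813 (only the ones used here). */
    enum nfsstat3 { NFS3_OK = 0, NFS3ERR_ACCES = 13 };

    /* Hypothetical in-memory stand-in for a server-side file. */
    struct fake_inode {
        uid_t uid;
        gid_t gid;
        mode_t mode;
        uint64_t size;
    };

    /* Caller credentials; supplementary groups and root are left out. */
    struct caller {
        uid_t uid;
        gid_t gid;
    };

    /* Owner/group/other write-permission check on the mode bits. */
    static int may_write(const struct fake_inode *ip, const struct caller *c)
    {
        if (c->uid == ip->uid)
            return (ip->mode & S_IWUSR) != 0;
        if (c->gid == ip->gid)
            return (ip->mode & S_IWGRP) != 0;
        return (ip->mode & S_IWOTH) != 0;
    }

    /*
     * Size change, all-or-nothing: the permission check runs before anything
     * is modified, so an error reply always means the file is untouched and
     * a success reply always means it was truncated.
     */
    static enum nfsstat3 setattr_size(struct fake_inode *ip,
                                      const struct caller *c, uint64_t new_size)
    {
        if (!may_write(ip, c))
            return NFS3ERR_ACCES;
        ip->size = new_size;
        return NFS3_OK;
    }

What matters is the ordering. A server that applied the size change first and then ran a check that failed would produce an error reply combined with an empty file, as seen in the trace.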

Comment 13 Joel Eidsath 2008-02-20 21:17:01 UTC
We solved the problem by replacing our BSD NFS server with an RHEL server.

Comment 14 Steve Dickson 2008-02-20 22:20:02 UTC
Obviously, I think that was the best move... :-)

Thank you for your business!