Bug 508873

Summary: Fedora's NFS (v3) server badly broken when exporting an ext4 filesystem
Product: [Fedora] Fedora
Reporter: Colin.Simpson
Component: kernel
Assignee: Jeff Layton <jlayton>
Status: CLOSED DUPLICATE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: low
Version: 11
CC: itamar, kernel-maint, quintela, staubach, steved
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2009-07-02 10:49:52 UTC

Description Colin.Simpson 2009-06-30 11:17:46 UTC
Description of problem:

When an ext4 filesystem is exported via NFS and then mounted on a RHEL 4 or RHEL 5 system, any compile on the remote system fails badly with:

"final close failed: Input/output error"


Version-Release number of selected component (if applicable):

This seems to have only started happening with 2.6.29.5-191.fc11. Doesn't happen if exporting an ext3 filesystem. 

How reproducible:
Every time 

Steps to Reproduce:
1. Get a nontrivial C program (I tested with File System Exerciser, which I was playing with, but the choice is unimportant)
2. gcc the program, i.e. gcc fsx-linux.c

  
Actual results:
[rh5-64bit]colin: gcc fsx-linux.c 
���: final close failed: Input/output error
collect2: ld returned 1 exit status

[rh4-32bit]colin: gcc fsx-linux.c 
��X: final close failed: Input/output error
collect2: ld returned 1 exit status

Strace gives:
[pid 21682] close(9)                    = -1 EIO (Input/output error)
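
The strace line shows the error surfacing at close() rather than at write(). A smaller check than a full compile could write a file on the NFS mount and report the errno from whichever call fails first. The following is only a sketch: the helper name write_and_close is hypothetical, and this report does not confirm that such a script reproduces the failure the way the linker does.

```python
import errno
import os


def write_and_close(path, data):
    """Write data to path; return 0 on success or the errno of the first failing call."""
    try:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    except OSError as e:
        return e.errno

    try:
        view = memoryview(data)
        while len(view) > 0:
            n = os.write(fd, view)
            if n == 0:
                # No progress on a non-empty write: treat it as an I/O error.
                raise OSError(errno.EIO, os.strerror(errno.EIO))
            view = view[n:]
    except OSError as e:
        try:
            os.close(fd)
        except OSError:
            pass  # the write error is the one we report
        return e.errno

    # An NFS client may only report a failed write here, at close().
    try:
        os.close(fd)
    except OSError as e:
        return e.errno
    return 0
```

Calling write_and_close with a path on the mounted export (for example a hypothetical /mnt/nfs/test.bin) and a buffer of about a megabyte would return errno.EIO if close() fails as in the strace output, and 0 otherwise.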

Expected results:
No error, which is what I get by running this on the local machine. 

Additional info:
Seems to work when mounted on another F11 machine.

Comment 2 Eric Sandeen 2009-06-30 17:29:58 UTC
Am I reading it correctly that it only fails if the client is RHEL, and F11 clients are fine?

Ok, I'll look into this, thanks.

-Eric

Comment 3 Colin.Simpson 2009-06-30 18:27:10 UTC
You are correct. It fails consistently when the client is a 32 or 64 bit RHEL 4.8 or 5.3 system. F11 clients seem fine. Perhaps interestingly a compile from a Solaris 8 client to the F11 NFS server is actually fine. 

I've just discovered that this isn't related to ext4. It also fails from an ext3 exported file system using 2.6.29.5-191.fc11, again using a RHEL 4.8/5.3 client but fine from F11 clients. Sorry for that little piece of incorrect info; I assumed it was ext4 related, but our other F11 machines hadn't yet been rebooted into this newer kernel.

Comment 4 Eric Sandeen 2009-06-30 18:34:48 UTC
Fails on xfs too.  Punting to Jeff!  :)

Comment 5 Jeff Layton 2009-06-30 19:50:34 UTC
This looks like a server-side problem...

Looks like the server is responding with success to the write, but with a count of 0. The client then figures this to mean that the write was short and returns -EIO on the close().
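
To make that sequence concrete, here is a toy model of the client-side behaviour described above. It is a simplified sketch, not the actual kernel code: the class and method names are hypothetical, and the handling of a partial write with a nonzero count (resend the remainder) is an assumption added only so the model is complete.

```python
import errno


class ToyNfsFile:
    """Toy model of a client file that defers write errors until close()."""

    def __init__(self):
        self.pending_error = 0

    def on_write_reply(self, requested, count):
        """Process a successful WRITE reply; return the bytes still to be sent."""
        if requested > 0 and count == 0:
            # Server reported success but wrote nothing: no progress is
            # possible, so record -EIO to be returned later by close().
            self.pending_error = -errno.EIO
            return 0
        # Assumption: a partial write with count > 0 is simply retried.
        return max(requested - count, 0)

    def close(self):
        """Return the deferred error (0 if none) and clear it."""
        err, self.pending_error = self.pending_error, 0
        return err
```

In this model a reply of on_write_reply(4096, 0) makes the following close() return -errno.EIO, which matches the close(9) = -1 EIO seen in the strace output, while on_write_reply(4096, 4096) leaves close() returning 0.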

I'll look over the server side changes in this area. Colin, what was the last known "good" kernel?

Comment 6 Colin.Simpson 2009-06-30 20:16:59 UTC
The kernel 2.6.29.4-167.fc11 is fine, which I think was the last released one.

Comment 7 Jeff Layton 2009-06-30 20:24:10 UTC
This might be a duplicate of bug 508174. Keeping an eye on koji now and when a -207 or later kernel pops out, I'll plan to test it out.

Comment 8 Jeff Layton 2009-07-01 19:57:39 UTC
Looks like there's a -209 kernel in koji now. Colin, when you get a chance could you test that kernel and let me know if it resolves the problem? If so, then I'll close this as a duplicate of bug 508174.

http://koji.fedoraproject.org/koji/buildinfo?buildID=112420

Comment 9 Colin.Simpson 2009-07-02 09:14:00 UTC
The 2.6.29.6-209.rc1.fc11 kernel does indeed appear to fix this issue. So I guess it does look like a dup.

Thanks

Comment 10 Jeff Layton 2009-07-02 10:49:52 UTC

*** This bug has been marked as a duplicate of bug 508174 ***