Bug 867649

Summary: nfsd readdir cookies problematic for some 32-bit clients (Solaris, AIX)
Product: [Fedora] Fedora
Reporter: Ahmon Dancy <dancy>
Component: nfs-utils
Assignee: Steve Dickson <steved>
Status: CLOSED WONTFIX
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: unspecified
Version: 16
CC: bfields, jlayton, redhat-bugzilla, rvandolson, steved
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Story Points: ---
Last Closed: 2013-02-13 17:28:47 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: ---

Description Ahmon Dancy 2012-10-17 19:35:16 EDT
After upgrading a few of our Fedora hosts to Fedora 16, some of our older NFS clients have been having problems reading directories on mounted NFS exports.  The common factor seems to be the 32-bitness of the client (user mode) software.  E.g. a 32-bit build of a client program would have a problem but a 64-bit build of the same program on the same client would be ok.   One client is Solaris 10 and the other is AIX 5.3.

The exports in question are ext3 filesystems with dir_index enabled (possibly relevant?).  The NFS server is an x86_64 F16 system.

The symptom of the problem is a client program getting EOVERFLOW errors from readdir. I believe the cause of this is NFSv3 READDIR(PLUS) returning directory entry cookies that require all 64 bits instead of ones that fit into 32 bits. To test the theory, I built an nfsd.ko with this change:

diff -u -r vanilla-3.4/fs/nfsd/vfs.c linux-3.4.x86_64/fs/nfsd/vfs.c
--- vanilla-3.4/fs/nfsd/vfs.c   2012-05-20 15:29:13.000000000 -0700
+++ linux-3.4.x86_64/fs/nfsd/vfs.c      2012-10-15 13:05:42.000000000 -0700
@@ -799,9 +799,11 @@
        else {
                host_err = ima_file_check(*filp, may_flags);
 
+#if 0 /* DANCY */
                if (may_flags & NFSD_MAY_64BIT_COOKIE)
                        (*filp)->f_mode |= FMODE_64BITHASH;
                else
+#endif
                        (*filp)->f_mode |= FMODE_32BITHASH;
        }

After that the older NFS clients were happy again. 
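
For readers unfamiliar with the two f_mode flags in the patch, here is a rough userspace model of why forcing FMODE_32BITHASH helps. It assumes the htree directory position is built roughly the way ext4's hash2pos() helper in fs/ext4/dir.c does it (recalled from the kernel sources, not quoted from this report). The function names hash_to_cookie and fits_in_32bit_off are hypothetical and exist only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Simplified model (assumption): with dir_index, a directory position is
 * derived from the name hash. In 32-bit mode only the major hash is used,
 * shifted right by one so the value stays within 31 bits. In 64-bit mode
 * the shifted major hash lands in the upper 32 bits and the minor hash in
 * the lower 32 bits.
 */
static uint64_t hash_to_cookie(uint32_t major, uint32_t minor,
                               bool use_32bit_hash)
{
    if (use_32bit_hash)
        return (uint64_t)(major >> 1);
    return ((uint64_t)(major >> 1) << 32) | (uint64_t)minor;
}

/*
 * True if the cookie can be stored in a signed 32-bit offset, which is
 * what a 32-bit client program without large-file support would use.
 */
static bool fits_in_32bit_off(uint64_t cookie)
{
    return cookie <= (uint64_t)INT32_MAX;
}
```

Under this model, a 64-bit cookie overflows a 32-bit offset as soon as the shifted major hash is nonzero, which is consistent with EOVERFLOW showing up only in 32-bit client builds.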

You may find that this smells similar to the old problems between IRIX and the workaround of using the -32bitclients export option.  It seems the Linux NFS server needs such an option now.

Comment 1 Kevin Layer 2012-10-17 20:08:54 EDT
I would really like to see a fix for this before F16 support runs out.  There have to be a lot of us out there that have these old AIX and Solaris machines.  Thanks.

Comment 2 J. Bruce Fields 2012-10-17 20:41:55 EDT
This is basically the same as the rhel bug 857525.

My own preference would be for an ext4 mount option, as that would also be useful for a couple userland applications (gluster and samba) running into similar trouble.

And, yes, for now you can turn off dir_index as a workaround.

Comment 3 Ahmon Dancy 2012-10-18 10:31:11 EDT
(In reply to comment #2)
> This is basically the same as the rhel bug 857525.

Hmm. Indeed it is!  You can't imagine how much googling I did w/o finding matching complaints!

> My own preference would be for an ext4 mount option, as that would also be
> useful for a couple userland applications (gluster and samba) running into
> similar trouble.

For the record, we're suffering this problem against ext3 as well.  

So how do we make the mount option happen?  Who do we nag?  

> And, yes, for now you can turn off dir_index as a workaround.

Nod.  In which case I'll just stick w/ my module hack.

Comment 4 J. Bruce Fields 2012-10-18 15:59:36 EDT
(In reply to comment #3)
> For the record, we're suffering this problem against ext3 as well.  
> 
> So how do we make the mount option happen?  Who do we nag?

Probably the nfs and ext4 mailing lists: linux-nfs@vger.kernel.org and linux-ext4@vger.kernel.org.  Probably you'd want to just present your problem and let people argue about the possible solution.

Comment 5 Fedora End Of Life 2013-01-16 12:38:20 EST
This message is a reminder that Fedora 16 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 16. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '16'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 16's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 16 is end of life. If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora, you are encouraged to click on 
"Clone This Bug" and open it against that version of Fedora.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 6 Fedora End Of Life 2013-02-13 17:28:51 EST
Fedora 16 changed to end-of-life (EOL) status on 2013-02-12. Fedora 16 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 7 Ray Van Dolson 2013-08-21 20:52:41 EDT
I know this was closed out, but I am trying to determine whether there ever was a resolution.  I have older Solaris 10 clients experiencing similar symptoms, and have discovered that things work as expected when the file systems for NFS export on my RHEL6 machines are created with dir_index disabled.

Does this still need to be followed up on?  I'll most likely throw something out on the linux-nfs list as well, and possibly open a support call with RH.
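
For anyone trying to confirm whether a given client and export are affected, a minimal check along these lines may help. It is only a sketch: count_entries is a hypothetical helper name, and it assumes the symptom matches the original description, where readdir fails with EOVERFLOW in a 32-bit build without large-file support.

```c
#include <dirent.h>
#include <errno.h>
#include <stddef.h>

/*
 * Walk a directory with readdir() and return the number of entries seen.
 * Returns -1 with errno set if the directory cannot be opened or if
 * readdir() fails part-way through (for example with EOVERFLOW).
 */
static long count_entries(const char *path)
{
    DIR *dir = opendir(path);
    long count = 0;
    int saved_errno;

    if (dir == NULL)
        return -1;

    for (;;) {
        errno = 0;  /* readdir() returns NULL for both end-of-dir and error */
        if (readdir(dir) == NULL)
            break;
        count++;
    }

    saved_errno = errno;
    closedir(dir);

    if (saved_errno != 0) {
        errno = saved_errno;
        return -1;
    }
    return count;
}
```

Calling count_entries() on a path under the NFS mount from both a 32-bit and a 64-bit build of the same program, and comparing errno against EOVERFLOW when it returns -1, should show whether the difference described in this report is present.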