Bug 426138 - NFS server causes EIO errors from older Linux clients
Summary: NFS server causes EIO errors from older Linux clients
Keywords:
Status: CLOSED DUPLICATE of bug 429109
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.1
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Steve Dickson
QA Contact: Martin Jenner
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2007-12-18 19:16 UTC by Jonathan Peatfield
Modified: 2008-01-19 12:24 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-01-19 12:24:38 UTC
Target Upstream Version:
Embargoed:


Attachments
Neil Brown suggested essentially this fix for 2.6.19 (1.39 KB, patch)
2007-12-18 19:16 UTC, Jonathan Peatfield

Description Jonathan Peatfield 2007-12-18 19:16:18 UTC
Description of problem:


Version-Release number of selected component (if applicable):

kernel-2.6.18-53.1.4.el5

(tested on both x86_64 and i686).

How reproducible:

100%

Steps to Reproduce:
1.  On an EL3 machine running 2.4.21-53.EL, mount an NFS filesystem from the server and run ls -l on a file in it
2.  e.g. mount  zex:/local/scratch/ /mnt/testing/
3.  ls -al /mnt/testing/test
  
Actual results:

$ ls -al /mnt/testing/test           
ls: /mnt/testing/test: Input/output error
-rw-r--r--    1 jp107    other         367 Dec 18 16:24 /mnt/testing/test

Expected results:

$ ls -al /mnt/testing/test           
-rw-r--r--    1 jp107    other         367 Dec 18 16:24 /mnt/testing/test

Additional info:

This seems to be caused by a bug in the NFS ACL support code; mounting with
ACLs disabled (noacl) makes the error go away, e.g.

umount /mnt/testing
mount -o noacl zex:/local/scratch/ /mnt/testing/
ls -al /mnt/testing/test

doesn't show the problem.  The problem also doesn't seem to happen with
newer (2.6) client kernels, so EL4/EL5 clients aren't affected.  These are NFSv3 mounts, btw.

Looking at the logs on the server and tracing the traffic suggested the acl
issue and a few web searches showed up a discussion on the kernel list for a
similar sounding issue in (plain) 2.6.19.

  http://linux.derkeiler.com/Mailing-Lists/Kernel/2007-01/msg03478.html

(for example) describes a fix which ought to help, and adding something
equivalent to the kernel-2.6.18-53.1.4.el5 srpm does seem to fix it for my
simple tests.

Without that patch the server does not seem to fill in the nfsd_acl_versions[vers]
entries, so as far as I can follow it probably doesn't handle ACL requests at all,
but you probably understand this stuff much better than I do!

I'll attach the patch I used for testing.  I redid the patch from Neil Brown's
post, but it should contain just the two changes he suggests.  I don't know
whether that made it into the plain kernel tree or not...
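
One rough way to check the theory above from any client is to ask the server
directly whether it answers for the NFS ACL side protocol.  The sketch below is
only an illustration, not something taken from the patch: check_nfsacl is a
made-up helper name, 100227 is the RPC program number registered for NFS ACL,
and the function assumes that rpcinfo exits non-zero when the requested
program/version is not available.

```shell
# Hypothetical helper (name is an assumption, not part of any package):
# report whether the NFS ACL RPC program (100227) answers on a server
# for a given version (defaults to 3, matching the v3 mounts used here).
check_nfsacl() {
    server="$1"
    vers="${2:-3}"
    # "rpcinfo -u host prognum versnum" calls the null procedure over UDP.
    # Assumption: a non-zero exit status means the version is not served.
    if rpcinfo -u "$server" 100227 "$vers" >/dev/null 2>&1; then
        echo "nfsacl v$vers: available on $server"
    else
        echo "nfsacl v$vers: NOT available on $server"
    fi
}

# Example call against the server used in the steps above:
# check_nfsacl zex 3
```

If the theory is right, an affected server would presumably give the "NOT
available" result for version 3, while a server running the older kernel or the
patched one would report it as available.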

Comment 1 Jonathan Peatfield 2007-12-18 19:16:18 UTC
Created attachment 289919
Neil Brown suggested essentially this fix for 2.6.19

Comment 2 Jonathan Peatfield 2007-12-18 19:21:53 UTC
I should add that the EIO errors are not produced when the server is running
kernel-2.6.18-8.1.15 or earlier.  I didn't test the ones between that and
kernel-2.6.18-53.1.4 so don't know when this regression appeared.

 -- Jon


Comment 3 Andrew C Aitchison 2007-12-19 16:52:17 UTC
Solaris 7 and 8 clients also show this problem:
# ls -l ~/powercut.prod 
NFS getacl failed for server moa: error 9 (RPC: Program/version mismatch)
-rw-r--r--   1 werdna   staff         29 Nov 28 14:53 /homes/werdna/powercut.prod
# getfacl ~/powercut.prod 
NFS getacl failed for server moa: error 9 (RPC: Program/version mismatch)
/homes/werdna/powercut.prod: failed to get acl count
#

Comment 5 Jeff Layton 2008-01-19 12:24:38 UTC

*** This bug has been marked as a duplicate of 429109 ***

