Bug 211006 - exportfs will not reload /etc/exports (RPC error messages)
Summary: exportfs will not reload /etc/exports (RPC error messages)
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: nfs-utils
Version: 4.4
Hardware: ia64
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Steve Dickson
QA Contact: Red Hat Kernel QE team
 
Reported: 2006-10-16 19:47 UTC by Franco M. Bladilo
Modified: 2012-06-20 13:30 UTC
CC List: 3 users

Doc Type: Bug Fix
Last Closed: 2012-06-20 13:30:57 UTC


Attachments
output from tethereal (384.28 KB, application/x-bzip)
2006-12-07 16:19 UTC, Thomas Walker

Description Franco M. Bladilo 2006-10-16 19:47:36 UTC

Description of problem:
After we received the first of these messages from our NFS clients (all RHEL4 U4):
Oct 16 13:43:01 n110 kernel: RPC: error 512 connecting to server rtcfiler 
Oct 16 13:43:01 n40 kernel: RPC: error 512 connecting to server rtcfiler 
Oct 16 13:43:01 n98 kernel: RPC: error 512 connecting to server rtcfiler 
Oct 16 13:43:01 n22 kernel: RPC: error 512 connecting to server rtcfiler 
Oct 16 13:43:01 n30 kernel: RPC: error 512 connecting to server rtcfiler 

The NFS server starts misbehaving; even something as trivial as re-exporting the NFS shares no longer works:

[root@rtcfiler /]# cat /etc/exports 
/users  192.168.0.0/255.255.255.0(rw,async) 192.168.0.250(rw,async,no_root_squash)
/apps   192.168.0.0/255.255.255.0(async) 192.168.0.250(rw,async,no_root_squash) 192.168.0.240(rw,async,no_root_squash)
/projects 192.168.0.0/255.255.255.0(rw,async,fsid=20) 192.168.0.250(rw,async,no_root_squash,fsid=20)
/backups rcsg.rice.edu(rw,async,no_root_squash,fsid=40) 128.42.6.40(rw,async,fsid=40) neptuno.rice.edu(rw,async,fsid=40)

(I added my workstation neptuno to the /backups share ACL for this example)

[root@rtcfiler /]# service nfs reload
[root@rtcfiler /]# showmount -e localhost
Export list for localhost:
/apps     192.168.0.0/255.255.255.0,management.rtc
/users    192.168.0.0/255.255.255.0
/backups  hera.cs.rice.edu,rcsg.rice.edu
/projects 192.168.0.0/255.255.255.0
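
As a cross-check (a hedged suggestion; the /proc path assumes the nfsd filesystem is mounted, which is the RHEL4 default), comparing the userspace export table against the kernel's own view shows where the reload is getting stuck:

[root@rtcfiler /]# exportfs -v               # what exportfs believes is exported
[root@rtcfiler /]# cat /proc/fs/nfsd/exports # what the kernel is actually serving

If neptuno shows up in the first listing but not the second, exportfs picked up the change and the stall is between mountd and the kernel.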

On the server, /etc/sysconfig/nfs looks like this:
[root@rtcfiler /]# cat /etc/sysconfig/nfs 
RPCNFSDCOUNT=32
RPCNFSDARGS="--no-nfs-version 4"
MOUNTD_NFS_V3=default

Server NFS stats: 

Server packet stats:
packets    udp        tcp        tcpconn
308225124   50886640   257312455   86472   
Server rpc stats:
calls      badcalls   badauth    badclnt    xdrcall
308202850   0          0          0          0       
Server reply cache:
hits       misses     nocache
53970      135127025   173021908
Server file handle cache:
lookup     anon       ncachedir ncachedir  stale
0          0          0          0          111448  
Server nfs v2:
null       getattr    setattr    root       lookup     readlink   
0       0% 0       0% 0       0% 0       0% 0       0% 0       0% 
read       wrcache    write      create     remove     rename     
0       0% 0       0% 0       0% 0       0% 0       0% 0       0% 
link       symlink    mkdir      rmdir      readdir    fsstat     
0       0% 0       0% 0       0% 0       0% 0       0% 0       0% 

Server nfs v3:
null       getattr    setattr    lookup     access     readlink   
1397    0% 99325945 32% 5573248  1% 28216968  9% 17945580  5% 51162   0% 
read       write      create     mkdir      symlink    mknod      
9045297  2% 112054739 36% 9917846  3% 298554  0% 97662   0% 13      0% 
remove     rmdir      rename     link       readdir    readdirplus
6377358  2% 67129   0% 420595  0% 373811  0% 7628    0% 861807  0% 
fsstat     fsinfo     pathconf   commit     
29184   0% 1718    0% 0       0% 17531392  5% 



These are the client mount options:

rtcfiler:/users on /users type nfs (rw,bg,noacl,hard,intr,rsize=32768,wsize=32768,timeo=50,retrans=10,addr=192.168.0.247)
rtcfiler:/projects on /projects type nfs (rw,bg,noacl,hard,intr,rsize=32768,wsize=32768,timeo=50,retrans=10,addr=192.168.0.247)
rtcfiler:/apps on /opt/apps type nfs (rw,bg,noacl,hard,intr,rsize=32768,wsize=32768,timeo=50,retrans=10,addr=192.168.0.247)

Client NFS stats: 

[root@n97 ~]# nfsstat -c
Client rpc stats:
calls      retrans    authrefrsh
528922     1          0       
Client nfs v3:
null       getattr    setattr    lookup     access     readlink   
0       0% 52631   9% 1760    0% 7908    1% 22234   4% 76      0% 
read       write      create     mkdir      symlink    mknod      
15390   2% 367759 69% 2347    0% 3       0% 0       0% 0       0% 
remove     rmdir      rename     link       readdir    readdirplus
51      0% 3       0% 0       0% 0       0% 0       0% 52      0% 
fsstat     fsinfo     pathconf   commit     
18      0% 4       0% 0       0% 58686  11% 

The Ethernet switch used in this cluster is healthy; we ran several mtr sessions from different clients to the server for days without packet loss.

There's a strange pattern to when these errors are triggered: it seems to happen whenever the clients close() a file. This is the main fileserver for a Beowulf cluster, and we've had a lot of failures along with these error messages, always at the end of jobs, which is when the job output gets transferred and file handles get released.
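
For what it's worth, error 512 is the kernel-internal ERESTARTSYS value, which means the RPC connect attempt was interrupted by a signal and should be restarted; it is one of the error codes that is never supposed to be visible outside the kernel:

/* include/linux/errno.h (kernel-internal error codes) */
#define ERESTARTSYS     512     /* interrupted; restart the call if a signal is pending */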

Thanks in advance,

Version-Release number of selected component (if applicable):
nfs-utils-1.0.6-70.EL4

How reproducible:
Always


Steps to Reproduce:
1. Install and use NFS on a homogeneous RHEL4 U4 IA64 platform
2. Wait until the first client reports an RPC "error 512 connecting to server" message
3. Modify /etc/exports and re-export it


Actual Results:
I couldn't re-export NFS shares without restarting the entire service. Other apps/services using NFS needed to be restarted as well (PBS mom in our case).


Expected Results:
The filesystems should have been re-exported, reflecting the changes made to /etc/exports. Services on the compute nodes shouldn't need to be restarted when this happens.

Additional info:

Comment 1 Steve Dickson 2006-10-19 22:44:11 UTC
Would it be possible to post a bzip2-compressed binary ethereal network
trace between the server and client? Something similar to:
    tethereal -w /tmp/data.pcap host <server> 
    bzip2 /tmp/data.pcap
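
To sanity-check the capture before posting it, it can be read back filtered down to the RPC traffic (a hedged example using tethereal's standard -r/-R read options):

    tethereal -r /tmp/data.pcap -R 'rpc'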

Comment 2 Thomas Walker 2006-12-05 13:01:17 UTC
  Since there doesn't seem to be any activity with this bug, I will add that we
are seeing a similar problem.

  Our server is running RHEL4 i386, kernel 2.6.9-42.0.2.ELsmp and the version of
nfs is nfs-utils-1.0.6-70.EL4.  We are reasonably up2date on patches. 
Specifically, when I make a change to /etc/exports on the server, the changes do
not appear to take effect when doing a "service nfs reload" like they have in
the past.  However, a "service nfs restart" did seem to work.  I have not seen
any errors in the messages files on either the client or server.

  Thomas Walker

Comment 3 Steve Dickson 2006-12-07 02:11:49 UTC
Thomas,

Would it be possible to get a tethereal network trace as described
in Comment #1?

Comment 4 Thomas Walker 2006-12-07 16:19:07 UTC
Created attachment 143062 [details]
output from tethereal


  output of;

redhat1.stsci.edu> tethereal -w /tmp/data.pcap host redhat-srvr2.stsci.edu
redhat1.stsci.edu> bzip2 /tmp/data.pcap

Comment 5 Steve Dickson 2006-12-08 12:26:50 UTC
Just to be clear, you're getting the same RPC errors as described in
the first bug comment?

Also, does doing an 'exportfs -arv' instead of a 'service nfs reload' make any
difference? 
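
For context, a hedged sketch of the difference, assuming the stock RHEL4 initscript (which, as far as I recall, implements reload as a bare re-export):

    /usr/sbin/exportfs -r     # roughly what 'service nfs reload' runs
    /usr/sbin/exportfs -arv   # -a exports everything listed in /etc/exports,
                              # -r re-syncs the current table, -v prints each export

The verbose form should at least show whether exportfs itself sees the new entries.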

Comment 6 Thomas Walker 2006-12-08 14:42:28 UTC
> Just to be clear, you're getting the same RPC errors as described in
> the first bug comment?
>
> Also, does doing an 'exportfs -arv' instead of a 'service nfs reload' make any
> difference? 

  I do not see any RPC errors on either the client or server.  My symptom is
that changes to /etc/exports are not being picked up by a "service nfs reload"
like they used to be.  I would rather not do an "exportfs -arv" at this time,
since this is a production server and I can't risk causing a problem.  I can try
that command at a quieter time of day, but it won't be until next week.

  Thomas Walker

Comment 7 Michael Kingsbury 2006-12-13 16:50:59 UTC
This defect describes a problem I'm seeing right now on a production box. 

Kernel: Linux george 2.6.9-34.ELsmp #1 SMP Fri Feb 24 16:54:53 EST 2006 i686
athlon i386 GNU/Linux
nfs-utils-1.0.6-65.EL4


The interesting part is that I don't see /proc/fs/nfsd/exports get updated after
attempting to export a new filesystem within an already running NFS server.
I haven't spotted the RPC errors, but I have 400-500 clients and cannot check
them all.

If you have suggestions on how to manually manipulate the exports via /proc
without using exportfs, I'm willing to give that a try if it helps.

-mike



[root@george nfs]# cat /proc/fs/nfsd/exports
# Version 1.1
# Path Client(Flags) # IPs
/emc100 *(rw,no_root_squash,sync,wdelay)
/viewstore      *(rw,no_root_squash,sync,wdelay)
[root@george nfs]# cat /etc/exports
/viewstore *(rw,no_root_squash)
/emc100 *(rw,no_root_squash)
/emc102 *(rw,no_root_squash)



[root@george nfs]# exportfs -a
exportfs: /etc/exports [1]: No 'sync' or 'async' option specified for export
"*:/viewstore".
  Assuming default behaviour ('sync').
  NOTE: this default has changed from previous versions
exportfs: /etc/exports [2]: No 'sync' or 'async' option specified for export
"*:/emc100".
  Assuming default behaviour ('sync').
  NOTE: this default has changed from previous versions
exportfs: /etc/exports [3]: No 'sync' or 'async' option specified for export
"*:/emc102".
  Assuming default behaviour ('sync').
  NOTE: this default has changed from previous versions
[root@george nfs]# cat /proc/fs/nfsd/exports
# Version 1.1
# Path Client(Flags) # IPs
/emc100 *(rw,no_root_squash,sync,wdelay)
/viewstore      *(rw,no_root_squash,sync,wdelay)


--- 

And looking from a Solaris client (fewer issues with new NFS shares on a
pre-existing server): 
> ls /net/george
emc100     viewstore
> ls /net/george/emc102
/net/george/emc102: No such file or directory
>
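
On the question of driving the export table through /proc directly: a hedged sketch only, to be tried on a test box before a production one. On 2.6 kernels the export data reaches nfsd through caches under /proc/net/rpc, and writing a current timestamp to their flush files invalidates the cached entries, forcing the kernel to re-query mountd on the next request (this is essentially what 'exportfs -f' does in later nfs-utils releases):

[root@george nfs]# echo $(date +%s) > /proc/net/rpc/auth.unix.ip/flush
[root@george nfs]# echo $(date +%s) > /proc/net/rpc/nfsd.export/flush
[root@george nfs]# echo $(date +%s) > /proc/net/rpc/nfsd.fh/flush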



Comment 9 Jiri Pallich 2012-06-20 13:30:57 UTC
Thank you for submitting this issue for consideration in Red Hat Enterprise Linux. The release you requested us to review has now reached End of Life.
Please see https://access.redhat.com/support/policy/updates/errata/

If you would like Red Hat to reconsider your feature request for an active release, please re-open the request via the appropriate support channels and provide additional supporting details about the importance of this issue.

