Bug 166543

Summary: NFS Stale locks when NFS export moves to another node
Product: Red Hat Enterprise Linux 4
Reporter: Alex Samad <alexander.samad>
Component: nfs-utils
Assignee: Steve Dickson <steved>
Status: CLOSED DUPLICATE
QA Contact: Cluster QE <mspqa-list>
Severity: medium
Priority: medium
Version: 4.0
CC: axel.thimm, kanderso, lhh, poelstra, rkenna, steved
Hardware: i386
OS: Linux
Doc Type: Bug Fix
Last Closed: 2005-10-20 20:50:32 UTC
Attachments:
  cluster conf file (flags: none)
  rmtab file from 4 nodes (flags: none)

Description Alex Samad 2005-08-23 05:11:46 UTC
Description of problem:
I have set up a 4-node cluster with GFS (AS4, Cluster Suite 4, GFS 6.1). I have
3 GFS partitions that I am exporting via NFS, and I can mount them on another
Red Hat AS3 box. When I move the NFS shares to other nodes to simulate a
failover, I get stale NFS handle errors on the client machine.

Upon investigation I found an article for Cluster Suite 3 about setting the FS
ID for NFS shares, since it needs to be the same on each node. The ability to
set this option through the GUI seems to have been removed, but the
documentation states that clurmtabd should handle this.
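
For reference, as I understand it the Cluster Suite 3 approach amounts to pinning the same fsid on every node so that the file handles a client holds stay valid when the export moves. A rough sketch of what that would look like in /etc/exports (the fsid values and options here are only an example, not my actual config, which is managed through cluster.conf):

---
# /etc/exports, identical on every cluster node; fsid must match across
# nodes so NFS file handles stay valid after the export relocates
/mnt/gfs/lv1  172.20.0.0/16(rw,sync,fsid=1)
/mnt/gfs/lv2  172.20.0.0/16(rw,sync,fsid=2)
/mnt/gfs/lv3  172.20.0.0/16(rw,sync,fsid=3)
---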

I have run cat /var/lib/nfs/rmtab on all the nodes while it is working (before
the move) and after the move, when I am getting the stale NFS handles, and the
numbers are not similar or consistent across the nodes. NOTE: I am assuming
the numbers in these files are the FS numbers.

I have checked the major/minor numbers for all the base devices (LVM
partitions) and they are the same on all the machines!
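
One way to compare them across nodes (the LVM device names below are placeholders, not my actual volume names):

---
# run on every node; the major,minor pair should be identical everywhere
ls -l /dev/mapper/<vg>-<lv1> /dev/mapper/<vg>-<lv2> /dev/mapper/<vg>-<lv3>
# or print them in hex with stat
stat -c '%n %t:%T' /dev/mapper/<vg>-<lv1>
---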

Here is example output before the move:
---
forall cat /var/lib/nfs/rmtab \; echo 
172.20.0.0/16:/mnt/gfs/lv1:0x00000014
172.20.0.0/16:/mnt/gfs/lv2:0x00000014
172.20.0.0/16:/mnt/gfs/lv3:0x0000001a

172.20.0.0/16:/mnt/gfs/lv1:0x00000014
172.20.0.0/16:/mnt/gfs/lv2:0x00000012
172.20.0.0/16:/mnt/gfs/lv3:0x00000017
172.20.231.100:172.20.0.0/16:0x0000000a

172.20.0.0/16:/mnt/gfs/lv1:0x00000014
172.20.0.0/16:/mnt/gfs/lv2:0x00000014
172.20.0.0/16:/mnt/gfs/lv3:0x0000001a

172.20.0.0/16:/mnt/gfs/lv1:0x00000014
172.20.0.0/16:/mnt/gfs/lv2:0x00000014
---

and after the move:

[samad@rhnsat test]$ forall cat /var/lib/nfs/rmtab \; echo 
172.20.0.0/16:/mnt/gfs/lv1:0x00000014
172.20.0.0/16:/mnt/gfs/lv2:0x00000014
172.20.0.0/16:/mnt/gfs/lv3:0x0000001a

172.20.0.0/16:/mnt/gfs/lv1:0x00000017
172.20.0.0/16:/mnt/gfs/lv2:0x00000012
172.20.0.0/16:/mnt/gfs/lv3:0x00000017
172.20.231.100:172.20.0.0/16:0x0000000b

172.20.0.0/16:/mnt/gfs/lv1:0x00000016
172.20.0.0/16:/mnt/gfs/lv2:0x00000017
172.20.231.100:172.20.0.0/16:0x00000004
172.20.0.0/16:/mnt/gfs/lv3:0x00000002

172.20.0.0/16:/mnt/gfs/lv1:0x00000014
172.20.0.0/16:/mnt/gfs/lv2:0x00000017




Version-Release number of selected component (if applicable):
rgmanager-1.9.34-1

How reproducible:
Every time

Steps to Reproduce:
1. Start the cluster.
2. Mount the NFS exports on a client.
3. Move an NFS export service to another node (example commands below).
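
For example (the floating IP, service name, and node name below are placeholders; clusvcadm is one way to relocate an rgmanager service, the GUI is another):

---
# on the NFS client: mount an export through the clustered service address
mount -t nfs <floating_ip>:/mnt/gfs/lv1 /mnt/nfs-test

# on a cluster node: relocate the NFS service to another member
clusvcadm -r <nfs_service> -m <other_node>
---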
 
  
Actual results:
I get stale NFS handle errors on the client machine.

Expected results:
The NFS share should seamlessly fail over to the new node.

Additional info:
N/A

Comment 1 Alex Samad 2005-08-23 05:11:46 UTC
Created attachment 117989 [details]
cluster conf file

Comment 2 Alex Samad 2005-08-23 05:13:17 UTC
Created attachment 117990 [details]
rmtab file from 4 nodes

this shows the rmtab file from all the nodes

Comment 5 Lon Hohberger 2005-10-04 21:28:45 UTC
Assigning to NFS maintainer, but staying on CC list for now.

Comment 9 Kiersten (Kerri) Anderson 2005-10-20 20:50:32 UTC
We think this is a duplicate of the NFS failover defect (BZ 132823), so we are
going to close it as such and link it to that one.

*** This bug has been marked as a duplicate of 132823 ***

Comment 10 Axel Thimm 2005-10-20 21:35:51 UTC
bug #132823 is protected. Could it be opened to the public? Otherwise it would
be good to keep this one open to have something to track.