Bug 1344675 - Stale file handle seen on the mount of dist-disperse volume when doing IOs with nfs-ganesha protocol
Summary: Stale file handle seen on the mount of dist-disperse volume when doing IOs wi...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: common-ha
Version: rhgs-3.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: RHGS 3.2.0
Assignee: Jiffin
QA Contact: nchilaka
URL:
Whiteboard:
Depends On: 1348949
Blocks: 1311843 1330997 1351503 1351530
 
Reported: 2016-06-10 10:16 UTC by nchilaka
Modified: 2019-04-03 09:28 UTC (History)
11 users

Fixed In Version: glusterfs-3.8.4-1
Doc Type: Bug Fix
Doc Text:
Export configuration files were stored separately in each node. This could lead to situations where a volume was being exported differently on different nodes, and therefore developed inconsistencies after failing over to another node. Configuration details are now stored in a shared meta volume so that volumes are exported identically by all nodes.
Clone Of:
Environment:
Last Closed: 2017-03-23 05:35:34 UTC


Attachments (Terms of Use)


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2017:0486 normal SHIPPED_LIVE Moderate: Red Hat Gluster Storage 3.2.0 security, bug fix, and enhancement update 2017-03-23 09:18:45 UTC

Description nchilaka 2016-06-10 10:16:29 UTC
Description of problem:
=======================
As part of verifying bug 1330997 ([Disperse volume]: IO hang seen on mount with file ops),
I hit a stale file handle issue on the mount of a ganesha volume.

More information will be updated.

Comment 2 nchilaka 2016-06-10 10:31:46 UTC
Following are the details:
I have a RHEL 7.2 six-node cluster (node1..6) with a 3x(4+2) EC volume.
Nodes 3, 4, 5 and 6 were used to create a ganesha cluster, with one VIP for each.

Then I mounted the ganesha volume in the following fashion and ran IOs:

client1(103):rhel7.2-->node6VIP:
IO: 1. First created a dir for this mount's specific IOs
    2. Then another dir tree under it: mkdir -p d1/d2/d3/d4/d5/d6/d7/d8/d9/d10
    3. File creates in the following fashion: for i in {1..10000}; do for j in {1..10000}; do dd if=/dev/urandom of=dir-$i/d1/d2/d3/d4/d5/d6/d7/d8/d9/d10/file.$j bs=1024 count=1 |& tee -a filecreatesindeepdirs.log; done; done
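The three steps above can be combined into one runnable sketch. This is scaled down to a few files (the original run looped up to 10000 in both loops), and it adds the mkdir for each dir-$i, which the workload needs before dd can write into that subtree; file names here are illustrative:

```shell
#!/bin/sh
# Scaled-down sketch of the deep-directory file-create workload:
# make a per-iteration top dir, the ten-level subtree under it,
# then dd small files into the deepest level.
set -eu
root=$(mktemp -d)
cd "$root"
for i in 1 2; do
    mkdir -p "dir-$i/d1/d2/d3/d4/d5/d6/d7/d8/d9/d10"
    for j in 1 2 3; do
        dd if=/dev/urandom of="dir-$i/d1/d2/d3/d4/d5/d6/d7/d8/d9/d10/file.$j" \
           bs=1024 count=1 2>>filecreates.log
    done
done
# 2 dirs x 3 files each = 6 files created.
find . -name 'file.*' | wc -l
```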


client2(126):rhel7.2-->node3VIP:
IO: dd of 1K size files of about 100000

client3(64):rhel6.8--->node4VIP:
IO: untar then tar, in 10 loops:
for i in {1..10}; do
    echo "=============== LOOP $i  ================================" |& tee -a tar.log
    date |& tee -a tar.log
    echo "=============== LOOP $i  ================================" >> tar.err.log
    date >> tar.err.log
    mkdir kern.$i |& tee -a tar.log
    cp linux-4.6.1.tar.xz kern.$i/ |& tee -a tar.log
    tar -xvf kern.$i/linux-4.6.1.tar.xz -C kern.$i/ >> tar.log 2>> tar.err.log
    tar -cvf kern.$i/linux-4.6.1-mytar.tar kern.$i/linux-4.6.1 >> tar.log 2>> tar.err.log
    echo "=======================" |& tee -a tar.log
    echo "=======================" >> tar.err.log
    echo "###########################################################" |& tee -a tar.log
    echo "###########################################################" |& tee -a tar.log
    echo "###########################################################" >> tar.err.log
    echo "###########################################################" >> tar.err.log
    sleep 10
done

Problem--> After about 5 loops, during the 6th loop I started to get "bash: tar.log: Remote I/O error"
and the IO failed.
Checking the mount: trying to cd into or ls the mount point returns "stale file handle".




client4(162):rhel6.8--->node3VIP:
  IO: sleep for about 2 hours (so that enough data is populated), then trigger lookups using ls -lRt *

Problem---> the ls -lRt output only started to display after 15 hours




Cluster side:
Saw an OOM kill of the ganesha process on node4:
[Thu Jun  9 16:45:34 2016] Out of memory: Kill process 19757 (ganesha.nfsd) score 481 or sacrifice child
[Thu Jun  9 16:45:34 2016] Killed process 19757 (ganesha.nfsd) total-vm:8550104kB, anon-rss:3792524kB, file-rss:288kB


More information:
The IO failed on the mount served by the ganesha process that got OOM killed (node4).
I then saw that the VIP was migrated to node3 as part of HA.
However, IO had failed on this client (tar load), where the volume was mounted using the node4 VIP.
Also, the mount command displays the volume mounted on node3.


Version-Release number of selected component (if applicable):

[root@dhcp35-98 log]# rpm -qa|grep gluster
glusterfs-client-xlators-3.7.9-9.el7rhgs.x86_64
nfs-ganesha-gluster-2.3.1-8.el7rhgs.x86_64
glusterfs-server-3.7.9-9.el7rhgs.x86_64
glusterfs-api-3.7.9-9.el7rhgs.x86_64
glusterfs-debuginfo-3.7.9-9.el7rhgs.x86_64
glusterfs-libs-3.7.9-9.el7rhgs.x86_64
glusterfs-fuse-3.7.9-9.el7rhgs.x86_64
glusterfs-cli-3.7.9-9.el7rhgs.x86_64
glusterfs-3.7.9-9.el7rhgs.x86_64
glusterfs-ganesha-3.7.9-9.el7rhgs.x86_64
python-gluster-3.7.9-9.el7rhgs.noarch
[root@dhcp35-98 log]# rpm -qa|grep ganesha
nfs-ganesha-gluster-2.3.1-8.el7rhgs.x86_64
nfs-ganesha-2.3.1-8.el7rhgs.x86_64
nfs-ganesha-debuginfo-2.3.1-8.el7rhgs.x86_64
glusterfs-ganesha-3.7.9-9.el7rhgs.x86_64
[root@dhcp35-98 log]# rpm -qa|grep selinux
selinux-policy-mls-3.13.1-60.el7_2.6.noarch
selinux-policy-minimum-3.13.1-60.el7_2.6.noarch
selinux-policy-devel-3.13.1-60.el7_2.6.noarch
libselinux-2.2.2-6.el7.x86_64
selinux-policy-3.13.1-60.el7_2.6.noarch
selinux-policy-sandbox-3.13.1-60.el7_2.6.noarch
libselinux-utils-2.2.2-6.el7.x86_64
libselinux-python-2.2.2-6.el7.x86_64
selinux-policy-doc-3.13.1-60.el7_2.6.noarch
selinux-policy-targeted-3.13.1-60.el7_2.6.noarch
[root@dhcp35-98 log]# rpm -qa|grep pcs
pcs-0.9.143-15.el7.x86_64
[root@dhcp35-98 log]# rpm -qa|grep pace
pacemaker-cluster-libs-1.1.13-10.el7_2.2.x86_64
pacemaker-cli-1.1.13-10.el7_2.2.x86_64
pacemaker-1.1.13-10.el7_2.2.x86_64
userspace-rcu-0.7.9-2.el7rhgs.x86_64
pacemaker-libs-1.1.13-10.el7_2.2.x86_64



How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 3 Soumya Koduri 2016-06-10 10:50:04 UTC
RCA: 

An IP failover happened from node dhcp35-64 to dhcp35-98, and the issue here is that the export IDs for this volume were different on these two nodes. This caused the file handle held by the connected NFS client to become stale.

[root@dhcp35-64 ~]# cat /etc/ganesha/exports/export.ec-nfsganesha.conf 
# WARNING : Using Gluster CLI will overwrite manual
# changes made to this file. To avoid it, edit the
# file and run ganesha-ha.sh --refresh-config.
EXPORT{
      Export_Id= 2 ;
      Path = "/ec-nfsganesha";
      FSAL {
           name = GLUSTER;
           hostname="localhost";
          volume="ec-nfsganesha";
           }
      Access_type = RW;
      Disable_ACL = true;
      Squash="No_root_squash";
      Pseudo="/ec-nfsganesha";
      Protocols = "3", "4" ;
      Transports = "UDP","TCP";
      SecType = "sys";
     }
[root@dhcp35-64 ~]# 


[root@dhcp35-98 ~]# cat /etc/ganesha/exports/export.ec-nfsganesha.conf 
# WARNING : Using Gluster CLI will overwrite manual
# changes made to this file. To avoid it, edit the
# file and run ganesha-ha.sh --refresh-config.
EXPORT{
      Export_Id= 3 ;
      Path = "/ec-nfsganesha";
      FSAL {
           name = GLUSTER;
           hostname="localhost";
          volume="ec-nfsganesha";
           }
      Access_type = RW;
      Disable_ACL = true;
      Squash="No_root_squash";
      Pseudo="/ec-nfsganesha";
      Protocols = "3", "4" ;
      Transports = "UDP","TCP";
      SecType = "sys";
     }
[root@dhcp35-98 ~]#
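The mismatch above is easy to detect mechanically: extract the Export_Id from each node's copy of the export file and check that only one distinct value exists. A minimal local sketch (two temp files stand in for the per-node copies; file names and paths are illustrative):

```shell
#!/bin/sh
# Detect inconsistent Export_Id values across per-node copies of an
# nfs-ganesha export config (simulated here with two temp files).
set -eu
tmp=$(mktemp -d)
printf 'EXPORT{\n      Export_Id= 2 ;\n      Path = "/ec-nfsganesha";\n}\n' > "$tmp/node-64.conf"
printf 'EXPORT{\n      Export_Id= 3 ;\n      Path = "/ec-nfsganesha";\n}\n' > "$tmp/node-98.conf"

# One line of digits per file; more than one distinct value is a bug.
ids=$(grep -h 'Export_Id' "$tmp"/*.conf | tr -dc '0-9\n' | sort -u)

if [ "$(printf '%s\n' "$ids" | wc -l)" -gt 1 ]; then
    echo "MISMATCH: export IDs differ across nodes:" $ids
else
    echo "OK: export ID consistent:" $ids
fi
```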

Comment 5 nchilaka 2016-06-10 11:32:34 UTC
client2(126)IOs:
[root@dhcp35-126 126]#  for i in {1..100000}; do dd if=/dev/urandom of=file-$i bs=1k count=512 |& tee -a dd.log; done


[root@dhcp35-126 126]#  for i in {1..1000000}; do dd if=/dev/urandom of=small-$i bs=1k count=1 |& tee -a dd.log; done

Comment 7 Jiffin 2016-09-16 09:53:44 UTC
Upstream patch : http://review.gluster.org/#/c/14908/

Comment 8 Jiffin 2016-09-16 10:05:18 UTC
Downstream patch https://code.engineering.redhat.com/gerrit/#/c/84780/

Comment 9 Jiffin 2016-09-16 12:49:38 UTC
The fix is now available in the glusterfs 3.2 rebase.
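The shape of the fix (per the Doc Text: export configuration now lives in a shared meta volume rather than in separate per-node files) can be illustrated with a small sketch. Every node's view resolves to the same shared file, so the export definitions cannot drift apart. Paths and node names below are purely illustrative, local stand-ins; on a real cluster the shared copy lives on the gluster shared-storage mount:

```shell
#!/bin/sh
# Sketch: one shared export config, symlinked into each "node's"
# config dir, so all nodes necessarily export the volume identically.
set -eu
tmp=$(mktemp -d)
printf 'EXPORT{ Export_Id= 2 ; Path = "/ec-nfsganesha"; }\n' \
    > "$tmp/export.ec-nfsganesha.conf"

for node in node3 node4 node5 node6; do
    mkdir -p "$tmp/$node"
    ln -s "$tmp/export.ec-nfsganesha.conf" "$tmp/$node/export.conf"
done

# Every per-node view resolves to the same bytes: one distinct checksum.
sums=$(cksum "$tmp"/node*/export.conf | awk '{print $1}' | sort -u)
[ "$(printf '%s\n' "$sums" | wc -l)" -eq 1 ] && echo "exports consistent on all nodes"
```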

Comment 12 nchilaka 2016-12-26 12:45:57 UTC
verified the same testcase on 3.8.4-10
glusterfs-geo-replication-3.8.4-10.el7rhgs.x86_64
nfs-ganesha-gluster-2.4.1-3.el7rhgs.x86_64
glusterfs-3.8.4-10.el7rhgs.x86_64
python-gluster-3.8.4-10.el7rhgs.noarch
glusterfs-cli-3.8.4-10.el7rhgs.x86_64
glusterfs-rdma-3.8.4-10.el7rhgs.x86_64
glusterfs-api-3.8.4-10.el7rhgs.x86_64
glusterfs-server-3.8.4-10.el7rhgs.x86_64
glusterfs-api-devel-3.8.4-10.el7rhgs.x86_64
glusterfs-libs-3.8.4-10.el7rhgs.x86_64
glusterfs-devel-3.8.4-10.el7rhgs.x86_64
glusterfs-fuse-3.8.4-10.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-10.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-10.el7rhgs.x86_64
glusterfs-events-3.8.4-10.el7rhgs.x86_64
[root@dhcp35-116 ~]# rpm -qa|grep pace
pacemaker-cli-1.1.15-11.el7.x86_64
pacemaker-libs-1.1.15-11.el7.x86_64
pacemaker-1.1.15-11.el7.x86_64
pacemaker-cluster-libs-1.1.15-11.el7.x86_64
userspace-rcu-0.7.9-2.el7rhgs.x86_64
[root@dhcp35-116 ~]# rpm -qa|grep ganesha
nfs-ganesha-gluster-2.4.1-3.el7rhgs.x86_64
nfs-ganesha-2.4.1-3.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-10.el7rhgs.x86_64




I even did a halt of one of the nodes hosting a VIP (and on which a client was mounted); the IOs paused for some time and then proceeded.
I brought the node back after 20 minutes and started ganesha and pacemaker; again the IOs paused for some time before resuming.

Comment 13 nchilaka 2016-12-26 12:46:37 UTC
[root@dhcp35-37 ~]# gluster v status
Status of volume: disperse
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.37:/rhs/brick2/disperse      49153     0          Y       15813
Brick 10.70.35.116:/rhs/brick2/disperse     49153     0          Y       12385
Brick 10.70.35.239:/rhs/brick2/disperse     49153     0          Y       13643
Brick 10.70.35.135:/rhs/brick2/disperse     49152     0          Y       13969
Brick 10.70.35.8:/rhs/brick2/disperse       49152     0          Y       13926
Brick 10.70.35.196:/rhs/brick2/disperse     49152     0          Y       1778 
Brick 10.70.35.37:/rhs/brick3/disperse      49154     0          Y       15832
Brick 10.70.35.116:/rhs/brick3/disperse     49154     0          Y       12404
Brick 10.70.35.239:/rhs/brick3/disperse     49154     0          Y       13665
Brick 10.70.35.135:/rhs/brick3/disperse     49153     0          Y       13988
Brick 10.70.35.8:/rhs/brick3/disperse       49153     0          Y       13945
Brick 10.70.35.196:/rhs/brick3/disperse     49153     0          Y       1771 
Self-heal Daemon on localhost               N/A       N/A        Y       15093
Self-heal Daemon on dhcp35-196.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       15627
Self-heal Daemon on dhcp35-116.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       11620
Self-heal Daemon on dhcp35-239.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       16477
Self-heal Daemon on dhcp35-8.lab.eng.blr.re
dhat.com                                    N/A       N/A        Y       14191
Self-heal Daemon on dhcp35-135.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       14260
 
Task Status of Volume disperse
------------------------------------------------------------------------------
There are no active volume tasks
 
Status of volume: gluster_shared_storage
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick dhcp35-116.lab.eng.blr.redhat.com:/va
r/lib/glusterd/ss_brick                     49152     0          Y       20848
Brick dhcp35-239.lab.eng.blr.redhat.com:/va
r/lib/glusterd/ss_brick                     49152     0          Y       22100
Brick dhcp35-37.lab.eng.blr.redhat.com:/var
/lib/glusterd/ss_brick                      49152     0          Y       22873
Self-heal Daemon on localhost               N/A       N/A        Y       15093
Self-heal Daemon on dhcp35-135.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       14260
Self-heal Daemon on dhcp35-8.lab.eng.blr.re
dhat.com                                    N/A       N/A        Y       14191
Self-heal Daemon on dhcp35-116.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       11620
Self-heal Daemon on dhcp35-239.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       16477
Self-heal Daemon on dhcp35-196.lab.eng.blr.
redhat.com                                  N/A       N/A        Y       15627
 
Task Status of Volume gluster_shared_storage
------------------------------------------------------------------------------
There are no active volume tasks
 
[root@dhcp35-37 ~]# gluster v info
 
Volume Name: disperse
Type: Distributed-Disperse
Volume ID: ea0b329a-f2ff-462d-a3a2-457a80b35e08
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x (4 + 2) = 12
Transport-type: tcp
Bricks:
Brick1: 10.70.35.37:/rhs/brick2/disperse
Brick2: 10.70.35.116:/rhs/brick2/disperse
Brick3: 10.70.35.239:/rhs/brick2/disperse
Brick4: 10.70.35.135:/rhs/brick2/disperse
Brick5: 10.70.35.8:/rhs/brick2/disperse
Brick6: 10.70.35.196:/rhs/brick2/disperse
Brick7: 10.70.35.37:/rhs/brick3/disperse
Brick8: 10.70.35.116:/rhs/brick3/disperse
Brick9: 10.70.35.239:/rhs/brick3/disperse
Brick10: 10.70.35.135:/rhs/brick3/disperse
Brick11: 10.70.35.8:/rhs/brick3/disperse
Brick12: 10.70.35.196:/rhs/brick3/disperse
Options Reconfigured:
ganesha.enable: on
features.cache-invalidation: on
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
nfs-ganesha: enable
cluster.enable-shared-storage: enable
 
Volume Name: gluster_shared_storage
Type: Replicate
Volume ID: 9a35f266-5140-4dc3-8949-7adc4eac93c8
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: dhcp35-116.lab.eng.blr.redhat.com:/var/lib/glusterd/ss_brick
Brick2: dhcp35-239.lab.eng.blr.redhat.com:/var/lib/glusterd/ss_brick
Brick3: dhcp35-37.lab.eng.blr.redhat.com:/var/lib/glusterd/ss_brick
Options Reconfigured:
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
nfs-ganesha: enable
cluster.enable-shared-storage: enable

Comment 16 errata-xmlrpc 2017-03-23 05:35:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html

Comment 17 Jiffin 2017-04-10 11:13:37 UTC
clearing needinfo from closed bug

