Description of problem:
=======================
While verifying bug 1330997 ([Disperse volume]: IO hang seen on mount with file ops), I hit a stale file handle issue on the mount of a ganesha volume. More information will be updated.
Following are the details:

I have a RHEL 7.2 six-node cluster (node1..node6) with a 3x(4+2) EC volume. Nodes 3, 4, 5 and 6 were used to create a ganesha cluster, with one VIP for each. The ganesha volume was then mounted as follows, with these IOs:

client1 (103): rhel7.2 --> node6 VIP
IO:
1. First created a directory for this mount's IOs.
2. Then created a nested directory tree under it: mkdir -p d1/d2/d3/d4/d5/d6/d7/d8/d9/d10/
3. Created files in the following fashion:

    for i in {1..10000}; do
        for j in {1..10000}; do
            dd if=/dev/urandom of=dir-$i/d1/d2/d3/d4/d5/d6/d7/d8/d9/d10/file.$j bs=1024 count=1 |& tee -a filecreatesindeepdirs.log
        done
    done

client2 (126): rhel7.2 --> node3 VIP
IO: dd of 1K-sized files, about 100000 of them.

client3 (64): rhel6.8 --> node4 VIP
IO: untar/tar of a kernel source tarball in 10 loops:

    for i in {1..10}; do
        echo "=============== LOOP $i ================================" |& tee -a tar.log
        date |& tee -a tar.log
        echo "=============== LOOP $i ================================" >> tar.err.log
        date >> tar.err.log
        mkdir kern.$i |& tee -a tar.log
        cp linux-4.6.1.tar.xz kern.$i/ |& tee -a tar.log
        tar -xvf kern.$i/linux-4.6.1.tar.xz -C kern.$i/ >> tar.log 2>> tar.err.log
        tar -cvf kern.$i/linux-4.6.1-mytar.tar kern.$i/linux-4.6.1 >> tar.log 2>> tar.err.log
        echo "=======================" |& tee -a tar.log
        echo "=======================" >> tar.err.log
        echo "###########################################################" |& tee -a tar.log
        echo "###########################################################" |& tee -a tar.log
        echo "###########################################################" >> tar.err.log
        echo "###########################################################" >> tar.err.log
        sleep 10
    done

Problem --> After about 5 loops, during the 6th loop, the IO started failing with "bash: tar.log: Remote I/O error". On checking the mount, a cd or ls of the mount point returned "stale file handle".

client4 (162): rhel6.8 --> node3 VIP
IO: sleep for about 2 hours (so that enough data is populated), then trigger lookups using ls -lRt *
Problem --> The ls -lRt output started to display only after 15 hours.

Cluster side:
Saw an OOM kill of the ganesha process on node4:

[Thu Jun 9 16:45:34 2016] Out of memory: Kill process 19757 (ganesha.nfsd) score 481 or sacrifice child
[Thu Jun 9 16:45:34 2016] Killed process 19757 (ganesha.nfsd) total-vm:8550104kB, anon-rss:3792524kB, file-rss:288kB

More information:
The IO failed on the mount served by the ganesha process that got OOM-killed (node4). The VIP was then migrated to node3 as part of HA; however, IO had already failed on this client (tar load), whose volume was mounted using the node4 VIP. Also, the mount command displays the volume as mounted on node3.

Version-Release number of selected component (if applicable):

[root@dhcp35-98 log]# rpm -qa|grep gluster
glusterfs-client-xlators-3.7.9-9.el7rhgs.x86_64
nfs-ganesha-gluster-2.3.1-8.el7rhgs.x86_64
glusterfs-server-3.7.9-9.el7rhgs.x86_64
glusterfs-api-3.7.9-9.el7rhgs.x86_64
glusterfs-debuginfo-3.7.9-9.el7rhgs.x86_64
glusterfs-libs-3.7.9-9.el7rhgs.x86_64
glusterfs-fuse-3.7.9-9.el7rhgs.x86_64
glusterfs-cli-3.7.9-9.el7rhgs.x86_64
glusterfs-3.7.9-9.el7rhgs.x86_64
glusterfs-ganesha-3.7.9-9.el7rhgs.x86_64
python-gluster-3.7.9-9.el7rhgs.noarch
[root@dhcp35-98 log]# rpm -qa|grep ganesha
nfs-ganesha-gluster-2.3.1-8.el7rhgs.x86_64
nfs-ganesha-2.3.1-8.el7rhgs.x86_64
nfs-ganesha-debuginfo-2.3.1-8.el7rhgs.x86_64
glusterfs-ganesha-3.7.9-9.el7rhgs.x86_64
[root@dhcp35-98 log]# rpm -qa|grep selinux
selinux-policy-mls-3.13.1-60.el7_2.6.noarch
selinux-policy-minimum-3.13.1-60.el7_2.6.noarch
selinux-policy-devel-3.13.1-60.el7_2.6.noarch
libselinux-2.2.2-6.el7.x86_64
selinux-policy-3.13.1-60.el7_2.6.noarch
selinux-policy-sandbox-3.13.1-60.el7_2.6.noarch
libselinux-utils-2.2.2-6.el7.x86_64
libselinux-python-2.2.2-6.el7.x86_64
selinux-policy-doc-3.13.1-60.el7_2.6.noarch
selinux-policy-targeted-3.13.1-60.el7_2.6.noarch
[root@dhcp35-98 log]# rpm -qa|grep pcs
pcs-0.9.143-15.el7.x86_64
[root@dhcp35-98 log]# rpm -qa|grep pace
pacemaker-cluster-libs-1.1.13-10.el7_2.2.x86_64
pacemaker-cli-1.1.13-10.el7_2.2.x86_64
pacemaker-1.1.13-10.el7_2.2.x86_64
userspace-rcu-0.7.9-2.el7rhgs.x86_64
pacemaker-libs-1.1.13-10.el7_2.2.x86_64

How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:
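The exact client mount invocations were not captured above. As a minimal sketch, assuming NFSv4 and a hypothetical VIP address (the real VIPs are not recorded in this report), a client mount through a ganesha VIP looks like:

    # Hypothetical values: 10.70.35.200 stands in for a ganesha VIP and
    # /mnt/ec for the client-side mount point; the export path matches the
    # ganesha export config shown in the RCA below.
    mount -t nfs -o vers=4 10.70.35.200:/ec-nfsganesha /mnt/ec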
RCA:
An IP failover happened from node dhcp35-64 to dhcp35-98, and the export IDs for this volume were different on these two nodes. This resulted in the file handle of the connected nfs-client becoming stale.

[root@dhcp35-64 ~]# cat /etc/ganesha/exports/export.ec-nfsganesha.conf
# WARNING : Using Gluster CLI will overwrite manual
# changes made to this file. To avoid it, edit the
# file and run ganesha-ha.sh --refresh-config.

EXPORT{
    Export_Id = 2;
    Path = "/ec-nfsganesha";
    FSAL {
        name = GLUSTER;
        hostname = "localhost";
        volume = "ec-nfsganesha";
    }
    Access_type = RW;
    Disable_ACL = true;
    Squash = "No_root_squash";
    Pseudo = "/ec-nfsganesha";
    Protocols = "3", "4";
    Transports = "UDP", "TCP";
    SecType = "sys";
}
[root@dhcp35-64 ~]#

[root@dhcp35-98 ~]# cat /etc/ganesha/exports/export.ec-nfsganesha.conf
# WARNING : Using Gluster CLI will overwrite manual
# changes made to this file. To avoid it, edit the
# file and run ganesha-ha.sh --refresh-config.

EXPORT{
    Export_Id = 3;
    Path = "/ec-nfsganesha";
    FSAL {
        name = GLUSTER;
        hostname = "localhost";
        volume = "ec-nfsganesha";
    }
    Access_type = RW;
    Disable_ACL = true;
    Squash = "No_root_squash";
    Pseudo = "/ec-nfsganesha";
    Protocols = "3", "4";
    Transports = "UDP", "TCP";
    SecType = "sys";
}
[root@dhcp35-98 ~]#
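Given this RCA, a quick way to catch the mismatch ahead of a failover is to compare Export_Id across all ganesha nodes. A minimal sketch, assuming passwordless SSH and using the two node names from this setup (the node list is an assumption; substitute your own ganesha cluster nodes):

    #!/bin/bash
    # Print the Export_Id line for a volume from every ganesha node so that
    # mismatched IDs (the trigger for the stale handle here) stand out.
    VOLUME="ec-nfsganesha"
    NODES="dhcp35-64 dhcp35-98"   # assumption: adjust to the ganesha cluster nodes

    for node in $NODES; do
        printf '%s: ' "$node"
        ssh "$node" "grep Export_Id /etc/ganesha/exports/export.${VOLUME}.conf"
    done

Any two nodes printing different values would reproduce the stale-handle symptom on a VIP failover between them.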
client2(126) IOs:

[root@dhcp35-126 126]# for i in {1..100000}; do dd if=/dev/urandom of=file-$i bs=1k count=512 |& tee -a dd.log; done
[root@dhcp35-126 126]# for i in {1..1000000}; do dd if=/dev/urandom of=small-$i bs=1k count=1 |& tee -a dd.log; done
Upstream patch: http://review.gluster.org/#/c/14908/
Downstream patch: https://code.engineering.redhat.com/gerrit/#/c/84780/
The fix is now available in the RHGS 3.2 rebase (glusterfs-3.8.4).
Verified the same test case on 3.8.4-10:

glusterfs-geo-replication-3.8.4-10.el7rhgs.x86_64
nfs-ganesha-gluster-2.4.1-3.el7rhgs.x86_64
glusterfs-3.8.4-10.el7rhgs.x86_64
python-gluster-3.8.4-10.el7rhgs.noarch
glusterfs-cli-3.8.4-10.el7rhgs.x86_64
glusterfs-rdma-3.8.4-10.el7rhgs.x86_64
glusterfs-api-3.8.4-10.el7rhgs.x86_64
glusterfs-server-3.8.4-10.el7rhgs.x86_64
glusterfs-api-devel-3.8.4-10.el7rhgs.x86_64
glusterfs-libs-3.8.4-10.el7rhgs.x86_64
glusterfs-devel-3.8.4-10.el7rhgs.x86_64
glusterfs-fuse-3.8.4-10.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-10.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-10.el7rhgs.x86_64
glusterfs-events-3.8.4-10.el7rhgs.x86_64

[root@dhcp35-116 ~]# rpm -qa|grep pace
pacemaker-cli-1.1.15-11.el7.x86_64
pacemaker-libs-1.1.15-11.el7.x86_64
pacemaker-1.1.15-11.el7.x86_64
pacemaker-cluster-libs-1.1.15-11.el7.x86_64
userspace-rcu-0.7.9-2.el7rhgs.x86_64
[root@dhcp35-116 ~]# rpm -qa|grep ganesha
nfs-ganesha-gluster-2.4.1-3.el7rhgs.x86_64
nfs-ganesha-2.4.1-3.el7rhgs.x86_64
glusterfs-ganesha-3.8.4-10.el7rhgs.x86_64

I also did a halt of one of the nodes hosting a VIP (and on which a client was mounted); the IOs paused for some time and then proceeded. I brought the node back after 20 minutes and started ganesha and pacemaker; the IOs again paused for some time before resuming.
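To observe the VIP movement during such a halt/failback test, watching the pacemaker resource status from a surviving node is enough. A sketch (the cluster_ip-style resource names are typical of ganesha-ha deployments but vary per setup):

    # Run on any surviving cluster node; refreshes pacemaker status every 2s.
    # While the halted node is down, its VIP resource (e.g. a *-cluster_ip-1
    # entry) should show as Started on another node, and move back after
    # ganesha and pacemaker are restarted on the original node.
    watch -n 2 'pcs status resources'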
[root@dhcp35-37 ~]# gluster v status
Status of volume: disperse
Gluster process                                         TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.37:/rhs/brick2/disperse                  49153     0          Y       15813
Brick 10.70.35.116:/rhs/brick2/disperse                 49153     0          Y       12385
Brick 10.70.35.239:/rhs/brick2/disperse                 49153     0          Y       13643
Brick 10.70.35.135:/rhs/brick2/disperse                 49152     0          Y       13969
Brick 10.70.35.8:/rhs/brick2/disperse                   49152     0          Y       13926
Brick 10.70.35.196:/rhs/brick2/disperse                 49152     0          Y       1778
Brick 10.70.35.37:/rhs/brick3/disperse                  49154     0          Y       15832
Brick 10.70.35.116:/rhs/brick3/disperse                 49154     0          Y       12404
Brick 10.70.35.239:/rhs/brick3/disperse                 49154     0          Y       13665
Brick 10.70.35.135:/rhs/brick3/disperse                 49153     0          Y       13988
Brick 10.70.35.8:/rhs/brick3/disperse                   49153     0          Y       13945
Brick 10.70.35.196:/rhs/brick3/disperse                 49153     0          Y       1771
Self-heal Daemon on localhost                           N/A       N/A        Y       15093
Self-heal Daemon on dhcp35-196.lab.eng.blr.redhat.com   N/A       N/A        Y       15627
Self-heal Daemon on dhcp35-116.lab.eng.blr.redhat.com   N/A       N/A        Y       11620
Self-heal Daemon on dhcp35-239.lab.eng.blr.redhat.com   N/A       N/A        Y       16477
Self-heal Daemon on dhcp35-8.lab.eng.blr.redhat.com     N/A       N/A        Y       14191
Self-heal Daemon on dhcp35-135.lab.eng.blr.redhat.com   N/A       N/A        Y       14260

Task Status of Volume disperse
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: gluster_shared_storage
Gluster process                                                      TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick dhcp35-116.lab.eng.blr.redhat.com:/var/lib/glusterd/ss_brick   49152     0          Y       20848
Brick dhcp35-239.lab.eng.blr.redhat.com:/var/lib/glusterd/ss_brick   49152     0          Y       22100
Brick dhcp35-37.lab.eng.blr.redhat.com:/var/lib/glusterd/ss_brick    49152     0          Y       22873
Self-heal Daemon on localhost                                        N/A       N/A        Y       15093
Self-heal Daemon on dhcp35-135.lab.eng.blr.redhat.com                N/A       N/A        Y       14260
Self-heal Daemon on dhcp35-8.lab.eng.blr.redhat.com                  N/A       N/A        Y       14191
Self-heal Daemon on dhcp35-116.lab.eng.blr.redhat.com                N/A       N/A        Y       11620
Self-heal Daemon on dhcp35-239.lab.eng.blr.redhat.com                N/A       N/A        Y       16477
Self-heal Daemon on dhcp35-196.lab.eng.blr.redhat.com                N/A       N/A        Y       15627

Task Status of Volume gluster_shared_storage
------------------------------------------------------------------------------
There are no active volume tasks

[root@dhcp35-37 ~]# gluster v info

Volume Name: disperse
Type: Distributed-Disperse
Volume ID: ea0b329a-f2ff-462d-a3a2-457a80b35e08
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x (4 + 2) = 12
Transport-type: tcp
Bricks:
Brick1: 10.70.35.37:/rhs/brick2/disperse
Brick2: 10.70.35.116:/rhs/brick2/disperse
Brick3: 10.70.35.239:/rhs/brick2/disperse
Brick4: 10.70.35.135:/rhs/brick2/disperse
Brick5: 10.70.35.8:/rhs/brick2/disperse
Brick6: 10.70.35.196:/rhs/brick2/disperse
Brick7: 10.70.35.37:/rhs/brick3/disperse
Brick8: 10.70.35.116:/rhs/brick3/disperse
Brick9: 10.70.35.239:/rhs/brick3/disperse
Brick10: 10.70.35.135:/rhs/brick3/disperse
Brick11: 10.70.35.8:/rhs/brick3/disperse
Brick12: 10.70.35.196:/rhs/brick3/disperse
Options Reconfigured:
ganesha.enable: on
features.cache-invalidation: on
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
nfs-ganesha: enable
cluster.enable-shared-storage: enable

Volume Name: gluster_shared_storage
Type: Replicate
Volume ID: 9a35f266-5140-4dc3-8949-7adc4eac93c8
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: dhcp35-116.lab.eng.blr.redhat.com:/var/lib/glusterd/ss_brick
Brick2: dhcp35-239.lab.eng.blr.redhat.com:/var/lib/glusterd/ss_brick
Brick3: dhcp35-37.lab.eng.blr.redhat.com:/var/lib/glusterd/ss_brick
Options Reconfigured:
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
nfs-ganesha: enable
cluster.enable-shared-storage: enable
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2017-0486.html
Clearing needinfo from closed bug.