Description of problem:
IO hangs on a ganesha mount during a remove-brick operation.

Version-Release number of selected component (if applicable):
glusterfs-3.7.9-5
ganesha-2.3.1-7

How reproducible:
Always

Steps to Reproduce (pasting the automation log output):

>>>>>> 2016-05-26 17:29:44,503 INFO removeBrick450999 450999- nfs-ganesha: remove brick while I/O going on, volume being mounted, version=3

>>>> Create a volume:
gluster volume create newvolume replica 2 transport tcp \
    dhcp37-44.lab.eng.blr.redhat.com:/bricks/brick1/newvolume_brick0 \
    dhcp37-220.lab.eng.blr.redhat.com:/bricks/brick1/newvolume_brick1 \
    dhcp37-111.lab.eng.blr.redhat.com:/bricks/brick1/newvolume_brick2 \
    dhcp37-173.lab.eng.blr.redhat.com:/bricks/brick0/newvolume_brick3 \
    dhcp37-44.lab.eng.blr.redhat.com:/bricks/brick2/newvolume_brick4 \
    dhcp37-220.lab.eng.blr.redhat.com:/bricks/brick2/newvolume_brick5 \
    dhcp37-111.lab.eng.blr.redhat.com:/bricks/brick2/newvolume_brick6 \
    dhcp37-173.lab.eng.blr.redhat.com:/bricks/brick1/newvolume_brick7 \
    dhcp37-44.lab.eng.blr.redhat.com:/bricks/brick3/newvolume_brick8 \
    dhcp37-220.lab.eng.blr.redhat.com:/bricks/brick3/newvolume_brick9 \
    dhcp37-111.lab.eng.blr.redhat.com:/bricks/brick3/newvolume_brick10 \
    dhcp37-173.lab.eng.blr.redhat.com:/bricks/brick2/newvolume_brick11

>>>> Start the volume and enable the ganesha export:
gluster volume start newvolume
gluster volume set newvolume ganesha.enable on

>>>>> Do a showmount:
2016-05-26 17:30:17,660 INFO run root.eng.blr.redhat.com: showmount -e localhost
2016-05-26 17:30:17,949 INFO run RETCODE: 0
2016-05-26 17:30:17,949 INFO run STDOUT: Export list for localhost:
/newvolume (everyone)

>>>>> Mount the volume on the client:
2016-05-26 17:30:18,529 INFO run root.eng.blr.redhat.com: mount -t nfs -o vers=3 10.70.36.217:/newvolume /mnt1464263984.5
2016-05-26 17:30:18,875 INFO run RETCODE: 0

>>>>> Start creating directories inside the mount point:
2016-05-26 17:30:18,875 DEBUG run_async root.eng.blr.redhat.com: for i in {1..25}; do mkdir /mnt1464263984.5/a$i; for j in {1..50}; do mkdir /mnt1464263984.5/a$i/b$j; for k in {1..50}; do touch /mnt1464263984.5/a$i/b$j/c$k; done done done

>>>>> Start the remove-brick process:
2016-05-26 17:30:49,336 INFO run root.eng.blr.redhat.com: gluster volume remove-brick newvolume replica 2 dhcp37-44.lab.eng.blr.redhat.com:/bricks/brick1/newvolume_brick0 dhcp37-220.lab.eng.blr.redhat.com:/bricks/brick1/newvolume_brick1 start
2016-05-26 17:31:00,156 INFO run RETCODE: 0
2016-05-26 17:31:00,157 INFO run STDOUT: volume remove-brick start: success
ID: b9d8c23c-8e53-4fc7-884d-c94450102cc2

>>>>> Commit the remove-brick operation:
2016-05-26 17:31:42,753 INFO remove_brick_commit rebalance is completed, starting the remove-brick commit process
2016-05-26 17:31:43,143 INFO run root.eng.blr.redhat.com: gluster volume remove-brick newvolume replica 2 dhcp37-44.lab.eng.blr.redhat.com:/bricks/brick1/newvolume_brick0 dhcp37-220.lab.eng.blr.redhat.com:/bricks/brick1/newvolume_brick1 commit --mode=script
2016-05-26 17:31:45,917 INFO run RETCODE: 0
2016-05-26 17:31:45,917 INFO run STDOUT: volume remove-brick commit: success
Check the removed bricks to ensure all files are migrated. If files with data are found on the brick path, copy them via a gluster mount point before re-purposing the removed brick.

>>>>> Observe that the IO on the mount point is hung:
[root@dhcp43-129 ~]# ps aux | grep /mnt
root  9251  0.0  0.0 107924  676 ?      S   23:01  0:00 touch /mnt1464263984.5/a13/b31/c13
root  9333  0.0  0.0 112648  976 pts/1  S+  23:23  0:00 grep --color=auto /mnt
root  9428  0.4  0.0 113384 1768 ?      Ss  23:00  0:06 bash -c cd /root && for i in {1..25}; do mkdir /mnt1464263984.5/a$i; for j in {1..50}; do mkdir /mnt1464263984.5/a$i/b$j; for k in {1..50}; do touch /mnt1464263984.5/a$i/b$j/c$k; done done done

>>>>>> The following errors are seen in ganesha.log on the node serving the mount:
26/05/2016 06:52:51 : epoch 98870000 : dhcp37-44.lab.eng.blr.redhat.com : ganesha.nfsd-32360[work-14] nfs3_Errno_verbose :NFS3 :CRIT :Error CACHE_INODE_IO_ERROR in nfs3_mkdir converted to NFS3ERR_IO but was set non-retryable
26/05/2016 06:52:53 : epoch 98870000 : dhcp37-44.lab.eng.blr.redhat.com : ganesha.nfsd-32360[work-15] nfs3_Errno_verbose :NFS3 :CRIT :Error CACHE_INODE_IO_ERROR in nfs3_mkdir converted to NFS3ERR_IO but was set non-retryable
26/05/2016 06:52:55 : epoch 98870000 : dhcp37-44.lab.eng.blr.redhat.com : ganesha.nfsd-32360[work-2] nfs3_Errno_verbose :NFS3 :CRIT :Error CACHE_INODE_IO_ERROR in nfs3_mkdir converted to NFS3ERR_IO but was set non-retryable
26/05/2016 06:52:57 : epoch 98870000 : dhcp37-44.lab.eng.blr.redhat.com : ganesha.nfsd-32360[work-3] nfs3_Errno_verbose :NFS3 :CRIT :Error CACHE_INODE_IO_ERROR in nfs3_mkdir converted to NFS3ERR_IO but was set non-retryable

Actual results:
IO hangs on the ganesha mount during the remove-brick operation.

Expected results:
There should not be any IO hang.

Additional info:
sosreports, logs and packet_trace can be found under http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1340085
Why can't we have the back traces of the relevant processes to narrow down the issue?
These are my initial findings: on a v3 mount, when we perform "gluster v remove-brick <volname> start", all ongoing mkdir calls fail with "mkdir: cannot create directory `new': Input/output error". Creation of files is fine. I didn't hit any IO hang in my setup (I got similar log messages in ganesha.log and ganesha-gfapi.log). All further mkdir requests fail until "gluster v remove-brick <volname> commit" is performed.
Copying the RCA from the cloned upstream bug: "DHT expects GF_PREOP_CHECK_FAILED to be present in xdata_rsp in case of mkdir failures because of stale layout. But AFR was unwinding null xdata_rsp in case of failures. This was leading to mkdir failures just after remove-brick. Unwind the xdata_rsp in case of failures to make sure the response from brick reaches dht." The upstream patch http://review.gluster.org/14553 has been posted for review.
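The RCA above can be condensed into a small illustrative model (this is hypothetical sketch code, not GlusterFS source: the function names, the use of a plain dict in place of dict_t, and the key string are stand-ins for the real xlator callbacks). The point it demonstrates: DHT treats a failed mkdir as retryable only if the GF_PREOP_CHECK_FAILED marker is present in the unwound xdata_rsp, so an AFR that unwinds NULL xdata on failure converts a recoverable stale-layout condition into a hard EIO.

```python
# Illustrative model of the RCA; names mirror GlusterFS concepts but this
# is not actual GlusterFS code.

GF_PREOP_CHECK_FAILED = "gf-preop-check-failed"  # stand-in for the real dict key

def dht_mkdir_cbk(op_ret, xdata_rsp):
    """Model of DHT's mkdir callback: a failure is retryable only if the
    brick flagged a stale-layout pre-op failure in xdata_rsp."""
    if op_ret == 0:
        return "success"
    if xdata_rsp and GF_PREOP_CHECK_FAILED in xdata_rsp:
        return "retry-with-refreshed-layout"
    return "EIO"  # surfaced to the application as Input/output error

def afr_unwind_buggy(op_ret, xdata_rsp):
    # Buggy AFR: drops xdata_rsp on failure, so DHT sees nothing.
    return dht_mkdir_cbk(op_ret, None if op_ret != 0 else xdata_rsp)

def afr_unwind_fixed(op_ret, xdata_rsp):
    # Fixed AFR (per http://review.gluster.org/14553): passes xdata_rsp
    # through even on failure.
    return dht_mkdir_cbk(op_ret, xdata_rsp)

brick_reply = {GF_PREOP_CHECK_FAILED: True}  # brick detected a stale layout
print(afr_unwind_buggy(-1, brick_reply))   # EIO: mkdir fails after remove-brick
print(afr_unwind_fixed(-1, brick_reply))   # retry-with-refreshed-layout
```

With the buggy unwind, the stale-layout marker set by the brick never reaches DHT, which matches the NFS3ERR_IO / CACHE_INODE_IO_ERROR entries seen in ganesha.log; passing xdata_rsp through on failure is what the upstream patch restores.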
Verified this bug with the latest glusterfs-3.7.9-7 and nfs-ganesha-2.3.1-7 builds; during remove-brick operations, no directory-creation failures are seen. Ran all the automated remove-brick cases which earlier reported this problem, and all passed without any issues. Based on the above observations, marking this bug as Verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2016:1240