Bug 1340085

Summary:	Directory creation(mkdir) fails when the remove brick is initiated for replicated volumes accessing via nfs-ganesha
Product:	[Red Hat Storage] Red Hat Gluster Storage	Reporter:	Shashank Raj <sraj>
Component:	replicate	Assignee:	Jiffin <jthottan>
Status:	CLOSED ERRATA	QA Contact:	Shashank Raj <sraj>
Severity:	urgent	Docs Contact:
Priority:	unspecified
Version:	rhgs-3.1	CC:	amukherj, jthottan, kkeithle, nchilaka, ndevos, pkarampu, rcyriac, rgowdapp, rhinduja, rhs-bugs, sashinde, skoduri, sraj, storage-qa-internal
Target Milestone:	---	Keywords:	Regression, ZStream
Target Release:	RHGS 3.1.3
Hardware:	x86_64
OS:	Linux
Whiteboard:
Fixed In Version:	glusterfs-3.7.9-7	Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:
Clones:	1340623 (view as bug list)		Environment:
Last Closed:	2016-06-23 05:24:33 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1311817, 1340623, 1340992, 1342178

Description Shashank Raj 2016-05-26 12:28:12 UTC

Description of problem:

IO hangs on ganesha mount during remove brick operation.

Version-Release number of selected component (if applicable):

glusterfs-3.7.9-5
ganesha-2.3.1-7

How reproducible:

Always

Steps to Reproduce:

pasting the automation logs output:

>>>>>> 2016-05-26 17:29:44,503 INFO removeBrick450999 450999- nfs-ganesha: remove brick while I/O going on, volume being mounted, version=3

>>>> create a volume

gluster volume create newvolume replica 2     transport tcp  dhcp37-44.lab.eng.blr.redhat.com:/bricks/brick1/newvolume_brick0 dhcp37-220.lab.eng.blr.redhat.com:/bricks/brick1/newvolume_brick1 dhcp37-111.lab.eng.blr.redhat.com:/bricks/brick1/newvolume_brick2 dhcp37-173.lab.eng.blr.redhat.com:/bricks/brick0/newvolume_brick3 dhcp37-44.lab.eng.blr.redhat.com:/bricks/brick2/newvolume_brick4 dhcp37-220.lab.eng.blr.redhat.com:/bricks/brick2/newvolume_brick5 dhcp37-111.lab.eng.blr.redhat.com:/bricks/brick2/newvolume_brick6 dhcp37-173.lab.eng.blr.redhat.com:/bricks/brick1/newvolume_brick7 dhcp37-44.lab.eng.blr.redhat.com:/bricks/brick3/newvolume_brick8 dhcp37-220.lab.eng.blr.redhat.com:/bricks/brick3/newvolume_brick9 dhcp37-111.lab.eng.blr.redhat.com:/bricks/brick3/newvolume_brick10 dhcp37-173.lab.eng.blr.redhat.com:/bricks/brick2/newvolume_brick11

>>>> gluster volume start newvolume

>>>> gluster volume set newvolume ganesha.enable on

>>>>> do a showmount

2016-05-26 17:30:17,660 INFO run root.eng.blr.redhat.com: showmount -e localhost
2016-05-26 17:30:17,949 INFO run RETCODE: 0
2016-05-26 17:30:17,949 INFO run STDOUT:
Export list for localhost:
/newvolume (everyone)

>>>>> mount the volume on client

2016-05-26 17:30:18,529 INFO run root.eng.blr.redhat.com: mount -t nfs -o vers=3 10.70.36.217:/newvolume /mnt1464263984.5
2016-05-26 17:30:18,875 INFO run RETCODE: 0


>>>>> start creation of directories inside the mount point:

2016-05-26 17:30:18,875 DEBUG run_async root.eng.blr.redhat.com: for i in {1..25}; do mkdir /mnt1464263984.5/a$i;  for j in {1..50}; do mkdir /mnt1464263984.5/a$i/b$j; for k in {1..50}; do touch /mnt1464263984.5/a$i/b$j/c$k; done done done


>>>>> Start the remove brick process

2016-05-26 17:30:49,336 INFO run root.eng.blr.redhat.com: gluster volume remove-brick newvolume replica 2  dhcp37-44.lab.eng.blr.redhat.com:/bricks/brick1/newvolume_brick0 dhcp37-220.lab.eng.blr.redhat.com:/bricks/brick1/newvolume_brick1 start
2016-05-26 17:31:00,156 INFO run RETCODE: 0
2016-05-26 17:31:00,157 INFO run STDOUT:
volume remove-brick start: success
ID: b9d8c23c-8e53-4fc7-884d-c94450102cc2


>>>>> commit the remove brick operation

2016-05-26 17:31:42,753 INFO remove_brick_commit rebalance is completed, starting the remove-brick commmit process

2016-05-26 17:31:43,143 INFO run root.eng.blr.redhat.com: gluster volume  remove-brick newvolume replica 2  dhcp37-44.lab.eng.blr.redhat.com:/bricks/brick1/newvolume_brick0 dhcp37-220.lab.eng.blr.redhat.com:/bricks/brick1/newvolume_brick1 commit --mode=script
2016-05-26 17:31:45,917 INFO run RETCODE: 0
2016-05-26 17:31:45,917 INFO run STDOUT:
volume remove-brick commit: success
Check the removed bricks to ensure all files are migrated.
If files with data are found on the brick path, copy them via a gluster mount point before re-purposing the removed brick.

>>>>> Observe that the IO on the mount point is hanged

[root@dhcp43-129 ~]# ps aux|grep /mnt
root      9251  0.0  0.0 107924   676 ?        S    23:01   0:00 touch /mnt1464263984.5/a13/b31/c13
root      9333  0.0  0.0 112648   976 pts/1    S+   23:23   0:00 grep --color=auto /mnt
root      9428  0.4  0.0 113384  1768 ?        Ss   23:00   0:06 bash -c cd /root && for i in {1..25}; do mkdir /mnt1464263984.5/a$i;  for j in {1..50}; do mkdir /mnt1464263984.5/a$i/b$j; for k in {1..50}; do touch /mnt1464263984.5/a$i/b$j/c$k; done done done

>>>>>> Following errors are seen in ganesha.log on the mounted node:

26/05/2016 06:52:51 : epoch 98870000 : dhcp37-44.lab.eng.blr.redhat.com : ganesha.nfsd-32360[work-14] nfs3_Errno_verbose :NFS3 :CRIT :Error CACHE_INODE_IO_ERROR in nfs3_mkdir converted to NFS3ERR_IO but was set non-retryable
26/05/2016 06:52:53 : epoch 98870000 : dhcp37-44.lab.eng.blr.redhat.com : ganesha.nfsd-32360[work-15] nfs3_Errno_verbose :NFS3 :CRIT :Error CACHE_INODE_IO_ERROR in nfs3_mkdir converted to NFS3ERR_IO but was set non-retryable
26/05/2016 06:52:55 : epoch 98870000 : dhcp37-44.lab.eng.blr.redhat.com : ganesha.nfsd-32360[work-2] nfs3_Errno_verbose :NFS3 :CRIT :Error CACHE_INODE_IO_ERROR in nfs3_mkdir converted to NFS3ERR_IO but was set non-retryable
26/05/2016 06:52:57 : epoch 98870000 : dhcp37-44.lab.eng.blr.redhat.com : ganesha.nfsd-32360[work-3] nfs3_Errno_verbose :NFS3 :CRIT :Error CACHE_INODE_IO_ERROR in nfs3_mkdir converted to NFS3ERR_IO but was set non-retryable

Actual results:

IO hangs on ganesha mount during remove brick operation.

Expected results:

There should not be any IO hang.

Additional info:

Comment 2 Shashank Raj 2016-05-26 12:33:41 UTC

sosreports, logs and packet_trace can be found under http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1340085

Comment 3 Atin Mukherjee 2016-05-26 15:09:54 UTC

Why can't we have the back traces of the relevant processes to narrow down the issue?

Comment 4 Jiffin 2016-05-26 18:27:02 UTC

These are my initial findings :

On v3 mount, when we perform "gluster v remove-brick <volname> start" all the ongoing mkdir call will fail with "mkdir: cannot create directory `new': Input/output error". Creation of files is fine.

I hit didn't any IO hang in my setup.(I got simialr log message in ganesha.log and ganesha-gfapi.log)

All the further mkdir request will fail until gluster v remove-brick <volname> commit is performed

Comment 8 Atin Mukherjee 2016-05-30 04:35:56 UTC

Copying the RCA from the cloned upstream bug:

"DHT expects GF_PREOP_CHECK_FAILED to be present in xdata_rsp in case of mkdir
failures because of stale layout. But AFR was unwinding null xdata_rsp in case of failures. This was leading to mkdir failures just after remove-brick. Unwind
the xdata_rsp in case of failures to make sure the response from brick reaches
dht."

Upstream patch http://review.gluster.org/14553 posted for review.

Comment 11 Shashank Raj 2016-06-01 13:31:14 UTC

Verified this bug with latest glusterfs-3.7.9-7 and nfs-ganesha-2.3.1-7 build and during remove-brick operations, no failures for directories are seen.

Ran all the automated remove brick related cases which earlier reported this problem and all are passed without any issues.

based on above observation, marking this bug as Verified.

Comment 13 errata-xmlrpc 2016-06-23 05:24:33 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1240