Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1365626

Summary:	IO hang on ganesha mount during remove brick operation.
Product:	[Community] GlusterFS
Reporter:	Shashank Raj <sraj>
Component:	ganesha-nfs
Assignee:	Niels de Vos <ndevos>
Status:	CLOSED EOL
Severity:	high
Priority:	high
Version:	3.8
CC:	bugs, jthottan, kkeithle, mvignesh, mzywusko, ndevos, skoduri, storage-qa-internal
Keywords:	Triaged
Hardware:	x86_64
OS:	Linux
Clones:	1379662
Last Closed:	2017-11-07 10:39:47 UTC
Type:	Bug
Bug Blocks:	1379662

Description Shashank Raj 2016-08-09 17:31:46 UTC

Description of problem:

IO hang on ganesha mount during remove brick operation

Version-Release number of selected component (if applicable):

[root@dhcp43-133 ~]# rpm -qa|grep glusterfs
glusterfs-libs-3.8.1-0.4.git56fcf39.el7rhgs.x86_64
glusterfs-fuse-3.8.1-0.4.git56fcf39.el7rhgs.x86_64
glusterfs-3.8.1-0.4.git56fcf39.el7rhgs.x86_64
glusterfs-api-3.8.1-0.4.git56fcf39.el7rhgs.x86_64
glusterfs-cli-3.8.1-0.4.git56fcf39.el7rhgs.x86_64
glusterfs-ganesha-3.8.1-0.4.git56fcf39.el7rhgs.x86_64
glusterfs-client-xlators-3.8.1-0.4.git56fcf39.el7rhgs.x86_64
glusterfs-server-3.8.1-0.4.git56fcf39.el7rhgs.x86_64
glusterfs-geo-replication-3.8.1-0.4.git56fcf39.el7rhgs.x86_64

[root@dhcp43-133 ~]# rpm -qa|grep ganesha
nfs-ganesha-gluster-2.4-0.dev.26.el7rhgs.x86_64
nfs-ganesha-2.4-0.dev.26.el7rhgs.x86_64
glusterfs-ganesha-3.8.1-0.4.git56fcf39.el7rhgs.x86_64

How reproducible:

Once

Steps to Reproduce:
1. Create a 6x2 distributed-replicate volume and enable ganesha on the volume.

2. Mount a subdirectory of the volume on the client over NFSv4:

mount -t nfs -o vers=4 10.70.40.192:/newvolume/subdir /mnt1470753422.46

3. Start creating nested directories and files:

for i in {1..30}; do
    mkdir /mnt1470753422.46/a$i
    for j in {1..50}; do
        mkdir /mnt1470753422.46/a$i/b$j
        for k in {1..50}; do
            touch /mnt1470753422.46/a$i/b$j/c$k
        done
    done
done

4. Start the remove-brick operation:

gluster volume remove-brick newvolume replica 2  dhcp43-133.lab.eng.blr.redhat.com:/bricks/brick1/newvolume_brick0 dhcp41-206.lab.eng.blr.redhat.com:/bricks/brick1/newvolume_brick1 start

5. Once the remove-brick operation is complete, commit the brick removal:

gluster volume  remove-brick newvolume replica 2  dhcp43-133.lab.eng.blr.redhat.com:/bricks/brick1/newvolume_brick0 dhcp41-206.lab.eng.blr.redhat.com:/bricks/brick1/newvolume_brick1 commit 

6. Observe that the IO hangs on the client and that the following messages are seen in /var/log/ganesha.log:

[root@dhcp46-206 ~]# ps -ef|grep mkdir
root      9288  9283  0 20:00 ?        00:00:02 bash -c cd /root && for i in {1..30}; do mkdir /mnt1470753422.46/a$i;  for j in {1..50}; do mkdir /mnt1470753422.46/a$i/b$j; for k in {1..50}; do touch /mnt1470753422.46/a$i/b$j/c$k; done done done

09/08/2016 19:29:53 : epoch 57a9cca6 : dhcp43-133.lab.eng.blr.redhat.com : ganesha.nfsd-26092[dbus_heartbeat] posix2fsal_error :FSAL :CRIT :Mapping 107(default) to ERR_FSAL_SERVERFAULT
09/08/2016 19:29:53 : epoch 57a9cca6 : dhcp43-133.lab.eng.blr.redhat.com : ganesha.nfsd-26092[dbus_heartbeat] glusterfs_close_my_fd :FSAL :CRIT :Error : close returns with Transport endpoint is not connected
09/08/2016 19:29:53 : epoch 57a9cca6 : dhcp43-133.lab.eng.blr.redhat.com : ganesha.nfsd-26092[dbus_heartbeat] mdcache_lru_clean :INODE LRU :CRIT :Error closing file in cleanup: Undefined server error
09/08/2016 19:29:53 : epoch 57a9cca6 : dhcp43-133.lab.eng.blr.redhat.com : ganesha.nfsd-26092[dbus_heartbeat] posix2fsal_error :FSAL :CRIT :Mapping 107(default) to ERR_FSAL_SERVERFAULT
09/08/2016 19:29:53 : epoch 57a9cca6 : dhcp43-133.lab.eng.blr.redhat.com : ganesha.nfsd-26092[dbus_heartbeat] glusterfs_close_my_fd :FSAL :CRIT :Error : close returns with Transport endpoint is not connected
09/08/2016 19:29:53 : epoch 57a9cca6 : dhcp43-133.lab.eng.blr.redhat.com : ganesha.nfsd-26092[dbus_heartbeat] mdcache_lru_clean :INODE LRU :CRIT :Error closing file in cleanup: Undefined server error
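
The errno value 107 in the posix2fsal_error lines above is ENOTCONN on Linux, which matches the "Transport endpoint is not connected" text reported by glusterfs_close_my_fd. When the log is long, it can help to see how often each function reports a CRIT message. The following is a minimal sketch for that, assuming every line follows the layout seen in the excerpt above; the function name summarize_crit is made up for this example and is not part of nfs-ganesha.

```bash
#!/bin/bash
# Sketch: count CRIT-level ganesha.log messages per reporting function.
# Assumed line layout (taken from the excerpt in this report):
#   date time : epoch ID : host : proc[thread] function :COMPONENT :LEVEL :message
# summarize_crit is a hypothetical helper name, not an nfs-ganesha tool.
summarize_crit() {
    awk -F' :' '
        $6 == "CRIT" {
            # Field 4 is "proc[thread] function"; its last word is the function.
            n = split($4, words, " ")
            count[words[n]]++
        }
        END { for (f in count) print count[f], f }
    ' "$@" | sort -k1,1nr -k2
}
```

For example, `summarize_crit /var/log/ganesha.log` prints one line per function, with the count first and the most frequent function at the top. With no file argument it reads the log from standard input.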

Actual results:

IO hang on ganesha mount during remove brick operation.

Expected results:

Additional info:

sosreport and logs will be attached
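
Step 5 of the reproduction commits the brick removal only once the remove-brick operation is complete. A minimal sketch of how that wait could be scripted, using the status form of the same gluster volume remove-brick command. Several things here are assumptions and not taken from this report: that the status text contains "in progress" while a node is still migrating, "completed" when it is done and "failed" on error (check this against the gluster version in use); the helper names remove_brick_state and wait_remove_brick; and the GLUSTER and POLL_INTERVAL variables, which exist only to make the sketch adjustable.

```bash
#!/bin/bash
# Sketch only: poll remove-brick status until data migration has finished.
# Assumption: the status text contains "in progress", "completed" or "failed".
# All helper and variable names below are hypothetical.

# Reads status text on stdin.
# Returns 0 = finished, 1 = not finished yet, 2 = a node reported failure.
remove_brick_state() {
    local text
    text=$(cat)
    if grep -qi 'failed' <<<"$text"; then return 2; fi
    if grep -qi 'in progress' <<<"$text"; then return 1; fi
    grep -qi 'completed' <<<"$text"
}

# Usage: wait_remove_brick VOLUME [replica N] BRICK...
# Returns 0 when finished, 2 on failure, 3 if the status command itself fails.
wait_remove_brick() {
    local vol=$1 out rc
    shift
    while true; do
        out=$("${GLUSTER:-gluster}" volume remove-brick "$vol" "$@" status) || return 3
        rc=0
        remove_brick_state <<<"$out" || rc=$?
        case $rc in
            0) return 0 ;;
            2) return 2 ;;
        esac
        sleep "${POLL_INTERVAL:-10}"
    done
}
```

Called with the same volume, replica count and brick arguments as in step 4, wait_remove_brick returns 0 when the commit in step 5 can be issued.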

Comment 1 Shashank Raj 2016-08-09 17:37:01 UTC

sosreport and logs can be found under http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1365626

Comment 2 Niels de Vos 2016-09-12 05:39:46 UTC

All 3.8.x bugs are now reported against version 3.8 (without .x). For more information, see http://www.gluster.org/pipermail/gluster-devel/2016-September/050859.html

Comment 4 Niels de Vos 2017-11-07 10:39:47 UTC

This bug is getting closed because the 3.8 version is marked End-Of-Life. There will be no further updates to this version. Please open a new bug against a version that still receives bugfixes if you are still facing this issue in a more current release.