Bug 983939 - ls, cd, creation of new nfs mount hangs when a brick goes offline and comes back online.
Status: CLOSED EOL
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: glusterd
Version: 2.1
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Assigned To: Nagaprasad Sathyanarayana
QA Contact: storage-qa-internal@redhat.com
Depends On:
Blocks:
Reported: 2013-07-12 06:56 EDT by spandura
Modified: 2016-02-17 19:20 EST
CC List: 4 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-12-03 12:16:43 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description spandura 2013-07-12 06:56:05 EDT
Description of problem:
=======================
ls and cd on the NFS mount point, and creation of a new NFS mount, hang on an 8 x 2 distributed-replicate volume when a brick goes offline and comes back online while a dd operation on a file is in progress from the FUSE and NFS mounts.

Version-Release number of selected component (if applicable):
============================================================
root@king [Jul-12-2013-16:14:25] >rpm -qa | grep glusterfs-server
glusterfs-server-3.4.0.12rhs.beta4-1.el6rhs.x86_64

root@king [Jul-12-2013-16:14:31] >gluster --version
glusterfs 3.4.0.12rhs.beta4 built on Jul 11 2013 23:37:17

How reproducible:
====================

Steps to Reproduce:
=====================
1. Create an 8 x 2 distributed-replicate volume and start the volume.

2. From 2 client machines (RHEL 5.9 and RHEL 6.4), create FUSE and NFS mounts.

3. From all 4 mount points, execute: "dd if=/dev/urandom of=./file.1 bs=1M count=20480"

4. While the dd is in progress, bring down one of the bricks of the replica pair that holds the file "file.1".

5. While the dd is still in progress, bring the brick back online.

6. cd to the NFS mount point from both clients.

7. On client1 the cd was successful; on client2 the cd hung.

8. On client1, performed "ls" on the NFS mount point. ls hung on client1.

9. On client1 and client2, try to create a new NFS mount point.

10. On client1 creation of the mount point hung; on client2 it was successful.

11. When cd'ing to the newly created mount point on client2, the cd hung. (A command sketch of steps 1-5 follows below.)
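
A minimal shell sketch of steps 1-5, assuming the hostnames and brick paths shown under "Additional info" below; the mount paths, NFS mount options, and the kill/force-start pattern for taking a single brick offline and back online are illustrative, not commands captured from this setup:

# Step 1: create and start the 8 x 2 distributed-replicate volume (run on one server)
gluster volume create vol_dis_rep replica 2 \
    king:/rhs/brick1/b0  hicks:/rhs/brick1/b1  king:/rhs/brick1/b2  hicks:/rhs/brick1/b3 \
    luigi:/rhs/brick1/b4 lizzie:/rhs/brick1/b5 luigi:/rhs/brick1/b6 lizzie:/rhs/brick1/b7 \
    rhs-client11:/rhs/brick1/b8  rhs-client12:/rhs/brick1/b9 \
    rhs-client11:/rhs/brick1/b10 rhs-client12:/rhs/brick1/b11 \
    rhs-client13:/rhs/brick1/b12 rhs-client14:/rhs/brick1/b13 \
    rhs-client13:/rhs/brick1/b14 rhs-client14:/rhs/brick1/b15
gluster volume start vol_dis_rep

# Step 2: on each client, one FUSE and one NFS mount (NFS mounts go to "king", per comment 1)
mount -t glusterfs king:/vol_dis_rep /mnt/fuse
mount -t nfs -o vers=3 king:/vol_dis_rep /mnt/nfs

# Step 3: from every mount point
cd /mnt/fuse && dd if=/dev/urandom of=./file.1 bs=1M count=20480 &

# Step 4: while dd runs, find the replica pair holding file.1 (the pathinfo xattr is
# queried on a FUSE mount) and kill the brick process on one of those servers;
# the brick PID is shown by "gluster volume status"
getfattr -n trusted.glusterfs.pathinfo /mnt/fuse/file.1
kill -9 <brick-pid>

# Step 5: while dd is still running, bring the downed brick back online
gluster volume start vol_dis_rep force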

Expected results:
================
ls, cd and creation of mount points should be successful.

Additional info:
==================
root@king [Jul-12-2013-16:22:58] >gluster v status
Status of volume: vol_dis_rep
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick king:/rhs/brick1/b0				49152	Y	30779
Brick hicks:/rhs/brick1/b1				49152	Y	13891
Brick king:/rhs/brick1/b2				49153	Y	30790
Brick hicks:/rhs/brick1/b3				49153	Y	13902
Brick luigi:/rhs/brick1/b4				49152	Y	13436
Brick lizzie:/rhs/brick1/b5				49152	Y	5728
Brick luigi:/rhs/brick1/b6				49153	Y	13447
Brick lizzie:/rhs/brick1/b7				49153	Y	5739
Brick rhs-client11:/rhs/brick1/b8			49152	Y	14738
Brick rhs-client12:/rhs/brick1/b9			49152	Y	14315
Brick rhs-client11:/rhs/brick1/b10			49153	Y	14749
Brick rhs-client12:/rhs/brick1/b11			49153	Y	14326
Brick rhs-client13:/rhs/brick1/b12			49152	Y	14130
Brick rhs-client14:/rhs/brick1/b13			49152	Y	14619
Brick rhs-client13:/rhs/brick1/b14			49153	Y	13835
Brick rhs-client14:/rhs/brick1/b15			49153	Y	14630
NFS Server on localhost					2049	Y	31070
Self-heal Daemon on localhost				N/A	Y	31077
NFS Server on hicks					2049	Y	14143
Self-heal Daemon on hicks				N/A	Y	14150
NFS Server on rhs-client13				2049	Y	14142
Self-heal Daemon on rhs-client13			N/A	Y	14151
NFS Server on rhs-client12				2049	Y	14543
Self-heal Daemon on rhs-client12			N/A	Y	14552
NFS Server on rhs-client14				2049	Y	14893
Self-heal Daemon on rhs-client14			N/A	Y	14903
NFS Server on rhs-client11				2049	Y	14962
Self-heal Daemon on rhs-client11			N/A	Y	14971
NFS Server on lizzie					2049	Y	5969
Self-heal Daemon on lizzie				N/A	Y	5976
NFS Server on luigi					2049	Y	13690
Self-heal Daemon on luigi				N/A	Y	13697
 
There are no active volume tasks
root@king [Jul-12-2013-16:23:00] >
root@king [Jul-12-2013-16:23:01] >gluster v info
 
Volume Name: vol_dis_rep
Type: Distributed-Replicate
Volume ID: afc6c50a-b023-4cfd-8fca-134f0c60bbe3
Status: Started
Number of Bricks: 8 x 2 = 16
Transport-type: tcp
Bricks:
Brick1: king:/rhs/brick1/b0
Brick2: hicks:/rhs/brick1/b1
Brick3: king:/rhs/brick1/b2
Brick4: hicks:/rhs/brick1/b3
Brick5: luigi:/rhs/brick1/b4
Brick6: lizzie:/rhs/brick1/b5
Brick7: luigi:/rhs/brick1/b6
Brick8: lizzie:/rhs/brick1/b7
Brick9: rhs-client11:/rhs/brick1/b8
Brick10: rhs-client12:/rhs/brick1/b9
Brick11: rhs-client11:/rhs/brick1/b10
Brick12: rhs-client12:/rhs/brick1/b11
Brick13: rhs-client13:/rhs/brick1/b12
Brick14: rhs-client14:/rhs/brick1/b13
Brick15: rhs-client13:/rhs/brick1/b14
Brick16: rhs-client14:/rhs/brick1/b15
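
Once the brick comes back online, the self-heal daemons listed above are expected to resync file.1; a hedged sketch of the standard CLI checks for heal state on this volume (no heal output was captured in this report):

gluster volume heal vol_dis_rep info              # entries still pending self-heal
gluster volume heal vol_dis_rep info split-brain  # entries, if any, in split-brain
gluster volume status vol_dis_rep                 # bricks and self-heal daemons back online?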
Comment 1 spandura 2013-07-12 07:00:38 EDT
The NFS mounts from client1 and client2 were made to the host "king".
Comment 3 spandura 2013-07-12 07:41:21 EDT
dd on the FUSE and NFS mounts eventually failed:

Client1 fuse mount:
===================
root@flea [Jul-12-2013-15:16:47] >dd if=/dev/urandom of=./file.1 bs=1M count=20480
dd: writing `./file.1': Transport endpoint is not connected
dd: closing output file `./file.1': Transport endpoint is not connected
root@flea [Jul-12-2013-16:30:52] >pwd
root@flea [Jul-12-2013-17:00:43] >ls -lh file.1
-rw-r--r-- 1 root root 554M Jul 12 17:00 file.1

client1 nfs mount:
===================
root@darrel [Jul-12-2013-15:16:38] >dd if=/dev/urandom of=./file.1 bs=1M count=20480
dd: writing `./file.1': Input/output error
554+0 records in
553+0 records out
579862528 bytes (580 MB) copied, 5147.58 s, 113 kB/s
root@darrel [Jul-12-2013-16:42:28] >

client2 fuse mount:
=====================
root@darrel [Jul-12-2013-15:16:38] >dd if=/dev/urandom of=./file.1 bs=1M count=20480
20480+0 records in
20480+0 records out
21474836480 bytes (21 GB) copied, 2782.47 s, 7.7 MB/s
root@darrel [Jul-12-2013-16:03:03] >
root@darrel [Jul-12-2013-16:16:20] >
root@darrel [Jul-12-2013-16:16:20] >
root@darrel [Jul-12-2013-16:16:20] >ls -lh ./file.1
-rw-r--r--. 1 root root 554M Jul 12 17:01 ./file.1

client2 nfs mount:
===================
root@darrel [Jul-12-2013-15:16:38] >dd if=/dev/urandom of=./file.1 bs=1M count=20480
dd: writing `./file.1': Input/output error
554+0 records in
553+0 records out
579862528 bytes (580 MB) copied, 5147.58 s, 113 kB/s
root@darrel [Jul-12-2013-16:42:28] >
root@darrel [Jul-12-2013-16:42:28] >
root@darrel [Jul-12-2013-16:42:28] >
root@darrel [Jul-12-2013-16:42:28] >ls -lh ./file.1
-rw-r--r--. 1 root root 554M Jul 12 17:01 ./file.1
Comment 4 spandura 2013-07-12 07:46:48 EDT
SOS Reports, Statedumps, Fuse mount logs : http://rhsqe-repo.lab.eng.blr.redhat.com/bugs_necessary_info/983939/
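
For re-collecting similar data against a current release, a hedged sketch of the usual commands (the statedump directory and client log file name depend on configuration, so treat the paths as assumptions):

gluster volume statedump vol_dis_rep        # brick statedumps, written to /var/run/gluster by default
gluster volume statedump vol_dis_rep nfs    # statedump of the gluster NFS server process
sosreport                                   # run on each storage node
# FUSE client log, e.g. /var/log/glusterfs/<mount-point-path>.log on the client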
Comment 6 Vivek Agarwal 2015-12-03 12:16:43 EST
Thank you for submitting this issue for consideration in Red Hat Gluster Storage. The release against which you requested this review is now End of Life. Please see https://access.redhat.com/support/policy/updates/rhs/

If you can reproduce this bug against a currently maintained version of Red Hat Gluster Storage, please feel free to file a new report against the current release.
