Bug 763737 (GLUSTER-2005) - Mounting Gluster volume with RO bricks hangs
Summary: Mounting Gluster volume with RO bricks hangs
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: GLUSTER-2005
Product: GlusterFS
Classification: Community
Component: core
Version: 3.1.0
Hardware: All
OS: All
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Amar Tumballi
QA Contact:
URL:
Whiteboard:
Duplicates: GLUSTER-1905
Depends On:
Blocks:
 
Reported: 2010-10-22 21:37 UTC by Jacob Shucart
Modified: 2015-12-01 16:45 UTC
CC List: 6 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Regression: ---
Mount Type: fuse
Documentation: DA
CRM:
Verified Versions:
Embargoed:



Description Jacob Shucart 2010-10-22 21:37:00 UTC
I created LVM snapshots on 4 different systems and then created a distribute volume consisting of those read-only bricks (hostname:/tmp/distribute.snapshot/ was the brick name on each host); distribute-snap was the volume name.  I started the volume.

I then tried to mount it on a client system, and operations on the mount hang, for example when I go to do a df -h.  I have to forcibly unmount the volume.

mount -t glusterfs jacobgfs31:/distribute-snap /mnt

If I then look on the Gluster server I was mounting from, I see that the glusterfs process that was running the distribute-snap volume is no longer running.
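
A minimal reproduction sketch of the setup above (the volume group, logical volume, snapshot size, and host names below are placeholders, not taken from the report; only the final client mount command is the one quoted above):

# on each of the four servers: take an LVM snapshot and mount it read-only
lvcreate -s -n distribute.snapshot -L 1G /dev/vg0/data
mount -o ro /dev/vg0/distribute.snapshot /tmp/distribute.snapshot

# on one server: create and start the distribute volume from those bricks
gluster volume create distribute-snap host1:/tmp/distribute.snapshot host2:/tmp/distribute.snapshot \
    host3:/tmp/distribute.snapshot host4:/tmp/distribute.snapshot
gluster volume start distribute-snap

# on the client: the mount command returns, but subsequent operations (e.g. df -h) hang
mount -t glusterfs jacobgfs31:/distribute-snap /mnt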

Comment 1 Amar Tumballi 2010-11-08 07:13:02 UTC
Yes. The server process will exit because it needs 'extended attribute' support from the backend bricks. If the backend bricks are in RO mode, glusterfs fails to start or behave properly. Hence this looks like a mount-hang scenario, because the volume never actually starts.

As of now, GlusterFS doesn't support a read-only backend. We will soon be addressing the mount command's behavior when the volume is not started, so users are not left with a hung mount point.
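
One way to see the underlying failure described here, roughly the kind of extended-attribute write the brick process attempts at startup (the attribute name and value below are only an illustrative test write; the path is the brick from this report):

# on a server, try writing a trusted xattr on the read-only brick directory
setfattr -n trusted.glusterfs.test -v working /tmp/distribute.snapshot
# on a read-only filesystem this fails with an error along the lines of:
# setfattr: /tmp/distribute.snapshot: Read-only file system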

Comment 2 Jacob Shucart 2010-11-08 13:28:29 UTC
If we don't support this, then we should give an error if someone tries to set things up in this way...

Comment 3 Jacob Shucart 2010-12-08 19:48:16 UTC
Will this be fixed in 3.1.2?  I had a customer (Kaltura) report an issue where one of their bricks became read-only and it hung the entire volume.  This should not happen...

Comment 4 Amar Tumballi 2011-01-21 07:41:36 UTC
*** Bug 1905 has been marked as a duplicate of this bug. ***

Comment 5 Anand Avati 2011-02-22 14:21:51 UTC
PATCH: http://patches.gluster.com/patch/6233 in master (send the CHILD_DOWN event also to fuse)

Comment 6 Amar Tumballi 2011-02-24 08:42:39 UTC
By sending the CHILD_DOWN event to fuse, in the cases where some (or all) of the server processes (glusterfsd) have not started, the mount point doesn't hang; it continues to work with the glusterfsd processes that are available (and if none are available, as in the case here, it will complain 'Transport endpoint is not connected').

It's marked for DP because we have to state in the FAQ (or similar pages) that if a user gets the 'Transport endpoint is not connected' error, (s)he should check whether all the glusterfsd processes are running fine.
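
A quick check along the lines suggested above, run on each server (the volume name is the one from this report):

# list the running brick processes
ps -C glusterfsd -o pid,args
# confirm the volume is still marked as Started
gluster volume info distribute-snap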

Comment 7 Saurabh 2011-03-10 08:17:12 UTC
when one glusterfsd process is killed,


[root@centos-qa-3 nfs-test]# ls
file.1   file.11  file.14  file.18  file.2   file.3   file.4  file.6  file.8  read.fsxgood
file.10  file.13  file.15  file.19  file.20  file.30  file.5  file.7  read


when the second is also killed,

[root@centos-qa-3 nfs-test]# echo test >> file.30
[root@centos-qa-3 nfs-test]# ls
file.1   file.11  file.14  file.18  file.2   file.3   file.4  file.6  file.8  read.fsxgood
file.10  file.13  file.15  file.19  file.20  file.30  file.5  file.7  read


when the third is killed, specifically once both the glusterfsd processes on one brick are killed,


[root@centos-qa-3 nfs-test]# ls
file.1   file.11  file.14  file.18  file.2   file.3   file.4  file.6  file.8  read.fsxgood
file.10  file.13  file.15  file.19  file.20  file.30  file.5  file.7  read
[root@centos-qa-3 nfs-test]# echo test1 >> file30
-bash: file30: Transport endpoint is not connected

when all the processes are killed,

[root@centos-qa-3 nfs-test]# ls
ls: .: Transport endpoint is not connected

