Bug 765501 (GLUSTER-3769) - File system can mount without error when half the files are missing
Summary: File system can mount without error when half the files are missing
Keywords:
Status: CLOSED WONTFIX
Alias: GLUSTER-3769
Product: GlusterFS
Classification: Community
Component: fuse
Version: pre-release
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Rajesh
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2011-11-01 21:01 UTC by purpleidea
Modified: 2013-07-04 22:43 UTC
CC List: 6 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-10-11 11:12:23 UTC
Regression: ---
Mount Type: ---
Documentation: DP
CRM:
Verified Versions:
Embargoed:



Description purpleidea 2011-11-01 21:01:33 UTC
While testing a four-node distribute-replicate setup, I noticed the following.

Given:
* nodes 1 and 3 form one RAID 1 pair (replicate),
* nodes 2 and 4 form another RAID 1 pair (replicate),
* these two pairs together form a RAID 0 (distribute).
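
For reference, a volume with this layout might be created along these lines (hostnames and brick paths are hypothetical). With "replica 2", consecutive bricks are paired into replica sets, so nodes 1/3 and nodes 2/4 each form a mirror:

    # Two replica pairs, distributed over: {node1,node3} and {node2,node4}
    gluster volume create testvol replica 2 \
        node1:/export/brick node3:/export/brick \
        node2:/export/brick node4:/export/brick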

If nodes 2 and 4 are down, a client is still able to mount the file system; however, roughly half the files will be missing. This can be a problem if data processing depends on knowing whether a file exists. It would be convenient to have a mount option such as "dont_mount_unless_fs_is_complete", so that mounts would fail if half or a third of the pool is down.

Thank you!

Note that semiosis on IRC suggested that a "Transport endpoint not connected" error on read could optionally be generated as well.

Comment 1 Rajesh 2011-12-14 05:41:19 UTC
* This problem can be solved at mount time by checking the status of the volume (with "gluster volume status <volname>") before mounting it. This command shows whether or not the bricks of a volume are online, so it tells us whether all files will be available after mounting the volume (a scripted sketch of this check follows the list).

* If the application depends on the availability of files, checking whether all subvolumes (of distribute) are online *only at mount time* is not enough: the bricks can go down at any time after the volume is mounted. Avoiding a single point of failure is an important feature of GlusterFS; as such, failing the mount itself because some of the files are unavailable, IMO, doesn't fit the idea of a clustered filesystem.
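
A minimal scripted sketch of the check from the first point (volume name, server, mount point, and the exact status-output parsing are assumptions):

    #!/bin/sh
    # Hypothetical pre-mount check: refuse to mount unless every brick of
    # the volume is online. VOLNAME, SERVER and MOUNTPOINT are placeholders,
    # and the awk test assumes the Online column (Y/N) is the second-to-last
    # field of each "Brick" line in "gluster volume status" output.
    VOLNAME=testvol
    SERVER=server1
    MOUNTPOINT=/mnt/testvol

    if gluster volume status "$VOLNAME" |
           awk '$1 == "Brick" && $(NF-1) == "N" { down = 1 } END { exit !down }'
    then
        echo "some bricks of $VOLNAME are offline; not mounting" >&2
        exit 1
    fi

    mount -t glusterfs "$SERVER:/$VOLNAME" "$MOUNTPOINT"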

Any thoughts?

Comment 2 purpleidea 2011-12-14 08:06:30 UTC
I completely agree with your second point in particular. There was discussion about this on IRC a while back...

Perhaps I'm wrong, but I think that if the filesystem is mounted and enough nodes go down that it isn't possible to serve the entire volume (as described in my example), then a client error *should* occur, similar to when a regular NFS server goes down.

It's not a single point of failure if you've built a cluster with enough separate nodes. However, if enough of them go down, eventually you can't offer up the mountpoint.

If I can help in some way, please let me know.

James

Comment 3 Amar Tumballi 2012-02-28 08:39:48 UTC
James, distribute has an option to fail the mount if some subvolume is not up at mount time.

Use the glusterfs command to mount with '--xlator-option *dht*.assert-no-child-down=yes' and your expectations will be met. Do let us know if that is sufficient. We can think of adding a mount option in mount.glusterfs to make use of this if required.
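
For example, something like this (server, volume name, and mount point are placeholders):

    # Fail the mount if any distribute (dht) subvolume is down at mount time.
    glusterfs --volfile-server=server1 --volfile-id=testvol \
        --xlator-option '*dht*.assert-no-child-down=yes' \
        /mnt/testvol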

Comment 4 purpleidea 2012-03-31 06:30:05 UTC
Amar, et al.: I'll be able to test this in the next month, since I don't have test machines available at the moment (but they are arriving soon). Thanks for the help; I'll be back with feedback shortly!

Comment 5 Vidya Sakar 2012-08-15 09:55:47 UTC
Sounds like a useful option to document. Adding the DP flag.

