While testing a four-node distributed-replicated setup I noticed the following. Given:

* nodes 1 and 3 form one raid 1 (replicate),
* nodes 2 and 4 form another raid 1,
* these two groups together form a raid 0 (distribute).

If nodes 2 and 4 are down, a client is still able to mount the file system, but roughly half the files will be missing. This can be a problem if data processing depends on knowing whether a file exists. It would be convenient to have a mount option such as "dont_mount_unless_fs_is_complete", so that the mount would fail if half or a third of the pool is down. Thank you! Note that semiosis on IRC suggested that a "Transport endpoint is not connected" error on read could optionally be generated too.
* This problem can be solved at mount time by checking the status of the volume (with "gluster volume status <volname>") before mounting it. This command shows whether or not the bricks of a volume are online, which tells us whether all files will be available after mounting the volume.

* If the application depends on the availability of files, checking whether all subvolumes (of distribute) are online *only at mount time* is not enough: the bricks can go down at any time after the volume is mounted. Avoiding a single point of failure is an important feature of GlusterFS; as such, failing the mount itself because some of the files are unavailable doesn't, IMO, fit the idea of a clustered filesystem. Any thoughts?
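A minimal sketch of the pre-mount check described above. The column layout is an assumption: it expects brick lines in the "gluster volume status" output to start with "Brick" and to carry the Online flag (Y/N) as the second-to-last field, which can differ between GlusterFS releases, so adjust the awk fields for your version.

```shell
#!/bin/sh
# all_bricks_online: reads "gluster volume status <volname>" output on stdin
# and exits non-zero if any brick line reports Online = N.
# Assumed layout: "Brick <host:/path> <port> ... <Y|N> <pid>", i.e. the
# Online column is the second-to-last field on lines starting with "Brick".
all_bricks_online() {
    awk '/^Brick/ { if ($(NF-1) != "Y") bad = 1 } END { exit bad }'
}

# Hypothetical usage (requires a real gluster CLI; names are placeholders):
#   gluster volume status myvol | all_bricks_online \
#       && mount -t glusterfs server1:/myvol /mnt/myvol \
#       || echo "refusing to mount: some bricks are offline" >&2
```

As the second point above notes, this only guards the moment of mounting; bricks that fail afterwards are not caught by it.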
I completely agree with your second point in particular. There was discussion about this on IRC a while back... Perhaps I'm wrong, but I think that if the filesystem is mounted and enough nodes go down that it isn't possible to serve the entire volume (as described in my example), then a client error *should* occur, similar to when a regular NFS server goes down. It's not a single point of failure if you've built a cluster with enough separate nodes. However, if enough of them go down, eventually you can't offer up the mountpoint. If I can help in some way, please let me know. James
James, there is an option for distribute that makes the mount fail if any subvolume is down at mount time. Use the glusterfs command to mount with '--xlator-option *dht*.assert-no-child-down=yes' and your expectations will be met. Do let us know if that is sufficient. We can think of adding a mount option in mount.glusterfs to utilize this if required.
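For reference, the full invocation would look roughly like this (server, volume, and mount-point names are placeholders; note the quoting so the shell doesn't glob the *):

```shell
glusterfs --xlator-option '*dht*.assert-no-child-down=yes' \
          --volfile-server=server1 --volfile-id=myvol /mnt/myvol
```

With the option set, the mount itself should fail when any distribute subvolume is down, instead of succeeding with part of the namespace missing.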
Amar, et al. I'll be able to test this in the next month since I don't have test machines available at the moment (but they are arriving soon)... Thanks for the help, I'll be back with feedback shortly!
Sounds like a useful option to document. Adding the DP flag.