I set up a two-server scenario with "volume create shared replica 2 transport tcp 192.168.99.9:/mnt/glusterfs/shared 192.168.99.10:/mnt/glusterfs/shared" and started the volume, then mounted it on both servers. Forcing the ethernet connection between the two servers down results in no response from the local gluster mount. As an example: I started a "dd" on the first machine, brought the ethernet link down, and then tried to "ls" on the mountpoint or to run "df -h" on both servers. All of this results in a stalled shell that does not respond to anything (including Ctrl-C). Cancelling the "dd" job doesn't work either. Killing all glusterfs processes and unmounting the mountpoint brings the system back to normal (all shells respond again). Mounting the volume again without re-enabling the ethernet link then works, and the local mount is accessible.
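For anyone reproducing this, here is a minimal sketch of the steps (server addresses, brick paths and the volume name "shared" are taken from above; the client mount point /mnt/shared and the interface name eth1 are assumptions on my side):

# on one server: create and start the replica-2 volume
gluster volume create shared replica 2 transport tcp \
    192.168.99.9:/mnt/glusterfs/shared 192.168.99.10:/mnt/glusterfs/shared
gluster volume start shared

# on both servers: mount the volume locally via the FUSE client
mkdir -p /mnt/shared
mount -t glusterfs 192.168.99.9:/shared /mnt/shared

# on the first server: start some I/O, then take the inter-server link down
dd if=/dev/zero of=/mnt/shared/testfile bs=1M count=1024 &
ip link set eth1 down    # eth1 = the link between the two servers (assumed name)

# on either server: these now hang and do not react to Ctrl-C
ls /mnt/shared
df -h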
A friend of mine tried this with four machines:

Volume Name: vstore
Type: Replicate
Status: Started
Number of Bricks: 4
Transport-type: tcp
Bricks:
Brick1: 192.168.123.123:/srv/export
Brick2: 192.168.123.124:/srv/export
Brick3: 192.168.123.125:/srv/export
Brick4: 192.168.123.126:/srv/export

Setting up such a scenario and disconnecting the ethernet never runs into the same problem I have, so we think this might be a quorum problem.
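If it helps with reproducing, the four-brick volume above should correspond to a create command roughly like the following (the replica count of 4 is my assumption based on the "Replicate" type and the brick count):

gluster volume create vstore replica 4 transport tcp \
    192.168.123.123:/srv/export 192.168.123.124:/srv/export \
    192.168.123.125:/srv/export 192.168.123.126:/srv/export
gluster volume start vstore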
Hi, GlusterFS uses IP addresses. If all the interfaces are down, the mount point cannot reach the bricks (even a local brick, since it does not use the 127.0.0.1 address). Since all the bricks go down, the ops fail. As for the hang, fixes in 3.1.1 have taken care of it: after a 42-second timeout the ops terminate. Once the network is back up and the bricks are reachable, the mount point becomes active again; you do not have to restart the servers/bricks. In the four-server case, the mount still had access to at least one of the bricks, so the ops succeeded there. With regards, Shishir
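For what it's worth, the 42-second timeout Shishir mentions appears to correspond to the client-side ping timeout, which can be inspected and tuned per volume; network.ping-timeout is my assumption of the relevant option name, so please double-check against "gluster volume set help":

# show the volume and any options that have been set
gluster volume info shared

# lower the ping timeout for the "shared" volume (default is 42 seconds)
gluster volume set shared network.ping-timeout 10

Lowering the timeout shortens the hang when a brick disappears, at the cost of more spurious disconnects on a flaky network.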
Hi, sorry for the late answer. Thanks for your explanation. So, as I understand it, if a 2-node setup loses interface connectivity to the other node, it takes 42 seconds until the mount point becomes active/responsive again? I ask because I want to use GlusterFS to store the images of my KVM virtual machines, maildirs and webhosting data on these two machines, with plans to add more machines holding the same content later. If GlusterFS goes down, at the moment that means a completely unresponsive mount point, virtual machines that hang or maybe crash, IMAP servers that can't deliver mailbox content, and so on. Regards, Andreas
Hi Andreas, we recently fixed a similar bug (bug 763737). Can you try the same experiment with the latest git head (now available at https://github.com/gluster/glusterfs), or wait a few more days for a QA release that contains these fixes? Regards, Amar
This works fine for us at the moment with the 3.1.3qa2 release. Please check whether it fixes the issue for you too.