Description of problem:
After creating a replica volume, the self-heal daemon (SHD) starts properly on both nodes. However, if gluster is restarted on one of the nodes, the SHD does not start back up on that node. This behaviour worked properly in 3.3, so this appears to be a regression.

Version-Release number of selected component (if applicable):
3.4 alpha3, as well as a git clone from fc39ee2ea3a22704ebacd0607cf6fd4eae9ec66a

How reproducible:
1. Set up glusterfs on two nodes.
2. Start glusterd with the provided init script (extras/init.d/glusterd-Redhat).
3. Create a replica volume between the two nodes (e.g. gluster volume create ssd0 replica 2 transport tcp node{1,3}:/mnt/raid).
4. Start the volume.

After this, all services should be online:

[root@node1 glusterfs]# gluster volume status; gluster volume info ssd0
Status of volume: ssd0
Gluster process                                          Port    Online  Pid
------------------------------------------------------------------------------
Brick node1:/mnt/raid                                    49152   Y       24185
Brick node3:/mnt/raid                                    49152   Y       5771
NFS Server on localhost                                  2049    Y       24197
Self-heal Daemon on localhost                            N/A     Y       24201
NFS Server on 0d8f2efc-4dc3-4446-bf7b-d6ec76c6038b       2049    Y       5783
Self-heal Daemon on 0d8f2efc-4dc3-4446-bf7b-d6ec76c6038b N/A     Y       5787

5. Stop glusterd on one node, and start it back up.
6. The self-heal daemon will not be running.
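For convenience, the steps above can be consolidated into a short shell sketch. The hostnames (node1/node3), volume name (ssd0), and brick path (/mnt/raid) are taken from the report; the peer probe step is assumed, since the report starts from an already-formed two-node cluster.

```shell
# On both nodes: start glusterd via the packaged init script
/etc/init.d/glusterd start

# On node1: form the two-node cluster and create the replica volume
gluster peer probe node3
gluster volume create ssd0 replica 2 transport tcp node{1,3}:/mnt/raid
gluster volume start ssd0

# All daemons (bricks, NFS server, self-heal daemon) should show Online = Y
gluster volume status ssd0

# Trigger the bug: restart glusterd on one node, then check status again;
# the Self-heal Daemon and NFS Server on that node stay offline
/etc/init.d/glusterd stop && /etc/init.d/glusterd start
gluster volume status ssd0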
Actual results:
The self-heal daemon does not start up:

[root@node1 glusterfs]# /etc/init.d/glusterd restart
Starting glusterd:                                         [  OK  ]
[root@node1 glusterfs]# gluster volume status
Status of volume: ssd0
Gluster process                                          Port    Online  Pid
------------------------------------------------------------------------------
Brick node1:/mnt/raid                                    49152   Y       24185
Brick node3:/mnt/raid                                    49152   Y       5771
NFS Server on localhost                                  N/A     N       N/A
Self-heal Daemon on localhost                            N/A     N       N/A
NFS Server on 0d8f2efc-4dc3-4446-bf7b-d6ec76c6038b       2049    Y       5783
Self-heal Daemon on 0d8f2efc-4dc3-4446-bf7b-d6ec76c6038b N/A     Y       5787

/usr/local/var/log/glustershd.log:

[2013-05-06 16:19:02.366355] W [socket.c:515:__socket_rwv] 0-ssd0-client-0: readv on 10.0.0.1:24007 failed (No data available)
[2013-05-06 16:19:05.370388] W [socket.c:515:__socket_rwv] 0-ssd0-client-0: readv on 10.0.0.1:24007 failed (No data available)
[2013-05-06 16:19:08.374314] W [socket.c:515:__socket_rwv] 0-ssd0-client-0: readv on 10.0.0.1:24007 failed (No data available)
[2013-05-06 16:19:11.378341] W [socket.c:515:__socket_rwv] 0-ssd0-client-0: readv on 10.0.0.1:24007 failed (No data available)
[2013-05-06 16:19:14.382406] W [socket.c:515:__socket_rwv] 0-ssd0-client-0: readv on 10.0.0.1:24007 failed (No data available)
[2013-05-06 16:19:17.386442] W [socket.c:515:__socket_rwv] 0-ssd0-client-0: readv on 10.0.0.1:24007 failed (No data available)
[2013-05-06 16:19:20.390358] W [socket.c:515:__socket_rwv] 0-ssd0-client-0: readv on 10.0.0.1:24007 failed (No data available)
[2013-05-06 16:19:23.394397] W [socket.c:515:__socket_rwv] 0-ssd0-client-0: readv on 10.0.0.1:24007 failed (No data available)
[2013-05-06 16:19:26.398454] W [socket.c:515:__socket_rwv] 0-ssd0-client-0: readv on 10.0.0.1:24007 failed (No data available)
[2013-05-06 16:19:29.402534] W [socket.c:515:__socket_rwv] 0-ssd0-client-0: readv on 10.0.0.1:24007 failed (No data available)
[2013-05-06 16:19:32.406468] W [socket.c:515:__socket_rwv] 0-ssd0-client-0: readv on 10.0.0.1:24007 failed (No data available)
[2013-05-06 16:19:35.410532] W [socket.c:515:__socket_rwv] 0-ssd0-client-0: readv on 10.0.0.1:24007 failed (No data available)
[2013-05-06 16:19:38.414620] W [socket.c:515:__socket_rwv] 0-ssd0-client-0: readv on 10.0.0.1:24007 failed (No data available)
[2013-05-06 16:19:41.419477] I [rpc-clnt.c:1648:rpc_clnt_reconfig] 0-ssd0-client-0: changing port to 49152 (from 0)
[2013-05-06 16:19:41.419525] W [socket.c:515:__socket_rwv] 0-ssd0-client-0: readv on 10.0.0.1:24007 failed (No data available)
[2013-05-06 16:19:41.423595] I [client-handshake.c:1658:select_server_supported_programs] 0-ssd0-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2013-05-06 16:19:41.424026] I [client-handshake.c:1456:client_setvolume_cbk] 0-ssd0-client-0: Connected to 10.0.0.1:49152, attached to remote volume '/mnt/raid'.
[2013-05-06 16:19:41.424043] I [client-handshake.c:1468:client_setvolume_cbk] 0-ssd0-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2013-05-06 16:19:41.424588] I [client-handshake.c:450:client_set_lk_version_cbk] 0-ssd0-client-0: Server lk version = 1
[2013-05-06 16:19:47.555621] W [socket.c:515:__socket_rwv] 0-glusterfs: readv on 127.0.0.1:24007 failed (No data available)
[2013-05-06 16:19:47.555675] W [socket.c:1963:__socket_proto_state_machine] 0-glusterfs: reading from socket failed. Error (No data available), peer (127.0.0.1:24007)
[2013-05-06 16:19:58.425741] I [glusterfsd-mgmt.c:1544:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing

It also appears that the processes don't start up either:

[root@node3 ~]# ps aux | grep gluster
root     19794  5.2  0.0 261260 14072 ?  Ssl  12:21  0:00 /usr/local/sbin/glusterd --pid-file=/var/run/glusterd.pid --log-level DEBUG
root     19819  0.0  0.0 408160 16792 ?  Ssl  12:21  0:00 /usr/local/sbin/glusterfsd -s node3 --volfile-id ssd0.node3.mnt-raid -p /var/lib/glusterd/vols/ssd0/run/node3-mnt-raid.pid -S /var/run/533e299ff1f7017262a9657a16c819ca.socket --brick-name /mnt/raid -l /usr/local/var/log/glusterfs/bricks/mnt-raid.log --xlator-option *-posix.glusterd-uuid=dc1a5ac1-e502-42d8-be71-256aa771f7e3 --brick-port 49152 --xlator-option ssd0-server.listen-port=49152

Expected results:
The SHD should start on the node so that the volume can be repaired on that node.
Hey! I've got the same issue on Ubuntu 12.10 with gluster 3.4 beta. To work around this, I tried creating another volume. After starting it, all daemons (NFS and SHD) on all volumes came back to life. I hope it will be solved soon.
Oh, and I wonder whether a simple stop/start of the volume wouldn't solve it too.
It does look like stopping/starting the volume restarts the daemons properly. However, we were looking to use replication so that we can keep access to the data if one node fails, so stopping/starting the volume isn't really an option for us.
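For anyone who can tolerate a brief outage, the stop/start workaround discussed above would look roughly like this (volume name taken from the report; `--mode=script` is the gluster CLI flag that suppresses the interactive confirmation prompt):

```shell
# Stop and immediately restart the volume; note that clients
# lose access to the data while the volume is stopped
gluster --mode=script volume stop ssd0
gluster volume start ssd0

# The NFS server and self-heal daemon should now be listed as
# online again on every node
gluster volume status ssd0
```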
(In reply to comment #3)
> It does look like starting/stopping the volume restarts the daemons
> properly, however, we were looking to use replication so that we can keep
> access to the data if one node fails. So, starting/stopping the volume isn't
> really an option for us.

The other thing I noticed is that the error may only be in the status output. What I mean is that healing still works, or at least here is how it worked for me: even though the self-heal daemon and NFS were reported as stopped by gluster volume status, when I turned off one server, added files on the first, and then started the second again, the files replicated as soon as I entered the new directory (I didn't wait 10 minutes). I don't know whether automatic self-healing will work (you need to test it), but you can test it by mounting the volume over NFS on the client side; if that works, it means only the status information is wrong.
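The check suggested above can be sketched as follows. The client hostname, the /mnt/test mount point, and the test filename are hypothetical; gluster-nfs serves NFSv3 over TCP, hence the mount options.

```shell
# On a client: mount the volume over NFS (gluster-nfs speaks NFSv3/TCP)
mkdir -p /mnt/test
mount -t nfs -o vers=3,tcp node1:/ssd0 /mnt/test

# With one server down, write a file through the mount
echo "heal-test" > /mnt/test/heal-test.txt

# Bring the second server back, then check whether the file was
# replicated onto its brick, e.g. on node3:
#   ls -l /mnt/raid/heal-test.txt
```

If the file shows up on the returned node's brick, healing is happening despite the "N" in gluster volume status.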
The problem being observed is specific to a 2-node setup [1]. This is because of the following:

1) glusterd gained server-side quorum in the 3.4 release. For further details on how it works, see: http://www.gluster.org/community/documentation/index.php/Features/Server-quorum

2) Recent changes deferred the restarting of gluster daemons such as glustershd and gluster-nfs. See the comments in http://review.gluster.org/#/c/4835/4/xlators/mgmt/glusterd/src/glusterd-sm.c for an explanation of why spawning of the daemons was deferred.

As a consequence of 1), the 2-node setup does not meet quorum. The quorum implementation in 3.4-alpha3 evaluates whether quorum is upheld even when quorum is not enabled, and the quorum ratio defaults to >50%. The code responsible for 2) relied on this behaviour to perform the deferred spawning of the internal daemons. Since in a 2-node setup quorum is not met when one node is down, we fail to spawn the gluster-nfs and glustershd processes.

The following two patches have already been sent to address the above issues, which manifest as this bug, and should fix it:
- http://review.gluster.com/#/c/4973/ - makes spawning independent of the quorum implementation
- http://review.gluster.com/#/c/4954/2 - makes the quorum implementation evaluate quorum only if it is explicitly enabled

[1] When there are more than 2 nodes in the cluster, more of them would need to be down to fall below the default >50% quorum setting.
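For reference, server-side quorum in 3.4 is driven by volume options and is off unless explicitly enabled; the sketch below shows the relevant knobs (option names are from the server-quorum feature, volume name from the report, and the ratio value is only illustrative):

```shell
# Enable server-side quorum for a volume (the default type is 'none',
# i.e. quorum is not enforced)
gluster volume set ssd0 cluster.server-quorum-type server

# The quorum ratio is cluster-wide, set on the special volume 'all';
# it defaults to >50% of the peers being up
gluster volume set all cluster.server-quorum-ratio 51%

# Disable server-side quorum again for the volume
gluster volume set ssd0 cluster.server-quorum-type none
```

The bug here was precisely that the daemon-spawning path consulted this quorum logic even when it had never been enabled.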
REVIEW: http://review.gluster.org/4973 (glusterd: Start bricks on glusterd startup, only once) posted (#2) for review on master by Krishnan Parthasarathi (kparthas)
I did some testing this morning after merging the patches you mentioned into the source from GitHub, and the daemons now come up and report their status properly. Thank you for the detailed response; I wasn't aware of the quorum addition. I assume this will make it into the 3.4 release eventually?
COMMIT: http://review.gluster.org/4973 committed in master by Vijay Bellur (vbellur)
------
commit f8d77623ff49ebc60686dcb17978175e861b6634
Author: Krishnan Parthasarathi <kparthas>
Date:   Thu May 9 18:07:59 2013 +0530

    glusterd: Start bricks on glusterd startup, only once

    The restarting of bricks has been deferred until the cluster
    'stabilizes' its view of the volumes. Since glusterd_spawn_daemons
    is executed every time a peer 'joins' the cluster, it may
    inadvertently restart bricks that were taken offline for, say,
    maintenance purposes. This fix avoids that.

    Change-Id: Ic2a0a9657eb95c82d03cf5eb893322cf55c44eba
    BUG: 960190
    Signed-off-by: Krishnan Parthasarathi <kparthas>
    Reviewed-on: http://review.gluster.org/4973
    Reviewed-by: Amar Tumballi <amarts>
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Vijay Bellur <vbellur>
REVIEW: http://review.gluster.org/5022 (glusterd: Start bricks on glusterd startup, only once) posted (#1) for review on release-3.4 by Krishnan Parthasarathi (kparthas)
COMMIT: http://review.gluster.org/5022 committed in release-3.4 by Vijay Bellur (vbellur)
------
commit 764bb0c1e69294a16af22c82a7e788976a0ff797
Author: Krishnan Parthasarathi <kparthas>
Date:   Thu May 9 18:07:59 2013 +0530

    glusterd: Start bricks on glusterd startup, only once

    The restarting of bricks has been deferred until the cluster
    'stabilizes' its view of the volumes. Since glusterd_spawn_daemons
    is executed every time a peer 'joins' the cluster, it may
    inadvertently restart bricks that were taken offline for, say,
    maintenance purposes. This fix avoids that.

    Change-Id: Ic2a0a9657eb95c82d03cf5eb893322cf55c44eba
    BUG: 960190
    Signed-off-by: Krishnan Parthasarathi <kparthas>
    Reviewed-on: http://review.gluster.org/5022
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Vijay Bellur <vbellur>