Bug 960190 - Gluster SHD and NFS do not start
Summary: Gluster SHD and NFS do not start
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: glusterd
Version: pre-release
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: krishnan parthasarathi
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 962431
 
Reported: 2013-05-06 16:24 UTC by pjameson
Modified: 2015-11-03 23:05 UTC
CC: 4 users

Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-07-24 17:15:31 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description pjameson 2013-05-06 16:24:39 UTC
Description of problem:
After creating a replica volume, the self-heal daemon starts properly on both nodes. However, if glusterd is restarted on one of the nodes, the SHD does not start back up on that node. This behaviour worked properly in 3.3, so it appears to be a regression.

Version-Release number of selected component (if applicable): 3.4 alpha3, as well as a git build from commit fc39ee2ea3a22704ebacd0607cf6fd4eae9ec66a


Steps to Reproduce:
1. Set up glusterfs on two nodes.
2. Start glusterd with the provided init script (extras/init.d/glusterd-Redhat).
3. Create a replica volume between the two nodes (e.g. gluster volume create ssd0 replica 2 transport tcp node{1,3}:/mnt/raid).
4. Start the volume. After this, all services should be online:

[root@node1 glusterfs]# gluster volume status; gluster volume info ssd0
Status of volume: ssd0
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick node1:/mnt/raid					49152	Y	24185
Brick node3:/mnt/raid					49152	Y	5771
NFS Server on localhost					2049	Y	24197
Self-heal Daemon on localhost				N/A	Y	24201
NFS Server on 0d8f2efc-4dc3-4446-bf7b-d6ec76c6038b	2049	Y	5783
Self-heal Daemon on 0d8f2efc-4dc3-4446-bf7b-d6ec76c6038b	N/A	Y	5787

5. Stop glusterd on one node, and start it back up.
6. Observe that the self-heal daemon is not running.
  
Actual results:

The self-heal daemon does not start up:
[root@node1 glusterfs]# /etc/init.d/glusterd restart
Starting glusterd:                                         [  OK  ]
[root@node1 glusterfs]# gluster volume status
Status of volume: ssd0
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick node1:/mnt/raid					49152	Y	24185
Brick node3:/mnt/raid					49152	Y	5771
NFS Server on localhost					N/A	N	N/A
Self-heal Daemon on localhost				N/A	N	N/A
NFS Server on 0d8f2efc-4dc3-4446-bf7b-d6ec76c6038b	2049	Y	5783
Self-heal Daemon on 0d8f2efc-4dc3-4446-bf7b-d6ec76c6038b	N/A	Y	5787

/usr/local/var/log/glustershd.log:

[2013-05-06 16:19:02.366355] W [socket.c:515:__socket_rwv] 0-ssd0-client-0: readv on 10.0.0.1:24007 failed (No data available)
[2013-05-06 16:19:05.370388] W [socket.c:515:__socket_rwv] 0-ssd0-client-0: readv on 10.0.0.1:24007 failed (No data available)
[2013-05-06 16:19:08.374314] W [socket.c:515:__socket_rwv] 0-ssd0-client-0: readv on 10.0.0.1:24007 failed (No data available)
[2013-05-06 16:19:11.378341] W [socket.c:515:__socket_rwv] 0-ssd0-client-0: readv on 10.0.0.1:24007 failed (No data available)
[2013-05-06 16:19:14.382406] W [socket.c:515:__socket_rwv] 0-ssd0-client-0: readv on 10.0.0.1:24007 failed (No data available)
[2013-05-06 16:19:17.386442] W [socket.c:515:__socket_rwv] 0-ssd0-client-0: readv on 10.0.0.1:24007 failed (No data available)
[2013-05-06 16:19:20.390358] W [socket.c:515:__socket_rwv] 0-ssd0-client-0: readv on 10.0.0.1:24007 failed (No data available)
[2013-05-06 16:19:23.394397] W [socket.c:515:__socket_rwv] 0-ssd0-client-0: readv on 10.0.0.1:24007 failed (No data available)
[2013-05-06 16:19:26.398454] W [socket.c:515:__socket_rwv] 0-ssd0-client-0: readv on 10.0.0.1:24007 failed (No data available)
[2013-05-06 16:19:29.402534] W [socket.c:515:__socket_rwv] 0-ssd0-client-0: readv on 10.0.0.1:24007 failed (No data available)
[2013-05-06 16:19:32.406468] W [socket.c:515:__socket_rwv] 0-ssd0-client-0: readv on 10.0.0.1:24007 failed (No data available)
[2013-05-06 16:19:35.410532] W [socket.c:515:__socket_rwv] 0-ssd0-client-0: readv on 10.0.0.1:24007 failed (No data available)
[2013-05-06 16:19:38.414620] W [socket.c:515:__socket_rwv] 0-ssd0-client-0: readv on 10.0.0.1:24007 failed (No data available)
[2013-05-06 16:19:41.419477] I [rpc-clnt.c:1648:rpc_clnt_reconfig] 0-ssd0-client-0: changing port to 49152 (from 0)
[2013-05-06 16:19:41.419525] W [socket.c:515:__socket_rwv] 0-ssd0-client-0: readv on 10.0.0.1:24007 failed (No data available)
[2013-05-06 16:19:41.423595] I [client-handshake.c:1658:select_server_supported_programs] 0-ssd0-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2013-05-06 16:19:41.424026] I [client-handshake.c:1456:client_setvolume_cbk] 0-ssd0-client-0: Connected to 10.0.0.1:49152, attached to remote volume '/mnt/raid'.
[2013-05-06 16:19:41.424043] I [client-handshake.c:1468:client_setvolume_cbk] 0-ssd0-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2013-05-06 16:19:41.424588] I [client-handshake.c:450:client_set_lk_version_cbk] 0-ssd0-client-0: Server lk version = 1
[2013-05-06 16:19:47.555621] W [socket.c:515:__socket_rwv] 0-glusterfs: readv on 127.0.0.1:24007 failed (No data available)
[2013-05-06 16:19:47.555675] W [socket.c:1963:__socket_proto_state_machine] 0-glusterfs: reading from socket failed. Error (No data available), peer (127.0.0.1:24007)
[2013-05-06 16:19:58.425741] I [glusterfsd-mgmt.c:1544:mgmt_getspec_cbk] 0-glusterfs: No change in volfile, continuing

The glustershd and gluster-nfs processes also fail to start up; only glusterd and the brick process are running:

[root@node3 ~]# ps aux | grep gluster
root     19794  5.2  0.0 261260 14072 ?        Ssl  12:21   0:00 /usr/local/sbin/glusterd --pid-file=/var/run/glusterd.pid --log-level DEBUG
root     19819  0.0  0.0 408160 16792 ?        Ssl  12:21   0:00 /usr/local/sbin/glusterfsd -s node3 --volfile-id ssd0.node3.mnt-raid -p /var/lib/glusterd/vols/ssd0/run/node3-mnt-raid.pid -S /var/run/533e299ff1f7017262a9657a16c819ca.socket --brick-name /mnt/raid -l /usr/local/var/log/glusterfs/bricks/mnt-raid.log --xlator-option *-posix.glusterd-uuid=dc1a5ac1-e502-42d8-be71-256aa771f7e3 --brick-port 49152 --xlator-option ssd0-server.listen-port=49152



Expected results:
The SHD should start on the node so that the volume can be repaired on that node.

Comment 1 piotrektt 2013-05-10 12:57:35 UTC
Hey! I've got the same issue on Ubuntu 12.10 with gluster 3.4 beta.
To work around this, I just tried creating another volume. After starting it, all daemons (NFS and SHD) on all volumes came back to life.

Hope it will be solved soon.

Comment 2 piotrektt 2013-05-10 13:02:26 UTC
Oh, and I wonder if a simple stop/start of the volume would solve it too.

Comment 3 pjameson 2013-05-10 18:16:57 UTC
It does look like stopping/starting the volume restarts the daemons properly. However, we were looking to use replication so that we can keep access to the data if one node fails, so stopping/starting the volume isn't really an option for us.

Comment 4 piotrektt 2013-05-10 19:51:57 UTC
(In reply to comment #3)
> It does look like starting/stopping the volume restarts the daemons
> properly, however, we were looking to use replication so that we can keep
> access to the data if one node fails. So, starting/stopping the volume isn't
> really an option for us.

The other thing I saw is that this error only shows up in the status info; healing itself works, or at least here is how it worked for me:
Even though the self-heal daemon and NFS were stopped according to gluster volume status, when I turned off one server, added files on the first, and started the second, the files replicated as soon as I entered the newly created directory (I didn't wait 10 minutes). I don't know if automatic self-healing will work (you need to test it), but you can test it by mounting the volume with NFS on the client side; if that works, it means only the status info is wrong.

Comment 5 krishnan parthasarathi 2013-05-14 19:12:44 UTC
The problem being observed is specific to a 2-node setup [1]. This is because of the following:
1) glusterd gained server-side quorum in the 3.4 release. For details on how it works, see: http://www.gluster.org/community/documentation/index.php/Features/Server-quorum

2) Recent changes deferred the restarting of gluster daemons such as glustershd and gluster-nfs. See the comments in http://review.gluster.org/#/c/4835/4/xlators/mgmt/glusterd/src/glusterd-sm.c for an explanation of why the spawning of daemons was deferred.

As a consequence of 1), the 2-node setup does not meet quorum. The quorum implementation in 3.4-alpha3 evaluates whether quorum is upheld even when quorum is not enabled, and the quorum ratio defaults to >50%.
The code responsible for 2) relied on this behaviour to perform the deferred spawning of internal daemons. Since in a 2-node setup quorum is not met when one node is down, we fail to spawn the gluster-nfs and glustershd processes.

The following two patches have already been sent to address the above issues, which manifest as this bug. These should fix this bug.
- http://review.gluster.com/#/c/4973/ - Makes spawning independent of the quorum implementation
- http://review.gluster.com/#/c/4954/2 - Makes the quorum implementation evaluate quorum only if it is explicitly enabled



[1] - When there are more than 2 nodes in the cluster, more nodes would need to be down to get caught by the >50% default quorum setting.
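The arithmetic behind [1] can be sketched as follows. This is a back-of-the-envelope illustration of the ">50% of peers reachable" check described above, not glusterd's actual code; the function name quorum_met is hypothetical.

```shell
# Hedged sketch: how a strict >50% quorum ratio plays out for small clusters.
# quorum_met ACTIVE TOTAL returns success iff ACTIVE peers are strictly more
# than 50% of TOTAL peers.
quorum_met() {
  active=$1
  total=$2
  # integer arithmetic: active/total > 1/2  <=>  active*100 > total*50
  [ $((active * 100)) -gt $((total * 50)) ]
}

# 2-node cluster with one node down: 1 of 2 reachable, 100 > 100 is false.
quorum_met 1 2 && echo "quorum met" || echo "quorum NOT met"   # prints: quorum NOT met

# 3-node cluster with one node down: 2 of 3 reachable, 200 > 150 is true.
quorum_met 2 3 && echo "quorum met" || echo "quorum NOT met"   # prints: quorum met
```

This is why the bug bites exactly the 2-node case: losing a single peer already drops the cluster to 50%, which does not satisfy the strict >50% default ratio.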

Comment 6 Anand Avati 2013-05-14 19:14:30 UTC
REVIEW: http://review.gluster.org/4973 (glusterd: Start bricks on glusterd startup, only once) posted (#2) for review on master by Krishnan Parthasarathi (kparthas)

Comment 7 pjameson 2013-05-15 16:56:40 UTC
I did some testing this morning after merging the patches you mentioned into the source from GitHub, and the daemons now come up and report their status properly. Thank you for the detailed response; I wasn't aware of the quorum addition.
I'm assuming this will make it into the 3.4 release eventually?

Comment 8 Anand Avati 2013-05-16 05:23:05 UTC
COMMIT: http://review.gluster.org/4973 committed in master by Vijay Bellur (vbellur) 
------
commit f8d77623ff49ebc60686dcb17978175e861b6634
Author: Krishnan Parthasarathi <kparthas>
Date:   Thu May 9 18:07:59 2013 +0530

    glusterd: Start bricks on glusterd startup, only once
    
    The restarting of bricks has been deffered until the cluster 'stabilizes'
    itself volumes' view. Since glusterd_spawn_daemons is executed everytime
    a peer 'joins' the cluster, it may inadvertently restart bricks that
    were taken offline for say, maintenance purposes. This fix avoids that.
    
    Change-Id: Ic2a0a9657eb95c82d03cf5eb893322cf55c44eba
    BUG: 960190
    Signed-off-by: Krishnan Parthasarathi <kparthas>
    Reviewed-on: http://review.gluster.org/4973
    Reviewed-by: Amar Tumballi <amarts>
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Vijay Bellur <vbellur>

Comment 9 Anand Avati 2013-05-16 12:27:19 UTC
REVIEW: http://review.gluster.org/5022 (glusterd: Start bricks on glusterd startup, only once) posted (#1) for review on release-3.4 by Krishnan Parthasarathi (kparthas)

Comment 10 Anand Avati 2013-05-17 04:41:45 UTC
COMMIT: http://review.gluster.org/5022 committed in release-3.4 by Vijay Bellur (vbellur) 
------
commit 764bb0c1e69294a16af22c82a7e788976a0ff797
Author: Krishnan Parthasarathi <kparthas>
Date:   Thu May 9 18:07:59 2013 +0530

    glusterd: Start bricks on glusterd startup, only once
    
    The restarting of bricks has been deffered until the cluster 'stabilizes'
    itself volumes' view. Since glusterd_spawn_daemons is executed everytime
    a peer 'joins' the cluster, it may inadvertently restart bricks that
    were taken offline for say, maintenance purposes. This fix avoids that.
    
    Change-Id: Ic2a0a9657eb95c82d03cf5eb893322cf55c44eba
    BUG: 960190
    Signed-off-by: Krishnan Parthasarathi <kparthas>
    Reviewed-on: http://review.gluster.org/5022
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Vijay Bellur <vbellur>

