Description of problem:
Problem reported in detail by Brent Kolasinski here:
http://supercolony.gluster.org/pipermail/gluster-users/2014-June/040541.html

Version-Release number of selected component (if applicable):
glusterfs-3.5 and master branch

How reproducible:
Always

Steps to Reproduce:
1. Create a 1x2 replica volume using 2 nodes
2. NFS mount the volume from a client
3. `pkill gluster` on node 2 (node 1 still serves the files from the volume)
4. `pkill gluster` on node 1
5. Restart glusterd on node 1
6. Try I/O from the NFS mount

Actual results:
I/O fails because glusterd fails to start nfs, shd and sometimes the brick processes, as seen from `gluster volume status`.

Expected results:
glusterd should spawn them.

Additional info:
Issue:
----------------------------------
glusterd_friend_sm ()
{
        quorum_action = _gf_false;
        while (!list_empty (&gd_friend_sm_queue)) {
                /* process friend state-machine events */
                quorum_action = _gf_true;
        }
        if (quorum_action)
                glusterd_spawn_daemons ();
}
----------------------------------
As long as node 2 is down, gd_friend_sm_queue stays empty, and hence glusterd_spawn_daemons never gets called.

While discussing with KP, I was given to understand that the above code was intentionally written so that each glusterd does not start the glusterfsd processes until its friends are also up and running and in sync. We need to come up with a solution which covers the use case given in the bug description.

A workaround is to run `gluster volume start <volname> force` on the node which is up.
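For reference, a minimal sketch of the direction the review below takes: spawn the daemons unconditionally when the peer count is less than 2, since a glusterd with no peers has no friends to sync with and would otherwise wait forever. This is a standalone illustration, not the actual glusterd code; the variables and the helper structure are hypothetical stand-ins.

----------------------------------
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for glusterd internals; names are illustrative. */
static int  peer_count         = 1;    /* this node only; friend is down */
static bool friend_queue_empty = true; /* no friend events to process    */

static void glusterd_spawn_daemons (void)
{
        printf ("spawning nfs, shd and brick processes\n");
}

/* Sketch of the amended state-machine exit path: with fewer than two
 * peers there is nobody to sync with, so spawn the daemons anyway
 * instead of waiting on friend events that will never arrive. */
static void glusterd_friend_sm_sketch (void)
{
        bool quorum_action = false;

        while (!friend_queue_empty) {
                /* process friend events (elided) */
                quorum_action = true;
                friend_queue_empty = true;
        }

        if (quorum_action || peer_count < 2)
                glusterd_spawn_daemons ();
}

int main (void)
{
        glusterd_friend_sm_sketch ();
        return 0;
}
----------------------------------

With only this node in the cluster, the second condition fires and the daemons come up even though the friend queue never produced an event.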
REVIEW: http://review.gluster.org/8034 (glusterd: spawn daemons/processes when peer count less than 2) posted (#1) for review on master by Ravishankar N (ravishankar)
Have a user-configurable timeout. In fact, that is what I was expecting; only after waiting a long time did I realize that wasn't how it worked. I think a timeout value is a good compromise. Maybe something like 5 minutes as a default?
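One way such a timeout could look, sketched in plain C rather than against glusterd's actual timer API. The names, the 5-minute default, and the polling structure are all assumptions for illustration: start a clock when glusterd comes up, and if no friend has connected by the deadline, spawn the daemons anyway.

----------------------------------
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical knob: how long to wait for friends before giving up.
 * A user-configurable volume/glusterd option could override this. */
#define FRIEND_WAIT_TIMEOUT_SECS (5 * 60)

static time_t startup_time;

static void glusterd_spawn_daemons (void)
{
        printf ("spawning nfs, shd and brick processes\n");
}

/* Called periodically, e.g. from an event-loop tick.
 * Returns true once the daemons have been spawned. */
static bool maybe_spawn_on_timeout (bool any_friend_connected)
{
        if (any_friend_connected)
                return false; /* the normal friend-sm path handles it */

        if (time (NULL) - startup_time >= FRIEND_WAIT_TIMEOUT_SECS) {
                glusterd_spawn_daemons ();
                return true;
        }
        return false;
}

int main (void)
{
        startup_time = time (NULL);
        /* ... event loop would call maybe_spawn_on_timeout() each tick ... */
        return 0;
}
----------------------------------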
Closing this, as currently there is no way of determining whether the one node that came up after both nodes went down is the pristine one after all.