Bug 1107649
| Field | Value |
|---|---|
| Summary | glusterd fails to spawn brick, nfs and self-heald processes |
| Product | [Community] GlusterFS |
| Component | glusterd |
| Version | mainline |
| Status | CLOSED WONTFIX |
| Severity | unspecified |
| Priority | unspecified |
| Reporter | Ravishankar N <ravishankar> |
| Assignee | krishnan parthasarathi <kparthas> |
| QA Contact | |
| Docs Contact | |
| CC | alexeyzilber, bkolasinski, gluster-bugs, nsathyan |
| Target Milestone | --- |
| Target Release | --- |
| Hardware | Unspecified |
| OS | Unspecified |
| Whiteboard | |
| Fixed In Version | |
| Doc Type | Bug Fix |
| Doc Text | |
| Story Points | --- |
| Clone Of | |
| Clones | 1112515 (view as bug list) |
| Environment | |
| Last Closed | 2014-07-14 10:30:06 UTC |
| Type | Bug |
| Regression | --- |
| Mount Type | --- |
| Documentation | --- |
| CRM | |
| Verified Versions | |
| Category | --- |
| oVirt Team | --- |
| RHEL 7.3 requirements from Atomic Host | |
| Cloudforms Team | --- |
| Target Upstream Version | |
| Embargoed | |
| Bug Depends On | |
| Bug Blocks | 1112515 |
Description
Ravishankar N
2014-06-10 11:48:01 UTC
Issue:

```
glusterd_friend_sm ()
{
        quorum_action = _gf_false;
        while (!list_empty (&gd_friend_sm_queue)) {
                /* ... process friend state-machine events ... */
                quorum_action = _gf_true;
        }
        if (quorum_action)
                glusterd_spawn_daemons ();
}
```

As long as node 2 is down, gd_friend_sm_queue is empty and hence glusterd_spawn_daemons never gets called. While discussing with KP, I was given to understand that the above code was intentionally written so that each glusterd does not start the glusterfsd processes until its friends are also up, running, and in sync. We need to come up with a solution that covers the use case given in the bug description. A workaround is to run `gluster volume start <volname> force` on the node which is up.

REVIEW: http://review.gluster.org/8034 (glusterd: spawn daemons/processes when peer count less than 2) posted (#1) for review on master by Ravishankar N (ravishankar)

Have a user-configurable timeout. In fact, that was what I was expecting, but after waiting for a long time I realized that wasn't the way it worked. I think a timeout value is a good compromise. Maybe something like 5 minutes as a default?

Closing this, as currently there is no way of determining whether the one node that came up after both nodes went down is the pristine one after all.