Bug 1493967

Summary: glusterd ends up with multiple uuids for the same node
Product: [Community] GlusterFS
Reporter: Atin Mukherjee <amukherj>
Component: glusterd
Assignee: Atin Mukherjee <amukherj>
Status: CLOSED CURRENTRELEASE
QA Contact:
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: mainline
CC: bugs
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: glusterfs-3.13.0
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1495162 (view as bug list)
Environment:
Last Closed: 2017-12-08 17:41:21 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1495162    

Description Atin Mukherjee 2017-09-21 08:48:46 UTC
Description of problem:
In a multi node cluster, if one of the glusterd instances goes down and comes back up, there can be a race where glusterd needs to retrieve its own uuid (glusterd_retrieve_uuid) while, at the same time, as part of handling an incoming friend handshake from another peer, it iterates over the volume information received from the remote node and checks whether each brick is local by calling MY_UUID, which in turn calls glusterd_retrieve_uuid. This can leave glusterd having generated two UUID files in /var/lib/glusterd for the same node. The following log snippet confirms this:
    
[2017-09-01 03:09:24.458030] I [glusterd.c:146:glusterd_uuid_init] 0-management: retrieved UUID: fd46a495-7e33-468f-88f6-63c815fac640  // thread 1 retrieves the uuid from glusterd.info
[2017-09-01 03:09:24.458034] E [glusterd-store.c:2109:glusterd_retrieve_uuid] 0-: No previous uuid is present
// thread 2 cannot retrieve the uuid because thread 1's read has already moved the shared file pointer to EOF.
[2017-09-01 03:09:24.458041] E [glusterd-store.c:2117:glusterd_retrieve_uuid] 0-: Returning -1
[2017-09-01 03:09:24.458076] I [glusterd.c:176:glusterd_uuid_generate_save] 0-management: generated UUID: 190bb292-a296-4125-96da-42b247511cc4
[2017-09-01 03:09:24.458129] E [store.c:367:gf_store_save_value] 0-: Able to store key: UUID,value: 190bb292-a296-4125-96da-42b247511cc4
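
To make the interleaving easier to see, here is a minimal, self-contained sketch of the same failure pattern. It is not the actual glusterd store code; the helpers and the shared FILE * below are illustrative stand-ins for the shared store handle on glusterd.info. Two threads read the UUID from one handle without any locking, so whichever thread reads second starts at EOF, finds nothing, and would go on to generate a fresh UUID:

/* Minimal sketch of the race (hypothetical helpers, not the glusterd store
 * API). Both threads share a single FILE * and therefore a single file
 * offset: once one thread has consumed the UUID line, the other thread's
 * read starts at EOF and it wrongly concludes that no UUID exists yet. */
#include <pthread.h>
#include <stdio.h>

static FILE *info_fp;   /* shared handle on glusterd.info */

/* Returns 0 and fills uuid_str (>= 37 bytes) if a UUID line is found,
 * -1 otherwise -- the "No previous uuid is present" case. */
static int retrieve_uuid_unlocked(char *uuid_str)
{
    char line[256];

    /* No lock and no rewind: readers race on the shared file position. */
    while (fgets(line, sizeof(line), info_fp)) {
        if (sscanf(line, "UUID=%36s", uuid_str) == 1)
            return 0;
    }
    return -1;
}

static void *uuid_worker(void *arg)
{
    char uuid_str[64] = "";

    if (retrieve_uuid_unlocked(uuid_str) != 0)
        printf("thread %ld: no uuid found, would generate a new one\n",
               (long)arg);
    else
        printf("thread %ld: retrieved UUID %s\n", (long)arg, uuid_str);
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;

    info_fp = fopen("glusterd.info", "r");   /* assumed to contain a UUID= line */
    if (!info_fp)
        return 1;

    pthread_create(&t1, NULL, uuid_worker, (void *)1L);
    pthread_create(&t2, NULL, uuid_worker, (void *)2L);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    fclose(info_fp);
    return 0;
}

Because only one reader can consume the UUID= line, the other always takes the "would generate a new one" branch, which mirrors the two-UUID outcome shown in the log above.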


Version-Release number of selected component (if applicable):
mainline

How reproducible:
rarely

Comment 1 Worker Ant 2017-09-21 08:51:28 UTC
REVIEW: https://review.gluster.org/18333 (glusterd: retrieve uuid under mutex lock) posted (#1) for review on master by Atin Mukherjee (amukherj)

Comment 2 Worker Ant 2017-09-22 04:45:57 UTC
REVIEW: https://review.gluster.org/18333 (glusterd: retrieve uuid under mutex lock) posted (#2) for review on master by Atin Mukherjee (amukherj)

Comment 3 Worker Ant 2017-09-25 11:10:44 UTC
COMMIT: https://review.gluster.org/18333 committed in master by Atin Mukherjee (amukherj) 
------
commit 898f0b7ce31ddf8ec02e572c5d22eff2e4205b4c
Author: Atin Mukherjee <amukherj>
Date:   Thu Sep 21 14:05:35 2017 +0530

    glusterd: retrieve uuid under mutex lock
    
    In a multi node cluster, if one of the glusterd instance goes down and
    comes back, then there might be a race situation where glusterd needs to
    retrieve its uuid (glusterd_retrieve_uuid) and at the same time as part of
    receiving a friend handshake from other peer, glusterd iterates over the volume
    information received from remote node and checks whether a brick is local or not
    by calling MY_UUID which in turn calls glusterd_retrieve_uuid. And the
    same applies for glusterd_store_global_info () function too. This
    could end up in a situation where for the same node glusterd ends up
    generating two UUID files in /var/lib/glusterd. Following is the log
    snippet which confirms the above:
    
    [2017-09-01 03:09:24.458030] I [glusterd.c:146:glusterd_uuid_init] 0-management: retrieved UUID: fd46a495-7e33-468f-88f6-63c815fac640  // thread 1 retrieve uuid from glusterd.info
    [2017-09-01 03:09:24.458034] E [glusterd-store.c:2109:glusterd_retrieve_uuid] 0-: No previous uuid is present
    //thread 2 can not retrieve uuid, because in thread1 the file pointer has already become eof.
    [2017-09-01 03:09:24.458041] E [glusterd-store.c:2117:glusterd_retrieve_uuid] 0-: Returning -1
    [2017-09-01 03:09:24.458076] I [glusterd.c:176:glusterd_uuid_generate_save] 0-management: generated UUID: 190bb292-a296-4125-96da-42b247511cc4
    [2017-09-01 03:09:24.458129] E [store.c:367:gf_store_save_value] 0-: Able to store key: UUID,value: 190bb292-a296-4125-96da-42b247511cc4
    
    Fix is to retrieve the uuid under mutex lock.
    
    Credits : cynthia.zhou
    
    Change-Id: Ib9a5e159c3febf2aef13aa5e38f0a51fe409dadb
    BUG: 1493967
    Signed-off-by: Atin Mukherjee <amukherj>
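
For reference, a hedged sketch of the fix pattern the commit names, i.e. serializing the retrieval behind a mutex so only the first caller touches the on-disk store. The function and helper names below (get_node_uuid, read_uuid_from_store, generate_and_save_uuid) are illustrative stand-ins, not the real glusterd symbols:

/* Hedged sketch of "retrieve uuid under mutex lock": two threads can never
 * both conclude "no previous uuid" and generate fresh ones, because the
 * lookup and the fallback generation sit inside one critical section. */
#include <pthread.h>
#include <stdio.h>
#include <string.h>

static pthread_mutex_t uuid_mutex = PTHREAD_MUTEX_INITIALIZER;
static char cached_uuid[64];            /* empty until first resolution */

/* Stand-in for reading UUID=... from glusterd.info; buf must hold >= 37 bytes. */
static int read_uuid_from_store(char *buf)
{
    FILE *fp = fopen("glusterd.info", "r");
    char line[256];
    int ret = -1;

    if (!fp)
        return -1;
    while (fgets(line, sizeof(line), fp)) {
        if (sscanf(line, "UUID=%36s", buf) == 1) {
            ret = 0;
            break;
        }
    }
    fclose(fp);
    return ret;
}

/* Stand-in for generating a new uuid and persisting it back to the store. */
static int generate_and_save_uuid(char *buf, size_t len)
{
    snprintf(buf, len, "newly-generated-uuid-placeholder");
    return 0;
}

/* Every caller funnels through the mutex; only the first one reads the
 * store, so a concurrent friend-handshake path waits here instead of
 * racing the init path into creating a second UUID. */
int get_node_uuid(char *out, size_t len)
{
    int ret = 0;

    pthread_mutex_lock(&uuid_mutex);
    if (cached_uuid[0] == '\0') {
        ret = read_uuid_from_store(cached_uuid);
        if (ret != 0)
            ret = generate_and_save_uuid(cached_uuid, sizeof(cached_uuid));
    }
    if (ret == 0 && len > 0) {
        strncpy(out, cached_uuid, len - 1);
        out[len - 1] = '\0';
    }
    pthread_mutex_unlock(&uuid_mutex);
    return ret;
}

Once the first caller has populated the cached value under the lock, a concurrent friend-handshake path blocks on the mutex instead of re-reading the store, so it can no longer decide that no UUID exists.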

Comment 4 Shyamsundar 2017-12-08 17:41:21 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.13.0, please open a new bug report.

glusterfs-3.13.0 has been announced on the Gluster mailing lists [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/announce/2017-December/000087.html
[2] https://www.gluster.org/pipermail/gluster-users/