Bug 1105543

Summary: [SNAPSHOT]: glusterd hangs when a node with stale snap entries is attached to the cluster
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: senaik
Component: snapshot
Assignee: Avra Sengupta <asengupt>
Status: CLOSED DEFERRED
QA Contact: Anoop <annair>
Severity: medium
Docs Contact:
Priority: high
Version: rhgs-3.0
CC: asengupt, asriram, jbyers, josferna, nsathyan, rhs-bugs, rjoseph, smohan, storage-qa-internal
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: SNAPSHOT
Fixed In Version:
Doc Type: Known Issue
Doc Text:
When a node with stale snapshot entries is attached to the cluster, the stale entries are propagated throughout the cluster, and snapshots that no longer exist are displayed. Workaround: Do not attach a peer that has stale snapshot entries.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-01-29 13:19:21 UTC
Type: Bug
Bug Blocks: 1087818    

Description senaik 2014-06-06 11:30:52 UTC
Description of problem:
=======================
glusterd hangs when a node with stale snap entries is attached to the cluster.


Version-Release number of selected component (if applicable):
=============================================================

glusterfs 3.6.0.13

How reproducible:
================
1/1

Steps to Reproduce:
===================

4-node cluster: Node1, Node2, Node3, Node4

Attach a node (Node5) to the cluster while snapshot creation (snap_vol0_1 .. snap_vol0_n) is in progress.
Check the snapshots on the newly added peer.


On Node4:
---------
After the snapshots are completed, detach the node and attach it again while a new round of snapshot creation (snap1 .. snapn) is starting.


The probe now succeeds, but gluster peer status hangs because glusterd is trying to start the bricks of the stale snap entries (snap_vol0_1), which are not present on Node4.

 E [glusterd-handshake.c:85:get_snap_volname_and_volinfo] 0-management: Failed to fetch snap snap_vol0_1
[2014-06-06 10:49:52.132065] E [glusterd-handshake.c:196:build_volfile_path] 0-management: Failed to get snap volinfo from path (/snaps/snap_vol0_1/a94839bbaf994733a5b591bb59731d9c.snapshot16.lab.eng.blr.redhat.com.var-run-gluster-snaps-a94839bbaf994733a5b591bb59731d9c-brick4-b0)
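To see which snapshot names glusterd is failing to resolve, the error messages above can be filtered out of the glusterd log. The following is only a minimal sketch: the function name stale_snaps_from_log is hypothetical, and it assumes each "Failed to fetch snap" message sits on a single line in the actual log file.

```shell
#!/bin/sh
# Hypothetical helper (not part of gluster): read glusterd log text on stdin
# and print each snapshot name that appears in a
# "Failed to fetch snap <name>" error, once per name.
stale_snaps_from_log() {
    sed -n 's/.*Failed to fetch snap \([^ ]*\).*/\1/p' | sort -u
}
```

It would be used as `stale_snaps_from_log < /path/to/glusterd.log`; the log location depends on the installation and is left as a placeholder here.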


Also, checking gluster peer status on the newly added peer shows the following status:

[root@snapshot01 ~]# gluster peer status
Number of Peers: 1

Hostname: 10.70.40.172
Uuid: 2c797de6-e1e0-4a9a-8729-f5d3ce2de2a1
State: Sent and Received peer request (Connected)

It shows the status of only one node; the other nodes are not listed.
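The reproduction sequence described above could be scripted roughly as follows. This is a sketch under several assumptions, not the reporter's actual procedure: the volume name vol0 is inferred from the snapshot names, the node that is detached and re-attached is taken to be the newly added Node5, the snapshot count is arbitrary, and the helper names (run, create_snaps, reproduce) are hypothetical. By default the script only prints the gluster commands; set RUN=1 to execute them on a test cluster.

```shell
#!/bin/sh
# Assumed values; none of these are stated explicitly in the report.
VOL=${VOL:-vol0}            # volume name inferred from snap_vol0_*
NEW_PEER=${NEW_PEER:-Node5} # node attached during snapshot creation
COUNT=${COUNT:-3}           # number of snapshots per round

# Print the command unless RUN=1, in which case execute it.
run() {
    if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "$@"; fi
}

# Create COUNT snapshots named <prefix>1 .. <prefix>COUNT.
create_snaps() {
    i=1
    while [ "$i" -le "$COUNT" ]; do
        run gluster snapshot create "$1$i" "$VOL"
        i=$((i + 1))
    done
}

reproduce() {
    # Round 1: probe the new node while snapshots are being created.
    create_snaps snap_vol0_ &
    run gluster peer probe "$NEW_PEER"
    wait
    run gluster snapshot list

    # Round 2: detach, then re-attach during a new round of snapshots.
    run gluster peer detach "$NEW_PEER"
    create_snaps snap &
    run gluster peer probe "$NEW_PEER"
    wait
    run gluster peer status   # the report says this call hangs
}
```

Calling reproduce from a node already in the cluster (Node4 in the report) walks through both rounds; because snapshot creation runs in the background, the order of the printed commands is not fixed.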

Actual results:


Expected results:


Additional info:

Comment 8 Shalaka 2014-06-26 15:28:59 UTC
Please review and sign off on the edited doc text.

Comment 12 Avra Sengupta 2016-01-29 13:19:21 UTC
The current Gluster architecture does not support implementing a fix. Therefore, this fix is deferred until Glusterd 2.0.