Bug 1100282

Summary: [SNAPSHOT]: cleanup of stale snap volume doesn't happen after missed restore and gluster CLI commands hang
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Rahul Hinduja <rhinduja>
Component: snapshot
Assignee: Avra Sengupta <asengupt>
Status: CLOSED ERRATA
QA Contact: Rahul Hinduja <rhinduja>
Severity: urgent
Docs Contact:
Priority: high
Version: rhgs-3.0
CC: asengupt, nsathyan, rhs-bugs, senaik, ssamanta, storage-qa-internal, vagarwal
Target Milestone: ---   
Target Release: RHGS 3.0.0   
Hardware: x86_64   
OS: Linux   
Whiteboard: SNAPSHOT
Fixed In Version: glusterfs-3.6.0.16-1.el6rhs
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 1100324
Environment:
Last Closed: 2014-09-22 19:39:09 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1108652, 1109024    
Bug Blocks: 1067342, 1100324    

Description Rahul Hinduja 2014-05-22 12:28:13 UTC
Description of problem:
=======================

When a restore is performed as part of a missed restore (i.e. applied later by nodes that were down during the original restore), the snap volume entry under the snaps directory is not cleared after the restore succeeds. Also, all gluster CLI commands time out.


Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.6.0.5-1.el6rhs.x86_64


How reproducible:
=================
1/1


Steps to Reproduce:
===================

1. Create and start a 2x2 volume across 4 nodes
2. Create a snapshot of the volume
3. Kill glusterd on node2
4. Bring down node4 (power off)
5. Take the volume offline from node1 (gluster volume stop volume)
6. Restore the volume to the snapshot taken in step 2
7. Start glusterd on node2
8. Bring node4 back up
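The steps above can be sketched as an ordered command plan. This is a minimal, hypothetical sketch: the volume name testvol, snapshot name snap1, brick paths, and the use of pkill -x glusterd / service glusterd start are illustrative assumptions, not taken from this report, and the bricks are assumed to sit on thinly provisioned LVs as gluster snapshots require. Powering node4 off and on is left as a manual step. Bricks are listed in node order so that with replica 2 the replica pairs are (node1, node2) and (node3, node4), i.e. each pair loses one member during the restore.

```python
from typing import List, Tuple

# Hypothetical names used only for illustration.
VOLNAME = "testvol"
SNAPNAME = "snap1"
NODES = ["node1", "node2", "node3", "node4"]


def reproduction_plan(volname: str = VOLNAME,
                      snapname: str = SNAPNAME) -> List[Tuple[str, str]]:
    """Return (node, action) pairs mirroring steps 1-8 of the report.

    Actions starting with "MANUAL:" cannot be scripted from the gluster CLI.
    """
    # Step 1: with replica 2, consecutive bricks form a replica pair,
    # so the pairs are (node1, node2) and (node3, node4).
    bricks = " ".join(f"{n}:/bricks/{volname}/b1" for n in NODES)
    return [
        ("node1", f"gluster volume create {volname} replica 2 {bricks}"),
        ("node1", f"gluster volume start {volname}"),
        # Step 2: snapshot the volume.
        ("node1", f"gluster snapshot create {snapname} {volname}"),
        # Step 3: kill only the glusterd process on node2.
        ("node2", "pkill -x glusterd"),
        # Step 4: power off node4.
        ("node4", "MANUAL: power off"),
        # Step 5: stop the volume; --mode=script suppresses the y/n prompt.
        ("node1", f"gluster --mode=script volume stop {volname}"),
        # Step 6: restore while node2's glusterd and node4 are both down.
        ("node1", f"gluster --mode=script snapshot restore {snapname}"),
        # Step 7: restart glusterd on node2.
        ("node2", "service glusterd start"),
        # Step 8: bring node4 back.
        ("node4", "MANUAL: power on"),
    ]
```

Iterating over reproduction_plan() and printing each pair gives a checklist to follow node by node.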

Actual results:
===============

1. The snap volume entry under /var/lib/glusterd/snaps/ on node2 and node4 is still present
2. Every gluster command hangs and eventually times out because glusterd is still trying to complete the handshake
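To tell a hung CLI apart from one that simply fails while reproducing this, each command can be run under a client-side timeout. A minimal sketch using Python's standard subprocess module; the helper name run_with_timeout and the 10-second default are hypothetical choices, not values from this report.

```python
import subprocess
from typing import List


def run_with_timeout(cmd: List[str], timeout: float = 10.0) -> str:
    """Classify a CLI call as "ok", "failed", "hung", or "not found".

    "hung" means the process had not exited after `timeout` seconds;
    subprocess.run kills the child before re-raising TimeoutExpired.
    """
    try:
        result = subprocess.run(cmd, capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return "hung"
    except FileNotFoundError:
        # The binary (e.g. the gluster CLI) is not installed on this host.
        return "not found"
    return "ok" if result.returncode == 0 else "failed"
```

For example, run_with_timeout(["gluster", "volume", "info"], timeout=30) would be expected to return "hung" on an affected node and "ok" once the fix is in place.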


Expected results:
=================
1. The snap volume entry under /var/lib/glusterd/snaps/ on node2 and node4 should be deleted
2. All gluster commands should succeed
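Expected result 1 can be checked on node2 and node4 with a small script. A minimal sketch, assuming each snapshot is stored as a sub-directory of /var/lib/glusterd/snaps/ named after the snapshot, next to regular files such as missed_snaps_list; the helper name stale_snap_entries is hypothetical.

```python
import os
from typing import Iterable, List

# Snapshot store path taken from this report.
SNAPS_DIR = "/var/lib/glusterd/snaps"


def stale_snap_entries(restored_snaps: Iterable[str],
                       snaps_dir: str = SNAPS_DIR) -> List[str]:
    """Return the restored snapshots that still have an entry in snaps_dir.

    Assumes one sub-directory per snapshot, named after the snapshot;
    regular files such as missed_snaps_list are ignored.
    """
    if not os.path.isdir(snaps_dir):
        return []
    present = {
        name for name in os.listdir(snaps_dir)
        if os.path.isdir(os.path.join(snaps_dir, name))
    }
    # After a successful (missed) restore, none of these should remain.
    return sorted(set(restored_snaps) & present)
```

An empty list means no stale entry is left for the given snapshot names; any name returned corresponds to actual result 1.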

Comment 4 Avra Sengupta 2014-05-27 11:56:35 UTC
Fix at https://code.engineering.redhat.com/gerrit/25736

Comment 5 Rahul Hinduja 2014-06-12 12:05:07 UTC
Hit bz 1108652 during verification; marking this bug as dependent on it for verification.

Comment 6 senaik 2014-06-17 07:01:39 UTC
Version: glusterfs 3.6.0.17
=======

Repeated the steps in 'Steps to Reproduce'. The missed snaps list shows the entries for the snap UUID, each ending in 3:2.

Marking the bug 'Verified' 


Node where glusterd was down:
=============================
cat /var/lib/glusterd/snaps/missed_snaps_list | grep cbe565c107bf4a589420730504d1f9f8
ad1244a5-7c43-4812-808d-cae70399fecf:c8c90378-4ec4-41ff-83c9-4d731916c064=cbe565c107bf4a589420730504d1f9f8:2:/var/run/gluster/snaps/cbe565c107bf4a589420730504d1f9f8/brick2/b1:3:2
9f6160b0-a4db-47a5-ab8f-b7f0a328eadc:c8c90378-4ec4-41ff-83c9-4d731916c064=cbe565c107bf4a589420730504d1f9f8:4:/var/run/gluster/snaps/cbe565c107bf4a589420730504d1f9f8/brick4/b1:3:2


Node which was powered off and brought back up:
===============================================
cat /var/lib/glusterd/snaps/missed_snaps_list | grep cbe565c107bf4a589420730504d1f9f8
ad1244a5-7c43-4812-808d-cae70399fecf:c8c90378-4ec4-41ff-83c9-4d731916c064=cbe565c107bf4a589420730504d1f9f8:2:/var/run/gluster/snaps/cbe565c107bf4a589420730504d1f9f8/brick2/b1:3:2
9f6160b0-a4db-47a5-ab8f-b7f0a328eadc:c8c90378-4ec4-41ff-83c9-4d731916c064=cbe565c107bf4a589420730504d1f9f8:4:/var/run/gluster/snaps/cbe565c107bf4a589420730504d1f9f8/brick4/b1:3:2
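The missed_snaps_list entries above can be decoded with a short parser. This is a sketch under assumptions: the field names reflect a reading of the layout `<node-uuid>:<snap-uuid>=<snap-vol-id>:<brick-num>:<brick-path>:<op>:<status>`, and the mapping of the trailing 3:2 to "restore" / "done" is assumed from the glusterd enums, not stated in this report. The helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Assumed meanings of the trailing op/status numbers; verify against
# the glusterd source before relying on them.
ASSUMED_OPS = {1: "create", 2: "delete", 3: "restore"}
ASSUMED_STATUS = {1: "pending", 2: "done"}


@dataclass(frozen=True)
class MissedSnapEntry:
    node_uuid: str
    snap_uuid: str
    snap_vol_id: str
    brick_num: int
    brick_path: str
    op: int
    status: int


def parse_missed_snap_line(line: str) -> MissedSnapEntry:
    """Parse one missed_snaps_list line (field names are assumptions)."""
    key, value = line.strip().split("=", 1)
    node_uuid, snap_uuid = key.split(":", 1)
    snap_vol_id, brick_num, rest = value.split(":", 2)
    # Split from the right so a brick path containing ':' stays intact.
    brick_path, op, status = rest.rsplit(":", 2)
    return MissedSnapEntry(node_uuid, snap_uuid, snap_vol_id,
                           int(brick_num), brick_path, int(op), int(status))


def describe(entry: MissedSnapEntry) -> str:
    """Human-readable summary using the assumed op/status mapping."""
    op = ASSUMED_OPS.get(entry.op, f"op {entry.op}")
    status = ASSUMED_STATUS.get(entry.status, f"status {entry.status}")
    return f"brick {entry.brick_num} ({entry.brick_path}): {op} {status}"


def lists_agree(a: List[str], b: List[str]) -> bool:
    """True if two nodes' missed_snaps_list lines hold the same entries."""
    parsed_a = {parse_missed_snap_line(x) for x in a if x.strip()}
    parsed_b = {parse_missed_snap_line(x) for x in b if x.strip()}
    return parsed_a == parsed_b
```

Under the assumed mapping, both entries shown for each node decode to a restore marked done, and lists_agree on the two nodes' outputs returns True since they are identical.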

Comment 8 errata-xmlrpc 2014-09-22 19:39:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2014-1278.html