1126813 – [SNAPSHOT]:glusterd crash while creating snapshots on a volume , few restore operations were tried on different volume when glusterd was down on one of the nodes

Keywords:
Status:	CLOSED WORKSFORME
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	snapshot
Sub Component:
Version:	rhgs-3.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	urgent
Target Milestone:	---
Target Release:	---
Assignee:	rjoseph
QA Contact:	Anoop
Docs Contact:
URL:
Whiteboard:	SNAPSHOT
Depends On:
Blocks:	1126789

Reported:	2014-08-05 11:09 UTC by senaik
Modified:	2016-09-17 12:58 UTC
CC List:	3 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2016-01-05 09:21:43 UTC
Embargoed:
Dependent Products:

Description senaik 2014-08-05 11:09:20 UTC

Description of problem:
======================
glusterd crashed on 2 nodes while creating a snapshot on a volume.
A couple of restore operations had been tried prior to this on a different volume while glusterd was down on one of the nodes.

Version-Release number of selected component (if applicable):
============================================================
glusterfs-3.6.0.25-1.el6rhs.x86_64

How reproducible:
================
1/1

Steps to Reproduce:
=================
1. Create a 2x2 dist-rep volume and start it. FUSE and NFS mount the volume.

2. Create some files and create a snapshot. Repeat these steps until a few snapshots are created:
[root@inception tmp]# gluster snapshot create snap1_vol2 vol2
snapshot create: success: Snap snap1_vol2 created successfully
[root@inception tmp]# gluster snapshot create snap2_vol2 vol2
snapshot create: success: Snap snap2_vol2 created successfully
[root@inception tmp]# gluster snapshot create snap3_vol2 vol2
snapshot create: success: Snap snap3_vol2 created successfully

3. Stop glusterd on node2 (rhs-arch-srv2).

4. Restore the volume to snap1_vol2:
gluster snap restore snap1_vol2
Snapshot restore: snap1_vol2: Snap restored successfully

Bring glusterd back up on node2.

4. Create another snap on vol2. The snap create is successful.

5. Restore the volume to snap2_vol2:

gluster snap restore snap2_vol2
snapshot restore: failed: Volume (vol2) has been started. Volume needs to be stopped before restoring a snapshot.
Snapshot command failed
[root@inception tmp]# gluster v stop vol2
Stopping volume will make its data inaccessible. Do you want to continue? (y/n) y
volume stop: vol2: success
[root@inception tmp]# gluster snap restore snap2_vol2
Snapshot restore: snap2_vol2: Snap restored successfully

6. Create another snap of the volume (glusterd is still down on node2). It fails with "quorum is not met" (as expected):

gluster snapshot create snap5_vol2 vol2
snapshot create: failed: quorum is not met
Snapshot command failed

Started glusterd on node2 and created another snapshot (from 2 different terminals of the same node):

gluster snapshot create snap6_vol2 vol2
snapshot create: failed: Pre Validation failed on rhs-arch-srv3.lab.eng.blr.redhat.com. Please check log file for details.
Pre Validation failed on rhs-arch-srv4.lab.eng.blr.redhat.com. Please check log file for details.

Further snap create attempts on the volume failed with the same "Pre Validation failed" error.

7. Repeated step 2 to step 4 on a different volume (vol3); snapshot create failed with the error below:

[root@inception tmp]# gluster snapshot create snap4_vol3 vol3
snapshot create: failed: glusterds are not in quorum
Snapshot command failed

glusterd crashed on node3 and node4.
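
The core of steps 2 to 4 above can be sketched as a small script. This is only a rough sketch, not the exact sequence that was run: the ssh access to node2, the use of "service glusterd stop|start", the "--mode=script" flag to skip the confirmation prompt, the explicit volume stop/start around the restore, and the snapshot name snap4_<vol> are all assumptions that do not come from this report. By default it only prints the commands (DRY_RUN=1).

```shell
#!/bin/sh
# Sketch of reproduction steps 2-4 from this report.
# DRY_RUN=1 (default) only prints each command; set DRY_RUN=0 to execute
# them, and only on a disposable test cluster.
# Assumptions (not from the report): node2 is reachable over ssh as root,
# and glusterd is controlled with "service glusterd stop|start".
DRY_RUN="${DRY_RUN:-1}"
VOL="${VOL:-vol2}"
NODE2="${NODE2:-rhs-arch-srv2}"

run() {
    # Print the command; execute it only when DRY_RUN is 0.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

reproduce() {
    # Step 2: create a few snapshots of the volume.
    for i in 1 2 3; do
        run gluster snapshot create "snap${i}_${VOL}" "$VOL"
    done

    # Step 3: stop glusterd on node2.
    run ssh "root@${NODE2}" service glusterd stop

    # Step 4: restore while glusterd is down on node2.
    # Assumption: the volume is stopped first, since the CLI output in
    # step 5 shows that restore refuses to run on a started volume.
    run gluster --mode=script volume stop "$VOL"
    run gluster snap restore "snap1_${VOL}"

    # Bring glusterd back on node2 and take another snapshot.
    # Assumption: the volume is started again before the snapshot create;
    # the snapshot name below is hypothetical.
    run ssh "root@${NODE2}" service glusterd start
    run gluster volume start "$VOL"
    run gluster snapshot create "snap4_${VOL}" "$VOL"
}

reproduce
```

Running it with VOL=vol3 prints the same sequence for the second volume, which is where the crash was seen in step 7.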

A snapshot restore operation was tried on vol2 while glusterd was down on node2, so there is an entry in the missed snap list on node2.
Another restore operation was then tried on the same volume. Node3 and node4 still had the details of the first restore operation on vol2, yet in all cases the restore operation reported success without any error message.


Backtrace:
==========

#0  0x00007f34617dc64d in glusterd_delete_stale_volume (stale_volinfo=0xd558c0, 
    valid_volinfo=0xd3f0b0) at glusterd-utils.c:4596
#1  0x00007f34617e4c1e in glusterd_import_friend_volume (
    peer_data=0x7f346a327a60, count=<value optimized out>)
    at glusterd-utils.c:4718
#2  0x00007f34617e4d86 in glusterd_import_friend_volumes (
    peer_data=0x7f346a327a60) at glusterd-utils.c:4756
#3  0x00007f34617e4ff4 in glusterd_compare_friend_data (
    peer_data=0x7f346a327a60, status=0x7fffcaa8002c, 
    hostname=<value optimized out>) at glusterd-utils.c:5733
#4  0x00007f34617b2d3d in glusterd_ac_handle_friend_add_req (
    event=<value optimized out>, ctx=0xcfd640) at glusterd-sm.c:672
#5  0x00007f34617b3526 in glusterd_friend_sm () at glusterd-sm.c:1076
#6  0x00007f34617b1b7e in __glusterd_handle_incoming_friend_req (
    req=<value optimized out>) at glusterd-handler.c:2330
#7  0x00007f346179935f in glusterd_big_locked_handler (req=0x7f346125004c, 
    actor_fn=0x7f34617b1870 <__glusterd_handle_incoming_friend_req>)
    at glusterd-handler.c:80
#8  0x000000343ec09985 in rpcsvc_handle_rpc_call (svc=<value optimiz



Actual results:

Expected results:


Additional info:

Comment 1 senaik 2014-08-05 11:16:25 UTC

sosreports:
===========
http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/snapshots/1126813/

Comment 7 rjoseph 2016-01-05 09:21:43 UTC

Unable to reproduce this bug. Please reopen the bug if you see the problem again.
