Bug 1165648 - [USS]: If glusterd goes down on the originator node while snapshots are activated, after glusterd comes back up, accessing .snaps does not list any snapshots even if they are present
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: snapshot
Version: rhgs-3.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: RHGS 3.3.0
Assignee: Mohammed Rafi KC
QA Contact: Anil Shah
URL:
Whiteboard: USS
Depends On: 1448150 1463512
Blocks: 1417147
 
Reported: 2014-11-19 12:28 UTC by senaik
Modified: 2017-09-21 04:53 UTC
CC List: 6 users

Fixed In Version: glusterfs-3.8.4-25
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-09-21 04:25:52 UTC
Embargoed:




Links:
Red Hat Product Errata RHBA-2017:2774 (normal, SHIPPED_LIVE): glusterfs bug fix and enhancement update. Last updated: 2017-09-21 08:16:29 UTC

Description senaik 2014-11-19 12:28:38 UTC
Description of problem:
=======================
If glusterd goes down on the node through which the volume has been mounted while the snapshots are being activated, then when glusterd comes back up the handshake does not happen and the snapshot is still shown as 'Stopped' on that node.

With the recent change, snapshots are deactivated by default and have to be activated explicitly before they can be used. This can lead to the user seeing different information on different mount points if the volume has been mounted through several servers.
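
For context, a minimal sketch of the flow being described, assuming the standard gluster snapshot CLI (placeholders in angle brackets):

gluster snapshot create <snapname> <volname>   # snapshot is created in the 'Stopped' (deactivated) state by default
gluster snapshot activate <snapname>           # must be run explicitly before the snapshot appears under .snaps
gluster snapshot info <snapname>               # after activation, Status should read 'Started' on every node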

Version-Release number of selected component (if applicable):
============================================================
glusterfs 3.6.0.33

How reproducible:
================
always


Steps to Reproduce:
==================
1.Create a 2x2 dist-rep volume and start it 

2.Fuse and NFS mount the volumes from 2 servers 
mount -t glusterfs 10.70.40.169:/vol3 /mnt/vol3_fuse
mount -t nfs -o vers=3,nolock 10.70.40.169:/vol3 /mnt/vol3_nfs/


mount -t glusterfs 10.70.40.170:/vol3 /mnt/vol3_fuse1
mount -t nfs -o vers=3,nolock 10.70.40.170:/vol3 /mnt/vol3_nfs1/

3.Enable USS 
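(The exact command is not recorded in the report; presumably the standard USS volume option:)
gluster volume set vol3 features.uss enable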

4.Create data from all 4 mount points 
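(Commands not recorded; for example, one directory per mount, matching the names seen in the snapshot listings below:)
mkdir /mnt/vol3_fuse/d1_fuse
mkdir /mnt/vol3_nfs/d1_nfs
mkdir /mnt/vol3_fuse1/d1_fuse1
mkdir /mnt/vol3_nfs1/d1_nfs1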

5.Take snapshot vol3-snap1 on the volume
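(Presumably:)
gluster snapshot create vol3-snap1 vol3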

6.Check snapshot info on all the nodes - it shows 'Stopped'
gluster snapshot info vol3-snap1
Snapshot                  : vol3-snap1
Snap UUID                 : 9209b34d-ba86-41dc-a8a4-05aadfd67951
Created                   : 2014-11-19 16:03:47
Snap Volumes:

	Snap Volume Name          : c8e9b7472fdd46039ea82684b178ed4a
	Origin Volume name        : vol3
	Snaps taken for vol3      : 1
	Snaps available for vol3  : 255
	Status                    : Stopped

7.Stop glusterd on snapshot14 (10.70.40.170)
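(For example, on 10.70.40.170:)
service glusterd stop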

8.Activate the snapshot
gluster snapshot activate vol3-snap1
Snapshot activate: vol3-snap1: Snap activated successfully

9.Start glusterd on snapshot14 (10.70.40.170)
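(Again on 10.70.40.170:)
service glusterd start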

10.Check snapshot info on all 4 nodes - it shows 'Started' on all nodes except snapshot14 (10.70.40.170)

11.cd to .snaps 
Volume mounted through 10.70.40.169:
====================================
fuse mount:
~~~~~~~~~~
[root@dhcp-0-97 vol3_fuse]# cd .snaps
[root@dhcp-0-97 .snaps]# ll
total 0
drwxr-xr-x. 7 root root 158 Nov 19 16:03 vol3-snap1

[root@dhcp-0-97 .snaps]# cd vol3-snap1/
[root@dhcp-0-97 vol3-snap1]# ll
total 0
drwxr-xr-x. 2 root root 6 Nov 19 16:01 d1_fuse
drwxr-xr-x. 2 root root 6 Nov 19 16:03 d1_fuse1
drwxr-xr-x. 2 root root 6 Nov 19 16:01 d1_nfs
drwxr-xr-x. 2 root root 6 Nov 19 16:03 d1_nfs1
[root@dhcp-0-97 vol3-snap1]# pwd
/mnt/vol3_fuse/.snaps/vol3-snap1

nfs mount:
~~~~~~~~~
[root@dhcp-0-97 vol3_nfs]# cd .snaps
[root@dhcp-0-97 .snaps]# ll
total 0
drwxr-xr-x. 7 root root 158 Nov 19 16:03 vol3-snap1
[root@dhcp-0-97 .snaps]# cd vol3-snap1/
[root@dhcp-0-97 vol3-snap1]# ll
total 0
drwxr-xr-x. 2 root root 12 Nov 19 16:01 d1_fuse
drwxr-xr-x. 2 root root 12 Nov 19 16:03 d1_fuse1
drwxr-xr-x. 2 root root 12 Nov 19 16:01 d1_nfs
drwxr-xr-x. 2 root root 12 Nov 19 16:03 d1_nfs1
[root@dhcp-0-97 vol3-snap1]# pwd
/mnt/vol3_nfs/.snaps/vol3-snap1


Volume mounted through 10.70.40.170
(where glusterd was down while snap activate was done)
======================================================
fuse mount:
~~~~~~~~~~~
[root@dhcp-0-97 .snaps]# pwd
/mnt/vol3_fuse1/.snaps
[root@dhcp-0-97 .snaps]# ll
total 0

nfs mount:
~~~~~~~~~~
[root@dhcp-0-97 vol3_nfs1]# cd .snaps
[root@dhcp-0-97 .snaps]# ll
total 0
[root@dhcp-0-97 .snaps]# pwd
/mnt/vol3_nfs1/.snaps


Actual results:
==============
If glusterd goes down on the node through which the volume has been mounted while the snapshots are being activated, then when glusterd comes back up the handshake does not happen and the snapshot is still shown as 'Stopped' on that node.
Accessing .snaps from the mounts served by the node where glusterd went down shows no snapshots listed under .snaps.


Expected results:
================
After glusterd comes back up on the node, the handshake should happen, the snapshot should be shown as started on all nodes, and accessing .snaps from all mounts should show the same information.
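
For example, after the handshake the listings through 10.70.40.170 should match those through 10.70.40.169:

ls /mnt/vol3_fuse1/.snaps    # should list vol3-snap1
ls /mnt/vol3_nfs1/.snaps     # should list vol3-snap1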


Additional info:

Comment 5 Mohammed Rafi KC 2015-03-27 10:05:05 UTC
duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1122064.

upstream patch : http://review.gluster.org/#/c/9664/

Comment 12 Anil Shah 2017-05-03 09:18:10 UTC
Not seeing snapshots in the .snaps directory on the client mounted through the node where glusterd was down when the snap activate was done.

(where glusterd was down while snap activate was done)
[root@dhcp46-157 fuse]# cd .snaps
[root@dhcp46-157 .snaps]# pwd
/mnt/fuse/.snaps
[root@dhcp46-157 .snaps]# ll
total 0

Able to see snaps from the other client:
[root@dhcp47-13 .snaps]# pwd
/mnt/fuse/.snaps
[root@dhcp47-13 .snaps]# ls 
snap2

Snapshot info output from the node which was down when the snapshot was activated:
[root@rhs-arch-srv2 ~]# gluster snapshot info snap2
Snapshot                  : snap2
Snap UUID                 : 1421b902-bda3-4604-aa6b-9d2ef52832a1
Created                   : 2017-05-03 08:15:20
Snap Volumes:

	Snap Volume Name          : 1cd3bcaeea0447418f0b1ad80c3ec3b6
	Origin Volume name        : vol1
	Snaps taken for vol1      : 2
	Snaps available for vol1  : 254
	Status                    : Started


Able to reproduce this bug. Hence marking this bug as failed QA

Comment 13 Mohammed Rafi KC 2017-05-04 15:59:37 UTC
upstream master patch : https://review.gluster.org/17178

Comment 14 Atin Mukherjee 2017-05-08 12:06:11 UTC
downstream patch : https://code.engineering.redhat.com/gerrit/#/c/105517

Comment 17 Anil Shah 2017-06-20 07:06:52 UTC
[root@rhs-arch-srv2 core]# gluster snapshot info snap0
Snapshot                  : snap0
Snap UUID                 : 80f0b29d-b7da-419b-b595-8d216f1ffafc
Created                   : 2017-06-20 06:58:05
Snap Volumes:

	Snap Volume Name          : 6ce57c9284d34f828a1927c9aaeb14db
	Origin Volume name        : newvolume
	Snaps taken for newvolume      : 1
	Snaps available for newvolume  : 255
	Status                    : Stopped
 
[root@rhs-arch-srv2 core]# service glusterd stop


[root@rhs-arch-srv1 core]# gluster snapshot activate snap0
Snapshot activate: snap0: Snap activated successfully


[root@rhs-arch-srv2 core]# service glusterd start
Redirecting to /bin/systemctl start glusterd.service


[root@rhs-arch-srv2 core]# gluster snapshot info snap0
Snapshot                  : snap0
Snap UUID                 : 80f0b29d-b7da-419b-b595-8d216f1ffafc
Created                   : 2017-06-20 06:58:05
Snap Volumes:

	Snap Volume Name          : 6ce57c9284d34f828a1927c9aaeb14db
	Origin Volume name        : newvolume
	Snaps taken for newvolume      : 1
	Snaps available for newvolume  : 255
	Status                    : Started
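
(For completeness, the corresponding client-side check would be something like the following; the mount path is the one used in comment 12 and may differ:)

ls /mnt/fuse/.snaps    # should now list snap0 on a client mounted through rhs-arch-srv2 as well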


Bug verified on build glusterfs-3.8.4-28.el7rhgs.x86_64.

Comment 19 errata-xmlrpc 2017-09-21 04:25:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2774


