Bug 1100211
Summary: | [SNAPSHOT] : A brick volume-id is changed after reboot of the brick's node | |
---|---|---|---
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | spandura
Component: | snapshot | Assignee: | rjoseph
Status: | CLOSED ERRATA | QA Contact: | Rahul Hinduja <rhinduja>
Severity: | high | Docs Contact: |
Priority: | high | |
Version: | rhgs-3.0 | CC: | nsathyan, rhs-bugs, spandura, ssamanta, storage-qa-internal
Target Milestone: | --- | |
Target Release: | RHGS 3.0.0 | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | SNAPSHOT | |
Fixed In Version: | glusterfs-3.6.0.17-1.el6rhs | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | |
: | 1105484 (view as bug list) | Environment: |
Last Closed: | 2014-09-22 19:39:06 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 1105484 | |
Description
spandura
2014-05-22 09:33:44 UTC
NOTE: It is not only the brick's volume-id that is changed; the contents of the brick are also lost.

The logs do not have entries from when the problem occurred; they only contain the most recent messages, where the symptom is already present. Is it possible for you to reproduce it again? It is not getting reproduced in our local setup. From the mount logs it appears that the snapshot brick is mounted at the main volume's brick mount point. Did you perform a snapshot restore operation, mount the volume brick explicitly, or do anything else that could interfere with the brick mount point?

I didn't perform anything other than what is mentioned in the steps to recreate the issue. I will try to recreate it.

I am able to recreate this issue on build "glusterfs 3.6.0.12 built on Jun 3 2014 11:03:28". This time the volume-id of the brick on the rebooted node changed, and in addition the brick that had always been online got killed. Because of this, I/O on the mount failed with an Input/output error.
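For reference, GlusterFS records a brick's volume-id in the trusted.glusterfs.volume-id extended attribute on the brick root, so a changed volume-id like the one reported above can be confirmed by reading that xattr and comparing it with the Volume ID shown by `gluster volume info`. The snippet below is only a minimal sketch of that check; the brick path is taken from the logs in this report, while the expected UUID is a placeholder to be replaced with the real value. The brick log excerpt follows after the sketch.

```python
import os
import uuid

def brick_volume_id(brick_path):
    """Read the 16-byte trusted.glusterfs.volume-id xattr from a brick root.

    Reading trusted.* xattrs normally requires root privileges.
    """
    raw = os.getxattr(brick_path, "trusted.glusterfs.volume-id")
    return uuid.UUID(bytes=raw)

# Placeholder values for illustration: substitute the actual brick path and
# the Volume ID printed by `gluster volume info <volname>`.
BRICK = "/rhs/bricks/b2"
EXPECTED = uuid.UUID("00000000-0000-0000-0000-000000000000")

actual = brick_volume_id(BRICK)
if actual != EXPECTED:
    print(f"volume-id mismatch on {BRICK}: {actual} (expected {EXPECTED})")
else:
    print(f"{BRICK} carries the expected volume-id {actual}")
```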
Log messages from the brick that was always online and then got shut down:
=================================================================
[2014-06-04 08:17:10.700408] E [posix.c:4274:_posix_handle_xattr_keyvalue_pair] 0-vol_rep-posix: fgetxattr failed on fd=18 while doing xattrop: Key:trusted.afr.vol_rep-client-1 (Input/output error)
[2014-06-04 08:17:10.700450] I [server-rpc-fops.c:1867:server_fxattrop_cbk] 0-vol_rep-server: 65458: FXATTROP 0 (37267384-bf53-4d3a-8114-581d64090819) ==> (Success)
[2014-06-04 08:17:24.543339] W [posix-helpers.c:1409:posix_health_check_thread_proc] 0-vol_rep-posix: stat() on /rhs/bricks/b2 returned: Input/output error
[2014-06-04 08:17:24.543399] M [posix-helpers.c:1429:posix_health_check_thread_proc] 0-vol_rep-posix: health-check failed, going down
[2014-06-04 08:17:34.555909] I [client_t.c:184:gf_client_get] 0-vol_rep-server: client_uid=fan.lab.eng.blr.redhat.com-1716-2014/06/04-08:17:33:650199-vol_rep-client-1-0-0
[2014-06-04 08:17:34.556005] I [server-handshake.c:578:server_setvolume] 0-vol_rep-server: accepted client from fan.lab.eng.blr.redhat.com-1716-2014/06/04-08:17:33:650199-vol_rep-client-1-0-0 (version: 3.6.0.12)
[2014-06-04 08:17:34.556416] I [client_t.c:184:gf_client_get] 0-vol_rep-server: client_uid=fan.lab.eng.blr.redhat.com-1716-2014/06/04-08:17:33:650199-vol_rep-client-1-0-0
[2014-06-04 08:17:34.558531] W [posix-helpers.c:538:posix_pstat] 0-vol_rep-posix: lstat failed on /rhs/bricks/b2/ (Input/output error)
[2014-06-04 08:17:34.685769] I [client_t.c:184:gf_client_get] 0-vol_rep-server: client_uid=fan.lab.eng.blr.redhat.com-1691-2014/06/04-08:17:32:646939-vol_rep-client-1-0-0
[2014-06-04 08:17:34.685854] I [server-handshake.c:578:server_setvolume] 0-vol_rep-server: accepted client from fan.lab.eng.blr.redhat.com-1691-2014/06/04-08:17:32:646939-vol_rep-client-1-0-0 (version: 3.6.0.12)
[2014-06-04 08:17:34.686235] I [client_t.c:184:gf_client_get] 0-vol_rep-server: client_uid=fan.lab.eng.blr.redhat.com-1691-2014/06/04-08:17:32:646939-vol_rep-client-1-0-0
[2014-06-04 08:17:34.686433] W [posix-helpers.c:538:posix_pstat] 0-vol_rep-posix: lstat failed on /rhs/bricks/b2/ (Input/output error)
[2014-06-04 08:17:34.686799] W [posix-helpers.c:538:posix_pstat] 0-vol_rep-posix: lstat failed on /rhs/bricks/b2/ (Input/output error)
[2014-06-04 08:17:34.686826] E [posix.c:148:posix_lookup] 0-vol_rep-posix: lstat on /rhs/bricks/b2/ failed: Input/output error
[2014-06-04 08:17:34.686941] W [posix-helpers.c:538:posix_pstat] 0-vol_rep-posix: lstat failed on /rhs/bricks/b2/ (Input/output error)
[2014-06-04 08:17:34.686972] E [posix.c:148:posix_lookup] 0-vol_rep-posix: lstat on /rhs/bricks/b2/ failed: Input/output error
[2014-06-04 08:17:34.687012] E [server-rpc-fops.c:190:server_lookup_cbk] 0-vol_rep-server: 8: LOOKUP / (00000000-0000-0000-0000-000000000001) ==> (Input/output error)
[2014-06-04 08:17:54.543680] M [posix-helpers.c:1434:posix_health_check_thread_proc] 0-vol_rep-posix: still alive! -> SIGTERM
[2014-06-04 08:17:54.544080] W [glusterfsd.c:1182:cleanup_and_exit] (--> 0-: received signum (15), shutting down

This issue is seen because the file-system UUID of the origin volume and of the snapshot volume are the same: when an LVM snapshot is taken, the file-system UUID is replicated along with the data. There are file-system-specific tools available to fix this, but AFAIK no file-system-agnostic solution is available as of now (see the sketch at the end of this report). A patch will be sent soon, after some more investigation.

Review posted in downstream: https://code.engineering.redhat.com/gerrit/#/c/26739/

Verified the fix on build "glusterfs 3.6.0.17 built on Jun 13 2014 11:01:21" using the steps mentioned in the bug description. The bug is fixed. Moving the bug to the Verified state.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHEA-2014-1278.html
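As the root-cause comment above explains, an LVM snapshot copies the origin volume's file-system UUID, so the origin brick and the snapshot brick end up with identical UUIDs and the wrong device can be mounted at the brick path. The sketch below is a minimal, hypothetical duplicate-UUID check, not part of the fix delivered in the errata; it assumes the `blkid` utility is installed, parses its default output, and typically needs to run as root to see every device.

```python
import re
import subprocess
from collections import defaultdict

def filesystem_uuids():
    """Map each filesystem UUID to the block devices carrying it, as reported by blkid."""
    out = subprocess.run(["blkid"], capture_output=True, text=True, check=True).stdout
    by_uuid = defaultdict(list)
    for line in out.splitlines():
        device, _, attrs = line.partition(":")
        match = re.search(r'\bUUID="([^"]+)"', attrs)
        if match:
            by_uuid[match.group(1)].append(device)
    return by_uuid

# A UUID shared by more than one device is the situation an LVM snapshot of a
# brick creates; it can lead to the snapshot being mounted at the origin
# brick's mount point after a reboot.
for fs_uuid, devices in filesystem_uuids().items():
    if len(devices) > 1:
        print(f"duplicate filesystem UUID {fs_uuid}: {', '.join(devices)}")
```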