Description of problem:
glusterd: if peers in a cluster are down and volumes are recreated after a cleanup, then when the peers come back online they overwrite/sync the volume metadata in the wrong direction (overwriting the new volumes).

Version-Release number of selected component (if applicable):
3.4.0.18rhs-1.el6rhs.x86_64

How reproducible:
always

Steps to Reproduce:
1) In a cluster of four servers (rhsauto018, rhsauto027, rhsauto026, rhsauto031), create a few volumes (of different types):

[root@rhsauto018 ~]# gluster v info

Volume Name: slave1
Type: Distributed-Replicate
Volume ID: 51f90b10-7267-438c-8665-9259f48b36cc
Status: Started
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: rhsauto026.lab.eng.blr.redhat.com:/rhs/brick3
Brick2: rhsauto027.lab.eng.blr.redhat.com:/rhs/brick3
Brick3: rhsauto018.lab.eng.blr.redhat.com:/rhs/brick3
Brick4: rhsauto026.lab.eng.blr.redhat.com:/rhs/brick4
Brick5: rhsauto027.lab.eng.blr.redhat.com:/rhs/brick4
Brick6: rhsauto018.lab.eng.blr.redhat.com:/rhs/brick4

Volume Name: slave4
Type: Distributed-Replicate
Volume ID: 1875fba4-7f4e-497c-bce7-488a129f52c3
Status: Started
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: rhsauto026.lab.eng.blr.redhat.com:/rhs/brick5/4-1
Brick2: rhsauto018.lab.eng.blr.redhat.com:/rhs/brick5/4-1
Brick3: rhsauto027.lab.eng.blr.redhat.com:/rhs/brick5/4-1
Brick4: rhsauto018.lab.eng.blr.redhat.com:/rhs/brick5/4-2
Brick5: rhsauto026.lab.eng.blr.redhat.com:/rhs/brick5/4-2
Brick6: rhsauto027.lab.eng.blr.redhat.com:/rhs/brick5/4-2

Volume Name: slave7
Type: Distribute
Volume ID: 66ef9107-1884-441f-95ae-dcd60a00a42a
Status: Started
Number of Bricks: 3
Transport-type: tcp
Bricks:
Brick1: rhsauto026.lab.eng.blr.redhat.com:/rhs/brick5/7
Brick2: rhsauto018.lab.eng.blr.redhat.com:/rhs/brick5/7
Brick3: rhsauto027.lab.eng.blr.redhat.com:/rhs/brick5/7

Volume Name: slave6
Type: Distribute
Volume ID: 127fcc58-736c-4af9-8cb9-ec0f435f2e89
Status: Started
Number of Bricks: 3
Transport-type: tcp
Bricks:
Brick1: rhsauto026.lab.eng.blr.redhat.com:/rhs/brick5/6
Brick2: rhsauto018.lab.eng.blr.redhat.com:/rhs/brick5/6
Brick3: rhsauto027.lab.eng.blr.redhat.com:/rhs/brick5/6

Volume Name: slave8
Type: Distribute
Volume ID: 824f0449-844f-4762-ae20-2813964a9624
Status: Started
Number of Bricks: 3
Transport-type: tcp
Bricks:
Brick1: rhsauto026.lab.eng.blr.redhat.com:/rhs/brick5/8
Brick2: rhsauto018.lab.eng.blr.redhat.com:/rhs/brick5/8
Brick3: rhsauto027.lab.eng.blr.redhat.com:/rhs/brick5/8

Volume Name: slave5
Type: Distribute
Volume ID: 61fd12ac-1472-429e-83bb-980803e0fa12
Status: Started
Number of Bricks: 3
Transport-type: tcp
Bricks:
Brick1: rhsauto026.lab.eng.blr.redhat.com:/rhs/brick5/5
Brick2: rhsauto018.lab.eng.blr.redhat.com:/rhs/brick5/5
Brick3: rhsauto027.lab.eng.blr.redhat.com:/rhs/brick5/5

Volume Name: slave2
Type: Distribute
Volume ID: a2dcf3f2-1526-4258-ab4a-6894db73a9fd
Status: Started
Number of Bricks: 4
Transport-type: tcp
Bricks:
Brick1: rhsauto026.lab.eng.blr.redhat.com:/rhs/brick1
Brick2: rhsauto027.lab.eng.blr.redhat.com:/rhs/brick1
Brick3: rhsauto018.lab.eng.blr.redhat.com:/rhs/brick1
Brick4: rhsauto018.lab.eng.blr.redhat.com:/rhs/brick2

2) Two RHS servers went down (rhsauto026, rhsauto027) and were not coming back up, so a cleanup and reconfiguration was performed as follows:
a) stop all volumes (successful, no error reported)
b) delete all volumes
c) remove the xattrs from all bricks using 'setfattr -x'
d) create two new volumes (yes, the names match the names of the deleted volumes) using a few of the bricks that were previously used in volume creation
e) check gluster v info

[root@rhsauto031 ~]# gluster v info

Volume Name: slave1
Type: Distribute
Volume ID: 45d2fd0a-a44d-4c69-adb9-30afb67ab61e
Status: Started
Number of Bricks: 4
Transport-type: tcp
Bricks:
Brick1: rhsauto031.lab.eng.blr.redhat.com:/rhs/brick1
Brick2: rhsauto018.lab.eng.blr.redhat.com:/rhs/brick1
Brick3: rhsauto018.lab.eng.blr.redhat.com:/rhs/brick2
Brick4: rhsauto031.lab.eng.blr.redhat.com:/rhs/brick3

Volume Name: slave2
Type: Distribute
Volume ID: c9ecb6d4-4245-4f32-81f2-03a6dbf3d7a6
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: rhsauto018.lab.eng.blr.redhat.com:/rhs/brick3
Brick2: rhsauto031.lab.eng.blr.redhat.com:/rhs/brick2

3) After a day or more, the downed servers came up and synced the volume metadata in the wrong direction: the new volume definitions were overwritten, and gluster volume info shows the old information (all deleted volumes are present).

Actual results:
The new volume definitions are overwritten, and gluster volume info shows the old information (all deleted volumes are present).

Expected results:
Metadata should sync from the servers that stayed up to the server that just came up.

Additional info:
During cleanup, if you remove the downed servers from the cluster using 'gluster peer detach <peer> force', the issue is not reproducible.
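For reference, the cleanup in step 2 (and the detach workaround from "Additional info") can be sketched as a shell script. This is a minimal sketch, not taken verbatim from the report: the xattr names trusted.glusterfs.volume-id and trusted.gfid are the standard GlusterFS brick markers (the report only says 'setfattr -x'), the brick glob and recreated volume layout are assumptions based on the output above, and everything must run as root on each surviving server against a live cluster.

```shell
#!/bin/bash
# Sketch of the cleanup/reconfiguration procedure (assumptions noted above).

# a) + b) stop and delete every existing volume
for vol in $(gluster volume list); do
    gluster --mode=script volume stop "$vol" force
    gluster --mode=script volume delete "$vol"
done

# Workaround from "Additional info": detach the dead peers before
# recreating volumes so their stale metadata cannot resync later.
#   gluster peer detach rhsauto026.lab.eng.blr.redhat.com force
#   gluster peer detach rhsauto027.lab.eng.blr.redhat.com force

# c) remove the volume xattrs from each brick directory (run per server;
#    the glob and xattr names are assumptions, see lead-in)
for brick in /rhs/brick*; do
    setfattr -x trusted.glusterfs.volume-id "$brick" 2>/dev/null
    setfattr -x trusted.gfid "$brick" 2>/dev/null
done

# d) recreate a volume reusing previously used bricks
#    (layout taken from the new slave1 shown in step 2e)
gluster volume create slave1 \
    rhsauto031.lab.eng.blr.redhat.com:/rhs/brick1 \
    rhsauto018.lab.eng.blr.redhat.com:/rhs/brick1 \
    rhsauto018.lab.eng.blr.redhat.com:/rhs/brick2 \
    rhsauto031.lab.eng.blr.redhat.com:/rhs/brick3 force
gluster volume start slave1

# e) verify
gluster volume info
```

Without the peer detach, the detached servers' copy of /var/lib/glusterd still describes the deleted volumes, which is what gets resynced when they rejoin.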
sosreport @ http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/996412/
Thank you for submitting this issue for consideration in Red Hat Gluster Storage. The release you requested us to review is now End of Life; please see https://access.redhat.com/support/policy/updates/rhs/. If you can reproduce this bug against a currently maintained version of Red Hat Gluster Storage, please feel free to file a new report against the current release.