Bug 996513

Summary: glusterd: if peers in the cluster are down and volumes are re-created after a clean-up, then when the peers come back online they overwrite/sync the volume metadata (overwriting the new volumes)
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Rachana Patel <racpatel>
Component: glusterd
Assignee: Nagaprasad Sathyanarayana <nsathyan>
Status: CLOSED EOL
QA Contact: storage-qa-internal <storage-qa-internal>
Severity: high
Docs Contact:
Priority: unspecified
Version: 2.1
CC: mzywusko, rhs-bugs, smohan, vbellur
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-12-03 17:11:03 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Rachana Patel 2013-08-13 10:35:19 UTC
Description of problem:
glusterd: if peers in the cluster are down and volumes are re-created after a clean-up, then when the peers come back online they overwrite/sync the volume metadata (overwriting the new volumes).

Version-Release number of selected component (if applicable):
3.4.0.18rhs-1.el6rhs.x86_64


How reproducible:
always

Steps to Reproduce:
1) In a cluster of four servers (rhsauto018, rhsauto027, rhsauto026, rhsauto031), created a few volumes of different types:

[root@rhsauto018 ~]# gluster v info
 
Volume Name: slave1
Type: Distributed-Replicate
Volume ID: 51f90b10-7267-438c-8665-9259f48b36cc
Status: Started
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: rhsauto026.lab.eng.blr.redhat.com:/rhs/brick3
Brick2: rhsauto027.lab.eng.blr.redhat.com:/rhs/brick3
Brick3: rhsauto018.lab.eng.blr.redhat.com:/rhs/brick3
Brick4: rhsauto026.lab.eng.blr.redhat.com:/rhs/brick4
Brick5: rhsauto027.lab.eng.blr.redhat.com:/rhs/brick4
Brick6: rhsauto018.lab.eng.blr.redhat.com:/rhs/brick4
 
Volume Name: slave4
Type: Distributed-Replicate
Volume ID: 1875fba4-7f4e-497c-bce7-488a129f52c3
Status: Started
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: rhsauto026.lab.eng.blr.redhat.com:/rhs/brick5/4-1
Brick2: rhsauto018.lab.eng.blr.redhat.com:/rhs/brick5/4-1
Brick3: rhsauto027.lab.eng.blr.redhat.com:/rhs/brick5/4-1
Brick4: rhsauto018.lab.eng.blr.redhat.com:/rhs/brick5/4-2
Brick5: rhsauto026.lab.eng.blr.redhat.com:/rhs/brick5/4-2
Brick6: rhsauto027.lab.eng.blr.redhat.com:/rhs/brick5/4-2
 
Volume Name: slave7
Type: Distribute
Volume ID: 66ef9107-1884-441f-95ae-dcd60a00a42a
Status: Started
Number of Bricks: 3
Transport-type: tcp
Bricks:
Brick1: rhsauto026.lab.eng.blr.redhat.com:/rhs/brick5/7
Brick2: rhsauto018.lab.eng.blr.redhat.com:/rhs/brick5/7
Brick3: rhsauto027.lab.eng.blr.redhat.com:/rhs/brick5/7
 
Volume Name: slave6
Type: Distribute
Volume ID: 127fcc58-736c-4af9-8cb9-ec0f435f2e89
Status: Started
Number of Bricks: 3
Transport-type: tcp
Bricks:
Brick1: rhsauto026.lab.eng.blr.redhat.com:/rhs/brick5/6
Brick2: rhsauto018.lab.eng.blr.redhat.com:/rhs/brick5/6
Brick3: rhsauto027.lab.eng.blr.redhat.com:/rhs/brick5/6
 
Volume Name: slave8
Type: Distribute
Volume ID: 824f0449-844f-4762-ae20-2813964a9624
Status: Started
Number of Bricks: 3
Transport-type: tcp
Bricks:
Brick1: rhsauto026.lab.eng.blr.redhat.com:/rhs/brick5/8
Brick2: rhsauto018.lab.eng.blr.redhat.com:/rhs/brick5/8
Brick3: rhsauto027.lab.eng.blr.redhat.com:/rhs/brick5/8
 
Volume Name: slave5
Type: Distribute
Volume ID: 61fd12ac-1472-429e-83bb-980803e0fa12
Status: Started
Number of Bricks: 3
Transport-type: tcp
Bricks:
Brick1: rhsauto026.lab.eng.blr.redhat.com:/rhs/brick5/5
Brick2: rhsauto018.lab.eng.blr.redhat.com:/rhs/brick5/5
Brick3: rhsauto027.lab.eng.blr.redhat.com:/rhs/brick5/5
 
Volume Name: slave2
Type: Distribute
Volume ID: a2dcf3f2-1526-4258-ab4a-6894db73a9fd
Status: Started
Number of Bricks: 4
Transport-type: tcp
Bricks:
Brick1: rhsauto026.lab.eng.blr.redhat.com:/rhs/brick1
Brick2: rhsauto027.lab.eng.blr.redhat.com:/rhs/brick1
Brick3: rhsauto018.lab.eng.blr.redhat.com:/rhs/brick1
Brick4: rhsauto018.lab.eng.blr.redhat.com:/rhs/brick2
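
For reference, volumes like the ones above would have been created with commands roughly like the following (a sketch only; the exact options used are not recorded in this report, and the brick lists are copied from the output above):

    # distributed-replicate volume such as slave1 (replica 2, 3 x 2 bricks)
    gluster volume create slave1 replica 2 \
        rhsauto026.lab.eng.blr.redhat.com:/rhs/brick3 rhsauto027.lab.eng.blr.redhat.com:/rhs/brick3 \
        rhsauto018.lab.eng.blr.redhat.com:/rhs/brick3 rhsauto026.lab.eng.blr.redhat.com:/rhs/brick4 \
        rhsauto027.lab.eng.blr.redhat.com:/rhs/brick4 rhsauto018.lab.eng.blr.redhat.com:/rhs/brick4
    gluster volume start slave1

    # plain distribute volume such as slave7
    gluster volume create slave7 \
        rhsauto026.lab.eng.blr.redhat.com:/rhs/brick5/7 \
        rhsauto018.lab.eng.blr.redhat.com:/rhs/brick5/7 \
        rhsauto027.lab.eng.blr.redhat.com:/rhs/brick5/7
    gluster volume start slave7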


2) Two RHS servers went down (rhsauto026 and rhsauto027) and were not coming back up, so a clean-up and reconfiguration was done as described below (a command-level sketch of steps a) through d) appears after the output in step e):
a) stop all volumes (successful, no errors reported)
b) delete all volumes
c) remove the extended attributes from all bricks using 'setfattr -x'
d) create two new volumes (yes, the names match names of the deleted volumes) using some of the bricks that were previously part of the deleted volumes
e) check gluster v info

[root@rhsauto031 ~]# gluster v info
 
Volume Name: slave1
Type: Distribute
Volume ID: 45d2fd0a-a44d-4c69-adb9-30afb67ab61e
Status: Started
Number of Bricks: 4
Transport-type: tcp
Bricks:
Brick1: rhsauto031.lab.eng.blr.redhat.com:/rhs/brick1
Brick2: rhsauto018.lab.eng.blr.redhat.com:/rhs/brick1
Brick3: rhsauto018.lab.eng.blr.redhat.com:/rhs/brick2
Brick4: rhsauto031.lab.eng.blr.redhat.com:/rhs/brick3
 
Volume Name: slave2
Type: Distribute
Volume ID: c9ecb6d4-4245-4f32-81f2-03a6dbf3d7a6
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: rhsauto018.lab.eng.blr.redhat.com:/rhs/brick3
Brick2: rhsauto031.lab.eng.blr.redhat.com:/rhs/brick2
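
For reference, steps a) through d) roughly correspond to the commands below on the servers that stayed up (a sketch only: the exact xattr names and per-brick clean-up details are assumptions based on the usual GlusterFS brick re-use procedure; the report only states that 'setfattr -x' was used, and the new brick lists are copied from the output above):

    # a) + b) stop and delete each of the old volumes
    gluster volume stop slave1
    gluster volume delete slave1
    # ... repeated for slave2, slave4, slave5, slave6, slave7, slave8

    # c) clear the volume-related extended attributes from each re-used brick directory
    setfattr -x trusted.glusterfs.volume-id /rhs/brick1
    setfattr -x trusted.gfid /rhs/brick1
    rm -rf /rhs/brick1/.glusterfs
    # ... repeated for every brick path that is re-used

    # d) re-create the two new volumes from the remaining servers
    gluster volume create slave1 \
        rhsauto031.lab.eng.blr.redhat.com:/rhs/brick1 rhsauto018.lab.eng.blr.redhat.com:/rhs/brick1 \
        rhsauto018.lab.eng.blr.redhat.com:/rhs/brick2 rhsauto031.lab.eng.blr.redhat.com:/rhs/brick3
    gluster volume start slave1
    gluster volume create slave2 \
        rhsauto018.lab.eng.blr.redhat.com:/rhs/brick3 rhsauto031.lab.eng.blr.redhat.com:/rhs/brick2
    gluster volume start slave2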

3) After a day or more, the servers that had been down came back up and synced the volume metadata in the wrong direction:
the new volume definitions were overwritten and 'gluster volume info' shows the old information (all of the deleted volumes are present again).



Actual results:
The returning servers overwrite the new volume definitions; 'gluster volume info' shows the old information and all of the deleted volumes are present again.
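
One way to confirm which definition "won" is to compare the Volume IDs recorded on disk with the outputs above (the store path below is the standard glusterd location; treat the exact file layout as an assumption):

    # on a server that stayed up, check the persisted definition of slave1
    grep volume-id /var/lib/glusterd/vols/slave1/info
    # new ID from step 2e:  45d2fd0a-a44d-4c69-adb9-30afb67ab61e
    # old ID from step 1:   51f90b10-7267-438c-8665-9259f48b36cc
    # after the peers rejoin, the old ID (and the deleted volumes) show up again
    gluster volume info slave1 | grep "Volume ID"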


Expected results:
The volume metadata should be synced from the servers that stayed up to the servers that just came back up, not the other way around.

Additional info:
During clean-up, if the down servers are removed from the cluster with 'gluster peer detach <peer> force', the issue is not reproducible.
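
For completeness, that workaround amounts to removing the dead peers from the cluster before re-creating the volumes, roughly as follows (hostnames are taken from this report; re-probing the repaired servers later is an assumption and would typically require clearing their stale /var/lib/glusterd state first):

    # before deleting and re-creating the volumes, drop the unreachable peers
    gluster peer detach rhsauto026.lab.eng.blr.redhat.com force
    gluster peer detach rhsauto027.lab.eng.blr.redhat.com force
    # ... then perform the clean-up and volume creation from step 2 ...
    # once the servers are repaired, they can be probed back into the cluster
    gluster peer probe rhsauto026.lab.eng.blr.redhat.com
    gluster peer probe rhsauto027.lab.eng.blr.redhat.com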

Comment 1 Rachana Patel 2013-08-13 10:36:45 UTC
sosreport @ http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/996412/

Comment 5 Vivek Agarwal 2015-12-03 17:11:03 UTC
Thank you for submitting this issue for consideration in Red Hat Gluster Storage. The release you requested us to review is now End of Life. Please see https://access.redhat.com/support/policy/updates/rhs/

If you can reproduce this bug against a currently maintained version of Red Hat Gluster Storage, please feel free to file a new report against the current release.