Description of problem:
glusterd: if peers in a cluster are down and volumes are recreated after a cleanup, then when the peers come back online they overwrite/sync the volume metadata in the wrong direction (overwriting the new volumes).

Version-Release number of selected component (if applicable):
3.4.0.18rhs-1.el6rhs.x86_64

How reproducible:
always

Steps to Reproduce:
1) In a cluster of four servers (rhsauto018, rhsauto027, rhsauto026, rhsauto031), create a few volumes (of different types):

[root@rhsauto018 ~]# gluster v info

Volume Name: slave1
Type: Distributed-Replicate
Volume ID: 51f90b10-7267-438c-8665-9259f48b36cc
Status: Started
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: rhsauto026.lab.eng.blr.redhat.com:/rhs/brick3
Brick2: rhsauto027.lab.eng.blr.redhat.com:/rhs/brick3
Brick3: rhsauto018.lab.eng.blr.redhat.com:/rhs/brick3
Brick4: rhsauto026.lab.eng.blr.redhat.com:/rhs/brick4
Brick5: rhsauto027.lab.eng.blr.redhat.com:/rhs/brick4
Brick6: rhsauto018.lab.eng.blr.redhat.com:/rhs/brick4

Volume Name: slave4
Type: Distributed-Replicate
Volume ID: 1875fba4-7f4e-497c-bce7-488a129f52c3
Status: Started
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: rhsauto026.lab.eng.blr.redhat.com:/rhs/brick5/4-1
Brick2: rhsauto018.lab.eng.blr.redhat.com:/rhs/brick5/4-1
Brick3: rhsauto027.lab.eng.blr.redhat.com:/rhs/brick5/4-1
Brick4: rhsauto018.lab.eng.blr.redhat.com:/rhs/brick5/4-2
Brick5: rhsauto026.lab.eng.blr.redhat.com:/rhs/brick5/4-2
Brick6: rhsauto027.lab.eng.blr.redhat.com:/rhs/brick5/4-2

Volume Name: slave7
Type: Distribute
Volume ID: 66ef9107-1884-441f-95ae-dcd60a00a42a
Status: Started
Number of Bricks: 3
Transport-type: tcp
Bricks:
Brick1: rhsauto026.lab.eng.blr.redhat.com:/rhs/brick5/7
Brick2: rhsauto018.lab.eng.blr.redhat.com:/rhs/brick5/7
Brick3: rhsauto027.lab.eng.blr.redhat.com:/rhs/brick5/7

Volume Name: slave6
Type: Distribute
Volume ID: 127fcc58-736c-4af9-8cb9-ec0f435f2e89
Status: Started
Number of Bricks: 3
Transport-type: tcp
Bricks:
Brick1: rhsauto026.lab.eng.blr.redhat.com:/rhs/brick5/6
Brick2: rhsauto018.lab.eng.blr.redhat.com:/rhs/brick5/6
Brick3: rhsauto027.lab.eng.blr.redhat.com:/rhs/brick5/6

Volume Name: slave8
Type: Distribute
Volume ID: 824f0449-844f-4762-ae20-2813964a9624
Status: Started
Number of Bricks: 3
Transport-type: tcp
Bricks:
Brick1: rhsauto026.lab.eng.blr.redhat.com:/rhs/brick5/8
Brick2: rhsauto018.lab.eng.blr.redhat.com:/rhs/brick5/8
Brick3: rhsauto027.lab.eng.blr.redhat.com:/rhs/brick5/8

Volume Name: slave5
Type: Distribute
Volume ID: 61fd12ac-1472-429e-83bb-980803e0fa12
Status: Started
Number of Bricks: 3
Transport-type: tcp
Bricks:
Brick1: rhsauto026.lab.eng.blr.redhat.com:/rhs/brick5/5
Brick2: rhsauto018.lab.eng.blr.redhat.com:/rhs/brick5/5
Brick3: rhsauto027.lab.eng.blr.redhat.com:/rhs/brick5/5

Volume Name: slave2
Type: Distribute
Volume ID: a2dcf3f2-1526-4258-ab4a-6894db73a9fd
Status: Started
Number of Bricks: 4
Transport-type: tcp
Bricks:
Brick1: rhsauto026.lab.eng.blr.redhat.com:/rhs/brick1
Brick2: rhsauto027.lab.eng.blr.redhat.com:/rhs/brick1
Brick3: rhsauto018.lab.eng.blr.redhat.com:/rhs/brick1
Brick4: rhsauto018.lab.eng.blr.redhat.com:/rhs/brick2

2) Two RHS servers went down (rhsauto026, rhsauto027) and were not coming back up, so a cleanup and reconfiguration was performed as follows:
a) stop all volumes (successful, no error reported)
b) delete all volumes
c) remove the xattrs from all bricks using 'setfattr -x'
d) create two new volumes (yes, the names match the names of the deleted volumes) using a few of the bricks that were previously used in volume creation
e) check gluster v info

[root@rhsauto031 ~]# gluster v info

Volume Name: slave1
Type: Distribute
Volume ID: 45d2fd0a-a44d-4c69-adb9-30afb67ab61e
Status: Started
Number of Bricks: 4
Transport-type: tcp
Bricks:
Brick1: rhsauto031.lab.eng.blr.redhat.com:/rhs/brick1
Brick2: rhsauto018.lab.eng.blr.redhat.com:/rhs/brick1
Brick3: rhsauto018.lab.eng.blr.redhat.com:/rhs/brick2
Brick4: rhsauto031.lab.eng.blr.redhat.com:/rhs/brick3

Volume Name: slave2
Type: Distribute
Volume ID: c9ecb6d4-4245-4f32-81f2-03a6dbf3d7a6
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: rhsauto018.lab.eng.blr.redhat.com:/rhs/brick3
Brick2: rhsauto031.lab.eng.blr.redhat.com:/rhs/brick2

3) After a day or more, the downed servers came up and synced the volume metadata in the wrong direction: the new volume definitions were overwritten, and gluster volume info shows the old information (all deleted volumes are present).

Actual results:
The new volume definitions are overwritten, and gluster volume info shows the old information (all deleted volumes are present).

Expected results:
Metadata should sync from the servers that stayed up to the server that just came up.

Additional info:
During cleanup, if you remove the downed servers from the cluster using 'gluster peer detach <peer> force', the issue is not reproducible.
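For reference, the cleanup in step 2 (and the detach workaround from "Additional info") can be sketched as a shell script. This is a minimal sketch, not taken verbatim from the report: the xattr names trusted.glusterfs.volume-id and trusted.gfid are the standard GlusterFS brick markers (the report only says 'setfattr -x'), the brick glob and recreated volume layout are assumptions based on the output above, and everything must run as root on each surviving server against a live cluster.

```shell
#!/bin/bash
# Sketch of the cleanup/reconfiguration procedure (assumptions noted above).

# a) + b) stop and delete every existing volume
for vol in $(gluster volume list); do
    gluster --mode=script volume stop "$vol" force
    gluster --mode=script volume delete "$vol"
done

# Workaround from "Additional info": detach the dead peers before
# recreating volumes so their stale metadata cannot resync later.
#   gluster peer detach rhsauto026.lab.eng.blr.redhat.com force
#   gluster peer detach rhsauto027.lab.eng.blr.redhat.com force

# c) remove the volume xattrs from each brick directory (run per server;
#    the glob and xattr names are assumptions, see lead-in)
for brick in /rhs/brick*; do
    setfattr -x trusted.glusterfs.volume-id "$brick" 2>/dev/null
    setfattr -x trusted.gfid "$brick" 2>/dev/null
done

# d) recreate a volume reusing previously used bricks
#    (layout taken from the new slave1 shown in step 2e)
gluster volume create slave1 \
    rhsauto031.lab.eng.blr.redhat.com:/rhs/brick1 \
    rhsauto018.lab.eng.blr.redhat.com:/rhs/brick1 \
    rhsauto018.lab.eng.blr.redhat.com:/rhs/brick2 \
    rhsauto031.lab.eng.blr.redhat.com:/rhs/brick3 force
gluster volume start slave1

# e) verify
gluster volume info
```

Without the peer detach, the detached servers' copy of /var/lib/glusterd still describes the deleted volumes, which is what gets resynced when they rejoin.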
sosreport @ http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/996412/
Thank you for submitting this issue for consideration in Red Hat Gluster Storage. The release you requested us to review is now End of Life; please see https://access.redhat.com/support/policy/updates/rhs/. If you can reproduce this bug against a currently maintained version of Red Hat Gluster Storage, please feel free to file a new report against the current release.