Bug 1248897

Summary: Split-Brain found on "/" after replacing a brick using replace-brick commit force
Product: Red Hat Gluster Storage [Red Hat Storage]
Reporter: spandura
Component: replicate
Assignee: Pranith Kumar K <pkarampu>
Status: CLOSED WONTFIX
QA Contact: Anees Patel <anepatel>
Severity: high
Docs Contact:
Priority: medium
Version: rhgs-3.1
CC: anepatel, nchilaka, pkarampu, ravishankar, rcyriac, rhinduja, rhs-bugs, sanandpa, vavuthu
Target Milestone: ---
Keywords: ZStream
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-04-16 18:08:49 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Scripts required to execute the case (flags: none)

Description spandura 2015-07-31 06:42:09 UTC
Description of problem:
========================
On a 2x2 distributed-replicate volume, nodes were replaced with new nodes using replace-brick commit force. After the replacement, a split-brain was observed on "/".
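
For reference, a minimal sketch of how such a split-brain on "/" can be observed, assuming a placeholder volume name <VOLNAME> (the actual volume name is not recorded in this report):

# gluster volume heal <VOLNAME> info split-brain
(lists the entries AFR considers to be in split-brain; an entry of "/" means the volume root itself)
# gluster volume heal <VOLNAME> info
(overall heal status, including entries still pending heal)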

Version-Release number of selected component (if applicable):
============================================================
glusterfs-3.7.1-10.el6rhs.x86_64

How reproducible:
================
2/2

Steps to Reproduce:
======================
1. Create a 2x2 distributed-replicate volume, start the volume, and create a fuse mount.

2. From the fuse mount, execute: self_heal_all_file_types.sh <mount-point> "glusterfs" "create"

3. Bring down one combination of bricks, chosen randomly, from the set below (one way to take a brick offline is sketched after the list):
volume_2_2_brick_takedown_combinations = [
    ["Brick1"],
    ["Brick2"],
    ["Brick3"],
    ["Brick4"],
    ["Brick1", "Brick3"],
    ["Brick1", "Brick4"],
    ["Brick2", "Brick3"],
    ["Brick2", "Brick4"], ]

4. Wait for the IO to complete and calculate the arequal-checksum.

5. Perform replace-brick commit force on the offline brick (see the command sketch after the steps).

6. Wait for the self-heal to complete. 

7. Verify that the arequal-checksum taken after self-heal matches the one taken before replace-brick (both should match).

8. From the fuse mount, execute: self_heal_all_file_types.sh <mount-point> "glusterfs" "modify"

9. Bring down another combination of bricks, chosen randomly, from the set below:
volume_2_2_brick_takedown_combinations = [
    ["Brick1"],
    ["Brick2"],
    ["Brick3"],
    ["Brick4"],
    ["Brick1", "Brick3"],
    ["Brick1", "Brick4"],
    ["Brick2", "Brick3"],
    ["Brick2", "Brick4"], ]

10. Wait for the IO to complete and calculate the arequal-checksum.

11. Perform replace-brick on the offline brick. 

12. Wait for the self-heal to complete. 

13. Verify that the arequal-checksum taken after self-heal matches the one taken before replace-brick (both should match).

Repeat the above until all the combinations have been executed. Note that once a brick combination has been brought down, it should not be brought down again.
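
As referenced in the steps above, a minimal sketch of the gluster commands involved, assuming placeholder names <VOLNAME>, server1..server4 and brick paths (none of these are recorded in this report), and assuming the arequal-checksum utility is installed:

Step 1 - create a 2x2 distributed-replicate volume and fuse-mount it:
# gluster volume create <VOLNAME> replica 2 server1:/bricks/b1 server2:/bricks/b2 server3:/bricks/b3 server4:/bricks/b4
# gluster volume start <VOLNAME>
# mount -t glusterfs server1:/<VOLNAME> /mnt/<VOLNAME>

Steps 4/10 - calculate the checksum of the mount point:
# arequal-checksum -p /mnt/<VOLNAME>

Steps 5/11 - replace the offline brick (per the bug summary, commit force is used):
# gluster volume replace-brick <VOLNAME> server1:/bricks/b1 server1:/bricks/b1_new commit force

Steps 6/12 - wait for self-heal to finish, i.e. until no entries are left pending:
# gluster volume heal <VOLNAME> info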

Actual results:
===============
While executing the above test case for the 4th time, when the bricks were brought offline, there was a GFID mismatch on the files.
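
A GFID mismatch of this kind can be confirmed directly on the back-end bricks; a minimal sketch, assuming placeholder brick paths for the replica pair (the actual paths are not shown here):

# getfattr -d -m . -e hex /bricks/b1/<affected-file>
# getfattr -d -m . -e hex /bricks/b2/<affected-file>
(the trusted.gfid xattr of the two replica copies should be identical; differing values indicate a GFID split-brain)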

Expected results:
================
Heal should be successful and no split-brain should be observed.

Comment 2 spandura 2015-07-31 09:35:56 UTC
Created attachment 1058005 [details]
Scripts required to execute the case

Comment 5 Anuradha 2015-08-27 11:02:14 UTC
Shwetha,

I don't have access rights to view the SOS reports. Please provide required permissions.

Comment 11 Ravishankar N 2017-11-28 10:22:35 UTC
Hi Vijay, can you see if you can re-create this issue?

Comment 15 Anees Patel 2019-07-04 16:14:43 UTC
Executed the steps described in comment#0.
Replace-brick was done and I/O was generated per the script in comment#0.
No split-brain was found.
Test passed 3/3.

Gluster version:
# rpm -qa | grep gluster
glusterfs-cli-6.0-7.el7rhgs.x86_64
glusterfs-api-6.0-7.el7rhgs.x86_64
glusterfs-resource-agents-6.0-7.el7rhgs.noarch
python2-gluster-6.0-7.el7rhgs.x86_64
glusterfs-geo-replication-6.0-7.el7rhgs.x86_64