Bug 1248897 - Split-Brain found on "/" after replacing a brick using replace-brick commit force
Status: ASSIGNED
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: replicate
Version: 3.1
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Assigned To: Pranith Kumar K
QA Contact: storage-qa-internal@redhat.com
Keywords: ZStream

Reported: 2015-07-31 02:42 EDT by spandura
Modified: 2017-11-17 13:19 EST
CC List: 6 users

Doc Type: Bug Fix
Type: Bug

Attachments
Scripts required to execute the case (12.40 KB, text/x-matlab)
2015-07-31 05:35 EDT, spandura

Description spandura 2015-07-31 02:42:09 EDT
Description of problem:
========================
On a 2x2 distributed-replicate volume, nodes were replaced with new nodes using replace-brick commit force. After replacing the nodes, a split-brain was observed on "/".
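
The split-brain on "/" can be confirmed from the self-heal daemon's view of the volume. A minimal sketch, assuming a hypothetical volume name "testvol" (not taken from this setup):

# List entries in split-brain; "/" showing up in this output means
# the root directory itself is in split-brain.
gluster volume heal testvol info split-brain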

Version-Release number of selected component (if applicable):
============================================================
glusterfs-3.7.1-10.el6rhs.x86_64

How reproducible:
================
2/2

Steps to Reproduce:
======================
1. Create a 2 x 2 distributed-replicate volume, start the volume and create a fuse mount.
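
A minimal sketch of this step, assuming hypothetical host names (server1..server4), brick paths and the volume name "testvol":

# Create and start a 2x2 distributed-replicate volume (4 bricks, replica 2).
gluster volume create testvol replica 2 \
    server1:/bricks/brick0 server2:/bricks/brick0 \
    server3:/bricks/brick0 server4:/bricks/brick0
gluster volume start testvol

# Mount the volume over FUSE on a client.
mount -t glusterfs server1:/testvol /mnt/testvol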

2. From the fuse mount, execute "self_heal_all_file_types.sh <mount-point> glusterfs create".

3. Bring down a random combination of bricks from the following set (one way to take a brick offline is sketched after the list):
volume_2_2_brick_takedown_combinations = [
    ["Brick1"],
    ["Brick2"],
    ["Brick3"],
    ["Brick4"],
    ["Brick1", "Brick3"],
    ["Brick1", "Brick4"],
    ["Brick2", "Brick3"],
    ["Brick2", "Brick4"], ]

4. Wait for the I/O to complete and calculate the arequal-checksum.
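
A sketch of the checksum step; it assumes the arequal-checksum tool is available on the client, that the mount point is /mnt/testvol, and that "-p" is its path option (all assumptions):

# Compute the arequal checksum over the whole mount point and record it
# for comparison after replace-brick and self-heal.
arequal-checksum -p /mnt/testvol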

5. Perform replace-brick (commit force) on the offline brick.
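
A sketch of the replace-brick step, with hypothetical host names and brick paths; the old brick is the one currently offline and the new brick is a fresh path on the replacement node:

# Replace the offline brick with a brand-new brick.
gluster volume replace-brick testvol \
    server2:/bricks/brick0 server5:/bricks/brick0_new commit force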

6. Wait for the self-heal to complete. 
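
A sketch of how heal progress can be checked (volume name again hypothetical):

# Heal is complete when "Number of entries" is 0 for every brick.
gluster volume heal testvol info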

7. Compare the arequal-checksum taken after self-heal with the one taken before replace-brick (both should match).

8. From the fuse mount, execute "self_heal_all_file_types.sh <mount-point> glusterfs modify".

9. Bring down a random combination of bricks from the same set listed in step 3.

10. Wait for the I/O to complete and calculate the arequal-checksum.

11. Perform replace-brick (commit force) on the offline brick.

12. Wait for the self-heal to complete. 

13. Compare the arequal-checksum taken after self-heal with the one taken before replace-brick (both should match).

Repeat the above steps until all the combinations have been exercised. A brick combination that has already been brought down must not be brought down again.

Actual results:
===============
While executing the above test case for the 4th time (when the bricks were brought offline), a GFID mismatch was observed on the files.
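
The GFID mismatch can be inspected directly on the backend bricks of the affected replica pair; a sketch with hypothetical brick and file paths:

# Run on each node of the replica pair; a differing trusted.gfid value
# for the same path is the GFID mismatch.
getfattr -d -m . -e hex /bricks/brick0/<path-to-file>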

Expected results:
================
Self-heal should complete successfully and no split-brain should be reported on "/".
Comment 2 spandura 2015-07-31 05:35:56 EDT
Created attachment 1058005 [details]
Scripts required to execute the case
Comment 5 Anuradha 2015-08-27 07:02:14 EDT
Shwetha,

I don't have access rights to view the SOS reports. Please provide the required permissions.
