Bug 1029778

Summary: AFR : Hardlinks of a file are not self-healed after resolving the split-brain on the file
Product: [Red Hat Storage] Red Hat Gluster Storage
Component: replicate
Version: 2.1
Status: CLOSED NOTABUG
Severity: high
Priority: unspecified
Reporter: spandura
Assignee: Pranith Kumar K <pkarampu>
QA Contact: Sudhir D <sdharane>
CC: pkarampu, rhs-bugs, spandura, storage-qa-internal, vbellur
Hardware: Unspecified
OS: Unspecified
Type: Bug
Doc Type: Bug Fix
Last Closed: 2014-06-30 09:34:11 UTC

Description spandura 2013-11-13 08:01:49 UTC
Description of problem:
==========================
In a 1 x 3 replicate volume, a file is in data split-brain. A hard link was created to the file, and the creation succeeded. The split-brain was then resolved by removing the file, its hard link, and the file's .glusterfs entry on two bricks {brick1, brick2}, treating brick3 as the good copy (a sketch of that removal follows the list below).

1) "ls -l" from the mount point doesn't list the hard link even after the split-brain on the file has been resolved.

2) The hard link is not self-healed.

3) "stat" on the file shows the number of links as "1" instead of the actual number of hard links.
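
For reference, a minimal sketch of the resolution step on one of the bad bricks (the brick root and file names are taken from the comments below; the gfid shown is illustrative, and the .glusterfs path follows the standard <gfid[0:2]>/<gfid[2:4]>/<full-gfid> layout):

# on brick1 (repeat on brick2), assuming the brick root is /rhs/bricks/b1
getfattr -n trusted.gfid -e hex /rhs/bricks/b1/test_file1
# -> trusted.gfid=0xaabbccdd112233445566778899aabbcc  (illustrative value)
rm /rhs/bricks/b1/test_file1 /rhs/bricks/b1/h_test_file
rm /rhs/bricks/b1/.glusterfs/aa/bb/aabbccdd-1122-3344-5566-778899aabbcc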

Version-Release number of selected component (if applicable):
=============================================================
glusterfs 3.4.0.35.1u2rhs built on Oct 21 2013 14:00:58

How reproducible:
=================
Often

Steps to Reproduce:
====================
1. Create a 1 x 3 replicate volume. Set self-heal-daemon to off. Start the volume.

2. Create 2 fuse mounts.

3. From one mount point create a file: "dd if=/dev/urandom of=./test_file bs=1M count=1"

4. Bring down brick1 and brick2.

5. From mount1 write data to "test_file": "dd if=/dev/urandom of=./test_file bs=2M count=1"

6. Bring back brick1. Bring down brick3.

7. From mount2 write data to "test_file": "dd if=/dev/urandom of=./test_file bs=3M count=1"

8. Bring back brick2. Bring down brick1.

9. From mount2 write data to "test_file": "dd if=/dev/urandom of=./test_file bs=4M count=1"

10. Bring back brick1 and brick3.

Note: At this point the file is in split-brain.

11. From mount2, create a hard link to the split-brain file {successful}

12. Rename "test_file" to "test_file1" {successful}

13. Create a symbolic link to "test_file1" {successful}

14. Resolve the split-brain on brick1 and brick2, i.e. remove test_file1, the hard link file, and the .glusterfs entry on brick1 and brick2. Retain the brick3 copy.
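
A rough shell transcription of steps 1-13 (volume, brick, and file names follow comments 2 and 3; the mount points and the way bricks are taken down are assumptions, one common method being to kill the brick's glusterfsd process on its node):

# step 1: create, configure and start the volume
gluster volume create vol_rep replica 3 rhs-client11:/rhs/bricks/b1 \
    rhs-client12:/rhs/bricks/b1-rep1 rhs-client13:/rhs/bricks/b1-rep2
gluster volume set vol_rep cluster.self-heal-daemon off
gluster volume start vol_rep
# step 2: two fuse mounts (mount points are illustrative)
mount -t glusterfs rhs-client11:/vol_rep /mnt/m1
mount -t glusterfs rhs-client11:/vol_rep /mnt/m2
# step 3: seed the file
dd if=/dev/urandom of=/mnt/m1/test_file bs=1M count=1
# steps 4-5: with brick1 and brick2 down (kill their glusterfsd processes)
dd if=/dev/urandom of=/mnt/m1/test_file bs=2M count=1
# steps 6-7: restart brick1 ("gluster volume start vol_rep force"), brick3 down
dd if=/dev/urandom of=/mnt/m2/test_file bs=3M count=1
# steps 8-9: restart brick2, brick1 down
dd if=/dev/urandom of=/mnt/m2/test_file bs=4M count=1
# step 10: restart brick1 and brick3 -> the file is now in split-brain
gluster volume start vol_rep force
# steps 11-13: hard link, rename, symlink (names per comment 3)
ln /mnt/m2/test_file /mnt/m2/h_test_file
mv /mnt/m2/test_file /mnt/m2/test_file1
ln -s test_file1 /mnt/m2/s_test_file1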

Actual results:
==================
1. "ls -l" from mount point doesn't report the hard links. 

2. "test_file1" is self-healed to brick1 and brick2. 

3. Hard-links are not self-healed. 

4. stat shows number of Links as "1" 

root@rhs-client14 [Nov-13-2013- 6:03:06] >stat test_file1 
  File: `test_file1'
  Size: 2097152   	Blocks: 4096       IO Block: 131072 regular file
Device: 1eh/30d	Inode: 13299062108449068165  Links: 1
Access: (0666/-rw-rw-rw-)  Uid: (  501/ qa_func)   Gid: (  503/qa_system)
Access: 2013-11-12 10:37:53.000000000 +0000
Modify: 2013-11-12 12:06:25.281715000 +0000
Change: 2013-11-12 12:40:40.562173454 +0000

Expected results:
=================
Hard links should also be self-healed.

Comment 2 spandura 2013-11-13 08:27:43 UTC
SOS Reports: http://rhsqe-repo.lab.eng.blr.redhat.com/bugs_necessary_info/1029778/


root@rhs-client11 [Nov-13-2013- 8:03:35] >gluster v info
 
Volume Name: vol_rep
Type: Replicate
Volume ID: d75d19c8-fb2f-475e-915c-d24d4dede1e3
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: rhs-client11:/rhs/bricks/b1
Brick2: rhs-client12:/rhs/bricks/b1-rep1
Brick3: rhs-client13:/rhs/bricks/b1-rep2
Options Reconfigured:
nfs.disable: on
cluster.self-heal-daemon: off
root@rhs-client11 [Nov-13-2013- 8:27:22] >

Comment 3 Pranith Kumar K 2013-12-27 09:46:49 UTC
Shwetha,
     I followed the steps and saw the expected behavior instead of the bug. How was the resolution of the split-brain performed in this setup? After the file, its hardlink and the gfid-link are removed, we need to access both the file and its hardlink from the mount point to make sure both are healed. 'ls -l' may not show the file until this is done, because afr does not yet know that 'brick3' is the source: there are no extended attributes left to say so.

You can see this in the following output:
Initially 'find . | xargs stat' shows only the softlink, no files at all.
But once both 'test_fil1' and 'h_test_file' are accessed, both files are re-created, and a further 'ls -l' shows all the files as expected.
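
That is, each name has to be looked up explicitly from the mount; a minimal sketch, assuming the mount point /mnt/r2 and the file names from the transcript below:

ls -l /mnt/r2/test_fil1     # named lookup lets afr heal this entry from brick3
ls -l /mnt/r2/h_test_file   # the hard link must be looked up by name as well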

root@pranithk-laptop - /mnt/r2 
15:04:21 :) ⚡ find . | xargs stat
  File: ‘.’
  Size: 42        	Blocks: 1          IO Block: 131072 directory
Device: 24h/36d	Inode: 1           Links: 3
Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Context: system_u:object_r:fusefs_t:s0
Access: 2013-12-27 15:06:17.378897044 +0530
Modify: 2013-12-27 15:06:11.103890634 +0530
Change: 2013-12-27 15:06:11.103890634 +0530
 Birth: -
  File: ‘./s_test_file1’ -> ‘test_fil1’
  Size: 9         	Blocks: 0          IO Block: 131072 symbolic link
Device: 24h/36d	Inode: 9866554501010886634  Links: 1
Access: (0777/lrwxrwxrwx)  Uid: (    0/    root)   Gid: (    0/    root)
Context: system_u:object_r:fusefs_t:s0
Access: 2013-12-27 15:06:17.378897044 +0530
Modify: 2013-12-27 15:04:21.226778369 +0530
Change: 2013-12-27 15:04:21.226778369 +0530
 Birth: -

root@pranithk-laptop - /mnt/r2 
15:07:33 :) ⚡ ls -l test_fil1
-rw-r--r--. 2 root root 2097152 Dec 27 14:58 test_fil1

root@pranithk-laptop - /mnt/r2 
15:08:08 :) ⚡ ls -l h_test_file
-rw-r--r--. 2 root root 2097152 Dec 27 14:58 h_test_file

root@pranithk-laptop - /mnt/r2 
15:08:14 :) ⚡ ls -l
total 4096
-rw-r--r--. 2 root root 2097152 Dec 27 14:58 h_test_file
lrwxrwxrwx. 1 root root       9 Dec 27 15:04 s_test_file1 -> test_fil1
-rw-r--r--. 2 root root 2097152 Dec 27 14:58 test_fil1

Pranith.

Comment 5 Red Hat Bugzilla 2023-09-14 01:53:34 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days