Bug 975754 - afr: self heal of files failed when simulated a disk replacement scenario
Status: CLOSED ERRATA
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: glusterd
2.1
x86_64 Linux
Priority: medium  Severity: urgent
Assigned To: Pranith Kumar K
spandura
Keywords: TestBlocker
Depends On:
Blocks: 986905
 
Reported: 2013-06-19 05:31 EDT by Rahul Hinduja
Modified: 2013-09-23 18:29 EDT (History)
9 users

See Also:
Fixed In Version: glusterfs-3.4.0.12rhs.beta6-1
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 986905
Environment:
Last Closed: 2013-09-23 18:29:52 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Rahul Hinduja 2013-06-19 05:31:19 EDT
Description of problem:
=======================

Self-heal of hard-linked files failed after simulating a disk replacement scenario and then healing with "heal full".

Files were created as regular empty files on the bricks whose directories had been removed and recreated with the same name, followed by volume start force and heal full.

The detailed case is described below:


Version-Release number of selected component (if applicable):
=============================================================

glusterfs-3.4.0.9rhs-1.el6rhs.x86_64
glusterfs-geo-replication-3.4.0.9rhs-1.el6rhs.x86_64
glusterfs-fuse-3.4.0.9rhs-1.el6rhs.x86_64
glusterfs-rdma-3.4.0.9rhs-1.el6rhs.x86_64
glusterfs-devel-3.4.0.9rhs-1.el6rhs.x86_64
glusterfs-server-3.4.0.9rhs-1.el6rhs.x86_64
glusterfs-debuginfo-3.4.0.9rhs-1.el6rhs.x86_64

Steps Carried:
==============
1. Created a 6x2 distributed-replicate volume across 4 servers (server1-server4).
2. Mounted it on a client (FUSE and NFS).
3. Created directories f and n from the mount point.
4. Changed into f from the FUSE mount.
5. Changed into n from the NFS mount.
6. Created a test_hardlink_self_heal directory under both f and n.

7. Created files and directories under test_hardlink_self_heal from mount points f and n:

   cd test_hardlink_self_heal ; for i in `seq 1 5` ; do mkdir dir.$i ; for j in `seq 1 10` ; do dd if=/dev/input_file of=dir.$i/file.$j bs=1k count=$j ; done ; done ; cd ../

8. Brought down the bricks on server2 and powered off server4.

9. Created hard links to the files under test_hardlink_self_heal/dir.* from mount points f and n:

   cd test_hardlink_self_heal ; for i in `seq 1 5` ; do for j in `seq 1 10` ; do ln dir.$i/file.$j dir.$i/link_file.$j ; done ; done ; cd ../

10. Brought back server4 and started the volume forcefully.

11. Self heal started and completed successfully.

12. Verified that the hard links were self-healed, from the mount point:

   ( cd test_hardlink_self_heal ; for i in `seq 1 5` ; do for j in `seq 1 10` ; do if [ `stat -c %i dir.$i/file.$j` != `stat -c %i dir.$i/link_file.$j` ] ; then exit 1 ; fi ; done ; done )

echo $? showed exit status 0.
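The inode comparison in step 12 can be wrapped into a small reusable helper. The sketch below runs against a scratch directory rather than the gluster mount; the function name and layout sizes are illustrative, not from the report:

```shell
# check_links DIR NDIRS NFILES: succeed only if every dir.i/file.j and
# dir.i/link_file.j pair under DIR shares an inode (i.e. is a real hard link).
check_links() {
    dir=$1; ndirs=$2; nfiles=$3
    for i in $(seq 1 "$ndirs"); do
        for j in $(seq 1 "$nfiles"); do
            a=$(stat -c %i "$dir/dir.$i/file.$j")
            b=$(stat -c %i "$dir/dir.$i/link_file.$j")
            [ "$a" = "$b" ] || return 1
        done
    done
    return 0
}

# Local demo on a miniature layout (one dir, one file).
tmp=$(mktemp -d)
mkdir "$tmp/dir.1"
echo data > "$tmp/dir.1/file.1"
ln "$tmp/dir.1/file.1" "$tmp/dir.1/link_file.1"
check_links "$tmp" 1 1 && echo "hard links intact"   # prints "hard links intact"
rm -rf "$tmp"
```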

13. Brought down the bricks on server1 and server3 (kill -9).

14. Created hard links to the files under test_hardlink_self_heal/dir.* in new directories, from the mount point:

   cd test_hardlink_self_heal ; for i in `seq 1 5` ; do mkdir new_dir.$i ; for j in `seq 1 10` ; do ln dir.$i/file.$j new_dir.$i/new_file.$j ; done ; done ; cd ../

15. Brought back all offlined brick processes using volume start force.

16. Verified that the new hard links were self-healed, from the mount point:

   ( cd test_hardlink_self_heal ; for i in `seq 1 5` ; do for j in `seq 1 10` ; do if [ `stat -c %i dir.$i/file.$j` != `stat -c %i new_dir.$i/new_file.$j` ] ; then exit 1 ; fi ; done ; done )

echo $? showed exit status 0.

17. Brought down all brick processes on server2 and server4 (kill -9).

18. Removed one brick directory from each replica subvolume to simulate a disk replacement (removed b2, b4, b6 on server2 and b8, b10, b12 on server4).

19. Recreated the removed directories with the same names (b2, b4, b6 on server2 and b8, b10, b12 on server4).

20. Started the volume forcefully, which succeeded.

21. Started a full heal using "gluster volume heal <vol-name> full".

22. A few of the files were created on server2 and server4 as regular empty files.
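Steps 17-21 amount to the sequence below. The gluster commands are left as comments since they need a live cluster, and the brick path is a scratch stand-in for e.g. /rhs/brick1/b2 (illustrative only):

```shell
# Stand-in brick directory; on the real setup this is e.g. /rhs/brick1/b2.
BRICK=$(mktemp -d)/b2
mkdir -p "$BRICK/test_hardlink_self_heal"
echo data > "$BRICK/test_hardlink_self_heal/file.1"

# Step 17: kill the brick processes on server2/server4:
#   kill -9 <brick-pid>
# Step 18: simulate a replaced disk by removing the brick directory.
rm -rf "$BRICK"
# Step 19: recreate an empty directory with the same name.
#          It now holds no data, no gfid links, no trusted.afr xattrs.
mkdir -p "$BRICK"
# Step 20:  gluster volume start <vol-name> force
# Step 21:  gluster volume heal <vol-name> full
ls -A "$BRICK"   # empty: the full heal must repopulate data AND hard links
```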

Actual results:
===============

The arequal checksums did not match between bricks of the same replica pair. A few files were created on server2 and server4 as 0-byte files.

The number of hard links differs between the bricks of a replica pair.
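The link-count discrepancy can be spotted directly with stat's %h format. The comparison below runs on two scratch "bricks" to illustrate; the paths and helper name are placeholders, not the report's:

```shell
# Report the hard-link count of a file.
link_count() { stat -c %h "$1"; }

good=$(mktemp -d)   # stand-in for /rhs/brick1/b1 (healthy brick)
bad=$(mktemp -d)    # stand-in for /rhs/brick1/b2 (replaced brick)
echo data > "$good/file.1"
ln "$good/file.1" "$good/link_file.1"   # healthy copy: 2 links
: > "$bad/file.1"                       # healed as a lone empty file: 1 link

if [ "$(link_count "$good/file.1")" != "$(link_count "$bad/file.1")" ]; then
    echo "link counts differ"          # prints "link counts differ"
fi
```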

Files in question:
==================

b1/b2

< /rhs/brick1/b1/n/test_hardlink_self_heal/dir.2/file.1 0x0577a082ec314310ad5a33f1a0189977 1024
---
> /rhs/brick1/b2/n/test_hardlink_self_heal/dir.2/file.1 0x0577a082ec314310ad5a33f1a0189977 0


b3/b4

17c17
< /rhs/brick1/b3/f/test_hardlink_self_heal/dir.2/file.6 0x0be124943ec64ad39e83a5f84b950855 6144
---
> /rhs/brick1/b4/f/test_hardlink_self_heal/dir.2/file.6 0x0be124943ec64ad39e83a5f84b950855 0
20c20
< /rhs/brick1/b3/f/test_hardlink_self_heal/dir.3/file.6 0x0845cb18d7314d6a90abe4912324eff5 6144
---
> /rhs/brick1/b4/f/test_hardlink_self_heal/dir.3/file.6 0x0845cb18d7314d6a90abe4912324eff5 0
81c81
< /rhs/brick1/b3/n/test_hardlink_self_heal/dir.3/file.1 0x166b00abee30437699da3d1a8a9cc484 1024
---
> /rhs/brick1/b4/n/test_hardlink_self_heal/dir.3/file.1 0x166b00abee30437699da3d1a8a9cc484 0


b5/b6

5c5
< /rhs/brick1/b5/f/test_hardlink_self_heal/dir.1/file.6 0x77f745a0f5cf4efca2da40a75de76069 6144
---
> /rhs/brick1/b6/f/test_hardlink_self_heal/dir.1/file.6 0x77f745a0f5cf4efca2da40a75de76069 0
20c20
< /rhs/brick1/b5/f/test_hardlink_self_heal/dir.4/file.6 0xb6ef74f3fed74971a6cd352acb69fca8 6144
---
> /rhs/brick1/b6/f/test_hardlink_self_heal/dir.4/file.6 0xb6ef74f3fed74971a6cd352acb69fca8 0
64c64
< /rhs/brick1/b5/n/test_hardlink_self_heal/dir.2/file.6 0xd11ade94f7084a778d6e20e4aff9a860 6144
---
> /rhs/brick1/b6/n/test_hardlink_self_heal/dir.2/file.6 0xd11ade94f7084a778d6e20e4aff9a860 0


b7/b8

23c23
< /rhs/brick1/b7/f/test_hardlink_self_heal/dir.5/file.6 0xece85a9a422247fa9836ee1aeec8117b 6144
---
> /rhs/brick1/b8/f/test_hardlink_self_heal/dir.5/file.6 0xece85a9a422247fa9836ee1aeec8117b 0
48c48
< /rhs/brick1/b7/n/test_hardlink_self_heal/dir.1/file.1 0x582860c6ed31459fb9c913d60b8530e5 1024
---
> /rhs/brick1/b8/n/test_hardlink_self_heal/dir.1/file.1 0x582860c6ed31459fb9c913d60b8530e5 0
60c60
< /rhs/brick1/b7/n/test_hardlink_self_heal/dir.3/file.6 0x26c5ab665e8149e49719e054e30a3dc6 6144
---
> /rhs/brick1/b8/n/test_hardlink_self_heal/dir.3/file.6 0x26c5ab665e8149e49719e054e30a3dc6 0
75c75
< /rhs/brick1/b7/n/test_hardlink_self_heal/dir.5/file.6 0x49ce57de4a8746949a98a5fcccc6df60 6144
---
> /rhs/brick1/b8/n/test_hardlink_self_heal/dir.5/file.6 0x49ce57de4a8746949a98a5fcccc6df60 0


b9/b10

< /rhs/brick1/b9/n/test_hardlink_self_heal/dir.4/file.6 0x5d14b14e4dd44ed885af89b6a2045fa5 6144
---
> /rhs/brick1/b10/n/test_hardlink_self_heal/dir.4/file.6 0x5d14b14e4dd44ed885af89b6a2045fa5 0


b11/b12

8c8
< /rhs/brick1/b11/f/test_hardlink_self_heal/dir.2/file.1 0x268ebce2c8a64d46bacc55bb9722142d 1024
---
> /rhs/brick1/b12/f/test_hardlink_self_heal/dir.2/file.1 0x268ebce2c8a64d46bacc55bb9722142d 0
46c46
< /rhs/brick1/b11/n/test_hardlink_self_heal/dir.1/file.6 0xbd6bfd4c5df04ef9b21359aec3f2913b 6144
---
> /rhs/brick1/b12/n/test_hardlink_self_heal/dir.1/file.6 0xbd6bfd4c5df04ef9b21359aec3f2913b 0



Stat on the file in question for b1/b2:
=======================================

# stat /rhs/brick1/b1/n/test_hardlink_self_heal/dir.2/file.1
  File: `/rhs/brick1/b1/n/test_hardlink_self_heal/dir.2/file.1'
  Size: 1024      	Blocks: 8          IO Block: 4096   regular file
Device: fd02h/64770d	Inode: 1744831047  Links: 4
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2013-06-19 11:54:54.846039000 -0400
Modify: 2013-06-19 11:54:54.849039000 -0400
Change: 2013-06-19 12:51:20.620036657 -0400


# stat /rhs/brick1/b2/n/test_hardlink_self_heal/dir.2/file.1
  File: `/rhs/brick1/b2/n/test_hardlink_self_heal/dir.2/file.1'
  Size: 0         	Blocks: 0          IO Block: 4096   regular empty file
Device: fd02h/64770d	Inode: 1006633866  Links: 2
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2013-06-19 12:42:40.095142205 -0400
Modify: 2013-06-19 12:42:40.095142205 -0400
Change: 2013-06-19 12:42:40.095142205 -0400


Getfattr:
=========

# getfattr -d -e hex -m . /rhs/brick1/b1/n/test_hardlink_self_heal/dir.2/file.1
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/b1/n/test_hardlink_self_heal/dir.2/file.1
trusted.afr.vol-dis-rep-client-0=0x000000000000000000000000
trusted.afr.vol-dis-rep-client-1=0x000000000000000000000000
trusted.gfid=0x0577a082ec314310ad5a33f1a0189977


# getfattr -d -e hex -m . /rhs/brick1/b2/n/test_hardlink_self_heal/dir.2/file.1
getfattr: Removing leading '/' from absolute path names
# file: rhs/brick1/b2/n/test_hardlink_self_heal/dir.2/file.1
trusted.gfid=0x0577a082ec314310ad5a33f1a0189977
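Each trusted.afr.<client> value above packs three big-endian 32-bit pending-operation counters (data, metadata, entry); an all-zero value means no heals pending toward that brick, while b2's copy carries no trusted.afr keys at all, only the gfid. A small decoder for the 24-hex-digit value can be sketched as (the helper name is ours, not a gluster tool):

```shell
# decode_afr HEXVALUE: split a trusted.afr changelog value into its three
# 32-bit counters: pending data, metadata, and entry operations.
decode_afr() {
    hex=${1#0x}
    d=$(echo "$hex" | cut -c1-8)
    m=$(echo "$hex" | cut -c9-16)
    e=$(echo "$hex" | cut -c17-24)
    echo "data=$((0x$d)) metadata=$((0x$m)) entry=$((0x$e))"
}

decode_afr 0x000000000000000000000000   # prints "data=0 metadata=0 entry=0"
```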


arequal mismatch between servers:
=================================

server1:
========

# ./areequal-checksum 
----------------------- Subvolume: 1----------------------

Entry counts
Regular files   : 83
Directories     : 25
Symbolic links  : 0
Other           : 0
Total           : 108

Metadata checksums
Regular files   : 48a1e9
Directories     : 24d74c
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : 42e2a6e4d9c94a782ce440b6c62f0572
Directories     : 656c6e666f7b7c48
Symbolic links  : 0
Other           : 0
Total           : b6a8834709d3342
----------------------- Subvolume: 2----------------------

Entry counts
Regular files   : 90
Directories     : 25
Symbolic links  : 0
Other           : 0
Total           : 115

Metadata checksums
Regular files   : 20c85
Directories     : 24d74c
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : 7172e858de313ff41f7951c0bc5dbf80
Directories     : 6c595f710e040667
Symbolic links  : 0
Other           : 0
Total           : 252e6e96c688613
----------------------- Subvolume: 3----------------------

Entry counts
Regular files   : 96
Directories     : 25
Symbolic links  : 0
Other           : 0
Total           : 121

Metadata checksums
Regular files   : 3e9
Directories     : 24d74c
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : fd09adca15b119833cb5c676fa4d5d38
Directories     : 300f0000332f08
Symbolic links  : 0
Other           : 0
Total           : c18c64bcefcf6bb3


server2:
========

# ./areequal-checksum 
----------------------- Subvolume: 1----------------------

Entry counts
Regular files   : 83
Directories     : 25
Symbolic links  : 0
Other           : 0
Total           : 108

Metadata checksums
Regular files   : 48a1e9
Directories     : 24d74c
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : 99c291b322a39148acb823bf0869c70c
Directories     : 656c6e666f7b7c48
Symbolic links  : 0
Other           : 0
Total           : 5016dc6a45b12a0c
----------------------- Subvolume: 2----------------------

Entry counts
Regular files   : 90
Directories     : 25
Symbolic links  : 0
Other           : 0
Total           : 115

Metadata checksums
Regular files   : 20c85
Directories     : 24d74c
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : aa52df0f255be4c49f2532c9721b7dfe
Directories     : 6c595f710e040667
Symbolic links  : 0
Other           : 0
Total           : 592eb2b759449f5d
----------------------- Subvolume: 3----------------------

Entry counts
Regular files   : 96
Directories     : 25
Symbolic links  : 0
Other           : 0
Total           : 121

Metadata checksums
Regular files   : 3e9
Directories     : 24d74c
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : 7172e858de313ff41f7951c0bc5dbf80
Directories     : 300f0000332f08
Symbolic links  : 0
Other           : 0
Total           : 6e3bb698625faf7c


server3:
========

# ./areequal-checksum 
----------------------- Subvolume: 4----------------------

Entry counts
Regular files   : 72
Directories     : 25
Symbolic links  : 0
Other           : 0
Total           : 97

Metadata checksums
Regular files   : 3e9
Directories     : 24d74c
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : 8079246448e1d641dd27b85a7e692906
Directories     : 9350f39041e3c41
Symbolic links  : 0
Other           : 0
Total           : 546b93073296c306
----------------------- Subvolume: 5----------------------

Entry counts
Regular files   : 64
Directories     : 25
Symbolic links  : 0
Other           : 0
Total           : 89

Metadata checksums
Regular files   : 3e9
Directories     : 24d74c
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : 6bf687f74378e1fff9d18f58d09a1a34
Directories     : 300800300f2808
Symbolic links  : 0
Other           : 0
Total           : 921700afa3edd3c3
----------------------- Subvolume: 6----------------------

Entry counts
Regular files   : 53
Directories     : 25
Symbolic links  : 0
Other           : 0
Total           : 78

Metadata checksums
Regular files   : 486e85
Directories     : 24d74c
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : 251640e519d05bb12bdee0022eccd4f8
Directories     : 392e555d4a6e
Symbolic links  : 0
Other           : 0
Total           : ec899c96241c527


server4:
========

# ./areequal-checksum 
----------------------- Subvolume: 4----------------------

Entry counts
Regular files   : 72
Directories     : 25
Symbolic links  : 0
Other           : 0
Total           : 97

Metadata checksums
Regular files   : 3e9
Directories     : 24d74c
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : d72256a1780b2b067eb74ce5f63f09c0
Directories     : 9350f39041e3c41
Symbolic links  : 0
Other           : 0
Total           : a0a0157d8a2a1e87
----------------------- Subvolume: 5----------------------

Entry counts
Regular files   : 64
Directories     : 25
Symbolic links  : 0
Other           : 0
Total           : 89

Metadata checksums
Regular files   : 3e9
Directories     : 24d74c
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : e78dc26588f8c788da1d18ee968af88c
Directories     : 300800300f2808
Symbolic links  : 0
Other           : 0
Total           : 3da0d28b2e7d170c
----------------------- Subvolume: 6----------------------

Entry counts
Regular files   : 53
Directories     : 25
Symbolic links  : 0
Other           : 0
Total           : 78

Metadata checksums
Regular files   : 486e85
Directories     : 24d74c
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : 724d3220293aa6f6884e14bda69af43e
Directories     : 392e555d4a6e
Symbolic links  : 0
Other           : 0
Total           : fa031fb3dafd18a6

Expected results:
==================

The arequal checksums should match and the hard-linked files should heal successfully.
Comment 6 spandura 2013-07-25 07:18:54 EDT
Verified the fix on the build:
==============================
root@rhs-client11 [Jul-25-2013-16:45:16] >rpm -qa | grep glusterfs-server
glusterfs-server-3.4.0.12rhs.beta6-1.el6rhs.x86_64

root@rhs-client11 [Jul-25-2013-16:45:22] >gluster --version
glusterfs 3.4.0.12rhs.beta6 built on Jul 23 2013 16:20:03

Bug is fixed.
Comment 7 Scott Haines 2013-09-23 18:29:52 EDT
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. 

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html
