Bug 1294597 - data self-heal happened from sink brick to source brick
Status: CLOSED WONTFIX
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: replicate
Version: 3.1
Hardware: Unspecified Unspecified
Priority: medium Severity: urgent
Assigned To: Ravishankar N
QA Contact: storage-qa-internal@redhat.com
Keywords: ZStream
Depends On:
Blocks:
Reported: 2015-12-29 01:48 EST by spandura
Modified: 2018-04-16 14:04 EDT (History)

Doc Type: Bug Fix
Type: Bug

Attachments
shell script to create/modify/truncate files (65.26 KB, application/x-shellscript)
2015-12-29 04:02 EST, spandura
Distaf log. (507.17 KB, text/plain)
2015-12-31 04:00 EST, Ravishankar N

Description spandura 2015-12-29 01:48:14 EST
Description of problem:
========================
On a tiered volume with a 2x3 distributed-replicate cold tier and a 2x3 distributed-replicate hot tier, files were modified (all the files truncated to size 0) while some bricks were being brought offline. The following observations were made.

1) AFR extended attributes were not set on the source bricks to record the write failures on the bricks that were brought offline.

2) Because the extended attributes were not set, the self-heal daemon did not heal the files.

3) heal info showed the number of entries as '0'.

4) Calculating arequal-checksum from the mount triggered a heal of the files from sink to source, i.e. the sink brick held the larger files whereas the source bricks held the 0-sized files, and no AFR extended attributes were set on the files to indicate which bricks were source and which were sink.

5) All the files were on the hot tier.
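
As a reading aid for the xattr dumps in this report, here is a minimal sketch that decodes an AFR changelog value such as trusted.afr.dirty. It assumes the usual AFR layout of three big-endian 32-bit counters (data, metadata, entry), which is not spelled out in this report; the function name decode_afr_xattr is hypothetical and not part of any Gluster tooling.

```shell
#!/bin/sh
# Decode a 12-byte AFR changelog value, as printed by `getfattr -e hex`,
# into three counters. ASSUMPTION: the layout is data, metadata, entry,
# each a big-endian 32-bit integer.
decode_afr_xattr() {
    local hex
    hex=${1#0x}
    case $hex in
        ""|*[!0-9a-fA-F]*) echo "not a hex value: $1" >&2; return 1 ;;
    esac
    if [ "${#hex}" -ne 24 ]; then
        echo "expected 24 hex digits, got ${#hex}" >&2
        return 1
    fi
    # Split the 24 hex digits into three 8-digit fields and print them in decimal.
    printf 'data=%d metadata=%d entry=%d\n' \
        "0x$(printf '%s' "$hex" | cut -c1-8)" \
        "0x$(printf '%s' "$hex" | cut -c9-16)" \
        "0x$(printf '%s' "$hex" | cut -c17-24)"
}

# Value seen on every file in this report: nothing pending.
decode_afr_xattr 0x000000000000000000000000   # data=0 metadata=0 entry=0
# Made-up value for illustration only: five pending data operations.
decode_afr_xattr 0x000000050000000000000000   # data=5 metadata=0 entry=0
```

An all-zero value on every brick is consistent with observations 1 to 3: nothing tells the self-heal daemon that any copy is stale.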

Version-Release number of selected component (if applicable):
===========================================================
glusterfs-3.7.5-13.el6rhs.x86_64

How reproducible:
==================
1/3 

Steps to Reproduce:
===================
1. Create a tiered volume with 2x3 dis-rep cold and hot tiers. Start the volume. Create a fuse mount.

2. Create files on the mount point.

3. Repeat the following steps for each of the operations (modify the files, truncate the files to size 0):

a. Start the operation on the mount.
b. Bring down certain bricks from each subvolume.
c. After the modification of the files is complete, calculate arequal-checksum.
d. Bring the bricks back online.
e. Wait for self-heal to complete.
f. Once self-heal is complete, calculate arequal-checksum.
g. Compare the checksums calculated at (c) and (f). They should be the same.
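
The comparison in steps (c), (f) and (g) can be sketched roughly as follows. This is only a stand-in: tree_checksum and checksums_match are hypothetical helpers, and the sketch hashes file paths and contents with sha256sum rather than calling arequal-checksum, which also accounts for metadata. It relies on GNU find, sort and xargs.

```shell
#!/bin/sh
# Stand-in for arequal-checksum (hypothetical helper, NOT the real tool):
# hash every regular file under a directory in sorted path order, then
# reduce the per-file hashes to a single digest.
tree_checksum() {
    [ -d "$1" ] || { echo "not a directory: $1" >&2; return 1; }
    (
        cd "$1" || exit 1
        find . -type f -print0 | sort -z | xargs -0 -r sha256sum
    ) | sha256sum | cut -d' ' -f1
}

# Step (g): succeed only if the two digests are identical.
checksums_match() {
    [ "$1" = "$2" ]
}

# Usage (the mount path is an assumption):
#   before=$(tree_checksum /mnt/testvol)    # step (c)
#   ... bring the bricks back, wait for self-heal ...
#   after=$(tree_checksum /mnt/testvol)     # step (f)
#   checksums_match "$before" "$after" || echo "checksum mismatch"
```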

Actual results:
===============
The checksums mismatched for the truncate operation.

Expected results:
===================
The checksums should match.

Additional info:
====================
rhsauto021.lab.eng.blr.redhat.com:/bricks/brick1/testvol_tier2/ : Online brick
rhsauto020.lab.eng.blr.redhat.com:/bricks/brick2/testvol_tier1/ : Offline brick
rhsauto019.lab.eng.blr.redhat.com:/bricks/brick2/testvol_tier0/ : Online brick

###############################################################################
Extended attributes of files when the brick was down and truncate succeeded on other 2 bricks
###############################################################################

2015-12-28 17:57:40,846 INFO get_number_of_entries_in_brick Extended attributes of all the files/dirs
2015-12-28 17:57:40,847 INFO run Executing getfattr -d -e hex -m . /bricks/brick1/testvol_tier2/* on rhsauto021.lab.eng.blr.redhat.com
2015-12-28 17:57:40,867 INFO run "getfattr -d -e hex -m . /bricks/brick1/testvol_tier2/*" on rhsauto021.lab.eng.blr.redhat.com: RETCODE is 0
2015-12-28 17:57:40,867 INFO run "getfattr -d -e hex -m . /bricks/brick1/testvol_tier2/*" on rhsauto021.lab.eng.blr.redhat.com: STDOUT is 
 # file: bricks/brick1/testvol_tier2/D_file_10
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x03000000000000005681299d000d3c76
trusted.gfid=0xb9d22d7c03364bb18ca00b65fe90823a

# file: bricks/brick1/testvol_tier2/D_file_2
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x03000000000000005681299d000d3c76
trusted.gfid=0xb7a25ccb4234485699458536e2a36151

# file: bricks/brick1/testvol_tier2/D_file_4
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x03000000000000005681299d000d3c76
trusted.gfid=0x1f4860de1ed041c6bfe0ebe9814474ef

# file: bricks/brick1/testvol_tier2/D_file_5
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x03000000000000005681299d000d3c76
trusted.gfid=0xbe8e0a4886e24d1bb6ea94a17b71ea51

# file: bricks/brick1/testvol_tier2/D_file_7
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x03000000000000005681299d000d3c76
trusted.gfid=0x748fe3c075544b8793c99d80d2b83052

# file: bricks/brick1/testvol_tier2/D_file_9
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x03000000000000005681299d000d3c76
trusted.gfid=0x4864b70bf9e24c138b2b97147597351e

# file: bricks/brick1/testvol_tier2/file_dir_ops.sh
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x02000000000000005681283d00067eca
trusted.gfid=0x44b51c0fa0c142c9b048200d17c821d7


2015-12-28 17:57:40,867 ERROR run "getfattr -d -e hex -m . /bricks/brick1/testvol_tier2/*" on rhsauto021.lab.eng.blr.redhat.com: STDERR is 
 getfattr: Removing leading '/' from absolute path names

2015-12-28 17:57:40,868 INFO run Executing ls -l /bricks/brick1/testvol_tier2/* on rhsauto021.lab.eng.blr.redhat.com
2015-12-28 17:57:40,890 INFO run "ls -l /bricks/brick1/testvol_tier2/*" on rhsauto021.lab.eng.blr.redhat.com: RETCODE is 0
2015-12-28 17:57:40,890 INFO run "ls -l /bricks/brick1/testvol_tier2/*" on rhsauto021.lab.eng.blr.redhat.com: STDOUT is 
 -rw-r--r--. 2 root root     0 Dec 28 12:23 /bricks/brick1/testvol_tier2/D_file_10
-rw-r--r--. 2 root root     0 Dec 28 12:23 /bricks/brick1/testvol_tier2/D_file_2
-rw-r--r--. 2 root root     0 Dec 28 12:23 /bricks/brick1/testvol_tier2/D_file_4
-rw-r--r--. 2 root root     0 Dec 28 12:23 /bricks/brick1/testvol_tier2/D_file_5
-rw-r--r--. 2 root root     0 Dec 28 12:23 /bricks/brick1/testvol_tier2/D_file_7
-rw-r--r--. 2 root root     0 Dec 28 12:23 /bricks/brick1/testvol_tier2/D_file_9
-rwxr-xr-x. 2 root root 66826 Dec 28 12:17 /bricks/brick1/testvol_tier2/file_dir_ops.sh

2015-12-28 17:57:40,890 INFO run Executing find /bricks/brick2/testvol_tier1 -mindepth 1 | grep -ve '.glusterfs\|.trashcan' | wc -l on rhsauto020.lab.eng.blr.redhat.com
2015-12-28 17:57:40,926 INFO run "find /bricks/brick2/testvol_tier1 -mindepth 1 | grep -ve '.glusterfs\|.trashcan' | wc -l" on rhsauto020.lab.eng.blr.redhat.com: RETCODE is 0
2015-12-28 17:57:40,926 INFO get_number_of_entries_in_brick Number of entries on rhsauto020.lab.eng.blr.redhat.com:/bricks/brick2/testvol_tier1: 7

2015-12-28 17:57:40,926 INFO get_number_of_entries_in_brick Extended attributes of all the files/dirs
2015-12-28 17:57:40,927 INFO run Executing getfattr -d -e hex -m . /bricks/brick2/testvol_tier1/* on rhsauto020.lab.eng.blr.redhat.com
2015-12-28 17:57:40,948 INFO run "getfattr -d -e hex -m . /bricks/brick2/testvol_tier1/*" on rhsauto020.lab.eng.blr.redhat.com: RETCODE is 0
2015-12-28 17:57:40,948 INFO run "getfattr -d -e hex -m . /bricks/brick2/testvol_tier1/*" on rhsauto020.lab.eng.blr.redhat.com: STDOUT is 
 # file: bricks/brick2/testvol_tier1/D_file_10
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x02000000000000005681283d00065050
trusted.gfid=0xb9d22d7c03364bb18ca00b65fe90823a

# file: bricks/brick2/testvol_tier1/D_file_2
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x02000000000000005681283d00065050
trusted.gfid=0xb7a25ccb4234485699458536e2a36151

# file: bricks/brick2/testvol_tier1/D_file_4
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x02000000000000005681283d00065050
trusted.gfid=0x1f4860de1ed041c6bfe0ebe9814474ef

# file: bricks/brick2/testvol_tier1/D_file_5
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x02000000000000005681283d00065050
trusted.gfid=0xbe8e0a4886e24d1bb6ea94a17b71ea51

# file: bricks/brick2/testvol_tier1/D_file_7
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x02000000000000005681283d00065050
trusted.gfid=0x748fe3c075544b8793c99d80d2b83052

# file: bricks/brick2/testvol_tier1/D_file_9
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x02000000000000005681283d00065050
trusted.gfid=0x4864b70bf9e24c138b2b97147597351e

# file: bricks/brick2/testvol_tier1/file_dir_ops.sh
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x02000000000000005681283d00065050
trusted.gfid=0x44b51c0fa0c142c9b048200d17c821d7


2015-12-28 17:57:40,949 ERROR run "getfattr -d -e hex -m . /bricks/brick2/testvol_tier1/*" on rhsauto020.lab.eng.blr.redhat.com: STDERR is 
 getfattr: Removing leading '/' from absolute path names

2015-12-28 17:57:40,949 INFO run Executing ls -l /bricks/brick2/testvol_tier1/* on rhsauto020.lab.eng.blr.redhat.com
2015-12-28 17:57:40,972 INFO run "ls -l /bricks/brick2/testvol_tier1/*" on rhsauto020.lab.eng.blr.redhat.com: RETCODE is 0
2015-12-28 17:57:40,972 INFO run "ls -l /bricks/brick2/testvol_tier1/*" on rhsauto020.lab.eng.blr.redhat.com: STDOUT is 
 -rw-r--r--. 2 root root  5120 Dec 28 12:17 /bricks/brick2/testvol_tier1/D_file_10
-rw-r--r--. 2 root root  1024 Dec 28 12:17 /bricks/brick2/testvol_tier1/D_file_2
-rw-r--r--. 2 root root  2048 Dec 28 12:17 /bricks/brick2/testvol_tier1/D_file_4
-rw-r--r--. 2 root root  2560 Dec 28 12:17 /bricks/brick2/testvol_tier1/D_file_5
-rw-r--r--. 2 root root  3584 Dec 28 12:17 /bricks/brick2/testvol_tier1/D_file_7
-rw-r--r--. 2 root root  4608 Dec 28 12:17 /bricks/brick2/testvol_tier1/D_file_9
-rwxr-xr-x. 2 root root 66826 Dec 28 12:17 /bricks/brick2/testvol_tier1/file_dir_ops.sh

2015-12-28 17:57:40,972 INFO run Executing find /bricks/brick2/testvol_tier0 -mindepth 1 | grep -ve '.glusterfs\|.trashcan' | wc -l on rhsauto019.lab.eng.blr.redhat.com
2015-12-28 17:57:40,995 INFO run "find /bricks/brick2/testvol_tier0 -mindepth 1 | grep -ve '.glusterfs\|.trashcan' | wc -l" on rhsauto019.lab.eng.blr.redhat.com: RETCODE is 0
2015-12-28 17:57:40,995 INFO get_number_of_entries_in_brick Number of entries on rhsauto019.lab.eng.blr.redhat.com:/bricks/brick2/testvol_tier0: 7

2015-12-28 17:57:40,996 INFO get_number_of_entries_in_brick Extended attributes of all the files/dirs
2015-12-28 17:57:40,996 INFO run Executing getfattr -d -e hex -m . /bricks/brick2/testvol_tier0/* on rhsauto019.lab.eng.blr.redhat.com
2015-12-28 17:57:41,015 INFO run "getfattr -d -e hex -m . /bricks/brick2/testvol_tier0/*" on rhsauto019.lab.eng.blr.redhat.com: RETCODE is 0
2015-12-28 17:57:41,015 INFO run "getfattr -d -e hex -m . /bricks/brick2/testvol_tier0/*" on rhsauto019.lab.eng.blr.redhat.com: STDOUT is 
 # file: bricks/brick2/testvol_tier0/D_file_10
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x02000000000000005681283a000d56ea
trusted.gfid=0xb9d22d7c03364bb18ca00b65fe90823a

# file: bricks/brick2/testvol_tier0/D_file_2
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x02000000000000005681283a000d56ea
trusted.gfid=0xb7a25ccb4234485699458536e2a36151

# file: bricks/brick2/testvol_tier0/D_file_4
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x02000000000000005681283a000d56ea
trusted.gfid=0x1f4860de1ed041c6bfe0ebe9814474ef

# file: bricks/brick2/testvol_tier0/D_file_5
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x02000000000000005681283a000d56ea
trusted.gfid=0xbe8e0a4886e24d1bb6ea94a17b71ea51

# file: bricks/brick2/testvol_tier0/D_file_7
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x02000000000000005681283a000d56ea
trusted.gfid=0x748fe3c075544b8793c99d80d2b83052

# file: bricks/brick2/testvol_tier0/D_file_9
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x02000000000000005681283a000d56ea
trusted.gfid=0x4864b70bf9e24c138b2b97147597351e

# file: bricks/brick2/testvol_tier0/file_dir_ops.sh
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x02000000000000005681283a000d56ea
trusted.gfid=0x44b51c0fa0c142c9b048200d17c821d7


2015-12-28 17:57:41,016 ERROR run "getfattr -d -e hex -m . /bricks/brick2/testvol_tier0/*" on rhsauto019.lab.eng.blr.redhat.com: STDERR is 
 getfattr: Removing leading '/' from absolute path names

2015-12-28 17:57:41,016 INFO run Executing ls -l /bricks/brick2/testvol_tier0/* on rhsauto019.lab.eng.blr.redhat.com
2015-12-28 17:57:41,036 INFO run "ls -l /bricks/brick2/testvol_tier0/*" on rhsauto019.lab.eng.blr.redhat.com: RETCODE is 0
2015-12-28 17:57:41,037 INFO run "ls -l /bricks/brick2/testvol_tier0/*" on rhsauto019.lab.eng.blr.redhat.com: STDOUT is 
 -rw-r--r--. 2 root root     0 Dec 28 12:23 /bricks/brick2/testvol_tier0/D_file_10
-rw-r--r--. 2 root root     0 Dec 28 12:23 /bricks/brick2/testvol_tier0/D_file_2
-rw-r--r--. 2 root root     0 Dec 28 12:23 /bricks/brick2/testvol_tier0/D_file_4
-rw-r--r--. 2 root root     0 Dec 28 12:23 /bricks/brick2/testvol_tier0/D_file_5
-rw-r--r--. 2 root root     0 Dec 28 12:23 /bricks/brick2/testvol_tier0/D_file_7
-rw-r--r--. 2 root root     0 Dec 28 12:23 /bricks/brick2/testvol_tier0/D_file_9
-rwxr-xr-x. 2 root root 66826 Dec 28 12:17 /bricks/brick2/testvol_tier0/file_dir_ops.sh


###############################################################################
Extended attributes of files when the brick came online and self-heal complete 
and before arequal-checksum
###############################################################################
2015-12-28 17:58:33,323 INFO get_number_of_entries_in_brick Extended attributes of all the files/dirs
2015-12-28 17:58:33,324 INFO run Executing getfattr -d -e hex -m . /bricks/brick1/testvol_tier2/* on rhsauto021.lab.eng.blr.redhat.com
2015-12-28 17:58:33,342 INFO run "getfattr -d -e hex -m . /bricks/brick1/testvol_tier2/*" on rhsauto021.lab.eng.blr.redhat.com: RETCODE is 0
2015-12-28 17:58:33,343 INFO run "getfattr -d -e hex -m . /bricks/brick1/testvol_tier2/*" on rhsauto021.lab.eng.blr.redhat.com: STDOUT is 
 # file: bricks/brick1/testvol_tier2/D_file_10
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x03000000000000005681299d000d3c76
trusted.gfid=0xb9d22d7c03364bb18ca00b65fe90823a

# file: bricks/brick1/testvol_tier2/D_file_2
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x03000000000000005681299d000d3c76
trusted.gfid=0xb7a25ccb4234485699458536e2a36151

# file: bricks/brick1/testvol_tier2/D_file_4
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x03000000000000005681299d000d3c76
trusted.gfid=0x1f4860de1ed041c6bfe0ebe9814474ef

# file: bricks/brick1/testvol_tier2/D_file_5
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x03000000000000005681299d000d3c76
trusted.gfid=0xbe8e0a4886e24d1bb6ea94a17b71ea51

# file: bricks/brick1/testvol_tier2/D_file_7
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x03000000000000005681299d000d3c76
trusted.gfid=0x748fe3c075544b8793c99d80d2b83052

# file: bricks/brick1/testvol_tier2/D_file_9
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x03000000000000005681299d000d3c76
trusted.gfid=0x4864b70bf9e24c138b2b97147597351e

# file: bricks/brick1/testvol_tier2/file_dir_ops.sh
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x02000000000000005681283d00067eca
trusted.gfid=0x44b51c0fa0c142c9b048200d17c821d7


2015-12-28 17:58:33,343 ERROR run "getfattr -d -e hex -m . /bricks/brick1/testvol_tier2/*" on rhsauto021.lab.eng.blr.redhat.com: STDERR is 
 getfattr: Removing leading '/' from absolute path names

2015-12-28 17:58:33,343 INFO run Executing ls -l /bricks/brick1/testvol_tier2/* on rhsauto021.lab.eng.blr.redhat.com
2015-12-28 17:58:33,365 INFO run "ls -l /bricks/brick1/testvol_tier2/*" on rhsauto021.lab.eng.blr.redhat.com: RETCODE is 0
2015-12-28 17:58:33,365 INFO run "ls -l /bricks/brick1/testvol_tier2/*" on rhsauto021.lab.eng.blr.redhat.com: STDOUT is 
 -rw-r--r--. 2 root root     0 Dec 28 12:23 /bricks/brick1/testvol_tier2/D_file_10
-rw-r--r--. 2 root root     0 Dec 28 12:23 /bricks/brick1/testvol_tier2/D_file_2
-rw-r--r--. 2 root root     0 Dec 28 12:23 /bricks/brick1/testvol_tier2/D_file_4
-rw-r--r--. 2 root root     0 Dec 28 12:23 /bricks/brick1/testvol_tier2/D_file_5
-rw-r--r--. 2 root root     0 Dec 28 12:23 /bricks/brick1/testvol_tier2/D_file_7
-rw-r--r--. 2 root root     0 Dec 28 12:23 /bricks/brick1/testvol_tier2/D_file_9
-rwxr-xr-x. 2 root root 66826 Dec 28 12:17 /bricks/brick1/testvol_tier2/file_dir_ops.sh

2015-12-28 17:58:33,365 INFO run Executing find /bricks/brick2/testvol_tier1 -mindepth 1 | grep -ve '.glusterfs\|.trashcan' | wc -l on rhsauto020.lab.eng.blr.redhat.com
2015-12-28 17:58:33,391 INFO run "find /bricks/brick2/testvol_tier1 -mindepth 1 | grep -ve '.glusterfs\|.trashcan' | wc -l" on rhsauto020.lab.eng.blr.redhat.com: RETCODE is 0
2015-12-28 17:58:33,391 INFO get_number_of_entries_in_brick Number of entries on rhsauto020.lab.eng.blr.redhat.com:/bricks/brick2/testvol_tier1: 7

2015-12-28 17:58:33,391 INFO get_number_of_entries_in_brick Extended attributes of all the files/dirs
2015-12-28 17:58:33,392 INFO run Executing getfattr -d -e hex -m . /bricks/brick2/testvol_tier1/* on rhsauto020.lab.eng.blr.redhat.com
2015-12-28 17:58:33,413 INFO run "getfattr -d -e hex -m . /bricks/brick2/testvol_tier1/*" on rhsauto020.lab.eng.blr.redhat.com: RETCODE is 0
2015-12-28 17:58:33,414 INFO run "getfattr -d -e hex -m . /bricks/brick2/testvol_tier1/*" on rhsauto020.lab.eng.blr.redhat.com: STDOUT is 
 # file: bricks/brick2/testvol_tier1/D_file_10
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x02000000000000005681283d00065050
trusted.gfid=0xb9d22d7c03364bb18ca00b65fe90823a

# file: bricks/brick2/testvol_tier1/D_file_2
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x02000000000000005681283d00065050
trusted.gfid=0xb7a25ccb4234485699458536e2a36151

# file: bricks/brick2/testvol_tier1/D_file_4
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x02000000000000005681283d00065050
trusted.gfid=0x1f4860de1ed041c6bfe0ebe9814474ef

# file: bricks/brick2/testvol_tier1/D_file_5
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x02000000000000005681283d00065050
trusted.gfid=0xbe8e0a4886e24d1bb6ea94a17b71ea51

# file: bricks/brick2/testvol_tier1/D_file_7
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x02000000000000005681283d00065050
trusted.gfid=0x748fe3c075544b8793c99d80d2b83052

# file: bricks/brick2/testvol_tier1/D_file_9
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x02000000000000005681283d00065050
trusted.gfid=0x4864b70bf9e24c138b2b97147597351e

# file: bricks/brick2/testvol_tier1/file_dir_ops.sh
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x02000000000000005681283d00065050
trusted.gfid=0x44b51c0fa0c142c9b048200d17c821d7


2015-12-28 17:58:33,414 ERROR run "getfattr -d -e hex -m . /bricks/brick2/testvol_tier1/*" on rhsauto020.lab.eng.blr.redhat.com: STDERR is 
 getfattr: Removing leading '/' from absolute path names

2015-12-28 17:58:33,414 INFO run Executing ls -l /bricks/brick2/testvol_tier1/* on rhsauto020.lab.eng.blr.redhat.com
2015-12-28 17:58:33,435 INFO run "ls -l /bricks/brick2/testvol_tier1/*" on rhsauto020.lab.eng.blr.redhat.com: RETCODE is 0
2015-12-28 17:58:33,435 INFO run "ls -l /bricks/brick2/testvol_tier1/*" on rhsauto020.lab.eng.blr.redhat.com: STDOUT is 
 -rw-r--r--. 2 root root  5120 Dec 28 12:17 /bricks/brick2/testvol_tier1/D_file_10
-rw-r--r--. 2 root root  1024 Dec 28 12:17 /bricks/brick2/testvol_tier1/D_file_2
-rw-r--r--. 2 root root  2048 Dec 28 12:17 /bricks/brick2/testvol_tier1/D_file_4
-rw-r--r--. 2 root root  2560 Dec 28 12:17 /bricks/brick2/testvol_tier1/D_file_5
-rw-r--r--. 2 root root  3584 Dec 28 12:17 /bricks/brick2/testvol_tier1/D_file_7
-rw-r--r--. 2 root root  4608 Dec 28 12:17 /bricks/brick2/testvol_tier1/D_file_9
-rwxr-xr-x. 2 root root 66826 Dec 28 12:17 /bricks/brick2/testvol_tier1/file_dir_ops.sh

2015-12-28 17:58:33,436 INFO run Executing find /bricks/brick2/testvol_tier0 -mindepth 1 | grep -ve '.glusterfs\|.trashcan' | wc -l on rhsauto019.lab.eng.blr.redhat.com
2015-12-28 17:58:33,459 INFO run "find /bricks/brick2/testvol_tier0 -mindepth 1 | grep -ve '.glusterfs\|.trashcan' | wc -l" on rhsauto019.lab.eng.blr.redhat.com: RETCODE is 0
2015-12-28 17:58:33,459 INFO get_number_of_entries_in_brick Number of entries on rhsauto019.lab.eng.blr.redhat.com:/bricks/brick2/testvol_tier0: 7

2015-12-28 17:58:33,460 INFO get_number_of_entries_in_brick Extended attributes of all the files/dirs
2015-12-28 17:58:33,460 INFO run Executing getfattr -d -e hex -m . /bricks/brick2/testvol_tier0/* on rhsauto019.lab.eng.blr.redhat.com
2015-12-28 17:58:33,481 INFO run "getfattr -d -e hex -m . /bricks/brick2/testvol_tier0/*" on rhsauto019.lab.eng.blr.redhat.com: RETCODE is 0
2015-12-28 17:58:33,481 INFO run "getfattr -d -e hex -m . /bricks/brick2/testvol_tier0/*" on rhsauto019.lab.eng.blr.redhat.com: STDOUT is 
 # file: bricks/brick2/testvol_tier0/D_file_10
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x02000000000000005681283a000d56ea
trusted.gfid=0xb9d22d7c03364bb18ca00b65fe90823a

# file: bricks/brick2/testvol_tier0/D_file_2
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x02000000000000005681283a000d56ea
trusted.gfid=0xb7a25ccb4234485699458536e2a36151

# file: bricks/brick2/testvol_tier0/D_file_4
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x02000000000000005681283a000d56ea
trusted.gfid=0x1f4860de1ed041c6bfe0ebe9814474ef

# file: bricks/brick2/testvol_tier0/D_file_5
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x02000000000000005681283a000d56ea
trusted.gfid=0xbe8e0a4886e24d1bb6ea94a17b71ea51

# file: bricks/brick2/testvol_tier0/D_file_7
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x02000000000000005681283a000d56ea
trusted.gfid=0x748fe3c075544b8793c99d80d2b83052

# file: bricks/brick2/testvol_tier0/D_file_9
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x02000000000000005681283a000d56ea
trusted.gfid=0x4864b70bf9e24c138b2b97147597351e

# file: bricks/brick2/testvol_tier0/file_dir_ops.sh
security.selinux=0x73797374656d5f753a6f626a6563745f723a676c7573746572645f627269636b5f743a733000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x02000000000000005681283a000d56ea
trusted.gfid=0x44b51c0fa0c142c9b048200d17c821d7


2015-12-28 17:58:33,481 ERROR run "getfattr -d -e hex -m . /bricks/brick2/testvol_tier0/*" on rhsauto019.lab.eng.blr.redhat.com: STDERR is 
 getfattr: Removing leading '/' from absolute path names

2015-12-28 17:58:33,481 INFO run Executing ls -l /bricks/brick2/testvol_tier0/* on rhsauto019.lab.eng.blr.redhat.com
2015-12-28 17:58:33,504 INFO run "ls -l /bricks/brick2/testvol_tier0/*" on rhsauto019.lab.eng.blr.redhat.com: RETCODE is 0
2015-12-28 17:58:33,504 INFO run "ls -l /bricks/brick2/testvol_tier0/*" on rhsauto019.lab.eng.blr.redhat.com: STDOUT is 
 -rw-r--r--. 2 root root     0 Dec 28 12:23 /bricks/brick2/testvol_tier0/D_file_10
-rw-r--r--. 2 root root     0 Dec 28 12:23 /bricks/brick2/testvol_tier0/D_file_2
-rw-r--r--. 2 root root     0 Dec 28 12:23 /bricks/brick2/testvol_tier0/D_file_4
-rw-r--r--. 2 root root     0 Dec 28 12:23 /bricks/brick2/testvol_tier0/D_file_5
-rw-r--r--. 2 root root     0 Dec 28 12:23 /bricks/brick2/testvol_tier0/D_file_7
-rw-r--r--. 2 root root     0 Dec 28 12:23 /bricks/brick2/testvol_tier0/D_file_9
-rwxr-xr-x. 2 root root 66826 Dec 28 12:17 /bricks/brick2/testvol_tier0/file_dir_ops.sh
Comment 4 spandura 2015-12-29 04:02 EST
Created attachment 1110131
shell script to create/modify/truncate files

Command Example: file_dir_ops.sh data_ops <mountpoint> truncate1 10 0M
Comment 8 Ravishankar N 2015-12-31 04:00 EST
Created attachment 1110720
Distaf log.

After looking at the distaf logs with Pranith, we found that the xfs godown method used to kill the brick mounts takes a long time to bring down the bricks of the replica (a one-minute interval between successive godowns). Since the modification/truncate tests are run asynchronously, they could have completed before all the bricks were down. Also, since the godown method kills xfs, the truncates might not have been synced to disk, which is why the file size is what it was before the truncate when the brick comes back up. In AFR, if there are no pending xattrs, the bigger file is selected as the source, and the heal happens when the file is accessed from the mount. This explains the arequal checksum mismatch, the lack of pending xattrs on the files and the lack of heals by the self-heal daemon.
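
To make the fallback rule above concrete, here is a deliberately simplified sketch. The function pick_source_by_size is hypothetical and is not AFR code: it only models "no pending xattrs anywhere, so the largest copy wins", and how it breaks ties (first brick wins) is an arbitrary choice of the sketch.

```shell
#!/bin/sh
# Simplified model of the fallback described above (NOT actual AFR code):
# when no replica carries pending xattrs, treat the largest copy as the source.
# Arguments are "brick:size" pairs; the brick with the largest size is printed.
pick_source_by_size() {
    local best_brick best_size pair brick size
    best_brick=""
    best_size=-1
    for pair in "$@"; do
        brick=${pair%:*}    # everything before the last colon
        size=${pair##*:}    # everything after the last colon
        if [ "$size" -gt "$best_size" ]; then
            best_size=$size
            best_brick=$brick
        fi
    done
    printf '%s\n' "$best_brick"
}

# Sizes of D_file_10 on the three bricks, taken from the logs in this report.
pick_source_by_size testvol_tier2:0 testvol_tier1:5120 testvol_tier0:0
# prints: testvol_tier1
```

With the sizes from the logs, the brick that missed the truncate (testvol_tier1) is chosen, which matches the heal direction seen in the test.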

After modifying the test to use BRICK_TAKEDOWN_METHOD="service_kill" instead of xfs godown, the test passed in all 3 runs.

Since this is expected behaviour, it is not a blocker bug per se. But we could provide a fix for 3.1.3 by doing an fsync after truncate in the AFR transaction. If the fsync fails, the post-op will set pending xattrs for the killed brick and the heal will happen in the right direction.
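
The idea can be illustrated from user space, with the caveat that this is only an analogue and not the proposed AFR change. The helper truncate_and_sync is hypothetical; it assumes a GNU coreutils version in which sync accepts a file argument and flushes just that file (older versions ignore the argument and sync everything).

```shell
#!/bin/sh
# User-space analogue of "fsync after truncate" (NOT the AFR implementation):
# truncate a file to zero length, then flush it to stable storage.
# Returns 1 if the truncate fails and 2 if the flush fails, so the caller
# can treat either case as a failed write.
truncate_and_sync() {
    truncate -s 0 -- "$1" || return 1
    # ASSUMPTION: this coreutils `sync` accepts a file operand.
    sync -- "$1" || return 2
}

# Usage (the path is an assumption):
#   truncate_and_sync /mnt/testvol/D_file_2 || echo "truncate not durable"
```

The point is that the operation is only reported as successful once the flush has succeeded, which is what would let the post-op mark the killed brick as pending.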
Comment 9 Ravishankar N 2016-02-24 04:58:41 EST
Moving this to 3.1.4 after triaging.
