Bug 1294732

Summary: arequal-checksum mismatch between before and after successful heal (self-heal of renamed files)
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: spandura
Component: replicate Assignee: Pranith Kumar K <pkarampu>
Status: CLOSED DUPLICATE QA Contact: storage-qa-internal <storage-qa-internal>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: rhgs-3.1 CC: rhs-bugs, storage-qa-internal
Target Milestone: --- Keywords: ZStream
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2016-01-18 10:47:10 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description spandura 2015-12-30 03:44:52 UTC
Description of problem:
==========================
On a 2x3 distributed-replicate volume, a few files were renamed from the mount. While the rename was in progress, one brick from each of the two subvolumes went offline. Once the rename completed, the bricks were brought back online. Self-heal reported that healing was complete, but the renamed files were missing from the mount point when accessed.

Observation:
=============
The directory E_dir_1 was renamed to E_dir_renamed_1. After the bricks came back online, E_dir_1 was recreated under E_dir_renamed_1 with exactly the same entries as E_dir_renamed_1.

Version-Release number of selected component (if applicable):
===============================================================
glusterfs-3.7.5-13.el6rhs.x86_64

How reproducible:
===============
Often

Steps to Reproduce:
======================
1. Create a 2x3 dis-rep cold and hot tiered volume. Start the volume. Create a fuse mount.

2. Create files on the mount point.

3. Repeat the following steps for each of the operations (create files/dirs, copy files/dirs from the files created, rename a few files and dirs):

a. Start the operation on the mount.
b. Bring down certain bricks from each subvolume.
c. After the operation completes, calculate the arequal-checksum.
d. Bring the bricks back online.
e. Wait for self-heal to complete.
f. Once self-heal is complete, calculate the arequal-checksum.
g. Compare the checksums calculated at (c) and (f). They should be the same.
h. Bring down bricks from each subvolume.
i. Calculate the arequal-checksum.
j. Compare the checksums calculated at (f) and (i). They should be the same.
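
Steps (c), (f) and (g) amount to taking two snapshots of the mount and comparing them. A minimal sketch in Python, assuming a plain directory walk as a stand-in for the arequal-checksum tool (it does not reproduce how arequal computes its checksums). The helper names snapshot_entries and diff_snapshots are hypothetical and not part of the test framework used in this report.

```python
import os


def snapshot_entries(root):
    """Return the set of all file and directory paths under root, relative to root."""
    entries = set()
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full_path = os.path.join(dirpath, name)
            entries.add(os.path.relpath(full_path, root))
    return entries


def diff_snapshots(before, after):
    """Return (missing, extra): entries lost and entries gained between two snapshots."""
    missing = sorted(before - after)
    extra = sorted(after - before)
    return missing, extra
```

Taking one snapshot at step (c) and another at step (f), a non-empty "missing" list would point at the entries that disappeared during self-heal.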

Actual results:
==================
With the rename operation, the arequal-checksum mismatched between (c) and (f).


2015-12-30 08:31:33,624 INFO compare_arequal_checksum_mount Arequal-Checksum on mount cutlass.lab.eng.blr.redhat.com:/mnt/glusterfs1 : 'after-op-before-self-heal-rename' is: 

Entry counts
Regular files   : 1597
Directories     : 105
Symbolic links  : 0
Other           : 0
Total           : 1702

Metadata checksums
Regular files   : 48974c
Directories     : 24d74c
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : 85e9d8cb4c9db47a45fbadc1302f1f0f
Directories     : 7a041170126a5d7f
Symbolic links  : 0
Other           : 0
Total           : ba16647a6ed8f60a

2015-12-30 08:31:33,624 INFO compare_arequal_checksum_mount Arequal-Checksum on mount cutlass.lab.eng.blr.redhat.com:/mnt/glusterfs1 : 'after-self-heal-rename' is: 

Entry counts
Regular files   : 1582
Directories     : 105
Symbolic links  : 0
Other           : 0
Total           : 1687

Metadata checksums
Regular files   : 2cb0
Directories     : 24d74c
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : dabaae4e2af4bf861ed63a104b16459b
Directories     : 39041079166d425e
Symbolic links  : 0
Other           : 0
Total           : fd688427778fb843

2015-12-30 08:31:33,624 ERROR compare_arequal_checksum_mount Checksums on mount cutlass.lab.eng.blr.redhat.com:/mnt/glusterfs1 of 'after-op-before-self-heal-rename' and 'after-self-heal-rename' doesn't match
2015-12-30 08:31:33,624 INFO run Executing find /mnt/glusterfs1 | uniq -d on cutlass.lab.eng.blr.redhat.com
2015-12-30 08:31:34,681 INFO run "find /mnt/glusterfs1 | uniq -d" on cutlass.lab.eng.blr.redhat.com: RETCODE is 0
2015-12-30 08:31:34,681 INFO get_duplicate_entries_from_mount No Duplicate Entries found under cutlass.lab.eng.blr.redhat.com:/mnt/glusterfs1
2015-12-30 08:31:34,682 ERROR get_missing_entries_from_mount Missing entries from mount when comparing entries 'after-op-before-self-heal-rename' and entries 'after-self-heal-rename': 
/mnt/glusterfs1/E_dir_renamed_1/E_dir_1/E_file_renamed_11
/mnt/glusterfs1/E_dir_renamed_1/E_dir_1/E_file_renamed_10
/mnt/glusterfs1/E_dir_renamed_1/E_dir_1/E_file_renamed_13
/mnt/glusterfs1/E_dir_renamed_1/E_dir_1/E_file_renamed_12
/mnt/glusterfs1/E_dir_renamed_1/E_dir_1/E_file_renamed_15
/mnt/glusterfs1/E_dir_renamed_1/E_dir_1/E_file_renamed_14
/mnt/glusterfs1/E_dir_renamed_1/E_dir_1/E_file_renamed_9
/mnt/glusterfs1/E_dir_renamed_1/E_dir_1/E_file_renamed_8
/mnt/glusterfs1/E_dir_renamed_1/E_dir_1/E_file_renamed_1
/mnt/glusterfs1/E_dir_renamed_1/E_dir_1/E_file_renamed_3
/mnt/glusterfs1/E_dir_renamed_1/E_dir_1/E_file_renamed_2
/mnt/glusterfs1/E_dir_renamed_1/E_dir_1/E_file_renamed_5
/mnt/glusterfs1/E_dir_renamed_1/E_dir_1/E_file_renamed_4
/mnt/glusterfs1/E_dir_renamed_1/E_dir_1/E_file_renamed_7
/mnt/glusterfs1/E_dir_renamed_1/E_dir_1/E_file_renamed_6

Expected results:
====================
The arequal-checksum should be the same before and after self-heal.

Additional info:
===================
[root@rhsauto019:~] Dec-30-2015 03:28:07 $gluster v info
 
Volume Name: testvol
Type: Distributed-Replicate
Volume ID: 19b1e5ca-af1d-4361-8b4d-d13992fcc185
Status: Started
Number of Bricks: 2 x 3 = 6
Transport-type: tcp
Bricks:
Brick1: rhsauto019.lab.eng.blr.redhat.com:/bricks/brick0/testvol_brick0
Brick2: rhsauto020.lab.eng.blr.redhat.com:/bricks/brick0/testvol_brick1
Brick3: rhsauto021.lab.eng.blr.redhat.com:/bricks/brick0/testvol_brick2
Brick4: rhsauto022.lab.eng.blr.redhat.com:/bricks/brick0/testvol_brick3
Brick5: rhsauto019.lab.eng.blr.redhat.com:/bricks/brick1/testvol_brick4
Brick6: rhsauto020.lab.eng.blr.redhat.com:/bricks/brick1/testvol_brick5
Options Reconfigured:
performance.readdir-ahead: on
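
For reference when reading "bring down bricks from each subvolume": in a distributed-replicate volume, consecutive bricks in the listed order form one replica set, so with the 2 x 3 layout above Brick1 to Brick3 make up one subvolume and Brick4 to Brick6 the other. A minimal sketch in Python of that grouping; the function name replica_sets is hypothetical and is not a Gluster API.

```python
def replica_sets(bricks, replica_count):
    """Group an ordered brick list into replica sets of replica_count bricks each."""
    if replica_count <= 0 or len(bricks) % replica_count != 0:
        raise ValueError("brick count must be a multiple of the replica count")
    return [
        bricks[start:start + replica_count]
        for start in range(0, len(bricks), replica_count)
    ]


# Brick list copied from the 'gluster v info' output above.
bricks = [
    "rhsauto019.lab.eng.blr.redhat.com:/bricks/brick0/testvol_brick0",
    "rhsauto020.lab.eng.blr.redhat.com:/bricks/brick0/testvol_brick1",
    "rhsauto021.lab.eng.blr.redhat.com:/bricks/brick0/testvol_brick2",
    "rhsauto022.lab.eng.blr.redhat.com:/bricks/brick0/testvol_brick3",
    "rhsauto019.lab.eng.blr.redhat.com:/bricks/brick1/testvol_brick4",
    "rhsauto020.lab.eng.blr.redhat.com:/bricks/brick1/testvol_brick5",
]

for index, subvolume in enumerate(replica_sets(bricks, 3)):
    print("subvolume", index, subvolume)
```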