Bug 1294632

Summary: Missing entries after self-heal completion
Product: Red Hat Gluster Storage Reporter: spandura
Component: replicate Assignee: Pranith Kumar K <pkarampu>
Status: CLOSED WONTFIX QA Contact: storage-qa-internal <storage-qa-internal>
Severity: urgent Docs Contact:
Priority: low    
Version: rhgs-3.1 CC: mchangir, mzywusko, ravishankar, rcyriac, rhs-bugs, sankarshan, smohan, spandura
Target Milestone: --- Keywords: ZStream
Target Release: --- Flags: atalur: needinfo? (spandura)
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-04-16 18:05:48 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:

Description spandura 2015-12-29 11:21:35 UTC
Description of problem:
==========================
On a tiered volume with 2x3 distributed-replicate hot and cold tiers, one brick from each subvolume was brought down and files were copied to the mount. The bricks were then brought back online, and self-heal completed on both the hot and cold tiers. Another brick from each subvolume was then brought down and the copied files were read back. A few of the copied files were missing.

Observation:
===========
For the missing files, the data file exists on the hot tier, but one of the bricks in the cold-tier subvolume does not have the 'link-to' file.
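
A minimal sketch of how the 'link-to' entries can be picked out of the `ls -l` output listed under "Additional info". It relies only on what that output shows: link-to entries appear with mode `---------T` and size 0, while data files have normal permissions and a non-zero size. The function name is hypothetical, and this is a heuristic on the listing only; it does not inspect extended attributes on the brick.

```python
def is_linkto_candidate(ls_line: str) -> bool:
    """Return True if an `ls -l` line looks like a link-to entry.

    Heuristic (an assumption, based on the listings in this report):
    mode string is ---------T (sticky bit only) and the size is 0.
    """
    fields = ls_line.split()
    mode, size = fields[0], int(fields[4])
    return mode.startswith("---------T") and size == 0


# Lines copied from the brick listings in this report.
linkto = "---------T. 2 root root     0 Dec 29 10:08 /bricks/brick0/testvol_brick1/E_file_copy_33"
data = "-rw-r--r--. 2 root root 33792 Dec 29 10:08 /bricks/brick3/testvol_tier4/E_file_copy_33"

print(is_linkto_candidate(linkto))  # link-to entry on a cold-tier brick
print(is_linkto_candidate(data))    # data file on a hot-tier brick
```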

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.7.5-13.el7rhgs.x86_64

How reproducible:
=================
Tried once

Steps to Reproduce:
====================
1. Create a 2x3 distributed-replicate cold and hot tiered volume. Start the volume. Create a FUSE mount.

2. Repeat the following steps for each operation (create files/dirs, copy files/dirs from the files created):

a. start the operation on the mount
b. bring down certain bricks from each subvolume
c. after the operation is complete calculate arequal-checksum
d. bring back the bricks
e. wait for self-heal to complete
f. once self-heal is complete, calculate arequal-checksum
g. compare the checksums calculated at (c) and (f); they should be the same
h. bring down other bricks from each subvolume
i. calculate arequal-checksum
j. compare the checksums calculated at (f) and (i); they should be the same
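
The two comparisons in steps (g) and (j) can be sketched as follows. This is not the test harness used for this report; the function name, the checkpoint labels "c", "f" and "i", and the checksum values in the example are all hypothetical and stand for the arequal totals taken at those steps.

```python
def failed_comparisons(checksums, pairs=(("c", "f"), ("f", "i"))):
    """Return the checkpoint pairs whose checksums differ.

    `checksums` maps a step label to the arequal total recorded at that
    step. The default pairs mirror steps (g) and (j) above.
    """
    return [(a, b) for a, b in pairs if checksums[a] != checksums[b]]


# Hypothetical values: heal looks complete at (f), but entries go
# missing once the other bricks are brought down at (h).
example = {"c": "aaaa", "f": "aaaa", "i": "bbbb"}
print(failed_comparisons(example))
```

An empty result means both comparisons passed.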

Actual results:
================
Arequal checksum mismatched.


2015-12-29 15:49:37,797 INFO compare_arequal_checksum_mount Arequal-Checksum on mount cutlass.lab.eng.blr.redhat.com:/mnt/glusterfs : 'after-self-heal-copy' is: 

Entry counts
Regular files   : 1057
Directories     : 69
Symbolic links  : 0
Other           : 0
Total           : 1126

Metadata checksums
Regular files   : 48974c
Directories     : 24d74c
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : 1cbd48496b3a49731c6b8c8b25536b98
Directories     : 4b071170126a5d7f
Symbolic links  : 0
Other           : 0
Total           : 4bd1d5b25c037f94

2015-12-29 15:49:37,797 INFO compare_arequal_checksum_mount Arequal-Checksum on mount cutlass.lab.eng.blr.redhat.com:/mnt/glusterfs : 'before-next-op-rename' is: 

Entry counts
Regular files   : 1052
Directories     : 69
Symbolic links  : 0
Other           : 0
Total           : 1121

Metadata checksums
Regular files   : 2cb0
Directories     : 24d74c
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : e3f89df229df6151b02a6fadfa0895fa
Directories     : 2858412f24757255
Symbolic links  : 0
Other           : 0
Total           : 7b8ab370f7a286fe

2015-12-29 15:49:37,797 ERROR compare_arequal_checksum_mount Checksums on mount cutlass.lab.eng.blr.redhat.com:/mnt/glusterfs of 'after-self-heal-copy' and 'before-next-op-rename' doesn't match
2015-12-29 15:49:37,797 INFO run Executing find /mnt/glusterfs | uniq -d on cutlass.lab.eng.blr.redhat.com
2015-12-29 15:49:40,095 INFO run "find /mnt/glusterfs | uniq -d" on cutlass.lab.eng.blr.redhat.com: RETCODE is 0
2015-12-29 15:49:40,095 INFO get_duplicate_entries_from_mount No Duplicate Entries found under cutlass.lab.eng.blr.redhat.com:/mnt/glusterfs
2015-12-29 15:49:40,095 ERROR get_missing_entries_from_mount Missing entries from mount when comparing entries 'after-self-heal-copy' and entries 'before-next-op-rename': 
/mnt/glusterfs/E_file_copy_32 /mnt/glusterfs/E_file_copy_33 /mnt/glusterfs/E_file_copy_30 /mnt/glusterfs/E_file_copy_31 /mnt/glusterfs/E_file_copy_35
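
As a cross-check, the difference in the "Entry counts" sections of the two arequal outputs above should equal the number of missing entries listed. A small sketch (the parser and variable names are hypothetical; the numbers are copied from the logs above):

```python
def parse_entry_counts(text):
    """Parse the 'Entry counts' section of an arequal output into a dict."""
    counts = {}
    in_section = False
    for line in text.splitlines():
        line = line.strip()
        if line == "Entry counts":
            in_section = True
            continue
        if in_section:
            if not line:  # a blank line ends the section
                break
            name, _, value = line.partition(":")
            counts[name.strip()] = int(value)
    return counts


after_self_heal_copy = """Entry counts
Regular files   : 1057
Directories     : 69
Symbolic links  : 0
Other           : 0
Total           : 1126
"""

before_next_op_rename = """Entry counts
Regular files   : 1052
Directories     : 69
Symbolic links  : 0
Other           : 0
Total           : 1121
"""

after = parse_entry_counts(after_self_heal_copy)
before = parse_entry_counts(before_next_op_rename)
delta = {name: after[name] - before[name] for name in after}
print(delta)  # only regular files differ, by 5
```

The five regular files lost match the five E_file_copy_* paths reported as missing.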

Expected results:
===================
arequal-checksums should match

Additional info:
==================
 
Volume Name: testvol
Type: Tier
Volume ID: 50b291c4-68ec-4b40-8ca3-cd2a1524f43f
Status: Started
Number of Bricks: 12
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 2 x 3 = 6
Brick1: rhsauto020.lab.eng.blr.redhat.com:/bricks/brick3/testvol_tier5
Brick2: rhsauto019.lab.eng.blr.redhat.com:/bricks/brick3/testvol_tier4
Brick3: rhsauto022.lab.eng.blr.redhat.com:/bricks/brick1/testvol_tier3
Brick4: rhsauto021.lab.eng.blr.redhat.com:/bricks/brick1/testvol_tier2
Brick5: rhsauto020.lab.eng.blr.redhat.com:/bricks/brick2/testvol_tier1
Brick6: rhsauto019.lab.eng.blr.redhat.com:/bricks/brick2/testvol_tier0
Cold Tier:
Cold Tier Type : Distributed-Replicate
Number of Bricks: 2 x 3 = 6
Brick7: rhsauto019.lab.eng.blr.redhat.com:/bricks/brick0/testvol_brick0
Brick8: rhsauto020.lab.eng.blr.redhat.com:/bricks/brick0/testvol_brick1
Brick9: rhsauto021.lab.eng.blr.redhat.com:/bricks/brick0/testvol_brick2
Brick10: rhsauto022.lab.eng.blr.redhat.com:/bricks/brick0/testvol_brick3
Brick11: rhsauto019.lab.eng.blr.redhat.com:/bricks/brick1/testvol_brick4
Brick12: rhsauto020.lab.eng.blr.redhat.com:/bricks/brick1/testvol_brick5
Options Reconfigured:
performance.readdir-ahead: on
features.ctr-enabled: on
cluster.tier-mode: cache
cluster.watermark-low: 75
cluster.watermark-hi: 90


[root@rhsauto019:/etc/yum.repos.d] Dec-29-2015 10:34:30 $ls -l /bricks/brick*/testvol_*/E_file_copy_33
-rw-r--r--. 2 root root 33792 Dec 29 10:08 /bricks/brick3/testvol_tier4/E_file_copy_33
[root@rhsauto019:/etc/yum.repos.d] Dec-29-2015 10:34:32 $


[root@rhsauto020:/etc/yum.repos.d] Dec-29-2015 10:34:30 $ls -l /bricks/brick*/testvol_*/E_file_copy_33
---------T. 2 root root     0 Dec 29 10:08 /bricks/brick0/testvol_brick1/E_file_copy_33
-rw-r--r--. 2 root root 33792 Dec 29 10:08 /bricks/brick3/testvol_tier5/E_file_copy_33
[root@rhsauto020:/etc/yum.repos.d] Dec-29-2015 10:34:32 $

[root@rhsauto021:/etc/yum.repos.d] Dec-29-2015 10:34:30 $ls -l /bricks/brick*/testvol_*/E_file_copy_33
---------T. 2 root root 0 Dec 29 10:08 /bricks/brick0/testvol_brick2/E_file_copy_33
[root@rhsauto021:/etc/yum.repos.d] Dec-29-2015 10:34:32 $

[root@rhsauto022:/etc/yum.repos.d] Dec-29-2015 10:34:30 $ls -l /bricks/brick*/testvol_*/E_file_copy_33
-rw-r--r--. 2 root root 33792 Dec 29 10:08 /bricks/brick1/testvol_tier3/E_file_copy_33
[root@rhsauto022:/etc/yum.repos.d] Dec-29-2015 10:34:32 $
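
The listings above can be combined with the brick order from the volume info to name the cold-tier replica that lacks the link-to file. The sketch below assumes that replica sets are formed from consecutive bricks in the listed order (Brick7-9 and Brick10-12 for the 2 x 3 cold tier); that grouping is an assumption and is not stated in this report. Function names are hypothetical.

```python
COLD_TIER_BRICKS = [
    "rhsauto019.lab.eng.blr.redhat.com:/bricks/brick0/testvol_brick0",
    "rhsauto020.lab.eng.blr.redhat.com:/bricks/brick0/testvol_brick1",
    "rhsauto021.lab.eng.blr.redhat.com:/bricks/brick0/testvol_brick2",
    "rhsauto022.lab.eng.blr.redhat.com:/bricks/brick0/testvol_brick3",
    "rhsauto019.lab.eng.blr.redhat.com:/bricks/brick1/testvol_brick4",
    "rhsauto020.lab.eng.blr.redhat.com:/bricks/brick1/testvol_brick5",
]


def replica_sets(bricks, replica_count):
    """Group bricks into replica sets of consecutive bricks (assumed order)."""
    return [bricks[i:i + replica_count] for i in range(0, len(bricks), replica_count)]


def replicas_missing_entry(bricks, replica_count, bricks_with_entry):
    """Return the bricks in the owning replica set that lack the entry."""
    for group in replica_sets(bricks, replica_count):
        if any(b in bricks_with_entry for b in group):
            return [b for b in group if b not in bricks_with_entry]
    return []


# Cold-tier bricks where the E_file_copy_33 link-to file was listed above.
found_on = {
    "rhsauto020.lab.eng.blr.redhat.com:/bricks/brick0/testvol_brick1",
    "rhsauto021.lab.eng.blr.redhat.com:/bricks/brick0/testvol_brick2",
}

missing = replicas_missing_entry(COLD_TIER_BRICKS, 3, found_on)
print(missing)
```

Under that assumption, the replica without the link-to file for E_file_copy_33 is testvol_brick0 on rhsauto019, which is consistent with the rhsauto019 listing showing only the hot-tier data file.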

Comment 3 Anuradha 2015-12-30 13:27:44 UTC
Shwetha, sos-reports don't have client logs in them. Could you please provide the client logs?

Comment 7 Pranith Kumar K 2016-01-18 10:47:10 UTC
*** Bug 1294732 has been marked as a duplicate of this bug. ***

Comment 10 Ravishankar N 2017-08-29 10:55:46 UTC
Related bug: BZ 1294597