Bug 1294632 - Missing entries after self-heal completion
Summary: Missing entries after self-heal completion
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: replicate
Version: rhgs-3.1
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Pranith Kumar K
QA Contact: storage-qa-internal@redhat.com
URL:
Whiteboard:
Duplicates: 1294732
Depends On:
Blocks:
Reported: 2015-12-29 11:21 UTC by spandura
Modified: 2023-09-14 03:15 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-04-16 18:05:48 UTC
Embargoed:



Description spandura 2015-12-29 11:21:35 UTC
Description of problem:
==========================
On a tiered volume with 2x3 hot and cold tiers, one brick from each subvolume was brought down and files were copied to the mount. The bricks were then brought back online, and self-heal completed on both the hot and cold tiers. A different brick from each subvolume was then brought down and the copied files were read back. A few of the copied files were missing.

Observation:
===========
For the missing files, the data file exists on the hot tier, but one of the bricks in the cold-tier subvolume does not have the 'link-to' file.

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.7.5-13.el7rhgs.x86_64

How reproducible:
=================
Tried once

Steps to Reproduce:
====================
1. Create a 2x3 distributed-replicate cold and hot tiered volume. Start the volume. Create a FUSE mount.

2. Repeat the following steps for each operation (create files/dirs, copy files/dirs from the files created):

a. Start the operation on the mount.
b. Bring down certain bricks from each subvolume.
c. After the operation is complete, calculate the arequal-checksum.
d. Bring the bricks back up.
e. Wait for self-heal to complete.
f. Once self-heal is complete, calculate the arequal-checksum.
g. Compare the checksums calculated at (c) and (f). They should be the same.
h. Bring down other bricks from each subvolume.
i. Calculate the arequal-checksum.
j. Compare the checksums calculated at (f) and (i). They should be the same.
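
For readers who want the cycle in steps (a) through (j) in executable form, here is a minimal sketch. It is not the automation that produced the logs in this report. The `run_cycle` function, the `FakeCluster` class, and every method name on it are hypothetical placeholders for whatever the real test harness does.

```python
from typing import List, Sequence


def run_cycle(cluster, first_set: Sequence[str], second_set: Sequence[str]) -> List[str]:
    """Run one operation cycle and return the labels of the failed comparisons.

    `cluster` is any object offering the hypothetical methods called below.
    """
    cluster.start_operation()            # (a)
    cluster.bring_down(first_set)        # (b)
    cluster.wait_for_operation()
    after_op = cluster.checksum()        # (c)
    cluster.bring_up(first_set)          # (d)
    cluster.wait_for_heal()              # (e)
    after_heal = cluster.checksum()      # (f)

    failures = []
    if after_op != after_heal:           # (g)
        failures.append("g")

    cluster.bring_down(second_set)       # (h)
    degraded = cluster.checksum()        # (i)
    if after_heal != degraded:           # (j)
        failures.append("j")
    return failures


class FakeCluster:
    """Stand-in that replays canned checksums; used only to exercise run_cycle."""

    def __init__(self, checksums):
        self._checksums = list(checksums)

    def start_operation(self):
        pass

    def wait_for_operation(self):
        pass

    def bring_down(self, bricks):
        pass

    def bring_up(self, bricks):
        pass

    def wait_for_heal(self):
        pass

    def checksum(self):
        # Hand back the next canned value, in the order (c), (f), (i).
        return self._checksums.pop(0)
```

With canned checksums `["A", "A", "B"]`, the comparison at (g) passes and the one at (j) fails, which is the shape of the failure described in this report.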

Actual results:
================
Arequal checksum mismatched.


2015-12-29 15:49:37,797 INFO compare_arequal_checksum_mount Arequal-Checksum on mount cutlass.lab.eng.blr.redhat.com:/mnt/glusterfs : 'after-self-heal-copy' is: 

Entry counts
Regular files   : 1057
Directories     : 69
Symbolic links  : 0
Other           : 0
Total           : 1126

Metadata checksums
Regular files   : 48974c
Directories     : 24d74c
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : 1cbd48496b3a49731c6b8c8b25536b98
Directories     : 4b071170126a5d7f
Symbolic links  : 0
Other           : 0
Total           : 4bd1d5b25c037f94

2015-12-29 15:49:37,797 INFO compare_arequal_checksum_mount Arequal-Checksum on mount cutlass.lab.eng.blr.redhat.com:/mnt/glusterfs : 'before-next-op-rename' is: 

Entry counts
Regular files   : 1052
Directories     : 69
Symbolic links  : 0
Other           : 0
Total           : 1121

Metadata checksums
Regular files   : 2cb0
Directories     : 24d74c
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : e3f89df229df6151b02a6fadfa0895fa
Directories     : 2858412f24757255
Symbolic links  : 0
Other           : 0
Total           : 7b8ab370f7a286fe

2015-12-29 15:49:37,797 ERROR compare_arequal_checksum_mount Checksums on mount cutlass.lab.eng.blr.redhat.com:/mnt/glusterfs of 'after-self-heal-copy' and 'before-next-op-rename' doesn't match
2015-12-29 15:49:37,797 INFO run Executing find /mnt/glusterfs | uniq -d on cutlass.lab.eng.blr.redhat.com
2015-12-29 15:49:40,095 INFO run "find /mnt/glusterfs | uniq -d" on cutlass.lab.eng.blr.redhat.com: RETCODE is 0
2015-12-29 15:49:40,095 INFO get_duplicate_entries_from_mount No Duplicate Entries found under cutlass.lab.eng.blr.redhat.com:/mnt/glusterfs
2015-12-29 15:49:40,095 ERROR get_missing_entries_from_mount Missing entries from mount when comparing entries 'after-self-heal-copy' and entries 'before-next-op-rename': 
/mnt/glusterfs/E_file_copy_32 /mnt/glusterfs/E_file_copy_33 /mnt/glusterfs/E_file_copy_30 /mnt/glusterfs/E_file_copy_31 /mnt/glusterfs/E_file_copy_35
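
The two arequal reports above can be cross-checked against the list of missing entries. The sketch below parses only the "Entry counts" section and reports which counts dropped. The function names `entry_counts` and `count_delta` are made up for illustration, and the parser assumes the report layout shown in the log excerpts, nothing more.

```python
import re

AFTER_HEAL = """Entry counts
Regular files   : 1057
Directories     : 69
Symbolic links  : 0
Other           : 0
Total           : 1126

Metadata checksums
Regular files   : 48974c
"""

DEGRADED = """Entry counts
Regular files   : 1052
Directories     : 69
Symbolic links  : 0
Other           : 0
Total           : 1121

Metadata checksums
Regular files   : 2cb0
"""


def entry_counts(report: str) -> dict:
    """Return the 'Entry counts' section of an arequal report as {label: int}."""
    counts = {}
    in_section = False
    for line in report.splitlines():
        stripped = line.strip()
        if stripped == "Entry counts":
            in_section = True
            continue
        if not in_section or not stripped:
            continue
        match = re.match(r"^(.+?)\s*:\s*(\d+)$", stripped)
        if match is None:
            break  # reached the next section heading
        counts[match.group(1)] = int(match.group(2))
    return counts


def count_delta(before: dict, after: dict) -> dict:
    """Return {label: before - after} for every label whose count changed."""
    return {
        label: value - after.get(label, 0)
        for label, value in before.items()
        if value != after.get(label, 0)
    }


delta = count_delta(entry_counts(AFTER_HEAL), entry_counts(DEGRADED))
print(delta)  # {'Regular files': 5, 'Total': 5}
```

The drop of 5 regular files agrees with the five `E_file_copy_*` paths reported as missing.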

Expected results:
===================
arequal-checksums should match

Additional info:
==================
 
Volume Name: testvol
Type: Tier
Volume ID: 50b291c4-68ec-4b40-8ca3-cd2a1524f43f
Status: Started
Number of Bricks: 12
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 2 x 3 = 6
Brick1: rhsauto020.lab.eng.blr.redhat.com:/bricks/brick3/testvol_tier5
Brick2: rhsauto019.lab.eng.blr.redhat.com:/bricks/brick3/testvol_tier4
Brick3: rhsauto022.lab.eng.blr.redhat.com:/bricks/brick1/testvol_tier3
Brick4: rhsauto021.lab.eng.blr.redhat.com:/bricks/brick1/testvol_tier2
Brick5: rhsauto020.lab.eng.blr.redhat.com:/bricks/brick2/testvol_tier1
Brick6: rhsauto019.lab.eng.blr.redhat.com:/bricks/brick2/testvol_tier0
Cold Tier:
Cold Tier Type : Distributed-Replicate
Number of Bricks: 2 x 3 = 6
Brick7: rhsauto019.lab.eng.blr.redhat.com:/bricks/brick0/testvol_brick0
Brick8: rhsauto020.lab.eng.blr.redhat.com:/bricks/brick0/testvol_brick1
Brick9: rhsauto021.lab.eng.blr.redhat.com:/bricks/brick0/testvol_brick2
Brick10: rhsauto022.lab.eng.blr.redhat.com:/bricks/brick0/testvol_brick3
Brick11: rhsauto019.lab.eng.blr.redhat.com:/bricks/brick1/testvol_brick4
Brick12: rhsauto020.lab.eng.blr.redhat.com:/bricks/brick1/testvol_brick5
Options Reconfigured:
performance.readdir-ahead: on
features.ctr-enabled: on
cluster.tier-mode: cache
cluster.watermark-low: 75
cluster.watermark-hi: 90


[root@rhsauto019:/etc/yum.repos.d] Dec-29-2015 10:34:30 $ls -l /bricks/brick*/testvol_*/E_file_copy_33
-rw-r--r--. 2 root root 33792 Dec 29 10:08 /bricks/brick3/testvol_tier4/E_file_copy_33
[root@rhsauto019:/etc/yum.repos.d] Dec-29-2015 10:34:32 $


[root@rhsauto020:/etc/yum.repos.d] Dec-29-2015 10:34:30 $ls -l /bricks/brick*/testvol_*/E_file_copy_33
---------T. 2 root root     0 Dec 29 10:08 /bricks/brick0/testvol_brick1/E_file_copy_33
-rw-r--r--. 2 root root 33792 Dec 29 10:08 /bricks/brick3/testvol_tier5/E_file_copy_33
[root@rhsauto020:/etc/yum.repos.d] Dec-29-2015 10:34:32 $

[root@rhsauto021:/etc/yum.repos.d] Dec-29-2015 10:34:30 $ls -l /bricks/brick*/testvol_*/E_file_copy_33
---------T. 2 root root 0 Dec 29 10:08 /bricks/brick0/testvol_brick2/E_file_copy_33
[root@rhsauto021:/etc/yum.repos.d] Dec-29-2015 10:34:32 $

[root@rhsauto022:/etc/yum.repos.d] Dec-29-2015 10:34:30 $ls -l /bricks/brick*/testvol_*/E_file_copy_33
-rw-r--r--. 2 root root 33792 Dec 29 10:08 /bricks/brick1/testvol_tier3/E_file_copy_33
[root@rhsauto022:/etc/yum.repos.d] Dec-29-2015 10:34:32 $
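
The four listings above can be summarised mechanically. The sketch below rests on two assumptions that the report does not state outright: that a zero-byte file with mode `---------T` is a 'link-to' file (a stricter check would inspect the file's extended attributes on the brick), and that Brick7 through Brick9 form one cold-tier replica set. The helper names `classify` and `bricks_missing_linkto` are illustrative only.

```python
# `ls -l` lines copied from the listings in this report, keyed by host.
LISTING = {
    "rhsauto019": [
        "-rw-r--r--. 2 root root 33792 Dec 29 10:08 /bricks/brick3/testvol_tier4/E_file_copy_33",
    ],
    "rhsauto020": [
        "---------T. 2 root root     0 Dec 29 10:08 /bricks/brick0/testvol_brick1/E_file_copy_33",
        "-rw-r--r--. 2 root root 33792 Dec 29 10:08 /bricks/brick3/testvol_tier5/E_file_copy_33",
    ],
    "rhsauto021": [
        "---------T. 2 root root 0 Dec 29 10:08 /bricks/brick0/testvol_brick2/E_file_copy_33",
    ],
    "rhsauto022": [
        "-rw-r--r--. 2 root root 33792 Dec 29 10:08 /bricks/brick1/testvol_tier3/E_file_copy_33",
    ],
}

# Assumption: Brick7..Brick9 from the volume info form one cold-tier replica set.
COLD_REPLICA_SET = [
    ("rhsauto019", "/bricks/brick0/testvol_brick0"),
    ("rhsauto020", "/bricks/brick0/testvol_brick1"),
    ("rhsauto021", "/bricks/brick0/testvol_brick2"),
]


def classify(ls_line: str):
    """Return (path, kind) where kind is 'link-to' or 'data' (heuristic)."""
    # Fields: mode, links, owner, group, size, month, day, time, path
    fields = ls_line.split()
    mode, size, path = fields[0], int(fields[4]), fields[-1]
    is_linkto = mode.startswith("---------T") and size == 0
    return path, "link-to" if is_linkto else "data"


def bricks_missing_linkto(listing, replica_set, name):
    """List the bricks of the replica set that hold no link-to file for `name`."""
    missing = []
    for host, brick in replica_set:
        found = {classify(line) for line in listing.get(host, [])}
        if (f"{brick}/{name}", "link-to") not in found:
            missing.append(f"{host}:{brick}")
    return missing


print(bricks_missing_linkto(LISTING, COLD_REPLICA_SET, "E_file_copy_33"))
# ['rhsauto019:/bricks/brick0/testvol_brick0']
```

Under those assumptions, the only cold-tier brick without the link-to file for `E_file_copy_33` is `testvol_brick0` on rhsauto019, while all three hot-tier bricks listed (`testvol_tier3`, `testvol_tier4`, `testvol_tier5`) hold the data file.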

Comment 3 Anuradha 2015-12-30 13:27:44 UTC
Shwetha, sos-reports don't have client logs in them. Could you please provide the client logs?

Comment 7 Pranith Kumar K 2016-01-18 10:47:10 UTC
*** Bug 1294732 has been marked as a duplicate of this bug. ***

Comment 10 Ravishankar N 2017-08-29 10:55:46 UTC
Related bug: BZ 1294597

Comment 12 Red Hat Bugzilla 2023-09-14 03:15:33 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days

