Bug 1231732

Summary: Renamed Files are missing after self-heal
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: spandura
Component: replicate
Assignee: Anuradha <atalur>
Status: CLOSED ERRATA
QA Contact: spandura
Severity: high
Docs Contact:
Priority: high
Version: rhgs-3.1
CC: annair, nsathyan, ravishankar, rcyriac, rhs-bugs, smohan, spandura, storage-qa-internal, vagarwal
Target Milestone: ---
Target Release: RHGS 3.1.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: glusterfs-3.7.1-8
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Clones: 1238508
Environment:
Last Closed: 2015-07-29 05:03:23 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1202842, 1238508, 1240183
Attachments:
Description                        Flags
Sc                                 none
Logs from mgmt node and brick0.    none
Logs from client and brick1.       none
Logs from brick2 and brick3.       none

Description spandura 2015-06-15 10:19:09 UTC
Description of problem:
=======================
In a 2 x 2 distribute-replicate volume, create/delete/rename operations on files and directories were performed while bricks were down. The bricks were then brought back online and self-heal completed. After self-heal, some of the renamed files were missing from the mount point.

Version-Release number of selected component (if applicable):
==============================================================
glusterfs-3.7.1-1.el6rhs.x86_64

How reproducible:
==============
Often

Steps to Reproduce:
======================
step1)

    Create a 2 x 2 distribute-replicate volume. Start the volume.
    Create a fuse mount.
    Bring down brick1.
    From the client, execute entry_self_heal.sh <abs_path_mountpoint> "create" 1
    Calculate arequal-checksum (after_data_creation)
    Bring back brick1 (service glusterd restart)
    Trigger self-heal
    After self-heal is complete, calculate arequal-checksum (after_self_heal)
    Compare arequal-checksum (after_data_creation) and arequal-checksum (after_self_heal). The arequal-checksums should match.


step2)

    Bring down brick2.
    Calculate arequal-checksum (before_data_creation)
    Compare arequal-checksum (after_self_heal calculated above) and arequal-checksum (before_data_creation). The arequal-checksums should match.
    From the client, execute entry_self_heal.sh <abs_path_mountpoint> "delete" 1
    Calculate arequal-checksum (after_data_creation)
    Bring back brick2 (service glusterd restart)
    Trigger self-heal
    After self-heal is complete, calculate arequal-checksum (after_self_heal)
    Compare arequal-checksum (after_data_creation) and arequal-checksum (after_self_heal). The arequal-checksums should match.

step3)

    Bring down brick3.
    Calculate arequal-checksum (before_data_creation)
    Compare arequal-checksum (after_self_heal calculated above) and arequal-checksum (before_data_creation). The arequal-checksums should match.
    From the client, execute entry_self_heal.sh <abs_path_mountpoint> "rename" 1
    Calculate arequal-checksum (after_data_creation)
    Bring back brick3 (service glusterd restart)
    Trigger self-heal
    After self-heal is complete, calculate arequal-checksum (after_self_heal)
    Compare arequal-checksum (after_data_creation) and arequal-checksum (after_self_heal). The arequal-checksums should match.

step4)

    Bring down brick4.
    Calculate arequal-checksum (before_data_creation)
    Compare arequal-checksum (after_self_heal calculated above) and arequal-checksum (before_data_creation). The arequal-checksums should match.
    From the client, execute entry_self_heal.sh <abs_path_mountpoint> "create" 2
    Calculate arequal-checksum (after_data_creation)
    Bring back brick4 (service glusterd restart)
    Trigger self-heal
    After self-heal is complete, calculate arequal-checksum (after_self_heal)
    Compare arequal-checksum (after_data_creation) and arequal-checksum (after_self_heal). The arequal-checksums should match.

step5)

    Bring down brick1 and brick3.
    Calculate arequal-checksum (before_data_creation)
    Compare arequal-checksum (after_self_heal calculated above) and arequal-checksum (before_data_creation). The arequal-checksums should match.
    From the client, execute entry_self_heal.sh <abs_path_mountpoint> "delete" 2
    Calculate arequal-checksum (after_data_creation)
    Bring back brick1 and brick3 (service glusterd restart)
    Trigger self-heal
    After self-heal is complete, calculate arequal-checksum (after_self_heal)
    Compare arequal-checksum (after_data_creation) and arequal-checksum (after_self_heal). The arequal-checksums should match.

step6)

    Bring down brick1 and brick4.
    Calculate arequal-checksum (before_data_creation)
    Compare arequal-checksum (after_self_heal calculated above) and arequal-checksum (before_data_creation). The arequal-checksums should match.
    From the client, execute entry_self_heal.sh <abs_path_mountpoint> "rename" 2
    Calculate arequal-checksum (after_data_creation)
    Bring back brick1 and brick4 (service glusterd restart)
    Trigger self-heal
    After self-heal is complete, calculate arequal-checksum (after_self_heal)
    Compare arequal-checksum (after_data_creation) and arequal-checksum (after_self_heal). The arequal-checksums should match.
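
The "compare arequal-checksum" check that closes each step can be sketched roughly as below. This is only an illustration, not the actual test harness: the function names parse_arequal and diff_arequal are hypothetical, and the parser assumes the arequal output layout pasted under "Actual results" in this report (a section title line followed by "name : value" lines).

```python
def parse_arequal(text):
    """Parse arequal-style output into {section: {field: value}}.

    Assumption: a line without ':' starts a new section and every
    'name : value' line belongs to the most recent section.
    """
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and harness log lines such as ':: [ PASS ] :: ...'
        if not line or line.startswith("::"):
            continue
        if ":" in line:
            key, _, value = line.partition(":")
            if current is not None:
                sections[current][key.strip()] = value.strip()
        else:
            current = line
            sections[current] = {}
    return sections


def diff_arequal(before, after):
    """Return (section, field, before_value, after_value) for every mismatch."""
    diffs = []
    for section, fields in before.items():
        for key, value in fields.items():
            other = after.get(section, {}).get(key)
            if other != value:
                diffs.append((section, key, value, other))
    return diffs
```

An empty list from diff_arequal corresponds to "the arequal-checksums should match". Any returned tuple names the section and field that changed across self-heal.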

Actual results:
==================
:: [   FAIL   ] :: Files /arequal-data/rhsauto053.lab.eng.blr.redhat.com_gluster-mount_arequal_checksum_after_rename_2.log and /arequal-data/rhsauto053.lab.eng.blr.redhat.com_gluster-mount_arequal_checksum_after_self_heal_rename_2.log should not differ 
:: [ 18:55:17 ] :: arequal checksum of after_rename_2

Entry counts
Regular files   : 640
Directories     : 43
Symbolic links  : 0
Other           : 0
Total           : 683

Metadata checksums
Regular files   : 3e9
Directories     : 24d74c
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : 8d5ae90e794c15b2befdda509790c2f7
Directories     : d02060104765b76
Symbolic links  : 0
Other           : 0
Total           : 3ea5355feaaa8c33
:: [ 18:55:17 ] :: arequal checksum of after_self_heal_rename_2

Entry counts
Regular files   : 594
Directories     : 43
Symbolic links  : 0
Other           : 0
Total           : 637

Metadata checksums
Regular files   : 3e9
Directories     : 24d74c
Symbolic links  : 3e9
Other           : 3e9

Checksums
Regular files   : 53abb6943e3d8c97da3cc7cb50ac2171
Directories     : d02060104765b76
Symbolic links  : 0
Other           : 0
Total           : 8495775e6ae7f690
:: [ 18:55:18 ] :: Checking if there are any duplicate entries under /gluster-mount
:: [   PASS   ] :: Duplicate entries not found under /gluster-mount 
:: [ 18:55:19 ] :: Checking if there are missing entries under /gluster-mount  after_rename_2 to after_self_heal_rename_2
:: [   FAIL   ] :: Missing entries found under /gluster-mount  after_rename_2 to after_self_heal_rename_2 
:: [ 18:55:19 ] :: Listing all the Missing entries on mount /gluster-mount  after_rename_2 to after_self_heal_rename_2
-/gluster-mount/E_dir_new_2_2_2_11/E_file_new_2_2_2_10
-/gluster-mount/E_dir_new_2_2_2_11/E_file_new_2_2_2_13
-/gluster-mount/E_dir_new_2_2_2_11/E_file_new_2_2_2_5
-/gluster-mount/E_dir_new_2_2_2_11/E_file_new_2_2_2_8
-/gluster-mount/E_dir_new_2_2_2_17/E_file_new_2_2_2_10
-/gluster-mount/E_dir_new_2_2_2_17/E_file_new_2_2_2_13
-/gluster-mount/E_dir_new_2_2_2_17/E_file_new_2_2_2_5
-/gluster-mount/E_dir_new_2_2_2_17/E_file_new_2_2_2_8
-/gluster-mount/E_dir_new_2_2_2_18/E_file_new_2_2_2_10
-/gluster-mount/E_dir_new_2_2_2_18/E_file_new_2_2_2_13
-/gluster-mount/E_dir_new_2_2_2_18/E_file_new_2_2_2_5
-/gluster-mount/E_dir_new_2_2_2_18/E_file_new_2_2_2_8
-/gluster-mount/E_dir_new_2_2_2_19/E_file_new_2_2_2_10
-/gluster-mount/E_dir_new_2_2_2_19/E_file_new_2_2_2_13
-/gluster-mount/E_dir_new_2_2_2_19/E_file_new_2_2_2_5
-/gluster-mount/E_dir_new_2_2_2_19/E_file_new_2_2_2_8
-/gluster-mount/E_dir_new_2_2_2_20/E_file_new_2_2_2_10
-/gluster-mount/E_dir_new_2_2_2_20/E_file_new_2_2_2_13
-/gluster-mount/E_dir_new_2_2_2_20/E_file_new_2_2_2_5
-/gluster-mount/E_dir_new_2_2_2_20/E_file_new_2_2_2_8
-/gluster-mount/E_dir_new_2_2_2_5/E_file_new_2_2_2_1
-/gluster-mount/E_dir_new_2_2_2_5/E_file_new_2_2_2_11
-/gluster-mount/E_dir_new_2_2_2_5/E_file_new_2_2_2_12
-/gluster-mount/E_dir_new_2_2_2_5/E_file_new_2_2_2_14
-/gluster-mount/E_dir_new_2_2_2_5/E_file_new_2_2_2_15
-/gluster-mount/E_dir_new_2_2_2_5/E_file_new_2_2_2_2
-/gluster-mount/E_dir_new_2_2_2_5/E_file_new_2_2_2_3
-/gluster-mount/E_dir_new_2_2_2_5/E_file_new_2_2_2_4
-/gluster-mount/E_dir_new_2_2_2_5/E_file_new_2_2_2_6
-/gluster-mount/E_dir_new_2_2_2_5/E_file_new_2_2_2_7
-/gluster-mount/E_dir_new_2_2_2_5/E_file_new_2_2_2_9
-/gluster-mount/E_dir_new_2_2_2_6/E_file_new_2_2_2_10
-/gluster-mount/E_dir_new_2_2_2_6/E_file_new_2_2_2_13
-/gluster-mount/E_dir_new_2_2_2_6/E_file_new_2_2_2_5
-/gluster-mount/E_dir_new_2_2_2_6/E_file_new_2_2_2_8
-/gluster-mount/E_dir_new_2_2_2_9/E_file_new_2_2_2_1
-/gluster-mount/E_dir_new_2_2_2_9/E_file_new_2_2_2_11
-/gluster-mount/E_dir_new_2_2_2_9/E_file_new_2_2_2_12
-/gluster-mount/E_dir_new_2_2_2_9/E_file_new_2_2_2_14
-/gluster-mount/E_dir_new_2_2_2_9/E_file_new_2_2_2_15
-/gluster-mount/E_dir_new_2_2_2_9/E_file_new_2_2_2_2
-/gluster-mount/E_dir_new_2_2_2_9/E_file_new_2_2_2_3
-/gluster-mount/E_dir_new_2_2_2_9/E_file_new_2_2_2_4
-/gluster-mount/E_dir_new_2_2_2_9/E_file_new_2_2_2_6
-/gluster-mount/E_dir_new_2_2_2_9/E_file_new_2_2_2_7
-/gluster-mount/E_dir_new_2_2_2_9/E_file_new_2_2_2_9
:: [ 18:55:20 ] :: Checking if there are additional entries under /gluster-mount  after_self_heal_rename_2 to after_rename_2
:: [   PASS   ] :: No Additional entries found under /gluster-mount  after_self_heal_rename_2 to after_rename_2 
:: [ 18:55:21 ] :: Total Number of files and directories in the volume : 637
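
The 46 missing entries listed above account exactly for the drop in regular files from 640 to 594. A check of this kind can be approximated by taking a listing of the mount before and after self-heal and comparing the two sets. The sketch below is an assumption about how such a check could be written, not the script used in this run; snapshot and compare_snapshots are hypothetical names.

```python
import os


def snapshot(root):
    """Return the set of all file and directory paths under root, relative to root."""
    entries = set()
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            entries.add(os.path.relpath(full, root))
    return entries


def compare_snapshots(before, after):
    """Return (missing, additional) entries, each as a sorted list.

    missing    : present before self-heal but absent afterwards
    additional : absent before self-heal but present afterwards
    """
    missing = sorted(before - after)
    additional = sorted(after - before)
    return missing, additional
```

For example, calling snapshot("/gluster-mount") after the rename step and again after self-heal, then passing both sets to compare_snapshots, would report entries such as E_dir_new_2_2_2_11/E_file_new_2_2_2_10 as missing.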

Expected results:
====================
arequal-checksum should match.

Comment 2 spandura 2015-06-16 06:00:17 UTC
Link to the gluster logs : http://rhsqe-repo.lab.eng.blr.redhat.com/bugs_necessary_info/1231732/

Link to the beaker job : https://beaker.engineering.redhat.com/jobs/983606

Comment 3 spandura 2015-06-17 09:59:51 UTC
Created attachment 1039853
Sc

Comment 7 Anuradha 2015-07-02 07:31:58 UTC
Created attachment 1045375
Logs from mgmt node and brick0.

Comment 8 Anuradha 2015-07-02 07:33:39 UTC
Created attachment 1045376
Logs from client and brick1.

Comment 9 Anuradha 2015-07-02 07:35:40 UTC
Created attachment 1045377
Logs from brick2 and brick3.

Comment 10 Anuradha 2015-07-02 07:39:27 UTC
After checking the logs provided by Shwetha, which contain the list of files on the bricks after each operation, it is verified that the files are indeed present on the bricks but were missing from the mount.
RCA is done and a patch has been sent upstream for review:
http://review.gluster.org/#/c/11498/

Clearing needinfo on Shwetha.

Comment 11 Anuradha 2015-07-06 08:28:14 UTC
Patch posted for review downstream:
https://code.engineering.redhat.com/gerrit/#/c/52357/

Upstream links:
master : http://review.gluster.org/11498/
3.7    : http://review.gluster.org/11544/

Comment 12 spandura 2015-07-13 04:40:39 UTC
Verified the test on a 2 x 3 distribute-replicate volume with build "glusterfs-3.7.1-8.el6rhs.x86_64". The bug is fixed. Moving the bug to the verified state.

Comment 13 errata-xmlrpc 2015-07-29 05:03:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1495.html