Bug 1138393

Summary: rebalance is not resulting in the hash layout changes being available to nfs client
Product: [Community] GlusterFS
Component: distribute
Reporter: Shyamsundar <srangana>
Assignee: Shyamsundar <srangana>
Status: CLOSED CURRENTRELEASE
Severity: urgent
Priority: urgent
Version: 3.6.0
Hardware: aarch64
OS: Linux
Fixed In Version: glusterfs-3.6.0beta1
Doc Type: Bug Fix
Type: Bug
Clone Of: 1125824
Last Closed: 2014-11-11 08:38:11 UTC
CC: aavati, bugs, gluster-bugs, nbalacha, nsathyan, pcuzner, rcyriac, rgowdapp, shmohan, spalai, ssaha, ssamanta, vagarwal, vbellur
Bug Depends On: 1120456, 1125824, 1139997, 1140338
Bug Blocks: 1117822, 1125958

Description Shyamsundar 2014-09-04 17:22:12 UTC
+++ This bug was initially created as a clone of Bug #1125824 +++

Description of problem:
Testing volume expansion and rebalance on a volume in use by an application resulted in files that could no longer be copied or deleted. Originally I had a 4-brick distributed-replicated volume, which I expanded to an 8-brick configuration by:
- running add-brick
- running rebalance start

The rebalance was executed during application activity (writes of cold buckets to the volume, and reads across buckets from up to 36 concurrent search sessions); the rebalance completed successfully. The expansion maps to the CLI sequence sketched below.
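
For reference, the expansion steps above correspond roughly to the following gluster CLI commands; the volume name and brick paths are illustrative, not taken from this report:

# Grow a 2x2 distributed-replicated volume to 4x2 by adding two
# replica pairs (bricks must be added in multiples of the replica count).
gluster volume add-brick testvol server5:/bricks/b1 server6:/bricks/b1 \
        server7:/bricks/b1 server8:/bricks/b1

# Spread existing data and directory layouts onto the new bricks,
# then poll until the operation reports "completed".
gluster volume rebalance testvol start
gluster volume rebalance testvol status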

However, two problems were identified following the rebalance:

1. A subsequent benchmark test that attempts to refresh the environment by deleting existing files failed (nfs.log attached).
2. The migration of data from one of the indexers to the RHS volume started to fail, leaving the data on local disk instead of migrating to the NFS-mounted RHS volume.

Steps to Reproduce:
1. Any attempt to delete the files listed in the nfs.log fails:

[root@focil-rhs1 rawdata]# rm slicesv2.dat 
rm: remove regular file `slicesv2.dat'? y
rm: cannot remove `slicesv2.dat': Invalid argument

Actual results:
File deletion fails with "Invalid argument" (EINVAL).

Expected results:
File access and manipulation following a rebalance should continue to work.
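
As a diagnostic aid (not part of the original report), the DHT directory layout that the rebalance should have updated can be inspected directly on each brick; a missing or inconsistent layout xattr on the newly added bricks points at this class of problem. The brick path here is illustrative:

# Run on each server against the backend brick path of the affected
# directory; the xattr encodes that brick's DHT hash range.
getfattr -n trusted.glusterfs.dht -e hex /bricks/b1/rawdata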

Comment 1 Anand Avati 2014-09-04 18:48:01 UTC
REVIEW: http://review.gluster.org/8608 (cluster/dht: Fix dht_access treating directory like files) posted (#1) for review on release-3.6 by Shyamsundar Ranganathan (srangana)

Comment 2 Anand Avati 2014-09-04 20:25:47 UTC
REVIEW: http://review.gluster.org/8608 (cluster/dht: Fix dht_access treating directory like files) posted (#2) for review on release-3.6 by Shyamsundar Ranganathan (srangana)

Comment 3 Anand Avati 2014-09-05 15:11:12 UTC
REVIEW: http://review.gluster.org/8608 (cluster/dht: Fix dht_access treating directory like files) posted (#3) for review on release-3.6 by Shyamsundar Ranganathan (srangana)

Comment 4 Anand Avati 2014-09-05 15:20:07 UTC
REVIEW: http://review.gluster.org/8608 (cluster/dht: Fix dht_access treating directory like files) posted (#4) for review on release-3.6 by Shyamsundar Ranganathan (srangana)

Comment 5 Anand Avati 2014-09-09 17:52:49 UTC
COMMIT: http://review.gluster.org/8608 committed in release-3.6 by Vijay Bellur (vbellur) 
------
commit 7fa8f593e1375e6a917de0a24efa91f82aab05a4
Author: Shyam <srangana>
Date:   Thu Sep 4 14:10:02 2014 -0400

    cluster/dht: Fix dht_access treating directory like files
    
    When the cluster topology changes due to add-brick, not all
    subvolumes of DHT will contain the directories until a rebalance
    is completed. Until the rebalance is run, if a caller bypasses
    lookup and calls access based on saved/cached inode information
    (as the NFS server does), then dht_access misreads the error
    (ESTALE/ENOENT) from the new subvolumes and incorrectly tries
    to handle the inode as a file. This corrupts DHT's in-memory
    state for the directories, and the corruption does not heal
    even after a rebalance.
    
    This commit fixes the problem in dht_access, thereby preventing
    DHT from misrepresenting a directory as a file in the case
    presented above.
    
    Change-Id: Idcdaa3837db71c8fe0a40ec0084a6c3dbe27e772
    BUG: 1138393
    Signed-off-by: Shyam <srangana>
    Reviewed-on-master: http://review.gluster.org/8462
    Tested-by: Gluster Build System <jenkins@build.gluster.com>
    Reviewed-by: Vijay Bellur <vbellur>
    Reviewed-on: http://review.gluster.org/8608
    Reviewed-by: Jeff Darcy <jdarcy>
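
The failure sequence described in the commit message above can be sketched from the client side as follows. Names are hypothetical; the key point is that the NFS server answers access() from its cached inode without issuing a fresh lookup:

# 1. An NFS client caches a directory inode via the gluster NFS server.
mount -t nfs -o vers=3 server1:/testvol /mnt
ls /mnt/rawdata

# 2. The volume is expanded; the new bricks do not yet contain
#    /rawdata until a rebalance (or a fresh lookup) creates it there.
gluster volume add-brick testvol server5:/bricks/b1 server6:/bricks/b1

# 3. An access() on the cached inode may now be wound to a new
#    subvolume, which returns ESTALE/ENOENT. Before this fix, DHT
#    treated that reply as a migrated file, corrupting its in-memory
#    view of the directory, after which operations such as
rm /mnt/rawdata/slicesv2.dat
#    failed with "Invalid argument" even after the rebalance completed.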

Comment 6 Niels de Vos 2014-09-22 12:45:42 UTC
A beta release for GlusterFS 3.6.0 has been released [1]. Please verify whether this release solves this bug report for you. In case the glusterfs-3.6.0beta1 release does not resolve this issue, leave a comment in this bug and move the status to ASSIGNED. If this release fixes the problem for you, leave a note and change the status to VERIFIED.

Packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure (possibly an "updates-testing" repository) for your distribution.

[1] http://supercolony.gluster.org/pipermail/gluster-users/2014-September/018836.html
[2] http://supercolony.gluster.org/pipermail/gluster-users/
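
To confirm which build is installed before re-testing, something like the following can be used (the package query applies to RPM-based distributions):

# Report the installed glusterfs version.
glusterfs --version

# On RPM-based systems, check the installed packages directly.
rpm -q glusterfs glusterfs-server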

Comment 7 Niels de Vos 2014-11-11 08:38:11 UTC
This bug is being closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.6.1, please reopen this bug report.

glusterfs-3.6.1 has been announced [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://supercolony.gluster.org/pipermail/gluster-users/2014-November/019410.html
[2] http://supercolony.gluster.org/mailman/listinfo/gluster-users