REVIEW: http://review.gluster.org/8462 (cluster/dht: Fix dht_access treating directory like files) posted (#2) for review on master by Shyamsundar Ranganathan (srangana)
REVIEW: http://review.gluster.org/8462 (cluster/dht: Fix dht_access treating directory like files) posted (#3) for review on master by Shyamsundar Ranganathan (srangana)
Analysis of the regression failure of the patch:

After adding another 2 subvolumes, the following messages were found:

[2014-08-13 03:13:40.757715] W [MSGID: 108008] [afr-read-txn.c:218:afr_read_txn] 0-patchy-replicate-2: Unreadable subvolume -1 found with event generation 2. (Possible split-brain)
[2014-08-13 03:13:40.758077] W [MSGID: 108008] [afr-read-txn.c:218:afr_read_txn] 0-patchy-replicate-3: Unreadable subvolume -1 found with event generation 2. (Possible split-brain)

++++
/*!
 * @messageid 108008
 * @diagnosis There is an inconsistency in the file's data/metadata/gfid
 * amongst the bricks of a replica set.
 * @recommendedaction Resolve the split brain by clearing the AFR changelog
 * attributes from the appropriate brick and trigger self-heal.
 */
#define AFR_MSG_SPLIT_BRAIN (GLFS_COMP_BASE_AFR + 8)
+++++

I could also see repetitive ESTALE logs for opendir on clients 4, 5, 6 and 7, which were added to the volume later. It is possible that the lookup was never wound for these inodes, so dht_selfheal has not created the parent directories on the new subvolumes, and hence the ESTALE was seen.

[2014-08-13 03:13:41.235328] W [client-rpc-fops.c:2677:client3_3_opendir_cbk] 0-patchy-client-4: remote operation failed: Stale file handle. Path: <gfid:d4497edf-eed6-4b1c-ac6f-039d1c92ccd6> (d4497edf-eed6-4b1c-ac6f-039d1c92ccd6)
[2014-08-13 03:13:41.235387] W [client-rpc-fops.c:2677:client3_3_opendir_cbk] 0-patchy-client-5: remote operation failed: Stale file handle. Path: <gfid:d4497edf-eed6-4b1c-ac6f-039d1c92ccd6> (d4497edf-eed6-4b1c-ac6f-039d1c92ccd6)
[2014-08-13 03:13:41.235435] W [client-rpc-fops.c:2677:client3_3_opendir_cbk] 0-patchy-client-6: remote operation failed: Stale file handle. Path: <gfid:d4497edf-eed6-4b1c-ac6f-039d1c92ccd6> (d4497edf-eed6-4b1c-ac6f-039d1c92ccd6)
[2014-08-13 03:13:41.235671] W [client-rpc-fops.c:2677:client3_3_opendir_cbk] 0-patchy-client-7: remote operation failed: Stale file handle. Path: <gfid:d4497edf-eed6-4b1c-ac6f-039d1c92ccd6> (d4497edf-eed6-4b1c-ac6f-039d1c92ccd6)

LOGS FOR ENOTEMPTY
========================
[2014-08-13 03:13:54.131642] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: fd80ffa4: /dir0001 => -1 (Directory not empty)
[2014-08-13 03:13:54.377206] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: 5a81ffa4: /dir0002 => -1 (Directory not empty)
[2014-08-13 03:13:54.630304] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: b781ffa4: /dir0003 => -1 (Directory not empty)
[2014-08-13 03:13:54.875241] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: 1382ffa4: /dir0004 => -1 (Directory not empty)
[2014-08-13 03:13:55.130447] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: 7882ffa4: /dir0005 => -1 (Directory not empty)
[2014-08-13 03:13:55.387722] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: d182ffa4: /dir0006 => -1 (Directory not empty)
[2014-08-13 03:13:55.641347] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: 3183ffa4: /dir0007 => -1 (Directory not empty)
[2014-08-13 03:13:55.900819] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: 9783ffa4: /dir0008 => -1 (Directory not empty)
[2014-08-13 03:13:56.157796] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: fa83ffa4: /dir0009 => -1 (Directory not empty)
[2014-08-13 03:13:56.397986] W [nfs3.c:3518:nfs3svc_rmdir_cbk] 0-nfs: 5984ffa4: /dir0010 => -1 (Directory not empty)
[2014-08-13 03:13:56.608139] W [socket.c:530:__socket_rwv] 0-patchy-client-7: readv on 23.253.200.127:49173 failed (No data available)
[2014-08-13 03:13:56.608218] I [client.c:2215:client_rpc_notify] 0-patchy-client-7: disconnected from patchy-client-7.

I could not hit the issue on my VM.
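To make the hypothesis above a bit more concrete, here is a minimal standalone C model. This is not GlusterFS code; the subvolume layout, struct and function names (model_opendir, model_rmdir, subvol_state) are made up purely for illustration of how a directory that never received a lookup/selfheal on the newly added subvolumes answers ESTALE on opendir, and how any entries left behind on the bricks then surface as ENOTEMPTY on the later rmdir from the NFS client.

++++
/* Minimal model of the failure hypothesis above -- NOT GlusterFS code. */
#include <errno.h>
#include <stdio.h>

#define SUBVOL_COUNT 8          /* 4 original + 4 newly added subvolumes */

struct subvol_state {
    int dir_exists;             /* was the directory created/healed here? */
    int entries_left;           /* files still present under the directory */
};

/* opendir on one subvolume: without a prior lookup/selfheal, the new
 * subvolumes never created the directory, so they answer ESTALE. */
static int model_opendir(const struct subvol_state *s)
{
    return s->dir_exists ? 0 : -ESTALE;
}

/* rmdir aggregated across subvolumes: it only succeeds if every
 * subvolume is empty; any leftover entry yields ENOTEMPTY. */
static int model_rmdir(const struct subvol_state *subvols, int count)
{
    for (int i = 0; i < count; i++) {
        if (subvols[i].dir_exists && subvols[i].entries_left > 0)
            return -ENOTEMPTY;
    }
    return 0;
}

int main(void)
{
    struct subvol_state subvols[SUBVOL_COUNT] = {
        /* original subvolumes: directory exists; some files were left
         * behind because the rm -rf did not reach them cleanly */
        {1, 0}, {1, 2}, {1, 0}, {1, 1},
        /* newly added subvolumes: directory was never created because
         * no lookup was wound here, so opendir answers ESTALE */
        {0, 0}, {0, 0}, {0, 0}, {0, 0},
    };

    for (int i = 0; i < SUBVOL_COUNT; i++)
        if (model_opendir(&subvols[i]) == -ESTALE)
            printf("patchy-client-%d: opendir => ESTALE\n", i);

    if (model_rmdir(subvols, SUBVOL_COUNT) == -ENOTEMPTY)
        printf("rmdir /dir0001 => -1 (Directory not empty)\n");
    return 0;
}
+++++

Whether files were actually left behind on the bricks is exactly what the stat in the planned rerun below should confirm.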
Hence, I will rerun the regression test to get a stat of the files left on the bricks (if any are found), and add another rm -rf at the end to check whether a subsequent rm -rf works or not.
Got a machine on which the issue is reproduced consistently. Will debug and update. Hence, not sending any new test case.
REVIEW: http://review.gluster.org/8462 (cluster/dht: Fix dht_access treating directory like files) posted (#4) for review on master by Shyamsundar Ranganathan (srangana)
REVIEW: http://review.gluster.org/8462 (cluster/dht: Fix dht_access treating directory like files) posted (#5) for review on master by Shyamsundar Ranganathan (srangana)
COMMIT: http://review.gluster.org/8462 committed in master by Vijay Bellur (vbellur)
------
commit 6630fff4812f4e8617336b98d8e3ac35976e5990
Author: Shyam <srangana>
Date:   Tue Aug 12 10:48:27 2014 -0400

    cluster/dht: Fix dht_access treating directory like files

    When the cluster topology changes due to add-brick, all sub volumes of
    DHT will not contain the directories till a rebalance is completed.

    Till the rebalance is run, if a caller bypasses lookup and calls
    access due to saved/cached inode information (like NFS server does)
    then, dht_access misreads the error (ESTALE/ENOENT) from the new
    subvolumes and incorrectly tries to handle the inode as a file. This
    results in the directories in memory state in DHT to be corrupted
    and not heal even post a rebalance.

    This commit fixes the problem in dht_access thereby preventing DHT
    from misrepresenting a directory as a file in the case presented
    above.

    Change-Id: Idcdaa3837db71c8fe0a40ec0084a6c3dbe27e772
    BUG: 1125824
    Signed-off-by: Shyam <srangana>
    Reviewed-on: http://review.gluster.org/8462
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Vijay Bellur <vbellur>
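For readers following along, here is a rough standalone sketch of the behaviour described in the commit message. This is a simplified model, not the actual dht_access()/dht_access_cbk() code; the types and helper names (model_inode, access_model, subvol_reply) are invented. The idea it illustrates: an ESTALE/ENOENT reply for an inode that is known to be a directory makes DHT try the remaining subvolumes (the directory is simply not there yet until rebalance), instead of falling into the path that treats the inode as a migrated file.

++++
/* Simplified, standalone sketch of the fixed behaviour -- not the
 * actual dht_access()/dht_access_cbk() code. */
#include <errno.h>
#include <stdio.h>

enum inode_type { TYPE_FILE, TYPE_DIR };

struct model_inode {
    enum inode_type type;
    int cur_subvol;             /* subvolume the cached inode points at */
};

#define SUBVOL_COUNT 4

/* Reply each subvolume would give: the newly added subvolumes (2, 3)
 * do not have the directory yet, so they answer ESTALE. */
static const int subvol_reply[SUBVOL_COUNT] = { 0, 0, -ESTALE, -ESTALE };

static int access_model(struct model_inode *inode)
{
    int start = inode->cur_subvol;
    int op_ret = subvol_reply[start];

    if (op_ret == 0)
        return 0;

    if ((op_ret == -ESTALE || op_ret == -ENOENT) &&
        inode->type == TYPE_DIR) {
        /* Fix: for a directory, ESTALE/ENOENT from one subvolume only
         * means rebalance has not created it there yet; try the other
         * subvolumes, and never treat the inode as a file. */
        for (int next = (start + 1) % SUBVOL_COUNT; next != start;
             next = (next + 1) % SUBVOL_COUNT) {
            if (subvol_reply[next] == 0) {
                inode->cur_subvol = next;
                return 0;
            }
        }
        return op_ret;          /* genuinely missing everywhere */
    }

    /* Old behaviour, still correct for regular files: assume the file
     * was migrated and rewind a fresh lookup to locate it. */
    printf("treating inode as a (possibly migrated) file\n");
    return op_ret;
}

int main(void)
{
    struct model_inode dir = { TYPE_DIR, 2 };   /* cached on a new subvol */
    int ret = access_model(&dir);
    printf("access on directory => %d (served by subvolume %d)\n",
           ret, dir.cur_subvol);
    return 0;
}
+++++

The key distinction the fix draws is between "directory missing on one subvolume because rebalance has not run" and "file migrated elsewhere"; only the latter should trigger the file-handling path.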
REVIEW: http://review.gluster.org/8721 (cluster/dht: Fix dht_access treating directory like files) posted (#2) for review on release-3.5 by N Balachandran (nbalacha)
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.7.0, please open a new bug report.

glusterfs-3.7.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/10939
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user