Description of problem:
The rebalance process hangs in the statfs call of quota and fails after the timeout.
###################################################################
1. Created a 6x2 distributed-replicate volume.
2. Ran the ACA script, which does deep directory creation and renames directories and files.
3. While the script was running, did add-brick and rebalance.

Result:
Rebalance hangs for 1800 seconds, which is the call-bail timeout, and then runs to completion.

statedump:
--------------
[global.callpool.stack.1.frame.1]
ref_count=1
translator=test-server
complete=0

[global.callpool.stack.1.frame.2]
ref_count=0
translator=test-quota
complete=0
parent=/brick2/test7
wind_from=io_stats_statfs
wind_to=FIRST_CHILD(this)->fops->statfs
unwind_to=io_stats_statfs_cbk

[global.callpool.stack.1.frame.3]
ref_count=1
translator=/brick2/test7
complete=0
parent=test-server
wind_from=server_statfs_resume
wind_to=bound_xl->fops->statfs
unwind_to=server_statfs_cbk

From rebalance logs
===========
[2015-01-03 14:49:59.065353] E [rpc-clnt.c:201:call_bail] 0-test-client-1: bailing out frame type(GlusterFS 3.3) op(STATFS(14)) xid = 0x794 sent = 2015-01-03 14:19:58.397959. timeout = 1800 for 10.70.44.70:49152

Version-Release number of selected component (if applicable):

How reproducible:
When building ancestry fails, the statfs frame is lost because the error is not handled properly. We saw an error log in the brick process saying that open failed on the same gfid on which statfs was issued. This open was most likely issued by the ancestry-building code in quota.

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
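The frame loss described above can be illustrated with a minimal sketch. This is not the GlusterFS code or the patch under review: frame_t, unwind(), and both statfs_after_ancestry_* functions are hypothetical names invented for illustration, and the fallback to EIO is an assumption. The sketch only shows the general pattern, where a continuation that returns on error without unwinding leaves the frame with complete=0 until the client bails out.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a call frame; NOT the GlusterFS call_frame_t. */
typedef struct {
    bool complete;  /* set once the reply has been unwound to the caller */
    int  op_ret;    /* 0 on success, -1 on failure */
    int  op_errno;  /* errno-style code when op_ret is -1 */
} frame_t;

/* Deliver the result to the caller and mark the frame complete. */
void unwind(frame_t *frame, int op_ret, int op_errno)
{
    frame->op_ret = op_ret;
    frame->op_errno = op_errno;
    frame->complete = true;
}

/* Buggy continuation (hypothetical): when ancestry building fails it
 * returns without unwinding, so the frame stays pending (complete=0)
 * until the caller's timeout expires. */
void statfs_after_ancestry_buggy(frame_t *frame, int ancestry_ret,
                                 int ancestry_errno)
{
    (void)ancestry_errno;   /* the error is silently dropped */
    if (ancestry_ret < 0)
        return;             /* frame is lost here */
    unwind(frame, 0, 0);
}

/* Fixed continuation (hypothetical): every exit path unwinds exactly
 * once, so the caller sees the failure immediately. */
void statfs_after_ancestry_fixed(frame_t *frame, int ancestry_ret,
                                 int ancestry_errno)
{
    if (ancestry_ret < 0) {
        /* Assumption: fall back to EIO if no errno was supplied. */
        unwind(frame, -1, ancestry_errno ? ancestry_errno : EIO);
        return;
    }
    unwind(frame, 0, 0);
}
```

With a zero-initialized frame_t, calling the buggy variant with ancestry_ret = -1 leaves complete false, which corresponds to the complete=0 frames in the statedump. The fixed variant marks the frame complete with op_ret = -1 and the supplied errno.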
REVIEW: http://review.gluster.org/9380 (features/quota: prevent statfs frame-loss when an error happens during ancestry building.) posted (#4) for review on master by Raghavendra G (rgowdapp)
REVIEW: http://review.gluster.org/9380 (features/quota: prevent statfs frame-loss when an error happens during ancestry building.) posted (#5) for review on master by Vijaikumar Mallikarjuna (vmallika)
I've dropped this bug from the glusterfs-3.7.1 tracker. Please clone this bug and have the clone depend on 1178619 (this bug) and block "glusterfs-3.7.1".
REVIEW: http://review.gluster.org/9380 (features/quota: prevent statfs frame-loss when an error happens during ancestry building.) posted (#6) for review on master by Raghavendra G (rgowdapp)
The fix for this BZ is already present in a GlusterFS release. A clone of this BZ has been fixed in a GlusterFS release and closed, so this mainline BZ is being closed as well.
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.8.0, please open a new bug report.

glusterfs-3.8.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user