Rename has hung on the server side because of the following lockup; the client witnesses a RENAME() RPC timeout and bails out. Excerpt from http://www.pitt.edu/~jaw171/brick-1.1429.dump:

[global.callpool.stack.1.frame.1]
ref_count=1
translator=vol_home-server
complete=0

[global.callpool.stack.1.frame.2]
ref_count=0
translator=vol_home-locks
complete=1
parent=vol_home-io-threads
wind_from=iot_inodelk_wrapper
wind_to=FIRST_CHILD (this)->fops->inodelk
unwind_from=pl_common_inodelk
unwind_to=iot_inodelk_cbk

[global.callpool.stack.1.frame.3]
ref_count=0
translator=vol_home-io-threads
complete=1
parent=vol_home-index
wind_from=default_inodelk
wind_to=FIRST_CHILD(this)->fops->inodelk
unwind_from=iot_inodelk_cbk
unwind_to=default_inodelk_cbk

[global.callpool.stack.1.frame.4]
ref_count=0
translator=vol_home-index
complete=1
parent=vol_home-marker
wind_from=marker_rename_release_oldp_lock
wind_to=FIRST_CHILD(this)->fops->inodelk
unwind_from=default_inodelk_cbk
unwind_to=marker_rename_release_newp_lock

[global.callpool.stack.1.frame.5]
ref_count=0
translator=vol_home-posix
complete=1
parent=vol_home-access-control
wind_from=posix_acl_getxattr
wind_to=FIRST_CHILD(this)->fops->getxattr
unwind_from=posix_getxattr
unwind_to=posix_acl_getxattr_cbk

[global.callpool.stack.1.frame.6]
ref_count=0
translator=vol_home-access-control
complete=1
parent=vol_home-locks
wind_from=pl_getxattr
wind_to=FIRST_CHILD(this)->fops->getxattr
unwind_from=posix_acl_getxattr_cbk
unwind_to=pl_getxattr_cbk

[global.callpool.stack.1.frame.7]
ref_count=0
translator=vol_home-locks
complete=1
parent=vol_home-io-threads
wind_from=iot_getxattr_wrapper
wind_to=FIRST_CHILD (this)->fops->getxattr
unwind_from=pl_getxattr_cbk
unwind_to=iot_getxattr_cbk

[global.callpool.stack.1.frame.8]
ref_count=0
translator=vol_home-io-threads
complete=1
parent=vol_home-index
wind_from=index_getxattr
wind_to=FIRST_CHILD(this)->fops->getxattr
unwind_from=iot_getxattr_cbk
unwind_to=default_getxattr_cbk

[global.callpool.stack.1.frame.9]
ref_count=0
translator=vol_home-index
complete=1
parent=vol_home-marker
wind_from=marker_get_oldpath_contribution
wind_to=FIRST_CHILD(this)->fops->getxattr
unwind_from=default_getxattr_cbk
unwind_to=marker_get_newpath_contribution

[global.callpool.stack.1.frame.10]
ref_count=0
translator=vol_home-locks
complete=1
parent=vol_home-io-threads
wind_from=iot_inodelk_wrapper
wind_to=FIRST_CHILD (this)->fops->inodelk
unwind_from=pl_common_inodelk
unwind_to=iot_inodelk_cbk

[global.callpool.stack.1.frame.11]
ref_count=0
translator=vol_home-io-threads
complete=1
parent=vol_home-index
wind_from=default_inodelk
wind_to=FIRST_CHILD(this)->fops->inodelk
unwind_from=iot_inodelk_cbk
unwind_to=default_inodelk_cbk

[global.callpool.stack.1.frame.12]
ref_count=0
translator=vol_home-index
complete=1
parent=vol_home-marker
wind_from=marker_rename
wind_to=FIRST_CHILD(this)->fops->inodelk
unwind_from=default_inodelk_cbk
unwind_to=marker_rename_inodelk_cbk

[global.callpool.stack.1.frame.13]
ref_count=0
translator=vol_home-marker
complete=0
parent=/brick/1
wind_from=io_stats_rename
wind_to=FIRST_CHILD(this)->fops->rename
unwind_to=io_stats_rename_cbk

[global.callpool.stack.1.frame.14]
ref_count=1
translator=/brick/1
complete=0
parent=vol_home-server
wind_from=server_rename_resume
wind_to=bound_xl->fops->rename
unwind_to=server_rename_cbk

[global.callpool.stack.1.frame.15]
ref_count=0
translator=vol_home-posix
complete=1
parent=vol_home-access-control
wind_from=posix_acl_lookup
wind_to=FIRST_CHILD (this)->fops->lookup
unwind_from=posix_lookup
unwind_to=posix_acl_lookup_cbk

[global.callpool.stack.1.frame.16]
ref_count=0
translator=vol_home-access-control
complete=1
parent=vol_home-locks
wind_from=pl_lookup
wind_to=FIRST_CHILD(this)->fops->lookup
unwind_from=posix_acl_lookup_cbk
unwind_to=pl_lookup_cbk

[global.callpool.stack.1.frame.17]
ref_count=0
translator=vol_home-locks
complete=1
parent=vol_home-io-threads
wind_from=iot_lookup_wrapper
wind_to=FIRST_CHILD (this)->fops->lookup
unwind_from=pl_lookup_cbk
unwind_to=iot_lookup_cbk

[global.callpool.stack.1.frame.18]
ref_count=0
translator=vol_home-io-threads
complete=1
parent=vol_home-index
wind_from=index_lookup
wind_to=FIRST_CHILD(this)->fops->lookup
unwind_from=iot_lookup_cbk
unwind_to=default_lookup_cbk

[global.callpool.stack.1.frame.19]
ref_count=0
translator=vol_home-index
complete=1
parent=vol_home-marker
wind_from=marker_lookup
wind_to=FIRST_CHILD(this)->fops->lookup
unwind_from=default_lookup_cbk
unwind_to=marker_lookup_cbk

[global.callpool.stack.1.frame.20]
ref_count=0
translator=vol_home-marker
complete=1
parent=/brick/1
wind_from=io_stats_lookup
wind_to=FIRST_CHILD(this)->fops->lookup
unwind_from=marker_lookup_cbk
unwind_to=io_stats_lookup_cbk

[global.callpool.stack.1.frame.21]
ref_count=0
translator=/brick/1
complete=1
parent=vol_home-server
wind_from=resolve_gfid_cbk
wind_to=BOUND_XL (frame)->fops->lookup
unwind_from=io_stats_lookup_cbk
unwind_to=resolve_gfid_entry_cbk

[global.callpool.stack.1.frame.22]
ref_count=0
translator=vol_home-posix
complete=1
parent=vol_home-access-control
wind_from=posix_acl_lookup
wind_to=FIRST_CHILD (this)->fops->lookup
unwind_from=posix_lookup
unwind_to=posix_acl_lookup_cbk

[global.callpool.stack.1.frame.23]
ref_count=0
translator=vol_home-access-control
complete=1
parent=vol_home-locks
wind_from=pl_lookup
wind_to=FIRST_CHILD(this)->fops->lookup
unwind_from=posix_acl_lookup_cbk
unwind_to=pl_lookup_cbk

[global.callpool.stack.1.frame.24]
ref_count=0
translator=vol_home-locks
complete=1
parent=vol_home-io-threads
wind_from=iot_lookup_wrapper
wind_to=FIRST_CHILD (this)->fops->lookup
unwind_from=pl_lookup_cbk
unwind_to=iot_lookup_cbk

[global.callpool.stack.1.frame.25]
ref_count=0
translator=vol_home-io-threads
complete=1
parent=vol_home-index
wind_from=index_lookup
wind_to=FIRST_CHILD(this)->fops->lookup
unwind_from=iot_lookup_cbk
unwind_to=default_lookup_cbk

[global.callpool.stack.1.frame.26]
ref_count=0
translator=vol_home-index
complete=1
parent=vol_home-marker
wind_from=marker_lookup
wind_to=FIRST_CHILD(this)->fops->lookup
unwind_from=default_lookup_cbk
unwind_to=marker_lookup_cbk

[global.callpool.stack.1.frame.27]
ref_count=0
translator=vol_home-marker
complete=1
parent=/brick/1
wind_from=io_stats_lookup
wind_to=FIRST_CHILD(this)->fops->lookup
unwind_from=marker_lookup_cbk
unwind_to=io_stats_lookup_cbk

[global.callpool.stack.1.frame.28]
ref_count=0
translator=/brick/1
complete=1
parent=vol_home-server
wind_from=resolve_gfid
wind_to=BOUND_XL (frame)->fops->lookup
unwind_from=io_stats_lookup_cbk
unwind_to=resolve_gfid_cbk
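In the excerpt above, every lookup and getxattr frame finished (complete=1); the frames that never unwound (complete=0) are the rename path itself. When digging through a full statedump, a small filter helps. The following is a sketch based only on the frame layout shown above (a hypothetical helper, not an official Gluster tool):

```python
import re

def incomplete_frames(dump_text):
    """Parse callpool frames from a statedump excerpt and return a list of
    (frame_name, fields) tuples for frames that never unwound (complete=0)."""
    frames = []
    current_fields = None
    for raw in dump_text.splitlines():
        line = raw.strip()
        m = re.match(r'\[(global\.callpool\.stack\.\d+\.frame\.\d+)\]$', line)
        if m:
            # A new frame header starts a fresh field dictionary.
            current_fields = {}
            frames.append((m.group(1), current_fields))
        elif current_fields is not None and '=' in line:
            key, _, value = line.partition('=')
            current_fields[key] = value
    return [(name, f) for name, f in frames if f.get('complete') == '0']
```

Running this over the dump above would surface frames 1, 13, and 14, which together pin the hang on the rename call path through marker.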
I tried to reproduce this, and rename of both small and large files works fine. glusterfs version: 3.3.0rc2. It would be helpful if you could provide more information, such as volume type, glusterfs version, or steps to reproduce this issue.
I had a distributed volume on 3.2.5. I moved the data out of it, killed the volume, and uninstalled GlusterFS (removed vol files, etc.). I then installed glusterfs-3.3.0-1.el6.x86_64 from the packages on gluster.org, created a new distributed volume (with different bricks), and moved the data back into it. Could the fact that the files were previously in another volume be causing the issue? Is it safe to move data from one volume to another? This issue still occurs on the current volume when one user tries to compile large programs with gcc.

# gluster volume info

Volume Name: vol_home
Type: Distribute
Volume ID: 07ec60be-ec0c-4579-a675-069bb34c12ab
Status: Started
Number of Bricks: 4
Transport-type: tcp
Bricks:
Brick1: storage0-dev.cssd.pitt.edu:/brick/0
Brick2: storage1-dev.cssd.pitt.edu:/brick/2
Brick3: storage0-dev.cssd.pitt.edu:/brick/1
Brick4: storage1-dev.cssd.pitt.edu:/brick/3
Options Reconfigured:
diagnostics.brick-log-level: INFO
diagnostics.client-log-level: INFO
features.limit-usage: /home/cssd/jaw171:50GB,/cssd:200GB,/cssd/jaw171:100GB
nfs.rpc-auth-allow: 10.54.50.*,127.*
auth.allow: 10.54.50.*,127.*
performance.io-cache: off
cluster.min-free-disk: 5
performance.cache-size: 128000000
features.quota: on
nfs.disable: on

# uname -r
2.6.32-220.17.1.el6.x86_64

# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 6.2 (Santiago)

# rpm -qa | grep gluster
glusterfs-fuse-3.3.0-1.el6.x86_64
glusterfs-server-3.3.0-1.el6.x86_64
glusterfs-3.3.0-1.el6.x86_64

# gluster --version
glusterfs 3.3.0 built on May 31 2012 11:16:29
It would be great to understand where we are with the fix for this.
REVIEW: http://review.gluster.org/5032 (features/marker-quota: more stringent error handling in rename.) posted (#1) for review on master by Raghavendra G (raghavendra)
REVIEW: http://review.gluster.org/5032 (features/marker-quota: more stringent error handling in rename.) posted (#2) for review on master by Raghavendra G (rgowdapp)
REVIEW: http://review.gluster.org/5032 (features/marker-quota: more stringent error handling in rename.) posted (#3) for review on master by Raghavendra G (rgowdapp)
REVIEW: http://review.gluster.org/5032 (features/marker-quota: more stringent error handling in rename.) posted (#4) for review on master by Raghavendra G (rgowdapp)
REVIEW: http://review.gluster.org/6913 (features/marker-quota: more stringent error handling in rename.) posted (#1) for review on release-3.5 by Raghavendra G (rgowdapp)
COMMIT: http://review.gluster.org/5032 committed in master by Anand Avati (avati)
------
commit 66113e3473555c31926045218dc8b79c61751d5d
Author: Raghavendra G <raghavendra>
Date: Sat May 18 11:52:09 2013 +0530

features/marker-quota: more stringent error handling in rename.

If an error occurs and op_errno is not set to a non-zero value, we can end up losing a frame, resulting in a hung syscall. This patch adds code setting op_errno appropriately in storage/posix and makes marker set err to a default non-zero value in case of op_errno being zero.

Change-Id: Idc2c3e843b932709a69b32ba67deb284547168f2
BUG: 833586
Signed-off-by: Raghavendra G <raghavendra>
Reviewed-on: http://review.gluster.org/5032
Tested-by: Gluster Build System <jenkins.com>
Reviewed-by: Anand Avati <avati>
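The commit above describes the root cause: a callback that reports failure (op_ret < 0) but leaves op_errno at zero causes the marker translator to drop the frame, so the rename syscall never returns. The actual fix is C code inside the posix and marker translators; the defaulting logic it enforces can be illustrated with this standalone sketch (function name and the EINVAL fallback are illustrative, not taken from the patch):

```python
import errno

def sanitize_errno(op_ret, op_errno):
    """If a failed call (op_ret < 0) left op_errno at zero, substitute a
    non-zero default so the error still propagates and the frame is
    unwound instead of being silently lost."""
    if op_ret < 0 and op_errno == 0:
        return errno.EINVAL  # fallback chosen for illustration only
    return op_errno  # genuine errors and successes pass through unchanged
```

The point of the pattern is that a zero op_errno on a failure path is never allowed to escape, so upper translators always see a consistent (failure, non-zero errno) pair.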
COMMIT: http://review.gluster.org/6913 committed in release-3.5 by Niels de Vos (ndevos)
------
commit be331ce48633943743bbbe9665f44204e4437dee
Author: Raghavendra G <raghavendra>
Date: Sat May 18 11:52:09 2013 +0530

features/marker-quota: more stringent error handling in rename.

If an error occurs and op_errno is not set to a non-zero value, we can end up losing a frame, resulting in a hung syscall. This patch adds code setting op_errno appropriately in storage/posix and makes marker set err to a default non-zero value in case of op_errno being zero.

Change-Id: Idc2c3e843b932709a69b32ba67deb284547168f2
BUG: 833586
Signed-off-by: Raghavendra G <raghavendra>
Reviewed-on: http://review.gluster.org/6913
Tested-by: Gluster Build System <jenkins.com>
Reviewed-by: Niels de Vos <ndevos>
The first (and possibly last) beta for GlusterFS 3.5.1 has been released [1]. Please verify whether the release resolves this bug report for you. If the glusterfs-3.5.1beta release does not resolve this issue, leave a comment in this bug and move the status to ASSIGNED. If this release fixes the problem for you, leave a note and change the status to VERIFIED. Packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure (possibly an "updates-testing" repository) for your distribution.

[1] http://supercolony.gluster.org/pipermail/gluster-users/2014-May/040377.html
[2] http://supercolony.gluster.org/pipermail/gluster-users/
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.5.1, please reopen this bug report. glusterfs-3.5.1 has been announced on the Gluster Users mailing list [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://supercolony.gluster.org/pipermail/gluster-users/2014-June/040723.html
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user