+++ This bug was initially created as a clone of Bug #1139506 +++
+++ This bug was initially created as a clone of Bug #1139273 +++

Description of problem:
The client process crashed while performing multiple rename operations on a glusterfs mount (afr+dht).

Steps to Reproduce:
1. Run multiple rename operations.
2. Add a brick and rebalance. (Not sure whether this contributed to the client crash, but stating it for completeness.)

Additional info:

[2014-09-08 12:27:05.086561] I [dht-rename.c:1345:dht_rename] 2-t0-dht: renaming /scratch/scratch/rename race0SmGRE (hash=t0-replicate-0/cache=t0-replicate-0) => /scratch/scratch/rename.file.29521 (hash=t0-replicate-0/cache=t0-replicate-0)

pending frames:
frame : type(1) op(RENAME)
frame : type(1) op(RENAME)
frame : type(0) op(0)

patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2014-09-08 12:27:05
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.6.0.28
/usr/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb6)[0x3e09c1ff06]
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x33f)[0x3e09c3a59f]
/lib64/libc.so.6[0x3e07c326b0]
/usr/lib64/libglusterfs.so.0(uuid_unpack+0x42)[0x3e09c5b1d2]
/usr/lib64/libglusterfs.so.0(uuid_compare+0x33)[0x3e09c5b0c3]
/lib64/libc.so.6[0x3e07c34a79]
/lib64/libc.so.6(qsort_r+0x29c)[0x3e07c34f2c]
/usr/lib64/glusterfs/3.6.0.28/xlator/cluster/distribute.so(dht_lock_order_requests+0x31)[0x7f2450263ec1]
/usr/lib64/glusterfs/3.6.0.28/xlator/cluster/distribute.so(dht_local_lock_init+0x42)[0x7f2450266132]
/usr/lib64/glusterfs/3.6.0.28/xlator/cluster/distribute.so(dht_nonblocking_inodelk+0x80)[0x7f2450266630]
/usr/lib64/glusterfs/3.6.0.28/xlator/cluster/distribute.so(dht_rename_lock+0xdf)[0x7f245027102f]
/usr/lib64/glusterfs/3.6.0.28/xlator/cluster/distribute.so(dht_rename+0x201)[0x7f24502766f1]
/usr/lib64/libglusterfs.so.0(default_rename+0x78)[0x3e09c25ec8]
/usr/lib64/libglusterfs.so.0(default_rename+0x78)[0x3e09c25ec8]
/usr/lib64/libglusterfs.so.0(default_rename+0x78)[0x3e09c25ec8]
/usr/lib64/libglusterfs.so.0(default_rename+0x78)[0x3e09c25ec8]
/usr/lib64/libglusterfs.so.0(default_rename+0x78)[0x3e09c25ec8]
/usr/lib64/glusterfs/3.6.0.28/xlator/performance/md-cache.so(mdc_rename+0x157)[0x7f244b5cb5a7]
/usr/lib64/glusterfs/3.6.0.28/xlator/debug/io-stats.so(io_stats_rename+0x164)[0x7f244b3b3784]
/usr/lib64/libglusterfs.so.0(default_rename+0x78)[0x3e09c25ec8]
/usr/lib64/glusterfs/3.6.0.28/xlator/mount/fuse.so(fuse_rename_resume+0x2a0)[0x7f24539ee630]
/usr/lib64/glusterfs/3.6.0.28/xlator/mount/fuse.so(+0x88a6)[0x7f24539df8a6]
/usr/lib64/glusterfs/3.6.0.28/xlator/mount/fuse.so(fuse_resolve_continue+0x41)[0x7f24539df971]
/usr/lib64/glusterfs/3.6.0.28/xlator/mount/fuse.so(fuse_resolve_parent+0x18)[0x7f24539df308]
/usr/lib64/glusterfs/3.6.0.28/xlator/mount/fuse.so(+0x8608)[0x7f24539df608]
/usr/lib64/glusterfs/3.6.0.28/xlator/mount/fuse.so(+0x88ee)[0x7f24539df8ee]
/usr/lib64/glusterfs/3.6.0.28/xlator/mount/fuse.so(fuse_resolve_continue+0x41)[0x7f24539df971]
/usr/lib64/glusterfs/3.6.0.28/xlator/mount/fuse.so(fuse_resolve_parent+0x18)[0x7f24539df308]
/usr/lib64/glusterfs/3.6.0.28/xlator/mount/fuse.so(+0x8608)[0x7f24539df608]
/usr/lib64/glusterfs/3.6.0.28/xlator/mount/fuse.so(+0x88ce)[0x7f24539df8ce]
/usr/lib64/glusterfs/3.6.0.28/xlator/mount/fuse.so(fuse_resolve_and_resume+0x28)[0x7f24539df918]
/usr/lib64/glusterfs/3.6.0.28/xlator/mount/fuse.so(+0x202b0)[0x7f24539f72b0]
/lib64/libpthread.so.0[0x3e080079d1]
/lib64/libc.so.6(clone+0x6d)[0x3e07ce886d]
---------

(gdb) bt
#0  uuid_unpack (in=0x7f2427ffeff8 "\001", uu=0x7f2449ca7190) at ../../contrib/uuid/unpack.c:58
#1  0x0000003e09c5b0c3 in uuid_compare (uu1=<value optimized out>, uu2=0x7f2427fff000 <Address 0x7f2427fff000 out of bounds>) at ../../contrib/uuid/compare.c:46
#2  0x0000003e07c34a79 in msort_with_tmp () from /lib64/libc.so.6
#3  0x0000003e07c34f2c in qsort_r () from /lib64/libc.so.6
#4  0x00007f2450263ec1 in dht_lock_order_requests (locks=<value optimized out>, count=<value optimized out>) at dht-helper.c:1755
#5  0x00007f2450266132 in dht_local_lock_init (frame=<value optimized out>, lk_array=0x7f2427ffefd0, lk_count=2, inodelk_cbk=0x7f24502734d0 <dht_rename_lock_cbk>) at dht-helper.c:410
#6  0x00007f2450266630 in dht_nonblocking_inodelk (frame=0x7f245a2f34f8, lk_array=0x7f2427ffefd0, lk_count=2, inodelk_cbk=0x7f24502734d0 <dht_rename_lock_cbk>) at dht-helper.c:1640
#7  0x00007f245027102f in dht_rename_lock (frame=0x7f245a2f34f8) at dht-rename.c:1244
#8  0x00007f24502766f1 in dht_rename (frame=0x7f245a2f34f8, this=0xa0ee7f0, oldloc=0x7f242e84f660, newloc=0x7f242e84f6a0, xdata=<value optimized out>) at dht-rename.c:1351
#9  0x0000003e09c25ec8 in default_rename (frame=0x7f245a2f34f8, this=0x20a2fc0, oldloc=0x7f242e84f660, newloc=0x7f242e84f6a0, xdata=<value optimized out>) at defaults.c:1862
#10 0x0000003e09c25ec8 in default_rename (frame=0x7f245a2f34f8, this=0x9e1db30, oldloc=0x7f242e84f660, newloc=0x7f242e84f6a0, xdata=<value optimized out>) at defaults.c:1862
#11 0x0000003e09c25ec8 in default_rename (frame=0x7f245a2f34f8, this=0x2e22810, oldloc=0x7f242e84f660, newloc=0x7f242e84f6a0, xdata=<value optimized out>) at defaults.c:1862
#12 0x0000003e09c25ec8 in default_rename (frame=0x7f245a2f34f8, this=0x931d6e0, oldloc=0x7f242e84f660, newloc=0x7f242e84f6a0, xdata=<value optimized out>) at defaults.c:1862
#13 0x0000003e09c25ec8 in default_rename (frame=0x7f245a2f34f8, this=0x2624480, oldloc=0x7f242e84f660, newloc=0x7f242e84f6a0, xdata=<value optimized out>) at defaults.c:1862
#14 0x00007f244b5cb5a7 in mdc_rename (frame=0x7f245a322374, this=0xa812120, oldloc=0x7f242e84f660, newloc=0x7f242e84f6a0, xdata=0x0) at md-cache.c:1325
#15 0x00007f244b3b3784 in io_stats_rename (frame=0x7f245a324bc4, this=0x914f770, oldloc=0x7f242e84f660, newloc=0x7f242e84f6a0, xdata=0x0) at io-stats.c:2037
#16 0x0000003e09c25ec8 in default_rename (frame=0x7f245a324bc4, this=0x1de2210, oldloc=0x7f242e84f660, newloc=0x7f242e84f6a0, xdata=<value optimized out>) at defaults.c:1862
#17 0x00007f24539ee630 in fuse_rename_resume (state=0x7f242e84f640) at fuse-bridge.c:1778
#18 0x00007f24539df8a6 in fuse_resolve_done (state=<value optimized out>) at fuse-resolve.c:665
#19 fuse_resolve_all (state=<value optimized out>) at fuse-resolve.c:694
#20 0x00007f24539df971 in fuse_resolve_continue (state=0x7f242e84f640) at fuse-resolve.c:710
#21 0x00007f24539df308 in fuse_resolve_parent (state=0x7f242e84f640) at fuse-resolve.c:313
#22 0x00007f24539df608 in fuse_resolve (state=0x7f242e84f640) at fuse-resolve.c:644
#23 0x00007f24539df8ee in fuse_resolve_all (state=<value optimized out>) at fuse-resolve.c:690
#24 0x00007f24539df971 in fuse_resolve_continue (state=0x7f242e84f640) at fuse-resolve.c:710
#25 0x00007f24539df308 in fuse_resolve_parent (state=0x7f242e84f640) at fuse-resolve.c:313
#26 0x00007f24539df608 in fuse_resolve (state=0x7f242e84f640) at fuse-resolve.c:644
#27 0x00007f24539df8ce in fuse_resolve_all (state=<value optimized out>) at fuse-resolve.c:683
#28 0x00007f24539df918 in fuse_resolve_and_resume (state=0x7f242e84f640, fn=0x7f24539ee390 <fuse_rename_resume>) at fuse-resolve.c:723
#29 0x00007f24539f72b0 in fuse_thread_proc (data=0x15c4120) at fuse-bridge.c:4862
#30 0x0000003e080079d1 in start_thread () from /lib64/libpthread.so.0
#31 0x0000003e07ce886d in clone () from /lib64/libc.so.6
(gdb)

--- Additional comment from Anand Avati on 2014-09-09 02:20:05 EDT ---

REVIEW: http://review.gluster.org/8659 (cluster/dht: fix memory corruption in locking api.)
posted (#2) for review on master by Raghavendra G (rgowdapp)

--- Additional comment from Anand Avati on 2014-09-09 23:29:50 EDT ---

COMMIT: http://review.gluster.org/8659 committed in master by Vijay Bellur (vbellur)

------

commit ed4a754f7b6b103b23b2c3e29b8b749cd9db89f3
Author: Raghavendra G <rgowdapp>
Date:   Tue Sep 9 11:33:14 2014 +0530

    cluster/dht: fix memory corruption in locking api.

    <man 3 qsort>
    The contents of the array are sorted in ascending order according to a
    comparison function pointed to by compar, which is called with two
    arguments that "point to the objects being compared".
    </man 3 qsort>

    qsort passes "pointers to members of the array" to the comparison
    function. Since the members of the array happen to be (dht_lock_t *),
    the arguments passed to dht_lock_request_cmp are of type
    (dht_lock_t **). Previously we assumed them to be of type
    (dht_lock_t *), which resulted in memory corruption.

    Change-Id: Iee0758704434beaff3c3a1ad48d549cbdc9e1c96
    BUG: 1139506
    Signed-off-by: Raghavendra G <rgowdapp>
    Reviewed-on: http://review.gluster.org/8659
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Shyamsundar Ranganathan <srangana>
    Reviewed-by: Vijay Bellur <vbellur>
REVIEW: http://review.gluster.org/8750 (cluster/dht: fix memory corruption in locking api.) posted (#1) for review on release-3.6 by Shyamsundar Ranganathan (srangana)
COMMIT: http://review.gluster.org/8750 committed in release-3.6 by Vijay Bellur (vbellur)

------

commit a6499d32292ca5a1418e1c785d617317226b2f53
Author: Raghavendra G <rgowdapp>
Date:   Tue Sep 16 13:55:03 2014 -0400

    cluster/dht: fix memory corruption in locking api.

    <man 3 qsort>
    The contents of the array are sorted in ascending order according to a
    comparison function pointed to by compar, which is called with two
    arguments that "point to the objects being compared".
    </man 3 qsort>

    qsort passes "pointers to members of the array" to the comparison
    function. Since the members of the array happen to be (dht_lock_t *),
    the arguments passed to dht_lock_request_cmp are of type
    (dht_lock_t **). Previously we assumed them to be of type
    (dht_lock_t *), which resulted in memory corruption.

    Change-Id: Iee0758704434beaff3c3a1ad48d549cbdc9e1c96
    BUG: 1142406
    Signed-off-by: Raghavendra G <rgowdapp>
    Reviewed-on-master: http://review.gluster.org/8659
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Shyamsundar Ranganathan <srangana>
    Reviewed-by: Vijay Bellur <vbellur>
    Reviewed-on: http://review.gluster.org/8750
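For anyone verifying the fix, the "Steps to Reproduce" from the description can be sketched roughly as below. The mount point, worker count, and iteration count are illustrative, not taken from the original report, and the add-brick/rebalance step is left commented because the reporter was unsure it was required:

```shell
# Run several concurrent rename workloads on the glusterfs mount.
# Set GLUSTER_MNT to the actual fuse mount point; falls back to a
# temp dir so the script itself can be exercised anywhere.
MOUNT="${GLUSTER_MNT:-$(mktemp -d)}/scratch"
mkdir -p "$MOUNT"

# Step 1: multiple rename operations in parallel.
for w in 1 2 3 4; do
  (
    base="$MOUNT/rename.worker$w"
    f="$base"
    touch "$f"
    for i in $(seq 1 200); do
      mv "$f" "$base.$i"
      f="$base.$i"
    done
  ) &
done

# Step 2 (optional, per the report): add a brick and rebalance while
# the renames run. Volume/brick names here are placeholders.
# gluster volume add-brick t0 server1:/bricks/b3 server2:/bricks/b4
# gluster volume rebalance t0 start

wait
```

On an unpatched 3.6.0.28 client, a workload of this shape triggered the SIGSEGV in dht_lock_order_requests; with the fix applied, all workers should finish and the mount should stay up.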
A beta release for GlusterFS 3.6.0 has been made available [1]. Please verify whether this release resolves this bug report for you.

If the glusterfs-3.6.0beta1 release does not resolve this issue, leave a comment in this bug and move the status to ASSIGNED. If this release fixes the problem for you, leave a note and change the status to VERIFIED.

Packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure (possibly an "updates-testing" repository) for your distribution.

[1] http://supercolony.gluster.org/pipermail/gluster-users/2014-September/018836.html
[2] http://supercolony.gluster.org/pipermail/gluster-users/
This bug is being closed because a release that should address the reported issue has been made available. If the problem is still not fixed with glusterfs-3.6.1, please reopen this bug report.

glusterfs-3.6.1 has been announced [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://supercolony.gluster.org/pipermail/gluster-users/2014-November/019410.html
[2] http://supercolony.gluster.org/mailman/listinfo/gluster-users