Bug 1142406 - Core: client crash while doing rename operations on the mount
Summary: Core: client crash while doing rename operations on the mount
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: distribute
Version: 3.6.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Shyamsundar
QA Contact:
URL:
Whiteboard:
Depends On: 1139273 1139506
Blocks: glusterfs-3.6.0
Reported: 2014-09-16 17:29 UTC by Shyamsundar
Modified: 2014-11-11 08:39 UTC (History)
8 users

Fixed In Version: glusterfs-3.6.0beta1
Doc Type: Bug Fix
Doc Text:
Clone Of: 1139506
Environment:
Last Closed: 2014-11-11 08:39:01 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Shyamsundar 2014-09-16 17:29:53 UTC
+++ This bug was initially created as a clone of Bug #1139506 +++

+++ This bug was initially created as a clone of Bug #1139273 +++

Description of problem:

The client process crashed while performing multiple rename operations on a GlusterFS mount (afr+dht).


Steps to Reproduce:
1. Run multiple rename operations on the mount.
2. Add a brick and rebalance. (Not sure whether this contributed to the client crash; stated for completeness.)
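
The rename loop in step 1 might look like the following. This is a minimal sketch, not the reporter's actual workload: the mount path, file count, and naming scheme are assumptions, and the add-brick/rebalance step is not included. It defaults to a local directory so it can be dry-run anywhere; point it at a FUSE mount of the afr+dht volume to attempt a reproduction.

```shell
#!/bin/sh
# Minimal rename-loop sketch. $1 is assumed to be a FUSE mount of the
# replicated-distributed volume; defaults to a local directory for a dry run.
DIR="${1:-/tmp/rename-test}/scratch"
mkdir -p "$DIR"
rm -f "$DIR"/rename.file.* "$DIR"/renamed.file.*

# Create a batch of files, then rename them concurrently so several
# RENAME fops are in flight at once (exercising dht_rename's inodelk path).
for i in $(seq 1 20); do
    : > "$DIR/rename.file.$i"
done
for i in $(seq 1 20); do
    mv "$DIR/rename.file.$i" "$DIR/renamed.file.$i" &
done
wait

ls "$DIR" | grep -c '^renamed\.file\.'
```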


Additional info:

[2014-09-08 12:27:05.086561] I [dht-rename.c:1345:dht_rename] 2-t0-dht: renaming /scratch/scratch/renamerace0SmGRE (hash=t0-replicate-0/cache=t0-replicate-0) => /scratch/scratch/rename.file.29521 (hash=t0-replicate-0/cache=t0-replicate-0)
pending frames:
frame : type(1) op(RENAME)
frame : type(1) op(RENAME)
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 
2014-09-08 12:27:05
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.6.0.28
/usr/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb6)[0x3e09c1ff06]
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x33f)[0x3e09c3a59f]
/lib64/libc.so.6[0x3e07c326b0]
/usr/lib64/libglusterfs.so.0(uuid_unpack+0x42)[0x3e09c5b1d2]
/usr/lib64/libglusterfs.so.0(uuid_compare+0x33)[0x3e09c5b0c3]
/lib64/libc.so.6[0x3e07c34a79]
/lib64/libc.so.6(qsort_r+0x29c)[0x3e07c34f2c]
/usr/lib64/glusterfs/3.6.0.28/xlator/cluster/distribute.so(dht_lock_order_requests+0x31)[0x7f2450263ec1]
/usr/lib64/glusterfs/3.6.0.28/xlator/cluster/distribute.so(dht_local_lock_init+0x42)[0x7f2450266132]
/usr/lib64/glusterfs/3.6.0.28/xlator/cluster/distribute.so(dht_nonblocking_inodelk+0x80)[0x7f2450266630]
/usr/lib64/glusterfs/3.6.0.28/xlator/cluster/distribute.so(dht_rename_lock+0xdf)[0x7f245027102f]
/usr/lib64/glusterfs/3.6.0.28/xlator/cluster/distribute.so(dht_rename+0x201)[0x7f24502766f1]
/usr/lib64/libglusterfs.so.0(default_rename+0x78)[0x3e09c25ec8]
/usr/lib64/libglusterfs.so.0(default_rename+0x78)[0x3e09c25ec8]
/usr/lib64/libglusterfs.so.0(default_rename+0x78)[0x3e09c25ec8]
/usr/lib64/libglusterfs.so.0(default_rename+0x78)[0x3e09c25ec8]
/usr/lib64/libglusterfs.so.0(default_rename+0x78)[0x3e09c25ec8]
/usr/lib64/glusterfs/3.6.0.28/xlator/performance/md-cache.so(mdc_rename+0x157)[0x7f244b5cb5a7]
/usr/lib64/glusterfs/3.6.0.28/xlator/debug/io-stats.so(io_stats_rename+0x164)[0x7f244b3b3784]
/usr/lib64/libglusterfs.so.0(default_rename+0x78)[0x3e09c25ec8]
/usr/lib64/glusterfs/3.6.0.28/xlator/mount/fuse.so(fuse_rename_resume+0x2a0)[0x7f24539ee630]
/usr/lib64/glusterfs/3.6.0.28/xlator/mount/fuse.so(+0x88a6)[0x7f24539df8a6]
/usr/lib64/glusterfs/3.6.0.28/xlator/mount/fuse.so(fuse_resolve_continue+0x41)[0x7f24539df971]
/usr/lib64/glusterfs/3.6.0.28/xlator/mount/fuse.so(fuse_resolve_parent+0x18)[0x7f24539df308]
/usr/lib64/glusterfs/3.6.0.28/xlator/mount/fuse.so(+0x8608)[0x7f24539df608]
/usr/lib64/glusterfs/3.6.0.28/xlator/mount/fuse.so(+0x88ee)[0x7f24539df8ee]
/usr/lib64/glusterfs/3.6.0.28/xlator/mount/fuse.so(fuse_resolve_continue+0x41)[0x7f24539df971]
/usr/lib64/glusterfs/3.6.0.28/xlator/mount/fuse.so(fuse_resolve_parent+0x18)[0x7f24539df308]
/usr/lib64/glusterfs/3.6.0.28/xlator/mount/fuse.so(+0x8608)[0x7f24539df608]
/usr/lib64/glusterfs/3.6.0.28/xlator/mount/fuse.so(+0x88ce)[0x7f24539df8ce]
/usr/lib64/glusterfs/3.6.0.28/xlator/mount/fuse.so(fuse_resolve_and_resume+0x28)[0x7f24539df918]
/usr/lib64/glusterfs/3.6.0.28/xlator/mount/fuse.so(+0x202b0)[0x7f24539f72b0]
/lib64/libpthread.so.0[0x3e080079d1]
/lib64/libc.so.6(clone+0x6d)[0x3e07ce886d]
---------

(gdb) bt
#0  uuid_unpack (in=0x7f2427ffeff8 "\001", uu=0x7f2449ca7190) at ../../contrib/uuid/unpack.c:58
#1  0x0000003e09c5b0c3 in uuid_compare (uu1=<value optimized out>, 
    uu2=0x7f2427fff000 <Address 0x7f2427fff000 out of bounds>) at ../../contrib/uuid/compare.c:46
#2  0x0000003e07c34a79 in msort_with_tmp () from /lib64/libc.so.6
#3  0x0000003e07c34f2c in qsort_r () from /lib64/libc.so.6
#4  0x00007f2450263ec1 in dht_lock_order_requests (locks=<value optimized out>, 
    count=<value optimized out>) at dht-helper.c:1755
#5  0x00007f2450266132 in dht_local_lock_init (frame=<value optimized out>, lk_array=0x7f2427ffefd0, 
    lk_count=2, inodelk_cbk=0x7f24502734d0 <dht_rename_lock_cbk>) at dht-helper.c:410
#6  0x00007f2450266630 in dht_nonblocking_inodelk (frame=0x7f245a2f34f8, lk_array=0x7f2427ffefd0, 
    lk_count=2, inodelk_cbk=0x7f24502734d0 <dht_rename_lock_cbk>) at dht-helper.c:1640
#7  0x00007f245027102f in dht_rename_lock (frame=0x7f245a2f34f8) at dht-rename.c:1244
#8  0x00007f24502766f1 in dht_rename (frame=0x7f245a2f34f8, this=0xa0ee7f0, oldloc=0x7f242e84f660, 
    newloc=0x7f242e84f6a0, xdata=<value optimized out>) at dht-rename.c:1351
#9  0x0000003e09c25ec8 in default_rename (frame=0x7f245a2f34f8, this=0x20a2fc0, oldloc=0x7f242e84f660, 
    newloc=0x7f242e84f6a0, xdata=<value optimized out>) at defaults.c:1862
#10 0x0000003e09c25ec8 in default_rename (frame=0x7f245a2f34f8, this=0x9e1db30, oldloc=0x7f242e84f660, 
    newloc=0x7f242e84f6a0, xdata=<value optimized out>) at defaults.c:1862
#11 0x0000003e09c25ec8 in default_rename (frame=0x7f245a2f34f8, this=0x2e22810, oldloc=0x7f242e84f660, 
    newloc=0x7f242e84f6a0, xdata=<value optimized out>) at defaults.c:1862
#12 0x0000003e09c25ec8 in default_rename (frame=0x7f245a2f34f8, this=0x931d6e0, oldloc=0x7f242e84f660, 
    newloc=0x7f242e84f6a0, xdata=<value optimized out>) at defaults.c:1862
#13 0x0000003e09c25ec8 in default_rename (frame=0x7f245a2f34f8, this=0x2624480, oldloc=0x7f242e84f660, 
    newloc=0x7f242e84f6a0, xdata=<value optimized out>) at defaults.c:1862
#14 0x00007f244b5cb5a7 in mdc_rename (frame=0x7f245a322374, this=0xa812120, oldloc=0x7f242e84f660, 
    newloc=0x7f242e84f6a0, xdata=0x0) at md-cache.c:1325
#15 0x00007f244b3b3784 in io_stats_rename (frame=0x7f245a324bc4, this=0x914f770, 
    oldloc=0x7f242e84f660, newloc=0x7f242e84f6a0, xdata=0x0) at io-stats.c:2037
#16 0x0000003e09c25ec8 in default_rename (frame=0x7f245a324bc4, this=0x1de2210, oldloc=0x7f242e84f660, 
    newloc=0x7f242e84f6a0, xdata=<value optimized out>) at defaults.c:1862
#17 0x00007f24539ee630 in fuse_rename_resume (state=0x7f242e84f640) at fuse-bridge.c:1778
#18 0x00007f24539df8a6 in fuse_resolve_done (state=<value optimized out>) at fuse-resolve.c:665
#19 fuse_resolve_all (state=<value optimized out>) at fuse-resolve.c:694
#20 0x00007f24539df971 in fuse_resolve_continue (state=0x7f242e84f640) at fuse-resolve.c:710
#21 0x00007f24539df308 in fuse_resolve_parent (state=0x7f242e84f640) at fuse-resolve.c:313
#22 0x00007f24539df608 in fuse_resolve (state=0x7f242e84f640) at fuse-resolve.c:644
#23 0x00007f24539df8ee in fuse_resolve_all (state=<value optimized out>) at fuse-resolve.c:690
#24 0x00007f24539df971 in fuse_resolve_continue (state=0x7f242e84f640) at fuse-resolve.c:710
#25 0x00007f24539df308 in fuse_resolve_parent (state=0x7f242e84f640) at fuse-resolve.c:313
#26 0x00007f24539df608 in fuse_resolve (state=0x7f242e84f640) at fuse-resolve.c:644
#27 0x00007f24539df8ce in fuse_resolve_all (state=<value optimized out>) at fuse-resolve.c:683
#28 0x00007f24539df918 in fuse_resolve_and_resume (state=0x7f242e84f640, 
    fn=0x7f24539ee390 <fuse_rename_resume>) at fuse-resolve.c:723
#29 0x00007f24539f72b0 in fuse_thread_proc (data=0x15c4120) at fuse-bridge.c:4862
#30 0x0000003e080079d1 in start_thread () from /lib64/libpthread.so.0
#31 0x0000003e07ce886d in clone () from /lib64/libc.so.6
(gdb)

--- Additional comment from Anand Avati on 2014-09-09 02:20:05 EDT ---

REVIEW: http://review.gluster.org/8659 (cluster/dht: fix memory corruption in locking api.) posted (#2) for review on master by Raghavendra G (rgowdapp)

--- Additional comment from Anand Avati on 2014-09-09 23:29:50 EDT ---

COMMIT: http://review.gluster.org/8659 committed in master by Vijay Bellur (vbellur) 
------
commit ed4a754f7b6b103b23b2c3e29b8b749cd9db89f3
Author: Raghavendra G <rgowdapp>
Date:   Tue Sep 9 11:33:14 2014 +0530

    cluster/dht: fix memory corruption in locking api.
    
    <man 3 qsort>
    
         The  contents  of the array are sorted in ascending order
         according to a comparison function pointed to by compar, which is
         called with two arguments that "point to the objects being
         compared".
    
    </man 3 qsort>
    
    qsort passes "pointers to members of the array" to the comparison
    function. Since the members of the array happen to be (dht_lock_t *),
    the arguments passed to dht_lock_request_cmp are of type (dht_lock_t
    **). Previously we assumed them to be of type (dht_lock_t *), which
    resulted in memory corruption.
    
    Change-Id: Iee0758704434beaff3c3a1ad48d549cbdc9e1c96
    BUG: 1139506
    Signed-off-by: Raghavendra G <rgowdapp>
    Reviewed-on: http://review.gluster.org/8659
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Shyamsundar Ranganathan <srangana>
    Reviewed-by: Vijay Bellur <vbellur>

Comment 1 Anand Avati 2014-09-16 17:56:03 UTC
REVIEW: http://review.gluster.org/8750 (cluster/dht: fix memory corruption in locking api.) posted (#1) for review on release-3.6 by Shyamsundar Ranganathan (srangana)

Comment 2 Anand Avati 2014-09-17 04:26:04 UTC
COMMIT: http://review.gluster.org/8750 committed in release-3.6 by Vijay Bellur (vbellur) 
------
commit a6499d32292ca5a1418e1c785d617317226b2f53
Author: Raghavendra G <rgowdapp>
Date:   Tue Sep 16 13:55:03 2014 -0400

    cluster/dht: fix memory corruption in locking api.
    
    <man 3 qsort>
    
         The  contents  of the array are sorted in ascending order
         according to a comparison function pointed to by compar, which is
         called with two arguments that "point to the objects being
         compared".
    
    </man 3 qsort>
    
    qsort passes "pointers to members of the array" to the comparison
    function. Since the members of the array happen to be (dht_lock_t *),
    the arguments passed to dht_lock_request_cmp are of type (dht_lock_t
    **). Previously we assumed them to be of type (dht_lock_t *), which
    resulted in memory corruption.
    
    Change-Id: Iee0758704434beaff3c3a1ad48d549cbdc9e1c96
    BUG: 1142406
    Signed-off-by: Raghavendra G <rgowdapp>
    Reviewed-on-master: http://review.gluster.org/8659
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Shyamsundar Ranganathan <srangana>
    Reviewed-by: Vijay Bellur <vbellur>
    Reviewed-on: http://review.gluster.org/8750

Comment 3 Niels de Vos 2014-09-22 12:46:22 UTC
A beta release for GlusterFS 3.6.0 has been released [1]. Please verify whether this release resolves the bug for you. If the glusterfs-3.6.0beta1 release does not resolve this issue, leave a comment on this bug and move the status to ASSIGNED. If this release fixes the problem for you, leave a note and change the status to VERIFIED.

Packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure (possibly an "updates-testing" repository) for your distribution.

[1] http://supercolony.gluster.org/pipermail/gluster-users/2014-September/018836.html
[2] http://supercolony.gluster.org/pipermail/gluster-users/

Comment 4 Niels de Vos 2014-11-11 08:39:01 UTC
This bug is being closed because a release that should address the reported issue is now available. If the problem persists with glusterfs-3.6.1, please reopen this bug report.

glusterfs-3.6.1 has been announced [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://supercolony.gluster.org/pipermail/gluster-users/2014-November/019410.html
[2] http://supercolony.gluster.org/mailman/listinfo/gluster-users
