Bug 798179

Summary: [728de5be7ce2975efb59bb5928fd7261d5ec7760]: client crashed in mdc_lookup with assert during unref
Product: [Community] GlusterFS Reporter: Rahul C S <rahulcs>
Component: stat-prefetch Assignee: Amar Tumballi <amarts>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: urgent Docs Contact:
Priority: urgent    
Version: pre-release CC: ashetty, gluster-bugs, rfortier, vbellur, vbhat, vraman
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: glusterfs-3.4.0 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-07-24 17:39:11 UTC Type: ---
Bug Blocks: 817967    

Description Rahul C S 2012-02-28 09:04:17 UTC
Description of problem:
[2012-02-27 05:22:09.751140] E [afr-lk-common.c:568:afr_unlock_inodelk_cbk] 0-vol-replicate-0: /system_light/run29351/test/file97138: unlock failed on 0, reason: Invalid argument
[2012-02-27 05:23:47.720289] W [client3_1-fops.c:1266:client3_1_finodelk_cbk] 0-vol-client-0: remote operation failed: Invalid argument
[2012-02-27 05:23:47.720356] E [afr-lk-common.c:568:afr_unlock_inodelk_cbk] 0-vol-replicate-0: /system_light/run29351/test/file99922: unlock failed on 0, reason: Invalid argument
[2012-02-27 06:01:52.562271] W [client3_1-fops.c:1228:client3_1_inodelk_cbk] 0-vol-client-0: remote operation failed: Invalid argument
[2012-02-27 06:01:52.607917] E [afr-lk-common.c:568:afr_unlock_inodelk_cbk] 0-vol-replicate-0: /system_light/run29351/sbench.5044: unlock failed on 0, reason: Invalid argument
[2012-02-27 06:09:02.871766] W [fd-lk.c:407:print_lock_list] 0-fd-lk: lock list:
[2012-02-27 06:09:05.273563] W [fd-lk.c:407:print_lock_list] 0-fd-lk: lock list:
[2012-02-27 06:09:06.380844] W [fd-lk.c:407:print_lock_list] 0-fd-lk: lock list:
[2012-02-27 06:09:07.893275] W [fd-lk.c:407:print_lock_list] 0-fd-lk: lock list:
[2012-02-27 06:09:09.346986] W [fd-lk.c:407:print_lock_list] 0-fd-lk: lock list:
pending frames:
frame : type(1) op(LOOKUP) 
frame : type(1) op(LOOKUP) 
frame : type(1) op(LOOKUP)
frame : type(1) op(LOOKUP)
frame : type(1) op(LOOKUP)
frame : type(1) op(LOOKUP)

patchset: git://git.gluster.com/glusterfs.git
signal received: 6
time of crash: 2012-02-27 06:09:10
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3git
/lib64/libc.so.6[0x3e5d632ac0]
/lib64/libc.so.6(gsignal+0x35)[0x3e5d632a45]
/lib64/libc.so.6(abort+0x175)[0x3e5d634225]
/lib64/libc.so.6(__assert_fail+0xf5)[0x3e5d62b9d5]
/usr/local/lib/libglusterfs.so.0(__gf_free+0xa3)[0x7fb181b830ef]
/usr/local/lib/libglusterfs.so.0(dict_destroy+0xbf)[0x7fb181b48d39]
/usr/local/lib/libglusterfs.so.0(dict_unref+0xb3)[0x7fb181b48e73]
/usr/local/lib/glusterfs/3git/xlator/performance/md-cache.so(mdc_lookup+0x340)[0x7fb17c45c73b]
/usr/local/lib/glusterfs/3git/xlator/debug/io-stats.so(io_stats_lookup+0x28c)[0x7fb17c24ca02]
/usr/local/lib/glusterfs/3git/xlator/mount/fuse.so(+0xb501)[0x7fb18042b501]
/usr/local/lib/glusterfs/3git/xlator/mount/fuse.so(+0x1e4e0)[0x7fb18043e4e0]
/lib64/libpthread.so.0[0x3e5da077e1]
/lib64/libc.so.6(clone+0x6d)[0x3e5d6e68ed]
---------

Core backtrace:
Core was generated by `/usr/local/sbin/glusterfs --volfile-id=vol --volfile-server=10.1.11.152 mount/'.
Program terminated with signal 6, Aborted.
#0  0x0000003e5d632a45 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.25.el6.x86_64 libgcc-4.4.5-6.el6.x86_64
(gdb) bt
#0  0x0000003e5d632a45 in raise () from /lib64/libc.so.6
#1  0x0000003e5d634225 in abort () from /lib64/libc.so.6
#2  0x0000003e5d62b9d5 in __assert_fail () from /lib64/libc.so.6
#3  0x00007fb181b830ef in __gf_free (free_ptr=0xc198420) at mem-pool.c:273
#4  0x00007fb181b48d39 in dict_destroy (this=0xc02b460) at dict.c:418
#5  0x00007fb181b48e73 in dict_unref (this=0xc02b460) at dict.c:454
#6  0x00007fb17c45c73b in mdc_lookup (frame=0x7fb1809954dc, this=0xac5f60, loc=0x7fb159446740, xattr_req=0x7fb1594c0df0) at md-cache.c:634
#7  0x00007fb17c24ca02 in io_stats_lookup (frame=0x7fb1809aaad8, this=0xac7160, loc=0x7fb159446740, xattr_req=0x7fb1594c0df0)
    at io-stats.c:1855
#8  0x00007fb18042b501 in fuse_getattr (this=0xaa5720, finh=0x7fb1593cc3b0, msg=0x7fb1593cc3d8) at fuse-bridge.c:571
#9  0x00007fb18043e4e0 in fuse_thread_proc (data=0xaa5720) at fuse-bridge.c:3970
#10 0x0000003e5da077e1 in start_thread () from /lib64/libpthread.so.0
#11 0x0000003e5d6e68ed in clone () from /lib64/libc.so.6
(gdb) f 6
#6  0x00007fb17c45c73b in mdc_lookup (frame=0x7fb1809954dc, this=0xac5f60, loc=0x7fb159446740, xattr_req=0x7fb1594c0df0) at md-cache.c:634
634                     dict_unref (xattr_rsp);
(gdb) l
629   
630             MDC_STACK_UNWIND (lookup, frame, 0, 0, loc->inode, &stbuf,
631                               xattr_rsp, &postparent);
632   
633             if (xattr_rsp)
634                     dict_unref (xattr_rsp);
635   
636             return 0;
637   
638     uncached:
(gdb) p *xattr_rsp
$12 = {is_static = 0 '\000', hash_size = 1, count = 5, refcount = 0, members = 0xc08f80, members_list = 0x984e0c0,
  extra_free = 0xb710730 "", extra_stdfree = 0x0, lock = 1}
(gdb) f 5
#5  0x00007fb181b48e73 in dict_unref (this=0xc02b460) at dict.c:454
454                     dict_destroy (this);
(gdb) f 4
#4  0x00007fb181b48d39 in dict_destroy (this=0xc02b460) at dict.c:418
418                     GF_FREE (prev->key);
(gdb) f 3
#3  0x00007fb181b830ef in __gf_free (free_ptr=0xc198420) at mem-pool.c:273
273                     GF_ASSERT (0);
(gdb) l
268
269             ptr = (char *)free_ptr - 8 - 4;
270
271             if (GF_MEM_HEADER_MAGIC != *(uint32_t *)ptr) {
272                     //Possible corruption, assert here
273                     GF_ASSERT (0);
274             }
275
276             *(uint32_t *)ptr = 0;
277
(gdb)
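
The listing in frame 3 shows __gf_free checking a magic word stored just before the pointer being freed, and clearing that word at line 276. A corrupted header, or an earlier free of the same pointer, are two ways such a check can fail. The following is a minimal sketch of that kind of check, not the GlusterFS code: demo_stamp, demo_release and DEMO_MAGIC are hypothetical names, the header is reduced to the magic word alone, a caller-supplied buffer stands in for the heap, and a failed check returns -1 where the real code asserts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical value; not the real GF_MEM_HEADER_MAGIC. */
#define DEMO_MAGIC       0xCAFEBABEu
/* Assumption: the header holds only the magic word. */
#define DEMO_HEADER_SIZE sizeof(uint32_t)

/* Write the magic word at the start of buf and return the payload
 * address that immediately follows it. */
static void *demo_stamp(unsigned char *buf)
{
    uint32_t magic = DEMO_MAGIC;

    assert(buf != NULL);
    memcpy(buf, &magic, sizeof magic);
    return buf + DEMO_HEADER_SIZE;
}

/* Validate the magic word in front of payload, then clear it.
 * Returns 0 on a valid release and -1 when the magic is missing,
 * which is the case where the code in the listing would assert. */
static int demo_release(void *payload)
{
    unsigned char *hdr;
    uint32_t magic;

    assert(payload != NULL);
    hdr = (unsigned char *)payload - DEMO_HEADER_SIZE;
    memcpy(&magic, hdr, sizeof magic);

    if (magic != DEMO_MAGIC)
        return -1;              /* possible corruption or repeated release */

    magic = 0;                  /* mark the header as no longer live */
    memcpy(hdr, &magic, sizeof magic);
    return 0;
}
```

Releasing the same payload twice returns 0 the first time and -1 the second, because the first release zeroes the magic word.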

Setup: distributed-replicate volume, with a FUSE client running the sanity tests and an NFS client running rdd. A brick was brought down and back up with stat-prefetch turned on and off.

Comment 1 Anand Avati 2012-03-01 03:45:56 UTC
CHANGE: http://review.gluster.com/2834 (perf/md-cache: hold lock on modification of md_cache structure) merged in master by Vijay Bellur (vijay)
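
The change title mentions holding a lock while the md_cache structure is modified. As an illustration of the general pattern only (this is not the merged patch, and every name below is hypothetical), a cache that hands out a reference-counted object can take the caller's reference while holding the same lock that guards replacement of the cached pointer, so a reader never receives an object whose last reference is being dropped concurrently.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical reference-counted object standing in for a dict. */
typedef struct demo_obj {
    pthread_mutex_t lock;
    int             refcount;
} demo_obj_t;

/* Hypothetical per-inode cache holding one reference to an object. */
typedef struct demo_cache {
    pthread_mutex_t lock;
    demo_obj_t     *xattr;      /* may be NULL */
} demo_cache_t;

static demo_obj_t *demo_obj_new(void)
{
    demo_obj_t *o = calloc(1, sizeof *o);
    if (o == NULL)
        return NULL;
    pthread_mutex_init(&o->lock, NULL);
    o->refcount = 1;            /* the creator owns the first reference */
    return o;
}

static demo_obj_t *demo_obj_ref(demo_obj_t *o)
{
    pthread_mutex_lock(&o->lock);
    o->refcount++;
    pthread_mutex_unlock(&o->lock);
    return o;
}

/* Drop one reference; returns 1 if the object was destroyed. */
static int demo_obj_unref(demo_obj_t *o)
{
    int left;

    pthread_mutex_lock(&o->lock);
    left = --o->refcount;
    pthread_mutex_unlock(&o->lock);

    if (left == 0) {
        pthread_mutex_destroy(&o->lock);
        free(o);
        return 1;
    }
    return 0;
}

static void demo_cache_init(demo_cache_t *c)
{
    pthread_mutex_init(&c->lock, NULL);
    c->xattr = NULL;
}

/* Replace the cached object. The pointer swap happens under the cache
 * lock; the old reference is dropped after the lock is released. */
static void demo_cache_set(demo_cache_t *c, demo_obj_t *newobj)
{
    demo_obj_t *old;

    pthread_mutex_lock(&c->lock);
    old = c->xattr;
    c->xattr = newobj ? demo_obj_ref(newobj) : NULL;
    pthread_mutex_unlock(&c->lock);

    if (old)
        demo_obj_unref(old);
}

/* Return the cached object with a reference owned by the caller,
 * taken while the cache lock is held. */
static demo_obj_t *demo_cache_get(demo_cache_t *c)
{
    demo_obj_t *o;

    pthread_mutex_lock(&c->lock);
    o = c->xattr ? demo_obj_ref(c->xattr) : NULL;
    pthread_mutex_unlock(&c->lock);
    return o;
}
```

A caller of demo_cache_get owns the returned reference and releases it with demo_obj_unref once done, in the same way the listing at md-cache.c:633-634 unrefs xattr_rsp after unwinding.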

Comment 2 Amar Tumballi 2012-03-01 06:29:43 UTC
*** Bug 798508 has been marked as a duplicate of this bug. ***

Comment 4 Anush Shetty 2012-05-19 05:45:54 UTC
Verified with 3.3.0qa41