Bug 795321

Summary: GlusterFS client crashed during an automated test run
Product: [Community] GlusterFS
Reporter: Shwetha Panduranga <shwetha.h.panduranga>
Component: replicate
Assignee: Pranith Kumar K <pkarampu>
Status: CLOSED DUPLICATE
Severity: medium
Priority: high
Version: mainline
CC: gluster-bugs, vbellur
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2012-02-24 08:36:07 UTC

Description Shwetha Panduranga 2012-02-20 08:58:04 UTC
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.25.el6_1.3.x86_64 libgcc-4.4.6-3.el6.x86_64
(gdb) bt
#0  0x000000368d232905 in raise () from /lib64/libc.so.6
#1  0x000000368d2340e5 in abort () from /lib64/libc.so.6
#2  0x000000368d22b9be in __assert_fail_base () from /lib64/libc.so.6
#3  0x000000368d22ba80 in __assert_fail () from /lib64/libc.so.6
#4  0x00007fcc18acded6 in afr_self_heal_parent_entrylk (frame=0x7fcc1be1896c, this=0x24700b0, lock_cbk=0x7fcc18acda99 <afr_sh_post_nb_entrylk_conflicting_sh_cbk>)
    at afr-self-heal-common.c:1894
#5  0x00007fcc18acdf53 in afr_self_heal_conflicting_entries (frame=0x7fcc1be1896c, this=0x24700b0) at afr-self-heal-common.c:1905
#6  0x00007fcc18acea4d in afr_self_heal (frame=0x7fcc1ce5c724, this=0x24700b0, inode=0x7fcc1291f0e0) at afr-self-heal-common.c:2146
#7  0x00007fcc18aeebf8 in afr_launch_self_heal (frame=0x7fcc1ce5c724, this=0x24700b0, inode=0x7fcc1291f0e0, background=_gf_true, ia_type=IA_IFREG, 
    reason=0x7fffd5c02880 "stale subvolume 1 detected", gfid_sh_success_cbk=0, unwind=0) at afr-common.c:1292
#8  0x00007fcc18abbc26 in afr_perform_data_self_heal (frame=0x7fcc1ce5c724, this=0x24700b0) at afr-open.c:117
#9  0x00007fcc18abc20a in afr_open_cbk (frame=0x7fcc1ce5c724, cookie=0x1, this=0x24700b0, op_ret=0, op_errno=0, fd=0x7fcc1252504c) at afr-open.c:187
#10 0x00007fcc18d2898b in client3_1_open_cbk (req=0x7fcc1352a4b4, iov=0x7fcc1352a4f4, count=1, myframe=0x7fcc1ce5ce88) at client3_1-fops.c:375
#11 0x00007fcc1e0066a4 in rpc_clnt_handle_reply (clnt=0x2480050, pollin=0x2467300) at rpc-clnt.c:790
#12 0x00007fcc1e006a2b in rpc_clnt_notify (trans=0x2480350, mydata=0x2480080, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x2467300) at rpc-clnt.c:909
#13 0x00007fcc1e002c08 in rpc_transport_notify (this=0x2480350, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x2467300) at rpc-transport.c:498
#14 0x00007fcc19b7223d in socket_event_poll_in (this=0x2480350) at socket.c:1675
#15 0x00007fcc19b727c1 in socket_event_handler (fd=9, idx=2, data=0x2480350, poll_in=1, poll_out=0, poll_err=0) at socket.c:1790
#16 0x00007fcc1e25bc9c in event_dispatch_epoll_handler (event_pool=0x2462b80, events=0x2468030, i=0) at event.c:794
#17 0x00007fcc1e25bebf in event_dispatch_epoll (event_pool=0x2462b80) at event.c:856
#18 0x00007fcc1e25c24a in event_dispatch (event_pool=0x2462b80) at event.c:956
#19 0x0000000000407c5e in main (argc=4, argv=0x7fffd5c02f98) at glusterfsd.c:1601
(gdb) f 4
#4  0x00007fcc18acded6 in afr_self_heal_parent_entrylk (frame=0x7fcc1be1896c, this=0x24700b0, lock_cbk=0x7fcc18acda99 <afr_sh_post_nb_entrylk_conflicting_sh_cbk>)
    at afr-self-heal-common.c:1894
1894	        GF_ASSERT (local->loc.parent);
(gdb) l
1889	
1890	        gf_log (this->name, GF_LOG_TRACE,
1891	                "attempting to recreate missing entries for path=%s",
1892	                local->loc.path);
1893	
1894	        GF_ASSERT (local->loc.parent);
1895	        afr_build_parent_loc (&sh->parent_loc, &local->loc);
1896	
1897	        afr_sh_entrylk (frame, this, &sh->parent_loc, NULL,
1898	                        lock_cbk);
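
For illustration only: the backtrace shows the assertion at line 1894 ending in abort(), which matches the "signal received: 6" in the client log. The sketch below contrasts an asserting check with one that returns an error so a caller could fail the self-heal instead of taking down the client. The struct definitions and both function names are hypothetical stand-ins, not the real loc_t, inode_t, or AFR code, and this is not the fix that was applied for this bug.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified stand-ins for the real inode_t and loc_t. */
struct inode { int unused; };
struct loc {
    const char   *path;
    struct inode *parent;
};

/* Same shape as the check at line 1894: when assertions are enabled and
 * the parent is NULL, the process aborts with SIGABRT (signal 6). */
void require_parent(const struct loc *loc)
{
    assert(loc->parent != NULL);
}

/* Hypothetical alternative: report the missing parent to the caller.
 * Returns 0 when the parent is known, -1 when it is missing. */
int check_parent(const struct loc *loc)
{
    if (loc == NULL || loc->parent == NULL) {
        fprintf(stderr, "no parent inode for path=%s\n",
                (loc != NULL && loc->path != NULL) ? loc->path : "(null)");
        return -1;
    }
    return 0;
}
```

With a loc whose parent is NULL, as in frame 4 above, require_parent() would abort the process, while check_parent() returns -1 and leaves the decision to the caller.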


Client Log:-
-----------------
[2012-02-20 03:33:05.219403] I [client-handshake.c:923:client_setvolume_cbk] 0-replicate-client-1: Connected to 10.1.11.111:24009, attached to remote volume '/export'.
[2012-02-20 03:33:05.219438] D [client-handshake.c:795:client_post_handshake] 0-replicate-client-1: no open fds - notifying all parents child up
[2012-02-20 03:33:05.219456] I [afr-common.c:3461:afr_notify] 0-replicate-replicate-0: subvol 1 came up, start crawl
[2012-02-20 03:33:05.219471] D [fuse-bridge.c:3717:notify] 0-fuse: got event 8 on graph 0
[2012-02-20 03:33:17.866430] D [afr-self-heal-common.c:124:afr_sh_print_pending_matrix] 0-replicate-replicate-0: pending_matrix: [ 0 0 ]
[2012-02-20 03:33:17.866484] D [afr-self-heal-common.c:124:afr_sh_print_pending_matrix] 0-replicate-replicate-0: pending_matrix: [ 0 0 ]
[2012-02-20 03:33:17.866499] D [afr-self-heal-common.c:729:afr_mark_sources] 0-replicate-replicate-0: Number of sources: 0
[2012-02-20 03:33:17.866511] D [afr-self-heal-data.c:799:afr_lookup_select_read_child_by_txn_type] 0-replicate-replicate-0: returning read_child: 0
[2012-02-20 03:33:17.866521] D [afr-common.c:1245:afr_lookup_select_read_child] 0-replicate-replicate-0: Source selected as 0 for /
[2012-02-20 03:33:17.866539] D [afr-common.c:1052:afr_lookup_build_response_params] 0-replicate-replicate-0: Building lookup response from 0
[2012-02-20 03:33:17.867711] D [afr-common.c:129:afr_lookup_xattr_req_prepare] 0-replicate-replicate-0: /file1: failed to get the gfid from dict
[2012-02-20 03:33:17.868156] D [afr-self-heal-common.c:124:afr_sh_print_pending_matrix] 0-replicate-replicate-0: pending_matrix: [ 0 1 ]
[2012-02-20 03:33:17.868180] D [afr-self-heal-common.c:124:afr_sh_print_pending_matrix] 0-replicate-replicate-0: pending_matrix: [ 0 0 ]
[2012-02-20 03:33:17.868191] D [afr-self-heal-common.c:729:afr_mark_sources] 0-replicate-replicate-0: Number of sources: 1
[2012-02-20 03:33:17.868201] D [afr-self-heal-data.c:799:afr_lookup_select_read_child_by_txn_type] 0-replicate-replicate-0: returning read_child: 0
[2012-02-20 03:33:17.868211] D [afr-common.c:1245:afr_lookup_select_read_child] 0-replicate-replicate-0: Source selected as 0 for /file1
[2012-02-20 03:33:17.868221] D [afr-common.c:1052:afr_lookup_build_response_params] 0-replicate-replicate-0: Building lookup response from 0
[2012-02-20 03:33:17.868233] I [afr-common.c:1139:afr_detect_self_heal_by_iatt] 0-replicate-replicate-0: size differs for /file1
[2012-02-20 03:33:17.868245] D [afr-common.c:1114:afr_lookup_set_self_heal_params_by_xattr] 0-replicate-replicate-0: data self-heal is pending for /file1.
[2012-02-20 03:33:19.592673] D [afr-self-heal-common.c:124:afr_sh_print_pending_matrix] 0-replicate-replicate-0: pending_matrix: [ 0 0 ]
[2012-02-20 03:33:19.592727] D [afr-self-heal-common.c:124:afr_sh_print_pending_matrix] 0-replicate-replicate-0: pending_matrix: [ 0 0 ]
[2012-02-20 03:33:19.592743] D [afr-self-heal-common.c:729:afr_mark_sources] 0-replicate-replicate-0: Number of sources: 0
[2012-02-20 03:33:19.592755] D [afr-self-heal-data.c:799:afr_lookup_select_read_child_by_txn_type] 0-replicate-replicate-0: returning read_child: 0
[2012-02-20 03:33:19.592766] D [afr-common.c:1245:afr_lookup_select_read_child] 0-replicate-replicate-0: Source selected as 0 for /
[2012-02-20 03:33:19.592797] D [afr-common.c:1052:afr_lookup_build_response_params] 0-replicate-replicate-0: Building lookup response from 0
[2012-02-20 03:33:19.593636] D [afr-common.c:129:afr_lookup_xattr_req_prepare] 0-replicate-replicate-0: /file1: failed to get the gfid from dict
[2012-02-20 03:33:19.594173] D [afr-self-heal-common.c:124:afr_sh_print_pending_matrix] 0-replicate-replicate-0: pending_matrix: [ 0 1 ]
[2012-02-20 03:33:19.594198] D [afr-self-heal-common.c:124:afr_sh_print_pending_matrix] 0-replicate-replicate-0: pending_matrix: [ 0 0 ]
[2012-02-20 03:33:19.594210] D [afr-self-heal-common.c:729:afr_mark_sources] 0-replicate-replicate-0: Number of sources: 1
[2012-02-20 03:33:19.594221] D [afr-self-heal-data.c:799:afr_lookup_select_read_child_by_txn_type] 0-replicate-replicate-0: returning read_child: 0
[2012-02-20 03:33:19.594240] D [afr-common.c:1245:afr_lookup_select_read_child] 0-replicate-replicate-0: Source selected as 0 for /file1
[2012-02-20 03:33:19.594261] D [afr-common.c:1052:afr_lookup_build_response_params] 0-replicate-replicate-0: Building lookup response from 0
[2012-02-20 03:33:19.594284] I [afr-common.c:1139:afr_detect_self_heal_by_iatt] 0-replicate-replicate-0: size differs for /file1
[2012-02-20 03:33:19.594304] D [afr-common.c:1114:afr_lookup_set_self_heal_params_by_xattr] 0-replicate-replicate-0: data self-heal is pending for /file1.
[2012-02-20 03:33:19.594572] E [stat-prefetch.c:691:sp_remove_caches_from_all_fds_opened] (-->/usr/local/lib/glusterfs/3git/xlator/mount/fuse.so(fuse_readv_resume+0x444) [0x7fcc1bacbe60] (-->/usr/local/lib/glusterfs/3git/xlator/debug/io-stats.so(io_stats_readv+0x2b1) [0x7fcc13df5fdd] (-->/usr/local/lib/glusterfs/3git/xlator/performance/stat-prefetch.so(sp_readv+0x238) [0x7fcc1802daec]))) 0-replicate-stat-prefetch: invalid argument: inode
[2012-02-20 03:33:19.594991] I [afr-common.c:1290:afr_launch_self_heal] 0-replicate-replicate-0: background  meta-data data missing-entry gfid self-heal triggered. path: /file1, reason: stale subvolume 1 detected
pending frames:
frame : type(1) op(READ)
frame : type(1) op(READ)

patchset: git://git.gluster.com/glusterfs.git
signal received: 6
time of crash: 2012-02-20 03:33:19
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3git
/lib64/libc.so.6[0x368d232980]
/lib64/libc.so.6(gsignal+0x35)[0x368d232905]
/lib64/libc.so.6(abort+0x175)[0x368d2340e5]
/lib64/libc.so.6[0x368d22b9be]
/lib64/libc.so.6(__assert_perror_fail+0x0)[0x368d22ba80]
/usr/local/lib/glusterfs/3git/xlator/cluster/replicate.so(+0x3fed6)[0x7fcc18acded6]
/usr/local/lib/glusterfs/3git/xlator/cluster/replicate.so(+0x3ff53)[0x7fcc18acdf53]
/usr/local/lib/glusterfs/3git/xlator/cluster/replicate.so(afr_self_heal+0x45e)[0x7fcc18acea4d]
/usr/local/lib/glusterfs/3git/xlator/cluster/replicate.so(afr_launch_self_heal+0x228)[0x7fcc18aeebf8]
/usr/local/lib/glusterfs/3git/xlator/cluster/replicate.so(afr_perform_data_self_heal+0x130)[0x7fcc18abbc26]
/usr/local/lib/glusterfs/3git/xlator/cluster/replicate.so(afr_open_cbk+0x3ab)[0x7fcc18abc20a]
/usr/local/lib/glusterfs/3git/xlator/protocol/client.so(client3_1_open_cbk+0x475)[0x7fcc18d2898b]
/usr/local/lib/libgfrpc.so.0(rpc_clnt_handle_reply+0x211)[0x7fcc1e0066a4]
/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x2bd)[0x7fcc1e006a2b]
/usr/local/lib/libgfrpc.so.0(rpc_transport_notify+0x130)[0x7fcc1e002c08]
/usr/local/lib/glusterfs/3git/rpc-transport/socket.so(socket_event_poll_in+0x54)[0x7fcc19b7223d]
/usr/local/lib/glusterfs/3git/rpc-transport/socket.so(socket_event_handler+0x21d)[0x7fcc19b727c1]
/usr/local/lib/libglusterfs.so.0(+0x4bc9c)[0x7fcc1e25bc9c]
/usr/local/lib/libglusterfs.so.0(+0x4bebf)[0x7fcc1e25bebf]
/usr/local/lib/libglusterfs.so.0(event_dispatch+0x88)[0x7fcc1e25c24a]
/usr/local/sbin/glusterfs(main+0x238)[0x407c5e]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x368d21ecdd]
/usr/local/sbin/glusterfs[0x403ec9]
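
As a reading aid for the pending_matrix lines in the client log above, here is a minimal sketch of one way the "Number of sources" value could follow from the matrix. It assumes each logged line is one row, that row i holds the pending counts subvolume i records against each subvolume, and that a subvolume counts as a source when it blames another subvolume and is not blamed itself. That rule is a simplification chosen to match the two cases seen in this log; it is not taken from afr_mark_sources, and count_sources and CHILD_COUNT are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define CHILD_COUNT 2  /* two subvolumes, as in the logged 2x2 matrix */

/* Counts subvolumes that blame at least one other subvolume and are
 * not blamed by any other subvolume (assumed, simplified rule). */
int count_sources(int pending[CHILD_COUNT][CHILD_COUNT])
{
    int sources = 0;

    assert(pending != NULL);

    for (int i = 0; i < CHILD_COUNT; i++) {
        int blames_other = 0;
        int blamed_by_other = 0;

        for (int j = 0; j < CHILD_COUNT; j++) {
            if (i == j)
                continue;
            if (pending[i][j] != 0)
                blames_other = 1;
            if (pending[j][i] != 0)
                blamed_by_other = 1;
        }
        if (blames_other && !blamed_by_other)
            sources++;
    }
    return sources;
}
```

Under these assumptions, the all-zero matrix logged for / gives 0 sources, and the matrix with rows [ 0 1 ] and [ 0 0 ] logged for /file1 gives 1 source, in line with the logged counts.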

Comment 1 Shwetha Panduranga 2012-02-23 05:50:17 UTC
Attaching bt full:-


#0  0x000000368d232905 in raise () from /lib64/libc.so.6
No symbol table info available.
#1  0x000000368d2340e5 in abort () from /lib64/libc.so.6
No symbol table info available.
#2  0x000000368d22b9be in __assert_fail_base () from /lib64/libc.so.6
No symbol table info available.
#3  0x000000368d22ba80 in __assert_fail () from /lib64/libc.so.6
No symbol table info available.
#4  0x00007fb1838bbe29 in afr_self_heal_parent_entrylk (frame=0x7fb186cd81d8, this=0x1e15ee0, lock_cbk=0x7fb1838bb9ec <afr_sh_post_nb_entrylk_conflicting_sh_cbk>)
    at afr-self-heal-common.c:1894
        local = 0x7fb1826de970
        sh = 0x7fb1826e0f20
        __FUNCTION__ = "afr_self_heal_parent_entrylk"
        __PRETTY_FUNCTION__ = "afr_self_heal_parent_entrylk"
#5  0x00007fb1838bbea6 in afr_self_heal_conflicting_entries (frame=0x7fb186cd81d8, this=0x1e15ee0) at afr-self-heal-common.c:1905
No locals.
#6  0x00007fb1838bc9a6 in afr_self_heal (frame=0x7fb186edc89c, this=0x1e15ee0, inode=0x7fb18200b0e0) at afr-self-heal-common.c:2145
        local = 0x7fb1826ee048
        sh = 0x7fb1826e0f20
        priv = 0x1e2f280
        op_errno = 12
        ret = 0
        orig_sh = 0x7fb1826f05f8
        sh_frame = 0x7fb186cd81d8
        sh_local = 0x7fb1826de970
        loc = 0x0
        __PRETTY_FUNCTION__ = "afr_self_heal"
        __FUNCTION__ = "afr_self_heal"
#7  0x00007fb1838de77c in afr_launch_self_heal (frame=0x7fb186edc89c, this=0x1e15ee0, inode=0x7fb18200b0e0, background=_gf_true, ia_type=IA_IFREG, 
    reason=0x7fff39a95130 "stale subvolume 1 detected", gfid_sh_success_cbk=0, unwind=0) at afr-common.c:1292
        local = 0x7fb1826ee048
        sh_type_str = " meta-data data missing-entry gfid", '\000' <repeats 221 times>
        bg = 0x7fb1838f40ff "background"
        __PRETTY_FUNCTION__ = "afr_launch_self_heal"
        __FUNCTION__ = "afr_launch_self_heal"
#8  0x00007fb1838a9b06 in afr_perform_data_self_heal (frame=0x7fb186edc89c, this=0x1e15ee0) at afr-open.c:117

Comment 2 Pranith Kumar K 2012-02-24 08:36:07 UTC

*** This bug has been marked as a duplicate of bug 786060 ***