Bug 765515 (GLUSTER-3783) - [glusterfs-3.2.5qa5]: glusterfs client crashed since gfid was NULL
Summary: [glusterfs-3.2.5qa5]: glusterfs client crashed since gfid was NULL
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: GLUSTER-3783
Product: GlusterFS
Classification: Community
Component: replicate
Version: pre-release
Hardware: x86_64
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Pranith Kumar K
QA Contact:
URL:
Whiteboard:
Duplicates: GLUSTER-3781
Depends On:
Blocks:
 
Reported: 2011-11-04 19:14 UTC by Raghavendra Bhat
Modified: 2011-11-23 08:48 UTC
3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:



Description Raghavendra Bhat 2011-11-04 16:17:41 UTC
In another crash, we found that in the fuse state structure both state->gfid and state->resolve.gfid are NULL:


p *loc
$11 = {path = 0x7f82080109a0 "/playground/run12202/p0", name = 0x7f820800aca5 "p0", inode = 0x7f82144691f8, parent = 0x7f8214469330, 
  gfid = '\000' <repeats 15 times>, pargfid = '\000' <repeats 15 times>, ino = 0}
(gdb) p inode->gfid
No symbol "inode" in current context.
(gdb) p *loc->inode
$12 = {table = 0x12bc410, gfid = '\000' <repeats 15 times>, lock = 1, nlookup = 0, ref = 6, ino = 0, ia_type = IA_INVAL, fd_list = {
    next = 0x7f8214469238, prev = 0x7f8214469238}, dentry_list = {next = 0x7f8214469248, prev = 0x7f8214469248}, hash = {
    next = 0x7f8214469258, prev = 0x7f8214469258}, list = {next = 0x7f82144693a0, prev = 0x12bc470}, _ctx = 0x7f8208028b30}
(gdb)

Comment 1 Raghavendra Bhat 2011-11-04 16:28:28 UTC
The log file entries say that the gfid was not found:

[2011-11-04 15:11:16.58835] W [fuse-resolve.c:273:fuse_resolve_deep_cbk] 0-fuse: /playground/run12202/fss6a8c: no gfid found
[2011-11-04 15:11:16.103468] W [fuse-resolve.c:328:fuse_resolve_path_deep] 0-fuse: /playground/run12202/p0: no gfid found

According to the new changes, whenever there is a graph change we get the gfid for an inode from the old graph. But if the gfid of the inode is NULL in the old graph as well, then we just log the message and continue the lookup without any gfid.


321             old_graph_xl    = state->old_xl;
322             old_graph_inode = inode_from_path (old_graph_xl->itable,
323                                                resolve->deep_loc.path);
(gdb)
324             if (old_graph_inode && !uuid_is_null (old_graph_inode->gfid)) {
325                     uuid_copy (resolve->deep_loc.gfid, old_graph_inode->gfid);
326             } else {
327                     gf_log (THIS->name, GF_LOG_WARNING, "%s: no gfid found",
328                             resolve->deep_loc.path);
329             }
330
331             FUSE_FOP_COOKIE (state, state->itable->xl, fuse_resolve_deep_cbk,
332                              (void *)(long)i,
333                              GF_FOP_LOOKUP, lookup, &resolve->deep_loc, NULL);
(gdb)
334
335             return 0;
336     resolved:
337             fuse_resolve_deep_continue (state);
338             return 0;
339     }
340
341
342     static int
343     fuse_resolve_path_simple (fuse_state_t *state)
(gdb) p state->resolve_now->deep_loc
$15 = {path = 0x7f82080109a0 "/playground/run12202/p0", name = 0x7f820800aca5 "p0", inode = 0x7f82144691f8, parent = 0x7f8214469330,
  gfid = '\000' <repeats 15 times>, pargfid = '\000' <repeats 15 times>, ino = 0}
(gdb) p *state->resolve_now->deep_loc.inode
$16 = {table = 0x12bc410, gfid = '\000' <repeats 15 times>, lock = 1, nlookup = 0, ref = 6, ino = 0, ia_type = IA_INVAL, fd_list = {
    next = 0x7f8214469238, prev = 0x7f8214469238}, dentry_list = {next = 0x7f8214469248, prev = 0x7f8214469248}, hash = {
    next = 0x7f8214469258, prev = 0x7f8214469258}, list = {next = 0x7f82144693a0, prev = 0x12bc470}, _ctx = 0x7f8208028b30}
(gdb) p state-?old_xl
A syntax error in expression, near `?old_xl'.
(gdb) p state->old_xl
$17 = (xlator_t *) 0x129a420
(gdb) l fuse_resolve_path_deep

Comment 2 Raghavendra Bhat 2011-11-04 19:14:06 UTC
glusterfs client crashed while iozone was running.

Setup:

2 replica volume

2 fuse clients & 1 nfs client

1st fuse client: sanity script

2nd fuse client: iozone -a (crashed client)

nfs client: fs-perf test

While these tests had just started, quota was enabled and a quota limit was set on a directory (in which all the clients were running their tests), and profiling was enabled. The 2nd fuse client was found to have crashed with the backtrace below.


Core was generated by `/opt/glusterfs/3.2.5qa5/sbin/glusterfs --log-level=INFO --volfile-id=mirror --v'.
Program terminated with signal 11, Segmentation fault.
#0  uuid_copy (dst=0x19b6bd8 "", src=0x0) at ../../contrib/uuid/copy.c:44
44                      *cp1++ = *cp2++;
#0  uuid_copy (dst=0x19b6bd8 "", src=0x0) at ../../contrib/uuid/copy.c:44
#1  0x00007f700683fd46 in afr_lookup (frame=0x7f7008c985ec, this=0x19a0350, loc=0x7f6ff8000b48, xattr_req=0x199ae50)
    at afr-common.c:1709
#2  0x00007f6fffdf63af in quota_lookup (frame=0x7f7008c9a2c0, this=0x19aadd0, loc=0x7f6ff8000b48, xattr_req=0x199ae50) at quota.c:723
#3  0x00007f7009bd5a6e in default_lookup (frame=<value optimized out>, this=0x19ab930, loc=0x7f6ff8000b48, xattr_req=0x199ae50)
    at defaults.c:1156
#4  0x00007f7009bd5a6e in default_lookup (frame=<value optimized out>, this=0x19ac3d0, loc=0x7f6ff8000b48, xattr_req=0x199ae50)
    at defaults.c:1156
#5  0x00007f70061d3634 in ioc_lookup (frame=0x7f7008c98cf8, this=0x19ace00, loc=0x7f6ff8000b48, xattr_req=0x199ae50)
    at io-cache.c:294
#6  0x00007f7005fc61b6 in qr_lookup (frame=0x7f7008c97ee0, this=0x19ad850, loc=0x7f6ff8000b48, xattr_req=0x199ae50)
    at quick-read.c:534
#7  0x00007f7005db253c in sp_lookup (frame=0x7f7008c9791c, this=0x19ae2a0, loc=0x7f6ff8000b48, xattr_req=0x0) at stat-prefetch.c:1244
#8  0x00007f7005b92c38 in io_stats_lookup (frame=<value optimized out>, this=0x19aef00, loc=0x7f6ff8000b48, xattr_req=0x0)
    at io-stats.c:1745
#9  0x00007f700878307d in fuse_lookup_resume (state=0x7f6ff8000b30) at fuse-bridge.c:260
#10 0x00007f7008770302 in fuse_resolve_done (state=0x19b6bd8) at fuse-resolve.c:641
#11 fuse_resolve_all (state=0x19b6bd8) at fuse-resolve.c:671
#12 0x00007f700877024d in fuse_resolve (state=0x7f6ff8000b30) at fuse-resolve.c:621
#13 0x00007f7008770364 in fuse_resolve_all (state=0x19b6bd8) at fuse-resolve.c:667
#14 0x00007f700876fac6 in fuse_resolve_deep_continue (state=0x7f6ff8000b30) at fuse-resolve.c:209
#15 0x00007f7008770ae1 in fuse_resolve_deep_cbk (frame=<value optimized out>, cookie=0x2, this=<value optimized out>, op_ret=-1, 
    op_errno=<value optimized out>, inode=0x0, buf=0x19b4dc0, xattr=0x0, postparent=0x19b4e30) at fuse-resolve.c:282
#16 0x00007f7005b8fb44 in io_stats_lookup_cbk (frame=0x7f7008c9a928, cookie=<value optimized out>, this=<value optimized out>, 
    op_ret=-1, op_errno=2, inode=0x0, buf=0x19b4dc0, xattr=0x0, postparent=0x19b4e30) at io-stats.c:1391
#17 0x00007f7005db2bf9 in sp_lookup_cbk (frame=0x7f7008c9791c, cookie=<value optimized out>, this=<value optimized out>, op_ret=-1, 
    op_errno=2, inode=0x0, buf=0x19b4dc0, dict=0x0, postparent=0x19b4e30) at stat-prefetch.c:919
#18 0x00007f7005fc41dd in qr_lookup_cbk (frame=0x7f7008c97ee0, cookie=<value optimized out>, this=<value optimized out>, op_ret=-1, 
    op_errno=2, inode=0x0, buf=0x19b4dc0, dict=0x0, postparent=0x19b4e30) at quick-read.c:439
#19 0x00007f70061d687e in ioc_lookup_cbk (frame=0x7f7008c98cf8, cookie=<value optimized out>, this=<value optimized out>, op_ret=-1, 
    op_errno=2, inode=0x0, stbuf=0x19b4dc0, dict=0x0, postparent=0x19b4e30) at io-cache.c:265
#20 0x00007f7009bdfcba in default_lookup_cbk (frame=0x7f7008c9a4ac, cookie=<value optimized out>, this=<value optimized out>, 
    op_ret=-1, op_errno=2, inode=0x0, buf=0x19b4dc0, dict=0x0, postparent=0x19b4e30) at defaults.c:46
#21 0x00007f7009bdfcba in default_lookup_cbk (frame=0x7f7008c99218, cookie=<value optimized out>, this=<value optimized out>, 
    op_ret=-1, op_errno=2, inode=0x0, buf=0x19b4dc0, dict=0x0, postparent=0x19b4e30) at defaults.c:46
#22 0x00007f6fffdf7e2e in quota_lookup_cbk (frame=0x7f7008c9a2c0, cookie=<value optimized out>, this=<value optimized out>, 
    op_ret=-1, op_errno=2, inode=0x0, buf=0x19b4dc0, dict=0x0, postparent=0x19b4e30) at quota.c:669
#23 0x00007f7006840437 in afr_lookup_done (frame=0x7f7008c985ec, cookie=0x0, this=0x19a0350, op_ret=<value optimized out>, 
    op_errno=<value optimized out>, inode=<value optimized out>, buf=0x7fffceb7a270, xattr=0x0, postparent=0x7fffceb7a200)
    at afr-common.c:1424
#24 afr_lookup_cbk (frame=0x7f7008c985ec, cookie=0x0, this=0x19a0350, op_ret=<value optimized out>, op_errno=<value optimized out>, 
    inode=<value optimized out>, buf=0x7fffceb7a270, xattr=0x0, postparent=0x7fffceb7a200) at afr-common.c:1585
#25 0x00007f7006a79316 in client3_1_lookup_cbk (req=<value optimized out>, iov=<value optimized out>, count=<value optimized out>, 
    myframe=0x7f7008c987d8) at client3_1-fops.c:2233
#26 0x00007f70099b0502 in rpc_clnt_handle_reply (clnt=0x19b42e0, pollin=<value optimized out>) at rpc-clnt.c:741
#27 0x00007f70099b06fd in rpc_clnt_notify (trans=<value optimized out>, mydata=0x19b4310, event=<value optimized out>, 
    data=0x390f78ae98) at rpc-clnt.c:854
#28 0x00007f70099ab317 in rpc_transport_notify (this=0x19b6bd8, event=RPC_TRANSPORT_ACCEPT, data=0x390f78ae98) at rpc-transport.c:919
#29 0x00007f70078a85ef in socket_event_poll_in (this=0x19b4450) at socket.c:1647
#30 0x00007f70078a8798 in socket_event_handler (fd=<value optimized out>, idx=3, data=0x19b4450, poll_in=1, poll_out=0, poll_err=0)
    at socket.c:1762
#31 0x00007f7009bf5781 in event_dispatch_epoll_handler (event_pool=0x197b360) at event.c:794
#32 event_dispatch_epoll (event_pool=0x197b360) at event.c:856
#33 0x000000000040566e in main (argc=5, argv=0x7fffceb7ab58) at glusterfsd.c:1509
(gdb) f 0
#0  uuid_copy (dst=0x19b6bd8 "", src=0x0) at ../../contrib/uuid/copy.c:44
44                      *cp1++ = *cp2++;
(gdb) f 1
#1  0x00007f700683fd46 in afr_lookup (frame=0x7f7008c985ec, this=0x19a0350, loc=0x7f6ff8000b48, xattr_req=0x199ae50)
    at afr-common.c:1709
1709            afr_lookup_save_gfid (local->cont.lookup.gfid_req, gfid_req, loc);
(gdb) p gfid_req
$1 = (void *) 0x0
(gdb) p *((afr_local_t *)frame->local)->xattr_req
$4 = {is_static = 0 '\000', hash_size = 1, count = 5, refcount = 1, members = 0x19a2af0, members_list = 0x19a2940, extra_free = 0x0, 
  extra_stdfree = 0x0, lock = 1}
(gdb) p *((afr_local_t *)frame->local)->xattr_req->members_list 
$5 = {hash_next = 0x19a51f0, prev = 0x0, next = 0x19a51f0, value = 0x19a5220, key = 0x19a0cd0 "glusterfs.entrylk-count"}
(gdb) p *((afr_local_t *)frame->local)->xattr_req->members_list->next 
$6 = {hash_next = 0x19a1010, prev = 0x19a2940, next = 0x19a1010, value = 0x19b38d0, key = 0x19a1d30 "glusterfs.inodelk-count"}
(gdb) p *((afr_local_t *)frame->local)->xattr_req->members_list->next->next 
$7 = {hash_next = 0x19a1200, prev = 0x19a51f0, next = 0x19a1200, value = 0x19a0ca0, key = 0x19b38a0 "trusted.afr.mirror-client-1"}
(gdb) p *((afr_local_t *)frame->local)->xattr_req->members_list->next->next->next 
$8 = {hash_next = 0x19a4bd0, prev = 0x19a1010, next = 0x19a4bd0, value = 0x19a5010, key = 0x19a29c0 "trusted.afr.mirror-client-0"}
(gdb) p *((afr_local_t *)frame->local)->xattr_req->members_list->next->next->next->next 
$9 = {hash_next = 0x0, prev = 0x19a1200, next = 0x0, value = 0x1998480, key = 0x19a52e0 "glusterfs.content"}
(gdb) p *((afr_local_t *)frame->local)->xattr_req->members_list->next->next->next->next->next 
Cannot access memory at address 0x0

178             ret = dict_get_ptr (local->xattr_req, "gfid-req", gfid_req);
179             if (ret) {
(gdb) 
180                     gf_log (this->name, GF_LOG_DEBUG,
181                             "%s: failed to get the gfid from dict", loc->path);
182                     *gfid_req = NULL;
183             } else {
184                     if (loc->parent != NULL)
185                             dict_del (local->xattr_req, "gfid-

In afr_lookup_xattr_lookup_prepare, gfid_req can be set to NULL if the gfid cannot be found in the dict. afr_lookup should check gfid_req for NULL instead of dereferencing it directly.

Comment 3 Anand Avati 2011-11-05 09:56:35 UTC
CHANGE: http://review.gluster.com/676 (Change-Id: I57fe8db82425a064ca01b23a47baca297c67560c) merged in release-3.2 by Vijay Bellur (vijay)

Comment 4 Pranith Kumar K 2011-11-10 08:46:46 UTC
*** Bug 3781 has been marked as a duplicate of this bug. ***

Comment 5 Anand Avati 2011-11-23 05:32:51 UTC
CHANGE: http://review.gluster.com/739 (Change-Id: I6295245a7f40ba4f786f1f9f35b337f3f711128d) merged in master by Vijay Bellur (vijay)

