Bug 815028

Summary: [cd24be007c64bd10d8c28e8e9b1c988478a32c8c]: glusterfs client crashed after starting
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Scott Haines <shaines>
Component: glusterfs
Assignee: Amar Tumballi <amarts>
Status: CLOSED ERRATA
QA Contact:
Severity: high
Docs Contact:
Priority: high
Version: 1.0
CC: gluster-bugs, rabhat, sdharane, vbellur, vraman
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 768840
Environment:
Last Closed: 2012-05-01 11:17:31 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 768840
Bug Blocks: 811632

Description Scott Haines 2012-04-22 07:16:38 UTC
+++ This bug was initially created as a clone of Bug #768840 +++

Description of problem:
In the nightly sanity tests, the glusterfs client crashed due to an assert failure (the gfid was NULL).

Version-Release number of selected component (if applicable):
git build with HEAD at cd24be007c64bd10d8c28e8e9b1c988478a32c8c.

How reproducible:
The nightly sanity tests sometimes crash.

Additional info:

Backtrace:

 59: volume vol
 60:     type debug/io-stats
 61:     option latency-measurement off
 62:     option count-fop-hits off
 63:     subvolumes vol-stat-prefetch
 64: end-volume

+------------------------------------------------------------------------------+
[2011-12-17 05:05:50.937004] I [rpc-clnt.c:1597:rpc_clnt_reconfig] 0-vol-client-0: changing port to 24009 (from 0)
[2011-12-17 05:05:50.937292] I [rpc-clnt.c:1597:rpc_clnt_reconfig] 0-vol-client-1: changing port to 24010 (from 0)
[2011-12-17 05:05:50.937518] I [rpc-clnt.c:1597:rpc_clnt_reconfig] 0-vol-client-2: changing port to 24011 (from 0)
[2011-12-17 05:05:50.937664] I [rpc-clnt.c:1597:rpc_clnt_reconfig] 0-vol-client-3: changing port to 24012 (from 0)
[2011-12-17 05:05:54.936181] I [client-handshake.c:1085:select_server_supported_programs] 0-vol-client-1: Using Program GlusterFS 3git, Num (1298437), Version (310)
[2011-12-17 05:05:54.936419] I [client-handshake.c:1085:select_server_supported_programs] 0-vol-client-2: Using Program GlusterFS 3git, Num (1298437), Version (310)
[2011-12-17 05:05:54.936631] I [client-handshake.c:917:client_setvolume_cbk] 0-vol-client-2: Connected to 127.0.0.1:24011, attached to remote volume '/export/nightly/data/export3'.
[2011-12-17 05:05:54.936677] I [client-handshake.c:917:client_setvolume_cbk] 0-vol-client-1: Connected to 127.0.0.1:24010, attached to remote volume '/export/nightly/data/export2'.
[2011-12-17 05:05:54.936883] I [client-handshake.c:1085:select_server_supported_programs] 0-vol-client-0: Using Program GlusterFS 3git, Num (1298437), Version (310)
[2011-12-17 05:05:54.936945] I [client-handshake.c:1085:select_server_supported_programs] 0-vol-client-3: Using Program GlusterFS 3git, Num (1298437), Version (310)
[2011-12-17 05:05:54.940457] I [client-handshake.c:917:client_setvolume_cbk] 0-vol-client-3: Connected to 127.0.0.1:24012, attached to remote volume '/export/nightly/data/export4'.
[2011-12-17 05:05:54.940506] I [client-handshake.c:917:client_setvolume_cbk] 0-vol-client-0: Connected to 127.0.0.1:24009, attached to remote volume '/export/nightly/data/export1'.
[2011-12-17 05:05:54.946834] I [fuse-bridge.c:3637:fuse_graph_setup] 0-fuse: switched to graph 0
[2011-12-17 05:05:54.946976] I [fuse-bridge.c:3217:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13 kernel 7.10
pending frames:
frame : type(1) op(READDIR)
frame : type(1) op(READDIR)

patchset: git://git.gluster.com/glusterfs.git
signal received: 6
time of crash: 2011-12-17 05:06:09
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3git
/lib64/libc.so.6[0x3e58c302d0]
/lib64/libc.so.6(gsignal+0x35)[0x3e58c30265]
/lib64/libc.so.6(abort+0x110)[0x3e58c31d10]
/lib64/libc.so.6(__assert_fail+0xf6)[0x3e58c296e6]
/usr/local/lib/glusterfs/3git/xlator/protocol/client.so(client3_1_stat+0x11e)[0x2aaaab1c0991]
/usr/local/lib/glusterfs/3git/xlator/protocol/client.so(client_stat+0x137)[0x2aaaab1acfd6]
/usr/local/lib/glusterfs/3git/xlator/cluster/stripe.so(stripe_readdirp_cbk+0x636)[0x2aaaab3fb4df]
/usr/local/lib/glusterfs/3git/xlator/protocol/client.so(client3_1_readdirp_cbk+0x2f2)[0x2aaaab1be0cd]
/usr/local/lib/libgfrpc.so.0(rpc_clnt_handle_reply+0x1f4)[0x2ac935e4f25e]
/usr/local/lib/libgfrpc.so.0(rpc_clnt_notify+0x2b9)[0x2ac935e4f5b6]
/usr/local/lib/libgfrpc.so.0(rpc_transport_notify+0x149)[0x2ac935e4b9f3]
/usr/local/lib/glusterfs/3git/rpc-transport/socket.so(socket_event_poll_in+0x4b)[0x2aaaaad7dee3]
/usr/local/lib/glusterfs/3git/rpc-transport/socket.so(socket_event_handler+0x213)[0x2aaaaad7e425]
/usr/local/lib/libglusterfs.so.0[0x2ac935bf3854]
/usr/local/lib/libglusterfs.so.0[0x2ac935bf3a59]
/usr/local/lib/libglusterfs.so.0(event_dispatch+0x85)[0x2ac935bf3db3]
glusterfs(main+0x229)[0x4078c2]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x3e58c1d994]
glusterfs[0x403c29]

--- Additional comment from rabhat on 2011-12-19 01:37:51 EST ---

This is the backtrace of the core generated.

Core was generated by `glusterfs --volfile-server=Nightly-Sanity --volfile-id=vol /export/nightly/moun'.
Program terminated with signal 6, Aborted.
#0  0x0000003e58c30265 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x0000003e58c30265 in raise () from /lib64/libc.so.6
#1  0x0000003e58c31d10 in abort () from /lib64/libc.so.6
#2  0x0000003e58c296e6 in __assert_fail () from /lib64/libc.so.6
#3  0x00002aaaab1c0991 in client3_1_stat (frame=0x2b5edec1a5ac, this=0x1cc35f50, data=0x7fff367b0520)
    at ../../../../../xlators/protocol/client/src/client3_1-fops.c:2676
#4  0x00002aaaab1acfd6 in client_stat (frame=0x2b5edec1a5ac, this=0x1cc35f50, loc=0x7fff367b0680)
    at ../../../../../xlators/protocol/client/src/client.c:296
#5  0x00002aaaab3fb4df in stripe_readdirp_cbk (frame=0x2b5edec1a454, cookie=0x2b5edec1a500, this=0x1cc3a110, op_ret=9, op_errno=2, 
    orig_entries=0x7fff367b0790) at ../../../../../xlators/cluster/stripe/src/stripe.c:4226
#6  0x00002aaaab1be0cd in client3_1_readdirp_cbk (req=0x2aaaacae904c, iov=0x2aaaacae908c, count=1, myframe=0x2b5edec1a500)
    at ../../../../../xlators/protocol/client/src/client3_1-fops.c:1955
#7  0x00002b5eddb2725e in rpc_clnt_handle_reply (clnt=0x1cc4df30, pollin=0x1cc4f140) at ../../../../rpc/rpc-lib/src/rpc-clnt.c:789
#8  0x00002b5eddb275b6 in rpc_clnt_notify (trans=0x1cc4e200, mydata=0x1cc4df60, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x1cc4f140)
    at ../../../../rpc/rpc-lib/src/rpc-clnt.c:908
#9  0x00002b5eddb239f3 in rpc_transport_notify (this=0x1cc4e200, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x1cc4f140)
    at ../../../../rpc/rpc-lib/src/rpc-transport.c:498
#10 0x00002aaaaad7dee3 in socket_event_poll_in (this=0x1cc4e200) at ../../../../../rpc/rpc-transport/socket/src/socket.c:1675
#11 0x00002aaaaad7e425 in socket_event_handler (fd=14, idx=4, data=0x1cc4e200, poll_in=1, poll_out=0, poll_err=0)
    at ../../../../../rpc/rpc-transport/socket/src/socket.c:1790
#12 0x00002b5edd8cb854 in event_dispatch_epoll_handler (event_pool=0x1cc2a9a0, events=0x1cc30310, i=0)
    at ../../../libglusterfs/src/event.c:794
#13 0x00002b5edd8cba59 in event_dispatch_epoll (event_pool=0x1cc2a9a0) at ../../../libglusterfs/src/event.c:856
#14 0x00002b5edd8cbdb3 in event_dispatch (event_pool=0x1cc2a9a0) at ../../../libglusterfs/src/event.c:956
#15 0x00000000004078c2 in main (argc=6, argv=0x7fff367b0d98) at ../../../glusterfsd/src/glusterfsd.c:1601
(gdb) f 3
#3  0x00002aaaab1c0991 in client3_1_stat (frame=0x2b5edec1a5ac, this=0x1cc35f50, data=0x7fff367b0520)
    at ../../../../../xlators/protocol/client/src/client3_1-fops.c:2676
2676            GF_ASSERT_AND_GOTO_WITH_ERROR (this->name,
(gdb) l
2671            if (!uuid_is_null (args->loc->inode->gfid))
2672                    memcpy (req.gfid,  args->loc->inode->gfid, 16);
2673            else
2674                    memcpy (req.gfid, args->loc->gfid, 16);
2675
2676            GF_ASSERT_AND_GOTO_WITH_ERROR (this->name,
2677                                           !uuid_is_null (*((uuid_t*)req.gfid)),
2678                                           unwind, op_errno, EINVAL);
2679            req.path = (char *)args->loc->path;
2680            conf = this->private;
(gdb)  p req.gfid
$1 = '\000' <repeats 15 times>
(gdb) p *args->loc
$2 = {path = 0x1cc51cd0 "/run8724/coverage/dir/hardlink", name = 0x1cc51ce6 "hardlink", inode = 0x2aaaacdb329c, parent = 0x2aaaacdb3208, 
  gfid = '\000' <repeats 15 times>, pargfid = '\000' <repeats 15 times>}
(gdb) p *args->loc->inode
$3 = {table = 0x1cc50270, gfid = '\000' <repeats 15 times>, lock = 1, nlookup = 0, ref = 1, ia_type = IA_INVAL, fd_list = {
    next = 0x2aaaacdb32cc, prev = 0x2aaaacdb32cc}, dentry_list = {next = 0x2aaaacdb32dc, prev = 0x2aaaacdb32dc}, hash = {
    next = 0x2aaaacdb32ec, prev = 0x2aaaacdb32ec}, list = {next = 0x2aaaacdb3268, prev = 0x1cc502d0}, _ctx = 0x1cc53870}
(gdb) f 6
#6  0x00002aaaab1be0cd in client3_1_readdirp_cbk (req=0x2aaaacae904c, iov=0x2aaaacae908c, count=1, myframe=0x2b5edec1a500)
    at ../../../../../xlators/protocol/client/src/client3_1-fops.c:1955
1955            STACK_UNWIND_STRICT (readdirp, frame, rsp.op_ret,
(gdb) up
#7  0x00002b5eddb2725e in rpc_clnt_handle_reply (clnt=0x1cc4df30, pollin=0x1cc4f140) at ../../../../rpc/rpc-lib/src/rpc-clnt.c:789
789             req->cbkfn (req, req->rsp, req->rspcnt, saved_frame->frame);
(gdb) down
#6  0x00002aaaab1be0cd in client3_1_readdirp_cbk (req=0x2aaaacae904c, iov=0x2aaaacae908c, count=1, myframe=0x2b5edec1a500)
    at ../../../../../xlators/protocol/client/src/client3_1-fops.c:1955
1955            STACK_UNWIND_STRICT (readdirp, frame, rsp.op_ret,
(gdb) down
#5  0x00002aaaab3fb4df in stripe_readdirp_cbk (frame=0x2b5edec1a454, cookie=0x2b5edec1a500, this=0x1cc3a110, op_ret=9, op_errno=2, 
    orig_entries=0x7fff367b0790) at ../../../../../xlators/cluster/stripe/src/stripe.c:4226
4226                            STACK_WIND_COOKIE (frame, stripe_readdirp_entry_stat_cbk,
(gdb) l
4204                    inode = inode_new (local->fd->inode->table);
(gdb) 
4205                    if (!inode)
4206                            goto out;
4207
4208                    loc.inode = inode;
4209                    loc.parent = local->fd->inode;
4210                    ret = inode_path (local->fd->inode, local_entry->d_name, &path);
4211                    if (ret != -1) {
4212                            loc.path = path;
4213                    } else  if (inode) {
4214                            ret = inode_path (inode, NULL, &path);
(gdb) 
4215                            if (ret != -1) {
4216                                    loc.path = path;
4217                            } else {
4218                                    goto out;
4219                            }
4220                    }
4221
4222                    loc.name = strrchr (loc.path, '/');
4223                    loc.name++;
4224                    trav = this->children;
(gdb) 
4225                    while (trav) {
4226                            STACK_WIND_COOKIE (frame, stripe_readdirp_entry_stat_cbk,
4227                                               local_entry, trav->xlator,
4228                                               trav->xlator->fops->stat, &loc)


In stripe_readdirp_cbk we create a new inode without setting any gfid on it, and STACK_WIND to the xlators below. protocol/client then cannot find a gfid in either loc or loc->inode, and the assert fires.
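
For illustration, here is a minimal C sketch of the direction the fix takes (merged as http://review.gluster.com/806, "copy the gfid from the dentry structure to loc"). This is not the exact patch; it assumes the server's readdirp reply carries each entry's gfid in local_entry->d_stat.ia_gfid:

    /* Sketch based on the stripe_readdirp_cbk excerpt above: the
     * freshly created inode has an all-zero gfid, so populate
     * loc.gfid from the dirent before winding the per-entry stat. */
    inode = inode_new (local->fd->inode->table);
    if (!inode)
            goto out;

    loc.inode  = inode;
    loc.parent = local->fd->inode;

    /* The crux of the fix (assuming the readdirp reply's
     * d_stat.ia_gfid holds the entry's gfid): copy it into loc so
     * that protocol/client no longer sees a NULL gfid and the
     * GF_ASSERT_AND_GOTO_WITH_ERROR check passes. */
    uuid_copy (loc.gfid, local_entry->d_stat.ia_gfid);

    STACK_WIND_COOKIE (frame, stripe_readdirp_entry_stat_cbk,
                       local_entry, trav->xlator,
                       trav->xlator->fops->stat, &loc);

With loc.gfid populated this way, the fallback memcpy from args->loc->gfid in client3_1_stat picks up a valid gfid even though the new inode has not been linked yet.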

--- Additional comment from aavati on 2011-12-19 11:58:21 EST ---

CHANGE: http://review.gluster.com/806 (cluster/stripe: copy the gfid from the dentry structure to loc) merged in master by Vijay Bellur (vijay)

--- Additional comment from aavati on 2012-01-04 02:09:52 EST ---

CHANGE: http://review.gluster.com/2544 (cluster/stripe: copy the gfid from the dentry structure to loc) merged in release-3.2 by Vijay Bellur (vijay)

Comment 3 errata-xmlrpc 2012-05-01 11:17:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2012-0538.html