Hide Forgot
While running mixed test with tools like iozone,dbench,fileop,bonnie with 4 servers ,1 nfs server ,35 clients. nfs server crashed and core file is available in ec2-184-73-54-166.compute-1.amazonaws.com:/core.1882 ---(gdb) bt full #0 0x00002aaaab43f368 in epoll_wait () from /lib64/libc.so.6 No symbol table info available. #1 0x00002aaaaad0742f in event_dispatch_epoll (event_pool=0x60c8b0) at event.c:850 events = (struct epoll_event *) 0x6255e0 size = 1 i = 1 ret = 0 __FUNCTION__ = "event_dispatch_epoll" #2 0x00002aaaaad077a4 in event_dispatch (event_pool=0x60c8b0) at event.c:975 ret = -1 __FUNCTION__ = "event_dispatch" #3 0x0000000000406344 in main (argc=8, argv=0x7fffb8158228) at glusterfsd.c:1494 ctx = (glusterfs_ctx_t *) 0x60b010 cmd_args = (cmd_args_t *) 0x60b010 pool = (call_pool_t *) 0x60b3c0 stbuf = {st_dev = 750006350, st_ino = 191, st_nlink = 7383384208, st_mode = 2875990392, st_uid = 10922, st_gid = 2863428448, pad0 = 10922, st_rdev = 140736281805216, st_size = 46912496234496, st_blksize = 4199511, st_blocks = 46909632806990, st_atim = {tv_sec = 6442450944, tv_nsec = 0}, st_mtim = { tv_sec = 0, tv_nsec = 0}, st_ctim = {tv_sec = 5658608, tv_nsec = 191}, __unused = {5658608, 140736281805078, 140736281805079}} tmp_logfile = '\0' <repeats 1023 times> tmp_logfile_dyn = 0x0 tmp_logfilebase = 0x0 timestr = '\0' <repeats 255 times> utime = 1271850650 tm = (struct tm *) 0x0 ret = 0 lim = {rlim_cur = 18446744073709551615, rlim_max = 18446744073709551615} specfp = (FILE *) 0x60cbd0 graph = (xlator_t *) 0x61da90 trav = (xlator_t *) 0x0 fuse_volume_found = 0 xl_count = 13 process_mode = 0 '\0' pipe_fd = {6, 7} gf_success = 0 gf_failure = -1 __FUNCTION__ = "main" --- log file: ============ [2010-04-21 09:55:03] D [rpcsvc.c:1266:rpcsvc_program_actor] rpc-service: Actor found: NFS3 - WRITE [2010-04-21 09:55:03] D [rpcsvc.c:1266:rpcsvc_program_actor] rpc-service: Actor found: NFS3 - WRITE [2010-04-21 09:55:03] D [nfs3-helpers.c:2179:nfs3_log_rw_call] nfs-nfsv3: XID: bb87918c, WRITE: args: FH: hashcount 3, xlid 0, gen 5462556942851377093, ino 140837337, offset: 29261824, count: 65536, UNSTABLE [2010-04-21 09:55:03] D [rpcsvc.c:1266:rpcsvc_program_actor] rpc-service: Actor found: NFS3 - WRITE [2010-04-21 09:55:03] D [nfs3-helpers.c:2179:nfs3_log_rw_call] nfs-nfsv3: XID: 9bd01ba, WRITE: args: FH: hashcount 3, xlid 0, gen 5462556942851377087, ino 140837341, offset: 88588288, count: 65536, UNSTABLE [2010-04-21 09:55:04] E [client-protocol.c:313:call_bail] ec2-75-101-197-117.compute-1.amazonaws.com-1: bailing out frame FINODELK(31) frame sent = 2010-04-21 09:24:59. frame-timeout = 1800 pending frames: patchset: git://git.sv.gnu.org/gluster.git signal received: 11 time of crash: 2010-04-21 09:55:04 configuration details: argp 1 backtrace 1 dlfcn 1 fdatasync 1 libpthread 1 llistxattr 1 setfsid 1 spinlock 1 epoll.h 1 xattr.h 1 st_atim.tv_nsec 1 package-string: glusterfs 3.0.0git /lib64/libc.so.6[0x2aaaab39c280] /opt/glusterfs/gnfs/lib/glusterfs/3.0.0git/xlator/protocol/client.so(client_finodelk_cbk+0xe8)[0x2aaaabf498d5] /opt/glusterfs/gnfs/lib/glusterfs/3.0.0git/xlator/protocol/client.so[0x2aaaabf36d68] /opt/glusterfs/gnfs/lib/libglusterfs.so.0(gf_timer_proc+0x182)[0x2aaaaacf0c6a] /lib64/libpthread.so.0[0x2aaaab157367] /lib64/libc.so.6(clone+0x6d)[0x2aaaab43ef7d] ---------
Observations: 1. The crash happens after a call bail out: [2010-04-21 09:55:04] E [client-protocol.c:313:call_bail] ec2-75-101-197-117.compute-1.amazonaws.com-1: bailing out frame FINODELK(31 2. Within client_finodelk_cbk the core points to the STACK_UNWIND. 3. I've checked the sanity of various variables that are inited or use inside STACK_UNWIND. (gdb) p *frame $10 = {root = 0x2aaab012c910, parent = 0x2aaab0170d20, next = 0x2aaab0170d20, prev = 0x2aaab012c998, local = 0x0, this = 0x6142a0, ret = 0x2aaaac189992 <afr_lock_cbk>, ref_count = 0, lock = 1, cookie = 0x0, complete = _gf_true, op = GF_FOP_STAT, begin = {tv_sec = 0, tv_usec = 0}, end = {tv_sec = 0, tv_usec = 0}} The following variables are being used from frame: fn = frame->ret; \ _parent = frame->parent; \ _parent->ref_count--; \ old_THIS = THIS; \ THIS = _parent->this; \ frame->complete = _gf_true; \ \ if (((xlator_t *) old_THIS)->ctx->measure_latency) { \ gettimeofday (&frame->end, NULL); \ } \ Printing each of these in order: (gdb) p fn $15 = (ret_fn_t) 0x2aaaac189992 <afr_lock_cbk> (gdb) p _parent $16 = (call_frame_t *) 0x2aaab0170d20 (gdb) p *_parent $17 = {root = 0x2aaab012c910, parent = 0x2aaab02b52c0, next = 0x2aaab02b52c0, prev = 0x2aaab0117b60, local = 0x2aaab010fcc0, this = 0x616210, ret = 0x2aaaac3cffdc <dht_writev_cbk>, ref_count = 0, lock = 1, cookie = 0x2aaab0170d20, complete = _gf_false, op = GF_FOP_STAT, begin = { tv_sec = 0, tv_usec = 0}, end = {tv_sec = 0, tv_usec = 0}} (gdb) p *_parent->this $18 = {name = 0x6161f0 "mirror-1", type = 0x616b70 "cluster/replicate", next = 0x6150b0, prev = 0x616d20, parents = 0x617de0, children = 0x616c60, options = 0x615a30, fops = 0x2aaaac3b30a0, mops = 0x2aaaac3b34e0, cbks = 0x2aaaac3b3360, dumpops = 0x2aaaac3b3340, volume_options = {next = 0x616be0, prev = 0x616be0}, fini = 0x2aaaac16a1de <fini>, init = 0x2aaaac1694fc <init>, notify = 0x2aaaac169175 <notify>, latencies = {{min = 0, max = 0, total = 0, std = 0, mean = 0, count = 0} <repeats 45 times>}, ctx = 0x60b010, itable = 0x0, ready = 1 '\001', init_succeeded = 1 '\001', private = 0x620000} (gdb) p old_THIS $19 = (xlator_t *) 0x2aaaaaf34640 (gdb) p *old_THIS $20 = {name = 0x2aaaaad16a51 "glusterfs", type = 0x2aaaaad16a5b "global", next = 0x0, prev = 0x0, parents = 0x0, children = 0x0, options = 0x0, fops = 0x0, mops = 0x0, cbks = 0x0, dumpops = 0x0, volume_options = {next = 0x0, prev = 0x0}, fini = 0, init = 0, notify = 0, latencies = {{min = 0, max = 0, total = 0, std = 0, mean = 0, count = 0} <repeats 45 times>}, ctx = 0x0, itable = 0x0, ready = 0 '\0', init_succeeded = 0 '\0', private = 0x0} old_THIS->ctx is NULL and that is referenced next in: if (((xlator_t *) old_THIS)->ctx->measure_latency) { \ gettimeofday (&frame->end, NULL); \ } \ This may be a hint of the problem. Continuing to investigate.
This results in a crash in STACK_UNWIND after a call bail out. Since the call bail does not occur in the context of an xlator, the old_THIS in STACK_UNWIND is handed a global_xlator which did not have its ctx set, resulting in a NULL de-reference.
PATCH: http://patches.gluster.com/patch/3155 in master (core: Set ctx in global_xlator to prevent NULL ptr reference)
Regression Test: The crash happens due to dereferencing of a call-bail out. We can simulate a call bailout by setting the frame-timeout in proto/client to a very low value so that even a slight amount of IO load can result in a bail-out. Do not use io-threads. It increases concurrency and reduces wait times so it'll be hard to produce a bail-out with that in the graph. Test Case: 1. Before mounting on the NFS client, run the following command: $ echo "128" > /proc/sys/sunrpc/tcp_slot_table_entries This will increase the amount of load generated by the NFS client. 2. Mount the nfs server. You'll obviously have to have a network link between proto/client and server so that the network delay also makes it easier to simulate call bail-out. Feel free to use a simple posix config minus io-threads under the protocol/server. 3. On the NFS mount. Run: $ iozone -t 16 -s 1g -r 1m -F <mountpoint>/f1 <mountpoint>/f2......so on We're starting 16 threads to generate as much load as possible on the disks at the server, so that the increased IO response time combined with the low frame-timeout will cause a call bail-out. When the call bailout happens you'll observe either that the dd fails, or you'll have to manually inspect the log file to confirm, in case the NFS client retransmits the write request.
Verified with nfs beta rc7 ,it's fixed now.