Bug 763043 (GLUSTER-1311) - crash during nfs alpha test
Summary: crash during nfs alpha test
Status: CLOSED CURRENTRELEASE
Alias: GLUSTER-1311
Product: GlusterFS
Classification: Community
Component: protocol
Version: 3.1-alpha
Hardware: All
OS: Linux
Priority: low
Severity: high
Assignee: Amar Tumballi
Duplicates: GLUSTER-1354
 
Reported: 2010-08-10 05:05 UTC by Lakshmipathi G
Modified: 2015-12-01 16:45 UTC (History)
CC List: 3 users

Doc Type: Bug Fix
Regression: RTP



Description Lakshmipathi G 2010-08-10 05:05:21 UTC
While running NFS alpha mixed tests with 4 glusterfs servers, 4 gnfs servers, and 8 clients, the following glusterfs core was found.
--------------
(gdb) bt full
#0  0x00002aaaaacee419 in hash_dentry (parent=0x6ecf68, name=0x0, mod=14057) at inode.c:63
	hash = 0
	ret = 0
#1  0x00002aaaaacef59e in __dentry_grep (table=0x63d428, parent=0x6ecf68, name=0x0) at inode.c:565
	hash = 0
	dentry = (dentry_t *) 0x0
	tmp = (dentry_t *) 0x0
#2  0x00002aaaaacef666 in inode_grep (table=0x63d428, parent=0x6ecf68, name=0x0) at inode.c:586
	inode = (inode_t *) 0x0
	dentry = (dentry_t *) 0x0
#3  0x00002aaaacb95069 in resolve_entry_simple (frame=0x2aaab4553500) at server-resolve.c:364
	state = (server_state_t *) 0x2aaab4686f08
	this = (xlator_t *) 0x6346d8
	resolve = (server_resolve_t *) 0x2aaab4686f78
	parent = (inode_t *) 0x6ecf68
	inode = (inode_t *) 0x0
	ret = 0
	__FUNCTION__ = "resolve_entry_simple"
#4  0x00002aaaacb951ed in server_resolve_entry (frame=0x2aaab4553500) at server-resolve.c:419
	state = (server_state_t *) 0x2aaab4686f08
	ret = 0
	loc = (loc_t *) 0x2aaab4686f28
#5  0x00002aaaacb95524 in server_resolve (frame=0x2aaab4553500) at server-resolve.c:548
	state = (server_state_t *) 0x2aaab4686f08
	resolve = (server_resolve_t *) 0x2aaab4686f78
#6  0x00002aaaacb95655 in server_resolve_all (frame=0x2aaab4553500) at server-resolve.c:605
	state = (server_state_t *) 0x2aaab4686f08
	this = (xlator_t *) 0x6346d8
	__FUNCTION__ = "server_resolve_all"
#7  0x00002aaaacb9574d in resolve_and_resume (frame=0x2aaab4553500, fn=0x2aaaacba5ff9 <server_lookup_resume>) at server-resolve.c:635
	state = (server_state_t *) 0x2aaab4686f08
#8  0x00002aaaacbab37b in server_lookup (req=0x2aaaabb85948) at server3_1-fops.c:4813
	frame = (call_frame_t *) 0x2aaab4553500
	conn = (server_connection_t *) 0x64acb8
	state = (server_state_t *) 0x2aaab4686f08
	xattr_req = (dict_t *) 0x0
	buf = 0x0
	args = {gfs_id = 27, ino = 0, par = 37684044, gen = 5502220811211473897, flags = 0, path = 0x7fff663d4e20 "", bname = 0x7fff663d0e20 "", dict = {
    dict_len = 0, dict_val = 0x7fff663cce20 ""}}
	ret = 0
	path = '\0' <repeats 16383 times>
	bname = '\0' <repeats 16383 times>
	dict_val = '\0' <repeats 16383 times>
	__FUNCTION__ = "server_lookup"
#9  0x00002aaaaaf2bde5 in rpcsvc_handle_rpc_call (conn=0x6568e8, msg=0x2aaab4436bb8) at rpcsvc.c:1195
	actor = (rpcsvc_actor_t *) 0x2aaaacdb4bc0
	req = (rpcsvc_request_t *) 0x2aaaabb85948
	ret = -1
	__FUNCTION__ = "rpcsvc_handle_rpc_call"
#10 0x00002aaaaaf2bfc6 in rpcsvc_notify (trans=0x65fae8, mydata=0x6568e8, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x2aaab4436bb8) at rpcsvc.c:1241
	conn = (rpcsvc_conn_t *) 0x6568e8
	ret = -1
	msg = (rpc_transport_pollin_t *) 0x2aaab4436bb8
	new_trans = (rpc_transport_t *) 0x0
	__FUNCTION__ = "rpcsvc_notify"
#11 0x00002aaaaaf3109b in rpc_transport_notify (this=0x65fae8, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x2aaab4436bb8) at rpc-transport.c:1239
	ret = -1
#12 0x00002aaaacfbe7c3 in socket_event_poll_in (this=0x65fae8) at socket.c:1406
	ret = 0
	pollin = (rpc_transport_pollin_t *) 0x2aaab4436bb8
#13 0x00002aaaacfbeab7 in socket_event_handler (fd=8, idx=3, data=0x65fae8, poll_in=1, poll_out=0, poll_err=0) at socket.c:1512
	this = (rpc_transport_t *) 0x65fae8
	priv = (socket_private_t *) 0x65acc8
	ret = 0
	__FUNCTION__ = "socket_event_handler"
#14 0x00002aaaaad038e4 in event_dispatch_epoll_handler (event_pool=0x62b218, events=0x63a988, i=0) at event.c:812
	event_data = (struct event_data *) 0x63a98c
	handler = (event_handler_t) 0x2aaaacfbea00 <socket_event_handler>
	data = (void *) 0x65fae8
	idx = 3
	ret = -1
	__FUNCTION__ = "event_dispatch_epoll_handler"
#15 0x00002aaaaad03ac5 in event_dispatch_epoll (event_pool=0x62b218) at event.c:876
	events = (struct epoll_event *) 0x63a988
	size = 1
	i = 0
	ret = 1
	__FUNCTION__ = "event_dispatch_epoll"
#16 0x00002aaaaad03ddb in event_dispatch (event_pool=0x62b218) at event.c:984
	ret = -1
	__FUNCTION__ = "event_dispatch"
#17 0x0000000000405062 in main (argc=7, argv=0x7fff663d9368) at glusterfsd.c:1273
	ctx = (glusterfs_ctx_t *) 0x629010
	ret = 0
(gdb) 


==========
log file:
==========
[2010-08-06 18:25:28.511788] T [server3_1-fops.c:254:server_inodelk_cbk] server-tcp: 10758552: INODELK /nfsalpha2/ip-10-245-210-193/test7/linux-2.6.35/arch/mn10300/include/asm/page_offset.h (41338621) ==> -1 (Success)
[2010-08-06 18:25:28.511808] T [rpcsvc.c:1513:rpcsvc_submit_generic] rpc-service: Tx message: 16
[2010-08-06 18:25:28.511825] T [rpcsvc.c:1319:rpcsvc_record_build_header] rpc-service: Reply fraglen 40, payload: 16, rpc hdr: 24
[2010-08-06 18:25:28.520723] T [rpcsvc-auth.c:276:rpcsvc_auth_request_init] rpc-service: Auth handler: AUTH_GLUSTERFS
[2010-08-06 18:25:28.599687] T [rpcsvc.c:1119:rpcsvc_request_create] rpc-service: RPC XID: b0dbdc, Ver: 2, Program: 1298437, ProgVers: 310, Proc: 27
[2010-08-06 18:25:28.594181] T [rpcsvc.c:1513:rpcsvc_submit_generic] rpc-service: Tx message: 332
[2010-08-06 18:25:28.593957] T [rpcsvc.c:1513:rpcsvc_submit_generic] rpc-service: Tx message: 200
[2010-08-06 18:25:28.599733] T [auth-glusterfs.c:176:auth_glusterfs_authenticate] rpc-service: Auth Info: pid: 0, uid: 0, gid: 0, owner: 0
[2010-08-06 18:25:28.599786] T [rpcsvc.c:955:rpcsvc_program_actor] rpc-service: Actor found: GlusterFS-3.1.0 - LOOKUP
pending frames:

patchset: v3.1.0qa3
signal received: 11
time of crash: 2010-08-06 18:25:28
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.1.0qa3
[2010-08-06 18:25:28.599752] T [rpcsvc.c:1319:rpcsvc_record_build_header] rpc-service: Reply fraglen 224, payload: 200, rpc hdr: 24
[2010-08-06 18:25:28.599727] T [rpcsvc.c:1319:rpcsvc_record_build_header] rpc-service: Reply fraglen 356, payload: 332, rpc hdr: 24
/lib64/libc.so.6[0x2aaaab7aaf30]
/opt/glusterfs/3.1.0qa3//lib/libglusterfs.so.0[0x2aaaaacee419]
/opt/glusterfs/3.1.0qa3//lib/libglusterfs.so.0(__dentry_grep+0x42)[0x2aaaaacef59e]
/opt/glusterfs/3.1.0qa3//lib/libglusterfs.so.0(inode_grep+0x3e)[0x2aaaaacef666]
/opt/glusterfs/3.1.0qa3//lib/glusterfs/3.1.0qa3/xlator/protocol/server.so(resolve_entry_simple+0x1c9)[0x2aaaacb95069]
/opt/glusterfs/3.1.0qa3//lib/glusterfs/3.1.0qa3/xlator/protocol/server.so(server_resolve_entry+0x4a)[0x2aaaacb951ed]
/opt/glusterfs/3.1.0qa3//lib/glusterfs/3.1.0qa3/xlator/protocol/server.so(server_resolve+0x69)[0x2aaaacb95524]
/opt/glusterfs/3.1.0qa3//lib/glusterfs/3.1.0qa3/xlator/protocol/server.so(server_resolve_all+0x76)[0x2aaaacb95655]
/opt/glusterfs/3.1.0qa3//lib/glusterfs/3.1.0qa3/xlator/protocol/server.so(resolve_and_resume+0x3c)[0x2aaaacb9574d]
/opt/glusterfs/3.1.0qa3//lib/glusterfs/3.1.0qa3/xlator/protocol/server.so(server_lookup+0x38b)[0x2aaaacbab37b]
/opt/glusterfs/3.1.0qa3//lib/libgfrpc.so.0(rpcsvc_handle_rpc_call+0x19b)[0x2aaaaaf2bde5]
/opt/glusterfs/3.1.0qa3//lib/libgfrpc.so.0(rpcsvc_notify+0x167)[0x2aaaaaf2bfc6]
/opt/glusterfs/3.1.0qa3//lib/libgfrpc.so.0(rpc_transport_notify+0xd8)[0x2aaaaaf3109b]
/opt/glusterfs/3.1.0qa3//lib/glusterfs/3.1.0qa3/rpc-transport/socket.so(socket_event_poll_in+0x4b)[0x2aaaacfbe7c3]
/opt/glusterfs/3.1.0qa3//lib/glusterfs/3.1.0qa3/rpc-transport/socket.so(socket_event_handler+0xb7)[0x2aaaacfbeab7]
/opt/glusterfs/3.1.0qa3//lib/libglusterfs.so.0[0x2aaaaad038e4]
/opt/glusterfs/3.1.0qa3//lib/libglusterfs.so.0[0x2aaaaad03ac5]
/opt/glusterfs/3.1.0qa3//lib/libglusterfs.so.0(event_dispatch+0x73)[0x2aaaaad03ddb]
/opt/glusterfs/3.1.0qa3/sbin/glusterfsd(main+0xec)[0x405062]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x2aaaab798074]
/opt/glusterfs/3.1.0qa3/sbin/glusterfsd[0x4027a9]
---------

Comment 1 Shehjar Tikoo 2010-08-10 07:07:35 UTC
Looks like the resolve_and_resume path does not set bname, resulting in a crash. Re-assigning to Avati.
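
For readers following the backtrace above: frame #0 shows hash_dentry() entered with name=0x0. A minimal sketch of that failure mode, assuming the hashing code walks the name string (the exact inode.c logic may differ):
--------------
#include <stdio.h>

/* Stub standing in for the libglusterfs inode_t, for illustration only. */
typedef struct inode { int dummy; } inode_t;

/* Sketch of the crash pattern at inode.c:63 (assumed shape, not the exact
 * upstream code): hash_dentry() walks the name string without a NULL check,
 * so the name=0x0 seen in frame #0 dereferences address zero. */
static int
hash_dentry (inode_t *parent, const char *name, int mod)
{
        int hash = 0;

        while (*name)                   /* SIGSEGV here when name == NULL */
                hash += *name++;

        return (int) ((hash + (unsigned long) parent) % mod);
}

int
main (void)
{
        inode_t parent = {0};

        printf ("%d\n", hash_dentry (&parent, "page_offset.h", 14057));
        /* hash_dentry (&parent, NULL, 14057);  -- reproduces the signal 11 */
        return 0;
}
--------------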

Comment 2 Anand Avati 2010-08-12 07:55:07 UTC
PATCH: http://patches.gluster.com/patch/4086 in master (argument sanity checks added in inode.c)
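
The patch link may no longer resolve; going by its one-line summary ("argument sanity checks added in inode.c"), the guard presumably rejects NULL arguments before hashing. The sketch below is an assumption based on that summary, not the verified diff; inode_grep_checked and the stub types are hypothetical stand-ins:
--------------
#include <stdio.h>

/* Stubs standing in for libglusterfs types, for illustration only. */
typedef struct inode       { int dummy; } inode_t;
typedef struct inode_table { int dummy; } inode_table_t;

/* Assumed shape of the sanity check described by the patch summary:
 * reject NULL arguments up front so a bad caller gets a failed lookup
 * instead of a SIGSEGV inside hash_dentry(). */
static inode_t *
inode_grep_checked (inode_table_t *table, inode_t *parent, const char *name)
{
        if (!table || !parent || !name)
                return NULL;    /* fail the lookup cleanly, do not crash */

        /* ... real code would hash name and search the dentry table ... */
        return NULL;
}

int
main (void)
{
        inode_table_t table  = {0};
        inode_t       parent = {0};

        /* The calling pattern from the backtrace (name == NULL), now safe. */
        if (inode_grep_checked (&table, &parent, NULL) == NULL)
                printf ("lookup failed, process still alive\n");
        return 0;
}
--------------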

Comment 3 Shehjar Tikoo 2010-08-13 03:32:30 UTC
Hey Amar, we just saw this crash again, which made me think: I don't understand how this patch will fix the problem of proto/server calling the resolve function with bname as NULL. Sure, the sanity checks will fix the NULL dereference and the crash, but the fop as a whole will still fail, because resolution will fail due to bname being NULL. Comments?

Comment 4 Shehjar Tikoo 2010-08-13 03:36:02 UTC
*** Bug 1354 has been marked as a duplicate of this bug. ***

Comment 5 Amar Tumballi 2010-08-13 03:39:12 UTC
(In reply to comment #3)
> Hey Amar, we just saw this crash again, which made me think: I don't
> understand how this patch will fix the problem of proto/server calling
> the resolve function with bname as NULL. Sure, the sanity checks will
> fix the NULL dereference and the crash, but the fop as a whole will
> still fail, because resolution will fail due to bname being NULL.
> Comments?

Actually, if you want to do an inode_grep(), then bname should never be NULL. Yes, it's true that the root cause for this is not yet fixed, but the sanity checks will make sure the process does not crash because of a mistake in a higher layer. We can open a new bug for the fop failure and investigate why bname is NULL.
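
To illustrate the split described here: the library-level guard keeps the process alive, while the proper caller-side fix would fail the fop with EINVAL before a NULL bname ever reaches inode_grep(). The sketch below is hypothetical; resolve_entry_checked and the stub struct are illustrative stand-ins, not the upstream server-resolve.c code:
--------------
#include <errno.h>
#include <stdio.h>

/* Minimal stand-in for the server_resolve_t fields visible in the
 * backtrace; the real definition lives in proto/server's headers. */
typedef struct {
        const char *bname;
        int         op_ret;
        int         op_errno;
} resolve_stub_t;

/* Hypothetical caller-side guard: refuse to resolve a NULL bname so the
 * fop fails with EINVAL instead of crashing, while leaving the question
 * of *why* bname was NULL to be investigated separately. */
static int
resolve_entry_checked (resolve_stub_t *resolve)
{
        if (resolve->bname == NULL) {
                resolve->op_ret   = -1;
                resolve->op_errno = EINVAL;
                return -1;
        }
        /* ... would call inode_grep (table, parent, resolve->bname) here ... */
        return 0;
}

int
main (void)
{
        resolve_stub_t resolve = { NULL, 0, 0 };

        if (resolve_entry_checked (&resolve) != 0)
                printf ("lookup fop fails with errno %d, no crash\n",
                        resolve.op_errno);
        return 0;
}
--------------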

