This service will be undergoing maintenance at 00:00 UTC, 2016-08-01. It is expected to last about 1 hours
Bug 763047 - (GLUSTER-1315) gnfs crash with fileop in client xdr decode path
gnfs crash with fileop in client xdr decode path
Status: CLOSED CURRENTRELEASE
Product: GlusterFS
Classification: Community
Component: nfs (Show other bugs)
3.1-alpha
All Linux
low Severity high
: ---
: ---
Assigned To: Amar Tumballi
:
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2010-08-10 02:02 EDT by Lakshmipathi G
Modified: 2015-12-01 11:45 EST (History)
5 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: ---
Regression: RTP
Mount Type: nfs
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:


Attachments (Terms of Use)

  None (edit)
Description Lakshmipathi G 2010-08-09 23:02:21 EDT
part of log file can be found under /share/tickets/1315/gnfs.tar
Comment 1 Lakshmipathi G 2010-08-10 02:02:05 EDT
while running mixed (fileop) tests on 4glfs server + 4 gnfs server and 8 clients with glusterfs-3.1.0qa3 - gnfs server crashed.

(gdb) bt full
#0  0x00002aaaab8786ab in xdrmem_getint32 () from /lib64/libc.so.6
No symbol table info available.
#1  0x00002aaaab87e97c in xdr_u_quad_t () from /lib64/libc.so.6
No symbol table info available.
#2  0x00002aaaab14560f in xdr_gf_common_rsp (xdrs=0x7fffabb00570, objp=0x7fffabb00610) at glusterfs3-xdr.c:1659
No locals.
#3  0x00002aaaaaf34d16 in xdr_to_generic (inmsg={iov_base = 0x100000000000, iov_len = 16}, args=0x7fffabb00610, proc=0x2aaaab1455f2 <xdr_gf_common_rsp>)
    at rpc-common.c:40
	xdr = {x_op = XDR_DECODE, x_ops = 0x2aaaabac8880, x_public = 0x2aaa06872d12 <Address 0x2aaa06872d12 out of bounds>, 
  x_private = 0x100000000004 <Address 0x100000000004 out of bounds>, x_base = 0x100000000000 <Address 0x100000000000 out of bounds>, x_handy = 12}
	ret = -1
#4  0x00002aaaab147ae7 in xdr_to_common_rsp (outmsg={iov_base = 0x100000000000, iov_len = 16}, rsp=0x7fffabb00610) at glusterfs3.c:1126
No locals.
#5  0x00002aaaac34832e in client3_1_entrylk_cbk (req=0x2aaaad98d3e0, iov=0x2aaaad98d420, count=1, myframe=0x2f907d20) at client3_1-fops.c:1124
	frame = (call_frame_t *) 0x2f907d20
	rsp = {gfs_id = 0, op_ret = 0, op_errno = 0}
	ret = 0
	__FUNCTION__ = "client3_1_entrylk_cbk"
#6  0x00002aaaaaf33044 in rpc_clnt_handle_reply (clnt=0x638ba0, pollin=0x2f9170f0) at rpc-clnt.c:672
	conn = (rpc_clnt_connection_t *) 0x638bd0
	saved_frame = (struct saved_frame *) 0x2aaaada4b2b0
	request_info = (rpc_request_info_t *) 0x2f913e50
	ret = 0
	req = (struct rpc_req *) 0x2aaaad98d3e0
	__FUNCTION__ = "rpc_clnt_handle_reply"
#7  0x00002aaaaaf33355 in rpc_clnt_notify (trans=0x638d30, mydata=0x638bd0, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x2f9170f0) at rpc-clnt.c:781
	conn = (rpc_clnt_connection_t *) 0x638bd0
	clnt = (struct rpc_clnt *) 0x638ba0
	ret = -1
	req_info = (rpc_request_info_t *) 0x0
	pollin = (rpc_transport_pollin_t *) 0x2f9170f0
	tv = {tv_sec = 0, tv_usec = 0}
#8  0x00002aaaaaf3109b in rpc_transport_notify (this=0x638d30, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x2f9170f0) at rpc-transport.c:1239
	ret = -1
#9  0x00002aaaaccc27c3 in socket_event_poll_in (this=0x638d30) at socket.c:1406
	ret = 0
	pollin = (rpc_transport_pollin_t *) 0x2f9170f0
#10 0x00002aaaaccc2ab7 in socket_event_handler (fd=9, idx=1, data=0x638d30, poll_in=1, poll_out=0, poll_err=0) at socket.c:1512
	this = (rpc_transport_t *) 0x638d30
	priv = (socket_private_t *) 0x6390c0
	ret = 0
	__FUNCTION__ = "socket_event_handler"
#11 0x00002aaaaad038e4 in event_dispatch_epoll_handler (event_pool=0x629330, events=0x639c00, i=0) at event.c:812
	event_data = (struct event_data *) 0x639c04
	handler = (event_handler_t) 0x2aaaaccc2a00 <socket_event_handler>
	data = (void *) 0x638d30
	idx = 1
	ret = -1
	__FUNCTION__ = "event_dispatch_epoll_handler"
#12 0x00002aaaaad03ac5 in event_dispatch_epoll (event_pool=0x629330) at event.c:876
	events = (struct epoll_event *) 0x639c00
	size = 1
	i = 0
	ret = 1
	__FUNCTION__ = "event_dispatch_epoll"
#13 0x00002aaaaad03ddb in event_dispatch (event_pool=0x629330) at event.c:984
	ret = -1
	__FUNCTION__ = "event_dispatch"
#14 0x0000000000405062 in main (argc=7, argv=0x7fffabb00b58) at glusterfsd.c:1273
	ctx = (glusterfs_ctx_t *) 0x629010
	ret = 0



log file-
================================
[2010-08-09 10:29:37.226035] T [nfs3.c:1039:nfs3_lookup] nfs-nfsv3: FH to Volume: mirror
[2010-08-09 10:29:37.226050] T [nfs3-helpers.c:2940:nfs3_fh_resolve_entry_hard] nfs-nfsv3: FH hard resolution: ino: 205328953, gen: 5502220922886637152, entry: fileop_L1_58_L2_17, hashidx: 0
[2010-08-09 10:29:37.226082] T [nfs.c:409:nfs_user_create] nfs: uid: 0, gid 0, gids: 0
[2010-08-09 10:29:37.226098] T [nfs-fops.c:279:nfs_fop_lookup] nfs: Lookup: /nfsalpha2/ip-10-212-117-143/test5/fileop_L1_58/fileop_L1_58_L2_17
[2010-08-09 10:29:37.226144] T [rpc-clnt.c:1164:rpc_clnt_record] : Auth Info: pid: 0, uid: 0, gid: 0, owner: 0
[2010-08-09 10:29:37.226162] T [rpc-clnt.c:1056:rpc_clnt_record_build_header] rpc-clnt: Request fraglen 640, payload: 512, rpc hdr: 128
[2010-08-09 10:29:37.227409] T [rpc-clnt.c:596:rpc_clnt_reply_init] rpc-clnt: RPC XID: 109522193 Program: GlusterFS 3.1, ProgVers: 310, Proc: 27
[2010-08-09 10:29:37.227446] D [afr.c:533:afr_lookup_collect_xattr] mirror: entry self-heal is pending for /nfsalpha2/ip-10-212-117-143/test5/fileop_L1_58/fileop_L1_58_L2_17.
[2010-08-09 10:29:37.227468] N [afr.c:688:afr_lookup_done] mirror: background  entry self-heal triggered. path: /nfsalpha2/ip-10-212-117-143/test5/fileop_L1_58/fileop_L1_58_L2_17
[2010-08-09 10:29:37.227484] T [afr-self-heal-common.c:1574:afr_self_heal] mirror: performing self heal on /nfsalpha2/ip-10-212-117-143/test5/fileop_L1_58/fileop_L1_58_L2_17 (metadata=0 data=0 entry=1)
[2010-08-09 10:29:37.227508] T [afr-self-heal-common.c:1622:afr_self_heal] mirror: proceeding to metadata check on /nfsalpha2/ip-10-212-117-143/test5/fileop_L1_58/fileop_L1_58_L2_17
[2010-08-09 10:29:37.227523] T [afr-self-heal-common.c:781:afr_sh_missing_entries_done] mirror: proceeding to metadata check on /nfsalpha2/ip-10-212-117-143/test5/fileop_L1_58/fileop_L1_58_L2_17
[2010-08-09 10:29:37.227538] T [afr-self-heal-metadata.c:95:afr_sh_metadata_done] mirror: proceeding to entry check on /nfsalpha2/ip-10-212-117-143/test5/fileop_L1_58/fileop_L1_58_L2_17
[2010-08-09 10:29:37.227553] T [afr-self-heal-entry.c:2514:afr_sh_entry_lock] mirror: locking /nfsalpha2/ip-10-212-117-143/test5/fileop_L1_58/fileop_L1_58_L2_17 on subvolume ec2-204-236-197-168.compute-1.amazonaws.com-1
[2010-08-09 10:29:37.227573] T [rpc-clnt.c:1164:rpc_clnt_record] : Auth Info: pid: 0, uid: 0, gid: 0, owner: 0
[2010-08-09 10:29:37.227590] T [rpc-clnt.c:1056:rpc_clnt_record_build_header] rpc-clnt: Request fraglen 256, payload: 128, rpc hdr: 128
[2010-08-09 10:29:37.227872] T [rpc-clnt.c:596:rpc_clnt_reply_init] rpc-clnt: RPC XID: 109522194 Program: GlusterFS 3.1, ProgVers: 310, Proc: 31
pending frames:

patchset: v3.1.0qa3
signal received: 11
time of crash: 2010-08-09 10:29:37
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.1.0qa3
/lib64/libc.so.6[0x2aaaab7aaf30]
/lib64/libc.so.6[0x2aaaab8786ab]
/lib64/libc.so.6(xdr_int64_t+0x8c)[0x2aaaab87e97c]
/opt/glusterfs/3.1.0qa3//lib/libgfxdr.so.0(xdr_gf_common_rsp+0x1d)[0x2aaaab14560f]
/opt/glusterfs/3.1.0qa3//lib/libgfrpc.so.0(xdr_to_generic+0x75)[0x2aaaaaf34d16]
/opt/glusterfs/3.1.0qa3//lib/libgfxdr.so.0(xdr_to_common_rsp+0x35)[0x2aaaab147ae7]
/opt/glusterfs/3.1.0qa3//lib/glusterfs/3.1.0qa3/xlator/protocol/client.so(client3_1_entrylk_cbk+0x6e)[0x2aaaac34832e]
/opt/glusterfs/3.1.0qa3//lib/libgfrpc.so.0(rpc_clnt_handle_reply+0x1f1)[0x2aaaaaf33044]
/opt/glusterfs/3.1.0qa3//lib/libgfrpc.so.0(rpc_clnt_notify+0x270)[0x2aaaaaf33355]
/opt/glusterfs/3.1.0qa3//lib/libgfrpc.so.0(rpc_transport_notify+0xd8)[0x2aaaaaf3109b]
/opt/glusterfs/3.1.0qa3//lib/glusterfs/3.1.0qa3/rpc-transport/socket.so(socket_event_poll_in+0x4b)[0x2aaaaccc27c3]
/opt/glusterfs/3.1.0qa3//lib/glusterfs/3.1.0qa3/rpc-transport/socket.so(socket_event_handler+0xb7)[0x2aaaaccc2ab7]
/opt/glusterfs/3.1.0qa3//lib/libglusterfs.so.0[0x2aaaaad038e4]
/opt/glusterfs/3.1.0qa3//lib/libglusterfs.so.0[0x2aaaaad03ac5]
/opt/glusterfs/3.1.0qa3//lib/libglusterfs.so.0(event_dispatch+0x73)[0x2aaaaad03ddb]
/opt/glusterfs/3.1.0qa3/sbin/glusterfsd(main+0xec)[0x405062]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x2aaaab798074]
/opt/glusterfs/3.1.0qa3/sbin/glusterfsd[0x4027a9]
---------
================================
Comment 2 Amar Tumballi 2010-08-13 06:31:15 EDT
here check frame 4

#4  0x00002aaaab147ae7 in xdr_to_common_rsp (outmsg={iov_base = 0x100000000000,

Clearly shows that outmsg is corrupted (or assigned)
Comment 3 Lakshmipathi G 2010-08-15 22:42:18 EDT
While testing fileop with 4x3 distributed replicate with 3.1.0qa5
It passed first time when i ran  with lesser depth value
/opt/qa/tools/fileop -f 30 -t

But ran again with higher depth it failed and gnfs crashes.

Log file can be found /share/tickets/1315/1315.log

core file-
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
(gdb) bt full
#0  0x00002aaaab568343 in pthread_mutex_lock () from /lib64/libpthread.so.0
No symbol table info available.
#1  0x00002aaaaacee55d in inode_ref (inode=0x4514a70) at inode.c:478
	table = (inode_table_t *) 0x0
#2  0x00002aaaaace0463 in loc_copy (dst=0x4514a78, src=0x2aaaaac4b408) at xlator.c:1023
	ret = -1
#3  0x00002aaaacc8c3ce in dht_lookup (frame=0x2aaaac3354a0, this=0x63d050, loc=0x2aaaaac4b408, xattr_req=0x0) at dht-common.c:815
	subvol = (xlator_t *) 0x0
	hashed_subvol = (xlator_t *) 0x0
	cached_subvol = (xlator_t *) 0x0
	local = (dht_local_t *) 0x4514a70
	conf = (dht_conf_t *) 0x63f920
	ret = -1
	op_errno = -1
	layout = (dht_layout_t *) 0x0
	i = 0
	call_cnt = 0
	__FUNCTION__ = "dht_lookup"
#4  0x00002aaaacec3518 in nfs_fop_lookup (nfsx=0x63e080, xl=0x63d050, nfu=0x40affd10, loc=0x2aaaaac4b408, cbk=0x2aaaacee3cc8 <nfs3_fh_resolve_entry_lookup_cbk>, 
    local=0x2aaaaac4b020) at nfs-fops.c:284
	_new = (call_frame_t *) 0x2aaaac3354a0
	old_THIS = (xlator_t *) 0x2aaaaaf22560
	tmp_cbk = (fop_lookup_cbk_t) 0x2aaaacec2e21 <nfs_fop_lookup_cbk>
	frame = (call_frame_t *) 0x2aaaac536e78
	ret = -14
	nfl = (struct nfs_fop_local *) 0x2aaaaab25280
	__FUNCTION__ = "nfs_fop_lookup"
#5  0x00002aaaacecb58a in nfs_lookup (nfsx=0x63e080, xl=0x63d050, nfu=0x40affd10, pathloc=0x2aaaaac4b408, cbk=0x2aaaacee3cc8 <nfs3_fh_resolve_entry_lookup_cbk>, 
    local=0x2aaaaac4b020) at nfs-generics.c:86
	ret = -14
#6  0x00002aaaacee5097 in nfs3_fh_resolve_entry_hard (cs=0x2aaaaac4b020) at nfs3-helpers.c:2949
	ret = -2
	nfu = {uid = 0, gids = {0 <repeats 17 times>}, ngrps = 1}
	__FUNCTION__ = "nfs3_fh_resolve_entry_hard"
#7  0x00002aaaacee5232 in nfs3_fh_resolve_entry (cs=0x2aaaaac4b020) at nfs3-helpers.c:2995
	ret = -14
#8  0x00002aaaacee537b in nfs3_fh_resolve_and_resume (cs=0x2aaaaac4b020, fh=0x40affed0, entry=0x40afff10 "fileop_dir_29_8_13", 
Missing separate debuginfos, use: debuginfo-install gcc.x86_64 glibc.x86_64
 resum_fn=0x2aaaaced5838 <nfs3_mkdir_resume>) at nfs3-helpers.c:3022
	ret = -14
#9  0x00002aaaaced5d68 in nfs3_mkdir (req=0x67f230, dirfh=0x40affed0, name=0x40afff10 "fileop_dir_29_8_13", sattr=0x40affe88) at nfs3.c:2439
	vol = (xlator_t *) 0x63d050
	stat = NFS3ERR_SERVERFAULT
	ret = -14
	nfs3 = (struct nfs3_state *) 0x65a270
	cs = (nfs3_call_state_t *) 0x2aaaaac4b020
	__FUNCTION__ = "nfs3_mkdir"
#10 0x00002aaaaced5ece in nfs3svc_mkdir (req=0x67f230) at nfs3.c:2473
	name = "fileop_dir_29_8_13\000\000\000\000\000\000\000 �@\000\000\000\000\000\020\000\000\000\000\000\000,\017\177��*\000\000\000\020\000\000\000\000\000\000\200٬��*\000\000`\002�@", '\0' <repeats 21 times>, " �@\000\000\000\000\000\020\000\000\000\000\000\000,\017\177��*\000\000����", '\0' <repeats 28 times>, "P�b", '\0' <repeats 13 times>, "`\002�@\000\000\000\0001\017Ϊ�*\000\000�Y���*\000\000\034U���*\000\000\000\000\000\000\000\000\000\000��P\004\000\000\000\000p\237P\004\000\000\000\000P\000�@\000\000\000\000P\000�@\000\000\000\000"...
	dirfh = {ident = ":O", hashcount = 2, xlatorid = 0, gen = 5505199980916578812, ino = 634077276, entryhash = {257, 62611, 0 <repeats 19 times>}}
	args = {where = {dir = {data = {data_len = 26, data_val = 0x40affed0 ":O\002"}}, name = 0x40afff10 "fileop_dir_29_8_13"}, attributes = {mode = {set_it = 1, 
      set_mode3_u = {mode = 493}}, uid = {set_it = 0, set_uid3_u = {uid = 0}}, gid = {set_it = 0, set_gid3_u = {gid = 0}}, size = {set_it = 0, set_size3_u = {
        size = 0}}, atime = {set_it = DONT_CHANGE, set_atime_u = {atime = {seconds = 0, nseconds = 0}}}, mtime = {set_it = DONT_CHANGE, set_mtime_u = {mtime = {
          seconds = 0, nseconds = 0}}}}}
	ret = -1
	__FUNCTION__ = "nfs3svc_mkdir"
#11 0x00002aaaaceebaec in nfs_rpcsvc_handle_rpc_call (conn=0x64b450) at ../../../../xlators/nfs/lib/src//rpcsvc.c:1901
	actor = (rpcsvc_actor_t *) 0x2aaaad104140
	req = (rpcsvc_request_t *) 0x67f230
	ret = -1
#12 0x00002aaaaceec7ee in nfs_rpcsvc_record_update_state (conn=0x64b450, dataread=0) at ../../../../xlators/nfs/lib/src//rpcsvc.c:2383
	rs = (rpcsvc_record_state_t *) 0x64b468
	svc = (rpcsvc_t *) 0x0
	__FUNCTION__ = "nfs_rpcsvc_record_update_state"
#13 0x00002aaaaceec944 in nfs_rpcsvc_conn_data_poll_in (conn=0x64b450) at ../../../../xlators/nfs/lib/src//rpcsvc.c:2426
	dataread = 124
	readsize = 124
	readaddr = 0x2aaaac134000 "���g"
	ret = -1
	__FUNCTION__ = "nfs_rpcsvc_conn_data_poll_in"
#14 0x00002aaaaceecd68 in nfs_rpcsvc_conn_data_handler (fd=21, idx=3, data=0x64b450, poll_in=1, poll_out=0, poll_err=0)
    at ../../../../xlators/nfs/lib/src//rpcsvc.c:2555
	conn = (rpcsvc_conn_t *) 0x64b450
	ret = 0
#15 0x00002aaaaad02165 in event_dispatch_epoll_handler (event_pool=0x63f2f0, events=0x659360, i=0) at event.c:812
	event_data = (struct event_data *) 0x659364
	handler = (event_handler_t) 0x2aaaaceecce1 <nfs_rpcsvc_conn_data_handler>
	data = (void *) 0x64b450
	idx = 3
	ret = -1
	__FUNCTION__ = "event_dispatch_epoll_handler"
#16 0x00002aaaaad02338 in event_dispatch_epoll (event_pool=0x63f2f0) at event.c:876
	events = (struct epoll_event *) 0x659360
	size = 1
	i = 0
	ret = 1
	__FUNCTION__ = "event_dispatch_epoll"
#17 0x00002aaaaad02608 in event_dispatch (event_pool=0x63f2f0) at event.c:984
	ret = -1
	__FUNCTION__ = "event_dispatch"
#18 0x00002aaaacee8236 in nfs_rpcsvc_stage_proc (arg=0x62b300) at ../../../../xlators/nfs/lib/src//rpcsvc.c:64
	stg = (rpcsvc_stage_t *) 0x62b300
#19 0x00002aaaab566407 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#20 0x00002aaaab84fb0d in clone () from /lib64/libc.so.6
No symbol table info available.
Comment 4 Amar Tumballi 2010-08-29 04:44:13 EDT
Is this bug still valid? if yes, please let me know when it happens.
Comment 5 Lakshmipathi G 2010-08-29 23:07:37 EDT
I'll check with upcoming gnfs-glfs version with dvm integration and update the status.
Comment 6 Amar Tumballi 2010-09-16 23:30:24 EDT
(In reply to comment #5)
> I'll check with upcoming gnfs-glfs version with dvm integration and update the
> status.

Hi Lakshmi,

Can you check the issues now? latest release should have proper fix for gNFS with gfid changes.
Comment 7 Vijay Bellur 2010-09-29 23:36:57 EDT
> 
> Hi Lakshmi,
> 
> Can you check the issues now? latest release should have proper fix for gNFS
> with gfid changes.

Can this be verified now?
Comment 8 Shehjar Tikoo 2010-09-30 00:01:34 EDT
Closing. I am confident this is not present anymore.

Note You need to log in before you can comment on or make changes to this bug.