| Summary: | gnfs crash with fileop in client xdr decode path | | |
|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | Lakshmipathi G <lakshmipathi> |
| Component: | nfs | Assignee: | Amar Tumballi <amarts> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | |
| Severity: | high | Docs Contact: | |
| Priority: | low | | |
| Version: | 3.1-alpha | CC: | gluster-bugs, shehjart, tejas, vijay, vraman |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | All | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | | Type: | --- |
| Regression: | RTP | Mount Type: | nfs |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
Description
Lakshmipathi G
2010-08-10 03:02:21 UTC
While running mixed (fileop) tests on a setup of 4 glfs servers + 4 gnfs servers and 8 clients with glusterfs-3.1.0qa3, the gnfs server crashed.
(gdb) bt full
#0 0x00002aaaab8786ab in xdrmem_getint32 () from /lib64/libc.so.6
No symbol table info available.
#1 0x00002aaaab87e97c in xdr_u_quad_t () from /lib64/libc.so.6
No symbol table info available.
#2 0x00002aaaab14560f in xdr_gf_common_rsp (xdrs=0x7fffabb00570, objp=0x7fffabb00610) at glusterfs3-xdr.c:1659
No locals.
#3 0x00002aaaaaf34d16 in xdr_to_generic (inmsg={iov_base = 0x100000000000, iov_len = 16}, args=0x7fffabb00610, proc=0x2aaaab1455f2 <xdr_gf_common_rsp>)
at rpc-common.c:40
xdr = {x_op = XDR_DECODE, x_ops = 0x2aaaabac8880, x_public = 0x2aaa06872d12 <Address 0x2aaa06872d12 out of bounds>,
x_private = 0x100000000004 <Address 0x100000000004 out of bounds>, x_base = 0x100000000000 <Address 0x100000000000 out of bounds>, x_handy = 12}
ret = -1
#4 0x00002aaaab147ae7 in xdr_to_common_rsp (outmsg={iov_base = 0x100000000000, iov_len = 16}, rsp=0x7fffabb00610) at glusterfs3.c:1126
No locals.
#5 0x00002aaaac34832e in client3_1_entrylk_cbk (req=0x2aaaad98d3e0, iov=0x2aaaad98d420, count=1, myframe=0x2f907d20) at client3_1-fops.c:1124
frame = (call_frame_t *) 0x2f907d20
rsp = {gfs_id = 0, op_ret = 0, op_errno = 0}
ret = 0
__FUNCTION__ = "client3_1_entrylk_cbk"
#6 0x00002aaaaaf33044 in rpc_clnt_handle_reply (clnt=0x638ba0, pollin=0x2f9170f0) at rpc-clnt.c:672
conn = (rpc_clnt_connection_t *) 0x638bd0
saved_frame = (struct saved_frame *) 0x2aaaada4b2b0
request_info = (rpc_request_info_t *) 0x2f913e50
ret = 0
req = (struct rpc_req *) 0x2aaaad98d3e0
__FUNCTION__ = "rpc_clnt_handle_reply"
#7 0x00002aaaaaf33355 in rpc_clnt_notify (trans=0x638d30, mydata=0x638bd0, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x2f9170f0) at rpc-clnt.c:781
conn = (rpc_clnt_connection_t *) 0x638bd0
clnt = (struct rpc_clnt *) 0x638ba0
ret = -1
req_info = (rpc_request_info_t *) 0x0
pollin = (rpc_transport_pollin_t *) 0x2f9170f0
tv = {tv_sec = 0, tv_usec = 0}
#8 0x00002aaaaaf3109b in rpc_transport_notify (this=0x638d30, event=RPC_TRANSPORT_MSG_RECEIVED, data=0x2f9170f0) at rpc-transport.c:1239
ret = -1
#9 0x00002aaaaccc27c3 in socket_event_poll_in (this=0x638d30) at socket.c:1406
ret = 0
pollin = (rpc_transport_pollin_t *) 0x2f9170f0
#10 0x00002aaaaccc2ab7 in socket_event_handler (fd=9, idx=1, data=0x638d30, poll_in=1, poll_out=0, poll_err=0) at socket.c:1512
this = (rpc_transport_t *) 0x638d30
priv = (socket_private_t *) 0x6390c0
ret = 0
__FUNCTION__ = "socket_event_handler"
#11 0x00002aaaaad038e4 in event_dispatch_epoll_handler (event_pool=0x629330, events=0x639c00, i=0) at event.c:812
event_data = (struct event_data *) 0x639c04
handler = (event_handler_t) 0x2aaaaccc2a00 <socket_event_handler>
data = (void *) 0x638d30
idx = 1
ret = -1
__FUNCTION__ = "event_dispatch_epoll_handler"
#12 0x00002aaaaad03ac5 in event_dispatch_epoll (event_pool=0x629330) at event.c:876
events = (struct epoll_event *) 0x639c00
size = 1
i = 0
ret = 1
__FUNCTION__ = "event_dispatch_epoll"
#13 0x00002aaaaad03ddb in event_dispatch (event_pool=0x629330) at event.c:984
ret = -1
__FUNCTION__ = "event_dispatch"
#14 0x0000000000405062 in main (argc=7, argv=0x7fffabb00b58) at glusterfsd.c:1273
ctx = (glusterfs_ctx_t *) 0x629010
ret = 0
log file-
================================
[2010-08-09 10:29:37.226035] T [nfs3.c:1039:nfs3_lookup] nfs-nfsv3: FH to Volume: mirror
[2010-08-09 10:29:37.226050] T [nfs3-helpers.c:2940:nfs3_fh_resolve_entry_hard] nfs-nfsv3: FH hard resolution: ino: 205328953, gen: 5502220922886637152, entry: fileop_L1_58_L2_17, hashidx: 0
[2010-08-09 10:29:37.226082] T [nfs.c:409:nfs_user_create] nfs: uid: 0, gid 0, gids: 0
[2010-08-09 10:29:37.226098] T [nfs-fops.c:279:nfs_fop_lookup] nfs: Lookup: /nfsalpha2/ip-10-212-117-143/test5/fileop_L1_58/fileop_L1_58_L2_17
[2010-08-09 10:29:37.226144] T [rpc-clnt.c:1164:rpc_clnt_record] : Auth Info: pid: 0, uid: 0, gid: 0, owner: 0
[2010-08-09 10:29:37.226162] T [rpc-clnt.c:1056:rpc_clnt_record_build_header] rpc-clnt: Request fraglen 640, payload: 512, rpc hdr: 128
[2010-08-09 10:29:37.227409] T [rpc-clnt.c:596:rpc_clnt_reply_init] rpc-clnt: RPC XID: 109522193 Program: GlusterFS 3.1, ProgVers: 310, Proc: 27
[2010-08-09 10:29:37.227446] D [afr.c:533:afr_lookup_collect_xattr] mirror: entry self-heal is pending for /nfsalpha2/ip-10-212-117-143/test5/fileop_L1_58/fileop_L1_58_L2_17.
[2010-08-09 10:29:37.227468] N [afr.c:688:afr_lookup_done] mirror: background entry self-heal triggered. path: /nfsalpha2/ip-10-212-117-143/test5/fileop_L1_58/fileop_L1_58_L2_17
[2010-08-09 10:29:37.227484] T [afr-self-heal-common.c:1574:afr_self_heal] mirror: performing self heal on /nfsalpha2/ip-10-212-117-143/test5/fileop_L1_58/fileop_L1_58_L2_17 (metadata=0 data=0 entry=1)
[2010-08-09 10:29:37.227508] T [afr-self-heal-common.c:1622:afr_self_heal] mirror: proceeding to metadata check on /nfsalpha2/ip-10-212-117-143/test5/fileop_L1_58/fileop_L1_58_L2_17
[2010-08-09 10:29:37.227523] T [afr-self-heal-common.c:781:afr_sh_missing_entries_done] mirror: proceeding to metadata check on /nfsalpha2/ip-10-212-117-143/test5/fileop_L1_58/fileop_L1_58_L2_17
[2010-08-09 10:29:37.227538] T [afr-self-heal-metadata.c:95:afr_sh_metadata_done] mirror: proceeding to entry check on /nfsalpha2/ip-10-212-117-143/test5/fileop_L1_58/fileop_L1_58_L2_17
[2010-08-09 10:29:37.227553] T [afr-self-heal-entry.c:2514:afr_sh_entry_lock] mirror: locking /nfsalpha2/ip-10-212-117-143/test5/fileop_L1_58/fileop_L1_58_L2_17 on subvolume ec2-204-236-197-168.compute-1.amazonaws.com-1
[2010-08-09 10:29:37.227573] T [rpc-clnt.c:1164:rpc_clnt_record] : Auth Info: pid: 0, uid: 0, gid: 0, owner: 0
[2010-08-09 10:29:37.227590] T [rpc-clnt.c:1056:rpc_clnt_record_build_header] rpc-clnt: Request fraglen 256, payload: 128, rpc hdr: 128
[2010-08-09 10:29:37.227872] T [rpc-clnt.c:596:rpc_clnt_reply_init] rpc-clnt: RPC XID: 109522194 Program: GlusterFS 3.1, ProgVers: 310, Proc: 31
pending frames:
patchset: v3.1.0qa3
signal received: 11
time of crash: 2010-08-09 10:29:37
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.1.0qa3
/lib64/libc.so.6[0x2aaaab7aaf30]
/lib64/libc.so.6[0x2aaaab8786ab]
/lib64/libc.so.6(xdr_int64_t+0x8c)[0x2aaaab87e97c]
/opt/glusterfs/3.1.0qa3//lib/libgfxdr.so.0(xdr_gf_common_rsp+0x1d)[0x2aaaab14560f]
/opt/glusterfs/3.1.0qa3//lib/libgfrpc.so.0(xdr_to_generic+0x75)[0x2aaaaaf34d16]
/opt/glusterfs/3.1.0qa3//lib/libgfxdr.so.0(xdr_to_common_rsp+0x35)[0x2aaaab147ae7]
/opt/glusterfs/3.1.0qa3//lib/glusterfs/3.1.0qa3/xlator/protocol/client.so(client3_1_entrylk_cbk+0x6e)[0x2aaaac34832e]
/opt/glusterfs/3.1.0qa3//lib/libgfrpc.so.0(rpc_clnt_handle_reply+0x1f1)[0x2aaaaaf33044]
/opt/glusterfs/3.1.0qa3//lib/libgfrpc.so.0(rpc_clnt_notify+0x270)[0x2aaaaaf33355]
/opt/glusterfs/3.1.0qa3//lib/libgfrpc.so.0(rpc_transport_notify+0xd8)[0x2aaaaaf3109b]
/opt/glusterfs/3.1.0qa3//lib/glusterfs/3.1.0qa3/rpc-transport/socket.so(socket_event_poll_in+0x4b)[0x2aaaaccc27c3]
/opt/glusterfs/3.1.0qa3//lib/glusterfs/3.1.0qa3/rpc-transport/socket.so(socket_event_handler+0xb7)[0x2aaaaccc2ab7]
/opt/glusterfs/3.1.0qa3//lib/libglusterfs.so.0[0x2aaaaad038e4]
/opt/glusterfs/3.1.0qa3//lib/libglusterfs.so.0[0x2aaaaad03ac5]
/opt/glusterfs/3.1.0qa3//lib/libglusterfs.so.0(event_dispatch+0x73)[0x2aaaaad03ddb]
/opt/glusterfs/3.1.0qa3/sbin/glusterfsd(main+0xec)[0x405062]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x2aaaab798074]
/opt/glusterfs/3.1.0qa3/sbin/glusterfsd[0x4027a9]
---------
================================
Looking at frame 4:
#4 0x00002aaaab147ae7 in xdr_to_common_rsp (outmsg={iov_base = 0x100000000000,
this clearly shows that outmsg is corrupted (or incorrectly assigned).
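For context, here is a minimal sketch of the decode path in frames #3-#5. The function and parameter names are taken from the backtrace above, but the bodies are illustrative assumptions, not the actual 3.1.0qa3 sources:

```c
/* Hedged sketch of the decode path in frames #3-#5: names follow the
 * backtrace, bodies are illustrative, not the 3.1.0qa3 sources. */
#include <rpc/rpc.h>
#include <sys/types.h>
#include <sys/uio.h>

ssize_t
xdr_to_generic (struct iovec inmsg, void *args, xdrproc_t proc)
{
        XDR xdr;

        /* Frame #4 shows inmsg.iov_base == 0x100000000000. xdrmem_create()
         * accepts any pointer, so the first xdrmem_getint32() call made by
         * the decode proc (frame #0) dereferences the bad address and the
         * process takes SIGSEGV. */
        xdrmem_create (&xdr, (char *) inmsg.iov_base,
                       (unsigned int) inmsg.iov_len, XDR_DECODE);

        if (!proc (&xdr, args))
                return -1;

        return (ssize_t) xdr_getpos (&xdr);
}
```

Note that the pointer is non-NULL, so a simple sanity check on inmsg.iov_base inside xdr_to_generic() would not have caught this: the iovec handed to client3_1_entrylk_cbk() (frame #5) is already corrupted by the time decoding starts, which points at the reply/iobuf handling upstream of the callback.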
While testing fileop with a 4x3 distributed-replicate volume on 3.1.0qa5, it passed the first time when I ran it with a lower depth value:
/opt/qa/tools/fileop -f 30 -t
but when run again with a higher depth it failed and gnfs crashed. The log file can be found at /share/tickets/1315/1315.log.
core file-
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
(gdb) bt full
#0 0x00002aaaab568343 in pthread_mutex_lock () from /lib64/libpthread.so.0
No symbol table info available.
#1 0x00002aaaaacee55d in inode_ref (inode=0x4514a70) at inode.c:478
table = (inode_table_t *) 0x0
#2 0x00002aaaaace0463 in loc_copy (dst=0x4514a78, src=0x2aaaaac4b408) at xlator.c:1023
ret = -1
#3 0x00002aaaacc8c3ce in dht_lookup (frame=0x2aaaac3354a0, this=0x63d050, loc=0x2aaaaac4b408, xattr_req=0x0) at dht-common.c:815
subvol = (xlator_t *) 0x0
hashed_subvol = (xlator_t *) 0x0
cached_subvol = (xlator_t *) 0x0
local = (dht_local_t *) 0x4514a70
conf = (dht_conf_t *) 0x63f920
ret = -1
op_errno = -1
layout = (dht_layout_t *) 0x0
i = 0
call_cnt = 0
__FUNCTION__ = "dht_lookup"
#4 0x00002aaaacec3518 in nfs_fop_lookup (nfsx=0x63e080, xl=0x63d050, nfu=0x40affd10, loc=0x2aaaaac4b408, cbk=0x2aaaacee3cc8 <nfs3_fh_resolve_entry_lookup_cbk>,
local=0x2aaaaac4b020) at nfs-fops.c:284
_new = (call_frame_t *) 0x2aaaac3354a0
old_THIS = (xlator_t *) 0x2aaaaaf22560
tmp_cbk = (fop_lookup_cbk_t) 0x2aaaacec2e21 <nfs_fop_lookup_cbk>
frame = (call_frame_t *) 0x2aaaac536e78
ret = -14
nfl = (struct nfs_fop_local *) 0x2aaaaab25280
__FUNCTION__ = "nfs_fop_lookup"
#5 0x00002aaaacecb58a in nfs_lookup (nfsx=0x63e080, xl=0x63d050, nfu=0x40affd10, pathloc=0x2aaaaac4b408, cbk=0x2aaaacee3cc8 <nfs3_fh_resolve_entry_lookup_cbk>,
local=0x2aaaaac4b020) at nfs-generics.c:86
ret = -14
#6 0x00002aaaacee5097 in nfs3_fh_resolve_entry_hard (cs=0x2aaaaac4b020) at nfs3-helpers.c:2949
ret = -2
nfu = {uid = 0, gids = {0 <repeats 17 times>}, ngrps = 1}
__FUNCTION__ = "nfs3_fh_resolve_entry_hard"
#7 0x00002aaaacee5232 in nfs3_fh_resolve_entry (cs=0x2aaaaac4b020) at nfs3-helpers.c:2995
ret = -14
#8 0x00002aaaacee537b in nfs3_fh_resolve_and_resume (cs=0x2aaaaac4b020, fh=0x40affed0, entry=0x40afff10 "fileop_dir_29_8_13",
Missing separate debuginfos, use: debuginfo-install gcc.x86_64 glibc.x86_64
resum_fn=0x2aaaaced5838 <nfs3_mkdir_resume>) at nfs3-helpers.c:3022
ret = -14
#9 0x00002aaaaced5d68 in nfs3_mkdir (req=0x67f230, dirfh=0x40affed0, name=0x40afff10 "fileop_dir_29_8_13", sattr=0x40affe88) at nfs3.c:2439
vol = (xlator_t *) 0x63d050
stat = NFS3ERR_SERVERFAULT
ret = -14
nfs3 = (struct nfs3_state *) 0x65a270
cs = (nfs3_call_state_t *) 0x2aaaaac4b020
__FUNCTION__ = "nfs3_mkdir"
#10 0x00002aaaaced5ece in nfs3svc_mkdir (req=0x67f230) at nfs3.c:2473
name = "fileop_dir_29_8_13\000\000\000\000\000\000\000 �@\000\000\000\000\000\020\000\000\000\000\000\000,\017\177��*\000\000\000\020\000\000\000\000\000\000\200٬��*\000\000`\002�@", '\0' <repeats 21 times>, " �@\000\000\000\000\000\020\000\000\000\000\000\000,\017\177��*\000\000����", '\0' <repeats 28 times>, "P�b", '\0' <repeats 13 times>, "`\002�@\000\000\000\0001\017Ϊ�*\000\000�Y���*\000\000\034U���*\000\000\000\000\000\000\000\000\000\000��P\004\000\000\000\000p\237P\004\000\000\000\000P\000�@\000\000\000\000P\000�@\000\000\000\000"...
dirfh = {ident = ":O", hashcount = 2, xlatorid = 0, gen = 5505199980916578812, ino = 634077276, entryhash = {257, 62611, 0 <repeats 19 times>}}
args = {where = {dir = {data = {data_len = 26, data_val = 0x40affed0 ":O\002"}}, name = 0x40afff10 "fileop_dir_29_8_13"}, attributes = {mode = {set_it = 1,
set_mode3_u = {mode = 493}}, uid = {set_it = 0, set_uid3_u = {uid = 0}}, gid = {set_it = 0, set_gid3_u = {gid = 0}}, size = {set_it = 0, set_size3_u = {
size = 0}}, atime = {set_it = DONT_CHANGE, set_atime_u = {atime = {seconds = 0, nseconds = 0}}}, mtime = {set_it = DONT_CHANGE, set_mtime_u = {mtime = {
seconds = 0, nseconds = 0}}}}}
ret = -1
__FUNCTION__ = "nfs3svc_mkdir"
#11 0x00002aaaaceebaec in nfs_rpcsvc_handle_rpc_call (conn=0x64b450) at ../../../../xlators/nfs/lib/src//rpcsvc.c:1901
actor = (rpcsvc_actor_t *) 0x2aaaad104140
req = (rpcsvc_request_t *) 0x67f230
ret = -1
#12 0x00002aaaaceec7ee in nfs_rpcsvc_record_update_state (conn=0x64b450, dataread=0) at ../../../../xlators/nfs/lib/src//rpcsvc.c:2383
rs = (rpcsvc_record_state_t *) 0x64b468
svc = (rpcsvc_t *) 0x0
__FUNCTION__ = "nfs_rpcsvc_record_update_state"
#13 0x00002aaaaceec944 in nfs_rpcsvc_conn_data_poll_in (conn=0x64b450) at ../../../../xlators/nfs/lib/src//rpcsvc.c:2426
dataread = 124
readsize = 124
readaddr = 0x2aaaac134000 "���g"
ret = -1
__FUNCTION__ = "nfs_rpcsvc_conn_data_poll_in"
#14 0x00002aaaaceecd68 in nfs_rpcsvc_conn_data_handler (fd=21, idx=3, data=0x64b450, poll_in=1, poll_out=0, poll_err=0)
at ../../../../xlators/nfs/lib/src//rpcsvc.c:2555
conn = (rpcsvc_conn_t *) 0x64b450
ret = 0
#15 0x00002aaaaad02165 in event_dispatch_epoll_handler (event_pool=0x63f2f0, events=0x659360, i=0) at event.c:812
event_data = (struct event_data *) 0x659364
handler = (event_handler_t) 0x2aaaaceecce1 <nfs_rpcsvc_conn_data_handler>
data = (void *) 0x64b450
idx = 3
ret = -1
__FUNCTION__ = "event_dispatch_epoll_handler"
#16 0x00002aaaaad02338 in event_dispatch_epoll (event_pool=0x63f2f0) at event.c:876
events = (struct epoll_event *) 0x659360
size = 1
i = 0
ret = 1
__FUNCTION__ = "event_dispatch_epoll"
#17 0x00002aaaaad02608 in event_dispatch (event_pool=0x63f2f0) at event.c:984
ret = -1
__FUNCTION__ = "event_dispatch"
#18 0x00002aaaacee8236 in nfs_rpcsvc_stage_proc (arg=0x62b300) at ../../../../xlators/nfs/lib/src//rpcsvc.c:64
stg = (rpcsvc_stage_t *) 0x62b300
#19 0x00002aaaab566407 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#20 0x00002aaaab84fb0d in clone () from /lib64/libc.so.6
No symbol table info available.
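In this second core, frame #1 shows inode_ref() entered with table == NULL, so the pthread_mutex_lock() in frame #0 dereferences a NULL inode table. Below is a minimal sketch of that pattern; the struct layouts and function body are assumptions based on the backtrace, not the exact 3.1.0qa5 inode.c:

```c
/* Hedged sketch of the inode_ref() pattern in frame #1; struct layouts and
 * the function body are assumptions, not the real libglusterfs code. */
#include <pthread.h>
#include <stdint.h>

typedef struct _inode_table {
        pthread_mutex_t lock;   /* protects the table's inode lists */
        /* ... */
} inode_table_t;

typedef struct _inode {
        inode_table_t *table;   /* NULL here, per frame #1 */
        uint32_t       ref;
        /* ... */
} inode_t;

inode_t *
inode_ref (inode_t *inode)
{
        inode_table_t *table = NULL;

        if (!inode)
                return NULL;

        table = inode->table;

        /* With table == 0x0 (frame #1), &table->lock is a near-NULL
         * address and pthread_mutex_lock() faults (frame #0). */
        pthread_mutex_lock (&table->lock);
        {
                inode->ref++;
        }
        pthread_mutex_unlock (&table->lock);

        return inode;
}
```

So the fault is not in inode_ref() itself: the loc passed down through dht_lookup() and loc_copy() (frames #3 and #2) appears to carry an inode pointer whose table field is garbage, which suggests the loc was never properly initialised or points at freed memory.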
Is this bug still valid? If yes, please let me know when it happens.

I'll check with the upcoming gnfs-glfs version with dvm integration and update the status.

(In reply to comment #5)
> I'll check with upcoming gnfs-glfs version with dvm integration and update the status.

Hi Lakshmi,

Can you check the issues now? The latest release should have a proper fix for gNFS with the gfid changes.

>
> Hi Lakshmi,
>
> Can you check the issues now? latest release should have proper fix for gNFS
> with gfid changes.
Can this be verified now?
Closing. I am confident this is not present anymore. |