| Summary: | [NFS Xlator] NFS Server crash with mixed tests | ||
|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | Anush Shetty <anush> |
| Component: | nfs | Assignee: | Shehjar Tikoo <shehjart> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | |
| Severity: | medium | Docs Contact: | |
| Priority: | high | ||
| Version: | nfs-alpha | CC: | gluster-bugs, lakshmipathi |
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | Type: | --- | |
| Regression: | RTP | Mount Type: | nfs |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
|
Description
Anush Shetty
2010-04-26 06:17:36 UTC
While starting mixed tests with 50 NFS clients on Distribute+Replicate setup, NFS server crashed with the following core
Core was generated by `/opt/glusterfs/nfs-git/sbin/glusterfsd -f /ebs-raid/export/gluster/cloud/specs_'.
Program terminated with signal 6, Aborted.
[New process 29006]
[New process 29008]
[New process 29007]
#0 0x00002aaaab39c215 in raise () from /lib64/libc.so.6
(gdb) bt full
#0 0x00002aaaab39c215 in raise () from /lib64/libc.so.6
No symbol table info available.
#1 0x00002aaaab39dcc0 in abort () from /lib64/libc.so.6
No symbol table info available.
#2 0x00002aaaab395696 in __assert_fail () from /lib64/libc.so.6
No symbol table info available.
#3 0x00002aaaaacf1cec in __inode_unref (inode=0x19b9120) at inode.c:356
__PRETTY_FUNCTION__ = "__inode_unref"
#4 0x00002aaaaacf1dc5 in inode_unref (inode=0x19b9120) at inode.c:395
table = (inode_table_t *) 0xde3850
#5 0x00002aaaaace0dbd in loc_wipe (loc=0x2aaab8024cf0) at xlator.c:986
No locals.
#6 0x00002aaaac15e17c in afr_local_cleanup (local=0x2aaab8024cc0, this=0x615110) at afr.c:334
i = 32767
priv = (afr_private_t *) 0x6204a0
#7 0x00002aaaac17532c in afr_unlink_done (frame=0x2aaab8736948, this=0x615110) at afr-dir-write.c:1723
__local = (afr_local_t *) 0x2aaab8024cc0
__this = (xlator_t *) 0x615110
local = (afr_local_t *) 0x2aaab8024cc0
#8 0x00002aaaac1856f7 in afr_unlock_common_cbk (frame=0x2aaab8736948, cookie=0x2aaab872e6c0, this=0x615110, op_ret=0, op_errno=0) at afr-transaction.c:514
local = (afr_local_t *) 0x2aaab8024cc0
call_count = 0
#9 0x00002aaaabf49a6a in client_entrylk_cbk (frame=0x2aaab872e6c0, hdr=0x2aaab826ae70, hdrlen=108, iobuf=0x0) at client-protocol.c:5073
fn = (ret_fn_t) 0x2aaaac18567c <afr_unlock_common_cbk>
_parent = (call_frame_t *) 0x2aaab8736948
old_THIS = (xlator_t *) 0x613540
rsp = (gf_fop_entrylk_rsp_t *) 0x2aaab826aedc
op_ret = 0
op_errno = 0
#10 0x00002aaaabf4c55c in protocol_client_interpret (this=0x613540, trans=0x621ea0, hdr_p=0x2aaab826ae70 "", hdrlen=108, iobuf=0x0)
at client-protocol.c:6093
ret = -1
frame = (call_frame_t *) 0x2aaab872e6c0
hdr = (gf_hdr_common_t *) 0x2aaab826ae70
callid = 71506
type = 4
op = 32
conn = (client_connection_t *) 0x6223a0
__FUNCTION__ = "protocol_client_interpret"
#11 0x00002aaaabf4d222 in protocol_client_pollin (this=0x613540, trans=0x621ea0) at client-protocol.c:6391
conf = (client_conf_t *) 0x6218c0
ret = 0
iobuf = (struct iobuf *) 0x0
hdr = 0x2aaab826ae70 ""
hdrlen = 108
#12 0x00002aaaabf4d899 in notify (this=0x613540, event=2, data=0x621ea0) at client-protocol.c:6510
i = 0
ret = -1
child_down = 1
was_not_down = 0
trans = (transport_t *) 0x621ea0
conn = (client_connection_t *) 0x0
conf = (client_conf_t *) 0x6218c0
parent = (xlator_list_t *) 0x0
__FUNCTION__ = "notify"
#13 0x00002aaaaace0b7c in xlator_notify (xl=0x613540, event=2, data=0x621ea0) at xlator.c:919
old_THIS = (xlator_t *) 0x2aaaaaf34640
ret = 0
#14 0x00002aaaad49be96 in socket_event_poll_in (this=0x621ea0) at socket.c:731
ret = 0
#15 0x00002aaaad49c18b in socket_event_handler (fd=10, idx=1, data=0x621ea0, poll_in=1, poll_out=0, poll_err=0) at socket.c:831
this = (transport_t *) 0x621ea0
priv = (socket_private_t *) 0x6222d0
ret = 0
#16 0x00002aaaaad072b9 in event_dispatch_epoll_handler (event_pool=0x60c8b0, events=0x625910, i=0) at event.c:804
event_data = (struct event_data *) 0x625914
handler = (event_handler_t) 0x2aaaad49c0d4 <socket_event_handler>
data = (void *) 0x621ea0
idx = 1
ret = -1
__FUNCTION__ = "event_dispatch_epoll_handler"
#17 0x00002aaaaad0748e in event_dispatch_epoll (event_pool=0x60c8b0) at event.c:867
events = (struct epoll_event *) 0x625910
size = 1
i = 0
ret = 1
__FUNCTION__ = "event_dispatch_epoll"
#18 0x00002aaaaad077a4 in event_dispatch (event_pool=0x60c8b0) at event.c:975
ret = -1
__FUNCTION__ = "event_dispatch"
#19 0x0000000000406344 in main (argc=8, argv=0x7fff4292ca28) at glusterfsd.c:1494
ctx = (glusterfs_ctx_t *) 0x60b010
cmd_args = (cmd_args_t *) 0x60b010
pool = (call_pool_t *) 0x60b3d0
stbuf = {st_dev = 750006350, st_ino = 191, st_nlink = 5411883152, st_mode = 2875990392, st_uid = 10922, st_gid = 2863428448, pad0 = 10922,
st_rdev = 140734310304160, st_size = 46912496234496, st_blksize = 4199511, st_blocks = 46909632806990, st_atim = {tv_sec = 6442450944, tv_nsec = 0},
st_mtim = {tv_sec = 0, tv_nsec = 0}, st_ctim = {tv_sec = 5658608, tv_nsec = 191}, __unused = {5658608, 140734310304022, 140734310304023}}
tmp_logfile = '\0' <repeats 1023 times>
tmp_logfile_dyn = 0x0
tmp_logfilebase = 0x0
timestr = '\0' <repeats 255 times>
utime = 1272262507
tm = (struct tm *) 0x0
ret = 0
lim = {rlim_cur = 18446744073709551615, rlim_max = 18446744073709551615}
specfp = (FILE *) 0x60cbe0
graph = (xlator_t *) 0x61dcf0
trav = (xlator_t *) 0x0
fuse_volume_found = 0
xl_count = 13
process_mode = 0 '\0'
pipe_fd = {6, 7}
gf_success = 0
gf_failure = -1
__FUNCTION__ = "main"
Log file:
[2010-04-26 02:51:58] T [rpcsvc.c:2369:rpcsvc_conn_data_poll_in] rpc-service: conn: 0x11c1d70, readsize: 20, dataread: 20
[2010-04-26 02:51:58] T [rpcsvc.c:2303:rpcsvc_record_update_state] rpc-service: Vectored frag
[2010-04-26 02:51:58] T [rpcsvc.c:1176:rpcsvc_record_read_partial_frag] rpc-service: Fragment remaining: 0
[2010-04-26 02:51:58] T [rpcsvc.c:2266:rpcsvc_handle_vectored_frag] rpc-service: Vectored frag complete
[2010-04-26 02:51:58] D [rpcsvc.c:1266:rpcsvc_program_actor] rpc-service: Actor found: NFS3 - WRITE
[2010-04-26 02:51:58] T [rpcsvc.c:2106:rpcsvc_handle_vectored_rpc_call] rpc-service: Vectored RPC buf remaining: 65536
[2010-04-26 02:51:58] T [rpcsvc.c:2369:rpcsvc_conn_data_poll_in] rpc-service: conn: 0x11c1d70, readsize: 65536, dataread: 65536
[2010-04-26 02:51:58] T [rpcsvc.c:2303:rpcsvc_record_update_state] rpc-service: Vectored frag
[2010-04-26 02:51:58] T [rpcsvc.c:1176:rpcsvc_record_read_partial_frag] rpc-service: Fragment remaining: 0
[2010-04-26 02:51:58] T [rpcsvc.c:2266:rpcsvc_handle_vectored_frag] rpc-service: Vectored frag complete
[2010-04-26 02:51:58] T [rpcsvc.c:2208:rpcsvc_update_vectored_state] rpc-service: Vectored RPC vector read
[2010-04-26 02:51:58] D [rpcsvc.c:1266:rpcsvc_program_actor] rpc-service: Actor found: NFS3 - WRITE
[2010-04-26 02:51:58] D [nfs3-helpers.c:2179:nfs3_log_rw_call] nfs-nfsv3: XID: 1299e4ef, WRITE: args: FH: hashcount 5, xlid 0, gen 5464325855197015355, ino 127402340, offset: 57217024, count: 65536, UNSTABLE
[2010-04-26 02:51:58] T [nfs3.c:1751:nfs3_write] nfs-nfsv3: FH to Volume: statprefetch
[2010-04-26 02:51:58] T [nfs3-helpers.c:2799:nfs3_fh_resolve_inode] nfs-nfsv3: FH needs inode resolution
[2010-04-26 02:51:58] T [nfs3-helpers.c:2303:nfs3_fh_resolve_inode_done] nfs-nfsv3: FH inode resolved
[2010-04-26 02:51:58] T [nfs3-helpers.c:2018:nfs3_file_open_and_resume] nfs-nfsv3: Opening: /ip-10-245-191-20/bonnie/run_bon/dir/Bonnie.19390.000
[2010-04-26 02:51:58] T [nfs3-helpers.c:2024:nfs3_file_open_and_resume] nfs-nfsv3: fd found in state: 29
[2010-04-26 02:51:58] T [nfs3-helpers.c:1779:__nfs3_fdcache_update_entry] nfs-nfsv3: Updating fd: 0xdf03a0
[2010-04-26 02:51:58] T [nfs.c:407:nfs_user_create] nfs: uid: 0, gid 0, gids: 0
[2010-04-26 02:51:58] T [io-cache.c:132:ioc_inode_flush] iocache: locked inode(0x2aaab0355260)
[2010-04-26 02:51:58] T [io-cache.c:136:ioc_inode_flush] iocache: unlocked inode(0x2aaab0355260)
[2010-04-26 02:51:58] T [rpcsvc.c:2369:rpcsvc_conn_data_poll_in] rpc-service: conn: 0x11c1d70, readsize: 4, dataread: 4
[2010-04-26 02:51:58] T [rpcsvc.c:1064:rpcsvc_record_read_complete_fraghdr] rpc-service: Received fragment size: 65632
[2010-04-26 02:51:58] T [rpcsvc.c:1067:rpcsvc_record_read_complete_fraghdr] rpc-service: Vectored RPC header, remaining: 32
[2010-04-26 02:51:58] T [rpcsvc.c:2369:rpcsvc_conn_data_poll_in] rpc-service: conn: 0x11c1d70, readsize: 32, dataread: 32
[2010-04-26 02:51:58] T [rpcsvc.c:2303:rpcsvc_record_update_state] rpc-service: Vectored frag
[2010-04-26 02:51:58] T [rpcsvc.c:1176:rpcsvc_record_read_partial_frag] rpc-service: Fragment remaining: 0
[2010-04-26 02:51:58] T [rpcsvc.c:2266:rpcsvc_handle_vectored_frag] rpc-service: Vectored frag complete
[2010-04-26 02:51:58] T [rpcsvc.c:2045:rpcsvc_update_vectored_barerpc] rpc-service: Vectored RPC verfsz remaining: 8
[2010-04-26 02:51:58] T [rpcsvc.c:2369:rpcsvc_conn_data_poll_in] rpc-service: conn: 0x11c1d70, readsize: 8, dataread: 8
[2010-04-26 02:51:58] T [rpcsvc.c:2303:rpcsvc_record_update_state] rpc-service: Vectored frag
[2010-04-26 02:51:58] T [rpcsvc.c:1176:rpcsvc_record_read_partial_frag] rpc-service: Fragment remaining: 0
[2010-04-26 02:51:58] T [rpcsvc.c:2266:rpcsvc_handle_vectored_frag] rpc-service: Vectored frag complete
[2010-04-26 02:51:58] T [rpcsvc.c:2003:rpcsvc_update_vectored_verfsz] rpc-service: Vectored RPC preparing call
[2010-04-26 02:51:58] T [rpcsvc-auth.c:247:rpcsvc_auth_request_init] rpc-service: Auth handler: AUTH_NULL
[2010-04-26 02:51:58] D [rpcsvc.c:1790:rpcsvc_request_create] rpc-service: RPC XID: 1399e4ef, Ver: 2, Program: 100003, ProgVers: 3, Proc: 7
[2010-04-26 02:51:58] D [rpcsvc.c:1266:rpcsvc_program_actor] rpc-service: Actor found: NFS3 - WRITE
[2010-04-26 02:51:58] T [rpcsvc.c:1962:rpcsvc_handle_vectored_prep_rpc_call] rpc-service: Vectored RPC proc header remaining: 4
[2010-04-26 02:51:58] T [rpcsvc.c:2369:rpcsvc_conn_data_poll_in] rpc-service: conn: 0x11c1d70, readsize: 4, dataread: 4
[2010-04-26 02:51:58] T [rpcsvc.c:2303:rpcsvc_record_update_state] rpc-service: Vectored frag
[2010-04-26 02:51:58] T [rpcsvc.c:1176:rpcsvc_record_read_partial_frag] rpc-service: Fragment remaining: 0
[2010-04-26 02:51:58] T [rpcsvc.c:2266:rpcsvc_handle_vectored_frag] rpc-service: Vectored frag complete
[2010-04-26 02:51:58] D [rpcsvc.c:1266:rpcsvc_program_actor] rpc-service: Actor found: NFS3 - WRITE
[2010-04-26 02:51:58] T [rpcsvc.c:2110:rpcsvc_handle_vectored_rpc_call] rpc-service: Vectored RPC proc remaining: 32
[2010-04-26 02:51:58] T [rpcsvc.c:2369:rpcsvc_conn_data_poll_in] rpc-service: conn: 0x11c1d70, readsize: 32, dataread: 32
[2010-04-26 02:51:58] T [rpcsvc.c:2303:rpcsvc_record_update_state] rpc-service: Vectored frag
[2010-04-26 02:51:58] T [rpcsvc.c:1176:rpcsvc_record_read_partial_frag] rpc-service: Fragment remaining: 0
[2010-04-26 02:51:58] T [rpcsvc.c:2266:rpcsvc_handle_vectored_frag] rpc-service: Vectored frag complete
[2010-04-26 02:51:58] D [rpcsvc.c:1266:rpcsvc_program_actor] rpc-service: Actor found: NFS3 - WRITE
[2010-04-26 02:51:58] T [rpcsvc.c:2110:rpcsvc_handle_vectored_rpc_call] rpc-service: Vectored RPC proc remaining: 20
[2010-04-26 02:51:58] T [rpcsvc.c:2369:rpcsvc_conn_data_poll_in] rpc-service: conn: 0x11c1d70, readsize: 20, dataread: 20
[2010-04-26 02:51:58] T [rpcsvc.c:2303:rpcsvc_record_update_state] rpc-service: Vectored frag
[2010-04-26 02:51:58] T [rpcsvc.c:1176:rpcsvc_record_read_partial_frag] rpc-service: Fragment remaining: 0
[2010-04-26 02:51:58] T [rpcsvc.c:2266:rpcsvc_handle_vectored_frag] rpc-service: Vectored frag complete
[2010-04-26 02:51:58] D [rpcsvc.c:1266:rpcsvc_program_actor] rpc-service: Actor found: NFS3 - WRITE
[2010-04-26 02:51:58] T [rpcsvc.c:2106:rpcsvc_handle_vectored_rpc_call] rpc-service: Vectored RPC buf remaining: 65536
[2010-04-26 02:51:58] T [rpcsvc.c:2369:rpcsvc_conn_data_poll_in] rpc-service: conn: 0x11c1d70, readsize: 65536, dataread: 65080
[2010-04-26 02:51:58] T [rpcsvc.c:2303:rpcsvc_record_update_state] rpc-service: Vectored frag
[2010-04-26 02:51:58] T [rpcsvc.c:1176:rpcsvc_record_read_partial_frag] rpc-service: Fragment remaining: 456
[2010-04-26 02:51:58] T [rpcsvc.c:2369:rpcsvc_conn_data_poll_in] rpc-service: conn: 0x11c1d70, readsize: 456, dataread: 456
[2010-04-26 02:51:58] T [rpcsvc.c:2303:rpcsvc_record_update_state] rpc-service: Vectored frag
[2010-04-26 02:51:58] T [rpcsvc.c:1176:rpcsvc_record_read_partial_frag] rpc-service: Fragment remaining: 0
[2010-04-26 02:51:58] T [rpcsvc.c:2266:rpcsvc_handle_vectored_frag] rpc-service: Vectored frag complete
[2010-04-26 02:51:58] T [rpcsvc.c:2208:rpcsvc_update_vectored_state] rpc-service: Vectored RPC vector read
[2010-04-26 02:51:58] D [rpcsvc.c:1266:rpcsvc_program_actor] rpc-service: Actor found: NFS3 - WRITE
[2010-04-26 02:51:58] D [nfs3-helpers.c:2179:nfs3_log_rw_call] nfs-nfsv3: XID: 1399e4ef, WRITE: args: FH: hashcount 5, xlid 0, gen 5464325855197015355, ino 127402340, offset: 57282560, count: 65536, UNSTABLE
[2010-04-26 02:51:58] T [nfs3.c:1751:nfs3_write] nfs-nfsv3: FH to Volume: statprefetch
[2010-04-26 02:51:58] T [nfs3-helpers.c:2799:nfs3_fh_resolve_inode] nfs-nfsv3: FH needs inode resolution
[2010-04-26 02:51:58] T [nfs3-helpers.c:2303:nfs3_fh_resolve_inode_done] nfs-nfsv3: FH inode resolved
[2010-04-26 02:51:58] T [nfs3-helpers.c:2018:nfs3_file_open_and_resume] nfs-nfsv3: Opening: /ip-10-245-191-20/bonnie/run_bon/dir/Bonnie.19390.000
[2010-04-26 02:51:58] T [nfs3-helpers.c:2024:nfs3_file_open_and_resume] nfs-nfsv3: fd found in state: 31
[2010-04-26 02:51:58] T [nfs3-helpers.c:1779:__nfs3_fdcache_update_entry] nfs-nfsv3: Updating fd: 0xdf03a0
[2010-04-26 02:51:58] T [nfs.c:407:nfs_user_create] nfs: uid: 0, gid 0, gids: 0
[2010-04-26 02:51:58] T [io-cache.c:132:ioc_inode_flush] iocache: locked inode(0x2aaab0355260)
[2010-04-26 02:51:58] T [io-cache.c:136:ioc_inode_flush] iocache: unlocked inode(0x2aaab0355260)
/lib64/libc.so.6[0x2aaaab39c280]
/lib64/libc.so.6(gsignal+0x35)[0x2aaaab39c215]
/lib64/libc.so.6(abort+0x110)[0x2aaaab39dcc0]
/lib64/libc.so.6(__assert_fail+0xf6)[0x2aaaab395696]
/opt/glusterfs/nfs-git/lib/libglusterfs.so.0[0x2aaaaacf1cec]
/opt/glusterfs/nfs-git/lib/libglusterfs.so.0(inode_unref+0x31)[0x2aaaaacf1dc5]
/opt/glusterfs/nfs-git/lib/libglusterfs.so.0(loc_wipe+0x26)[0x2aaaaace0dbd]
/opt/glusterfs/nfs-git/lib/glusterfs/3.0.0git/xlator/cluster/replicate.so(afr_local_cleanup+0x59)[0x2aaaac15e17c]
/opt/glusterfs/nfs-git/lib/glusterfs/3.0.0git/xlator/cluster/replicate.so(afr_unlink_done+0x7e)[0x2aaaac17532c]
/opt/glusterfs/nfs-git/lib/glusterfs/3.0.0git/xlator/cluster/replicate.so(afr_unlock_common_cbk+0x7b)[0x2aaaac1856f7]
/opt/glusterfs/nfs-git/lib/glusterfs/3.0.0git/xlator/protocol/client.so(client_entrylk_cbk+0x131)[0x2aaaabf49a6a]
/opt/glusterfs/nfs-git/lib/glusterfs/3.0.0git/xlator/protocol/client.so(protocol_client_interpret+0x1ce)[0x2aaaabf4c55c]
/opt/glusterfs/nfs-git/lib/glusterfs/3.0.0git/xlator/protocol/client.so(protocol_client_pollin+0xb0)[0x2aaaabf4d222]
/opt/glusterfs/nfs-git/lib/glusterfs/3.0.0git/xlator/protocol/client.so(notify+0x16c)[0x2aaaabf4d899]
/opt/glusterfs/nfs-git/lib/libglusterfs.so.0(xlator_notify+0xf5)[0x2aaaaace0b7c]
/opt/glusterfs/nfs-git/lib/glusterfs/3.0.0git/transport/socket.so(socket_event_poll_in+0x40)[0x2aaaad49be96]
/opt/glusterfs/nfs-git/lib/glusterfs/3.0.0git/transport/socket.so(socket_event_handler+0xb7)[0x2aaaad49c18b]
/opt/glusterfs/nfs-git/lib/libglusterfs.so.0[0x2aaaaad072b9]
/opt/glusterfs/nfs-git/lib/libglusterfs.so.0[0x2aaaaad0748e]
/opt/glusterfs/nfs-git/lib/libglusterfs.so.0(event_dispatch+0x73)[0x2aaaaad077a4]
/opt/glusterfs/nfs-git/sbin/glusterfsd(main+0xfbe)[0x406344]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x2aaaab389974]
/opt/glusterfs/nfs-git/sbin/glusterfsd[0x402719]
Log copied to: dev:/share/tickets/855

*** This bug has been marked as a duplicate of bug 857 ***

Re-opened because 857 turned out to be unrelated to NFS.

*** Bug 879 has been marked as a duplicate of this bug. ***

*** Bug 858 has been marked as a duplicate of this bug. ***

Patches submitted for 857 fix this.

(In reply to comment #7)
> Patches submitted for 857 fix this.
Incorrect. Patch submitted for 858 fixes this. 857 is unrelated to NFS.

Regression Test
The crash is easily reproducible by sending the nfs xlator into a problematic state using error-gen.
Test Case
1. Export a simple posix+iot through nfs. Place error-gen just below NFS with:
option failure 5
2. On the NFS mount point, run dbench as:
$ for i in $(seq 1 100); do dbench -s 10; done
The for loop keeps dbench running even after error-gen has caused a run to exit with errors.
If there is a regression, the nfs server will crash before the 100 iterations are complete.
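The stack described in step 1 corresponds to a volfile along these lines. This is a hedged sketch only: the export directory and volume names are made up, and the xlator type strings should be checked against the installed GlusterFS version; the `option failure 5` line is the one given in the test case above.

```
volume posix
  type storage/posix
  option directory /export/test     # illustrative path
end-volume

volume iot
  type performance/io-threads
  subvolumes posix
end-volume

volume error-gen
  type debug/error-gen
  option failure 5                  # from the test case above
  subvolumes iot
end-volume

volume nfs-server
  type nfs/server
  subvolumes error-gen
end-volume
```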
Fixed, verified with nfs-beta-rc7.

*** Bug 1167 has been marked as a duplicate of this bug. ***