Bug 1161025 - Brick process crashed after failing to send a RPC-Reply, client_t related?
Summary: Brick process crashed after failing to send a RPC-Reply, client_t related?
Keywords:
Status: CLOSED EOL
Alias: None
Product: GlusterFS
Classification: Community
Component: rpc
Version: 3.5.2
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Kaleb KEITHLEY
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2014-11-06 08:18 UTC by Franco Broi
Modified: 2016-06-17 16:24 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-06-17 16:24:04 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments
Log file for brick daemon (63.40 KB, application/x-gzip)
2014-11-06 08:18 UTC, Franco Broi
Brick log (80.48 KB, application/x-gzip)
2014-11-16 23:57 UTC, Franco Broi

Description Franco Broi 2014-11-06 08:18:49 UTC
Created attachment 954336
Log file for brick daemon

Description of problem:

Brick process crashed.

Version-Release number of selected component (if applicable):


How reproducible:

This is the first time I've seen this. Restarting glusterd brought the brick back online.


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Log file attached

Comment 1 Niels de Vos 2014-11-11 12:32:20 UTC
I [socket.c:3134:socket_submit_reply] 0-tcp.data-server: not connected (priv->connected = -1)
E [rpcsvc.c:1258:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x94c32, Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport (tcp.data-server)
E [server.c:190:server_submit_reply] (-->/usr/lib64/glusterfs/3.5.2/xlator/features/marker.so(marker_lookup_cbk+0x10e) [0x7f69045e814e] (-->/usr/lib64/glusterfs/3.5.2/xlator/debug/io-stats.so(io_stats_lookup_cbk+0x113) [0x7f69041b1c63] (-->/usr/lib64/glusterfs/3.5.2/xlator/protocol/server.so(server_lookup_cbk+0x34d) [0x7f68ffdf0e5d]))) 0-: Reply submission failed
E [server-helpers.c:381:server_alloc_frame] (-->/usr/lib64/libgfrpc.so.0(rpcsvc_notify+0x103) [0x7f690be609e3] (-->/usr/lib64/libgfrpc.so.0(rpcsvc_handle_rpc_call+0x295) [0x7f690be607a5] (-->/usr/lib64/glusterfs/3.5.2/xlator/protocol/server.so(server3_3_lookup+0x9d) [0x7f68ffdf147d]))) 0-server: invalid argument: client
E [rpcsvc.c:547:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
I [server-handshake.c:575:server_setvolume] 0-data-server: accepted client from delta30-2273-2014/11/05-00:09:22:351963-data-client-13-0 (version: 3.5.1)
I [client_t.c:417:gf_client_unref] 0-data-server: Shutting down connection delta30-2273-2014/11/05-00:09:22:351963-data-client-13-0
I [client_t.c:417:gf_client_unref] 0-data-server: Shutting down connection delta30-2273-2014/11/05-00:09:22:351963-data-client-13-0
I [client_t.c:417:gf_client_unref] 0-data-server: Shutting down connection 
I [client_t.c:417:gf_client_unref] 0-data-server: Shutting down connection 
I [client_t.c:417:gf_client_unref] 0-data-server: Shutting down connection pÔ
I [client_t.c:417:gf_client_unref] 0-data-server: Shutting down connection pÔ
I [client_t.c:417:gf_client_unref] 0-data-server: Shutting down connection pÔ
I [client_t.c:417:gf_client_unref] 0-data-server: Shutting down connection pÔ
I [client_t.c:417:gf_client_unref] 0-data-server: Shutting down connection pÔ
I [client_t.c:417:gf_client_unref] 0-data-server: Shutting down connection pÔ
I [client_t.c:417:gf_client_unref] 0-data-server: Shutting down connection pÔ
I [client_t.c:417:gf_client_unref] 0-data-server: Shutting down connection pÔ
I [client_t.c:417:gf_client_unref] 0-data-server: Shutting down connection pÔ
I [client_t.c:417:gf_client_unref] 0-data-server: Shutting down connection pÔ
I [client_t.c:417:gf_client_unref] 0-data-server: Shutting down connection pÔ
pending frames:
frame : type(0) op(27)
frame : type(0) op(27)
frame : type(0) op(27)
frame : type(0) op(27)
frame : type(0) op(27)
frame : type(0) op(27)
frame : type(0) op(27)
frame : type(0) op(27)
frame : type(0) op(27)
frame : type(0) op(27)
frame : type(0) op(27)
frame : type(0) op(27)
frame : type(0) op(14)
frame : type(0) op(27)

patchset: git://git.gluster.com/glusterfs.git
signal received: 6
time of crash: 2014-11-06 06:01:26configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.5.2
/lib64/libc.so.6(+0x329a0)[0x7f690a6aa9a0]
/lib64/libc.so.6(gsignal+0x35)[0x7f690a6aa925]
/lib64/libc.so.6(abort+0x175)[0x7f690a6ac105]
/lib64/libc.so.6(+0x70837)[0x7f690a6e8837]
/lib64/libc.so.6(+0x76166)[0x7f690a6ee166]
/lib64/libc.so.6(+0x79f1f)[0x7f690a6f1f1f]
/lib64/libc.so.6(__libc_malloc+0x71)[0x7f690a6f2991]
/lib64/libc.so.6(xdr_bytes+0xf0)[0x7f690a790180]
/usr/lib64/libgfxdr.so.0(xdr_gfs3_readdirp_req+0x7d)[0x7f690bc4ab2d]
/usr/lib64/libgfxdr.so.0(xdr_to_generic+0x73)[0x7f690bc48aa3]
/usr/lib64/glusterfs/3.5.2/xlator/protocol/server.so(server3_3_readdirp+0x65)[0x7f68ffdda885]
/usr/lib64/libgfrpc.so.0(rpcsvc_handle_rpc_call+0x295)[0x7f690be607a5]
/usr/lib64/libgfrpc.so.0(rpcsvc_notify+0x103)[0x7f690be609e3]
/usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x28)[0x7f690be62328]
/usr/lib64/glusterfs/3.5.2/rpc-transport/socket.so(+0x8fb5)[0x7f69076c0fb5]
/usr/lib64/glusterfs/3.5.2/rpc-transport/socket.so(+0xa9fd)[0x7f69076c29fd]
/usr/lib64/libglusterfs.so.0(+0x67cd7)[0x7f690c0d7cd7]
/usr/sbin/glusterfsd(main+0x564)[0x4075e4]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x7f690a696d1d]
/usr/sbin/glusterfsd[0x404679]

Comment 2 Niels de Vos 2014-11-11 12:33:55 UTC
This might be related to the name of the client in this message:

    [client_t.c:417:gf_client_unref] 0-data-server: Shutting down connection
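
One way a connection name could turn empty and then into garbage across repeated "Shutting down connection" messages is an unbalanced unref: the teardown path runs again on an object whose name has already been freed. The following is only a minimal sketch of that pattern, not the real client_t code from libglusterfs. The names demo_client, demo_client_new, demo_client_ref, demo_client_unref and demo_client_free are hypothetical and exist only for illustration. The sketch deliberately keeps the struct allocated after teardown so that the extra unref can be detected safely instead of touching freed memory.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a reference-counted client object.
 * This is NOT the real client_t from libglusterfs. */
struct demo_client {
    int   refcount;   /* number of live references */
    int   destroyed;  /* set once teardown has run */
    char *uid;        /* heap-allocated connection name */
};

/* Allocate a client holding one reference; returns NULL on failure. */
struct demo_client *demo_client_new(const char *uid)
{
    struct demo_client *c = calloc(1, sizeof(*c));
    if (c == NULL)
        return NULL;
    c->uid = malloc(strlen(uid) + 1);
    if (c->uid == NULL) {
        free(c);
        return NULL;
    }
    strcpy(c->uid, uid);
    c->refcount = 1;
    return c;
}

/* Take an additional reference. */
void demo_client_ref(struct demo_client *c)
{
    assert(c != NULL);
    c->refcount++;
}

/* Drop a reference.
 * Returns  1 if this call ran the teardown,
 *          0 if other references remain,
 *         -1 if there was no reference left to drop (unbalanced unref). */
int demo_client_unref(struct demo_client *c)
{
    assert(c != NULL);
    if (c->destroyed || c->refcount <= 0) {
        /* Without this guard the teardown below would run a second
         * time and print a name that has already been freed. */
        fprintf(stderr, "unbalanced unref detected\n");
        return -1;
    }
    if (--c->refcount > 0)
        return 0;
    printf("Shutting down connection %s\n", c->uid);
    free(c->uid);
    c->uid = NULL;
    c->destroyed = 1;
    return 1;
}

/* Release the struct itself once the caller is done inspecting it. */
void demo_client_free(struct demo_client *c)
{
    free(c);
}
```

With one extra reference taken, the first demo_client_unref() returns 0, the second prints the shutdown message once and returns 1, and any further call returns -1 rather than printing again. Whether the brick actually hit this pattern is an assumption; the core file would be needed to confirm it.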

Comment 3 Kaleb KEITHLEY 2014-11-11 15:47:54 UTC
Do you have the core file from this crash?

Thanks,

Comment 4 Anand Avati 2014-11-11 17:02:26 UTC
REVIEW: http://review.gluster.org/9096 (libglusterfs: brick crashed after failing to send RPC reply, client_t) posted (#1) for review on release-3.5 by Kaleb KEITHLEY (kkeithle)

Comment 5 Franco Broi 2014-11-11 23:55:13 UTC
The core file is 20M compressed, so here is the backtrace instead.

#0  0x00007f690a6aa925 in raise () from /lib64/libc.so.6
#1  0x00007f690a6ac105 in abort () from /lib64/libc.so.6
#2  0x00007f690a6e8837 in __libc_message () from /lib64/libc.so.6
#3  0x00007f690a6ee166 in malloc_printerr () from /lib64/libc.so.6
#4  0x00007f690a6f1f1f in _int_malloc () from /lib64/libc.so.6
#5  0x00007f690a6f2991 in malloc () from /lib64/libc.so.6
#6  0x00007f690a790180 in xdr_bytes_internal () from /lib64/libc.so.6
#7  0x00007f690bc4ab2d in xdr_gfs3_readdirp_req ()
   from /usr/lib64/libgfxdr.so.0
#8  0x00007f690bc48aa3 in xdr_to_generic () from /usr/lib64/libgfxdr.so.0
#9  0x00007f68ffdda885 in server3_3_readdirp ()
   from /usr/lib64/glusterfs/3.5.2/xlator/protocol/server.so
#10 0x00007f690be607a5 in rpcsvc_handle_rpc_call ()
   from /usr/lib64/libgfrpc.so.0
#11 0x00007f690be609e3 in rpcsvc_notify () from /usr/lib64/libgfrpc.so.0
#12 0x00007f690be62328 in rpc_transport_notify () from /usr/lib64/libgfrpc.so.0
#13 0x00007f69076c0fb5 in ?? ()
   from /usr/lib64/glusterfs/3.5.2/rpc-transport/socket.so
#14 0x00007f69076c29fd in ?? ()
   from /usr/lib64/glusterfs/3.5.2/rpc-transport/socket.so
#15 0x00007f690c0d7cd7 in ?? () from /usr/lib64/libglusterfs.so.0
#16 0x00000000004075e4 in main ()

Comment 6 Franco Broi 2014-11-16 23:55:18 UTC
Same brick has crashed again.

#0  0x00007f4322aee3a0 in pthread_mutex_lock () from /lib64/libpthread.so.0
#1  0x00007f431c92f4b6 in pl_inodelk_client_cleanup () from /usr/lib64/glusterfs/3.5.2/xlator/features/locks.so
#2  0x00007f431c924eea in ?? () from /usr/lib64/glusterfs/3.5.2/xlator/features/locks.so
#3  0x00007f431c924f2a in ?? () from /usr/lib64/glusterfs/3.5.2/xlator/features/locks.so
#4  0x00007f4323dcfc75 in gf_client_unref () from /usr/lib64/libglusterfs.so.0
#5  0x00007f4317bbb21c in server_submit_reply () from /usr/lib64/glusterfs/3.5.2/xlator/protocol/server.so
#6  0x00007f4317bc7d4f in server_statfs_cbk () from /usr/lib64/glusterfs/3.5.2/xlator/protocol/server.so
#7  0x00007f4317df6b96 in io_stats_statfs_cbk () from /usr/lib64/glusterfs/3.5.2/xlator/debug/io-stats.so
#8  0x00007f431c70adb2 in iot_statfs_cbk () from /usr/lib64/glusterfs/3.5.2/xlator/performance/io-threads.so
#9  0x00007f431d37b3cc in posix_statfs () from /usr/lib64/glusterfs/3.5.2/xlator/storage/posix.so
#10 0x00007f4323d8a606 in default_statfs () from /usr/lib64/libglusterfs.so.0
#11 0x00007f4323d8a606 in default_statfs () from /usr/lib64/libglusterfs.so.0
#12 0x00007f4323d8a606 in default_statfs () from /usr/lib64/libglusterfs.so.0
#13 0x00007f431c70f0e4 in iot_statfs_wrapper () from /usr/lib64/glusterfs/3.5.2/xlator/performance/io-threads.so
#14 0x00007f4323d9ea06 in call_resume () from /usr/lib64/libglusterfs.so.0
#15 0x00007f431c716ee8 in iot_worker () from /usr/lib64/glusterfs/3.5.2/xlator/performance/io-threads.so
#16 0x00007f4322aec9d1 in start_thread () from /lib64/libpthread.so.0
#17 0x00007f432245ab6d in clone () from /lib64/libc.so.6

Comment 7 Franco Broi 2014-11-16 23:57:56 UTC
Created attachment 958094
Brick log

Here is the brick log for the second crash.

Comment 8 Franco Broi 2014-11-17 01:00:18 UTC
More gdb info.

  25 Thread 0x7f4316444700 (LWP 32288)  0x00007f4322af05bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  24 Thread 0x7f4320dc7700 (LWP 32261)  0x00007f4322af44b5 in sigwait () from /lib64/libpthread.so.0
  23 Thread 0x7f4317648700 (LWP 32284)  0x00007f4322af05bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  22 Thread 0x7f4314f78700 (LWP 32302)  0x00007f4322af36fd in write () from /lib64/libpthread.so.0
  21 Thread 0x7f4314d76700 (LWP 32304)  0x00007f4322af36fd in write () from /lib64/libpthread.so.0
  20 Thread 0x7f4314172700 (LWP 32377)  0x00007f4322af36fd in write () from /lib64/libpthread.so.0
  19 Thread 0x7f431557e700 (LWP 32291)  0x00007f4322af36fd in write () from /lib64/libpthread.so.0
  18 Thread 0x7f431547d700 (LWP 32292)  0x00007f4322af3264 in __lll_lock_wait () from /lib64/libpthread.so.0
  17 Thread 0x7f431dd97700 (LWP 32264)  0x00007f4322af3f3d in nanosleep () from /lib64/libpthread.so.0
  16 Thread 0x7f4317547700 (LWP 32285)  0x00007f4322af3264 in __lll_lock_wait () from /lib64/libpthread.so.0
  15 Thread 0x7f4317446700 (LWP 32286)  0x00007f432241ecdd in nanosleep () from /lib64/libc.so.6
  14 Thread 0x7f4316c45700 (LWP 32287)  0x00007f4322af098e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  13 Thread 0x7f4324206700 (LWP 32260)  0x00007f432245b163 in epoll_wait () from /lib64/libc.so.6
  12 Thread 0x7f431567f700 (LWP 32290)  0x00007f4322af098e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  11 Thread 0x7f4315079700 (LWP 32301)  0x00007f4322af3264 in __lll_lock_wait () from /lib64/libpthread.so.0
  10 Thread 0x7f431fdc5700 (LWP 32263)  0x00007f4322af098e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
* 9 Thread 0x7f4315780700 (LWP 32289)  0x00007f4322af3264 in __lll_lock_wait () from /lib64/libpthread.so.0
  8 Thread 0x7f4314475700 (LWP 32374)  0x00007f4322af36fd in write () from /lib64/libpthread.so.0
  7 Thread 0x7f431517a700 (LWP 32295)  0x00007f4322498be0 in _dl_addr () from /lib64/libc.so.6
  6 Thread 0x7f4314374700 (LWP 32375)  0x00007f4322452c57 in writev () from /lib64/libc.so.6
  5 Thread 0x7f4314273700 (LWP 32376)  0x00007f4322452c57 in writev () from /lib64/libc.so.6
  4 Thread 0x7f431527b700 (LWP 32294)  0x00007f4322af098e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  3 Thread 0x7f4314e77700 (LWP 32303)  0x00007f4322af098e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  2 Thread 0x7f43205c6700 (LWP 32262)  0x00007f4322af098e in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  1 Thread 0x7f431537c700 (LWP 32293)  0x00007f4322aee3a0 in pthread_mutex_lock () from /lib64/libpthread.so.0

Comment 9 Niels de Vos 2016-06-17 16:24:04 UTC
This bug is being closed because the 3.5 release is marked End-Of-Life. There will be no further updates to this version. If you are still facing this issue in a more current release, please open a new bug against a version that still receives bugfixes.

