Description of problem: Suddenly, a brick went down during the operation and collected an error, but unfortunately no core dump was generated. < GlusterFS(3.12.15) Volume Info > Volume Name: nas Type: Distributed-Replicate Volume ID: 5a8cc386-85dc-4255-b287-352a499a28d5 Status: Started Snapshot Count: 0 Number of Bricks: 2 x 3 = 6 Transport-type: tcp Bricks: Brick1: NAS01:/gluster1/nas Brick2: NAS02:/gluster1/nas Brick3: NAS03:/gluster1/nas Brick4: NAS01:/gluster2/nas ===> down brick(10.10.10.121:49154) Brick5: NAS02:/gluster2/nas Brick6: NAS03:/gluster2/nas Options Reconfigured: performance.client-io-threads: off nfs.disable: on transport.address-family: inet cluster.self-heal-window-size: 16 cluster.background-self-heal-count: 64 cluster.shd-wait-qlength: 32768 performance.cache-size: 4GB performance.write-behind-window-size: 50MB disperse.background-heals: 16 disperse.heal-wait-qlength: 2048 disperse.self-heal-window-size: 16 disperse.shd-wait-qlength: 16384 features.cache-invalidation: off features.cache-invalidation-timeout: 60 performance.stat-prefetch: on performance.cache-invalidation: off performance.md-cache-timeout: 1 network.inode-lru-limit: 200000 performance.nl-cache-limit: 10MB performance.nl-cache-positive-entry: FALSE performance.nl-cache: off cluster.use-compound-fops: on performance.parallel-readdir: on cluster.lookup-optimize: on client.event-threads: 32 server.event-threads: 16 performance.low-prio-threads: 64 performance.normal-prio-threads: 64 performance.high-prio-threads: 64 performance.least-prio-threads: 64 performance.io-thread-count: 64 cluster.eager-lock: on < Down Brick Log - 10.10.10.121:49154 > [2020-02-25 18:39:47.375248] I [MSGID: 115008] [server-resolve.c:541:server_resolve_fd] 0-: fd not found in context [Bad file descriptor] [2020-02-25 18:39:47.375658] E [MSGID: 115073] [server-rpc-fops.c:1833:server_fxattrop_cbk] 0-nas-server: 17031439: FXATTROP 2 (a79e63ee-5610-41bb-bd59-72bf7164354d), client: CLIENT01-35025-2020/01/08-11:41:03:97960-nas-client-3-5-3, error-xlator: - [Bad file descriptor] [2020-02-25 18:39:47.375701] E [rpcsvc.c:1364:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x103e10f, Program: GlusterFS 3.3, ProgVers: 330, Proc: 34) to rpc-transport (tcp.nas-server) [2020-02-25 18:39:47.375750] E [server.c:195:server_submit_reply] (-->/usr/lib64/glusterfs/3.12.15/xlator/protocol/server.so(+0x1bfa6) [0x7f2b6f552fa6] -->/usr/lib64/glusterfs/3.12.15/xlator/protocol/server.so(+0x1bba3) [0x7f2b6f552ba3] -->/usr/lib64/glusterfs/3.12.15/xlator/protocol/server.so(+0x94a6) [0x7f2b6f5404a6] ) 0-: Reply submission failed [2020-02-25 18:39:47.376339] I [MSGID: 115008] [server-resolve.c:541:server_resolve_fd] 0-: fd not found in context [Bad file descriptor] [2020-02-25 18:39:47.376365] E [MSGID: 115090] [server-rpc-fops.c:2171:server_compound_cbk] 0-nas-server: 17031440: COMPOUND2 (a79e63ee-5610-41bb-bd59-72bf7164354d), client: CLIENT01-35025-2020/01/08-11:41:03:97960-nas-client-3-5-3, error-xlator: - [Bad file descriptor] pending frames: frame : type(0) op(52) frame : type(0) op(52) frame : type(0) op(52) frame : type(0) op(52) frame : type(0) op(52) frame : type(0) op(13) frame : type(0) op(52) frame : type(0) op(13) frame : type(0) op(52) frame : type(0) op(52) frame : type(0) op(13) frame : type(0) op(52) frame : type(0) op(13) frame : type(0) op(52) frame : type(0) op(52) frame : type(0) op(13) frame : type(0) op(52) frame : type(0) op(52) frame : type(0) op(13) frame : type(0) op(52) frame : type(0) op(52) frame : type(0) op(52) frame : type(0) op(52) frame : type(0) op(52) frame : type(0) op(52) frame : type(0) op(52) frame : type(0) op(13) frame : type(0) op(13) frame : type(0) op(52) frame : type(0) op(52) frame : type(0) op(52) frame : type(0) op(13) frame : type(0) op(13) frame : type(0) op(13) frame : type(0) op(13) frame : type(0) op(52) frame : type(0) op(13) frame : type(0) op(13) frame : type(0) op(13) frame : type(0) op(52) frame : type(0) op(52) frame : type(0) op(13) frame : type(0) op(13) frame : type(0) op(13) frame : type(0) op(13) frame : type(0) op(52) frame : type(0) op(13) frame : type(0) op(52) frame : type(0) op(52) frame : type(0) op(52) frame : type(0) op(52) frame : type(0) op(52) frame : type(0) op(13) frame : type(0) op(52) frame : type(0) op(52) frame : type(0) op(11) frame : type(0) op(52) frame : type(0) op(33) frame : type(0) op(11) frame : type(0) op(52) frame : type(0) op(52) frame : type(0) op(52) frame : type(0) op(52) frame : type(0) op(52) frame : type(0) op(52) frame : type(0) op(52) frame : type(0) op(13) frame : type(0) op(52) frame : type(0) op(52) frame : type(0) op(11) frame : type(0) op(52) frame : type(0) op(33) frame : type(0) op(11) frame : type(0) op(52) frame : type(0) op(11) frame : type(0) op(33) frame : type(0) op(52) frame : type(0) op(33) frame : type(0) op(52) patchset: git://git.gluster.org/glusterfs.git signal received: 11 time of crash: 2020-02-25 18:39:47 configuration details: argp 1 backtrace 1 dlfcn 1 libpthread 1 llistxattr 1 setfsid 1 spinlock 1 epoll.h 1 xattr.h 1 st_atim.tv_nsec 1 package-string: glusterfs 3.12.15 /lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xa0)[0x7f2b842db4e0] /lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7f2b842e5414] /lib64/libc.so.6(+0x36280)[0x7f2b8293b280] /usr/lib64/glusterfs/3.12.15/xlator/protocol/server.so(+0x242eb)[0x7f2b6f55b2eb] /usr/lib64/glusterfs/3.12.15/xlator/protocol/server.so(+0x2674d)[0x7f2b6f55d74d] /usr/lib64/glusterfs/3.12.15/xlator/protocol/server.so(+0xc8e9)[0x7f2b6f5438e9] /usr/lib64/glusterfs/3.12.15/xlator/protocol/server.so(+0xc98d)[0x7f2b6f54398d] /usr/lib64/glusterfs/3.12.15/xlator/protocol/server.so(+0xd3a5)[0x7f2b6f5443a5] /usr/lib64/glusterfs/3.12.15/xlator/protocol/server.so(+0xc9ce)[0x7f2b6f5439ce] /usr/lib64/glusterfs/3.12.15/xlator/protocol/server.so(+0xd23e)[0x7f2b6f54423e] /usr/lib64/glusterfs/3.12.15/xlator/protocol/server.so(+0xd356)[0x7f2b6f544356] /usr/lib64/glusterfs/3.12.15/xlator/protocol/server.so(+0xc9ae)[0x7f2b6f5439ae] /usr/lib64/glusterfs/3.12.15/xlator/protocol/server.so(+0xd464)[0x7f2b6f544464] /usr/lib64/glusterfs/3.12.15/xlator/protocol/server.so(+0x27195)[0x7f2b6f55e195] /lib64/libgfrpc.so.0(rpcsvc_request_handler+0x96)[0x7f2b8409d246] /lib64/libpthread.so.0(+0x7dd5)[0x7f2b8313add5] /lib64/libc.so.6(clone+0x6d)[0x7f2b82a02ead] --------- < CLIENT01 gluster LOG > [2020-02-25 14:18:28.617641] E [rpc-clnt.c:185:call_bail] 5-nas-client-3: bailing out frame type(GlusterFS 3.3) op(XATTROP(33)) xid = 0xff906d sent = 2020-02-25 13:48:27.301196. timeout = 1800 for 10.10.10.121:49154 [2020-02-25 14:18:28.627224] E [MSGID: 114031] [client-rpc-fops.c:1718:client3_3_xattrop_cbk] 5-nas-client-3: remote operation failed. Path: /Image/655/146136.jpg (8745f884-b922-4066-b54c-5893066070c5) [2020-02-25 14:48:28.961143] E [rpc-clnt.c:185:call_bail] 5-nas-client-3: bailing out frame type(GlusterFS 3.3) op(XATTROP(33)) xid = 0x10001f7 sent = 2020-02-25 14:18:28.627356. timeout = 1800 for 10.10.10.121:49154 [2020-02-25 14:48:28.961402] E [MSGID: 114031] [client-rpc-fops.c:1718:client3_3_xattrop_cbk] 5-nas-client-3: remote operation failed. Path: /Image/655/146136.jpg (8745f884-b922-4066-b54c-5893066070c5) [2020-02-25 14:48:28.970718] W [fuse-bridge.c:779:fuse_truncate_cbk] 0-glusterfs-fuse: 235752053: TRUNCATE() /Image/655/146136.jpg => -1 (Transport endpoint is not connected) [2020-02-25 15:19:59.238515] E [rpc-clnt.c:185:call_bail] 5-nas-client-3: bailing out frame type(GlusterFS 3.3) op(XATTROP(33)) xid = 0x1007af2 sent = 2020-02-25 14:49:55.891752. timeout = 1800 for 10.10.10.121:49154 [2020-02-25 15:19:59.238736] E [MSGID: 114031] [client-rpc-fops.c:1718:client3_3_xattrop_cbk] 5-nas-client-3: remote operation failed. Path: /Image/655/146136.jpg (8745f884-b922-4066-b54c-5893066070c5) [2020-02-25 15:49:59.570378] E [rpc-clnt.c:185:call_bail] 5-nas-client-3: bailing out frame type(GlusterFS 3.3) op(XATTROP(33)) xid = 0x100ef9f sent = 2020-02-25 15:19:59.238815. timeout = 1800 for 10.10.10.121:49154 [2020-02-25 15:49:59.570438] E [MSGID: 114031] [client-rpc-fops.c:1718:client3_3_xattrop_cbk] 5-nas-client-3: remote operation failed. Path: /Image/655/146136.jpg (8745f884-b922-4066-b54c-5893066070c5) [2020-02-25 15:49:59.615705] W [fuse-bridge.c:779:fuse_truncate_cbk] 0-glusterfs-fuse: 235897960: TRUNCATE() /Image/655/146136.jpg => -1 (Transport endpoint is not connected) [2020-02-25 16:29:29.884503] E [rpc-clnt.c:185:call_bail] 5-nas-client-3: bailing out frame type(GlusterFS 3.3) op(XATTROP(33)) xid = 0x1017eda sent = 2020-02-25 15:59:22.576432. timeout = 1800 for 10.10.10.121:49154 [2020-02-25 16:29:29.884547] E [MSGID: 114031] [client-rpc-fops.c:1718:client3_3_xattrop_cbk] 5-nas-client-3: remote operation failed. Path: /Image/655/146136.jpg (8745f884-b922-4066-b54c-5893066070c5) [2020-02-25 16:59:30.151714] E [rpc-clnt.c:185:call_bail] 5-nas-client-3: bailing out frame type(GlusterFS 3.3) op(XATTROP(33)) xid = 0x101f311 sent = 2020-02-25 16:29:29.884632. timeout = 1800 for 10.10.10.121:49154 [2020-02-25 16:59:30.151759] E [MSGID: 114031] [client-rpc-fops.c:1718:client3_3_xattrop_cbk] 5-nas-client-3: remote operation failed. Path: /Image/655/146136.jpg (8745f884-b922-4066-b54c-5893066070c5) [2020-02-25 16:59:30.163008] W [fuse-bridge.c:779:fuse_truncate_cbk] 0-glusterfs-fuse: 236055074: TRUNCATE() /Image/655/146136.jpg => -1 (Transport endpoint is not connected) [2020-02-25 17:39:20.481714] E [rpc-clnt.c:185:call_bail] 5-nas-client-3: bailing out frame type(GlusterFS 3.3) op(XATTROP(33)) xid = 0x1028daa sent = 2020-02-25 17:09:17.597659. timeout = 1800 for 10.10.10.121:49154 [2020-02-25 17:39:20.481784] E [MSGID: 114031] [client-rpc-fops.c:1718:client3_3_xattrop_cbk] 5-nas-client-3: remote operation failed. Path: /Image/655/146136.jpg (8745f884-b922-4066-b54c-5893066070c5) [2020-02-25 18:09:20.783690] E [rpc-clnt.c:185:call_bail] 5-nas-client-3: bailing out frame type(GlusterFS 3.3) op(XATTROP(33)) xid = 0x102fefb sent = 2020-02-25 17:39:20.481881. timeout = 1800 for 10.10.10.121:49154 [2020-02-25 18:09:20.783739] E [MSGID: 114031] [client-rpc-fops.c:1718:client3_3_xattrop_cbk] 5-nas-client-3: remote operation failed. Path: /Image/655/146136.jpg (8745f884-b922-4066-b54c-5893066070c5) [2020-02-25 18:09:20.864368] W [fuse-bridge.c:779:fuse_truncate_cbk] 0-glusterfs-fuse: 236216079: TRUNCATE() /Image/655/146136.jpg => -1 (Transport endpoint is not connected) [2020-02-25 18:39:47.372960] C [rpc-clnt.c:449:rpc_clnt_fill_request_info] 5-nas-client-3: cannot lookup the saved frame corresponding to xid (16748653) [2020-02-25 18:39:47.373034] W [socket.c:1956:__socket_read_reply] 5-nas-client-3: notify for event MAP_XID failed for 10.10.10.121:49154 [2020-02-25 18:39:47.373060] I [MSGID: 114018] [client.c:2285:client_rpc_notify] 5-nas-client-3: disconnected from nas-client-3. Client process will keep trying to connect to glusterd until brick's port is available [2020-02-25 18:39:47.373332] E [rpc-clnt.c:350:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f3ab5f58ebb] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f3ab5d1dd9e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f3ab5d1debe] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x90)[0x7f3ab5d1f640] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x2a0)[0x7f3ab5d20130] ))))) 5-nas-client-3: forced unwinding frame type(GlusterFS 3.3) op(XATTROP(33)) called at 2020-02-25 18:19:23.681712 (xid=0x10396ec) [2020-02-25 18:39:47.373366] E [MSGID: 114031] [client-rpc-fops.c:1718:client3_3_xattrop_cbk] 5-nas-client-3: remote operation failed. Path: /Image/655/146136.jpg (8745f884-b922-4066-b54c-5893066070c5) [2020-02-25 18:39:47.373670] E [rpc-clnt.c:350:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f3ab5f58ebb] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f3ab5d1dd9e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f3ab5d1debe] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x90)[0x7f3ab5d1f640] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x2a0)[0x7f3ab5d20130] ))))) 5-nas-client-3: forced unwinding frame type(GlusterFS 3.3) op(FXATTROP(34)) called at 2020-02-25 18:39:47.372139 (xid=0x103e10f) [2020-02-25 18:39:47.373704] W [MSGID: 114031] [client-rpc-fops.c:1782:client3_3_fxattrop_cbk] 5-nas-client-3: remote operation failed [2020-02-25 18:39:47.373723] E [MSGID: 114031] [client-rpc-fops.c:1557:client3_3_finodelk_cbk] 5-nas-client-3: remote operation failed [Transport endpoint is not connected] [2020-02-25 18:39:47.373927] E [MSGID: 114031] [client-rpc-fops.c:1508:client3_3_inodelk_cbk] 5-nas-client-3: remote operation failed [Transport endpoint is not connected] [2020-02-25 18:39:47.373942] E [rpc-clnt.c:350:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f3ab5f58ebb] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f3ab5d1dd9e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f3ab5d1debe] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x90)[0x7f3ab5d1f640] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x2a0)[0x7f3ab5d20130] ))))) 5-nas-client-3: forced unwinding frame type(GlusterFS 3.3) op(COMPOUND(49)) called at 2020-02-25 18:39:47.372211 (xid=0x103e110) [2020-02-25 18:39:47.373981] W [MSGID: 114031] [client-rpc-fops.c:3138:client3_3_compound_cbk] 5-nas-client-3: remote operation failed [Transport endpoint is not connected] [2020-02-25 18:39:47.374029] W [MSGID: 114061] [client-common.c:438:client_pre_flush] 5-nas-client-3: (a79e63ee-5610-41bb-bd59-72bf7164354d) remote_fd is -1. EBADFD [File descriptor in bad state] [2020-02-25 18:39:47.374330] W [fuse-bridge.c:779:fuse_truncate_cbk] 0-glusterfs-fuse: 236374556: TRUNCATE() /Image/655/146136.jpg => -1 (Transport endpoint is not connected) [2020-02-25 18:39:47.380497] W [fuse-bridge.c:779:fuse_truncate_cbk] 0-glusterfs-fuse: 236419406: TRUNCATE() /Image/655/148418_org.jpg => -1 (Input/output error) [2020-02-25 18:39:47.464163] W [fuse-bridge.c:779:fuse_truncate_cbk] 0-glusterfs-fuse: 236419417: TRUNCATE() /Image/655/152411_org.jpg => -1 (Input/output error) [2020-02-25 18:39:47.466615] W [fuse-bridge.c:779:fuse_truncate_cbk] 0-glusterfs-fuse: 236419422: TRUNCATE() /Image/655/152411.jpg => -1 (Input/output error) [2020-02-25 18:39:47.812181] W [fuse-bridge.c:779:fuse_truncate_cbk] 0-glusterfs-fuse: 236419427: TRUNCATE() /Image/655/151868_org.jpg => -1 (Input/output error) [2020-02-25 18:39:52.409176] W [fuse-bridge.c:779:fuse_truncate_cbk] 0-glusterfs-fuse: 236419532: TRUNCATE() /Image/655/147289_org.jpg => -1 (Input/output error) [2020-02-25 18:39:58.037141] I [rpc-clnt.c:1986:rpc_clnt_reconfig] 5-nas-client-3: changing port to 49154 (from 0) [2020-02-25 18:39:58.040347] E [socket.c:2376:socket_connect_finish] 5-nas-client-3: connection to 10.10.10.121:49154 failed (Connection refused); disconnecting socket In addition, we recently asked about the issue of increased CPU load for this gluster server. (https://bugzilla.redhat.com/show_bug.cgi?id=1806244) Do you know why the brick went down? Please tell me if you need more information. Version-Release number of selected component (if applicable): 3.12.15 How reproducible: Steps to Reproduce: 1. 2. 3. Actual results: Expected results: Additional info:
This bug is moved to https://github.com/gluster/glusterfs/issues/883, and will be tracked there from now on. Visit GitHub issues URL for further details