Bug 1807738

Summary: [GlusterFS-3.12.15] The brick went down after the FOP.
Product: [Community] GlusterFS Reporter: JeongKinam <knjeong>
Component: coreAssignee: bugs <bugs>
Status: CLOSED UPSTREAM QA Contact:
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: mainlineCC: bugs, pasik
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: ---
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-03-12 12:22:11 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description JeongKinam 2020-02-27 05:45:15 UTC
Description of problem:


Suddenly, a brick went down during the operation and collected an error, but unfortunately no core dump was generated.


< GlusterFS(3.12.15) Volume Info >
Volume Name: nas
Type: Distributed-Replicate
Volume ID: 5a8cc386-85dc-4255-b287-352a499a28d5
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 3 = 6
Transport-type: tcp
Bricks:
Brick1: NAS01:/gluster1/nas
Brick2: NAS02:/gluster1/nas
Brick3: NAS03:/gluster1/nas
Brick4: NAS01:/gluster2/nas ===> down brick(10.10.10.121:49154)
Brick5: NAS02:/gluster2/nas
Brick6: NAS03:/gluster2/nas
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
cluster.self-heal-window-size: 16
cluster.background-self-heal-count: 64
cluster.shd-wait-qlength: 32768
performance.cache-size: 4GB
performance.write-behind-window-size: 50MB
disperse.background-heals: 16
disperse.heal-wait-qlength: 2048
disperse.self-heal-window-size: 16
disperse.shd-wait-qlength: 16384
features.cache-invalidation: off
features.cache-invalidation-timeout: 60
performance.stat-prefetch: on
performance.cache-invalidation: off
performance.md-cache-timeout: 1
network.inode-lru-limit: 200000
performance.nl-cache-limit: 10MB
performance.nl-cache-positive-entry: FALSE
performance.nl-cache: off
cluster.use-compound-fops: on
performance.parallel-readdir: on
cluster.lookup-optimize: on
client.event-threads: 32
server.event-threads: 16
performance.low-prio-threads: 64
performance.normal-prio-threads: 64
performance.high-prio-threads: 64
performance.least-prio-threads: 64
performance.io-thread-count: 64
cluster.eager-lock: on


< Down Brick Log - 10.10.10.121:49154 >
[2020-02-25 18:39:47.375248] I [MSGID: 115008] [server-resolve.c:541:server_resolve_fd] 0-: fd not found in context [Bad file descriptor]
[2020-02-25 18:39:47.375658] E [MSGID: 115073] [server-rpc-fops.c:1833:server_fxattrop_cbk] 0-nas-server: 17031439: FXATTROP 2 (a79e63ee-5610-41bb-bd59-72bf7164354d), client: CLIENT01-35025-2020/01/08-11:41:03:97960-nas-client-3-5-3, error-xlator: - [Bad file descriptor]
[2020-02-25 18:39:47.375701] E [rpcsvc.c:1364:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x103e10f, Program: GlusterFS 3.3, ProgVers: 330, Proc: 34) to rpc-transport (tcp.nas-server)
[2020-02-25 18:39:47.375750] E [server.c:195:server_submit_reply] (-->/usr/lib64/glusterfs/3.12.15/xlator/protocol/server.so(+0x1bfa6) [0x7f2b6f552fa6] -->/usr/lib64/glusterfs/3.12.15/xlator/protocol/server.so(+0x1bba3) [0x7f2b6f552ba3] -->/usr/lib64/glusterfs/3.12.15/xlator/protocol/server.so(+0x94a6) [0x7f2b6f5404a6] ) 0-: Reply submission failed
[2020-02-25 18:39:47.376339] I [MSGID: 115008] [server-resolve.c:541:server_resolve_fd] 0-: fd not found in context [Bad file descriptor]
[2020-02-25 18:39:47.376365] E [MSGID: 115090] [server-rpc-fops.c:2171:server_compound_cbk] 0-nas-server: 17031440: COMPOUND2 (a79e63ee-5610-41bb-bd59-72bf7164354d), client: CLIENT01-35025-2020/01/08-11:41:03:97960-nas-client-3-5-3, error-xlator: - [Bad file descriptor]
pending frames:
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(13)
frame : type(0) op(52)
frame : type(0) op(13)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(13)
frame : type(0) op(52)
frame : type(0) op(13)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(13)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(13)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(13)
frame : type(0) op(13)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(13)
frame : type(0) op(13)
frame : type(0) op(13)
frame : type(0) op(13)
frame : type(0) op(52)
frame : type(0) op(13)
frame : type(0) op(13)
frame : type(0) op(13)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(13)
frame : type(0) op(13)
frame : type(0) op(13)
frame : type(0) op(13)
frame : type(0) op(52)
frame : type(0) op(13)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(13)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(11)
frame : type(0) op(52)
frame : type(0) op(33)
frame : type(0) op(11)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(13)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(11)
frame : type(0) op(52)
frame : type(0) op(33)
frame : type(0) op(11)
frame : type(0) op(52)
frame : type(0) op(11)
frame : type(0) op(33)
frame : type(0) op(52)
frame : type(0) op(33)
frame : type(0) op(52)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash:
2020-02-25 18:39:47
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.12.15
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xa0)[0x7f2b842db4e0]
/lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7f2b842e5414]
/lib64/libc.so.6(+0x36280)[0x7f2b8293b280]
/usr/lib64/glusterfs/3.12.15/xlator/protocol/server.so(+0x242eb)[0x7f2b6f55b2eb]
/usr/lib64/glusterfs/3.12.15/xlator/protocol/server.so(+0x2674d)[0x7f2b6f55d74d]
/usr/lib64/glusterfs/3.12.15/xlator/protocol/server.so(+0xc8e9)[0x7f2b6f5438e9]
/usr/lib64/glusterfs/3.12.15/xlator/protocol/server.so(+0xc98d)[0x7f2b6f54398d]
/usr/lib64/glusterfs/3.12.15/xlator/protocol/server.so(+0xd3a5)[0x7f2b6f5443a5]
/usr/lib64/glusterfs/3.12.15/xlator/protocol/server.so(+0xc9ce)[0x7f2b6f5439ce]
/usr/lib64/glusterfs/3.12.15/xlator/protocol/server.so(+0xd23e)[0x7f2b6f54423e]
/usr/lib64/glusterfs/3.12.15/xlator/protocol/server.so(+0xd356)[0x7f2b6f544356]
/usr/lib64/glusterfs/3.12.15/xlator/protocol/server.so(+0xc9ae)[0x7f2b6f5439ae]
/usr/lib64/glusterfs/3.12.15/xlator/protocol/server.so(+0xd464)[0x7f2b6f544464]
/usr/lib64/glusterfs/3.12.15/xlator/protocol/server.so(+0x27195)[0x7f2b6f55e195]
/lib64/libgfrpc.so.0(rpcsvc_request_handler+0x96)[0x7f2b8409d246]
/lib64/libpthread.so.0(+0x7dd5)[0x7f2b8313add5]
/lib64/libc.so.6(clone+0x6d)[0x7f2b82a02ead]
---------


< CLIENT01 gluster LOG >
[2020-02-25 14:18:28.617641] E [rpc-clnt.c:185:call_bail] 5-nas-client-3: bailing out frame type(GlusterFS 3.3) op(XATTROP(33)) xid = 0xff906d sent = 2020-02-25 13:48:27.301196. timeout = 1800 for 10.10.10.121:49154
[2020-02-25 14:18:28.627224] E [MSGID: 114031] [client-rpc-fops.c:1718:client3_3_xattrop_cbk] 5-nas-client-3: remote operation failed. Path: /Image/655/146136.jpg (8745f884-b922-4066-b54c-5893066070c5)
[2020-02-25 14:48:28.961143] E [rpc-clnt.c:185:call_bail] 5-nas-client-3: bailing out frame type(GlusterFS 3.3) op(XATTROP(33)) xid = 0x10001f7 sent = 2020-02-25 14:18:28.627356. timeout = 1800 for 10.10.10.121:49154
[2020-02-25 14:48:28.961402] E [MSGID: 114031] [client-rpc-fops.c:1718:client3_3_xattrop_cbk] 5-nas-client-3: remote operation failed. Path: /Image/655/146136.jpg (8745f884-b922-4066-b54c-5893066070c5)
[2020-02-25 14:48:28.970718] W [fuse-bridge.c:779:fuse_truncate_cbk] 0-glusterfs-fuse: 235752053: TRUNCATE() /Image/655/146136.jpg => -1 (Transport endpoint is not connected)
[2020-02-25 15:19:59.238515] E [rpc-clnt.c:185:call_bail] 5-nas-client-3: bailing out frame type(GlusterFS 3.3) op(XATTROP(33)) xid = 0x1007af2 sent = 2020-02-25 14:49:55.891752. timeout = 1800 for 10.10.10.121:49154
[2020-02-25 15:19:59.238736] E [MSGID: 114031] [client-rpc-fops.c:1718:client3_3_xattrop_cbk] 5-nas-client-3: remote operation failed. Path: /Image/655/146136.jpg (8745f884-b922-4066-b54c-5893066070c5)
[2020-02-25 15:49:59.570378] E [rpc-clnt.c:185:call_bail] 5-nas-client-3: bailing out frame type(GlusterFS 3.3) op(XATTROP(33)) xid = 0x100ef9f sent = 2020-02-25 15:19:59.238815. timeout = 1800 for 10.10.10.121:49154
[2020-02-25 15:49:59.570438] E [MSGID: 114031] [client-rpc-fops.c:1718:client3_3_xattrop_cbk] 5-nas-client-3: remote operation failed. Path: /Image/655/146136.jpg (8745f884-b922-4066-b54c-5893066070c5)
[2020-02-25 15:49:59.615705] W [fuse-bridge.c:779:fuse_truncate_cbk] 0-glusterfs-fuse: 235897960: TRUNCATE() /Image/655/146136.jpg => -1 (Transport endpoint is not connected)
[2020-02-25 16:29:29.884503] E [rpc-clnt.c:185:call_bail] 5-nas-client-3: bailing out frame type(GlusterFS 3.3) op(XATTROP(33)) xid = 0x1017eda sent = 2020-02-25 15:59:22.576432. timeout = 1800 for 10.10.10.121:49154
[2020-02-25 16:29:29.884547] E [MSGID: 114031] [client-rpc-fops.c:1718:client3_3_xattrop_cbk] 5-nas-client-3: remote operation failed. Path: /Image/655/146136.jpg (8745f884-b922-4066-b54c-5893066070c5)
[2020-02-25 16:59:30.151714] E [rpc-clnt.c:185:call_bail] 5-nas-client-3: bailing out frame type(GlusterFS 3.3) op(XATTROP(33)) xid = 0x101f311 sent = 2020-02-25 16:29:29.884632. timeout = 1800 for 10.10.10.121:49154
[2020-02-25 16:59:30.151759] E [MSGID: 114031] [client-rpc-fops.c:1718:client3_3_xattrop_cbk] 5-nas-client-3: remote operation failed. Path: /Image/655/146136.jpg (8745f884-b922-4066-b54c-5893066070c5)
[2020-02-25 16:59:30.163008] W [fuse-bridge.c:779:fuse_truncate_cbk] 0-glusterfs-fuse: 236055074: TRUNCATE() /Image/655/146136.jpg => -1 (Transport endpoint is not connected)
[2020-02-25 17:39:20.481714] E [rpc-clnt.c:185:call_bail] 5-nas-client-3: bailing out frame type(GlusterFS 3.3) op(XATTROP(33)) xid = 0x1028daa sent = 2020-02-25 17:09:17.597659. timeout = 1800 for 10.10.10.121:49154
[2020-02-25 17:39:20.481784] E [MSGID: 114031] [client-rpc-fops.c:1718:client3_3_xattrop_cbk] 5-nas-client-3: remote operation failed. Path: /Image/655/146136.jpg (8745f884-b922-4066-b54c-5893066070c5)
[2020-02-25 18:09:20.783690] E [rpc-clnt.c:185:call_bail] 5-nas-client-3: bailing out frame type(GlusterFS 3.3) op(XATTROP(33)) xid = 0x102fefb sent = 2020-02-25 17:39:20.481881. timeout = 1800 for 10.10.10.121:49154
[2020-02-25 18:09:20.783739] E [MSGID: 114031] [client-rpc-fops.c:1718:client3_3_xattrop_cbk] 5-nas-client-3: remote operation failed. Path: /Image/655/146136.jpg (8745f884-b922-4066-b54c-5893066070c5)
[2020-02-25 18:09:20.864368] W [fuse-bridge.c:779:fuse_truncate_cbk] 0-glusterfs-fuse: 236216079: TRUNCATE() /Image/655/146136.jpg => -1 (Transport endpoint is not connected)
[2020-02-25 18:39:47.372960] C [rpc-clnt.c:449:rpc_clnt_fill_request_info] 5-nas-client-3: cannot lookup the saved frame corresponding to xid (16748653)
[2020-02-25 18:39:47.373034] W [socket.c:1956:__socket_read_reply] 5-nas-client-3: notify for event MAP_XID failed for 10.10.10.121:49154
[2020-02-25 18:39:47.373060] I [MSGID: 114018] [client.c:2285:client_rpc_notify] 5-nas-client-3: disconnected from nas-client-3. Client process will keep trying to connect to glusterd until brick's port is available
[2020-02-25 18:39:47.373332] E [rpc-clnt.c:350:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f3ab5f58ebb] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f3ab5d1dd9e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f3ab5d1debe] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x90)[0x7f3ab5d1f640] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x2a0)[0x7f3ab5d20130] ))))) 5-nas-client-3: forced unwinding frame type(GlusterFS 3.3) op(XATTROP(33)) called at 2020-02-25 18:19:23.681712 (xid=0x10396ec)
[2020-02-25 18:39:47.373366] E [MSGID: 114031] [client-rpc-fops.c:1718:client3_3_xattrop_cbk] 5-nas-client-3: remote operation failed. Path: /Image/655/146136.jpg (8745f884-b922-4066-b54c-5893066070c5)
[2020-02-25 18:39:47.373670] E [rpc-clnt.c:350:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f3ab5f58ebb] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f3ab5d1dd9e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f3ab5d1debe] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x90)[0x7f3ab5d1f640] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x2a0)[0x7f3ab5d20130] ))))) 5-nas-client-3: forced unwinding frame type(GlusterFS 3.3) op(FXATTROP(34)) called at 2020-02-25 18:39:47.372139 (xid=0x103e10f)
[2020-02-25 18:39:47.373704] W [MSGID: 114031] [client-rpc-fops.c:1782:client3_3_fxattrop_cbk] 5-nas-client-3: remote operation failed
[2020-02-25 18:39:47.373723] E [MSGID: 114031] [client-rpc-fops.c:1557:client3_3_finodelk_cbk] 5-nas-client-3: remote operation failed [Transport endpoint is not connected]
[2020-02-25 18:39:47.373927] E [MSGID: 114031] [client-rpc-fops.c:1508:client3_3_inodelk_cbk] 5-nas-client-3: remote operation failed [Transport endpoint is not connected]
[2020-02-25 18:39:47.373942] E [rpc-clnt.c:350:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f3ab5f58ebb] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f3ab5d1dd9e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f3ab5d1debe] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x90)[0x7f3ab5d1f640] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x2a0)[0x7f3ab5d20130] ))))) 5-nas-client-3: forced unwinding frame type(GlusterFS 3.3) op(COMPOUND(49)) called at 2020-02-25 18:39:47.372211 (xid=0x103e110)
[2020-02-25 18:39:47.373981] W [MSGID: 114031] [client-rpc-fops.c:3138:client3_3_compound_cbk] 5-nas-client-3: remote operation failed [Transport endpoint is not connected]
[2020-02-25 18:39:47.374029] W [MSGID: 114061] [client-common.c:438:client_pre_flush] 5-nas-client-3:  (a79e63ee-5610-41bb-bd59-72bf7164354d) remote_fd is -1. EBADFD [File descriptor in bad state]
[2020-02-25 18:39:47.374330] W [fuse-bridge.c:779:fuse_truncate_cbk] 0-glusterfs-fuse: 236374556: TRUNCATE() /Image/655/146136.jpg => -1 (Transport endpoint is not connected)
[2020-02-25 18:39:47.380497] W [fuse-bridge.c:779:fuse_truncate_cbk] 0-glusterfs-fuse: 236419406: TRUNCATE() /Image/655/148418_org.jpg => -1 (Input/output error)
[2020-02-25 18:39:47.464163] W [fuse-bridge.c:779:fuse_truncate_cbk] 0-glusterfs-fuse: 236419417: TRUNCATE() /Image/655/152411_org.jpg => -1 (Input/output error)
[2020-02-25 18:39:47.466615] W [fuse-bridge.c:779:fuse_truncate_cbk] 0-glusterfs-fuse: 236419422: TRUNCATE() /Image/655/152411.jpg => -1 (Input/output error)
[2020-02-25 18:39:47.812181] W [fuse-bridge.c:779:fuse_truncate_cbk] 0-glusterfs-fuse: 236419427: TRUNCATE() /Image/655/151868_org.jpg => -1 (Input/output error)
[2020-02-25 18:39:52.409176] W [fuse-bridge.c:779:fuse_truncate_cbk] 0-glusterfs-fuse: 236419532: TRUNCATE() /Image/655/147289_org.jpg => -1 (Input/output error)
[2020-02-25 18:39:58.037141] I [rpc-clnt.c:1986:rpc_clnt_reconfig] 5-nas-client-3: changing port to 49154 (from 0)
[2020-02-25 18:39:58.040347] E [socket.c:2376:socket_connect_finish] 5-nas-client-3: connection to 10.10.10.121:49154 failed (Connection refused); disconnecting socket



In addition, we recently asked about the issue of increased CPU load for this gluster server.
(https://bugzilla.redhat.com/show_bug.cgi?id=1806244)

Do you know why the brick went down?

Please tell me if you need more information.



Version-Release number of selected component (if applicable): 3.12.15


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Worker Ant 2020-03-12 12:22:11 UTC
This bug is moved to https://github.com/gluster/glusterfs/issues/883, and will be tracked there from now on. Visit GitHub issues URL for further details