Bug 1807738 - [GlusterFS-3.12.15] The brick went down after the FOP.
Summary: [GlusterFS-3.12.15] The brick went down after the FOP.
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: GlusterFS
Classification: Community
Component: core
Version: mainline
Hardware: x86_64
OS: Linux
unspecified
unspecified
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2020-02-27 05:45 UTC by JeongKinam
Modified: 2020-03-12 12:22 UTC (History)
2 users (show)

Fixed In Version:
Doc Type: ---
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-03-12 12:22:11 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments (Terms of Use)

Description JeongKinam 2020-02-27 05:45:15 UTC
Description of problem:


Suddenly, a brick went down during the operation and collected an error, but unfortunately no core dump was generated.


< GlusterFS(3.12.15) Volume Info >
Volume Name: nas
Type: Distributed-Replicate
Volume ID: 5a8cc386-85dc-4255-b287-352a499a28d5
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 3 = 6
Transport-type: tcp
Bricks:
Brick1: NAS01:/gluster1/nas
Brick2: NAS02:/gluster1/nas
Brick3: NAS03:/gluster1/nas
Brick4: NAS01:/gluster2/nas ===> down brick(10.10.10.121:49154)
Brick5: NAS02:/gluster2/nas
Brick6: NAS03:/gluster2/nas
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
cluster.self-heal-window-size: 16
cluster.background-self-heal-count: 64
cluster.shd-wait-qlength: 32768
performance.cache-size: 4GB
performance.write-behind-window-size: 50MB
disperse.background-heals: 16
disperse.heal-wait-qlength: 2048
disperse.self-heal-window-size: 16
disperse.shd-wait-qlength: 16384
features.cache-invalidation: off
features.cache-invalidation-timeout: 60
performance.stat-prefetch: on
performance.cache-invalidation: off
performance.md-cache-timeout: 1
network.inode-lru-limit: 200000
performance.nl-cache-limit: 10MB
performance.nl-cache-positive-entry: FALSE
performance.nl-cache: off
cluster.use-compound-fops: on
performance.parallel-readdir: on
cluster.lookup-optimize: on
client.event-threads: 32
server.event-threads: 16
performance.low-prio-threads: 64
performance.normal-prio-threads: 64
performance.high-prio-threads: 64
performance.least-prio-threads: 64
performance.io-thread-count: 64
cluster.eager-lock: on


< Down Brick Log - 10.10.10.121:49154 >
[2020-02-25 18:39:47.375248] I [MSGID: 115008] [server-resolve.c:541:server_resolve_fd] 0-: fd not found in context [Bad file descriptor]
[2020-02-25 18:39:47.375658] E [MSGID: 115073] [server-rpc-fops.c:1833:server_fxattrop_cbk] 0-nas-server: 17031439: FXATTROP 2 (a79e63ee-5610-41bb-bd59-72bf7164354d), client: CLIENT01-35025-2020/01/08-11:41:03:97960-nas-client-3-5-3, error-xlator: - [Bad file descriptor]
[2020-02-25 18:39:47.375701] E [rpcsvc.c:1364:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x103e10f, Program: GlusterFS 3.3, ProgVers: 330, Proc: 34) to rpc-transport (tcp.nas-server)
[2020-02-25 18:39:47.375750] E [server.c:195:server_submit_reply] (-->/usr/lib64/glusterfs/3.12.15/xlator/protocol/server.so(+0x1bfa6) [0x7f2b6f552fa6] -->/usr/lib64/glusterfs/3.12.15/xlator/protocol/server.so(+0x1bba3) [0x7f2b6f552ba3] -->/usr/lib64/glusterfs/3.12.15/xlator/protocol/server.so(+0x94a6) [0x7f2b6f5404a6] ) 0-: Reply submission failed
[2020-02-25 18:39:47.376339] I [MSGID: 115008] [server-resolve.c:541:server_resolve_fd] 0-: fd not found in context [Bad file descriptor]
[2020-02-25 18:39:47.376365] E [MSGID: 115090] [server-rpc-fops.c:2171:server_compound_cbk] 0-nas-server: 17031440: COMPOUND2 (a79e63ee-5610-41bb-bd59-72bf7164354d), client: CLIENT01-35025-2020/01/08-11:41:03:97960-nas-client-3-5-3, error-xlator: - [Bad file descriptor]
pending frames:
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(13)
frame : type(0) op(52)
frame : type(0) op(13)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(13)
frame : type(0) op(52)
frame : type(0) op(13)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(13)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(13)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(13)
frame : type(0) op(13)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(13)
frame : type(0) op(13)
frame : type(0) op(13)
frame : type(0) op(13)
frame : type(0) op(52)
frame : type(0) op(13)
frame : type(0) op(13)
frame : type(0) op(13)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(13)
frame : type(0) op(13)
frame : type(0) op(13)
frame : type(0) op(13)
frame : type(0) op(52)
frame : type(0) op(13)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(13)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(11)
frame : type(0) op(52)
frame : type(0) op(33)
frame : type(0) op(11)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(13)
frame : type(0) op(52)
frame : type(0) op(52)
frame : type(0) op(11)
frame : type(0) op(52)
frame : type(0) op(33)
frame : type(0) op(11)
frame : type(0) op(52)
frame : type(0) op(11)
frame : type(0) op(33)
frame : type(0) op(52)
frame : type(0) op(33)
frame : type(0) op(52)
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash:
2020-02-25 18:39:47
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.12.15
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xa0)[0x7f2b842db4e0]
/lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7f2b842e5414]
/lib64/libc.so.6(+0x36280)[0x7f2b8293b280]
/usr/lib64/glusterfs/3.12.15/xlator/protocol/server.so(+0x242eb)[0x7f2b6f55b2eb]
/usr/lib64/glusterfs/3.12.15/xlator/protocol/server.so(+0x2674d)[0x7f2b6f55d74d]
/usr/lib64/glusterfs/3.12.15/xlator/protocol/server.so(+0xc8e9)[0x7f2b6f5438e9]
/usr/lib64/glusterfs/3.12.15/xlator/protocol/server.so(+0xc98d)[0x7f2b6f54398d]
/usr/lib64/glusterfs/3.12.15/xlator/protocol/server.so(+0xd3a5)[0x7f2b6f5443a5]
/usr/lib64/glusterfs/3.12.15/xlator/protocol/server.so(+0xc9ce)[0x7f2b6f5439ce]
/usr/lib64/glusterfs/3.12.15/xlator/protocol/server.so(+0xd23e)[0x7f2b6f54423e]
/usr/lib64/glusterfs/3.12.15/xlator/protocol/server.so(+0xd356)[0x7f2b6f544356]
/usr/lib64/glusterfs/3.12.15/xlator/protocol/server.so(+0xc9ae)[0x7f2b6f5439ae]
/usr/lib64/glusterfs/3.12.15/xlator/protocol/server.so(+0xd464)[0x7f2b6f544464]
/usr/lib64/glusterfs/3.12.15/xlator/protocol/server.so(+0x27195)[0x7f2b6f55e195]
/lib64/libgfrpc.so.0(rpcsvc_request_handler+0x96)[0x7f2b8409d246]
/lib64/libpthread.so.0(+0x7dd5)[0x7f2b8313add5]
/lib64/libc.so.6(clone+0x6d)[0x7f2b82a02ead]
---------


< CLIENT01 gluster LOG >
[2020-02-25 14:18:28.617641] E [rpc-clnt.c:185:call_bail] 5-nas-client-3: bailing out frame type(GlusterFS 3.3) op(XATTROP(33)) xid = 0xff906d sent = 2020-02-25 13:48:27.301196. timeout = 1800 for 10.10.10.121:49154
[2020-02-25 14:18:28.627224] E [MSGID: 114031] [client-rpc-fops.c:1718:client3_3_xattrop_cbk] 5-nas-client-3: remote operation failed. Path: /Image/655/146136.jpg (8745f884-b922-4066-b54c-5893066070c5)
[2020-02-25 14:48:28.961143] E [rpc-clnt.c:185:call_bail] 5-nas-client-3: bailing out frame type(GlusterFS 3.3) op(XATTROP(33)) xid = 0x10001f7 sent = 2020-02-25 14:18:28.627356. timeout = 1800 for 10.10.10.121:49154
[2020-02-25 14:48:28.961402] E [MSGID: 114031] [client-rpc-fops.c:1718:client3_3_xattrop_cbk] 5-nas-client-3: remote operation failed. Path: /Image/655/146136.jpg (8745f884-b922-4066-b54c-5893066070c5)
[2020-02-25 14:48:28.970718] W [fuse-bridge.c:779:fuse_truncate_cbk] 0-glusterfs-fuse: 235752053: TRUNCATE() /Image/655/146136.jpg => -1 (Transport endpoint is not connected)
[2020-02-25 15:19:59.238515] E [rpc-clnt.c:185:call_bail] 5-nas-client-3: bailing out frame type(GlusterFS 3.3) op(XATTROP(33)) xid = 0x1007af2 sent = 2020-02-25 14:49:55.891752. timeout = 1800 for 10.10.10.121:49154
[2020-02-25 15:19:59.238736] E [MSGID: 114031] [client-rpc-fops.c:1718:client3_3_xattrop_cbk] 5-nas-client-3: remote operation failed. Path: /Image/655/146136.jpg (8745f884-b922-4066-b54c-5893066070c5)
[2020-02-25 15:49:59.570378] E [rpc-clnt.c:185:call_bail] 5-nas-client-3: bailing out frame type(GlusterFS 3.3) op(XATTROP(33)) xid = 0x100ef9f sent = 2020-02-25 15:19:59.238815. timeout = 1800 for 10.10.10.121:49154
[2020-02-25 15:49:59.570438] E [MSGID: 114031] [client-rpc-fops.c:1718:client3_3_xattrop_cbk] 5-nas-client-3: remote operation failed. Path: /Image/655/146136.jpg (8745f884-b922-4066-b54c-5893066070c5)
[2020-02-25 15:49:59.615705] W [fuse-bridge.c:779:fuse_truncate_cbk] 0-glusterfs-fuse: 235897960: TRUNCATE() /Image/655/146136.jpg => -1 (Transport endpoint is not connected)
[2020-02-25 16:29:29.884503] E [rpc-clnt.c:185:call_bail] 5-nas-client-3: bailing out frame type(GlusterFS 3.3) op(XATTROP(33)) xid = 0x1017eda sent = 2020-02-25 15:59:22.576432. timeout = 1800 for 10.10.10.121:49154
[2020-02-25 16:29:29.884547] E [MSGID: 114031] [client-rpc-fops.c:1718:client3_3_xattrop_cbk] 5-nas-client-3: remote operation failed. Path: /Image/655/146136.jpg (8745f884-b922-4066-b54c-5893066070c5)
[2020-02-25 16:59:30.151714] E [rpc-clnt.c:185:call_bail] 5-nas-client-3: bailing out frame type(GlusterFS 3.3) op(XATTROP(33)) xid = 0x101f311 sent = 2020-02-25 16:29:29.884632. timeout = 1800 for 10.10.10.121:49154
[2020-02-25 16:59:30.151759] E [MSGID: 114031] [client-rpc-fops.c:1718:client3_3_xattrop_cbk] 5-nas-client-3: remote operation failed. Path: /Image/655/146136.jpg (8745f884-b922-4066-b54c-5893066070c5)
[2020-02-25 16:59:30.163008] W [fuse-bridge.c:779:fuse_truncate_cbk] 0-glusterfs-fuse: 236055074: TRUNCATE() /Image/655/146136.jpg => -1 (Transport endpoint is not connected)
[2020-02-25 17:39:20.481714] E [rpc-clnt.c:185:call_bail] 5-nas-client-3: bailing out frame type(GlusterFS 3.3) op(XATTROP(33)) xid = 0x1028daa sent = 2020-02-25 17:09:17.597659. timeout = 1800 for 10.10.10.121:49154
[2020-02-25 17:39:20.481784] E [MSGID: 114031] [client-rpc-fops.c:1718:client3_3_xattrop_cbk] 5-nas-client-3: remote operation failed. Path: /Image/655/146136.jpg (8745f884-b922-4066-b54c-5893066070c5)
[2020-02-25 18:09:20.783690] E [rpc-clnt.c:185:call_bail] 5-nas-client-3: bailing out frame type(GlusterFS 3.3) op(XATTROP(33)) xid = 0x102fefb sent = 2020-02-25 17:39:20.481881. timeout = 1800 for 10.10.10.121:49154
[2020-02-25 18:09:20.783739] E [MSGID: 114031] [client-rpc-fops.c:1718:client3_3_xattrop_cbk] 5-nas-client-3: remote operation failed. Path: /Image/655/146136.jpg (8745f884-b922-4066-b54c-5893066070c5)
[2020-02-25 18:09:20.864368] W [fuse-bridge.c:779:fuse_truncate_cbk] 0-glusterfs-fuse: 236216079: TRUNCATE() /Image/655/146136.jpg => -1 (Transport endpoint is not connected)
[2020-02-25 18:39:47.372960] C [rpc-clnt.c:449:rpc_clnt_fill_request_info] 5-nas-client-3: cannot lookup the saved frame corresponding to xid (16748653)
[2020-02-25 18:39:47.373034] W [socket.c:1956:__socket_read_reply] 5-nas-client-3: notify for event MAP_XID failed for 10.10.10.121:49154
[2020-02-25 18:39:47.373060] I [MSGID: 114018] [client.c:2285:client_rpc_notify] 5-nas-client-3: disconnected from nas-client-3. Client process will keep trying to connect to glusterd until brick's port is available
[2020-02-25 18:39:47.373332] E [rpc-clnt.c:350:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f3ab5f58ebb] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f3ab5d1dd9e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f3ab5d1debe] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x90)[0x7f3ab5d1f640] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x2a0)[0x7f3ab5d20130] ))))) 5-nas-client-3: forced unwinding frame type(GlusterFS 3.3) op(XATTROP(33)) called at 2020-02-25 18:19:23.681712 (xid=0x10396ec)
[2020-02-25 18:39:47.373366] E [MSGID: 114031] [client-rpc-fops.c:1718:client3_3_xattrop_cbk] 5-nas-client-3: remote operation failed. Path: /Image/655/146136.jpg (8745f884-b922-4066-b54c-5893066070c5)
[2020-02-25 18:39:47.373670] E [rpc-clnt.c:350:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f3ab5f58ebb] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f3ab5d1dd9e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f3ab5d1debe] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x90)[0x7f3ab5d1f640] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x2a0)[0x7f3ab5d20130] ))))) 5-nas-client-3: forced unwinding frame type(GlusterFS 3.3) op(FXATTROP(34)) called at 2020-02-25 18:39:47.372139 (xid=0x103e10f)
[2020-02-25 18:39:47.373704] W [MSGID: 114031] [client-rpc-fops.c:1782:client3_3_fxattrop_cbk] 5-nas-client-3: remote operation failed
[2020-02-25 18:39:47.373723] E [MSGID: 114031] [client-rpc-fops.c:1557:client3_3_finodelk_cbk] 5-nas-client-3: remote operation failed [Transport endpoint is not connected]
[2020-02-25 18:39:47.373927] E [MSGID: 114031] [client-rpc-fops.c:1508:client3_3_inodelk_cbk] 5-nas-client-3: remote operation failed [Transport endpoint is not connected]
[2020-02-25 18:39:47.373942] E [rpc-clnt.c:350:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x13b)[0x7f3ab5f58ebb] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f3ab5d1dd9e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f3ab5d1debe] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x90)[0x7f3ab5d1f640] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x2a0)[0x7f3ab5d20130] ))))) 5-nas-client-3: forced unwinding frame type(GlusterFS 3.3) op(COMPOUND(49)) called at 2020-02-25 18:39:47.372211 (xid=0x103e110)
[2020-02-25 18:39:47.373981] W [MSGID: 114031] [client-rpc-fops.c:3138:client3_3_compound_cbk] 5-nas-client-3: remote operation failed [Transport endpoint is not connected]
[2020-02-25 18:39:47.374029] W [MSGID: 114061] [client-common.c:438:client_pre_flush] 5-nas-client-3:  (a79e63ee-5610-41bb-bd59-72bf7164354d) remote_fd is -1. EBADFD [File descriptor in bad state]
[2020-02-25 18:39:47.374330] W [fuse-bridge.c:779:fuse_truncate_cbk] 0-glusterfs-fuse: 236374556: TRUNCATE() /Image/655/146136.jpg => -1 (Transport endpoint is not connected)
[2020-02-25 18:39:47.380497] W [fuse-bridge.c:779:fuse_truncate_cbk] 0-glusterfs-fuse: 236419406: TRUNCATE() /Image/655/148418_org.jpg => -1 (Input/output error)
[2020-02-25 18:39:47.464163] W [fuse-bridge.c:779:fuse_truncate_cbk] 0-glusterfs-fuse: 236419417: TRUNCATE() /Image/655/152411_org.jpg => -1 (Input/output error)
[2020-02-25 18:39:47.466615] W [fuse-bridge.c:779:fuse_truncate_cbk] 0-glusterfs-fuse: 236419422: TRUNCATE() /Image/655/152411.jpg => -1 (Input/output error)
[2020-02-25 18:39:47.812181] W [fuse-bridge.c:779:fuse_truncate_cbk] 0-glusterfs-fuse: 236419427: TRUNCATE() /Image/655/151868_org.jpg => -1 (Input/output error)
[2020-02-25 18:39:52.409176] W [fuse-bridge.c:779:fuse_truncate_cbk] 0-glusterfs-fuse: 236419532: TRUNCATE() /Image/655/147289_org.jpg => -1 (Input/output error)
[2020-02-25 18:39:58.037141] I [rpc-clnt.c:1986:rpc_clnt_reconfig] 5-nas-client-3: changing port to 49154 (from 0)
[2020-02-25 18:39:58.040347] E [socket.c:2376:socket_connect_finish] 5-nas-client-3: connection to 10.10.10.121:49154 failed (Connection refused); disconnecting socket



In addition, we recently asked about the issue of increased CPU load for this gluster server.
(https://bugzilla.redhat.com/show_bug.cgi?id=1806244)

Do you know why the brick went down?

Please tell me if you need more information.



Version-Release number of selected component (if applicable): 3.12.15


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Worker Ant 2020-03-12 12:22:11 UTC
This bug is moved to https://github.com/gluster/glusterfs/issues/883, and will be tracked there from now on. Visit GitHub issues URL for further details


Note You need to log in before you can comment on or make changes to this bug.