Description of problem:
========================
I created my systemic setup on Friday as part of regression cycle testing. Over the weekend I noticed that two of the bricks in a 4x2 volume had crashed (each belonging to a different dht subvol).

Following is the volume configuration:

bash-4.3$ ssh root@10.70.35.20
root@10.70.35.20's password:
Last login: Mon Jan  2 11:55:17 2017 from dhcp35-226.lab.eng.blr.redhat.com
[root@dhcp35-20 ~]# gluster v status
Status of volume: sysvol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.20:/rhs/brick1/sysvol        49154     0          Y       2404
Brick 10.70.37.86:/rhs/brick1/sysvol        49154     0          Y       26789
Brick 10.70.35.156:/rhs/brick1/sysvol       N/A       N/A        N       N/A
Brick 10.70.37.154:/rhs/brick1/sysvol       49154     0          Y       2192
Brick 10.70.35.20:/rhs/brick2/sysvol        49155     0          Y       2424
Brick 10.70.37.86:/rhs/brick2/sysvol        N/A       N/A        N       N/A
Brick 10.70.35.156:/rhs/brick2/sysvol       49155     0          Y       26793
Brick 10.70.37.154:/rhs/brick2/sysvol       49155     0          Y       2212
Snapshot Daemon on localhost                49156     0          Y       2449
Self-heal Daemon on localhost               N/A       N/A        Y       3131
Quota Daemon on localhost                   N/A       N/A        Y       3139
Snapshot Daemon on 10.70.37.86              49156     0          Y       26832
Self-heal Daemon on 10.70.37.86             N/A       N/A        Y       2187
Quota Daemon on 10.70.37.86                 N/A       N/A        Y       2195
Snapshot Daemon on 10.70.35.156             49156     0          Y       26816
Self-heal Daemon on 10.70.35.156            N/A       N/A        Y       1376
Quota Daemon on 10.70.35.156                N/A       N/A        Y       1384
Snapshot Daemon on 10.70.37.154             49156     0          Y       2235
Self-heal Daemon on 10.70.37.154            N/A       N/A        Y       9321
Quota Daemon on 10.70.37.154                N/A       N/A        Y       9329

Task Status of Volume sysvol
------------------------------------------------------------------------------
There are no active volume tasks

[root@dhcp35-20 ~]# gluster v info

Volume Name: sysvol
Type: Distributed-Replicate
Volume ID: 4efd4f77-85c7-4eb9-b958-6769d31d84c8
Status: Started
Snapshot Count: 0
Number of Bricks: 4 x 2 = 8
Transport-type: tcp
Bricks:
Brick1: 10.70.35.20:/rhs/brick1/sysvol
Brick2: 10.70.37.86:/rhs/brick1/sysvol
Brick3: 10.70.35.156:/rhs/brick1/sysvol
Brick4: 10.70.37.154:/rhs/brick1/sysvol
Brick5: 10.70.35.20:/rhs/brick2/sysvol
Brick6: 10.70.37.86:/rhs/brick2/sysvol
Brick7: 10.70.35.156:/rhs/brick2/sysvol
Brick8: 10.70.37.154:/rhs/brick2/sysvol
Options Reconfigured:
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
performance.md-cache-timeout: 600
performance.cache-invalidation: on
performance.stat-prefetch: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
features.quota-deem-statfs: on
features.inode-quota: on
features.quota: on
features.uss: enable
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on

More details about the testing are available at https://docs.google.com/spreadsheets/d/1iP5Mi1TewBFVh8HTmlcBm9072Bgsbgkr3CLcGmawDys/edit#gid=632186609

I will add more information, but below is the trace I found in the log of the crashed brick (Brick 10.70.35.156:/rhs/brick1/sysvol):

[2017-01-01 23:48:37.960916] I [MSGID: 115036] [server.c:552:server_rpc_notify] 0-sysvol-server: disconnecting connection from rhs-client14.lab.eng.blr.redhat.com-4193-2016/12/30-10:12:13:880804-sysvol-client-2-0-133
[2017-01-01 23:48:37.961019] W [entrylk.c:752:pl_entrylk_log_cleanup] 0-sysvol-server: releasing lock on fdd77d52-2461-424f-be26-58b3213e2916 held by {client=0x7fbd97e4def0, pid=-6 lk-owner=64de9764667f0000}
[2017-01-01 23:48:37.961044] W [entrylk.c:752:pl_entrylk_log_cleanup] 0-sysvol-server: releasing lock on 2b86c5a0-1c1e-4316-a73b-23599b41eb6c held by {client=0x7fbd97e4def0, pid=-6 lk-owner=e8f3a464667f0000}
[2017-01-01 23:48:37.961063] W [entrylk.c:752:pl_entrylk_log_cleanup] 0-sysvol-server: releasing lock on fdd77d52-2461-424f-be26-58b3213e2916 held by {client=0x7fbd97e4def0, pid=-6 lk-owner=64de9764667f0000}
[2017-01-01 23:48:37.961080] W [entrylk.c:752:pl_entrylk_log_cleanup] 0-sysvol-server: releasing lock on 7a287317-1ad3-4a73-8c6c-d0337606c287 held by {client=0x7fbd97e4def0, pid=-6 lk-owner=64b8a264667f0000}
[2017-01-01 23:48:37.961095] W [entrylk.c:752:pl_entrylk_log_cleanup] 0-sysvol-server: releasing lock on 2b86c5a0-1c1e-4316-a73b-23599b41eb6c held by {client=0x7fbd97e4def0, pid=-6 lk-owner=e8f3a464667f0000}
[2017-01-01 23:48:37.961107] W [entrylk.c:752:pl_entrylk_log_cleanup] 0-sysvol-server: releasing lock on 7a287317-1ad3-4a73-8c6c-d0337606c287 held by {client=0x7fbd97e4def0, pid=-6 lk-owner=64b8a264667f0000}
[2017-01-01 23:48:37.961193] W [socket.c:590:__socket_rwv] 0-tcp.sysvol-server: writev on 10.70.37.72:1018 failed (Broken pipe)
[2017-01-01 23:48:37.969562] I [socket.c:3513:socket_submit_reply] 0-tcp.sysvol-server: not connected (priv->connected = -1)
[2017-01-01 23:48:37.969597] E [rpcsvc.c:1304:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x10a7e6, Program: GlusterFS 3.3, ProgVers: 330, Proc: 31) to rpc-transport (tcp.sysvol-server)
[2017-01-01 23:48:37.980615] E [server.c:202:server_submit_reply] (-->/usr/lib64/glusterfs/3.8.4/xlator/debug/io-stats.so(+0x18502) [0x7fbd91e05502] -->/usr/lib64/glusterfs/3.8.4/xlator/protocol/server.so(+0x18d36) [0x7fbd919a5d36] -->/usr/lib64/glusterfs/3.8.4/xlator/protocol/server.so(+0x9186) [0x7fbd91996186] ) 0-: Reply submission failed

pending frames:
frame : type(0) op(0)
frame : type(0) op(27)
frame : type(0) op(27)
frame : type(0) op(27)
frame : type(0) op(27)
frame : type(0) op(27)
frame : type(0) op(27)
frame : type(0) op(11)
frame : type(0) op(27)
frame : type(0) op(27)
frame : type(0) op(1)
frame : type(0) op(27)
frame : type(0) op(27)
frame : type(0) op(27)
frame : type(0) op(27)
frame : type(0) op(27)
frame : type(0) op(27)
frame : type(0) op(27)
frame : type(0) op(31)
frame : type(0) op(33)
frame : type(0) op(31)
frame : type(0) op(31)
frame : type(0) op(31)
frame : type(0) op(31)
frame : type(0) op(31)
frame : type(0) op(31)
frame : type(0) op(11)
frame : type(0) op(29)
frame : type(0) op(31)
frame : type(0) op(18)
frame : type(0) op(31)
frame : type(0) op(11)
frame : type(0) op(11)
frame : type(0) op(11)
frame : type(0) op(0)
[... several more frame : type(0) op(0) entries ...]
[2017-01-01 23:48:37.980670] W [socket.c:590:__socket_rwv] 0-tcp.sysvol-server: writev on 10.70.36.36:1010 failed (Broken pipe)
[... a few hundred more frame : type(0) op(0) entries elided ...]
[2017-01-01 23:48:37.991515] I [socket.c:3513:socket_submit_reply] 0-tcp.sysvol-server: not connected (priv->connected = -1)
[2017-01-01 23:48:38.002516] E [rpcsvc.c:1304:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x43f158, Program: GlusterFS 3.3, ProgVers: 330, Proc: 1) to rpc-transport (tcp.sysvol-server)
[2017-01-01 23:48:38.002566] E [server.c:202:server_submit_reply] (-->/usr/lib64/glusterfs/3.8.4/xlator/debug/io-stats.so(+0x11b2e) [0x7fbd91dfeb2e] -->/usr/lib64/glusterfs/3.8.4/xlator/protocol/server.so(+0x20b9b) [0x7fbd919adb9b] -->/usr/lib64/glusterfs/3.8.4/xlator/protocol/server.so(+0x9186) [0x7fbd91996186] ) 0-: Reply submission failed
frame : type(0) op(5)
[... many more frame : type(0) op(0) entries, with occasional op(31) and op(8) frames, elided ...]
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash:
2017-01-01 23:48:50
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.8.4
[2017-01-01 23:48:38.070045] W [socket.c:590:__socket_rwv] 0-tcp.sysvol-server: writev on 10.70.36.45:999 failed (Broken pipe)
[2017-01-01 23:48:38.070631] W [socket.c:590:__socket_rwv] 0-tcp.sysvol-server: writev on 10.70.36.60:998 failed (Broken pipe)
[2017-01-01 23:48:38.070766] W [socket.c:590:__socket_rwv] 0-tcp.sysvol-server: writev on 10.70.36.56:1001 failed (Broken pipe)
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xc2)[0x7fbda6772c32]
/lib64/libglusterfs.so.0(gf_print_trace+0x324)[0x7fbda677c6c4]
/lib64/libc.so.6(+0x35250)[0x7fbda4e56250]
/lib64/libglusterfs.so.0(_gf_event+0x137)[0x7fbda67e7ad7]
/usr/lib64/glusterfs/3.8.4/xlator/protocol/server.so(+0x7d9d)[0x7fbd91994d9d]
/lib64/libgfrpc.so.0(rpcsvc_handle_disconnect+0x10f)[0x7fbda65344cf]
/lib64/libgfrpc.so.0(rpcsvc_notify+0xc0)[0x7fbda6536910]
/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7fbda6538893]
/usr/lib64/glusterfs/3.8.4/rpc-transport/socket.so(+0x9714)[0x7fbd9b02c714]
/lib64/libglusterfs.so.0(+0x83650)[0x7fbda67cc650]
/lib64/libpthread.so.0(+0x7dc5)[0x7fbda55d3dc5]
/lib64/libc.so.6(clone+0x6d)[0x7fbda4f1873d]
---------
[2017-01-01 23:48:38.072069] I [socket.c:3513:socket_submit_reply] 0-tcp.sysvol-server: not connected (priv->connected = -1)
[2017-01-01 23:48:50.395573] E [rpcsvc.c:1304:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0xa847, Program: GlusterFS 3.3, ProgVers: 330, Proc: 27) to rpc-transport
(tcp.sysvol-server)
[2017-01-01 23:48:38.080453] E [rpcsvc.c:1304:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x93205c, Program: GlusterFS 3.3, ProgVers: 330, Proc: 8) to rpc-transport (tcp.sysvol-server)
[2017-01-01 23:48:50.395766] I [MSGID: 101055] [client_t.c:415:gf_client_unref] 0-sysvol-server: Shutting down connection rhs-client23.lab.eng.blr.redhat.com-3206-2016/12/30-10:15:07:168605-sysvol-client-2-0-134

Following is the gdb backtrace from the core file:

[root@dhcp35-156 ~]# file /core.26773
/core.26773: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from '/usr/sbin/glusterfsd -s 10.70.35.156 --volfile-id sysvol.10.70.35.156.rhs-brick', real uid: 0, effective uid: 0, real gid: 0, effective gid: 0, execfn: '/usr/sbin/glusterfsd', platform: 'x86_64'

#0  0x00007fbda67e7ad7 in _gf_event () from /lib64/libglusterfs.so.0
#1  0x00007fbd91994d9d in server_rpc_notify () from /usr/lib64/glusterfs/3.8.4/xlator/protocol/server.so
#2  0x00007fbda65344cf in rpcsvc_handle_disconnect () from /lib64/libgfrpc.so.0
#3  0x00007fbda6536910 in rpcsvc_notify () from /lib64/libgfrpc.so.0
#4  0x00007fbda6538893 in rpc_transport_notify () from /lib64/libgfrpc.so.0
#5  0x00007fbd9b02c714 in socket_event_handler () from /usr/lib64/glusterfs/3.8.4/rpc-transport/socket.so
#6  0x00007fbda67cc650 in event_dispatch_epoll_worker () from /lib64/libglusterfs.so.0
#7  0x00007fbda55d3dc5 in start_thread () from /lib64/libpthread.so.0
#8  0x00007fbda4f1873d in clone () from /lib64/libc.so.6
(gdb)

Version-Release number of selected component (if applicable):
=============================================================
3.8.4-10
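A quick way to get an overview of a pending-frames dump like the one above is to tally the op numbers. This is a minimal sketch, not part of the original triage: the grep pattern assumes the exact `frame : type(0) op(N)` wording shown above, and the embedded sample log is a hypothetical excerpt.

```shell
#!/bin/sh
# Tally "frame : type(0) op(N)" entries from a saved brick log.
# Adjust the pattern if your glusterfs build prints frames differently.
log=$(mktemp)
cat > "$log" <<'EOF'
frame : type(0) op(0)
frame : type(0) op(27)
frame : type(0) op(27)
frame : type(0) op(31)
frame : type(0) op(0)
EOF
# Extract only the op(N) token, then count occurrences per op number.
grep -o 'op([0-9]*)' "$log" | sort | uniq -c | sort -rn
rm -f "$log"
```

On a real log this makes it obvious which fops dominated the pending-frames list at crash time.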
Also, I checked whether this was the same bug as BZ#1385606 ("4 of 8 bricks (2 dht subvols) crashed on systemic setup"), which is ON_QA, but I found the trace to be different. Hence I am raising a new bug, which also means BZ#1385606 is blocked until this gets fixed.
Note: there were also kernel hangs seen in the dmesg log, for which I have updated BZ#1397907 ("seeing frequent kernel hangs when doing operations both on fuse client and gluster nodes on replica volumes").

Note: the crash happened on Jan 2 05:56, while dmesg shows kernel hangs on Dec 31 many times:

[Thu Dec 29 19:27:37 2016] type=1305 audit(1483019859.632:1357): audit_pid=0 old=750 auid=4294967295 ses=4294967295 subj=system_u:system_r:auditd_t:s0 res=1
[Thu Dec 29 19:27:38 2016] type=1130 audit(1483019859.637:1358): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=auditd comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[Thu Dec 29 19:27:38 2016] type=1131 audit(1483019859.637:1359): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=auditd comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[Thu Dec 29 19:27:39 2016] type=1107 audit(1483019860.641:1360): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='avc: received policyload notice (seqno=11) exe="/usr/lib/systemd/systemd" sauid=0 hostname=? addr=? terminal=?'
[Thu Dec 29 19:27:39 2016] type=1107 audit(1483019860.641:1361): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='avc: received policyload notice (seqno=12) exe="/usr/lib/systemd/systemd" sauid=0 hostname=? addr=? terminal=?'
[Thu Dec 29 19:27:39 2016] type=1107 audit(1483019860.641:1362): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='avc: received policyload notice (seqno=13) exe="/usr/lib/systemd/systemd" sauid=0 hostname=? addr=? terminal=?'
[Thu Dec 29 19:27:39 2016] type=1305 audit(1483019860.651:1363): audit_enabled=1 old=1 auid=4294967295 ses=4294967295 subj=system_u:system_r:auditd_t:s0 res=1
[Thu Dec 29 19:27:39 2016] type=1305 audit(1483019860.651:1364): audit_pid=26454 old=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:auditd_t:s0 res=1
[Thu Dec 29 19:30:30 2016] fuse init (API version 7.22)
[Sat Dec 31 23:40:18 2016] INFO: task xfsaild/dm-0:487 blocked for more than 120 seconds.
[Sat Dec 31 23:40:18 2016] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Sat Dec 31 23:40:18 2016] xfsaild/dm-0 D ffff880451cfb000 0 487 2 0x00000000
[Sat Dec 31 23:40:18 2016]  ffff8804502dbd58 0000000000000046 ffff880451f00fb0 ffff8804502dbfd8
[Sat Dec 31 23:40:18 2016]  ffff8804502dbfd8 ffff8804502dbfd8 ffff880451f00fb0 ffff8804513d5100
[Sat Dec 31 23:40:18 2016]  0000000000000000 ffff880451f00fb0 ffff8804510ee528 ffff880451cfb000
[Sat Dec 31 23:40:18 2016] Call Trace:
[Sat Dec 31 23:40:18 2016]  [<ffffffff8168b579>] schedule+0x29/0x70
[Sat Dec 31 23:40:18 2016]  [<ffffffffa027a97d>] _xfs_log_force+0x1bd/0x2b0 [xfs]
[Sat Dec 31 23:40:18 2016]  [<ffffffff810c4ec0>] ? wake_up_state+0x20/0x20
[Sat Dec 31 23:40:18 2016]  [<ffffffffa027aa96>] xfs_log_force+0x26/0x80 [xfs]
[Sat Dec 31 23:40:18 2016]  [<ffffffffa0286360>] ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
[Sat Dec 31 23:40:18 2016]  [<ffffffffa02864ba>] xfsaild+0x15a/0x660 [xfs]
[Sat Dec 31 23:40:18 2016]  [<ffffffffa0286360>] ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
[Sat Dec 31 23:40:18 2016]  [<ffffffff810b052f>] kthread+0xcf/0xe0
[Sat Dec 31 23:40:18 2016]  [<ffffffff810b0460>] ? kthread_create_on_node+0x140/0x140
[Sat Dec 31 23:40:18 2016]  [<ffffffff81696418>] ret_from_fork+0x58/0x90
[Sat Dec 31 23:40:18 2016]  [<ffffffff810b0460>] ? kthread_create_on_node+0x140/0x140
[Sat Dec 31 23:40:18 2016] INFO: task glusterfsd:21336 blocked for more than 120 seconds.
[Sat Dec 31 23:40:18 2016] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Sat Dec 31 23:40:18 2016] glusterfsd D ffff8804507fa028 0 21336 1 0x00000080
[Sat Dec 31 23:40:18 2016]  ffff8803643f3c70 0000000000000086 ffff880451e55e20 ffff8803643f3fd8
[Sat Dec 31 23:40:18 2016]  ffff8803643f3fd8 ffff8803643f3fd8 ffff880451e55e20 ffff8804507fa020
[Sat Dec 31 23:40:18 2016]  ffff8804507fa024 ffff880451e55e20 00000000ffffffff ffff8804507fa028
[Sat Dec 31 23:40:18 2016] Call Trace:
[Sat Dec 31 23:40:18 2016]  [<ffffffff8168c669>] schedule_preempt_disabled+0x29/0x70
[Sat Dec 31 23:40:18 2016]  [<ffffffff8168a2c5>] __mutex_lock_slowpath+0xc5/0x1c0
[Sat Dec 31 23:40:18 2016]  [<ffffffff8168972f>] mutex_lock+0x1f/0x2f
[Sat Dec 31 23:40:18 2016]  [<ffffffff8120cb9f>] do_last+0x28f/0x12a0
[Sat Dec 31 23:40:18 2016]  [<ffffffff811de2f6>] ? kmem_cache_alloc_trace+0x1d6/0x200
[Sat Dec 31 23:40:18 2016]  [<ffffffff8120dc72>] path_openat+0xc2/0x490
[Sat Dec 31 23:40:18 2016]  [<ffffffff810f4f30>] ? futex_wake+0x80/0x160
[Sat Dec 31 23:40:18 2016]  [<ffffffff8120fdeb>] do_filp_open+0x4b/0xb0
[Sat Dec 31 23:40:18 2016]  [<ffffffff8121ca67>] ? __alloc_fd+0xa7/0x130
[Sat Dec 31 23:40:18 2016]  [<ffffffff811fd2f3>] do_sys_open+0xf3/0x1f0
[Sat Dec 31 23:40:18 2016]  [<ffffffff811fd40e>] SyS_open+0x1e/0x20
[Sat Dec 31 23:40:18 2016]  [<ffffffff816964c9>] system_call_fastpath+0x16/0x1b
[Sat Dec 31 23:40:18 2016] INFO: task glusterfsd:22591 blocked for more than 120 seconds.
[Sat Dec 31 23:40:18 2016] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Sat Dec 31 23:40:18 2016] glusterfsd D ffff8804507fa028 0 22591 1 0x00000080
[Sat Dec 31 23:40:18 2016]  ffff8804512efc70 0000000000000086 ffff880451e54e70 ffff8804512effd8
[Sat Dec 31 23:40:18 2016]  ffff8804512effd8 ffff8804512effd8 ffff880451e54e70 ffff8804507fa020
[Sat Dec 31 23:40:18 2016]  ffff8804507fa024 ffff880451e54e70 00000000ffffffff ffff8804507fa028
[Sat Dec 31 23:40:18 2016] Call Trace:
[Sat Dec 31 23:40:18 2016]  [<ffffffff8168c669>] schedule_preempt_disabled+0x29/0x70
[Sat Dec 31 23:40:18 2016]  [<ffffffff8168a2c5>] __mutex_lock_slowpath+0xc5/0x1c0
[Sat Dec 31 23:40:18 2016]  [<ffffffff8168972f>] mutex_lock+0x1f/0x2f
[Sat Dec 31 23:40:18 2016]  [<ffffffff8120cb9f>] do_last+0x28f/0x12a0
[Sat Dec 31 23:40:18 2016]  [<ffffffff811de2f6>] ? kmem_cache_alloc_trace+0x1d6/0x200
[Sat Dec 31 23:40:18 2016]  [<ffffffff8120dc72>] path_openat+0xc2/0x490
[Sat Dec 31 23:40:18 2016]  [<ffffffff810f4f30>] ? futex_wake+0x80/0x160
[Sat Dec 31 23:40:18 2016]  [<ffffffff8120fdeb>] do_filp_open+0x4b/0xb0
[Sat Dec 31 23:40:18 2016]  [<ffffffff8121ca67>] ? __alloc_fd+0xa7/0x130
[Sat Dec 31 23:40:18 2016]  [<ffffffff811fd2f3>] do_sys_open+0xf3/0x1f0
[Sat Dec 31 23:40:18 2016]  [<ffffffff811fd40e>] SyS_open+0x1e/0x20
[Sat Dec 31 23:40:18 2016]  [<ffffffff816964c9>] system_call_fastpath+0x16/0x1b
[Sat Dec 31 23:40:18 2016] INFO: task glusterfsd:22654 blocked for more than 120 seconds.
[Sat Dec 31 23:40:18 2016] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Sat Dec 31 23:40:18 2016] glusterfsd D ffff8804507fa028 0 22654 1 0x00000080
[Sat Dec 31 23:40:18 2016]  ffff880440abbc70 0000000000000086 ffff88032274bec0 ffff880440abbfd8
[Sat Dec 31 23:40:18 2016]  ffff880440abbfd8 ffff880440abbfd8 ffff88032274bec0 ffff8804507fa020
[Sat Dec 31 23:40:18 2016]  ffff8804507fa024 ffff88032274bec0 00000000ffffffff ffff8804507fa028
[Sat Dec 31 23:40:18 2016] Call Trace:
[Sat Dec 31 23:40:18 2016]  [<ffffffff8168c669>] schedule_preempt_disabled+0x29/0x70
[Sat Dec 31 23:40:18 2016]  [<ffffffff8168a2c5>] __mutex_lock_slowpath+0xc5/0x1c0
[Sat Dec 31 23:40:18 2016]  [<ffffffff8168972f>] mutex_lock+0x1f/0x2f
[Sat Dec 31 23:40:18 2016]  [<ffffffff8120cb9f>] do_last+0x28f/0x12a0
[Sat Dec 31 23:40:18 2016]  [<ffffffff811de2f6>] ? kmem_cache_alloc_trace+0x1d6/0x200
[Sat Dec 31 23:40:18 2016]  [<ffffffff8120dc72>] path_openat+0xc2/0x490
[Sat Dec 31 23:40:18 2016]  [<ffffffff810f4f30>] ? futex_wake+0x80/0x160
[Sat Dec 31 23:40:18 2016]  [<ffffffff8120fdeb>] do_filp_open+0x4b/0xb0
[Sat Dec 31 23:40:18 2016]  [<ffffffff8121ca67>] ? __alloc_fd+0xa7/0x130
[Sat Dec 31 23:40:18 2016]  [<ffffffff811fd2f3>] do_sys_open+0xf3/0x1f0
[Sat Dec 31 23:40:18 2016]  [<ffffffff811fd40e>] SyS_open+0x1e/0x20
[Sat Dec 31 23:40:18 2016]  [<ffffffff816964c9>] system_call_fastpath+0x16/0x1b
[Sat Dec 31 23:40:18 2016] INFO: task glusterfsd:23515 blocked for more than 120 seconds.
[Sat Dec 31 23:40:18 2016] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Sat Dec 31 23:40:18 2016] glusterfsd D ffff8804507fa028 0 23515 1 0x00000080
[Sat Dec 31 23:40:18 2016]  ffff8802c97cfc70 0000000000000086 ffff88032538af10 ffff8802c97cffd8
[Sat Dec 31 23:40:18 2016]  ffff8802c97cffd8 ffff8802c97cffd8 ffff88032538af10 ffff8804507fa020
[Sat Dec 31 23:40:18 2016]  ffff8804507fa024 ffff88032538af10 00000000ffffffff ffff8804507fa028
[Sat Dec 31 23:40:18 2016] Call Trace:
[Sat Dec 31 23:40:18 2016]  [<ffffffff8168c669>] schedule_preempt_disabled+0x29/0x70
[Sat Dec 31 23:40:18 2016]  [<ffffffff8168a2c5>] __mutex_lock_slowpath+0xc5/0x1c0
[Sat Dec 31 23:40:18 2016]  [<ffffffff8168972f>] mutex_lock+0x1f/0x2f
[Sat Dec 31 23:40:18 2016]  [<ffffffff8120cb9f>] do_last+0x28f/0x12a0
[Sat Dec 31 23:40:18 2016]  [<ffffffff81057bc3>] ? x2apic_send_IPI_mask+0x13/0x20
[Sat Dec 31 23:40:18 2016]  [<ffffffff811de2f6>] ? kmem_cache_alloc_trace+0x1d6/0x200
[Sat Dec 31 23:40:18 2016]  [<ffffffff8120dc72>] path_openat+0xc2/0x490
[Sat Dec 31 23:40:18 2016]  [<ffffffff8120fdeb>] do_filp_open+0x4b/0xb0
[Sat Dec 31 23:40:18 2016]  [<ffffffff8121ca67>] ? __alloc_fd+0xa7/0x130
[Sat Dec 31 23:40:18 2016]  [<ffffffff811fd2f3>] do_sys_open+0xf3/0x1f0
[Sat Dec 31 23:40:18 2016]  [<ffffffff811fd40e>] SyS_open+0x1e/0x20
[Sat Dec 31 23:40:18 2016]  [<ffffffff816964c9>] system_call_fastpath+0x16/0x1b
[Sat Dec 31 23:40:18 2016] INFO: task glusterfsd:24449 blocked for more than 120 seconds.
[Sat Dec 31 23:40:18 2016] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Sat Dec 31 23:40:18 2016] glusterfsd D ffff8804507fa028 0 24449 1 0x00000080
[Sat Dec 31 23:40:18 2016]  ffff88029bebbc70 0000000000000086 ffff880326aa8000 ffff88029bebbfd8
[Sat Dec 31 23:40:18 2016]  ffff88029bebbfd8 ffff88029bebbfd8 ffff880326aa8000 ffff8804507fa020
[Sat Dec 31 23:40:18 2016]  ffff8804507fa024 ffff880326aa8000 00000000ffffffff ffff8804507fa028
[Sat Dec 31 23:40:18 2016] Call Trace:
[Sat Dec 31 23:40:18 2016]  [<ffffffff8168c669>] schedule_preempt_disabled+0x29/0x70
[Sat Dec 31 23:40:18 2016]  [<ffffffff8168a2c5>] __mutex_lock_slowpath+0xc5/0x1c0
[Sat Dec 31 23:40:18 2016]  [<ffffffff8168972f>] mutex_lock+0x1f/0x2f
[Sat Dec 31 23:40:18 2016]  [<ffffffff8120cb9f>] do_last+0x28f/0x12a0
[Sat Dec 31 23:40:18 2016]  [<ffffffff811de2f6>] ? kmem_cache_alloc_trace+0x1d6/0x200
[Sat Dec 31 23:40:18 2016]  [<ffffffff8120dc72>] path_openat+0xc2/0x490
[Sat Dec 31 23:40:18 2016]  [<ffffffff810f4f30>] ? futex_wake+0x80/0x160
[Sat Dec 31 23:40:18 2016]  [<ffffffff8120fdeb>] do_filp_open+0x4b/0xb0
[Sat Dec 31 23:40:18 2016]  [<ffffffff8121ca67>] ? __alloc_fd+0xa7/0x130
[Sat Dec 31 23:40:18 2016]  [<ffffffff811fd2f3>] do_sys_open+0xf3/0x1f0
[Sat Dec 31 23:40:18 2016]  [<ffffffff811fd40e>] SyS_open+0x1e/0x20
[Sat Dec 31 23:40:18 2016]  [<ffffffff816964c9>] system_call_fastpath+0x16/0x1b
[Sat Dec 31 23:40:18 2016] INFO: task glusterfsd:1380 blocked for more than 120 seconds.
[Sat Dec 31 23:40:18 2016] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Sat Dec 31 23:40:18 2016] glusterfsd D ffff8804507fa028 0 1380 1 0x00000080
[Sat Dec 31 23:40:18 2016]  ffff88000adffc70 0000000000000086 ffff880451d14e70 ffff88000adfffd8
[Sat Dec 31 23:40:18 2016]  ffff88000adfffd8 ffff88000adfffd8 ffff880451d14e70 ffff8804507fa020
[Sat Dec 31 23:40:18 2016]  ffff8804507fa024 ffff880451d14e70 00000000ffffffff ffff8804507fa028
[Sat Dec 31 23:40:18 2016] Call Trace:
[Sat Dec 31 23:40:18 2016]  [<ffffffff8168c669>] schedule_preempt_disabled+0x29/0x70
[Sat Dec 31 23:40:18 2016]  [<ffffffff8168a2c5>] __mutex_lock_slowpath+0xc5/0x1c0
[Sat Dec 31 23:40:18 2016]  [<ffffffff8168972f>] mutex_lock+0x1f/0x2f
[Sat Dec 31 23:40:18 2016]  [<ffffffff8120cb9f>] do_last+0x28f/0x12a0
[Sat Dec 31 23:40:18 2016]  [<ffffffff811de2f6>] ? kmem_cache_alloc_trace+0x1d6/0x200
[Sat Dec 31 23:40:18 2016]  [<ffffffff8120dc72>] path_openat+0xc2/0x490
[Sat Dec 31 23:40:18 2016]  [<ffffffff810f4f30>] ? futex_wake+0x80/0x160
[Sat Dec 31 23:40:18 2016]  [<ffffffff8120fdeb>] do_filp_open+0x4b/0xb0
[Sat Dec 31 23:40:18 2016]  [<ffffffff8121ca67>] ? __alloc_fd+0xa7/0x130
[Sat Dec 31 23:40:18 2016]  [<ffffffff811fd2f3>] do_sys_open+0xf3/0x1f0
[Sat Dec 31 23:40:18 2016]  [<ffffffff811fd40e>] SyS_open+0x1e/0x20
[Sat Dec 31 23:40:18 2016]  [<ffffffff816964c9>] system_call_fastpath+0x16/0x1b
[Sat Dec 31 23:40:18 2016] INFO: task glusterfsd:1381 blocked for more than 120 seconds.
[Sat Dec 31 23:40:18 2016] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Sat Dec 31 23:40:18 2016] glusterfsd D ffff8804507fa028 0 1381 1 0x00000080
[Sat Dec 31 23:40:18 2016]  ffff88004483bc70 0000000000000086 ffff8803de232f10 ffff88004483bfd8
[Sat Dec 31 23:40:18 2016]  ffff88004483bfd8 ffff88004483bfd8 ffff8803de232f10 ffff8804507fa020
[Sat Dec 31 23:40:18 2016]  ffff8804507fa024 ffff8803de232f10 00000000ffffffff ffff8804507fa028
[Sat Dec 31 23:40:18 2016] Call Trace:
[Sat Dec 31 23:40:18 2016]  [<ffffffff8168c669>] schedule_preempt_disabled+0x29/0x70
[Sat Dec 31 23:40:18 2016]  [<ffffffff8168a2c5>] __mutex_lock_slowpath+0xc5/0x1c0
[Sat Dec 31 23:40:18 2016]  [<ffffffff8168972f>] mutex_lock+0x1f/0x2f
[Sat Dec 31 23:40:18 2016]  [<ffffffff8120cb9f>] do_last+0x28f/0x12a0
[Sat Dec 31 23:40:18 2016]  [<ffffffff811de2f6>] ? kmem_cache_alloc_trace+0x1d6/0x200
[Sat Dec 31 23:40:18 2016]  [<ffffffff8120dc72>] path_openat+0xc2/0x490
[Sat Dec 31 23:40:18 2016]  [<ffffffff810f4f30>] ? futex_wake+0x80/0x160
[Sat Dec 31 23:40:18 2016]  [<ffffffff8120fdeb>] do_filp_open+0x4b/0xb0
[Sat Dec 31 23:40:18 2016]  [<ffffffff8121ca67>] ? __alloc_fd+0xa7/0x130
[Sat Dec 31 23:40:18 2016]  [<ffffffff811fd2f3>] do_sys_open+0xf3/0x1f0
[Sat Dec 31 23:40:18 2016]  [<ffffffff811fd40e>] SyS_open+0x1e/0x20
[Sat Dec 31 23:40:18 2016]  [<ffffffff816964c9>] system_call_fastpath+0x16/0x1b
[Sat Dec 31 23:40:18 2016] INFO: task glusterfsd:1382 blocked for more than 120 seconds.
[Sat Dec 31 23:40:18 2016] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Sat Dec 31 23:40:18 2016] glusterfsd D ffff8804507fa028 0 1382 1 0x00000080
[Sat Dec 31 23:40:18 2016]  ffff88001c543c70 0000000000000086 ffff880326ad0000 ffff88001c543fd8
[Sat Dec 31 23:40:18 2016]  ffff88001c543fd8 ffff88001c543fd8 ffff880326ad0000 ffff8804507fa020
[Sat Dec 31 23:40:18 2016]  ffff8804507fa024 ffff880326ad0000 00000000ffffffff ffff8804507fa028
[Sat Dec 31 23:40:18 2016] Call Trace:
[Sat Dec 31 23:40:18 2016]  [<ffffffff8168c669>] schedule_preempt_disabled+0x29/0x70
[Sat Dec 31 23:40:18 2016]  [<ffffffff8168a2c5>] __mutex_lock_slowpath+0xc5/0x1c0
[Sat Dec 31 23:40:18 2016]  [<ffffffff8168972f>] mutex_lock+0x1f/0x2f
[Sat Dec 31 23:40:18 2016]  [<ffffffff8120cb9f>] do_last+0x28f/0x12a0
[Sat Dec 31 23:40:18 2016]  [<ffffffff811de2f6>] ? kmem_cache_alloc_trace+0x1d6/0x200
[Sat Dec 31 23:40:18 2016]  [<ffffffff8120dc72>] path_openat+0xc2/0x490
[Sat Dec 31 23:40:18 2016]  [<ffffffff8120fdeb>] do_filp_open+0x4b/0xb0
[Sat Dec 31 23:40:18 2016]  [<ffffffff8121ca67>] ? __alloc_fd+0xa7/0x130
[Sat Dec 31 23:40:18 2016]  [<ffffffff811fd2f3>] do_sys_open+0xf3/0x1f0
[Sat Dec 31 23:40:18 2016]  [<ffffffff811fd40e>] SyS_open+0x1e/0x20
[Sat Dec 31 23:40:18 2016]  [<ffffffff816964c9>] system_call_fastpath+0x16/0x1b
[Sat Dec 31 23:40:18 2016] INFO: task glusterfsd:1719 blocked for more than 120 seconds.
[Sat Dec 31 23:40:18 2016] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Sat Dec 31 23:40:18 2016] glusterfsd D ffff8804507fa028 0 1719 1 0x00000080
[Sat Dec 31 23:40:18 2016]  ffff88000f23bc70 0000000000000086 ffff880036373ec0 ffff88000f23bfd8
[Sat Dec 31 23:40:18 2016]  ffff88000f23bfd8 ffff88000f23bfd8 ffff880036373ec0 ffff8804507fa020
[Sat Dec 31 23:40:18 2016]  ffff8804507fa024 ffff880036373ec0 00000000ffffffff ffff8804507fa028
[Sat Dec 31 23:40:18 2016] Call Trace:
[Sat Dec 31 23:40:18 2016]  [<ffffffff8168c669>] schedule_preempt_disabled+0x29/0x70
[Sat Dec 31 23:40:18 2016]  [<ffffffff8168a2c5>] __mutex_lock_slowpath+0xc5/0x1c0
[Sat Dec 31 23:40:18 2016]  [<ffffffff8168972f>] mutex_lock+0x1f/0x2f
[Sat Dec 31 23:40:18 2016]  [<ffffffff8120cb9f>] do_last+0x28f/0x12a0
[Sat Dec 31 23:40:18 2016]  [<ffffffff811de2f6>] ? kmem_cache_alloc_trace+0x1d6/0x200
[Sat Dec 31 23:40:18 2016]  [<ffffffff8120dc72>] path_openat+0xc2/0x490
[Sat Dec 31 23:40:18 2016]  [<ffffffff810f4f30>] ? futex_wake+0x80/0x160
[Sat Dec 31 23:40:18 2016]  [<ffffffff8120fdeb>] do_filp_open+0x4b/0xb0
[Sat Dec 31 23:40:18 2016]  [<ffffffff8121ca67>] ? __alloc_fd+0xa7/0x130
[Sat Dec 31 23:40:18 2016]  [<ffffffff811fd2f3>] do_sys_open+0xf3/0x1f0
[Sat Dec 31 23:40:18 2016]  [<ffffffff811fd40e>] SyS_open+0x1e/0x20
[Sat Dec 31 23:40:18 2016]  [<ffffffff816964c9>] system_call_fastpath+0x16/0x1b
[Sun Jan  1 05:29:54 2017] Clock: inserting leap second 23:59:60 UTC

[root@dhcp35-156 gluster]# ls -l /core.26773
-rw-------. 1 root root 7909343232 Jan  2 05:56 /core.26773
[root@dhcp35-156 gluster]#
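The hung-task output above repeats the same pattern for many glusterfsd PIDs. A small sketch, not part of the original triage, to summarize which tasks the hung-task detector reported; the sed field layout assumes the `INFO: task <name>:<pid> blocked` wording shown above, and the embedded sample is a short excerpt of that dmesg output.

```shell
#!/bin/sh
# Count hung-task reports per task name from a saved dmesg log.
dmesg_log=$(mktemp)
cat > "$dmesg_log" <<'EOF'
[Sat Dec 31 23:40:18 2016] INFO: task xfsaild/dm-0:487 blocked for more than 120 seconds.
[Sat Dec 31 23:40:18 2016] INFO: task glusterfsd:21336 blocked for more than 120 seconds.
[Sat Dec 31 23:40:18 2016] INFO: task glusterfsd:22591 blocked for more than 120 seconds.
EOF
# Keep only the task name between "INFO: task " and the ":<pid>" suffix.
grep 'INFO: task' "$dmesg_log" \
  | sed 's/.*INFO: task \([^:]*\):.*/\1/' \
  | sort | uniq -c
rm -f "$dmesg_log"
```

Run against the full dmesg log this shows at a glance how many distinct glusterfsd threads were stuck versus other tasks such as xfsaild.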
sosreports and cores are available at http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/nchilaka/bug.1409472/
The other brick also has the same crash, hence not raising a new bz.

patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash:
2017-01-01 01:55:25
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.8.4
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xc2)[0x7f5090987c32]
/lib64/libglusterfs.so.0(gf_print_trace+0x324)[0x7f50909916c4]
/lib64/libc.so.6(+0x35250)[0x7f508f06b250]
/lib64/libglusterfs.so.0(_gf_event+0x137)[0x7f50909fcad7]
/usr/lib64/glusterfs/3.8.4/xlator/protocol/server.so(+0x31364)[0x7f507bbd1364]
/lib64/libgfrpc.so.0(rpcsvc_handle_rpc_call+0x325)[0x7f509074b775]
/lib64/libgfrpc.so.0(rpcsvc_notify+0x10b)[0x7f509074b95b]
/lib64/libgfrpc.so.0(rpc_transport_notify+0x23)[0x7f509074d893]
/usr/lib64/glusterfs/3.8.4/rpc-transport/socket.so(+0x72d4)[0x7f508523f2d4]
/usr/lib64/glusterfs/3.8.4/rpc-transport/socket.so(+0x9785)[0x7f5085241785]
/lib64/libglusterfs.so.0(+0x83650)[0x7f50909e1650]
/lib64/libpthread.so.0(+0x7dc5)[0x7f508f7e8dc5]
/lib64/libc.so.6(clone+0x6d)[0x7f508f12d73d]

bt of the core file is as below:
Core was generated by `/usr/sbin/glusterfsd -s 10.70.37.86 --volfile-id sysvol.10.70.37.86.rhs-brick2-'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f50909fcad7 in _gf_event () from /lib64/libglusterfs.so.0
Missing separate debuginfos, use: debuginfo-install glusterfs-fuse-3.8.4-10.el7rhgs.x86_64
(gdb) bt
#0  0x00007f50909fcad7 in _gf_event () from /lib64/libglusterfs.so.0
#1  0x00007f507bbd1364 in server_setvolume () from /usr/lib64/glusterfs/3.8.4/xlator/protocol/server.so
#2  0x00007f509074b775 in rpcsvc_handle_rpc_call () from /lib64/libgfrpc.so.0
#3  0x00007f509074b95b in rpcsvc_notify () from /lib64/libgfrpc.so.0
#4  0x00007f509074d893 in rpc_transport_notify () from /lib64/libgfrpc.so.0
#5  0x00007f508523f2d4 in socket_event_poll_in () from /usr/lib64/glusterfs/3.8.4/rpc-transport/socket.so
#6  0x00007f5085241785 in socket_event_handler () from /usr/lib64/glusterfs/3.8.4/rpc-transport/socket.so
#7  0x00007f50909e1650 in event_dispatch_epoll_worker () from /lib64/libglusterfs.so.0
#8  0x00007f508f7e8dc5 in start_thread () from /lib64/libpthread.so.0
#9  0x00007f508f12d73d in clone () from /lib64/libc.so.6
(In reply to nchilaka from comment #5)
> the other brick also has the same crash, hence not raising a new bz
> patchset: git://git.gluster.com/glusterfs.git
> signal received: 11
> time of crash:
> 2017-01-01 01:55:25
> [...]
> #9 0x00007f508f12d73d in clone () from /lib64/libc.so.6

BT does look similar to BZ 1399147.
On 3rd Jan:
I found that one of the client fuse mounts crashed.
I also found that a 3rd brick crashed (it was the only online brick of one of the replica pairs); with that, one dht-subvol is completely down.
I later noticed that the fuse client crashed with a core.

brick3 core:
[root@dhcp37-154 ~]# ll /core.2192
-rw-------. 1 root root 7985569792 Jan 3 08:00 /core.2192

client side core:
[root@rhs-client23 glusterfs]# ll /core.3210
-rw-------. 1 root root 3032264704 Jan 3 08:13 /core.3210

the client was mounted as below:
10.70.37.154:sysvol on /mnt/sysvol type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)

The client crashed about 20 minutes after the brick crashed (don't know if that has any relation).

latest vol status:
[root@dhcp35-20 ~]# gluster v status
Status of volume: sysvol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.20:/rhs/brick1/sysvol        49154     0          Y       2404
Brick 10.70.37.86:/rhs/brick1/sysvol        49154     0          Y       26789
Brick 10.70.35.156:/rhs/brick1/sysvol       N/A       N/A        N       N/A
Brick 10.70.37.154:/rhs/brick1/sysvol       N/A       N/A        N       N/A
Brick 10.70.35.20:/rhs/brick2/sysvol        49155     0          Y       2424
Brick 10.70.37.86:/rhs/brick2/sysvol        N/A       N/A        N       N/A
Brick 10.70.35.156:/rhs/brick2/sysvol       49155     0          Y       26793
Brick 10.70.37.154:/rhs/brick2/sysvol       49155     0          Y       2212
Snapshot Daemon on localhost                49156     0          Y       2449
Self-heal Daemon on localhost               N/A       N/A        Y       21443
Quota Daemon on localhost                   N/A       N/A        Y       21451
Snapshot Daemon on 10.70.37.86              49156     0          Y       26832
Self-heal Daemon on 10.70.37.86             N/A       N/A        Y       17952
Quota Daemon on 10.70.37.86                 N/A       N/A        Y       17960
Snapshot Daemon on 10.70.35.156             49156     0          Y       26816
Self-heal Daemon on 10.70.35.156            N/A       N/A        Y       16898
Quota Daemon on 10.70.35.156                N/A       N/A        Y       16912
Snapshot Daemon on 10.70.37.154             49156     0          Y       2235
Self-heal Daemon on 10.70.37.154            N/A       N/A        Y       26914
Quota Daemon on 10.70.37.154                N/A       N/A        Y       26926

Task Status of Volume sysvol
------------------------------------------------------------------------------
There are no active volume tasks

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterfs --volfile-server=10.70.37.154 --volfile-id=sysvol /mnt/sysv'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007fe4ca3adab7 in _gf_event (event=event@entry=EVENT_AFR_SPLIT_BRAIN, fmt=fmt@entry=0x7fe4bc2941aa "subvol=%s;type=data;file=%s") at events.c:71
71              host = inet_ntoa (*(struct in_addr *)(host_data->h_addr));
Missing separate debuginfos, use: debuginfo-install glibc-2.17-105.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.13.2-10.el7.x86_64 libcom_err-1.42.9-7.el7.x86_64 libgcc-4.8.5-4.el7.x86_64 libselinux-2.2.2-6.el7.x86_64 libuuid-2.23.2-26.el7.x86_64 openssl-libs-1.0.1e-42.el7_1.9.x86_64 pcre-8.32-15.el7.x86_64 xz-libs-5.1.2-12alpha.el7.x86_64 zlib-1.2.7-15.el7.x86_64
(gdb) l _gf_event
31      #define EVENT_PORT 24009
32
33
34      int
35      _gf_event (eventtypes_t event, char *fmt, ...)
36      {
37              int ret = 0;
38              int sock = -1;
39              char *eventstr = NULL;
40              struct sockaddr_in server;
(gdb)
41              va_list arguments;
42              char *msg = NULL;
43              glusterfs_ctx_t *ctx = NULL;
44              struct hostent *host_data;
45              char *host = NULL;
46
47              /* Global context */
48              ctx = THIS->ctx;
49
50              if (event < 0 || event >= EVENT_LAST) {
(gdb)
51                      ret = EVENT_ERROR_INVALID_INPUTS;
52                      goto out;
53              }
54
55              /* Initialize UDP socket */
56              sock = socket (AF_INET, SOCK_DGRAM, 0);
57              if (sock < 0) {
58                      ret = EVENT_ERROR_SOCKET;
59                      goto out;
60              }
(gdb)
61
62              /* Get Host name to send message */
63              if (ctx && ctx->cmd_args.volfile_server) {
64                      /* If it is client code then volfile_server is set
65                         use that information to push the events. */
66                      host_data = gethostbyname (ctx->cmd_args.volfile_server);
67                      if (host_data == NULL) {
68                              ret = EVENT_ERROR_RESOLVE;
69                              goto out;
70                      }
(gdb)
71                      host = inet_ntoa (*(struct in_addr *)(host_data->h_addr));
72              } else {
73                      /* Localhost, Use the defined IP for localhost */
74                      host = EVENT_HOST;
75              }
76
77              /* Socket Configurations */
78              server.sin_family = AF_INET;
79              server.sin_port = htons (EVENT_PORT);
80              server.sin_addr.s_addr = inet_addr (host);
(gdb) p ctx->cmd_args.volfile_server
$1 = 0x7fe4cbff6ab0 "10.70.37.154"
(gdb) p host_data
$2 = <optimized out>
(gdb) bt
#0  0x00007fe4ca3adab7 in _gf_event (event=event@entry=EVENT_AFR_SPLIT_BRAIN, fmt=fmt@entry=0x7fe4bc2941aa "subvol=%s;type=data;file=%s") at events.c:71
#1  0x00007fe4bc271c81 in __afr_selfheal_data_finalize_source (witness=<optimized out>, replies=<optimized out>, undid_pending=<optimized out>, locked_on=0x7fe4a14cb040 "\001\001", healed_sinks=0x7fe4a14cb060 "\001\001", sinks=0x7fe4a14cb080 "\001\001", sources=<optimized out>, inode=0x7fe4b543a35c, this=<optimized out>, frame=<optimized out>) at afr-self-heal-data.c:603
#2  __afr_selfheal_data_prepare (frame=<optimized out>, this=<optimized out>, inode=0x7fe4b543a35c, locked_on=0x7fe4a14cb040 "\001\001", sources=<optimized out>, sinks=0x7fe4a14cb080 "\001\001", healed_sinks=healed_sinks@entry=0x7fe4a14cb060 "\001\001", undid_pending=0x7fe4a14cb020 "", replies=0x7fe4a14caa60, pflag=pflag@entry=0x0) at afr-self-heal-data.c:684
#3  0x00007fe4bc271ecd in __afr_selfheal_data (frame=frame@entry=0x7fe4c7e84e00, this=this@entry=0x7fe4b80138e0, fd=0x7fe4b52de29c, locked_on=<optimized out>) at afr-self-heal-data.c:736
#4  0x00007fe4bc2734cb in afr_selfheal_data (frame=frame@entry=0x7fe4c7e84e00, this=this@entry=0x7fe4b80138e0, inode=0x7fe4b543a35c) at afr-self-heal-data.c:883
#5  0x00007fe4bc2702b6 in afr_selfheal_do (frame=frame@entry=0x7fe4c7e84e00, this=this@entry=0x7fe4b80138e0, gfid=gfid@entry=0x7fe3efd0e894 "/\245\326\r\225\066MP\241\351^Q+\233\356\b\001") at afr-self-heal-common.c:1968
#6  0x00007fe4bc270323 in afr_selfheal (this=0x7fe4b80138e0, gfid=0x7fe3efd0e894 "/\245\326\r\225\066MP\241\351^Q+\233\356\b\001") at afr-self-heal-common.c:2015
#7  0x00007fe4ca36eba2 in synctask_wrap (old_task=<optimized out>) at syncop.c:375
#8  0x00007fe4c8a31110 in ?? () from /lib64/libc.so.6
#9  0x0000000000000000 in ?? ()
(gdb) f 0
#0  0x00007fe4ca3adab7 in _gf_event (event=event@entry=EVENT_AFR_SPLIT_BRAIN, fmt=fmt@entry=0x7fe4bc2941aa "subvol=%s;type=data;file=%s") at events.c:71
71              host = inet_ntoa (*(struct in_addr *)(host_data->h_addr));
(gdb)

[2017-01-03 02:43:04.358490] E [MSGID: 108008] [afr-transaction.c:2602:afr_write_txn_refresh_done] 0-sysvol-replicate-0: Failing WRITE on gfid c6b39a08-4302-4abd-9aea-45e8d46988e5: split-brain observed.
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash:
2017-01-03 02:43:04
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.8.4
/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xc2)[0x7fe4ca338c32]
/lib64/libglusterfs.so.0(gf_print_trace+0x324)[0x7fe4ca3426b4]
/lib64/libc.so.6(+0x35670)[0x7fe4c8a1f670]
/lib64/libglusterfs.so.0(_gf_event+0x137)[0x7fe4ca3adab7]
/usr/lib64/glusterfs/3.8.4/xlator/cluster/replicate.so(+0x3ec81)[0x7fe4bc271c81]
/usr/lib64/glusterfs/3.8.4/xlator/cluster/replicate.so(+0x3eecd)[0x7fe4bc271ecd]
/usr/lib64/glusterfs/3.8.4/xlator/cluster/replicate.so(+0x404cb)[0x7fe4bc2734cb]
/usr/lib64/glusterfs/3.8.4/xlator/cluster/replicate.so(+0x3d2b6)[0x7fe4bc2702b6]
/usr/lib64/glusterfs/3.8.4/xlator/cluster/replicate.so(+0x3d323)[0x7fe4bc270323]
/lib64/libglusterfs.so.0(synctask_wrap+0x12)[0x7fe4ca36eba2]
/lib64/libc.so.6(+0x47110)[0x7fe4c8a31110]
---------
[2017-01-03 02:43:04.364138] W [MSGID: 108008] [afr-read-txn.c:239:afr_read_txn] 0-sysvol-replicate-0: Unreadable subvolume -1 found with event generation 76 for gfid c6b39a08-4302-4abd-9aea-45e8d46988e5. (Possible split-brain)
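The gdb listing above shows the crash line dereferencing host_data->h_addr straight out of gethostbyname(). As a hedged illustration only (not the actual upstream fix), the same lookup can be done with getaddrinfo()/inet_ntop(), which is thread-safe and never yields a success result whose address must be NULL-checked the way h_addr does; resolve_ipv4() and its fallback argument are invented names for this sketch, not glusterfs code:

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>

/* Resolve `name` into `buf` as a dotted-quad string; on any resolution
 * failure, return `fallback` instead of crashing. */
static const char *resolve_ipv4(const char *name, char *buf, size_t len,
                                const char *fallback)
{
    struct addrinfo hints, *res = NULL;
    const char *out = fallback;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET;     /* the event code deals only in IPv4 */
    hints.ai_socktype = SOCK_DGRAM;

    if (getaddrinfo(name, NULL, &hints, &res) == 0 && res != NULL) {
        struct sockaddr_in *sin = (struct sockaddr_in *)res->ai_addr;
        if (inet_ntop(AF_INET, &sin->sin_addr, buf, len) != NULL)
            out = buf;
        freeaddrinfo(res);
    }
    return out;
}

int main(void)
{
    char buf[INET_ADDRSTRLEN];
    /* An unresolvable name falls back instead of segfaulting. */
    printf("%s\n", resolve_ipv4("no-such-host.invalid", buf, sizeof(buf),
                                "127.0.0.1"));
    return 0;
}
```

The point of the sketch is only that the resolved address is copied out under explicit error checks before any dereference, which is the property the crashing code path lacked.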
Most importantly, I need to know if it is the same reason, or else I will have to raise a new bz.

(In reply to nchilaka from comment #3)
> Note: also there were kernel hangs seen in dmesg log for which I have
> updated BZ#1397907 - seeing frequent kernel hangs when doing operations both
> on fuse client and gluster nodes on replica volumes
>
> Note: the crash happened on Jan 2 05:56
> the dmesg shows kernel hangs on dec31 many times
>
> [Thu Dec 29 19:27:37 2016] type=1305 audit(1483019859.632:1357): audit_pid=0 old=750 auid=4294967295 ses=4294967295 subj=system_u:system_r:auditd_t:s0 res=1
> [Thu Dec 29 19:27:38 2016] type=1130 audit(1483019859.637:1358): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=auditd comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
> [Thu Dec 29 19:27:38 2016] type=1131 audit(1483019859.637:1359): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=auditd comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
> [Thu Dec 29 19:27:39 2016] type=1107 audit(1483019860.641:1360): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='avc: received policyload notice (seqno=11) exe="/usr/lib/systemd/systemd" sauid=0 hostname=? addr=? terminal=?'
> [Thu Dec 29 19:27:39 2016] type=1107 audit(1483019860.641:1361): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='avc: received policyload notice (seqno=12) exe="/usr/lib/systemd/systemd" sauid=0 hostname=? addr=? terminal=?'
> [Thu Dec 29 19:27:39 2016] type=1107 audit(1483019860.641:1362): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='avc: received policyload notice (seqno=13) exe="/usr/lib/systemd/systemd" sauid=0 hostname=? addr=? terminal=?'
> [Thu Dec 29 19:27:39 2016] type=1305 audit(1483019860.651:1363): > audit_enabled=1 old=1 auid=4294967295 ses=4294967295 > subj=system_u:system_r:auditd_t:s0 res=1 > [Thu Dec 29 19:27:39 2016] type=1305 audit(1483019860.651:1364): > audit_pid=26454 old=0 auid=4294967295 ses=4294967295 > subj=system_u:system_r:auditd_t:s0 res=1 > [Thu Dec 29 19:30:30 2016] fuse init (API version 7.22) > [Sat Dec 31 23:40:18 2016] INFO: task xfsaild/dm-0:487 blocked for more than > 120 seconds. > [Sat Dec 31 23:40:18 2016] "echo 0 > > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > [Sat Dec 31 23:40:18 2016] xfsaild/dm-0 D ffff880451cfb000 0 487 > 2 0x00000000 > [Sat Dec 31 23:40:18 2016] ffff8804502dbd58 0000000000000046 > ffff880451f00fb0 ffff8804502dbfd8 > [Sat Dec 31 23:40:18 2016] ffff8804502dbfd8 ffff8804502dbfd8 > ffff880451f00fb0 ffff8804513d5100 > [Sat Dec 31 23:40:18 2016] 0000000000000000 ffff880451f00fb0 > ffff8804510ee528 ffff880451cfb000 > [Sat Dec 31 23:40:18 2016] Call Trace: > [Sat Dec 31 23:40:18 2016] [<ffffffff8168b579>] schedule+0x29/0x70 > [Sat Dec 31 23:40:18 2016] [<ffffffffa027a97d>] _xfs_log_force+0x1bd/0x2b0 > [xfs] > [Sat Dec 31 23:40:18 2016] [<ffffffff810c4ec0>] ? wake_up_state+0x20/0x20 > [Sat Dec 31 23:40:18 2016] [<ffffffffa027aa96>] xfs_log_force+0x26/0x80 > [xfs] > [Sat Dec 31 23:40:18 2016] [<ffffffffa0286360>] ? > xfs_trans_ail_cursor_first+0x90/0x90 [xfs] > [Sat Dec 31 23:40:18 2016] [<ffffffffa02864ba>] xfsaild+0x15a/0x660 [xfs] > [Sat Dec 31 23:40:18 2016] [<ffffffffa0286360>] ? > xfs_trans_ail_cursor_first+0x90/0x90 [xfs] > [Sat Dec 31 23:40:18 2016] [<ffffffff810b052f>] kthread+0xcf/0xe0 > [Sat Dec 31 23:40:18 2016] [<ffffffff810b0460>] ? > kthread_create_on_node+0x140/0x140 > [Sat Dec 31 23:40:18 2016] [<ffffffff81696418>] ret_from_fork+0x58/0x90 > [Sat Dec 31 23:40:18 2016] [<ffffffff810b0460>] ? 
> kthread_create_on_node+0x140/0x140 > [Sat Dec 31 23:40:18 2016] INFO: task glusterfsd:21336 blocked for more than > 120 seconds. > [Sat Dec 31 23:40:18 2016] "echo 0 > > /proc/sys/kernel/hung_task_timeout_secs" disables this message. > [Sat Dec 31 23:40:18 2016] glusterfsd D ffff8804507fa028 0 21336 > 1 0x00000080 > [Sat Dec 31 23:40:18 2016] ffff8803643f3c70 0000000000000086 > ffff880451e55e20 ffff8803643f3fd8 > [Sat Dec 31 23:40:18 2016] ffff8803643f3fd8 ffff8803643f3fd8 > ffff880451e55e20 ffff8804507fa020 > [Sat Dec 31 23:40:18 2016] ffff8804507fa024 ffff880451e55e20 > 00000000ffffffff ffff8804507fa028 > [Sat Dec 31 23:40:18 2016] Call Trace: > [Sat Dec 31 23:40:18 2016] [<ffffffff8168c669>] > schedule_preempt_disabled+0x29/0x70 > [Sat Dec 31 23:40:18 2016] [<ffffffff8168a2c5>] > __mutex_lock_slowpath+0xc5/0x1c0 > [Sat Dec 31 23:40:18 2016] [<ffffffff8168972f>] mutex_lock+0x1f/0x2f > [Sat Dec 31 23:40:18 2016] [<ffffffff8120cb9f>] do_last+0x28f/0x12a0 > [Sat Dec 31 23:40:18 2016] [<ffffffff811de2f6>] ? > kmem_cache_alloc_trace+0x1d6/0x200 > [Sat Dec 31 23:40:18 2016] [<ffffffff8120dc72>] path_openat+0xc2/0x490 > [Sat Dec 31 23:40:18 2016] [<ffffffff810f4f30>] ? futex_wake+0x80/0x160 > [Sat Dec 31 23:40:18 2016] [<ffffffff8120fdeb>] do_filp_open+0x4b/0xb0 > [Sat Dec 31 23:40:18 2016] [<ffffffff8121ca67>] ? __alloc_fd+0xa7/0x130 > [Sat Dec 31 23:40:18 2016] [<ffffffff811fd2f3>] do_sys_open+0xf3/0x1f0 > [Sat Dec 31 23:40:18 2016] [<ffffffff811fd40e>] SyS_open+0x1e/0x20 > [Sat Dec 31 23:40:18 2016] [<ffffffff816964c9>] > system_call_fastpath+0x16/0x1b > [Sat Dec 31 23:40:18 2016] INFO: task glusterfsd:22591 blocked for more than > 120 seconds. > [Sat Dec 31 23:40:18 2016] "echo 0 > > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
> [Sat Dec 31 23:40:18 2016] glusterfsd D ffff8804507fa028 0 22591 > 1 0x00000080 > [Sat Dec 31 23:40:18 2016] ffff8804512efc70 0000000000000086 > ffff880451e54e70 ffff8804512effd8 > [Sat Dec 31 23:40:18 2016] ffff8804512effd8 ffff8804512effd8 > ffff880451e54e70 ffff8804507fa020 > [Sat Dec 31 23:40:18 2016] ffff8804507fa024 ffff880451e54e70 > 00000000ffffffff ffff8804507fa028 > [Sat Dec 31 23:40:18 2016] Call Trace: > [Sat Dec 31 23:40:18 2016] [<ffffffff8168c669>] > schedule_preempt_disabled+0x29/0x70 > [Sat Dec 31 23:40:18 2016] [<ffffffff8168a2c5>] > __mutex_lock_slowpath+0xc5/0x1c0 > [Sat Dec 31 23:40:18 2016] [<ffffffff8168972f>] mutex_lock+0x1f/0x2f > [Sat Dec 31 23:40:18 2016] [<ffffffff8120cb9f>] do_last+0x28f/0x12a0 > [Sat Dec 31 23:40:18 2016] [<ffffffff811de2f6>] ? > kmem_cache_alloc_trace+0x1d6/0x200 > [Sat Dec 31 23:40:18 2016] [<ffffffff8120dc72>] path_openat+0xc2/0x490 > [Sat Dec 31 23:40:18 2016] [<ffffffff810f4f30>] ? futex_wake+0x80/0x160 > [Sat Dec 31 23:40:18 2016] [<ffffffff8120fdeb>] do_filp_open+0x4b/0xb0 > [Sat Dec 31 23:40:18 2016] [<ffffffff8121ca67>] ? __alloc_fd+0xa7/0x130 > [Sat Dec 31 23:40:18 2016] [<ffffffff811fd2f3>] do_sys_open+0xf3/0x1f0 > [Sat Dec 31 23:40:18 2016] [<ffffffff811fd40e>] SyS_open+0x1e/0x20 > [Sat Dec 31 23:40:18 2016] [<ffffffff816964c9>] > system_call_fastpath+0x16/0x1b > [Sat Dec 31 23:40:18 2016] INFO: task glusterfsd:22654 blocked for more than > 120 seconds. > [Sat Dec 31 23:40:18 2016] "echo 0 > > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
> [Sat Dec 31 23:40:18 2016] glusterfsd D ffff8804507fa028 0 22654 > 1 0x00000080 > [Sat Dec 31 23:40:18 2016] ffff880440abbc70 0000000000000086 > ffff88032274bec0 ffff880440abbfd8 > [Sat Dec 31 23:40:18 2016] ffff880440abbfd8 ffff880440abbfd8 > ffff88032274bec0 ffff8804507fa020 > [Sat Dec 31 23:40:18 2016] ffff8804507fa024 ffff88032274bec0 > 00000000ffffffff ffff8804507fa028 > [Sat Dec 31 23:40:18 2016] Call Trace: > [Sat Dec 31 23:40:18 2016] [<ffffffff8168c669>] > schedule_preempt_disabled+0x29/0x70 > [Sat Dec 31 23:40:18 2016] [<ffffffff8168a2c5>] > __mutex_lock_slowpath+0xc5/0x1c0 > [Sat Dec 31 23:40:18 2016] [<ffffffff8168972f>] mutex_lock+0x1f/0x2f > [Sat Dec 31 23:40:18 2016] [<ffffffff8120cb9f>] do_last+0x28f/0x12a0 > [Sat Dec 31 23:40:18 2016] [<ffffffff811de2f6>] ? > kmem_cache_alloc_trace+0x1d6/0x200 > [Sat Dec 31 23:40:18 2016] [<ffffffff8120dc72>] path_openat+0xc2/0x490 > [Sat Dec 31 23:40:18 2016] [<ffffffff810f4f30>] ? futex_wake+0x80/0x160 > [Sat Dec 31 23:40:18 2016] [<ffffffff8120fdeb>] do_filp_open+0x4b/0xb0 > [Sat Dec 31 23:40:18 2016] [<ffffffff8121ca67>] ? __alloc_fd+0xa7/0x130 > [Sat Dec 31 23:40:18 2016] [<ffffffff811fd2f3>] do_sys_open+0xf3/0x1f0 > [Sat Dec 31 23:40:18 2016] [<ffffffff811fd40e>] SyS_open+0x1e/0x20 > [Sat Dec 31 23:40:18 2016] [<ffffffff816964c9>] > system_call_fastpath+0x16/0x1b > [Sat Dec 31 23:40:18 2016] INFO: task glusterfsd:23515 blocked for more than > 120 seconds. > [Sat Dec 31 23:40:18 2016] "echo 0 > > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
> [Sat Dec 31 23:40:18 2016] glusterfsd D ffff8804507fa028 0 23515 > 1 0x00000080 > [Sat Dec 31 23:40:18 2016] ffff8802c97cfc70 0000000000000086 > ffff88032538af10 ffff8802c97cffd8 > [Sat Dec 31 23:40:18 2016] ffff8802c97cffd8 ffff8802c97cffd8 > ffff88032538af10 ffff8804507fa020 > [Sat Dec 31 23:40:18 2016] ffff8804507fa024 ffff88032538af10 > 00000000ffffffff ffff8804507fa028 > [Sat Dec 31 23:40:18 2016] Call Trace: > [Sat Dec 31 23:40:18 2016] [<ffffffff8168c669>] > schedule_preempt_disabled+0x29/0x70 > [Sat Dec 31 23:40:18 2016] [<ffffffff8168a2c5>] > __mutex_lock_slowpath+0xc5/0x1c0 > [Sat Dec 31 23:40:18 2016] [<ffffffff8168972f>] mutex_lock+0x1f/0x2f > [Sat Dec 31 23:40:18 2016] [<ffffffff8120cb9f>] do_last+0x28f/0x12a0 > [Sat Dec 31 23:40:18 2016] [<ffffffff81057bc3>] ? > x2apic_send_IPI_mask+0x13/0x20 > [Sat Dec 31 23:40:18 2016] [<ffffffff811de2f6>] ? > kmem_cache_alloc_trace+0x1d6/0x200 > [Sat Dec 31 23:40:18 2016] [<ffffffff8120dc72>] path_openat+0xc2/0x490 > [Sat Dec 31 23:40:18 2016] [<ffffffff8120fdeb>] do_filp_open+0x4b/0xb0 > [Sat Dec 31 23:40:18 2016] [<ffffffff8121ca67>] ? __alloc_fd+0xa7/0x130 > [Sat Dec 31 23:40:18 2016] [<ffffffff811fd2f3>] do_sys_open+0xf3/0x1f0 > [Sat Dec 31 23:40:18 2016] [<ffffffff811fd40e>] SyS_open+0x1e/0x20 > [Sat Dec 31 23:40:18 2016] [<ffffffff816964c9>] > system_call_fastpath+0x16/0x1b > [Sat Dec 31 23:40:18 2016] INFO: task glusterfsd:24449 blocked for more than > 120 seconds. > [Sat Dec 31 23:40:18 2016] "echo 0 > > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
> [Sat Dec 31 23:40:18 2016] glusterfsd D ffff8804507fa028 0 24449 > 1 0x00000080 > [Sat Dec 31 23:40:18 2016] ffff88029bebbc70 0000000000000086 > ffff880326aa8000 ffff88029bebbfd8 > [Sat Dec 31 23:40:18 2016] ffff88029bebbfd8 ffff88029bebbfd8 > ffff880326aa8000 ffff8804507fa020 > [Sat Dec 31 23:40:18 2016] ffff8804507fa024 ffff880326aa8000 > 00000000ffffffff ffff8804507fa028 > [Sat Dec 31 23:40:18 2016] Call Trace: > [Sat Dec 31 23:40:18 2016] [<ffffffff8168c669>] > schedule_preempt_disabled+0x29/0x70 > [Sat Dec 31 23:40:18 2016] [<ffffffff8168a2c5>] > __mutex_lock_slowpath+0xc5/0x1c0 > [Sat Dec 31 23:40:18 2016] [<ffffffff8168972f>] mutex_lock+0x1f/0x2f > [Sat Dec 31 23:40:18 2016] [<ffffffff8120cb9f>] do_last+0x28f/0x12a0 > [Sat Dec 31 23:40:18 2016] [<ffffffff811de2f6>] ? > kmem_cache_alloc_trace+0x1d6/0x200 > [Sat Dec 31 23:40:18 2016] [<ffffffff8120dc72>] path_openat+0xc2/0x490 > [Sat Dec 31 23:40:18 2016] [<ffffffff810f4f30>] ? futex_wake+0x80/0x160 > [Sat Dec 31 23:40:18 2016] [<ffffffff8120fdeb>] do_filp_open+0x4b/0xb0 > [Sat Dec 31 23:40:18 2016] [<ffffffff8121ca67>] ? __alloc_fd+0xa7/0x130 > [Sat Dec 31 23:40:18 2016] [<ffffffff811fd2f3>] do_sys_open+0xf3/0x1f0 > [Sat Dec 31 23:40:18 2016] [<ffffffff811fd40e>] SyS_open+0x1e/0x20 > [Sat Dec 31 23:40:18 2016] [<ffffffff816964c9>] > system_call_fastpath+0x16/0x1b > [Sat Dec 31 23:40:18 2016] INFO: task glusterfsd:1380 blocked for more than > 120 seconds. > [Sat Dec 31 23:40:18 2016] "echo 0 > > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
> [Sat Dec 31 23:40:18 2016] glusterfsd D ffff8804507fa028 0 1380 > 1 0x00000080 > [Sat Dec 31 23:40:18 2016] ffff88000adffc70 0000000000000086 > ffff880451d14e70 ffff88000adfffd8 > [Sat Dec 31 23:40:18 2016] ffff88000adfffd8 ffff88000adfffd8 > ffff880451d14e70 ffff8804507fa020 > [Sat Dec 31 23:40:18 2016] ffff8804507fa024 ffff880451d14e70 > 00000000ffffffff ffff8804507fa028 > [Sat Dec 31 23:40:18 2016] Call Trace: > [Sat Dec 31 23:40:18 2016] [<ffffffff8168c669>] > schedule_preempt_disabled+0x29/0x70 > [Sat Dec 31 23:40:18 2016] [<ffffffff8168a2c5>] > __mutex_lock_slowpath+0xc5/0x1c0 > [Sat Dec 31 23:40:18 2016] [<ffffffff8168972f>] mutex_lock+0x1f/0x2f > [Sat Dec 31 23:40:18 2016] [<ffffffff8120cb9f>] do_last+0x28f/0x12a0 > [Sat Dec 31 23:40:18 2016] [<ffffffff811de2f6>] ? > kmem_cache_alloc_trace+0x1d6/0x200 > [Sat Dec 31 23:40:18 2016] [<ffffffff8120dc72>] path_openat+0xc2/0x490 > [Sat Dec 31 23:40:18 2016] [<ffffffff810f4f30>] ? futex_wake+0x80/0x160 > [Sat Dec 31 23:40:18 2016] [<ffffffff8120fdeb>] do_filp_open+0x4b/0xb0 > [Sat Dec 31 23:40:18 2016] [<ffffffff8121ca67>] ? __alloc_fd+0xa7/0x130 > [Sat Dec 31 23:40:18 2016] [<ffffffff811fd2f3>] do_sys_open+0xf3/0x1f0 > [Sat Dec 31 23:40:18 2016] [<ffffffff811fd40e>] SyS_open+0x1e/0x20 > [Sat Dec 31 23:40:18 2016] [<ffffffff816964c9>] > system_call_fastpath+0x16/0x1b > [Sat Dec 31 23:40:18 2016] INFO: task glusterfsd:1381 blocked for more than > 120 seconds. > [Sat Dec 31 23:40:18 2016] "echo 0 > > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
> [Sat Dec 31 23:40:18 2016] glusterfsd D ffff8804507fa028 0 1381 > 1 0x00000080 > [Sat Dec 31 23:40:18 2016] ffff88004483bc70 0000000000000086 > ffff8803de232f10 ffff88004483bfd8 > [Sat Dec 31 23:40:18 2016] ffff88004483bfd8 ffff88004483bfd8 > ffff8803de232f10 ffff8804507fa020 > [Sat Dec 31 23:40:18 2016] ffff8804507fa024 ffff8803de232f10 > 00000000ffffffff ffff8804507fa028 > [Sat Dec 31 23:40:18 2016] Call Trace: > [Sat Dec 31 23:40:18 2016] [<ffffffff8168c669>] > schedule_preempt_disabled+0x29/0x70 > [Sat Dec 31 23:40:18 2016] [<ffffffff8168a2c5>] > __mutex_lock_slowpath+0xc5/0x1c0 > [Sat Dec 31 23:40:18 2016] [<ffffffff8168972f>] mutex_lock+0x1f/0x2f > [Sat Dec 31 23:40:18 2016] [<ffffffff8120cb9f>] do_last+0x28f/0x12a0 > [Sat Dec 31 23:40:18 2016] [<ffffffff811de2f6>] ? > kmem_cache_alloc_trace+0x1d6/0x200 > [Sat Dec 31 23:40:18 2016] [<ffffffff8120dc72>] path_openat+0xc2/0x490 > [Sat Dec 31 23:40:18 2016] [<ffffffff810f4f30>] ? futex_wake+0x80/0x160 > [Sat Dec 31 23:40:18 2016] [<ffffffff8120fdeb>] do_filp_open+0x4b/0xb0 > [Sat Dec 31 23:40:18 2016] [<ffffffff8121ca67>] ? __alloc_fd+0xa7/0x130 > [Sat Dec 31 23:40:18 2016] [<ffffffff811fd2f3>] do_sys_open+0xf3/0x1f0 > [Sat Dec 31 23:40:18 2016] [<ffffffff811fd40e>] SyS_open+0x1e/0x20 > [Sat Dec 31 23:40:18 2016] [<ffffffff816964c9>] > system_call_fastpath+0x16/0x1b > [Sat Dec 31 23:40:18 2016] INFO: task glusterfsd:1382 blocked for more than > 120 seconds. > [Sat Dec 31 23:40:18 2016] "echo 0 > > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
> [Sat Dec 31 23:40:18 2016] glusterfsd D ffff8804507fa028 0 1382 > 1 0x00000080 > [Sat Dec 31 23:40:18 2016] ffff88001c543c70 0000000000000086 > ffff880326ad0000 ffff88001c543fd8 > [Sat Dec 31 23:40:18 2016] ffff88001c543fd8 ffff88001c543fd8 > ffff880326ad0000 ffff8804507fa020 > [Sat Dec 31 23:40:18 2016] ffff8804507fa024 ffff880326ad0000 > 00000000ffffffff ffff8804507fa028 > [Sat Dec 31 23:40:18 2016] Call Trace: > [Sat Dec 31 23:40:18 2016] [<ffffffff8168c669>] > schedule_preempt_disabled+0x29/0x70 > [Sat Dec 31 23:40:18 2016] [<ffffffff8168a2c5>] > __mutex_lock_slowpath+0xc5/0x1c0 > [Sat Dec 31 23:40:18 2016] [<ffffffff8168972f>] mutex_lock+0x1f/0x2f > [Sat Dec 31 23:40:18 2016] [<ffffffff8120cb9f>] do_last+0x28f/0x12a0 > [Sat Dec 31 23:40:18 2016] [<ffffffff811de2f6>] ? > kmem_cache_alloc_trace+0x1d6/0x200 > [Sat Dec 31 23:40:18 2016] [<ffffffff8120dc72>] path_openat+0xc2/0x490 > [Sat Dec 31 23:40:18 2016] [<ffffffff8120fdeb>] do_filp_open+0x4b/0xb0 > [Sat Dec 31 23:40:18 2016] [<ffffffff8121ca67>] ? __alloc_fd+0xa7/0x130 > [Sat Dec 31 23:40:18 2016] [<ffffffff811fd2f3>] do_sys_open+0xf3/0x1f0 > [Sat Dec 31 23:40:18 2016] [<ffffffff811fd40e>] SyS_open+0x1e/0x20 > [Sat Dec 31 23:40:18 2016] [<ffffffff816964c9>] > system_call_fastpath+0x16/0x1b > [Sat Dec 31 23:40:18 2016] INFO: task glusterfsd:1719 blocked for more than > 120 seconds. > [Sat Dec 31 23:40:18 2016] "echo 0 > > /proc/sys/kernel/hung_task_timeout_secs" disables this message. 
> [Sat Dec 31 23:40:18 2016] glusterfsd D ffff8804507fa028 0 1719 > 1 0x00000080 > [Sat Dec 31 23:40:18 2016] ffff88000f23bc70 0000000000000086 > ffff880036373ec0 ffff88000f23bfd8 > [Sat Dec 31 23:40:18 2016] ffff88000f23bfd8 ffff88000f23bfd8 > ffff880036373ec0 ffff8804507fa020 > [Sat Dec 31 23:40:18 2016] ffff8804507fa024 ffff880036373ec0 > 00000000ffffffff ffff8804507fa028 > [Sat Dec 31 23:40:18 2016] Call Trace: > [Sat Dec 31 23:40:18 2016] [<ffffffff8168c669>] > schedule_preempt_disabled+0x29/0x70 > [Sat Dec 31 23:40:18 2016] [<ffffffff8168a2c5>] > __mutex_lock_slowpath+0xc5/0x1c0 > [Sat Dec 31 23:40:18 2016] [<ffffffff8168972f>] mutex_lock+0x1f/0x2f > [Sat Dec 31 23:40:18 2016] [<ffffffff8120cb9f>] do_last+0x28f/0x12a0 > [Sat Dec 31 23:40:18 2016] [<ffffffff811de2f6>] ? > kmem_cache_alloc_trace+0x1d6/0x200 > [Sat Dec 31 23:40:18 2016] [<ffffffff8120dc72>] path_openat+0xc2/0x490 > [Sat Dec 31 23:40:18 2016] [<ffffffff810f4f30>] ? futex_wake+0x80/0x160 > [Sat Dec 31 23:40:18 2016] [<ffffffff8120fdeb>] do_filp_open+0x4b/0xb0 > [Sat Dec 31 23:40:18 2016] [<ffffffff8121ca67>] ? __alloc_fd+0xa7/0x130 > [Sat Dec 31 23:40:18 2016] [<ffffffff811fd2f3>] do_sys_open+0xf3/0x1f0 > [Sat Dec 31 23:40:18 2016] [<ffffffff811fd40e>] SyS_open+0x1e/0x20 > [Sat Dec 31 23:40:18 2016] [<ffffffff816964c9>] > system_call_fastpath+0x16/0x1b > [Sun Jan 1 05:29:54 2017] Clock: inserting leap second 23:59:60 UTC > [root@dhcp35-156 gluster]# ls -l /core.26773 > -rw-------. 1 root root 7909343232 Jan 2 05:56 /core.26773 > [root@dhcp35-156 gluster]#
(In reply to nchilaka from comment #9)
> (In reply to nchilaka from comment #3)
> > Note: also there were kernel hangs seen in dmesg log for which I have
> Most importantly, I need to know if it is the same reason or else I will
> have to raise a new bz

I'm not sure whether this is a dup of bz 1385606, but I have definitely seen this bt earlier. The line where the brick crashed was:

host = inet_ntoa (*(struct in_addr *)(host_data->h_addr));

I remember seeing host_data->h_addr being NULL; dereferencing it is what crashed the brick. I had even discussed this with Atin.
(In reply to Raghavendra G from comment #10)
> Not sure whether this is a dup of bz 1385606, but definitely I've seen this
> bt earlier. The line brick crashed was:
>
> host = inet_ntoa (*(struct in_addr *)(host_data->h_addr));
>
> I remember seeing host_data->h_addr being NULL and hence dereferencing it
> resulted in crash of brick. I had even discussed this with Atin.

bt of this crash looks the same as BZ 1399147.
Client sosreports are available at:
scp -r /var/tmp/$HOSTNAME qe@rhsqe-repo:/var/www/html/sosreports/nchilaka/3.2_logs/systemic_testing_logs/regression_cycle/same_dir_create_clients/
Upstream mainline patch http://review.gluster.org/16327 posted for review.
Downstream patch https://code.engineering.redhat.com/gerrit/94316
Changing the title: it was too generic, and given that the issue was found in eventing and the fix is available, I can reuse the old title for another new bz I will be raising. This will also help future searches for this bz.
I have run my systemic testing setup for 2 days and did not see any recurrence of this crash. Hence, moving to verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2017-0486.html