Description of problem:
glusterd encountered a segfault.

Version-Release number of selected component (if applicable):
[ec2-user@ip-192-168-0-14 ~]$ rpm -qa | grep gluster
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
glusterfs-cli-3.8.4-54.10.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-54.10.el7rhgs.x86_64
glusterfs-debuginfo-3.8.4-54.10.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-54.10.el7rhgs.x86_64
glusterfs-api-3.8.4-54.10.el7rhgs.x86_64
python-gluster-3.8.4-54.10.el7rhgs.noarch
gluster-nagios-addons-0.2.10-2.el7rhgs.x86_64
pcp-pmda-gluster-4.1.0-0.201805281909.git68ab4b18.el7.x86_64
glusterfs-libs-3.8.4-54.10.el7rhgs.x86_64
glusterfs-fuse-3.8.4-54.10.el7rhgs.x86_64
vdsm-gluster-4.17.33-1.2.el7rhgs.noarch
libvirt-daemon-driver-storage-gluster-3.9.0-14.el7_5.5.x86_64
glusterfs-3.8.4-54.10.el7rhgs.x86_64
glusterfs-server-3.8.4-54.10.el7rhgs.x86_64
glusterfs-rdma-3.8.4-54.10.el7rhgs.x86_64

How reproducible:
Unknown; I do not have a reproducer. There were no admin actions in progress other than regular monitoring via the pcp gluster PMDA and a standard I/O workload.

Steps to Reproduce:
1. Unknown

Actual results:
glusterd crashed with SIGSEGV.

Expected results:
glusterd should not crash.

Additional info:
The SIGSEGV is the result of a bad length being passed to memcpy(); see the backtraces below and the sketch after the log snippet.

BT of relevant thread(s):

Thread 3 (Thread 0x7f2dabfff700 (LWP 29173)):
#0  0x00007f2e4bb4ae9d in poll () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007f2e3f116fa9 in poll (__timeout=-1, __nfds=2, __fds=0x7f2dabffee80) at /usr/include/bits/poll2.h:46
#2  socket_poller (ctx=0x7f2e28126310) at socket.c:2563
#3  0x00007f2e4c28cdd5 in start_thread (arg=0x7f2dabfff700) at pthread_create.c:308
#4  0x00007f2e4bb55b3d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 2 (Thread 0x7f2e2f7fe700 (LWP 21641)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007f2e4d46359b in __synclock_lock (lock=lock@entry=0x7f2e4d7a0818) at syncop.c:924
#2  0x00007f2e4d466ad6 in synclock_lock (lock=lock@entry=0x7f2e4d7a0818) at syncop.c:950
#3  0x00007f2e41f1981b in glusterd_big_locked_cbk (req=0x7f2e283f59a0, iov=0x7f2e283f59e0, count=1, myframe=0x7f2e283d1fd0, fn=0x7f2e41f177b0 <__glusterd_commit_op_cbk>) at glusterd-rpc-ops.c:214
#4  0x00007f2e4d1f5770 in rpc_clnt_handle_reply (clnt=clnt@entry=0x55718e4f8a50, pollin=pollin@entry=0x7f2e26b77310) at rpc-clnt.c:794
#5  0x00007f2e4d1f5a57 in rpc_clnt_notify (trans=<optimized out>, mydata=0x55718e4f8a80, event=<optimized out>, data=0x7f2e26b77310) at rpc-clnt.c:987
#6  0x00007f2e4d1f1913 in rpc_transport_notify (this=this@entry=0x55718e4f9430, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7f2e26b77310) at rpc-transport.c:538
#7  0x00007f2e3f114366 in socket_event_poll_in (this=this@entry=0x55718e4f9430, notify_handled=notify_handled@entry=_gf_false) at socket.c:2306
#8  0x00007f2e3f117042 in socket_poller (ctx=0x55718e4f9430) at socket.c:2581
#9  0x00007f2e4c28cdd5 in start_thread (arg=0x7f2e2f7fe700) at pthread_create.c:308
#10 0x00007f2e4bb55b3d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 1 (Thread 0x7f2e06ffd700 (LWP 13885)):
#0  0x00007f2e4d423798 in memcpy (__len=18446744072547383994, __src=0x7f2e26b9f0b0, __dest=0x7f2e4c4506b0) at /usr/include/bits/string3.h:51
#1  data_to_int32_ptr (data=0x7f2e26b9f000, val=val@entry=0x7f2e06ffc21c) at dict.c:1518
#2  0x00007f2e4d4265fd in dict_get_int32 (this=this@entry=0x7f2e19efbfc0, key=key@entry=0x7f2e41fd5a2c "count", val=val@entry=0x7f2e06ffc21c) at dict.c:1776
#3  0x00007f2e41f04681 in glusterd_profile_volume_use_rsp_dict (aggr=0x7f2e19efbfc0, rsp_dict=0x7f2da64d1550) at glusterd-utils.c:9544
#4  0x00007f2e41f17c2e in __glusterd_commit_op_cbk (req=req@entry=0x7f2e283f6690, iov=iov@entry=0x7f2e283f66d0, count=count@entry=1, myframe=myframe@entry=0x7f2e283d2140) at glusterd-rpc-ops.c:1430
#5  0x00007f2e41f1982a in glusterd_big_locked_cbk (req=0x7f2e283f6690, iov=0x7f2e283f66d0, count=1, myframe=0x7f2e283d2140, fn=0x7f2e41f177b0 <__glusterd_commit_op_cbk>) at glusterd-rpc-ops.c:215
#6  0x00007f2e4d1f5770 in rpc_clnt_handle_reply (clnt=clnt@entry=0x55718e501280, pollin=pollin@entry=0x7f2da64e85a0) at rpc-clnt.c:794
#7  0x00007f2e4d1f5a57 in rpc_clnt_notify (trans=<optimized out>, mydata=0x55718e5012b0, event=<optimized out>, data=0x7f2da64e85a0) at rpc-clnt.c:987
#8  0x00007f2e4d1f1913 in rpc_transport_notify (this=this@entry=0x55718e501480, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7f2da64e85a0) at rpc-transport.c:538
#9  0x00007f2e3f114366 in socket_event_poll_in (this=this@entry=0x55718e501480, notify_handled=notify_handled@entry=_gf_false) at socket.c:2306
#10 0x00007f2e3f117042 in socket_poller (ctx=0x55718e501480) at socket.c:2581
#11 0x00007f2e4c28cdd5 in start_thread (arg=0x7f2e06ffd700) at pthread_create.c:308
#12 0x00007f2e4bb55b3d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Snip of glusterd log:
...
[2018-07-25 14:46:39.319143] I [MSGID: 106494] [glusterd-handler.c:3087:__glusterd_handle_cli_profile_volume] 0-management: Received volume profile req for volume supervol01
The message "I [MSGID: 106488] [glusterd-handler.c:1540:__glusterd_handle_cli_get_volume] 0-management: Received get vol req" repeated 7 times between [2018-07-25 14:45:39.062451] and [2018-07-25 14:46:39.092824]
[2018-07-25 14:46:39.202692] I [MSGID: 106494] [glusterd-handler.c:3087:__glusterd_handle_cli_profile_volume] 0-management: Received volume profile req for volume gluster_shared_storage
[2018-07-25 14:46:39.228740] E [MSGID: 106524] [glusterd-op-sm.c:1987:glusterd_op_stage_stats_volume] 0-glusterd: Profile on Volume gluster_shared_storage is not started
[2018-07-25 14:46:40.320097] I [MSGID: 106494] [glusterd-handler.c:3087:__glusterd_handle_cli_profile_volume] 0-management: Received volume profile req for volume supervole2a03
[2018-07-25 14:47:14.369772] E [socket.c:2631:socket_poller] 0-socket.management: socket_poller 172.31.71.222:1023 failed (No data available)
[2018-07-25 14:47:39.093345] I [MSGID: 106488] [glusterd-handler.c:1540:__glusterd_handle_cli_get_volume] 0-management: Received get vol req
[2018-07-25 14:47:39.205960] I [MSGID: 106494] [glusterd-handler.c:3087:__glusterd_handle_cli_profile_volume] 0-management: Received volume profile req for volume gluster_shared_storage
[2018-07-25 14:47:39.243975] E [MSGID: 106524] [glusterd-op-sm.c:1987:glusterd_op_stage_stats_volume] 0-glusterd: Profile on Volume gluster_shared_storage is not started
[2018-07-25 14:47:39.244014] E [MSGID: 106301] [glusterd-op-sm.c:4553:glusterd_op_ac_send_stage_op] 0-management: Staging of operation 'Volume Profile' failed on localhost : Profile on Volume gluster_shared_storage is not started
[2018-07-25 14:47:39.369274] I [MSGID: 106494] [glusterd-handler.c:3087:__glusterd_handle_cli_profile_volume] 0-management: Received volume profile req for volume supervol01
[2018-07-25 14:47:39.369370] W [glusterd-locks.c:622:glusterd_mgmt_v3_lock] (-->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0x24a7b) [0x7f2e41ecba7b] -->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0x242d2) [0x7f2e41ecb2d2] -->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0xdda56) [0x7f2e41f84a56] ) 0-management: Lock for supervol01 held by 8bbb0dba-d052-4321-b460-030465d89d70
[2018-07-25 14:47:39.369383] E [MSGID: 106119] [glusterd-handler.c:794:glusterd_op_txn_begin] 0-management: Unable to acquire lock for supervol01
[2018-07-25 14:47:39.482267] I [MSGID: 106494] [glusterd-handler.c:3087:__glusterd_handle_cli_profile_volume] 0-management: Received volume profile req for volume supervole2a03
[2018-07-25 14:48:39.221960] E [MSGID: 106301] [glusterd-op-sm.c:4553:glusterd_op_ac_send_stage_op] 0-management: Staging of operation 'Volume Profile' failed on localhost : Profile on Volume gluster_shared_storage is not started
[2018-07-25 14:48:39.350491] I [MSGID: 106494] [glusterd-handler.c:3087:__glusterd_handle_cli_profile_volume] 0-management: Received volume profile req for volume supervol01
[2018-07-25 14:48:39.350578] W [glusterd-locks.c:622:glusterd_mgmt_v3_lock] (-->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0x24a7b) [0x7f2e41ecba7b] -->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0x242d2) [0x7f2e41ecb2d2] -->/usr/lib64/glusterfs/3.8.4/xlator/mgmt/glusterd.so(+0xdda56) [0x7f2e41f84a56] ) 0-management: Lock for supervol01 held by 8bbb0dba-d052-4321-b460-030465d89d70
[2018-07-25 14:48:39.350597] E [MSGID: 106119] [glusterd-handler.c:794:glusterd_op_txn_begin] 0-management: Unable to acquire lock for supervol01
The message "I [MSGID: 106488] [glusterd-handler.c:1540:__glusterd_handle_cli_get_volume] 0-management: Received get vol req" repeated 7 times between [2018-07-25 14:47:39.093345] and [2018-07-25 14:48:39.075899]
[2018-07-25 14:48:39.185182] I [MSGID: 106494] [glusterd-handler.c:3087:__glusterd_handle_cli_profile_volume] 0-management: Received volume profile req for volume gluster_shared_storage
[2018-07-25 14:48:39.221954] E [MSGID: 106524] [glusterd-op-sm.c:1987:glusterd_op_stage_stats_volume] 0-glusterd: Profile on Volume gluster_shared_storage is not started
[2018-07-25 14:48:39.464297] I [MSGID: 106494] [glusterd-handler.c:3087:__glusterd_handle_cli_profile_volume] 0-management: Received volume profile req for volume supervole2a03
[2018-07-25 14:48:47.279277] E [rpc-clnt.c:200:call_bail] 0-management: bailing out frame type(brick operations) op(--(2)) xid = 0xe3a2 sent = 2018-07-25 14:38:39.424347. timeout = 600 for /var/run/gluster/ac7892a429909ee0963384af107bfd9e.socket
[2018-07-25 14:48:47.284340] I [socket.c:3659:socket_submit_reply] 0-socket.management: not connected (priv->connected = -1)
[2018-07-25 14:48:47.284349] E [rpcsvc.c:1333:rpcsvc_submit_generic] 0-rpc-service: failed to submit message (XID: 0x1, Program: GlusterD svc cli, ProgVers: 2, Proc: 23) to rpc-transport (socket.management)
[2018-07-25 14:48:47.284381] E [MSGID: 106430] [glusterd-utils.c:539:glusterd_submit_reply] 0-glusterd: Reply submission failed
[2018-07-25 14:49:39.094685] I [MSGID: 106488] [glusterd-handler.c:1540:__glusterd_handle_cli_get_volume] 0-management: Received get vol req
[2018-07-25 14:49:39.207428] I [MSGID: 106494] [glusterd-handler.c:3087:__glusterd_handle_cli_profile_volume] 0-management: Received volume profile req for volume gluster_shared_storage
[2018-07-25 14:49:39.245229] E [MSGID: 106524] [glusterd-op-sm.c:1987:glusterd_op_stage_stats_volume] 0-glusterd: Profile on Volume gluster_shared_storage is not started
[2018-07-25 14:49:39.245269] E [MSGID: 106301] [glusterd-op-sm.c:4553:glusterd_op_ac_send_stage_op] 0-management: Staging of operation 'Volume Profile' failed on localhost : Profile on Volume gluster_shared_storage is not started
[2018-07-25 14:49:39.369411] I [MSGID: 106494] [glusterd-handler.c:3087:__glusterd_handle_cli_profile_volume] 0-management: Received volume profile req for volume supervol01
[2018-07-25 14:49:39.404238] E [MSGID: 106275] [glusterd-rpc-ops.c:878:glusterd_mgmt_v3_lock_peers_cbk_fn] 0-management: Received mgmt_v3 lock RJT from uuid: d5c6e41c-01ec-4879-8b78-f8a8d2573d7c
[2018-07-25 14:49:39.404692] E [MSGID: 106275] [glusterd-rpc-ops.c:878:glusterd_mgmt_v3_lock_peers_cbk_fn] 0-management: Received mgmt_v3 lock RJT from uuid: 7e853480-f053-46f2-9534-1f59a330f4d4
[2018-07-25 14:49:39.536180] I [MSGID: 106494] [glusterd-handler.c:3087:__glusterd_handle_cli_profile_volume] 0-management: Received volume profile req for volume supervole2a03
[2018-07-25 14:50:39.243598] E [MSGID: 106524] [glusterd-op-sm.c:1987:glusterd_op_stage_stats_volume] 0-glusterd: Profile on Volume gluster_shared_storage is not started
[2018-07-25 14:50:39.243603] E [MSGID: 106301] [glusterd-op-sm.c:4553:glusterd_op_ac_send_stage_op] 0-management: Staging of operation 'Volume Profile' failed on localhost : Profile on Volume gluster_shared_storage is not started
[2018-07-25 14:50:39.372516] I [MSGID: 106494] [glusterd-handler.c:3087:__glusterd_handle_cli_profile_volume] 0-management: Received volume profile req for volume supervol01
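Note that the bad length 18446744072547383994 is 0xffffffffbabababa, i.e. the 32-bit pattern 0xbabababa sign-extended to 64 bits and reinterpreted as an unsigned size, which is consistent with the dict value's len field holding garbage rather than a real string length. Below is a minimal, self-contained sketch of the conversion pattern visible in frames #0-#2 (simplified names and checks, not the upstream dict.c code), showing why a garbage length turns dict_get_int32() into a faulting memcpy():

/*
 * Minimal sketch, assuming the dict.c-era conversion pattern visible in
 * frames #0-#2 (simplified names; not the upstream implementation):
 * dict_get_int32() fetches a data_t whose value is stored as a string,
 * and the conversion trusts data->len when copying it into a local
 * buffer. If data->len holds garbage such as 0xbabababa, the negative
 * int32 becomes a huge size_t and the memcpy() faults, as in thread 1.
 */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char    *data;  /* value stored as a NUL-terminated string */
    int32_t  len;   /* length of the stored string, including the NUL */
} data_t;

static int
data_to_int32_sketch(const data_t *data, int32_t *val)
{
    if (!data || !data->data || !val)
        return -1;

    /* Guard the crashing path effectively lacked: reject lengths that
     * cannot be a sane string length before handing them to memcpy. */
    if (data->len <= 0 || data->len > 64)
        return -1;

    char buf[65];
    memcpy(buf, data->data, (size_t)data->len);  /* analogue of the faulting memcpy() in frame #0 */
    buf[data->len] = '\0';
    *val = (int32_t)strtol(buf, NULL, 0);
    return 0;
}

int
main(void)
{
    /* A healthy "count" entry, the key being read in frame #2. */
    data_t good = { .data = "2", .len = 2 };
    int32_t count = 0;

    if (data_to_int32_sketch(&good, &count) == 0)
        printf("count = %d\n", count);

    /* A corrupted entry like the one in the core: len = (int32_t)0xbabababa.
     * Without the length guard above, memcpy() would be handed
     * (size_t)0xffffffffbabababa = 18446744072547383994 and crash. */
    data_t bad = { .data = "2", .len = (int32_t)0xbabababa };
    if (data_to_int32_sketch(&bad, &count) != 0)
        printf("rejected corrupted value instead of crashing\n");

    return 0;
}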
I am OK with deferring this if we have not been able to reproduce the issue. Would John be able to reproduce it?
Based on the comment that we have not been able to reproduce this after following the same procedure as John, I am OK with deferring it.
Requires QE assessment here.
I have now hit this in RHGS 3.4.0.
(In reply to John Strunk from comment #23)
> I have now hit this in RHGS 3.4.0.

John, in the email exchange you highlighted that the core wasn't available. I am just trying to understand whether you have observed any patterns that would let us conclude this is the same backtrace.
I was able to fix the config problem and get cores after my initial e-mail.

Backtrace:

Core was generated by `/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f185e867128 in memcpy (__len=18446744072547383994, __src=0x7f18400760d0, __dest=0x7f18941ea6b0) at /usr/include/bits/string3.h:51
51  /usr/include/bits/string3.h: No such file or directory.
Missing separate debuginfos, use: debuginfo-install bzip2-libs-1.0.6-13.el7.x86_64 elfutils-libelf-0.170-4.el7.x86_64 elfutils-libs-0.170-4.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-19.el7.x86_64 libattr-2.4.46-13.el7.x86_64 libcap-2.22-9.el7.x86_64 libcom_err-1.42.9-12.el7_5.x86_64 libselinux-2.5-12.el7.x86_64 libsepol-2.5-8.1.el7.x86_64 pcre-8.32-17.el7.x86_64 systemd-libs-219-57.el7_5.1.x86_64 xz-libs-5.2.2-1.el7.x86_64
(gdb) bt
#0  0x00007f185e867128 in memcpy (__len=18446744072547383994, __src=0x7f18400760d0, __dest=0x7f18941ea6b0) at /usr/include/bits/string3.h:51
#1  data_to_int32_ptr (data=0x7f1840076020, val=val@entry=0x7f184ed9621c) at dict.c:1555
#2  0x00007f185e86a78d in dict_get_int32 (this=this@entry=0x7f181001f160, key=key@entry=0x7f1853412f5b "count", val=val@entry=0x7f184ed9621c) at dict.c:1813
#3  0x00007f1853335191 in glusterd_profile_volume_use_rsp_dict (aggr=0x7f181001f160, rsp_dict=0x7f184802fff0) at glusterd-utils.c:9950
#4  0x00007f185334a4ee in __glusterd_commit_op_cbk (req=req@entry=0x7f18380b3bf0, iov=iov@entry=0x7f18380b3c30, count=count@entry=1, myframe=myframe@entry=0x7f183809d5d0) at glusterd-rpc-ops.c:1436
#5  0x00007f185334c0ea in glusterd_big_locked_cbk (req=0x7f18380b3bf0, iov=0x7f18380b3c30, count=1, myframe=0x7f183809d5d0, fn=0x7f185334a070 <__glusterd_commit_op_cbk>) at glusterd-rpc-ops.c:223
#6  0x00007f185e638960 in rpc_clnt_handle_reply (clnt=clnt@entry=0x561848c16c50, pollin=pollin@entry=0x7f184801a820) at rpc-clnt.c:778
#7  0x00007f185e638d03 in rpc_clnt_notify (trans=<optimized out>, mydata=0x561848c16c80, event=<optimized out>, data=0x7f184801a820) at rpc-clnt.c:971
#8  0x00007f185e634a73 in rpc_transport_notify (this=this@entry=0x561848c17990, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7f184801a820) at rpc-transport.c:538
#9  0x00007f1850542566 in socket_event_poll_in (this=this@entry=0x561848c17990, notify_handled=notify_handled@entry=_gf_false) at socket.c:2315
#10 0x00007f1850545242 in socket_poller (ctx=0x561848c17990) at socket.c:2590
#11 0x00007f185d6cfdd5 in start_thread (arg=0x7f184ed97700) at pthread_create.c:308
#12 0x00007f185cf98b3d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

This appears to be caused by an interaction w/ other processes on the machine. Last night, glusterd was repeatedly hitting this, crashing every 5-10 min. I was able to stabilize glusterd by rebooting the server.
# rpm -qa | grep gluster
glusterfs-fuse-3.12.2-18.el7rhgs.x86_64
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
glusterfs-client-xlators-3.12.2-18.el7rhgs.x86_64
glusterfs-rdma-3.12.2-18.el7rhgs.x86_64
python2-gluster-3.12.2-18.el7rhgs.x86_64
glusterfs-cli-3.12.2-18.el7rhgs.x86_64
glusterfs-api-3.12.2-18.el7rhgs.x86_64
glusterfs-server-3.12.2-18.el7rhgs.x86_64
libvirt-daemon-driver-storage-gluster-3.9.0-14.el7_5.7.x86_64
pcp-pmda-gluster-4.1.0-0.201805281909.git68ab4b18.el7.x86_64
gluster-nagios-addons-0.2.10-2.el7rhgs.x86_64
glusterfs-3.12.2-18.el7rhgs.x86_64
vdsm-gluster-4.19.43-2.3.el7rhgs.noarch
glusterfs-libs-3.12.2-18.el7rhgs.x86_64
glusterfs-geo-replication-3.12.2-18.el7rhgs.x86_64
glusterfs-debuginfo-3.12.2-18.el7rhgs.x86_64
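For context on frame #3 in both backtraces: glusterd_profile_volume_use_rsp_dict() merges a peer's commit-op response into the transaction's aggregate dict, and the crashing dict_get_int32() is the read of the running "count" from that aggregate (the dict address in frame #2 matches the aggr argument in frame #3 in both cores). Below is a simplified, self-contained sketch of that merge flow; kv_store, kv_get/kv_set and merge_profile_rsp are invented stand-ins for dict_t and the real glusterd code, so treat it as an illustration only, not the upstream implementation.

/*
 * Illustration only: a stand-in for the aggregation done in frame #3.
 * glusterd keeps a running brick "count" in the transaction's aggregate
 * dict and folds in each peer's commit-op response; the crash hit the
 * read of that running count from the aggregate dict.
 */
#include <stdio.h>
#include <string.h>

#define MAX_KEYS 8

typedef struct {
    const char *keys[MAX_KEYS];
    int         vals[MAX_KEYS];
    int         nkeys;
} kv_store;   /* invented stand-in for dict_t */

static int
kv_get(const kv_store *s, const char *key, int *out)
{
    for (int i = 0; i < s->nkeys; i++) {
        if (strcmp(s->keys[i], key) == 0) {
            *out = s->vals[i];
            return 0;
        }
    }
    return -1;
}

static void
kv_set(kv_store *s, const char *key, int val)
{
    for (int i = 0; i < s->nkeys; i++) {
        if (strcmp(s->keys[i], key) == 0) {
            s->vals[i] = val;
            return;
        }
    }
    if (s->nkeys < MAX_KEYS) {
        s->keys[s->nkeys] = key;
        s->vals[s->nkeys] = val;
        s->nkeys++;
    }
}

/* Fold one peer's response into the aggregate, as frame #3 does with
 * dict_get_int32()/dict_set_int32(). Reading "count" from aggr is the
 * step where the corrupted dict value was hit in the cores. */
static int
merge_profile_rsp(kv_store *aggr, const kv_store *rsp)
{
    int count = 0, rsp_count = 0;

    if (kv_get(rsp, "count", &rsp_count) != 0)
        return -1;

    if (kv_get(aggr, "count", &count) != 0)   /* crashing read in frames #2/#3 */
        count = 0;

    kv_set(aggr, "count", count + rsp_count);
    return 0;
}

int
main(void)
{
    kv_store aggr = { .nkeys = 0 };
    kv_store rsp1 = { .keys = { "count" }, .vals = { 2 }, .nkeys = 1 };
    kv_store rsp2 = { .keys = { "count" }, .vals = { 3 }, .nkeys = 1 };
    int total = 0;

    merge_profile_rsp(&aggr, &rsp1);
    merge_profile_rsp(&aggr, &rsp2);
    kv_get(&aggr, "count", &total);
    printf("aggregated brick count = %d\n", total);   /* prints 5 */
    return 0;
}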
upstream patch: https://review.gluster.org/#/c/glusterfs/+/21736/
@Bala: here we need to ensure that profile commands keep working as expected while doing an in-service upgrade.

@Nithya, please add anything else you want tested.
(In reply to Sanju from comment #55)
> @Bala: here we need to ensure that profile commands keep working as
> expected while doing an in-service upgrade.
>
> @Nithya, please add anything else you want tested.

There is nothing else that I can think of.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0263