Description of problem:

Glusterd process is periodically crashing with a segmentation fault. This happens occasionally on some of our nodes. I've been unable to determine a reason.

Dec 18 18:13:53 ch1c7ocvgl01 systemd: glusterd.service: main process exited, code=killed, status=11/SEGV
Dec 18 19:02:49 ch1c7ocvgl01 systemd: glusterd.service: main process exited, code=killed, status=11/SEGV
Dec 19 18:24:15 ch1c7ocvgl01 systemd: glusterd.service: main process exited, code=killed, status=11/SEGV
Dec 21 05:45:39 ch1c7ocvgl01 systemd: glusterd.service: main process exited, code=killed, status=11/SEGV

Version-Release number of selected component (if applicable):

[root@ch1c7ocvgl01 ~]# cat /etc/redhat-release
CentOS Linux release 7.6.1810 (Core)

[root@ch1c7ocvgl01 /]# rpm -qa | grep gluster
glusterfs-libs-6.1-1.el7.x86_64
glusterfs-server-6.1-1.el7.x86_64
tendrl-gluster-integration-1.6.3-10.el7.noarch
centos-release-gluster6-1.0-1.el7.centos.noarch
python2-gluster-6.1-1.el7.x86_64
centos-release-gluster5-1.0-1.el7.centos.noarch
glusterfs-api-6.1-1.el7.x86_64
nfs-ganesha-gluster-2.8.2-1.el7.x86_64
glusterfs-client-xlators-6.1-1.el7.x86_64
glusterfs-cli-6.1-1.el7.x86_64
glusterfs-6.1-1.el7.x86_64
glusterfs-fuse-6.1-1.el7.x86_64
glusterfs-events-6.1-1.el7.x86_64

How reproducible:
Unable to reproduce at this time. Issue occurs periodically with an indeterminate cause.

Steps to Reproduce:
N/A

Actual results:
N/A

Expected results:
glusterd should not crash with a segmentation fault.

Additional info:
Several core dumps are located here (too large to attach):
https://nextcloud.anthonywingerter.net/index.php/s/3n5sSE3SNxfyeyj

Please let me know what further info I can provide.

[root@ch1c7ocvgl01 ~]# gluster volume info

Volume Name: autosfx-prd
Type: Distributed-Replicate
Volume ID: 25e6b3a9-f339-4439-b41e-6084c7527320
Status: Started
Snapshot Count: 0
Number of Bricks: 3 x (2 + 1) = 9
Transport-type: tcp
Bricks:
Brick1: ch1c7ocvgl01:/covisint/gluster/autosfx/brick01
Brick2: ch1c7ocvgl02:/covisint/gluster/autosfx/brick02
Brick3: ch1c7ocvga11:/covisint/gluster/autosfx/brick03 (arbiter)
Brick4: ch1c7ocvgl03:/covisint/gluster/autosfx/brick04
Brick5: ch1c7ocvgl04:/covisint/gluster/autosfx/brick05
Brick6: ch1c7ocvga11:/covisint/gluster/autosfx/brick06 (arbiter)
Brick7: ch1c7ocvgl05:/covisint/gluster/autosfx/brick07
Brick8: ch1c7ocvgl06:/covisint/gluster/autosfx/brick08
Brick9: ch1c7ocvga11:/covisint/gluster/autosfx/brick09 (arbiter)
Options Reconfigured:
nfs.disable: on
performance.client-io-threads: off
transport.address-family: inet
cluster.lookup-optimize: on
performance.stat-prefetch: on
server.event-threads: 16
client.event-threads: 16
performance.cache-invalidation: on
performance.read-ahead: on
storage.fips-mode-rchecksum: on
performance.cache-size: 6GB
features.ctime: on
cluster.self-heal-daemon: enable
diagnostics.latency-measurement: on
diagnostics.count-fop-hits: on
diagnostics.brick-log-level: ERROR
diagnostics.client-log-level: ERROR
cluster.data-self-heal-algorithm: full
cluster.background-self-heal-count: 256
cluster.rebalance-stats: on
cluster.readdir-optimize: on
cluster.metadata-self-heal: on
cluster.data-self-heal: on
cluster.heal-timeout: 500
cluster.quorum-type: auto
cluster.self-heal-window-size: 2
cluster.self-heal-readdir-size: 2KB
network.ping-timeout: 15
cluster.eager-lock: on
performance.io-thread-count: 16
cluster.shd-max-threads: 64
cluster.shd-wait-qlength: 4096
performance.write-behind-window-size: 8MB
cluster.enable-shared-storage: enable

Volume Name: gluster_shared_storage
Type: Replicate
Volume ID: 50e7c3e8-adb9-427f-ae56-c327829a7d34
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: ch1c7ocvgl02.covisint.net:/var/lib/glusterd/ss_brick
Brick2: ch1c7ocvgl03.covisint.net:/var/lib/glusterd/ss_brick
Brick3: ch1c7ocvgl01.covisint.net:/var/lib/glusterd/ss_brick
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
diagnostics.latency-measurement: on
diagnostics.count-fop-hits: on
cluster.enable-shared-storage: enable

Volume Name: hc-pstore-prd
Type: Distributed-Replicate
Volume ID: 1947247c-b3e0-4bd9-b808-011273e45195
Status: Started
Snapshot Count: 0
Number of Bricks: 3 x (2 + 1) = 9
Transport-type: tcp
Bricks:
Brick1: ch1c7ocvgl01:/covisint/gluster/hc-pstore-prd/brick01
Brick2: ch1c7ocvgl02:/covisint/gluster/hc-pstore-prd/brick02
Brick3: ch1c7ocvga11:/covisint/gluster/hc-pstore-prd/brick03 (arbiter)
Brick4: ch1c7ocvgl03:/covisint/gluster/hc-pstore-prd/brick04
Brick5: ch1c7ocvgl04:/covisint/gluster/hc-pstore-prd/brick05
Brick6: ch1c7ocvga11:/covisint/gluster/hc-pstore-prd/brick06 (arbiter)
Brick7: ch1c7ocvgl05:/covisint/gluster/hc-pstore-prd/brick07
Brick8: ch1c7ocvgl06:/covisint/gluster/hc-pstore-prd/brick08
Brick9: ch1c7ocvga11:/covisint/gluster/hc-pstore-prd/brick09 (arbiter)
Options Reconfigured:
auth.allow: exlap1354.covisint.net,exlap1355.covisint.net
performance.write-behind-window-size: 8MB
cluster.shd-wait-qlength: 4096
cluster.shd-max-threads: 64
performance.io-thread-count: 16
cluster.eager-lock: on
network.ping-timeout: 15
cluster.self-heal-readdir-size: 2KB
cluster.self-heal-window-size: 2
cluster.quorum-type: auto
cluster.heal-timeout: 500
cluster.data-self-heal: on
cluster.metadata-self-heal: on
cluster.readdir-optimize: on
cluster.rebalance-stats: on
cluster.background-self-heal-count: 256
cluster.data-self-heal-algorithm: full
diagnostics.client-log-level: ERROR
diagnostics.brick-log-level: ERROR
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
cluster.self-heal-daemon: enable
features.ctime: on
performance.cache-size: 2GB
storage.fips-mode-rchecksum: on
performance.read-ahead: on
performance.cache-invalidation: on
client.event-threads: 8
server.event-threads: 8
performance.stat-prefetch: on
cluster.lookup-optimize: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
cluster.enable-shared-storage: enable

Volume Name: plink-prd
Type: Distributed-Replicate
Volume ID: f146a391-c92e-4965-9026-09f16d2d1c53
Status: Started
Snapshot Count: 0
Number of Bricks: 3 x (2 + 1) = 9
Transport-type: tcp
Bricks:
Brick1: ch1c7ocvgl01:/covisint/gluster/plink/brick01
Brick2: ch1c7ocvgl02:/covisint/gluster/plink/brick02
Brick3: ch1c7ocvga11:/covisint/gluster/plink/brick03 (arbiter)
Brick4: ch1c7ocvgl03:/covisint/gluster/plink/brick04
Brick5: ch1c7ocvgl04:/covisint/gluster/plink/brick05
Brick6: ch1c7ocvga11:/covisint/gluster/plink/brick06 (arbiter)
Brick7: ch1c7ocvgl05:/covisint/gluster/plink/brick07
Brick8: ch1c7ocvgl06:/covisint/gluster/plink/brick08
Brick9: ch1c7ocvga11:/covisint/gluster/plink/brick09 (arbiter)
Options Reconfigured:
nfs.disable: on
performance.client-io-threads: off
transport.address-family: inet
cluster.lookup-optimize: on
performance.stat-prefetch: on
server.event-threads: 16
client.event-threads: 16
performance.cache-invalidation: on
performance.read-ahead: on
storage.fips-mode-rchecksum: on
performance.cache-size: 3800MB
features.ctime: on
cluster.self-heal-daemon: enable
diagnostics.latency-measurement: on
diagnostics.count-fop-hits: on
diagnostics.brick-log-level: ERROR
diagnostics.client-log-level: ERROR
cluster.data-self-heal-algorithm: full
cluster.background-self-heal-count: 256
cluster.rebalance-stats: on
cluster.readdir-optimize: on
cluster.metadata-self-heal: on
cluster.data-self-heal: on
cluster.heal-timeout: 500
cluster.quorum-type: auto
cluster.self-heal-window-size: 2
cluster.self-heal-readdir-size: 2KB
network.ping-timeout: 15
cluster.eager-lock: on
performance.io-thread-count: 16
cluster.shd-max-threads: 64
cluster.shd-wait-qlength: 4096
performance.write-behind-window-size: 8MB
cluster.enable-shared-storage: enable

Volume Name: pstore-prd
Type: Distributed-Replicate
Volume ID: d77c45ef-19ca-4add-9dac-1bc401244395
Status: Started
Snapshot Count: 0
Number of Bricks: 3 x (2 + 1) = 9
Transport-type: tcp
Bricks:
Brick1: ch1c7ocvgl01:/covisint/gluster/pstore-prd/brick01
Brick2: ch1c7ocvgl02:/covisint/gluster/pstore-prd/brick02
Brick3: ch1c7ocvga11:/covisint/gluster/pstore-prd/brick03 (arbiter)
Brick4: ch1c7ocvgl03:/covisint/gluster/pstore-prd/brick04
Brick5: ch1c7ocvgl04:/covisint/gluster/pstore-prd/brick05
Brick6: ch1c7ocvga11:/covisint/gluster/pstore-prd/brick06 (arbiter)
Brick7: ch1c7ocvgl05:/covisint/gluster/pstore-prd/brick07
Brick8: ch1c7ocvgl06:/covisint/gluster/pstore-prd/brick08
Brick9: ch1c7ocvga11:/covisint/gluster/pstore-prd/brick09 (arbiter)
Options Reconfigured:
cluster.min-free-disk: 1GB
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
cluster.lookup-optimize: on
performance.stat-prefetch: on
server.event-threads: 16
client.event-threads: 16
performance.cache-invalidation: on
performance.read-ahead: on
storage.fips-mode-rchecksum: on
performance.cache-size: 6GB
features.ctime: on
cluster.self-heal-daemon: enable
diagnostics.latency-measurement: on
diagnostics.count-fop-hits: on
diagnostics.brick-log-level: ERROR
diagnostics.client-log-level: ERROR
cluster.data-self-heal-algorithm: full
cluster.background-self-heal-count: 256
cluster.rebalance-stats: on
cluster.readdir-optimize: on
cluster.metadata-self-heal: on
cluster.data-self-heal: on
cluster.heal-timeout: 500
cluster.quorum-type: auto
cluster.self-heal-window-size: 2
cluster.self-heal-readdir-size: 2KB
network.ping-timeout: 15
cluster.eager-lock: on
performance.io-thread-count: 16
cluster.shd-max-threads: 64
cluster.shd-wait-qlength: 4096
performance.write-behind-window-size: 8MB
auth.allow: exlap779.covisint.net,exlap780.covisint.net
cluster.enable-shared-storage: enable

Volume Name: rvsshare-prd
Type: Distributed-Replicate
Volume ID: bee2d0f7-9215-4be8-9fc6-302fd568d5ed
Status: Started
Snapshot Count: 0
Number of Bricks: 3 x (2 + 1) = 9
Transport-type: tcp
Bricks:
Brick1: ch1c7ocvgl01:/covisint/gluster/rvsshare-prd/brick01
Brick2: ch1c7ocvgl02:/covisint/gluster/rvsshare-prd/brick02
Brick3: ch1c7ocvga11:/covisint/gluster/rvsshare-prd/brick03 (arbiter)
Brick4: ch1c7ocvgl03:/covisint/gluster/rvsshare-prd/brick04
Brick5: ch1c7ocvgl04:/covisint/gluster/rvsshare-prd/brick05
Brick6: ch1c7ocvga11:/covisint/gluster/rvsshare-prd/brick06 (arbiter)
Brick7: ch1c7ocvgl05:/covisint/gluster/rvsshare-prd/brick07
Brick8: ch1c7ocvgl06:/covisint/gluster/rvsshare-prd/brick08
Brick9: ch1c7ocvga11:/covisint/gluster/rvsshare-prd/brick09 (arbiter)
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
cluster.lookup-optimize: on
performance.stat-prefetch: on
server.event-threads: 16
client.event-threads: 16
performance.cache-invalidation: on
performance.read-ahead: on
storage.fips-mode-rchecksum: on
performance.cache-size: 6GB
features.ctime: off
cluster.self-heal-daemon: enable
diagnostics.latency-measurement: on
diagnostics.count-fop-hits: on
diagnostics.brick-log-level: ERROR
diagnostics.client-log-level: ERROR
cluster.data-self-heal-algorithm: full
cluster.background-self-heal-count: 256
cluster.rebalance-stats: on
cluster.readdir-optimize: on
cluster.metadata-self-heal: on
cluster.data-self-heal: on
cluster.heal-timeout: 500
cluster.quorum-type: auto
cluster.self-heal-window-size: 2
cluster.self-heal-readdir-size: 2KB
network.ping-timeout: 15
cluster.eager-lock: on
performance.io-thread-count: 16
cluster.shd-max-threads: 64
cluster.shd-wait-qlength: 4096
performance.write-behind-window-size: 8MB
auth.allow: exlap825.covisint.net,exlap826.covisint.net
cluster.enable-shared-storage: enable

Volume Name: test
Type: Distributed-Replicate
Volume ID: 07c36821-382d-45bd-9f17-e7e48811d2a2
Status: Started
Snapshot Count: 0
Number of Bricks: 3 x (2 + 1) = 9
Transport-type: tcp
Bricks:
Brick1: ch1c7ocvgl01:/covisint/gluster/test/brick01
Brick2: ch1c7ocvgl02:/covisint/gluster/test/brick02
Brick3: ch1c7ocvga11:/covisint/gluster/test/brick03 (arbiter)
Brick4: ch1c7ocvgl03:/covisint/gluster/test/brick04
Brick5: ch1c7ocvgl04:/covisint/gluster/test/brick05
Brick6: ch1c7ocvga11:/covisint/gluster/test/brick06 (arbiter)
Brick7: ch1c7ocvgl05:/covisint/gluster/test/brick07
Brick8: ch1c7ocvgl06:/covisint/gluster/test/brick08
Brick9: ch1c7ocvga11:/covisint/gluster/test/brick09 (arbiter)
Options Reconfigured:
performance.write-behind-window-size: 8MB
cluster.shd-wait-qlength: 4096
cluster.shd-max-threads: 64
performance.io-thread-count: 16
cluster.eager-lock: on
network.ping-timeout: 15
cluster.self-heal-readdir-size: 2KB
cluster.self-heal-window-size: 2
cluster.quorum-type: auto
cluster.heal-timeout: 500
cluster.data-self-heal: on
cluster.metadata-self-heal: on
cluster.readdir-optimize: on
cluster.rebalance-stats: on
cluster.background-self-heal-count: 256
cluster.data-self-heal-algorithm: full
diagnostics.client-log-level: ERROR
diagnostics.brick-log-level: ERROR
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
cluster.self-heal-daemon: enable
performance.cache-size: 2GB
storage.fips-mode-rchecksum: on
performance.read-ahead: on
performance.cache-invalidation: on
client.event-threads: 16
server.event-threads: 16
performance.stat-prefetch: on
cluster.lookup-optimize: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
cluster.enable-shared-storage: enable

[root@ch1c7ocvgl01 ~]# gluster volume status
Status of volume: autosfx-prd
Gluster process                                        TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick ch1c7ocvgl01:/covisint/gluster/autosfx/brick01   49152     0          Y       8316
Brick ch1c7ocvgl02:/covisint/gluster/autosfx/brick02   49152     0          Y       8310
Brick ch1c7ocvga11:/covisint/gluster/autosfx/brick03   49152     0          Y       8688
Brick ch1c7ocvgl03:/covisint/gluster/autosfx/brick04   49152     0          Y       8388
Brick ch1c7ocvgl04:/covisint/gluster/autosfx/brick05   49152     0          Y       7705
Brick ch1c7ocvga11:/covisint/gluster/autosfx/brick06   49153     0          Y       8689
Brick ch1c7ocvgl05:/covisint/gluster/autosfx/brick07   49152     0          Y       8128
Brick ch1c7ocvgl06:/covisint/gluster/autosfx/brick08   49152     0          Y       7811
Brick ch1c7ocvga11:/covisint/gluster/autosfx/brick09   49154     0          Y       8690
Self-heal Daemon on localhost                          N/A       N/A        Y       15133
Self-heal Daemon on ch1c7ocvgl05.covisint.net          N/A       N/A        Y       13966
Self-heal Daemon on ch1c7ocvgl04.covisint.net          N/A       N/A        Y       25439
Self-heal Daemon on ch1c7ocvgl03.covisint.net          N/A       N/A        Y       27470
Self-heal Daemon on ch1c7ocvga11.covisint.net          N/A       N/A        Y       4772
Self-heal Daemon on ch1c7ocvgl02                       N/A       N/A        Y       30524
Self-heal Daemon on ch1c7ocvgl06.covisint.net          N/A       N/A        Y       10152

Task Status of Volume autosfx-prd
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: gluster_shared_storage
Gluster process                                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick ch1c7ocvgl02.covisint.net:/var/lib/glusterd/ss_brick  49153     0          Y       8319
Brick ch1c7ocvgl03.covisint.net:/var/lib/glusterd/ss_brick  49153     0          Y       8381
Brick ch1c7ocvgl01.covisint.net:/var/lib/glusterd/ss_brick  49153     0          Y       8332
Self-heal Daemon on localhost                               N/A       N/A        Y       15133
Self-heal Daemon on ch1c7ocvgl05.covisint.net               N/A       N/A        Y       13966
Self-heal Daemon on ch1c7ocvga11.covisint.net               N/A       N/A        Y       4772
Self-heal Daemon on ch1c7ocvgl04.covisint.net               N/A       N/A        Y       25439
Self-heal Daemon on ch1c7ocvgl03.covisint.net               N/A       N/A        Y       27470
Self-heal Daemon on ch1c7ocvgl02                            N/A       N/A        Y       30524
Self-heal Daemon on ch1c7ocvgl06.covisint.net               N/A       N/A        Y       10152

Task Status of Volume gluster_shared_storage
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: hc-pstore-prd
Gluster process                                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick ch1c7ocvgl01:/covisint/gluster/hc-pstore-prd/brick01  49156     0          Y       15244
Brick ch1c7ocvgl02:/covisint/gluster/hc-pstore-prd/brick02  49155     0          Y       30807
Brick ch1c7ocvga11:/covisint/gluster/hc-pstore-prd/brick03  49155     0          Y       8755
Brick ch1c7ocvgl03:/covisint/gluster/hc-pstore-prd/brick04  49156     0          Y       14874
Brick ch1c7ocvgl04:/covisint/gluster/hc-pstore-prd/brick05  49154     0          Y       21306
Brick ch1c7ocvga11:/covisint/gluster/hc-pstore-prd/brick06  49156     0          Y       8734
Brick ch1c7ocvgl05:/covisint/gluster/hc-pstore-prd/brick07  49156     0          Y       7865
Brick ch1c7ocvgl06:/covisint/gluster/hc-pstore-prd/brick08  49154     0          Y       5401
Brick ch1c7ocvga11:/covisint/gluster/hc-pstore-prd/brick09  49157     0          Y       8744
Self-heal Daemon on localhost                               N/A       N/A        Y       15133
Self-heal Daemon on ch1c7ocvgl05.covisint.net               N/A       N/A        Y       13966
Self-heal Daemon on ch1c7ocvgl03.covisint.net               N/A       N/A        Y       27470
Self-heal Daemon on ch1c7ocvga11.covisint.net               N/A       N/A        Y       4772
Self-heal Daemon on ch1c7ocvgl02                            N/A       N/A        Y       30524
Self-heal Daemon on ch1c7ocvgl04.covisint.net               N/A       N/A        Y       25439
Self-heal Daemon on ch1c7ocvgl06.covisint.net               N/A       N/A        Y       10152

Task Status of Volume hc-pstore-prd
------------------------------------------------------------------------------
There are no active volume tasks

Another transaction is in progress for plink-prd. Please try again after some time.
Status of volume: pstore-prd
Gluster process                                          TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick ch1c7ocvgl01:/covisint/gluster/pstore-prd/brick01  49155     0          Y       23221
Brick ch1c7ocvgl02:/covisint/gluster/pstore-prd/brick02  49156     0          Y       7888
Brick ch1c7ocvga11:/covisint/gluster/pstore-prd/brick03  49161     0          Y       8835
Brick ch1c7ocvgl03:/covisint/gluster/pstore-prd/brick04  49155     0          Y       18838
Brick ch1c7ocvgl04:/covisint/gluster/pstore-prd/brick05  49155     0          Y       18114
Brick ch1c7ocvga11:/covisint/gluster/pstore-prd/brick06  49162     0          Y       8848
Brick ch1c7ocvgl05:/covisint/gluster/pstore-prd/brick07  49155     0          Y       24013
Brick ch1c7ocvgl06:/covisint/gluster/pstore-prd/brick08  49155     0          Y       9192
Brick ch1c7ocvga11:/covisint/gluster/pstore-prd/brick09  49163     0          Y       8859
Self-heal Daemon on localhost                            N/A       N/A        Y       15133
Self-heal Daemon on ch1c7ocvga11.covisint.net            N/A       N/A        Y       4772
Self-heal Daemon on ch1c7ocvgl03.covisint.net            N/A       N/A        Y       27470
Self-heal Daemon on ch1c7ocvgl05.covisint.net            N/A       N/A        Y       13966
Self-heal Daemon on ch1c7ocvgl04.covisint.net            N/A       N/A        Y       25439
Self-heal Daemon on ch1c7ocvgl06.covisint.net            N/A       N/A        Y       10152
Self-heal Daemon on ch1c7ocvgl02                         N/A       N/A        Y       30524

Task Status of Volume pstore-prd
------------------------------------------------------------------------------
There are no active volume tasks

Another transaction is in progress for rvsshare-prd. Please try again after some time.

Status of volume: test
Gluster process                                    TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick ch1c7ocvgl01:/covisint/gluster/test/brick01  49158     0          Y       20468
Brick ch1c7ocvgl02:/covisint/gluster/test/brick02  49158     0          Y       30442
Brick ch1c7ocvga11:/covisint/gluster/test/brick03  49167     0          Y       8966
Brick ch1c7ocvgl03:/covisint/gluster/test/brick04  49158     0          Y       27364
Brick ch1c7ocvgl04:/covisint/gluster/test/brick05  49156     0          Y       19154
Brick ch1c7ocvga11:/covisint/gluster/test/brick06  49168     0          Y       8980
Brick ch1c7ocvgl05:/covisint/gluster/test/brick07  49157     0          Y       13820
Brick ch1c7ocvgl06:/covisint/gluster/test/brick08  49157     0          Y       10030
Brick ch1c7ocvga11:/covisint/gluster/test/brick09  49169     0          Y       9015
Self-heal Daemon on localhost                      N/A       N/A        Y       15133
Self-heal Daemon on ch1c7ocvgl03.covisint.net      N/A       N/A        Y       27470
Self-heal Daemon on ch1c7ocvgl05.covisint.net      N/A       N/A        Y       13966
Self-heal Daemon on ch1c7ocvgl04.covisint.net      N/A       N/A        Y       25439
Self-heal Daemon on ch1c7ocvga11.covisint.net      N/A       N/A        Y       4772
Self-heal Daemon on ch1c7ocvgl02                   N/A       N/A        Y       30524
Self-heal Daemon on ch1c7ocvgl06.covisint.net      N/A       N/A        Y       10152

Task Status of Volume test
------------------------------------------------------------------------------
There are no active volume tasks
I have tried to look at the backtrace from the cores. Even though I installed release 6.1, I don't find any debug symbols. It looks like:

Core was generated by `/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f442f3244a7 in ?? ()
[Current thread is 1 (LWP 21520)]
(gdb) bt
#0  0x00007f442f3244a7 in ?? ()
#1  0x4cce3ca800000001 in ?? ()
#2  0x0000000000018b1e in ?? ()
#3  0x00007f442f41faa8 in ?? ()
#4  0x00007f4400000000 in ?? ()
#5  0x00007f440c0174f0 in ?? ()
#6  0x00007f442f7b1b20 in ?? ()
#7  0x00007f441c4030c0 in ?? ()
#8  0x00007f442f7b1b90 in ?? ()
#9  0x00007f441c4030dc in ?? ()
#10 0x0000000000000007 in ?? ()
#11 0x0000562c75ecd4e0 in ?? ()
#12 0x00007f442f324db7 in ?? ()
#13 0x00007f4400000000 in ?? ()
#14 0x0000000000000000 in ?? ()

Can you please share the output of "t a a bt"?

Thanks,
Sanju
Sanju,

Thank you for the response. I am very unfamiliar with using gdb and collecting backtraces from the cores. Would it be possible for you to detail the configuration / collection steps needed?

Thanks and best regards,
-Anthony-
Hi Anthony,

1. Load the core into gdb:
   gdb glusterd <path to the corefile>
2. At the gdb prompt, the "bt" command gives you the backtrace of thread 1, while "t a a bt" (thread apply all backtrace) gives you the backtrace of all threads. Run "t a a bt" at the gdb prompt and collect the output.

Hope that helps,
Sanju
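(Editorial aside: if an interactive gdb session is awkward, the same data can be captured non-interactively with gdb's batch mode. This is a minimal sketch; the core path below is a placeholder to replace with an actual core file:

# Load the glusterd binary and a core file, disable the pager, dump
# backtraces of all threads, and save everything to a text file.
gdb -batch \
    -ex "set pagination off" \
    -ex "thread apply all backtrace" \
    /usr/sbin/glusterd /path/to/corefile > glusterd-backtrace.txt 2>&1
)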
Sanju,

Thank you for the response. I apologize for getting back to you so late. Here is some data from one of the cores where glusterd crashed.

[root@ch1c7ocvgl04 /]# gdb glusterd /core.7525
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/sbin/glusterfsd...(no debugging symbols found)...done.
warning: core file may not match specified executable file.
[New LWP 7657]
[New LWP 7526]
[New LWP 7529]
[New LWP 7525]
[New LWP 7527]
[New LWP 7528]
[New LWP 7531]
[New LWP 7530]
[New LWP 7656]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007fbac6a094a7 in glusterd_op_ac_brick_op_failed () from /usr/lib64/glusterfs/6.1/xlator/mgmt/glusterd.so
Missing separate debuginfos, use: debuginfo-install glusterfs-server-6.1-1.el7.x86_64
(gdb) t a a bt

Thread 9 (Thread 0x7fbac3a77700 (LWP 7656)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007fbac6aafddb in hooks_worker () from /usr/lib64/glusterfs/6.1/xlator/mgmt/glusterd.so
#2  0x00007fbad16fedd5 in start_thread (arg=0x7fbac3a77700) at pthread_create.c:307
#3  0x00007fbad0fc5ead in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 8 (Thread 0x7fbac7e99700 (LWP 7530)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x00007fbad28ff810 in syncenv_task () from /lib64/libglusterfs.so.0
#2  0x00007fbad29006c0 in syncenv_processor () from /lib64/libglusterfs.so.0
#3  0x00007fbad16fedd5 in start_thread (arg=0x7fbac7e99700) at pthread_create.c:307
#4  0x00007fbad0fc5ead in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 7 (Thread 0x7fbac7698700 (LWP 7531)):
#0  0x00007fbad0fbcf73 in select () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007fbad293e7e4 in runner () from /lib64/libglusterfs.so.0
#2  0x00007fbad16fedd5 in start_thread (arg=0x7fbac7698700) at pthread_create.c:307
#3  0x00007fbad0fc5ead in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 6 (Thread 0x7fbac8e9b700 (LWP 7528)):
#0  0x00007fbad0f8ce2d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007fbad0f8ccc4 in __sleep (seconds=0) at ../sysdeps/unix/sysv/linux/sleep.c:137
#2  0x00007fbad28eb54d in pool_sweeper () from /lib64/libglusterfs.so.0
#3  0x00007fbad16fedd5 in start_thread (arg=0x7fbac8e9b700) at pthread_create.c:307
#4  0x00007fbad0fc5ead in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 5 (Thread 0x7fbac969c700 (LWP 7527)):
#0  0x00007fbad1706361 in do_sigwait (sig=0x7fbac969be1c, set=<optimized out>) at ../sysdeps/unix/sysv/linux/sigwait.c:60
#1  __sigwait (set=0x7fbac969be20, sig=0x7fbac969be1c) at ../sysdeps/unix/sysv/linux/sigwait.c:95
#2  0x000055b5e9cda1bb in glusterfs_sigwaiter ()
#3  0x00007fbad16fedd5 in start_thread (arg=0x7fbac969c700) at pthread_create.c:307
#4  0x00007fbad0fc5ead in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 4 (Thread 0x7fbad2dbe780 (LWP 7525)):
#0  0x00007fbad16fff47 in pthread_join (threadid=140440114784000, thread_return=0x0) at pthread_join.c:90
#1  0x00007fbad2923478 in event_dispatch_epoll () from /lib64/libglusterfs.so.0
#2  0x000055b5e9cd6735 in main ()

Thread 3 (Thread 0x7fbac869a700 (LWP 7529)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x00007fbad28ff810 in syncenv_task () from /lib64/libglusterfs.so.0
#2  0x00007fbad29006c0 in syncenv_processor () from /lib64/libglusterfs.so.0
#3  0x00007fbad16fedd5 in start_thread (arg=0x7fbac869a700) at pthread_create.c:307
#4  0x00007fbad0fc5ead in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 2 (Thread 0x7fbac9e9d700 (LWP 7526)):
#0  0x00007fbad1705e3d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007fbad28cdf76 in gf_timer_proc () from /lib64/libglusterfs.so.0
#2  0x00007fbad16fedd5 in start_thread (arg=0x7fbac9e9d700) at pthread_create.c:307
#3  0x00007fbad0fc5ead in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 1 (Thread 0x7fbac3276700 (LWP 7657)):
#0  0x00007fbac6a094a7 in glusterd_op_ac_brick_op_failed () from /usr/lib64/glusterfs/6.1/xlator/mgmt/glusterd.so
#1  0x00007fbac6a09db7 in glusterd_op_sm () from /usr/lib64/glusterfs/6.1/xlator/mgmt/glusterd.so
#2  0x00007fbac6a419dc in glusterd_mgmt_v3_lock_peers_cbk_fn () from /usr/lib64/glusterfs/6.1/xlator/mgmt/glusterd.so
#3  0x00007fbac6a40faa in glusterd_big_locked_cbk () from /usr/lib64/glusterfs/6.1/xlator/mgmt/glusterd.so
#4  0x00007fbad2669021 in rpc_clnt_handle_reply () from /lib64/libgfrpc.so.0
#5  0x00007fbad2669387 in rpc_clnt_notify () from /lib64/libgfrpc.so.0
#6  0x00007fbad26659f3 in rpc_transport_notify () from /lib64/libgfrpc.so.0
#7  0x00007fbac5c0b875 in socket_event_handler () from /usr/lib64/glusterfs/6.1/rpc-transport/socket.so
#8  0x00007fbad2924286 in event_dispatch_epoll_worker () from /lib64/libglusterfs.so.0
#9  0x00007fbad16fedd5 in start_thread (arg=0x7fbac3276700) at pthread_create.c:307
#10 0x00007fbad0fc5ead in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Hi Anthony,

Sorry for the delayed response on this bug. Can you please install the debuginfo package related to glusterfs and then provide the backtrace again?

Thanks,
Sanju
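(Editorial aside: gdb itself printed the exact command in the session above: "debuginfo-install glusterfs-server-6.1-1.el7.x86_64". A sketch of the usual CentOS 7 flow, assuming the matching debuginfo repository is reachable; repository naming can vary for Storage SIG builds:

# debuginfo-install is provided by yum-utils on CentOS 7
yum install -y yum-utils
# install debug symbols matching the exact installed glusterfs build
debuginfo-install glusterfs-server-6.1-1.el7.x86_64
# then re-open the core; frames should now resolve to function and file:line
gdb /usr/sbin/glusterd /core.7525 -ex "t a a bt"
)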
This bug is moved to https://github.com/gluster/glusterfs/issues/1106, and will be tracked there from now on. Visit the GitHub issue URL for further details.
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days