Bug 1787463 - Glusterd process is periodically crashing with a segmentation fault
Summary: Glusterd process is periodically crashing with a segmentation fault
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: GlusterFS
Classification: Community
Component: glusterd
Version: 6
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Sanju
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-01-02 23:25 UTC by Anthony Wingerter
Modified: 2023-09-14 05:49 UTC
CC List: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-03-17 03:22:30 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Description Anthony Wingerter 2020-01-02 23:25:51 UTC
Description of problem: The glusterd process is periodically crashing with a segmentation fault. This happens occasionally on some of our nodes, and I have been unable to determine a cause.

Dec 18 18:13:53 ch1c7ocvgl01 systemd: glusterd.service: main process exited, code=killed, status=11/SEGV
Dec 18 19:02:49 ch1c7ocvgl01 systemd: glusterd.service: main process exited, code=killed, status=11/SEGV
Dec 19 18:24:15 ch1c7ocvgl01 systemd: glusterd.service: main process exited, code=killed, status=11/SEGV
Dec 21 05:45:39 ch1c7ocvgl01 systemd: glusterd.service: main process exited, code=killed, status=11/SEGV
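
(For reference, a minimal sketch of how the crash times and any matching core files can be located on an affected node. This assumes the systemd journal or /var/log/messages still covers the time window, and that cores are written according to kernel.core_pattern; the /core.<pid> path used later in comment 4 suggests they land in the filesystem root on these hosts.)

# Crash events recorded for the glusterd unit
journalctl -u glusterd | grep -i SEGV
# /var/log/messages carries the same lines if journal retention is short
grep 'glusterd.service.*SEGV' /var/log/messages

# Where the kernel writes core files on this host, and any cores present
cat /proc/sys/kernel/core_pattern
ls -lh /core.* 2>/dev/null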


Version-Release number of selected component (if applicable):

[root@ch1c7ocvgl01 ~]# cat /etc/redhat-release
CentOS Linux release 7.6.1810 (Core)

[root@ch1c7ocvgl01 /]# rpm -qa | grep gluster
glusterfs-libs-6.1-1.el7.x86_64
glusterfs-server-6.1-1.el7.x86_64
tendrl-gluster-integration-1.6.3-10.el7.noarch
centos-release-gluster6-1.0-1.el7.centos.noarch
python2-gluster-6.1-1.el7.x86_64
centos-release-gluster5-1.0-1.el7.centos.noarch
glusterfs-api-6.1-1.el7.x86_64
nfs-ganesha-gluster-2.8.2-1.el7.x86_64
glusterfs-client-xlators-6.1-1.el7.x86_64
glusterfs-cli-6.1-1.el7.x86_64
glusterfs-6.1-1.el7.x86_64
glusterfs-fuse-6.1-1.el7.x86_64
glusterfs-events-6.1-1.el7.x86_64


How reproducible:

Unable to reproduce at this time. Issue occurs periodically with an indeterminate cause.


Steps to Reproduce:
N/A

Actual results:
N/A

Expected results:

glusterd should not crash with a segmentation fault.

Additional info:

Several core dumps are available at the link below; they are too large to attach here.

https://nextcloud.anthonywingerter.net/index.php/s/3n5sSE3SNxfyeyj
 
Please let me know what further info I can provide.

[root@ch1c7ocvgl01 ~]# gluster volume info

Volume Name: autosfx-prd
Type: Distributed-Replicate
Volume ID: 25e6b3a9-f339-4439-b41e-6084c7527320
Status: Started
Snapshot Count: 0
Number of Bricks: 3 x (2 + 1) = 9
Transport-type: tcp
Bricks:
Brick1: ch1c7ocvgl01:/covisint/gluster/autosfx/brick01
Brick2: ch1c7ocvgl02:/covisint/gluster/autosfx/brick02
Brick3: ch1c7ocvga11:/covisint/gluster/autosfx/brick03 (arbiter)
Brick4: ch1c7ocvgl03:/covisint/gluster/autosfx/brick04
Brick5: ch1c7ocvgl04:/covisint/gluster/autosfx/brick05
Brick6: ch1c7ocvga11:/covisint/gluster/autosfx/brick06 (arbiter)
Brick7: ch1c7ocvgl05:/covisint/gluster/autosfx/brick07
Brick8: ch1c7ocvgl06:/covisint/gluster/autosfx/brick08
Brick9: ch1c7ocvga11:/covisint/gluster/autosfx/brick09 (arbiter)
Options Reconfigured:
nfs.disable: on
performance.client-io-threads: off
transport.address-family: inet
cluster.lookup-optimize: on
performance.stat-prefetch: on
server.event-threads: 16
client.event-threads: 16
performance.cache-invalidation: on
performance.read-ahead: on
storage.fips-mode-rchecksum: on
performance.cache-size: 6GB
features.ctime: on
cluster.self-heal-daemon: enable
diagnostics.latency-measurement: on
diagnostics.count-fop-hits: on
diagnostics.brick-log-level: ERROR
diagnostics.client-log-level: ERROR
cluster.data-self-heal-algorithm: full
cluster.background-self-heal-count: 256
cluster.rebalance-stats: on
cluster.readdir-optimize: on
cluster.metadata-self-heal: on
cluster.data-self-heal: on
cluster.heal-timeout: 500
cluster.quorum-type: auto
cluster.self-heal-window-size: 2
cluster.self-heal-readdir-size: 2KB
network.ping-timeout: 15
cluster.eager-lock: on
performance.io-thread-count: 16
cluster.shd-max-threads: 64
cluster.shd-wait-qlength: 4096
performance.write-behind-window-size: 8MB
cluster.enable-shared-storage: enable

Volume Name: gluster_shared_storage
Type: Replicate
Volume ID: 50e7c3e8-adb9-427f-ae56-c327829a7d34
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: ch1c7ocvgl02.covisint.net:/var/lib/glusterd/ss_brick
Brick2: ch1c7ocvgl03.covisint.net:/var/lib/glusterd/ss_brick
Brick3: ch1c7ocvgl01.covisint.net:/var/lib/glusterd/ss_brick
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
diagnostics.latency-measurement: on
diagnostics.count-fop-hits: on
cluster.enable-shared-storage: enable

Volume Name: hc-pstore-prd
Type: Distributed-Replicate
Volume ID: 1947247c-b3e0-4bd9-b808-011273e45195
Status: Started
Snapshot Count: 0
Number of Bricks: 3 x (2 + 1) = 9
Transport-type: tcp
Bricks:
Brick1: ch1c7ocvgl01:/covisint/gluster/hc-pstore-prd/brick01
Brick2: ch1c7ocvgl02:/covisint/gluster/hc-pstore-prd/brick02
Brick3: ch1c7ocvga11:/covisint/gluster/hc-pstore-prd/brick03 (arbiter)
Brick4: ch1c7ocvgl03:/covisint/gluster/hc-pstore-prd/brick04
Brick5: ch1c7ocvgl04:/covisint/gluster/hc-pstore-prd/brick05
Brick6: ch1c7ocvga11:/covisint/gluster/hc-pstore-prd/brick06 (arbiter)
Brick7: ch1c7ocvgl05:/covisint/gluster/hc-pstore-prd/brick07
Brick8: ch1c7ocvgl06:/covisint/gluster/hc-pstore-prd/brick08
Brick9: ch1c7ocvga11:/covisint/gluster/hc-pstore-prd/brick09 (arbiter)
Options Reconfigured:
auth.allow: exlap1354.covisint.net,exlap1355.covisint.net
performance.write-behind-window-size: 8MB
cluster.shd-wait-qlength: 4096
cluster.shd-max-threads: 64
performance.io-thread-count: 16
cluster.eager-lock: on
network.ping-timeout: 15
cluster.self-heal-readdir-size: 2KB
cluster.self-heal-window-size: 2
cluster.quorum-type: auto
cluster.heal-timeout: 500
cluster.data-self-heal: on
cluster.metadata-self-heal: on
cluster.readdir-optimize: on
cluster.rebalance-stats: on
cluster.background-self-heal-count: 256
cluster.data-self-heal-algorithm: full
diagnostics.client-log-level: ERROR
diagnostics.brick-log-level: ERROR
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
cluster.self-heal-daemon: enable
features.ctime: on
performance.cache-size: 2GB
storage.fips-mode-rchecksum: on
performance.read-ahead: on
performance.cache-invalidation: on
client.event-threads: 8
server.event-threads: 8
performance.stat-prefetch: on
cluster.lookup-optimize: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
cluster.enable-shared-storage: enable

Volume Name: plink-prd
Type: Distributed-Replicate
Volume ID: f146a391-c92e-4965-9026-09f16d2d1c53
Status: Started
Snapshot Count: 0
Number of Bricks: 3 x (2 + 1) = 9
Transport-type: tcp
Bricks:
Brick1: ch1c7ocvgl01:/covisint/gluster/plink/brick01
Brick2: ch1c7ocvgl02:/covisint/gluster/plink/brick02
Brick3: ch1c7ocvga11:/covisint/gluster/plink/brick03 (arbiter)
Brick4: ch1c7ocvgl03:/covisint/gluster/plink/brick04
Brick5: ch1c7ocvgl04:/covisint/gluster/plink/brick05
Brick6: ch1c7ocvga11:/covisint/gluster/plink/brick06 (arbiter)
Brick7: ch1c7ocvgl05:/covisint/gluster/plink/brick07
Brick8: ch1c7ocvgl06:/covisint/gluster/plink/brick08
Brick9: ch1c7ocvga11:/covisint/gluster/plink/brick09 (arbiter)
Options Reconfigured:
nfs.disable: on
performance.client-io-threads: off
transport.address-family: inet
cluster.lookup-optimize: on
performance.stat-prefetch: on
server.event-threads: 16
client.event-threads: 16
performance.cache-invalidation: on
performance.read-ahead: on
storage.fips-mode-rchecksum: on
performance.cache-size: 3800MB
features.ctime: on
cluster.self-heal-daemon: enable
diagnostics.latency-measurement: on
diagnostics.count-fop-hits: on
diagnostics.brick-log-level: ERROR
diagnostics.client-log-level: ERROR
cluster.data-self-heal-algorithm: full
cluster.background-self-heal-count: 256
cluster.rebalance-stats: on
cluster.readdir-optimize: on
cluster.metadata-self-heal: on
cluster.data-self-heal: on
cluster.heal-timeout: 500
cluster.quorum-type: auto
cluster.self-heal-window-size: 2
cluster.self-heal-readdir-size: 2KB
network.ping-timeout: 15
cluster.eager-lock: on
performance.io-thread-count: 16
cluster.shd-max-threads: 64
cluster.shd-wait-qlength: 4096
performance.write-behind-window-size: 8MB
cluster.enable-shared-storage: enable

Volume Name: pstore-prd
Type: Distributed-Replicate
Volume ID: d77c45ef-19ca-4add-9dac-1bc401244395
Status: Started
Snapshot Count: 0
Number of Bricks: 3 x (2 + 1) = 9
Transport-type: tcp
Bricks:
Brick1: ch1c7ocvgl01:/covisint/gluster/pstore-prd/brick01
Brick2: ch1c7ocvgl02:/covisint/gluster/pstore-prd/brick02
Brick3: ch1c7ocvga11:/covisint/gluster/pstore-prd/brick03 (arbiter)
Brick4: ch1c7ocvgl03:/covisint/gluster/pstore-prd/brick04
Brick5: ch1c7ocvgl04:/covisint/gluster/pstore-prd/brick05
Brick6: ch1c7ocvga11:/covisint/gluster/pstore-prd/brick06 (arbiter)
Brick7: ch1c7ocvgl05:/covisint/gluster/pstore-prd/brick07
Brick8: ch1c7ocvgl06:/covisint/gluster/pstore-prd/brick08
Brick9: ch1c7ocvga11:/covisint/gluster/pstore-prd/brick09 (arbiter)
Options Reconfigured:
cluster.min-free-disk: 1GB
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
cluster.lookup-optimize: on
performance.stat-prefetch: on
server.event-threads: 16
client.event-threads: 16
performance.cache-invalidation: on
performance.read-ahead: on
storage.fips-mode-rchecksum: on
performance.cache-size: 6GB
features.ctime: on
cluster.self-heal-daemon: enable
diagnostics.latency-measurement: on
diagnostics.count-fop-hits: on
diagnostics.brick-log-level: ERROR
diagnostics.client-log-level: ERROR
cluster.data-self-heal-algorithm: full
cluster.background-self-heal-count: 256
cluster.rebalance-stats: on
cluster.readdir-optimize: on
cluster.metadata-self-heal: on
cluster.data-self-heal: on
cluster.heal-timeout: 500
cluster.quorum-type: auto
cluster.self-heal-window-size: 2
cluster.self-heal-readdir-size: 2KB
network.ping-timeout: 15
cluster.eager-lock: on
performance.io-thread-count: 16
cluster.shd-max-threads: 64
cluster.shd-wait-qlength: 4096
performance.write-behind-window-size: 8MB
auth.allow: exlap779.covisint.net,exlap780.covisint.net
cluster.enable-shared-storage: enable

Volume Name: rvsshare-prd
Type: Distributed-Replicate
Volume ID: bee2d0f7-9215-4be8-9fc6-302fd568d5ed
Status: Started
Snapshot Count: 0
Number of Bricks: 3 x (2 + 1) = 9
Transport-type: tcp
Bricks:
Brick1: ch1c7ocvgl01:/covisint/gluster/rvsshare-prd/brick01
Brick2: ch1c7ocvgl02:/covisint/gluster/rvsshare-prd/brick02
Brick3: ch1c7ocvga11:/covisint/gluster/rvsshare-prd/brick03 (arbiter)
Brick4: ch1c7ocvgl03:/covisint/gluster/rvsshare-prd/brick04
Brick5: ch1c7ocvgl04:/covisint/gluster/rvsshare-prd/brick05
Brick6: ch1c7ocvga11:/covisint/gluster/rvsshare-prd/brick06 (arbiter)
Brick7: ch1c7ocvgl05:/covisint/gluster/rvsshare-prd/brick07
Brick8: ch1c7ocvgl06:/covisint/gluster/rvsshare-prd/brick08
Brick9: ch1c7ocvga11:/covisint/gluster/rvsshare-prd/brick09 (arbiter)
Options Reconfigured:
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
cluster.lookup-optimize: on
performance.stat-prefetch: on
server.event-threads: 16
client.event-threads: 16
performance.cache-invalidation: on
performance.read-ahead: on
storage.fips-mode-rchecksum: on
performance.cache-size: 6GB
features.ctime: off
cluster.self-heal-daemon: enable
diagnostics.latency-measurement: on
diagnostics.count-fop-hits: on
diagnostics.brick-log-level: ERROR
diagnostics.client-log-level: ERROR
cluster.data-self-heal-algorithm: full
cluster.background-self-heal-count: 256
cluster.rebalance-stats: on
cluster.readdir-optimize: on
cluster.metadata-self-heal: on
cluster.data-self-heal: on
cluster.heal-timeout: 500
cluster.quorum-type: auto
cluster.self-heal-window-size: 2
cluster.self-heal-readdir-size: 2KB
network.ping-timeout: 15
cluster.eager-lock: on
performance.io-thread-count: 16
cluster.shd-max-threads: 64
cluster.shd-wait-qlength: 4096
performance.write-behind-window-size: 8MB
auth.allow: exlap825.covisint.net,exlap826.covisint.net
cluster.enable-shared-storage: enable

Volume Name: test
Type: Distributed-Replicate
Volume ID: 07c36821-382d-45bd-9f17-e7e48811d2a2
Status: Started
Snapshot Count: 0
Number of Bricks: 3 x (2 + 1) = 9
Transport-type: tcp
Bricks:
Brick1: ch1c7ocvgl01:/covisint/gluster/test/brick01
Brick2: ch1c7ocvgl02:/covisint/gluster/test/brick02
Brick3: ch1c7ocvga11:/covisint/gluster/test/brick03 (arbiter)
Brick4: ch1c7ocvgl03:/covisint/gluster/test/brick04
Brick5: ch1c7ocvgl04:/covisint/gluster/test/brick05
Brick6: ch1c7ocvga11:/covisint/gluster/test/brick06 (arbiter)
Brick7: ch1c7ocvgl05:/covisint/gluster/test/brick07
Brick8: ch1c7ocvgl06:/covisint/gluster/test/brick08
Brick9: ch1c7ocvga11:/covisint/gluster/test/brick09 (arbiter)
Options Reconfigured:
performance.write-behind-window-size: 8MB
cluster.shd-wait-qlength: 4096
cluster.shd-max-threads: 64
performance.io-thread-count: 16
cluster.eager-lock: on
network.ping-timeout: 15
cluster.self-heal-readdir-size: 2KB
cluster.self-heal-window-size: 2
cluster.quorum-type: auto
cluster.heal-timeout: 500
cluster.data-self-heal: on
cluster.metadata-self-heal: on
cluster.readdir-optimize: on
cluster.rebalance-stats: on
cluster.background-self-heal-count: 256
cluster.data-self-heal-algorithm: full
diagnostics.client-log-level: ERROR
diagnostics.brick-log-level: ERROR
diagnostics.count-fop-hits: on
diagnostics.latency-measurement: on
cluster.self-heal-daemon: enable
performance.cache-size: 2GB
storage.fips-mode-rchecksum: on
performance.read-ahead: on
performance.cache-invalidation: on
client.event-threads: 16
server.event-threads: 16
performance.stat-prefetch: on
cluster.lookup-optimize: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
cluster.enable-shared-storage: enable

[root@ch1c7ocvgl01 ~]# gluster volume status
Status of volume: autosfx-prd
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick ch1c7ocvgl01:/covisint/gluster/autosf
x/brick01                                   49152     0          Y       8316
Brick ch1c7ocvgl02:/covisint/gluster/autosf
x/brick02                                   49152     0          Y       8310
Brick ch1c7ocvga11:/covisint/gluster/autosf
x/brick03                                   49152     0          Y       8688
Brick ch1c7ocvgl03:/covisint/gluster/autosf
x/brick04                                   49152     0          Y       8388
Brick ch1c7ocvgl04:/covisint/gluster/autosf
x/brick05                                   49152     0          Y       7705
Brick ch1c7ocvga11:/covisint/gluster/autosf
x/brick06                                   49153     0          Y       8689
Brick ch1c7ocvgl05:/covisint/gluster/autosf
x/brick07                                   49152     0          Y       8128
Brick ch1c7ocvgl06:/covisint/gluster/autosf
x/brick08                                   49152     0          Y       7811
Brick ch1c7ocvga11:/covisint/gluster/autosf
x/brick09                                   49154     0          Y       8690
Self-heal Daemon on localhost               N/A       N/A        Y       15133
Self-heal Daemon on ch1c7ocvgl05.covisint.n
et                                          N/A       N/A        Y       13966
Self-heal Daemon on ch1c7ocvgl04.covisint.n
et                                          N/A       N/A        Y       25439
Self-heal Daemon on ch1c7ocvgl03.covisint.n
et                                          N/A       N/A        Y       27470
Self-heal Daemon on ch1c7ocvga11.covisint.n
et                                          N/A       N/A        Y       4772
Self-heal Daemon on ch1c7ocvgl02            N/A       N/A        Y       30524
Self-heal Daemon on ch1c7ocvgl06.covisint.n
et                                          N/A       N/A        Y       10152

Task Status of Volume autosfx-prd
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: gluster_shared_storage
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick ch1c7ocvgl02.covisint.net:/var/lib/gl
usterd/ss_brick                             49153     0          Y       8319
Brick ch1c7ocvgl03.covisint.net:/var/lib/gl
usterd/ss_brick                             49153     0          Y       8381
Brick ch1c7ocvgl01.covisint.net:/var/lib/gl
usterd/ss_brick                             49153     0          Y       8332
Self-heal Daemon on localhost               N/A       N/A        Y       15133
Self-heal Daemon on ch1c7ocvgl05.covisint.n
et                                          N/A       N/A        Y       13966
Self-heal Daemon on ch1c7ocvga11.covisint.n
et                                          N/A       N/A        Y       4772
Self-heal Daemon on ch1c7ocvgl04.covisint.n
et                                          N/A       N/A        Y       25439
Self-heal Daemon on ch1c7ocvgl03.covisint.n
et                                          N/A       N/A        Y       27470
Self-heal Daemon on ch1c7ocvgl02            N/A       N/A        Y       30524
Self-heal Daemon on ch1c7ocvgl06.covisint.n
et                                          N/A       N/A        Y       10152

Task Status of Volume gluster_shared_storage
------------------------------------------------------------------------------
There are no active volume tasks

Status of volume: hc-pstore-prd
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick ch1c7ocvgl01:/covisint/gluster/hc-pst
ore-prd/brick01                             49156     0          Y       15244
Brick ch1c7ocvgl02:/covisint/gluster/hc-pst
ore-prd/brick02                             49155     0          Y       30807
Brick ch1c7ocvga11:/covisint/gluster/hc-pst
ore-prd/brick03                             49155     0          Y       8755
Brick ch1c7ocvgl03:/covisint/gluster/hc-pst
ore-prd/brick04                             49156     0          Y       14874
Brick ch1c7ocvgl04:/covisint/gluster/hc-pst
ore-prd/brick05                             49154     0          Y       21306
Brick ch1c7ocvga11:/covisint/gluster/hc-pst
ore-prd/brick06                             49156     0          Y       8734
Brick ch1c7ocvgl05:/covisint/gluster/hc-pst
ore-prd/brick07                             49156     0          Y       7865
Brick ch1c7ocvgl06:/covisint/gluster/hc-pst
ore-prd/brick08                             49154     0          Y       5401
Brick ch1c7ocvga11:/covisint/gluster/hc-pst
ore-prd/brick09                             49157     0          Y       8744
Self-heal Daemon on localhost               N/A       N/A        Y       15133
Self-heal Daemon on ch1c7ocvgl05.covisint.n
et                                          N/A       N/A        Y       13966
Self-heal Daemon on ch1c7ocvgl03.covisint.n
et                                          N/A       N/A        Y       27470
Self-heal Daemon on ch1c7ocvga11.covisint.n
et                                          N/A       N/A        Y       4772
Self-heal Daemon on ch1c7ocvgl02            N/A       N/A        Y       30524
Self-heal Daemon on ch1c7ocvgl04.covisint.n
et                                          N/A       N/A        Y       25439
Self-heal Daemon on ch1c7ocvgl06.covisint.n
et                                          N/A       N/A        Y       10152

Task Status of Volume hc-pstore-prd
------------------------------------------------------------------------------
There are no active volume tasks

Another transaction is in progress for plink-prd. Please try again after some time.

Status of volume: pstore-prd
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick ch1c7ocvgl01:/covisint/gluster/pstore
-prd/brick01                                49155     0          Y       23221
Brick ch1c7ocvgl02:/covisint/gluster/pstore
-prd/brick02                                49156     0          Y       7888
Brick ch1c7ocvga11:/covisint/gluster/pstore
-prd/brick03                                49161     0          Y       8835
Brick ch1c7ocvgl03:/covisint/gluster/pstore
-prd/brick04                                49155     0          Y       18838
Brick ch1c7ocvgl04:/covisint/gluster/pstore
-prd/brick05                                49155     0          Y       18114
Brick ch1c7ocvga11:/covisint/gluster/pstore
-prd/brick06                                49162     0          Y       8848
Brick ch1c7ocvgl05:/covisint/gluster/pstore
-prd/brick07                                49155     0          Y       24013
Brick ch1c7ocvgl06:/covisint/gluster/pstore
-prd/brick08                                49155     0          Y       9192
Brick ch1c7ocvga11:/covisint/gluster/pstore
-prd/brick09                                49163     0          Y       8859
Self-heal Daemon on localhost               N/A       N/A        Y       15133
Self-heal Daemon on ch1c7ocvga11.covisint.n
et                                          N/A       N/A        Y       4772
Self-heal Daemon on ch1c7ocvgl03.covisint.n
et                                          N/A       N/A        Y       27470
Self-heal Daemon on ch1c7ocvgl05.covisint.n
et                                          N/A       N/A        Y       13966
Self-heal Daemon on ch1c7ocvgl04.covisint.n
et                                          N/A       N/A        Y       25439
Self-heal Daemon on ch1c7ocvgl06.covisint.n
et                                          N/A       N/A        Y       10152
Self-heal Daemon on ch1c7ocvgl02            N/A       N/A        Y       30524

Task Status of Volume pstore-prd
------------------------------------------------------------------------------
There are no active volume tasks

Another transaction is in progress for rvsshare-prd. Please try again after some time.

Status of volume: test
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick ch1c7ocvgl01:/covisint/gluster/test/b
rick01                                      49158     0          Y       20468
Brick ch1c7ocvgl02:/covisint/gluster/test/b
rick02                                      49158     0          Y       30442
Brick ch1c7ocvga11:/covisint/gluster/test/b
rick03                                      49167     0          Y       8966
Brick ch1c7ocvgl03:/covisint/gluster/test/b
rick04                                      49158     0          Y       27364
Brick ch1c7ocvgl04:/covisint/gluster/test/b
rick05                                      49156     0          Y       19154
Brick ch1c7ocvga11:/covisint/gluster/test/b
rick06                                      49168     0          Y       8980
Brick ch1c7ocvgl05:/covisint/gluster/test/b
rick07                                      49157     0          Y       13820
Brick ch1c7ocvgl06:/covisint/gluster/test/b
rick08                                      49157     0          Y       10030
Brick ch1c7ocvga11:/covisint/gluster/test/b
rick09                                      49169     0          Y       9015
Self-heal Daemon on localhost               N/A       N/A        Y       15133
Self-heal Daemon on ch1c7ocvgl03.covisint.n
et                                          N/A       N/A        Y       27470
Self-heal Daemon on ch1c7ocvgl05.covisint.n
et                                          N/A       N/A        Y       13966
Self-heal Daemon on ch1c7ocvgl04.covisint.n
et                                          N/A       N/A        Y       25439
Self-heal Daemon on ch1c7ocvga11.covisint.n
et                                          N/A       N/A        Y       4772
Self-heal Daemon on ch1c7ocvgl02            N/A       N/A        Y       30524
Self-heal Daemon on ch1c7ocvgl06.covisint.n
et                                          N/A       N/A        Y       10152

Task Status of Volume test
------------------------------------------------------------------------------
There are no active volume tasks

Comment 1 Sanju 2020-01-06 11:07:47 UTC
I tried to look at the backtraces from the cores. Even though I installed release-6.1, I could not find any debug symbols.

It looks like:
Core was generated by `/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f442f3244a7 in ?? ()
[Current thread is 1 (LWP 21520)]
(gdb) bt
#0  0x00007f442f3244a7 in ?? ()
#1  0x4cce3ca800000001 in ?? ()
#2  0x0000000000018b1e in ?? ()
#3  0x00007f442f41faa8 in ?? ()
#4  0x00007f4400000000 in ?? ()
#5  0x00007f440c0174f0 in ?? ()
#6  0x00007f442f7b1b20 in ?? ()
#7  0x00007f441c4030c0 in ?? ()
#8  0x00007f442f7b1b90 in ?? ()
#9  0x00007f441c4030dc in ?? ()
#10 0x0000000000000007 in ?? ()
#11 0x0000562c75ecd4e0 in ?? ()
#12 0x00007f442f324db7 in ?? ()
#13 0x00007f4400000000 in ?? ()
#14 0x0000000000000000 in ?? ()


Can you please share the output of "t a a bt"?

Thanks,
Sanju

Comment 2 Anthony Wingerter 2020-01-09 15:53:47 UTC
Sanju,

Thank you for the response.

I am very unfamiliar with using gdb and collecting backtraces from the cores.

Would it be possible for you to detail the configuration / collection steps needed?

Thanks and best regards,
-Anthony-

Comment 3 Sanju 2020-01-10 05:37:18 UTC
Hi Anthony,

1. Load the core into gdb:
   gdb glusterd <path to the corefile>
2. The "bt" command gives you the backtrace of thread 1, and "t a a bt" (thread apply all backtrace) gives you the backtrace of all threads. Run "t a a bt" at the gdb prompt and collect that output.
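
A non-interactive variant of the same steps, as a minimal sketch (assuming the glusterd binary is at /usr/sbin/glusterd; "thread apply all bt" is the long form of "t a a bt"):

gdb -batch -ex "thread apply all bt" /usr/sbin/glusterd <path to the corefile> > glusterd-backtrace.txt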

Hope that helps,
Sanju

Comment 4 Anthony Wingerter 2020-01-20 15:49:46 UTC
Sanju,

Thank you for the response.
I apologize for getting back to you so late.

Here is some data from one of the cores where glusterd crashed. 

[root@ch1c7ocvgl04 /]# gdb glusterd /core.7525
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/sbin/glusterfsd...Reading symbols from /usr/sbin/glusterfsd...(no debugging symbols found)...done.
(no debugging symbols found)...done.

warning: core file may not match specified executable file.
[New LWP 7657]
[New LWP 7526]
[New LWP 7529]
[New LWP 7525]
[New LWP 7527]
[New LWP 7528]
[New LWP 7531]
[New LWP 7530]
[New LWP 7656]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007fbac6a094a7 in glusterd_op_ac_brick_op_failed () from /usr/lib64/glusterfs/6.1/xlator/mgmt/glusterd.so
Missing separate debuginfos, use: debuginfo-install glusterfs-server-6.1-1.el7.x86_64
(gdb) t a a bt

Thread 9 (Thread 0x7fbac3a77700 (LWP 7656)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007fbac6aafddb in hooks_worker () from /usr/lib64/glusterfs/6.1/xlator/mgmt/glusterd.so
#2  0x00007fbad16fedd5 in start_thread (arg=0x7fbac3a77700) at pthread_create.c:307
#3  0x00007fbad0fc5ead in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 8 (Thread 0x7fbac7e99700 (LWP 7530)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x00007fbad28ff810 in syncenv_task () from /lib64/libglusterfs.so.0
#2  0x00007fbad29006c0 in syncenv_processor () from /lib64/libglusterfs.so.0
#3  0x00007fbad16fedd5 in start_thread (arg=0x7fbac7e99700) at pthread_create.c:307
#4  0x00007fbad0fc5ead in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 7 (Thread 0x7fbac7698700 (LWP 7531)):
#0  0x00007fbad0fbcf73 in select () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007fbad293e7e4 in runner () from /lib64/libglusterfs.so.0
#2  0x00007fbad16fedd5 in start_thread (arg=0x7fbac7698700) at pthread_create.c:307
#3  0x00007fbad0fc5ead in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 6 (Thread 0x7fbac8e9b700 (LWP 7528)):
#0  0x00007fbad0f8ce2d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007fbad0f8ccc4 in __sleep (seconds=0) at ../sysdeps/unix/sysv/linux/sleep.c:137
#2  0x00007fbad28eb54d in pool_sweeper () from /lib64/libglusterfs.so.0
#3  0x00007fbad16fedd5 in start_thread (arg=0x7fbac8e9b700) at pthread_create.c:307
#4  0x00007fbad0fc5ead in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 5 (Thread 0x7fbac969c700 (LWP 7527)):
#0  0x00007fbad1706361 in do_sigwait (sig=0x7fbac969be1c, set=<optimized out>) at ../sysdeps/unix/sysv/linux/sigwait.c:60
#1  __sigwait (set=0x7fbac969be20, sig=0x7fbac969be1c) at ../sysdeps/unix/sysv/linux/sigwait.c:95
#2  0x000055b5e9cda1bb in glusterfs_sigwaiter ()
#3  0x00007fbad16fedd5 in start_thread (arg=0x7fbac969c700) at pthread_create.c:307
#4  0x00007fbad0fc5ead in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 4 (Thread 0x7fbad2dbe780 (LWP 7525)):
#0  0x00007fbad16fff47 in pthread_join (threadid=140440114784000, thread_return=0x0) at pthread_join.c:90
#1  0x00007fbad2923478 in event_dispatch_epoll () from /lib64/libglusterfs.so.0
#2  0x000055b5e9cd6735 in main ()

Thread 3 (Thread 0x7fbac869a700 (LWP 7529)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x00007fbad28ff810 in syncenv_task () from /lib64/libglusterfs.so.0
#2  0x00007fbad29006c0 in syncenv_processor () from /lib64/libglusterfs.so.0
#3  0x00007fbad16fedd5 in start_thread (arg=0x7fbac869a700) at pthread_create.c:307
#4  0x00007fbad0fc5ead in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

---Type <return> to continue, or q <return> to quit---
Thread 2 (Thread 0x7fbac9e9d700 (LWP 7526)):
#0  0x00007fbad1705e3d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007fbad28cdf76 in gf_timer_proc () from /lib64/libglusterfs.so.0
#2  0x00007fbad16fedd5 in start_thread (arg=0x7fbac9e9d700) at pthread_create.c:307
#3  0x00007fbad0fc5ead in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 1 (Thread 0x7fbac3276700 (LWP 7657)):
#0  0x00007fbac6a094a7 in glusterd_op_ac_brick_op_failed () from /usr/lib64/glusterfs/6.1/xlator/mgmt/glusterd.so
#1  0x00007fbac6a09db7 in glusterd_op_sm () from /usr/lib64/glusterfs/6.1/xlator/mgmt/glusterd.so
#2  0x00007fbac6a419dc in glusterd_mgmt_v3_lock_peers_cbk_fn () from /usr/lib64/glusterfs/6.1/xlator/mgmt/glusterd.so
#3  0x00007fbac6a40faa in glusterd_big_locked_cbk () from /usr/lib64/glusterfs/6.1/xlator/mgmt/glusterd.so
#4  0x00007fbad2669021 in rpc_clnt_handle_reply () from /lib64/libgfrpc.so.0
#5  0x00007fbad2669387 in rpc_clnt_notify () from /lib64/libgfrpc.so.0
#6  0x00007fbad26659f3 in rpc_transport_notify () from /lib64/libgfrpc.so.0
#7  0x00007fbac5c0b875 in socket_event_handler () from /usr/lib64/glusterfs/6.1/rpc-transport/socket.so
#8  0x00007fbad2924286 in event_dispatch_epoll_worker () from /lib64/libglusterfs.so.0
#9  0x00007fbad16fedd5 in start_thread (arg=0x7fbac3276700) at pthread_create.c:307
#10 0x00007fbad0fc5ead in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Comment 5 Sanju 2020-02-24 07:01:28 UTC
Hi Anthony,

Sorry for the delayed response on this bug. Can you please install the debuginfo package for glusterfs and then provide the backtrace again?

Thanks,
Sanju
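
(For reference, a minimal sketch of the debuginfo step on CentOS 7. The package name comes from the "Missing separate debuginfos" hint gdb printed in comment 4; debuginfo-install is part of yum-utils, and it is an assumption here that a matching debuginfo repository for these glusterfs 6.1 packages is available and enabled on the node.)

# Package name taken from the gdb hint in comment 4
yum install -y yum-utils
debuginfo-install -y glusterfs-server-6.1-1.el7.x86_64

# Regenerate the backtrace with symbols resolved (core path from comment 4)
gdb -batch -ex "thread apply all bt full" /usr/sbin/glusterd /core.7525 > glusterd-bt-full.txt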

Comment 6 Worker Ant 2020-03-17 03:22:30 UTC
This bug has been moved to https://github.com/gluster/glusterfs/issues/1106 and will be tracked there from now on. Visit the GitHub issue URL for further details.

Comment 7 Red Hat Bugzilla 2023-09-14 05:49:20 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days

