Bug 1773532 - Gluster brick randomly segfaults
Summary: Gluster brick randomly segfaults
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: GlusterFS
Classification: Community
Component: core
Version: 6
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-11-18 11:33 UTC by Dominik Drazyk
Modified: 2022-05-13 08:53 UTC
CC List: 6 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2020-03-12 12:17:32 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:
ddrazyk: needinfo-
ddrazyk: needinfo-


Attachments
Compressed logs from journald, glusterd and vdsm. (2.81 MB, application/gzip)
2019-11-18 11:33 UTC, Dominik Drazyk
Gluster logs (809.75 KB, application/gzip)
2019-11-18 12:30 UTC, Dominik Drazyk
Gluster logs node02 (1.65 MB, application/gzip)
2019-11-18 12:30 UTC, Dominik Drazyk
Gluster logs node03 (562.81 KB, application/gzip)
2019-11-18 12:31 UTC, Dominik Drazyk

Description Dominik Drazyk 2019-11-18 11:33:57 UTC
Created attachment 1637263 [details]
Compressed logs from journald, glusterd and vdsm.

Description of problem:
I am running a 3-node oVirt cluster with a GlusterFS storage domain. The Gluster bricks sit on XFS on top of hardware RAID (one virtual device is SSD and the second is HDD, both on an LSI controller), with LVM cache in writeback mode. There are two volumes served by this cluster: wiosna-vmstore, which serves as the Data storage domain, and wiosna-iso, which is an ISO domain. Both have sharding turned on. Management is on a separate physical machine.
I randomly get brick segfaults which cause a brick to go down (it's either iso or vmstore, never both). When two nodes get a segfault, all VMs end up in a Paused state. All hosts run the same package versions (listed below).
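For reference, a minimal sketch of commands that capture the layout described above; the volume names and the /opt/data mount point are taken from this report, everything else is generic:

gluster volume info                                        # volume options, including features.shard
lvs -a -o lv_name,vg_name,lv_attr,lv_size,pool_lv,origin   # cached LVs and their writeback cache pool
xfs_info /opt/data                                         # XFS geometry (sector size, sunit/swidth)
grep /opt/data /etc/fstab /proc/mounts                     # mount options actually in effect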

Version-Release number of selected component (if applicable):
vdsm-gluster-4.30.33-1.el7.x86_64
glusterfs-6.6-1.el7.x86_64

How reproducible:
Don't know. Occurs randomly.

Steps to Reproduce:
N/A

Actual results:
lis 18 01:53:27 node01.wiosna.org.pl opt-data-vmstore[15340]: patchset: git://git.gluster.org/glusterfs.git
lis 18 01:53:27 node01.wiosna.org.pl opt-data-vmstore[15340]: signal received: 11
lis 18 01:53:27 node01.wiosna.org.pl opt-data-vmstore[15340]: time of crash:
lis 18 01:53:27 node01.wiosna.org.pl opt-data-vmstore[15340]: 2019-11-18 00:53:27
lis 18 01:53:27 node01.wiosna.org.pl opt-data-vmstore[15340]: configuration details:
lis 18 01:53:27 node01.wiosna.org.pl opt-data-vmstore[15340]: argp 1
lis 18 01:53:27 node01.wiosna.org.pl opt-data-vmstore[15340]: backtrace 1
lis 18 01:53:27 node01.wiosna.org.pl opt-data-vmstore[15340]: dlfcn 1
lis 18 01:53:27 node01.wiosna.org.pl opt-data-vmstore[15340]: libpthread 1
lis 18 01:53:27 node01.wiosna.org.pl opt-data-vmstore[15340]: llistxattr 1
lis 18 01:53:27 node01.wiosna.org.pl opt-data-vmstore[15340]: setfsid 1
lis 18 01:53:27 node01.wiosna.org.pl opt-data-vmstore[15340]: spinlock 1
lis 18 01:53:27 node01.wiosna.org.pl opt-data-vmstore[15340]: epoll.h 1
lis 18 01:53:27 node01.wiosna.org.pl opt-data-vmstore[15340]: xattr.h 1
lis 18 01:53:27 node01.wiosna.org.pl opt-data-vmstore[15340]: st_atim.tv_nsec 1
lis 18 01:53:27 node01.wiosna.org.pl opt-data-vmstore[15340]: package-string: glusterfs 6.5
lis 18 01:53:27 node01.wiosna.org.pl opt-data-vmstore[15340]: ---------


Expected results:
Normal operation.

Additional info:

Comment 1 Yaniv Kaul 2019-11-18 12:16:07 UTC
Is this an updated version?

Comment 2 Yaniv Kaul 2019-11-18 12:16:36 UTC
Can you please share complete logs?

Comment 3 Dominik Drazyk 2019-11-18 12:19:17 UTC
Hi Yaniv,
which ones do you need? Journald?

Comment 4 Yaniv Kaul 2019-11-18 12:20:53 UTC
(In reply to Dominik Drazyk from comment #3)
> Hi Yaniv,
> which one you need? Journald?

1. If the version you are using is 4.1, please upgrade promptly.
2. Gluster logs.

Comment 5 Dominik Drazyk 2019-11-18 12:30:17 UTC
Created attachment 1637268 [details]
Gluster logs

Gluster logs from node01

Comment 6 Dominik Drazyk 2019-11-18 12:30:54 UTC
Created attachment 1637269 [details]
Gluster logs node02

Gluster logs from node02

Comment 7 Dominik Drazyk 2019-11-18 12:31:20 UTC
Created attachment 1637270 [details]
Gluster logs node03

Gluster logs for node03

Comment 8 Dominik Drazyk 2019-11-18 12:35:14 UTC
I use oVirt 4.3 - the newest version. I updated to the newest version this morning; before the update I had packages from 12.10.2019. Gluster is otherwise at the newest version that comes from the oVirt 4.3 repository for CentOS 7.

Comment 9 Yaniv Kaul 2019-11-18 13:34:41 UTC
I can see the crash:
[2019-11-18 00:53:04.287021] E [MSGID: 113072] [posix-inode-fd-ops.c:1898:posix_writev] 0-wiosna-vmstore-posix: write failed: offset 0, [Invalid argument]
[2019-11-18 00:53:04.287072] E [MSGID: 115067] [server-rpc-fops_v2.c:1373:server4_writev_cbk] 0-wiosna-vmstore-server: 26723901: WRITEV 5 (5bb969ab-4ed1-4afa-b136-78bf7ead800d), client: CTX_ID:2cd653a2-1bae-4df7-adbe-9b8151275b15-GRAPH_ID:0-PID:14082-HOST:node03.wiosna.org.pl-PC_NAME:wiosna-vmstore-client-0-RECON_NO:-0, error-xlator: wiosna-vmstore-posix [Invalid argument]
[2019-11-18 00:53:26.572901] E [socket.c:1303:socket_event_poll_err] (-->/lib64/libglusterfs.so.0(+0x8b806) [0x7f6fd1543806] -->/usr/lib64/glusterfs/6.5/rpc-transport/socket.so(+0xa48a) [0x7f6fc58a148a] -->/usr/lib64/glusterfs/6.5/rpc-transport/socket.so(+0x81fc) [0x7f6fc589f1fc] ) 0-socket: invalid argument: this->private [Invalid argument]
pending frames:
patchset: git://git.gluster.org/glusterfs.git
signal received: 11
time of crash: 
2019-11-18 00:53:27
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 6.5
/lib64/libglusterfs.so.0(+0x27130)[0x7f6fd14df130]
/lib64/libglusterfs.so.0(gf_print_trace+0x334)[0x7f6fd14e9b34]
/lib64/libc.so.6(+0x363b0)[0x7f6fcfb1c3b0]
/usr/lib64/glusterfs/6.5/rpc-transport/socket.so(+0xa4cc)[0x7f6fc58a14cc]
/lib64/libglusterfs.so.0(+0x8b806)[0x7f6fd1543806]
/lib64/libpthread.so.0(+0x7e65)[0x7f6fd031ee65]
/lib64/libc.so.6(clone+0x6d)[0x7f6fcfbe488d]
---------


But I'm certainly just as concerned about the write failures that the log is flooded with:
[2019-11-17 02:20:47.554644] E [MSGID: 113072] [posix-inode-fd-ops.c:1898:posix_writev] 0-wiosna-vmstore-posix: write failed: offset 0, [Invalid argument]
[2019-11-17 02:20:47.554694] E [MSGID: 115067] [server-rpc-fops_v2.c:1373:server4_writev_cbk] 0-wiosna-vmstore-server: 12021489: WRITEV 2 (82c44424-d251-42cf-ad23-13f0f9ea5ed7), client: CTX_ID:c51b6193-1da1-4e46-9d30-637f419dcae1-GRAPH_ID:0-PID:15471-HOST:node01.wiosna.org.pl-PC_NAME:wiosna-vmstore-client-0-RECON_NO:-0, error-xlator: wiosna-vmstore-posix [Invalid argument]


Gobinda, can you take a look?
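As an aside, the posix_writev "[Invalid argument]" pattern is what unaligned O_DIRECT writes typically produce, which is also what the 4kN discussion below is probing. A minimal, hypothetical shell check on the brick filesystem (the test file path is made up, not from this report):

dd if=/dev/zero of=/opt/data/directio-test bs=4096 count=1 oflag=direct   # aligned: should succeed
dd if=/dev/zero of=/opt/data/directio-test bs=100 count=1 oflag=direct    # misaligned: typically fails with "Invalid argument" under O_DIRECT
rm -f /opt/data/directio-test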

Comment 10 Gobinda Das 2019-11-19 09:02:55 UTC
In the VDSM log I can see the volumes are reported with a 4096 block size.
2019-11-17 10:21:39,885+0200 DEBUG (jsonrpc/5) [jsonrpc.JsonRpcServer] Return 'GlusterVolume.status' in bridge with {'volumeStatus': {'bricks': [{'hostuuid': '08181378-2670-46d6-8784-7cf5df121f34', 'blockSize': '4096', 'sizeFree': '72913.555', 'sizeTotal': '102346.004', 'mntOptions': 'rw,seclabel,noatime,nodiratime,attr2,inode64,sunit=512,swidth=1024,noquota', 'device': '/dev/mapper/gluster_vg_nvme-gluster_lv_data_fast2', 'brick': 'gluster1:/gluster_bricks/data_fast2/data_fast2', 'fsName': 'xfs'}, {'hostuuid': 'ea6dc070-7bf7-4258-87c8-38183f49805d', 'blockSize': '4096', 'sizeFree': '72909.633', 'sizeTotal': '102346.004', 'mntOptions': 'rw,seclabel,noatime,nodiratime,attr2,inode64,sunit=512,swidth=1024,noquota', 'device': '/dev/mapper/gluster_vg_nvme-gluster_lv_data_fast2', 'brick': 'gluster2:/gluster_bricks/data_fast2/data_fast2', 'fsName': 'xfs'}, {'hostuuid': 'ad1547fe-7469-4f10-b9cf-1dd30317ce2c', 'blockSize': '4096', 'sizeFree': '15292.875', 'sizeTotal': '15350.000', 'mntOptions': 'rw,seclabel,noatime,nodiratime,attr2,inode64,logbsize=256k,sunit=512,swidth=512,noquota', 'device': '/dev/mapper/gluster_vg_sda3-gluster_lv_data_fast2', 'brick': 'ovirt3:/gluster_bricks/data_fast2/data_fast2', 'fsName': 'xfs'}], 'volumeStatsInfo': {'sizeTotal': '107317563392', 'sizeUsed': '31939444736', 'sizeFree': '75378118656'}, 'name': 'data_fast2'}} (__init__:356)

@Dominik Are you using 4kN drives? If not, I wonder whether the 4kN code path is giving trouble. The [Invalid argument] errors could be because of 4kN handling.

Comment 11 Dominik Drazyk 2019-11-19 09:32:25 UTC
Hi Gobinda,

fdisk reports as follows:
Sector size (logical/physical): 512 bytes / 4096 bytes (that's SSD virtual drive used for caching)
Sector size (logical/physical): 512 bytes / 4096 bytes (that's HDD virtual drive used for data)
LSI tools return the same - logical sector size as 512 and physical as 4096.
Both are RAID1 on LSI controller. I used default options for creating LVM pools and XFS. Mount options are as below:
/dev/storage/hdd	/opt/data	xfs	rw,inode64,noatime,nouuid,nodiratime	0 0

Comment 12 Gobinda Das 2019-11-20 07:04:07 UTC
Hi Dominik,
 Thanks for the info, but fdisk can report 4096 even though the logical sector size is 512.
Can you please check the output of "blockdev --getss /dev/<device name>" ?
The invalid argument in the log looks like a mismatch between the requested write size and the actual sector size.
But the crash may not be because of this; it could be a bug in the socket transport.

Comment 13 Dominik Drazyk 2019-11-20 07:37:31 UTC
Both are 512:
[dominik@node01 ~]$ sudo blockdev --getss /dev/sdb
512
[dominik@node01 ~]$ sudo blockdev --getss /dev/sda
512
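For reference, a slightly fuller check that distinguishes 512n, 512e and 4kN devices (O_DIRECT alignment follows the logical size, so 512 here means 512-byte-aligned writes should still be accepted):

blockdev --getss /dev/sda                        # logical sector size
blockdev --getpbsz /dev/sda                      # physical sector size
cat /sys/block/sda/queue/logical_block_size
cat /sys/block/sda/queue/physical_block_size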

Comment 14 Gobinda Das 2019-11-21 06:01:23 UTC
Hi Raghavendra,
 Can you please help me here to find out the reason for the crash? I am thinking there is some issue in the socket transport.

Comment 15 Sanju 2019-11-25 07:02:18 UTC
Hi,

Can you please provide us the output of "bt" and "t a a bt" from the core file? That will help us investigate this issue faster. If possible, do share the core file.

Thanks,
Sanju
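For reference, a minimal sketch of producing those backtraces, assuming the core was dumped by the brick binary /usr/sbin/glusterfsd and landed in the abrt directory (both paths show up later in comment 24):

gdb /usr/sbin/glusterfsd /var/tmp/abrt/ccpp-*/coredump
(gdb) bt          # backtrace of the crashing thread
(gdb) t a a bt    # backtraces of all threads
(gdb) quit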

Comment 16 Dominik Drazyk 2019-11-26 07:57:21 UTC
Hi Sanju,
where can I find the core file?

Kind regards,
Dominik

Comment 17 Sanju 2019-11-26 08:06:01 UTC
It should be in its default location / directory, unless you have customized kernel.core_pattern in /etc/sysctl.conf.

In my case, I have something like below. So, my core files will be stored under /root/cores/

[root@localhost glusterfs]# cat /etc/sysctl.conf 
## Own core file pattern...
kernel.core_pattern=/root/cores/core.%e.%p.%h.%t
[root@localhost glusterfs]#
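If kernel.core_pattern is customized like this, the target directory has to exist and the setting has to be loaded; a minimal sketch:

mkdir -p /root/cores
sysctl -p                              # re-read /etc/sysctl.conf
cat /proc/sys/kernel/core_pattern      # verify the active pattern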

HTH,
Sanju

Comment 18 Dominik Drazyk 2019-11-26 08:25:22 UTC
Ok I found it.

(gdb) bt
#0  0x00007f7d0c38564c in socket_event_handler () from /usr/lib64/glusterfs/6.6/rpc-transport/socket.so
#1  0x00007f7d18028ae6 in event_dispatch_epoll_worker () from /lib64/libglusterfs.so.0
#2  0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f7d166c888d in clone () from /lib64/libc.so.6

(gdb) t a a bt

Thread 40 (Thread 0x7f7d184bf4c0 (LWP 138985)):
#0  0x00007f7d16e03fd7 in pthread_join () from /lib64/libpthread.so.0
#1  0x00007f7d18027cd8 in event_dispatch_epoll () from /lib64/libglusterfs.so.0
#2  0x000055f596714723 in main ()

Thread 39 (Thread 0x7f7d000ac700 (LWP 140549)):
#0  0x00007f7d16e06da2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f7d037b5c9d in iot_worker () from /usr/lib64/glusterfs/6.6/xlator/performance/io-threads.so
#2  0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f7d166c888d in clone () from /lib64/libc.so.6

Thread 38 (Thread 0x7f7cf46b3700 (LWP 140758)):
#0  0x00007f7d16e06da2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f7d037b5c9d in iot_worker () from /usr/lib64/glusterfs/6.6/xlator/performance/io-threads.so
#2  0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f7d166c888d in clone () from /lib64/libc.so.6

Thread 37 (Thread 0x7f7cf4431700 (LWP 140760)):
#0  0x00007f7d16e06da2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f7d037b5c9d in iot_worker () from /usr/lib64/glusterfs/6.6/xlator/performance/io-threads.so
#2  0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f7d166c888d in clone () from /lib64/libc.so.6

Thread 36 (Thread 0x7f7d0d59b700 (LWP 138991)):
#0  0x00007f7d16e06da2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f7d18002ef0 in syncenv_task () from /lib64/libglusterfs.so.0
#2  0x00007f7d18003da0 in syncenv_processor () from /lib64/libglusterfs.so.0
#3  0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f7d166c888d in clone () from /lib64/libc.so.6

Thread 35 (Thread 0x7f7cf4472700 (LWP 140759)):
#0  0x00007f7d16e06da2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f7d037b5c9d in iot_worker () from /usr/lib64/glusterfs/6.6/xlator/performance/io-threads.so
#2  0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f7d166c888d in clone () from /lib64/libc.so.6

Thread 34 (Thread 0x7f7d0ed9e700 (LWP 138988)):
#0  0x00007f7d16e0a381 in sigwait () from /lib64/libpthread.so.0
#1  0x000055f5967181ab in glusterfs_sigwaiter ()
#2  0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f7d166c888d in clone () from /lib64/libc.so.6

Thread 33 (Thread 0x7f7d0dd9c700 (LWP 138990)):
#0  0x00007f7d16e06da2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f7d18002ef0 in syncenv_task () from /lib64/libglusterfs.so.0
#2  0x00007f7d18003da0 in syncenv_processor () from /lib64/libglusterfs.so.0
#3  0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f7d166c888d in clone () from /lib64/libc.so.6

Thread 32 (Thread 0x7f7d0cd9a700 (LWP 138992)):
#0  0x00007f7d166bf953 in select () from /lib64/libc.so.6
#1  0x00007f7d18043044 in runner () from /lib64/libglusterfs.so.0
#2  0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
---Type <return> to continue, or q <return> to quit---
#3  0x00007f7d166c888d in clone () from /lib64/libc.so.6

Thread 31 (Thread 0x7f7d000ed700 (LWP 140548)):
#0  0x00007f7d16e06da2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f7d037b5c9d in iot_worker () from /usr/lib64/glusterfs/6.6/xlator/performance/io-threads.so
#2  0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f7d166c888d in clone () from /lib64/libc.so.6

Thread 30 (Thread 0x7f7d0016f700 (LWP 139188)):
#0  0x00007f7d16e06da2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f7d037b5c9d in iot_worker () from /usr/lib64/glusterfs/6.6/xlator/performance/io-threads.so
#2  0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f7d166c888d in clone () from /lib64/libc.so.6

Thread 29 (Thread 0x7f7d008f8700 (LWP 139011)):
#0  0x00007f7d16e069f5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f7d02f69d24 in index_worker () from /usr/lib64/glusterfs/6.6/xlator/features/index.so
#2  0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f7d166c888d in clone () from /lib64/libc.so.6

Thread 28 (Thread 0x7f7d0e59d700 (LWP 138989)):
#0  0x00007f7d1668f80d in nanosleep () from /lib64/libc.so.6
#1  0x00007f7d1668f6a4 in sleep () from /lib64/libc.so.6
#2  0x00007f7d17fef678 in pool_sweeper () from /lib64/libglusterfs.so.0
#3  0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f7d166c888d in clone () from /lib64/libc.so.6

Thread 27 (Thread 0x7f7d0012e700 (LWP 139189)):
#0  0x00007f7d16e06da2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f7d037b5c9d in iot_worker () from /usr/lib64/glusterfs/6.6/xlator/performance/io-threads.so
#2  0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f7d166c888d in clone () from /lib64/libc.so.6

Thread 26 (Thread 0x7f7cedffb700 (LWP 139134)):
#0  0x00007f7d16e069f5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f7d17d66bea in rpcsvc_request_handler () from /lib64/libgfrpc.so.0
#2  0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f7d166c888d in clone () from /lib64/libc.so.6

Thread 25 (Thread 0x7f7d0f59f700 (LWP 138987)):
#0  0x00007f7d16e09e5d in nanosleep () from /lib64/libpthread.so.0
#1  0x00007f7d17fd2396 in gf_timer_proc () from /lib64/libglusterfs.so.0
#2  0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f7d166c888d in clone () from /lib64/libc.so.6

Thread 24 (Thread 0x7f7cef7fe700 (LWP 139114)):
#0  0x00007f7d16e069f5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f7d17d66bea in rpcsvc_request_handler () from /lib64/libgfrpc.so.0
#2  0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f7d166c888d in clone () from /lib64/libc.so.6

Thread 23 (Thread 0x7f7d001f1700 (LWP 139118)):
#0  0x00007f7d16e06da2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f7d037b5c9d in iot_worker () from /usr/lib64/glusterfs/6.6/xlator/performance/io-threads.so
---Type <return> to continue, or q <return> to quit---
#2  0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f7d166c888d in clone () from /lib64/libc.so.6

Thread 22 (Thread 0x7f7cf46f4700 (LWP 140757)):
#0  0x00007f7d16e06da2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f7d037b5c9d in iot_worker () from /usr/lib64/glusterfs/6.6/xlator/performance/io-threads.so
#2  0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f7d166c888d in clone () from /lib64/libc.so.6

Thread 21 (Thread 0x7f7cf47b7700 (LWP 140552)):
#0  0x00007f7d16e06da2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f7d037b5c9d in iot_worker () from /usr/lib64/glusterfs/6.6/xlator/performance/io-threads.so
#2  0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f7d166c888d in clone () from /lib64/libc.so.6

Thread 20 (Thread 0x7f7cf4735700 (LWP 140756)):
#0  0x00007f7d16e06da2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f7d037b5c9d in iot_worker () from /usr/lib64/glusterfs/6.6/xlator/performance/io-threads.so
#2  0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f7d166c888d in clone () from /lib64/libc.so.6

Thread 19 (Thread 0x7f7cf4776700 (LWP 140755)):
#0  0x00007f7d16e06da2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f7d037b5c9d in iot_worker () from /usr/lib64/glusterfs/6.6/xlator/performance/io-threads.so
#2  0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f7d166c888d in clone () from /lib64/libc.so.6

Thread 18 (Thread 0x7f7cf47f8700 (LWP 140551)):
#0  0x00007f7d16e06da2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f7d037b5c9d in iot_worker () from /usr/lib64/glusterfs/6.6/xlator/performance/io-threads.so
#2  0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f7d166c888d in clone () from /lib64/libc.so.6

Thread 17 (Thread 0x7f7d0006b700 (LWP 140550)):
#0  0x00007f7d16e06da2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f7d037b5c9d in iot_worker () from /usr/lib64/glusterfs/6.6/xlator/performance/io-threads.so
#2  0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f7d166c888d in clone () from /lib64/libc.so.6

Thread 16 (Thread 0x7f7d001b0700 (LWP 139149)):
#0  0x00007f7d16e06da2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f7d037b5c9d in iot_worker () from /usr/lib64/glusterfs/6.6/xlator/performance/io-threads.so
#2  0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f7d166c888d in clone () from /lib64/libc.so.6

Thread 15 (Thread 0x7f7cee7fc700 (LWP 139126)):
#0  0x00007f7d16e069f5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f7d17d66bea in rpcsvc_request_handler () from /lib64/libgfrpc.so.0
#2  0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f7d166c888d in clone () from /lib64/libc.so.6

Thread 14 (Thread 0x7f7ceeffd700 (LWP 139117)):
#0  0x00007f7d16e069f5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f7d17d66bea in rpcsvc_request_handler () from /lib64/libgfrpc.so.0
---Type <return> to continue, or q <return> to quit---
#2  0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f7d166c888d in clone () from /lib64/libc.so.6

Thread 13 (Thread 0x7f7cf7fff700 (LWP 139013)):
#0  0x00007f7d16e069f5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f7d08c17e23 in changelog_ev_connector () from /usr/lib64/glusterfs/6.6/xlator/features/changelog.so
#2  0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f7d166c888d in clone () from /lib64/libc.so.6

Thread 12 (Thread 0x7f7ceffff700 (LWP 139020)):
#0  0x00007f7d16e069f5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f7d096812db in posix_fsyncer_pick () from /usr/lib64/glusterfs/6.6/xlator/storage/posix.so
#2  0x00007f7d09681565 in posix_fsyncer () from /usr/lib64/glusterfs/6.6/xlator/storage/posix.so
#3  0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f7d166c888d in clone () from /lib64/libc.so.6

Thread 11 (Thread 0x7f7cf4ff9700 (LWP 139019)):
#0  0x00007f7d16e06da2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f7d0967b663 in posix_ctx_janitor_thread_proc () from /usr/lib64/glusterfs/6.6/xlator/storage/posix.so
#2  0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f7d166c888d in clone () from /lib64/libc.so.6

Thread 10 (Thread 0x7f7cf5ffb700 (LWP 139017)):
#0  0x00007f7d1668f80d in nanosleep () from /lib64/libc.so.6
#1  0x00007f7d1668f6a4 in sleep () from /lib64/libc.so.6
#2  0x00007f7d096810b0 in posix_disk_space_check_thread_proc () from /usr/lib64/glusterfs/6.6/xlator/storage/posix.so
#3  0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f7d166c888d in clone () from /lib64/libc.so.6

Thread 9 (Thread 0x7f7d080f7700 (LWP 139012)):
#0  0x00007f7d16e06da2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f7d037b5c9d in iot_worker () from /usr/lib64/glusterfs/6.6/xlator/performance/io-threads.so
#2  0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f7d166c888d in clone () from /lib64/libc.so.6

Thread 8 (Thread 0x7f7cf67fc700 (LWP 139016)):
#0  0x00007f7d166bf953 in select () from /lib64/libc.so.6
#1  0x00007f7d08c1808a in changelog_ev_dispatch () from /usr/lib64/glusterfs/6.6/xlator/features/changelog.so
#2  0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f7d166c888d in clone () from /lib64/libc.so.6

Thread 7 (Thread 0x7f7cf57fa700 (LWP 139018)):
#0  0x00007f7d1668f80d in nanosleep () from /lib64/libc.so.6
#1  0x00007f7d1668f6a4 in sleep () from /lib64/libc.so.6
#2  0x00007f7d096808da in posix_health_check_thread_proc () from /usr/lib64/glusterfs/6.6/xlator/storage/posix.so
#3  0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f7d166c888d in clone () from /lib64/libc.so.6

Thread 6 (Thread 0x7f7cf6ffd700 (LWP 139015)):
#0  0x00007f7d166bf953 in select () from /lib64/libc.so.6
#1  0x00007f7d08c1808a in changelog_ev_dispatch () from /usr/lib64/glusterfs/6.6/xlator/features/changelog.so
#2  0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f7d166c888d in clone () from /lib64/libc.so.6

---Type <return> to continue, or q <return> to quit---
Thread 5 (Thread 0x7f7cf77fe700 (LWP 139014)):
#0  0x00007f7d166bf953 in select () from /lib64/libc.so.6
#1  0x00007f7d08c1808a in changelog_ev_dispatch () from /usr/lib64/glusterfs/6.6/xlator/features/changelog.so
#2  0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f7d166c888d in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x7f7d0a0c8700 (LWP 138998)):
#0  0x00007f7d166c8e63 in epoll_wait () from /lib64/libc.so.6
#1  0x00007f7d180288c0 in event_dispatch_epoll_worker () from /lib64/libglusterfs.so.0
#2  0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f7d166c888d in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7f7d01e01700 (LWP 139010)):
#0  0x00007f7d166c8e63 in epoll_wait () from /lib64/libc.so.6
#1  0x00007f7d180288c0 in event_dispatch_epoll_worker () from /lib64/libglusterfs.so.0
#2  0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f7d166c888d in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7f7d02602700 (LWP 139009)):
#0  0x00007f7d166c8e63 in epoll_wait () from /lib64/libc.so.6
#1  0x00007f7d180288c0 in event_dispatch_epoll_worker () from /lib64/libglusterfs.so.0
#2  0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f7d166c888d in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7f7d0a8c9700 (LWP 138997)):
#0  0x00007f7d0c38564c in socket_event_handler () from /usr/lib64/glusterfs/6.6/rpc-transport/socket.so
#1  0x00007f7d18028ae6 in event_dispatch_epoll_worker () from /lib64/libglusterfs.so.0
#2  0x00007f7d16e02e65 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f7d166c888d in clone () from /lib64/libc.so.6


I can share the core file as a link (it's more than 700MB). Is that fine with bugzilla's policy?

Comment 19 Dominik Drazyk 2019-11-28 09:39:55 UTC
Hi Sanju,
any progress with this bug?

Kind regards,
Dominik

Comment 20 Sanju 2019-12-02 06:34:35 UTC
Hi,

I suspect you may not have provided the right core dump, because all the backtraces look usual. I'm unable to make anything out of this.

Thanks,
Sanju

Comment 21 Dominik Drazyk 2019-12-04 09:55:06 UTC
Hi Sanju,
I can install debuginfo packages if that might help with debugging. Do I need to restart glusterd on every node after that?

Kind regards,
Dominik
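For reference: installing debuginfo only affects gdb, which reads the symbols offline against the existing core, so no Gluster daemon needs to be restarted. A minimal sketch, using the package name gdb itself suggests later in comment 24:

debuginfo-install -y glusterfs-server-6.6-1.el7.x86_64
gdb /usr/sbin/glusterfsd /var/tmp/abrt/ccpp-*/coredump    # re-open the same core with symbols available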

Comment 22 Gobinda Das 2019-12-05 08:57:19 UTC
Forwarding needinfo on Sanju.

Comment 23 Sanju 2019-12-05 10:07:16 UTC
(In reply to Dominik Drazyk from comment #21)
> Hi Sanju,
> I can install debuginfo packages if that might help with debugging. Do I
> need to restart glusterd on every node after that?
> 
> Kind regards,
> Dominik

Looking at the backtrace you have provided, I can say that you have already installed debug-info packages. I suspect you have provided the backtrace from the wrong core file, as it doesn't contain a backtrace in which we can see the process crashing. Can you please cross-check?

Thanks,
Sanju

Comment 24 Dominik Drazyk 2019-12-06 07:43:49 UTC
Hi Sanju,
below is the newest backtrace.

journalctl -u glusterd:

gru 06 02:54:38 node02 opt-data-vmstore[22826]: pending frames:
gru 06 02:54:38 node02 opt-data-vmstore[22826]: patchset: git://git.gluster.org/glusterfs.git
gru 06 02:54:38 node02 opt-data-vmstore[22826]: signal received: 11
gru 06 02:54:38 node02 opt-data-vmstore[22826]: time of crash:
gru 06 02:54:38 node02 opt-data-vmstore[22826]: 2019-12-06 01:54:38
gru 06 02:54:38 node02 opt-data-vmstore[22826]: configuration details:
gru 06 02:54:38 node02 opt-data-vmstore[22826]: argp 1
gru 06 02:54:38 node02 opt-data-vmstore[22826]: backtrace 1
gru 06 02:54:38 node02 opt-data-vmstore[22826]: dlfcn 1
gru 06 02:54:38 node02 opt-data-vmstore[22826]: libpthread 1
gru 06 02:54:38 node02 opt-data-vmstore[22826]: llistxattr 1
gru 06 02:54:38 node02 opt-data-vmstore[22826]: setfsid 1
gru 06 02:54:38 node02 opt-data-vmstore[22826]: spinlock 1
gru 06 02:54:38 node02 opt-data-vmstore[22826]: epoll.h 1
gru 06 02:54:38 node02 opt-data-vmstore[22826]: xattr.h 1
gru 06 02:54:38 node02 opt-data-vmstore[22826]: st_atim.tv_nsec 1
gru 06 02:54:38 node02 opt-data-vmstore[22826]: package-string: glusterfs 6.6
gru 06 02:54:38 node02 opt-data-vmstore[22826]: ---------

ls -l /var/tmp/abrt/
total 12
drwxr-x---. 2 root abrt 4096 12-06 02:56 ccpp-2019-12-06-02:54:38-22833
-rw-------. 1 root root   20 12-06 02:54 last-ccpp
-rw-------. 1 root root   23 08-29 00:12 last-via-server

ls -l coredump 
-rw-r-----. 1 root abrt 514240512 12-06 02:54 coredump

So this should be the correct core dump. The brick on node02 has been down since 02:54.

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterfsd -s 10.20.99.202 --volfile-id wiosna-vmstore.10.20.99.202.o'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f82e893964c in socket_event_handler () from /usr/lib64/glusterfs/6.6/rpc-transport/socket.so
Missing separate debuginfos, use: debuginfo-install glusterfs-server-6.6-1.el7.x86_64
(gdb) 
(gdb) bt
#0  0x00007f82e893964c in socket_event_handler () from /usr/lib64/glusterfs/6.6/rpc-transport/socket.so
#1  0x00007f82f45dcae6 in event_dispatch_epoll_worker () from /lib64/libglusterfs.so.0
#2  0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f82f2c7c88d in clone () from /lib64/libc.so.6


(gdb) t a a bt

Thread 41 (Thread 0x7f82e934e700 (LWP 22832)):
#0  0x00007f82f2c73953 in select () from /lib64/libc.so.6
#1  0x00007f82f45f7044 in runner () from /lib64/libglusterfs.so.0
#2  0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

Thread 40 (Thread 0x7f82d026d700 (LWP 23162)):
#0  0x00007f82f33bada2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f82dfdf6c9d in iot_worker () from /usr/lib64/glusterfs/6.6/xlator/performance/io-threads.so
#2  0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

Thread 39 (Thread 0x7f82d02ef700 (LWP 23160)):
#0  0x00007f82f33bada2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f82dfdf6c9d in iot_worker () from /usr/lib64/glusterfs/6.6/xlator/performance/io-threads.so
#2  0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

Thread 38 (Thread 0x7f82c6ffd700 (LWP 22876)):
#0  0x00007f82f33ba9f5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f82f431abea in rpcsvc_request_handler () from /lib64/libgfrpc.so.0
#2  0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

Thread 37 (Thread 0x7f82dc0a1700 (LWP 22873)):
#0  0x00007f82f33bada2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f82dfdf6c9d in iot_worker () from /usr/lib64/glusterfs/6.6/xlator/performance/io-threads.so
#2  0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

Thread 36 (Thread 0x7f82c67fc700 (LWP 22880)):
#0  0x00007f82f33ba9f5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f82f431abea in rpcsvc_request_handler () from /lib64/libgfrpc.so.0
#2  0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

Thread 35 (Thread 0x7f82d0571700 (LWP 23084)):
#0  0x00007f82f33bada2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f82dfdf6c9d in iot_worker () from /usr/lib64/glusterfs/6.6/xlator/performance/io-threads.so
#2  0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

Thread 34 (Thread 0x7f82d06f7700 (LWP 22883)):
#0  0x00007f82f33bada2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f82dfdf6c9d in iot_worker () from /usr/lib64/glusterfs/6.6/xlator/performance/io-threads.so
#2  0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

Thread 33 (Thread 0x7f82d06b6700 (LWP 22884)):
#0  0x00007f82f33bada2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f82dfdf6c9d in iot_worker () from /usr/lib64/glusterfs/6.6/xlator/performance/io-threads.so
---Type <return> to continue, or q <return> to quit---
#2  0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

Thread 32 (Thread 0x7f82c77fe700 (LWP 22851)):
#0  0x00007f82f33ba9f5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f82f431abea in rpcsvc_request_handler () from /lib64/libgfrpc.so.0
#2  0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

Thread 31 (Thread 0x7f82d05b2700 (LWP 23083)):
#0  0x00007f82f33bada2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f82dfdf6c9d in iot_worker () from /usr/lib64/glusterfs/6.6/xlator/performance/io-threads.so
#2  0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

Thread 30 (Thread 0x7f82dc060700 (LWP 22882)):
#0  0x00007f82f33bada2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f82dfdf6c9d in iot_worker () from /usr/lib64/glusterfs/6.6/xlator/performance/io-threads.so
#2  0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

Thread 29 (Thread 0x7f82d02ae700 (LWP 23161)):
#0  0x00007f82f33bada2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f82dfdf6c9d in iot_worker () from /usr/lib64/glusterfs/6.6/xlator/performance/io-threads.so
#2  0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

Thread 28 (Thread 0x7f82d17fa700 (LWP 22845)):
#0  0x00007f82f33bada2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f82e5c2f663 in posix_ctx_janitor_thread_proc ()
   from /usr/lib64/glusterfs/6.6/xlator/storage/posix.so
#2  0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

Thread 27 (Thread 0x7f82d05f3700 (LWP 23082)):
#0  0x00007f82f33bada2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f82dfdf6c9d in iot_worker () from /usr/lib64/glusterfs/6.6/xlator/performance/io-threads.so
#2  0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

Thread 26 (Thread 0x7f82d022c700 (LWP 23163)):
#0  0x00007f82f33bada2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f82dfdf6c9d in iot_worker () from /usr/lib64/glusterfs/6.6/xlator/performance/io-threads.so
#2  0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

Thread 25 (Thread 0x7f82d0634700 (LWP 23081)):
#0  0x00007f82f33bada2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f82dfdf6c9d in iot_worker () from /usr/lib64/glusterfs/6.6/xlator/performance/io-threads.so
#2  0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

---Type <return> to continue, or q <return> to quit---
Thread 24 (Thread 0x7f82c7fff700 (LWP 22850)):
#0  0x00007f82f33ba9f5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f82f431abea in rpcsvc_request_handler () from /lib64/libgfrpc.so.0
#2  0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

Thread 23 (Thread 0x7f82d0330700 (LWP 23159)):
#0  0x00007f82f33bada2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f82dfdf6c9d in iot_worker () from /usr/lib64/glusterfs/6.6/xlator/performance/io-threads.so
#2  0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

Thread 22 (Thread 0x7f82d0675700 (LWP 22885)):
#0  0x00007f82f33bada2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f82dfdf6c9d in iot_worker () from /usr/lib64/glusterfs/6.6/xlator/performance/io-threads.so
#2  0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

Thread 21 (Thread 0x7f82d01eb700 (LWP 23164)):
#0  0x00007f82f33bada2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f82dfdf6c9d in iot_worker () from /usr/lib64/glusterfs/6.6/xlator/performance/io-threads.so
#2  0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

Thread 20 (Thread 0x7f82ea350700 (LWP 22830)):
#0  0x00007f82f33bada2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f82f45b6ef0 in syncenv_task () from /lib64/libglusterfs.so.0
#2  0x00007f82f45b7da0 in syncenv_processor () from /lib64/libglusterfs.so.0
#3  0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

Thread 19 (Thread 0x7f82eab51700 (LWP 22829)):
#0  0x00007f82f2c4380d in nanosleep () from /lib64/libc.so.6
#1  0x00007f82f2c436a4 in sleep () from /lib64/libc.so.6
#2  0x00007f82f45a3678 in pool_sweeper () from /lib64/libglusterfs.so.0
#3  0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

Thread 18 (Thread 0x7f82eb352700 (LWP 22828)):
#0  0x00007f82f33be381 in sigwait () from /lib64/libpthread.so.0
#1  0x0000558a11f461ab in glusterfs_sigwaiter ()
#2  0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

Thread 17 (Thread 0x7f82dcf39700 (LWP 22837)):
#0  0x00007f82f33ba9f5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f82df5aad24 in index_worker () from /usr/lib64/glusterfs/6.6/xlator/features/index.so
#2  0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

Thread 16 (Thread 0x7f82f4a734c0 (LWP 22826)):
#0  0x00007f82f33b7fd7 in pthread_join () from /lib64/libpthread.so.0
---Type <return> to continue, or q <return> to quit---
#1  0x00007f82f45dbcd8 in event_dispatch_epoll () from /lib64/libglusterfs.so.0
#2  0x0000558a11f42723 in main ()

Thread 15 (Thread 0x7f82d0ff9700 (LWP 22846)):
#0  0x00007f82f33ba9f5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f82e5c352db in posix_fsyncer_pick () from /usr/lib64/glusterfs/6.6/xlator/storage/posix.so
#2  0x00007f82e5c35565 in posix_fsyncer () from /usr/lib64/glusterfs/6.6/xlator/storage/posix.so
#3  0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

Thread 14 (Thread 0x7f82e9b4f700 (LWP 22831)):
#0  0x00007f82f33bada2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f82f45b6ef0 in syncenv_task () from /lib64/libglusterfs.so.0
#2  0x00007f82f45b7da0 in syncenv_processor () from /lib64/libglusterfs.so.0
#3  0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

Thread 13 (Thread 0x7f82dca34700 (LWP 22839)):
#0  0x00007f82f33ba9f5 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f82e51cbe23 in changelog_ev_connector ()
   from /usr/lib64/glusterfs/6.6/xlator/features/changelog.so
#2  0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

Thread 12 (Thread 0x7f82ebb53700 (LWP 22827)):
#0  0x00007f82f33bde5d in nanosleep () from /lib64/libpthread.so.0
#1  0x00007f82f4586396 in gf_timer_proc () from /lib64/libglusterfs.so.0
#2  0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

Thread 11 (Thread 0x7f82f48a74c0 (LWP 730)):
#0  0x00007f82f33bada2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f82f37d186c in handle_fildes_io () from /lib64/librt.so.1
#2  0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

Thread 10 (Thread 0x7f82d27fc700 (LWP 22843)):
#0  0x00007f82f2c4380d in nanosleep () from /lib64/libc.so.6
#1  0x00007f82f2c436a4 in sleep () from /lib64/libc.so.6
#2  0x00007f82e5c350b0 in posix_disk_space_check_thread_proc ()
   from /usr/lib64/glusterfs/6.6/xlator/storage/posix.so
#3  0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

Thread 9 (Thread 0x7f82d1ffb700 (LWP 22844)):
#0  0x00007f82f2c4380d in nanosleep () from /lib64/libc.so.6
#1  0x00007f82f2c436a4 in sleep () from /lib64/libc.so.6
#2  0x00007f82e5c34422 in posix_fs_health_check () from /usr/lib64/glusterfs/6.6/xlator/storage/posix.so
#3  0x00007f82e5c348f2 in posix_health_check_thread_proc ()
   from /usr/lib64/glusterfs/6.6/xlator/storage/posix.so
#4  0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#5  0x00007f82f2c7c88d in clone () from /lib64/libc.so.6
---Type <return> to continue, or q <return> to quit---

Thread 8 (Thread 0x7f82d2ffd700 (LWP 22842)):
#0  0x00007f82f2c73953 in select () from /lib64/libc.so.6
#1  0x00007f82e51cc08a in changelog_ev_dispatch ()
   from /usr/lib64/glusterfs/6.6/xlator/features/changelog.so
#2  0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

Thread 7 (Thread 0x7f82d37fe700 (LWP 22841)):
#0  0x00007f82f2c73953 in select () from /lib64/libc.so.6
#1  0x00007f82e51cc08a in changelog_ev_dispatch ()
   from /usr/lib64/glusterfs/6.6/xlator/features/changelog.so
#2  0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

Thread 6 (Thread 0x7f82d3fff700 (LWP 22840)):
#0  0x00007f82f2c73953 in select () from /lib64/libc.so.6
#1  0x00007f82e51cc08a in changelog_ev_dispatch ()
   from /usr/lib64/glusterfs/6.6/xlator/features/changelog.so
#2  0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x7f82e667c700 (LWP 22834)):
#0  0x00007f82f2c7ce63 in epoll_wait () from /lib64/libc.so.6
#1  0x00007f82f45dc8c0 in event_dispatch_epoll_worker () from /lib64/libglusterfs.so.0
#2  0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x7f82e406a700 (LWP 22838)):
#0  0x00007f82f33bada2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00007f82dfdf6c9d in iot_worker () from /usr/lib64/glusterfs/6.6/xlator/performance/io-threads.so
#2  0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7f82de442700 (LWP 22836)):
#0  0x00007f82f2c7ce63 in epoll_wait () from /lib64/libc.so.6
#1  0x00007f82f45dc8c0 in event_dispatch_epoll_worker () from /lib64/libglusterfs.so.0
#2  0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7f82dec43700 (LWP 22835)):
#0  0x00007f82f2c7ce63 in epoll_wait () from /lib64/libc.so.6
#1  0x00007f82f45dc8c0 in event_dispatch_epoll_worker () from /lib64/libglusterfs.so.0
#2  0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f82f2c7c88d in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7f82e6e7d700 (LWP 22833)):
#0  0x00007f82e893964c in socket_event_handler () from /usr/lib64/glusterfs/6.6/rpc-transport/socket.so
#1  0x00007f82f45dcae6 in event_dispatch_epoll_worker () from /lib64/libglusterfs.so.0
#2  0x00007f82f33b6e65 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f82f2c7c88d in clone () from /lib64/libc.so.6


I guess that's all I can extract from the coredump. I can attach a link to the coredump file if that's helpful. 

Kind regards,
Dominik

Comment 25 Sanju 2020-01-07 14:21:40 UTC
Changing the component to core, as this is a crash in the glusterfsd process.

Comment 26 Dominik Drazyk 2020-01-14 10:28:18 UTC
Hi Sanju,
I deployed another setup based on oVirt 4.3.7 and GlusterFS with the same hardware (LSI 3108 controller, Intel Silver 41xx CPUs) and had similar issues. However, no segfaults have occurred so far (I've been testing it for 3 days).
In the new setup, I removed features.shard from the gluster volume and connected via NFS-Ganesha. There are no errors in the logs. Using the native oVirt gluster connector throws the errors below in the brick log (similar to the original issue):
[2020-01-14 09:59:19.615459] E [MSGID: 113072] [posix-inode-fd-ops.c:1886:posix_writev] 0-ssd-posix: write failed: offset 0, [Invalid argument]
[2020-01-14 09:59:19.615497] E [MSGID: 115067] [server-rpc-fops_v2.c:1373:server4_writev_cbk] 0-ssd-server: 36: WRITEV 0 (0d4c3c9f-08bf-4b1c-926c-a1abb0e9898a), client: CTX_ID:4bda4abd-49d1-42e8-bcfe-e3353a418932-GRAPH_ID:0-PID:77040-HOST:node02-PC_NAME:ssd-client-0-RECON_NO:-0, error-xlator: ssd-posix [Invalid argument]

Does that help a bit with troubleshooting?

Kind regards,
Dominik
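For anyone comparing the two setups: the sharding state of a volume can be inspected with "gluster volume get", and changed with "gluster volume set" (toggling it on a volume that already holds data is generally considered unsafe, so only do that on a fresh, empty volume). A minimal sketch with a placeholder volume name:

gluster volume get <VOLNAME> features.shard
gluster volume get <VOLNAME> features.shard-block-size
gluster volume set <VOLNAME> features.shard off      # only on a fresh/empty volume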

Comment 27 Xavi Hernandez 2020-02-11 14:13:18 UTC
This seems to be the same as bug #1782495. It should be fixed in versions 6.7 and 7.1. Your initial report was on Gluster 6.6. Can you check whether it has been upgraded to 6.7? That would explain why it doesn't crash anymore.
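For reference, a quick way to answer that on each node (CentOS 7 packaging assumed); note that running bricks keep the old version until they are restarted:

rpm -q glusterfs glusterfs-server      # installed package versions
glusterfs --version | head -n1         # version of the binaries on disk
ps -o lstart,cmd -C glusterfsd         # when the running brick processes were started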

Comment 28 Mohit Agrawal 2020-02-19 14:07:04 UTC
Hi Dominik,

  Please share if you have any updates.

Thanks,
Mohit Agrawal

Comment 29 Dominik Drazyk 2020-02-25 10:59:27 UTC
Hi Mohit,
I can try to upgrade Gluster to 7.1, but I need confirmation that it's compatible with oVirt 4.3. 

Kind regards,
Dominik

Comment 30 Worker Ant 2020-03-12 12:17:32 UTC
This bug has been moved to https://github.com/gluster/glusterfs/issues/861 and will be tracked there from now on. Visit the GitHub issue URL for further details.

