Description of problem:
We are running a distributed-replicated volume: 16 pairs of bricks (replica count 2) across 2 nodes. On Friday, 2 pairs of brick daemons segfaulted within minutes of each other, leaving 2 subvolumes down with no replicas left. We brought them back up with "volume start force", which worked, but roughly 4 hours later the same thing happened to two other pairs of bricks. There is nothing of note in the brick logs for the downed bricks; the logs simply stop. In the other logs (nfs, glustershd, etc.) we start seeing "All subvolumes down" errors for those replicate subvolumes.

This is on Ubuntu 16.04.

Version-Release number of selected component (if applicable):
3.8.2

How reproducible:
It has happened three times so far.

Steps to Reproduce:
1. Force-start the volume.
2. Wait for the crash.

Additional info:
The core file is too large to attach here (60-70 MB); is there an alternative way to submit it? I did not see any stack traces anywhere.
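For reference, a minimal sketch of the check/recover steps described above, assuming a hypothetical volume name "gv0" and default Ubuntu/GlusterFS paths (adjust to your setup):

  # Check which bricks are offline (the "Online" column should read Y)
  gluster volume status gv0

  # Force-start the volume to respawn the crashed brick daemons
  gluster volume start gv0 force

  # Brick logs live under /var/log/glusterfs/bricks/ by default; since the
  # downed bricks just stop logging, the kernel log may show the segfault
  dmesg | grep -i segfault

  # Make sure core dumps are enabled and find out where they are written,
  # so the next crash is captured for analysis
  ulimit -c unlimited
  cat /proc/sys/kernel/core_pattern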
Please post the log files. You can get a stack trace using gdb with something like `gdb /usr/bin/glusterfsd /core` and then running `bt`. Once we have that we can better determine the next steps. Thanks.
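Expanding slightly on the gdb suggestion above, a sketch of how one might capture a full backtrace non-interactively and shrink the core for upload; the binary path and core file location are assumptions (on Ubuntu packages the daemon may be in /usr/sbin rather than /usr/bin, and matching debug symbols should be installed first if available):

  # Dump backtraces of all threads from the core into a text file
  gdb -batch -ex 'set pagination off' -ex 'thread apply all bt full' \
      /usr/sbin/glusterfsd /path/to/core > backtrace.txt

  # Compress the core (60-70 MB) before uploading it somewhere external
  xz -9 /path/to/core

The resulting backtrace.txt is usually small enough to attach directly to the bug.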
Created attachment 1213948 [details] stack trace from one of the cores
Created attachment 1214063 [details] brick daemon log
This bug is being closed because the 3.8 version is marked End-Of-Life. There will be no further updates to this version. If you are still facing this issue in a more current release, please open a new bug against a version that still receives bugfixes.