1682925 – Gluster volumes never heal during oVirt 4.2->4.3 upgrade

Summary: Gluster volumes never heal during oVirt 4.2->4.3 upgrade

Keywords:
Status:	CLOSED INSUFFICIENT_DATA
Alias:	None
Product:	GlusterFS
Classification:	Community
Component:	replicate
Sub Component:
Version:	5
Hardware:	x86_64
OS:	Linux
Priority:	urgent
Severity:	urgent
Target Milestone:	---
Assignee:	Ravishankar N
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	Gluster_5_Affecting_oVirt_4.3

Reported:	2019-02-25 20:41 UTC by Jason
Modified:	2019-10-25 13:31 UTC
CC List:	17 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2019-08-05 08:32:55 UTC
Regression:	---
Mount Type:	---
Documentation:	---
CRM:
Verified Versions:
Embargoed:
Dependent Products:
Flags:	jthomasp: needinfo-

Attachments
Glusterd log (2.23 MB, text/plain) 2019-02-25 20:41 UTC, Jason
Excerpt from glusterfsd.log. Whole log is >100MB (78.60 KB, text/plain) 2019-02-25 20:47 UTC, Jason
data brick log (2.90 MB, text/plain) 2019-02-25 21:00 UTC, Jason
data1ssd brick log (2.98 MB, text/plain) 2019-02-25 21:00 UTC, Jason
data2 brick log (2.88 MB, text/plain) 2019-02-25 21:01 UTC, Jason
engine brick log (3.06 MB, text/plain) 2019-02-25 21:01 UTC, Jason

Description Jason 2019-02-25 20:41:34 UTC

Created attachment 1538575 [details]
Glusterd log

Description of problem: I upgraded the hosted engine, then 2 of my 4 oVirt nodes. After that, Gluster never fully healed. During troubleshooting with Telsin on IRC, we noticed that multiple glusterfsd processes had been launched for each brick on the upgraded 4.3 nodes.
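The duplicate-process symptom described above can be checked with a short script. This is a minimal sketch, not part of the original report: it assumes each glusterfsd command line carries a `--brick-name <path>` argument, and the function name `duplicate_bricks` is hypothetical.

```shell
#!/bin/sh
# Sketch: report bricks that are served by more than one glusterfsd process.
# Assumption: every glusterfsd command line contains "--brick-name <path>".
# Reads process command lines on stdin, prints "<count> <brick>" per duplicate.
duplicate_bricks() {
    awk '
        /glusterfsd/ {
            # Find the argument that follows --brick-name and count it.
            for (i = 1; i < NF; i++)
                if ($i == "--brick-name") count[$(i + 1)]++
        }
        END {
            for (b in count)
                if (count[b] > 1) print count[b], b
        }
    ' | sort -k2
}

# Typical use on a node (not run here):
#   ps -eo args | duplicate_bricks
```

Empty output would suggest one process per brick; any printed line names a brick with more than one glusterfsd process attached.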


Version-Release number of selected component (if applicable):
ovirt node 4.3
Gluster 5.3


How reproducible:
I have not tried to reproduce it.

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:
I am attaching logs as directed by Sahina Bose on the oVirt users mailing list. The upgrade happened on February 20th and continued into February 21st, until I rolled the two nodes back to oVirt node 4.2.

Comment 1 Jason 2019-02-25 20:47:07 UTC

Created attachment 1538577 [details]
Excerpt from glusterfsd.log.  Whole log is >100MB

Comment 2 Jason 2019-02-25 21:00:00 UTC

Created attachment 1538579 [details]
data brick log

Comment 3 Jason 2019-02-25 21:00:27 UTC

Created attachment 1538580 [details]
data1ssd brick log

Comment 4 Jason 2019-02-25 21:01:06 UTC

Created attachment 1538582 [details]
data2 brick log

Comment 5 Jason 2019-02-25 21:01:27 UTC

Created attachment 1538583 [details]
engine brick log

Comment 6 Sahina Bose 2019-03-27 06:27:37 UTC

Ravi, can you or someone on the team take a look?

Comment 7 Ravishankar N 2019-03-27 08:36:24 UTC

Hi Jason, this is probably a little late, but what is the state now? For debugging incomplete heals, we would need the list of files (`gluster vol heal $volname info`) and the `getfattr -d -m. -e hex /path/to/brick/file-in-question` output for each of those files from all 3 bricks of the replica, along with the glustershd.log from all 3 nodes. Please also provide the output of `gluster volume info $volname`.
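The data requested above could be gathered on each node with something like the following. This is a hedged sketch rather than an official procedure: the function name `collect_heal_info`, the output file names, and the log location `/var/log/glusterfs/glustershd.log` are assumptions, and the brick-side file paths must be supplied by hand from the heal info listing.

```shell
#!/bin/sh
# Sketch: gather heal-debugging data for one volume on the local node.
# Usage: collect_heal_info VOLNAME OUTDIR [BRICK_FILE...]
# BRICK_FILE arguments are brick-side paths of files listed by heal info.
collect_heal_info() {
    volname=$1
    outdir=$2
    shift 2
    mkdir -p "$outdir" || return 1

    # Files pending heal, and the volume configuration.
    gluster volume heal "$volname" info > "$outdir/heal-info.txt" 2>&1
    gluster volume info "$volname" > "$outdir/volume-info.txt" 2>&1

    # Extended attributes are read on the brick path, not through the mount.
    for f in "$@"; do
        getfattr -d -m . -e hex "$f"
    done > "$outdir/xattrs.txt" 2>&1

    # Assumed log location; adjust if the logs live elsewhere.
    cp /var/log/glusterfs/glustershd.log "$outdir/" 2>/dev/null
    return 0
}
```

Running it on all 3 nodes of the replica, with the same files named on each brick, yields one directory per node that can be attached to the bug.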

Comment 8 Jason 2019-03-27 14:31:00 UTC

I reverted my nodes back to oVirt node 4.2 and they healed up just fine. I do not have the results of the commands you've requested. I plan to spin up a testing cluster, install 4.2 on it, then upgrade to 4.3 to see if there are still problems. We have a lot of new hardware coming in soon, so I'll be light on time to mess with oVirt for a few weeks.

Comment 9 Ravishankar N 2019-08-05 08:32:55 UTC

I'm closing this bug as there is not much information on what the problem is. Please feel free to re-open it with the relevant details or reproducer steps if the issue occurs again.
