Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1682925

Summary: Gluster volumes never heal during oVirt 4.2->4.3 upgrade
Product: [Community] GlusterFS
Reporter: Jason <jthomasp>
Component: replicate
Assignee: Ravishankar N <ravishankar>
Status: CLOSED INSUFFICIENT_DATA
QA Contact:
Severity: urgent
Docs Contact:
Priority: urgent
Version: 5
CC: atumball, budic, bugs, bugs, cshao, godas, huzhao, nlevy, pasik, qiyuan, ravishankar, sabose, sbonazzo, weiwang, yaniwang, ycui, yturgema
Target Milestone: ---
Flags: jthomasp: needinfo-
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-08-05 08:32:55 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Gluster
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1677319
Attachments (Description; Flags):
Glusterd log; none
Excerpt from glusterfsd.log. Whole log is >100MB; none
data brick log; none
data1ssd brick log; none
data2 brick log; none
engine brick log; none

Description Jason 2019-02-25 20:41:34 UTC
Created attachment 1538575 [details]
Glusterd log

Description of problem: I upgraded the hosted engine, then 2 of my 4 oVirt nodes. After that, Gluster never fully healed. While troubleshooting with Telsin on IRC, we noticed that multiple glusterfsd processes had been launched for each brick on the upgraded 4.3 nodes.
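
The duplicate-process symptom described above can be checked for mechanically. The following is a minimal sketch, not something taken from this report: it assumes each brick process appears in `ps` output as a `glusterfsd` command line carrying a `--brick-name <path>` argument, and that brick paths contain no spaces. The helper names `duplicate_bricks` and `scan_local` are hypothetical.

```python
import re
import subprocess
from collections import defaultdict

# Assumption: a brick process is started as "glusterfsd ... --brick-name <path> ...".
BRICK_RE = re.compile(r"--brick-name[ =](\S+)")


def duplicate_bricks(ps_lines):
    """Return {brick_path: [pids]} for bricks served by more than one glusterfsd.

    ps_lines are lines in "PID ARGS" form, e.g. from `ps -eo pid,args`.
    """
    pids_by_brick = defaultdict(list)
    for line in ps_lines:
        fields = line.split(None, 1)  # PID, then the full command line
        if len(fields) < 2 or not fields[0].isdigit():
            continue  # skips the header row and blank lines
        if "glusterfsd" not in fields[1]:
            continue
        match = BRICK_RE.search(fields[1])
        if match:
            pids_by_brick[match.group(1)].append(int(fields[0]))
    return {brick: pids for brick, pids in pids_by_brick.items() if len(pids) > 1}


def scan_local():
    """Run ps on the local node and report bricks with duplicate processes."""
    out = subprocess.run(
        ["ps", "-eo", "pid,args"], capture_output=True, text=True, check=True
    ).stdout
    return duplicate_bricks(out.splitlines())
```

Calling `scan_local()` on each node returns an empty dict when every brick has a single process. Any entry it does return lists the PIDs competing for that brick.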


Version-Release number of selected component (if applicable):
oVirt node 4.3
Gluster 5.3


How reproducible:
I have not tried to reproduce it.

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:
I am attaching logs as directed by Sahina Bose on the oVirt users mailing list. The upgrade started on February 20th and continued into February 21st, when I rolled the two nodes back to oVirt node 4.2.

Comment 1 Jason 2019-02-25 20:47:07 UTC
Created attachment 1538577 [details]
Excerpt from glusterfsd.log.  Whole log is >100MB

Comment 2 Jason 2019-02-25 21:00:00 UTC
Created attachment 1538579 [details]
data brick log

Comment 3 Jason 2019-02-25 21:00:27 UTC
Created attachment 1538580 [details]
data1ssd brick log

Comment 4 Jason 2019-02-25 21:01:06 UTC
Created attachment 1538582 [details]
data2 brick log

Comment 5 Jason 2019-02-25 21:01:27 UTC
Created attachment 1538583 [details]
engine brick log

Comment 6 Sahina Bose 2019-03-27 06:27:37 UTC
Ravi, can you or someone on the team take a look?

Comment 7 Ravishankar N 2019-03-27 08:36:24 UTC
Hi Jason, this is probably a little late, but what is the state now? To debug incomplete heals, we would need the list of files pending heal (`gluster vol heal $volname info`), the `getfattr -d -m. -e hex /path/to/brick/file-in-question` output for each of those files from all 3 bricks of the replica, and the glustershd.log from all 3 nodes. Please also provide the output of `gluster volume info $volname`.
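
The requests above can be turned into a repeatable checklist. The following is a minimal sketch: it only assembles the three commands named in this comment as argument lists and does not run them. The function name `heal_debug_commands` and the example volume name and brick path are hypothetical. The `getfattr` commands have to be run on each node against that node's own brick path.

```python
import shlex


def heal_debug_commands(volname, brick_file_paths):
    """Build the diagnostic commands requested above as argv lists.

    volname: Gluster volume name.
    brick_file_paths: paths of the files in question on this node's brick.
    """
    cmds = [
        ["gluster", "vol", "heal", volname, "info"],  # files pending heal
        ["gluster", "volume", "info", volname],       # volume layout and options
    ]
    for path in brick_file_paths:
        # Dump all extended attributes of the brick-side file in hex.
        cmds.append(["getfattr", "-d", "-m.", "-e", "hex", path])
    return cmds


def as_shell_lines(cmds):
    """Render argv lists as copy-pasteable shell lines."""
    return [shlex.join(cmd) for cmd in cmds]


if __name__ == "__main__":
    # Hypothetical volume name and brick path, for illustration only.
    example = heal_debug_commands("data", ["/path/to/brick/file-in-question"])
    for line in as_shell_lines(example):
        print(line)
```

The glustershd.log from each node still has to be collected by hand. On a default installation it is usually found under /var/log/glusterfs/, which is an assumption about the setup and is not stated in this report.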

Comment 8 Jason 2019-03-27 14:31:00 UTC
I reverted my nodes back to oVirt node 4.2 and they healed up just fine. I do not have the results of the commands you've requested. I plan to spin up a testing cluster, install 4.2 on it, then upgrade to 4.3 to see if there are still problems. We have a lot of new hardware coming in soon, so I'll be light on time to mess with oVirt for a few weeks.

Comment 9 Ravishankar N 2019-08-05 08:32:55 UTC
I'm closing this bug as there is not much information on what the problem is. Please feel free to re-open it with the relevant details and reproducer steps if the issue occurs again.