Created attachment 1538575 [details]
Glusterd log

Description of problem:
Upgraded the hosted engine, then 2 of my 4 oVirt nodes. After that, gluster never fully healed. During troubleshooting with Telsin on IRC, we noticed that multiple glusterfsd processes were launched for each brick on the upgraded 4.3 nodes.

Version-Release number of selected component (if applicable):
oVirt Node 4.3
Gluster 5.3

How reproducible:
I have not tried to reproduce it.

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
I am attaching logs as directed by Sahina Bose on the ovirt-users mailing list. The upgrade happened on February 20th and continued into February 21st, until I rolled the two nodes back to oVirt Node 4.2.
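The duplicate-process symptom above can be checked from a process listing. As a minimal sketch (assuming gluster 5.x, where each glusterfsd command line carries a `--brick-name /path` argument; `count_bricks` is a hypothetical helper, not part of gluster):

```shell
#!/bin/sh
# Hypothetical helper: given ps-style glusterfsd command lines on stdin,
# print how many processes serve each brick path. Any count > 1 would
# indicate the duplicate-glusterfsd situation described in this report.
count_bricks() {
    grep -o -- '--brick-name [^ ]*' |  # keep only the brick-path argument
        awk '{print $2}' |             # extract the path itself
        sort | uniq -c                 # count processes per path
}
```

On a live node this could be fed with something like `ps -C glusterfsd -o args= | count_bricks`; a healthy node should show exactly one process per brick.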
Created attachment 1538577 [details] Excerpt from glusterfsd.log. Whole log is >100MB
Created attachment 1538579 [details] data brick log
Created attachment 1538580 [details] data1ssd brick log
Created attachment 1538582 [details] data2 brick log
Created attachment 1538583 [details] engine brick log
Ravi, can you or someone on the team take a look?
Hi Jason, this is probably a little late, but what is the state now? For debugging incomplete heals, we would need the list of files (`gluster vol heal $volname info`) and the `getfattr -d -m . -e hex /path/to/brick/file-in-question` output for those files from all three bricks of the replica, along with the glustershd.log from all three nodes. Please also provide the output of `gluster volume info $volname`.
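The xattr collection requested above can be scripted per brick. A minimal sketch, assuming the file list comes from `gluster vol heal $volname info` as brick-relative paths (`collect_xattrs` and the brick-root argument are illustrative placeholders, not gluster commands):

```shell
#!/bin/sh
# Hypothetical wrapper: read brick-relative file paths on stdin and emit
# one getfattr command per file, rooted at the given brick directory.
# Run the emitted commands on each of the three replica bricks.
collect_xattrs() {
    brick_root=$1
    while IFS= read -r f; do
        printf 'getfattr -d -m . -e hex %s/%s\n' "$brick_root" "$f"
    done
}
```

For example, `printf 'vm1.img\n' | collect_xattrs /gluster/data` prints the getfattr invocation for that file on the `/gluster/data` brick.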
I reverted my nodes back to oVirt Node 4.2 and they healed up just fine. I do not have the results of the commands you've requested. I plan to spin up a testing cluster, install 4.2 on it, then upgrade to 4.3 to see if the problem is still there. We have a lot of new hardware coming in soon, so I'll be light on time to mess with oVirt for a few weeks.
I'm closing this bug as there is not much information on what the problem is. Please feel free to re-open with the relevant details/reproducer steps if the issue occurs again.