Description of problem:
Hi team,
My configuration is as follows: 3 GlusterFS servers in 3-way replica holding the VM LVM storage and the HostedEngine VM (3.8.10), plus 1 server running all the VMs.
Due to a prolonged nighttime power failure that exhausted the UPS batteries, the system shut down. After restarting and looking at the logs, I saw 2 boots within 3 minutes, so I assume the power was cut several times.
HostedEngine was paused after all services came back up. I checked the situation with "gluster volume heal engine info": the 3 connected nodes reported split-brain under .shard (15 files), and all of those files were size 0 on node1.
I recovered 14 of the files and realigned their gfids from the replicas, but one file is size 0 on all nodes, so the split-brain remains active.
I would like to know how I can fix this and recreate the shard with the correct size.
Thanks in advance.

Version-Release number of selected component (if applicable): 3.8.10

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
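To tell which replica holds the bad copy of a split-brain file, each brick's AFR changelog xattrs (trusted.afr.*) are usually compared via getfattr. The hex values encode three 32-bit counters. Below is a small bash sketch (a hypothetical helper, not part of any Gluster tooling) that decodes such a value, assuming the standard data/metadata/entry counter layout:

```shell
# Decode a trusted.afr.* value as printed by `getfattr -d -m . -e hex <file>`
# into its three 32-bit counters: pending data, metadata and entry operations.
# A replica that shows non-zero counters is blaming its peer's copy.
decode_afr() {                       # $1 = e.g. 0x000000020000000000000001
    local h=${1#0x}                  # strip the 0x prefix
    echo "data=$((16#${h:0:8})) metadata=$((16#${h:8:8})) entry=$((16#${h:16:8}))"
}

decode_afr 0x000000020000000000000001
# data=2 metadata=0 entry=1
```

When every brick blames the others (non-zero counters pointing both ways), the file is in split-brain and self-heal cannot pick a source on its own.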
The documentation text flag should only be set after 'doc text' field is provided. Please provide the documentation text and set the flag to '?' again.
@Vinayak Can you please help?
(In reply to Corrado Zabeo from comment #0)

Hi, sorry for not replying earlier. I solved the split-brain problem in the following way:

1. I identified the zero-size shard files with "gluster volume heal engine info" and compared their extended attributes across the replicas with "getfattr -d -m . -e hex"; in my case they were under /bricks/engine/brick/.shard.
2. I deleted the zero-size files in the .shard folder and their corresponding hardlinks in the .glusterfs folder.
3. The files were automatically recreated by self-heal.
4. One last problem remained: shard 7013 was size 0 on all nodes. I deleted that file and its related links on all 3 nodes; they were automatically recreated and the split-brain disappeared.

The operating system restarted correctly, so fortunately that shard was empty. However, I don't understand why such an inconvenience happened. Below is the "volume heal" output.
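The deletion in step 2 has to remove both the shard file and its .glusterfs hardlink, which lives at .glusterfs/<aa>/<bb>/<gfid>, where <aa><bb> are the first four hex digits of that file's own gfid (its trusted.gfid xattr). A bash sketch of the path derivation, with hypothetical helper names; the gfid used in the demo is the base-image gfid from this report, shown only to illustrate the formatting (each shard file has its own gfid, which must be read from its xattr):

```shell
# Format a raw trusted.gfid hex value (as printed by `getfattr -e hex`) as a UUID.
gfid_from_hex() {                    # $1 = e.g. 0x6a48d9f78aaa476384ef98adee5781d9
    local h=${1#0x}
    echo "${h:0:8}-${h:8:4}-${h:12:4}-${h:16:4}-${h:20:12}"
}

# Brick-relative path of the .glusterfs hardlink for a gfid in UUID form.
gfid_link() {
    echo ".glusterfs/${1:0:2}/${1:2:2}/$1"
}

gfid_link "$(gfid_from_hex 0x6a48d9f78aaa476384ef98adee5781d9)"
# .glusterfs/6a/48/6a48d9f7-8aaa-4763-84ef-98adee5781d9

# Usage sketch, run directly on the brick holding the bad zero-size copy
# (never on the FUSE mount):
#   raw=$(getfattr -n trusted.gfid -e hex "$shard" | awk -F= '/trusted.gfid/{print $2}')
#   rm -f "$shard" "/bricks/engine/brick/$(gfid_link "$(gfid_from_hex "$raw")")"
# then let self-heal repopulate the file from the good replicas:
#   gluster volume heal engine
```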
Greetings

[root@vmgluster01 zones]# gluster volume heal engine info
Brick 192.170.254.3:/bricks/engine/brick
/.shard - Is in split-brain
/__DIRECT_IO_TEST__
/.shard/6a48d9f7-8aaa-4763-84ef-98adee5781d9.7013
Status: Connected
Number of entries: 3

Brick 192.170.254.4:/bricks/engine/brick
/.shard - Is in split-brain
/.shard/6a48d9f7-8aaa-4763-84ef-98adee5781d9.7015
/.shard/6a48d9f7-8aaa-4763-84ef-98adee5781d9.7016
/.shard/6a48d9f7-8aaa-4763-84ef-98adee5781d9.7017
/.shard/6a48d9f7-8aaa-4763-84ef-98adee5781d9.7018
/.shard/6a48d9f7-8aaa-4763-84ef-98adee5781d9.7019
/.shard/6a48d9f7-8aaa-4763-84ef-98adee5781d9.7020
/.shard/6a48d9f7-8aaa-4763-84ef-98adee5781d9.7021
/.shard/6a48d9f7-8aaa-4763-84ef-98adee5781d9.7022
/.shard/6a48d9f7-8aaa-4763-84ef-98adee5781d9.7024
/__DIRECT_IO_TEST__
/.shard/6a48d9f7-8aaa-4763-84ef-98adee5781d9.7013
/.shard/6a48d9f7-8aaa-4763-84ef-98adee5781d9.7023
Status: Connected
Number of entries: 13

Brick 192.170.254.6:/bricks/engine/brick
/.shard - Is in split-brain
/__DIRECT_IO_TEST__
/.shard/6a48d9f7-8aaa-4763-84ef-98adee5781d9.7013
/.shard/6a48d9f7-8aaa-4763-84ef-98adee5781d9.7015
/.shard/6a48d9f7-8aaa-4763-84ef-98adee5781d9.7016
/.shard/6a48d9f7-8aaa-4763-84ef-98adee5781d9.7017
/.shard/6a48d9f7-8aaa-4763-84ef-98adee5781d9.7018
/.shard/6a48d9f7-8aaa-4763-84ef-98adee5781d9.7019
/.shard/6a48d9f7-8aaa-4763-84ef-98adee5781d9.7020
/.shard/6a48d9f7-8aaa-4763-84ef-98adee5781d9.7021
/.shard/6a48d9f7-8aaa-4763-84ef-98adee5781d9.7022
/.shard/6a48d9f7-8aaa-4763-84ef-98adee5781d9.7023
/.shard/6a48d9f7-8aaa-4763-84ef-98adee5781d9.7024
Status: Connected
Number of entries: 13
Moved to GitHub: https://github.com/oVirt/ovirt-hosted-engine-setup/issues/73