Bug 2096267 - HostedEngine .shard file size=0 in all nodes [NEEDINFO]
Summary: HostedEngine .shard file size=0 in all nodes
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: ovirt-hosted-engine-setup
Classification: oVirt
Component: General
Version: ---
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: Gobinda Das
QA Contact: meital avital
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-06-13 12:06 UTC by Corrado Zabeo
Modified: 2023-01-16 10:13 UTC
CC List: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-01-16 10:13:36 UTC
oVirt Team: Gluster
Embargoed:
godas: needinfo? (vharihar)


Attachments: none


Links
GitHub oVirt ovirt-hosted-engine-setup issue 73 (open): Bug 2096267 - HostedEngine .shard file size=0 in all nodes (last updated 2023-01-16 10:13:35 UTC)
Red Hat Issue Tracker RHV-46398 (last updated 2022-06-13 12:15:31 UTC)

Description Corrado Zabeo 2022-06-13 12:06:08 UTC
Description of problem:
Hi team,
my configuration is as follows: three Gluster replica servers containing the VM LVM storage and the HostedEngine VM (version 3.8.10), plus one server running all the VMs.
Due to a prolonged nighttime power failure that exhausted the UPS batteries, the system shut down. After it came back up, the logs showed two boots within three minutes, so I assume the power was cut several times.
The HostedEngine VM was paused once all services had fully restarted.
I checked the situation with "gluster volume heal engine info": all three connected nodes reported /.shard in split-brain (15 files affected), and all of those files had size 0 on node1.
I recovered 14 files by aligning their gfids from the replicas, but one file has size 0 on all nodes, so the split-brain remains active.
I would like to know how I can fix this and recreate that shard with the correct size.
Thanks in advance
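
For reference, a minimal sketch of the checks described above. The volume name "engine", the brick path /bricks/engine/brick, and the shard filename are taken from this report; adjust them for other deployments:

# On any node: list entries pending heal and confirmed split-brain entries
gluster volume heal engine info
gluster volume heal engine info split-brain

# On EACH node: compare size and xattrs of a suspect shard across the bricks
SHARD=/bricks/engine/brick/.shard/6a48d9f7-8aaa-4763-84ef-98adee5781d9.7013
stat -c '%n: %s bytes' "$SHARD"
getfattr -d -m . -e hex "$SHARD"   # check trusted.gfid and the trusted.afr.* changelogs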

Version-Release number of selected component (if applicable): 3.8.10


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 RHEL Program Management 2022-06-16 11:09:14 UTC
The documentation text flag should only be set after 'doc text' field is provided. Please provide the documentation text and set the flag to '?' again.

Comment 2 Gobinda Das 2022-06-16 13:59:30 UTC
@Vinayak Can you please help?

Comment 3 Corrado Zabeo 2022-06-30 10:58:32 UTC
(In reply to Corrado Zabeo from comment #0)
> [full problem description snipped; see comment #0]

Hi, sorry for not replying earlier.
I resolved the split-brain in the following way (a sketch of the commands follows these steps):
1 - I identified the zero-size shard files with "gluster volume heal engine info" and compared their extended attributes with "getfattr -d -m . -e hex", in my case under /bricks/engine/brick/.shard
2 - I deleted the zero-size shard files in the .shard folder and the corresponding hard links in the .glusterfs folder
3 - the shard files were automatically recreated by self-heal
4 - one last problem remained: shard 7013 had size 0 on all nodes, so I deleted that file and its links on all 3 nodes; they were recreated automatically and the split-brain disappeared
The HostedEngine operating system restarted correctly, so fortunately that shard really was empty.
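
A minimal sketch of the per-node cleanup described in steps 2 and 4, assuming the paths from this report and that the shard has already been confirmed safe to discard. The .glusterfs path shown is a placeholder: the first two directory levels come from the first two bytes of the shard's actual gfid, which step 1 prints:

BRICK=/bricks/engine/brick
SHARD="$BRICK/.shard/6a48d9f7-8aaa-4763-84ef-98adee5781d9.7013"

# 1. Read the shard's gfid; e.g. 0xf00dbeef... maps to .glusterfs/f0/0d/f00dbeef-...
getfattr -n trusted.gfid -e hex "$SHARD"

# 2. Remove the zero-size shard and its gfid hard link (placeholder path,
#    substitute the real gfid printed in step 1)
rm -f "$SHARD"
rm -f "$BRICK/.glusterfs/f0/0d/f00dbeef-0000-0000-0000-000000000000"

# 3. Trigger self-heal so the file is recreated (from the good copies, or
#    empty on next access when all copies were removed, as happened here)
gluster volume heal engine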
However, I don't understand why this happened in the first place.
Below is the "gluster volume heal engine info" output.
Regards


[root@vmgluster01 zones]# gluster volume heal engine info
Brick 192.170.254.3:/bricks/engine/brick
/.shard - Is in split-brain

/__DIRECT_IO_TEST__ 
/.shard/6a48d9f7-8aaa-4763-84ef-98adee5781d9.7013 
Status: Connected
Number of entries: 3

Brick 192.170.254.4:/bricks/engine/brick
/.shard - Is in split-brain

/.shard/6a48d9f7-8aaa-4763-84ef-98adee5781d9.7015 
/.shard/6a48d9f7-8aaa-4763-84ef-98adee5781d9.7016 
/.shard/6a48d9f7-8aaa-4763-84ef-98adee5781d9.7017 
/.shard/6a48d9f7-8aaa-4763-84ef-98adee5781d9.7018 
/.shard/6a48d9f7-8aaa-4763-84ef-98adee5781d9.7019 
/.shard/6a48d9f7-8aaa-4763-84ef-98adee5781d9.7020 
/.shard/6a48d9f7-8aaa-4763-84ef-98adee5781d9.7021 
/.shard/6a48d9f7-8aaa-4763-84ef-98adee5781d9.7022 
/.shard/6a48d9f7-8aaa-4763-84ef-98adee5781d9.7024 
/__DIRECT_IO_TEST__ 
/.shard/6a48d9f7-8aaa-4763-84ef-98adee5781d9.7013 
/.shard/6a48d9f7-8aaa-4763-84ef-98adee5781d9.7023 
Status: Connected
Number of entries: 13

Brick 192.170.254.6:/bricks/engine/brick
/.shard - Is in split-brain

/__DIRECT_IO_TEST__ 
/.shard/6a48d9f7-8aaa-4763-84ef-98adee5781d9.7013 
/.shard/6a48d9f7-8aaa-4763-84ef-98adee5781d9.7015 
/.shard/6a48d9f7-8aaa-4763-84ef-98adee5781d9.7016 
/.shard/6a48d9f7-8aaa-4763-84ef-98adee5781d9.7017 
/.shard/6a48d9f7-8aaa-4763-84ef-98adee5781d9.7018 
/.shard/6a48d9f7-8aaa-4763-84ef-98adee5781d9.7019 
/.shard/6a48d9f7-8aaa-4763-84ef-98adee5781d9.7020 
/.shard/6a48d9f7-8aaa-4763-84ef-98adee5781d9.7021 
/.shard/6a48d9f7-8aaa-4763-84ef-98adee5781d9.7022 
/.shard/6a48d9f7-8aaa-4763-84ef-98adee5781d9.7023 
/.shard/6a48d9f7-8aaa-4763-84ef-98adee5781d9.7024 
Status: Connected
Number of entries: 13

Comment 5 Sandro Bonazzola 2023-01-16 10:13:36 UTC
Moved to GitHub: https://github.com/oVirt/ovirt-hosted-engine-setup/issues/73

