Bug 1784402 - storage.reserve ignored by self-heal so that bricks are 100% full
Summary: storage.reserve ignored by self-heal so that bricks are 100% full
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: GlusterFS
Classification: Community
Component: selfheal
Version: 5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-12-17 11:16 UTC by david.spisla
Modified: 2023-09-14 05:48 UTC
CC List: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-03-12 12:19:08 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments
Gluster vol info and status, df -hT, heal info, logs of glfsheal and all related bricks (14.09 MB, application/gzip)
2019-12-17 11:16 UTC, david.spisla

Description david.spisla 2019-12-17 11:16:35 UTC
Created attachment 1645849 [details]
Gluster vol info and status, df -hT, heal info, logs of glfsheal and all related bricks

Description of problem:
Setup: 3-node VMware cluster (2 storage nodes and 1 arbiter node), distribute-replica 2 volume with 1 arbiter brick per replica set (see the attached file for the detailed configuration).
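For orientation, a minimal shell sketch of such a setup; the volume name, host names, and brick paths below are placeholders and are not taken from the attached configuration:

# Distributed replica-2 volume with one arbiter brick per replica set
# (2 x (2 + 1)); every third brick in the list becomes the arbiter.
gluster volume create myvol replica 3 arbiter 1 \
    node1:/gluster/brick1 node2:/gluster/brick1 node3:/gluster/arb1 \
    node1:/gluster/brick2 node2:/gluster/brick2 node3:/gluster/arb2
gluster volume start myvol

# storage.reserve keeps this percentage of each brick free (default: 1).
gluster volume set myvol storage.reserve 1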

Version-Release number of selected component (if applicable):
GlusterFS v5.10

How reproducible:
Steps to Reproduce:
1. Mount volume from a dedicated client machine
2. Disable network of node 2
3. Write data into the volume via node 1 until it is full. The storage.reserve limit of the local bricks should take effect, so the bricks should end up roughly 1% empty.
4. Disable network of node 1
5. Enable network of node 2
6. Write to node 2 in the same volume, but write the data into another subfolder or use completely different data; otherwise one would get a split-brain error, which is not the issue here. Again, write data until the bricks reach the storage.reserve limit.
7. The volume is now filled with twice the amount of data it can actually hold, since each half was written while the other data node was offline (a shell sketch of these steps follows this list).
8. Enable network of node 1
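A rough shell sketch of steps 1-8, reusing the placeholder names from the setup sketch above (myvol, node1/node2, /gluster/brick*); the report does not say how the network was disabled, so the ip link lines (with a placeholder NIC name ens192) are just one way to do it:

# 1. Mount the volume from a dedicated client machine
mount -t glusterfs node1:/myvol /mnt/myvol

# 2. Disable the network of node 2
ssh node2 'ip link set ens192 down'

# 3. Fill the volume; dd stops once the node-1 bricks hit storage.reserve
mkdir -p /mnt/myvol/data1
dd if=/dev/urandom of=/mnt/myvol/data1/big.bin bs=1M || true

# 4./5. Swap the isolated node
ssh node1 'ip link set ens192 down'
ssh node2 'ip link set ens192 up'

# 6./7. Fill the volume again with different data in another subfolder
mkdir -p /mnt/myvol/data2
dd if=/dev/urandom of=/mnt/myvol/data2/big.bin bs=1M || true

# 8. Re-enable node 1; self-heal then fills the bricks
ssh node1 'ip link set ens192 up'
ssh node1 'df -h /gluster/brick1 /gluster/brick2'   # observe the bricks filling up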

Actual results:
storage.reserve was ignored and all bricks were 100% full within a few seconds. All brick processes died. The volume is not mountable and a heal cannot be triggered.
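For reference, the resulting state can be checked with commands like these (same placeholder names as above):

gluster volume status myvol           # brick processes show Online = N
ssh node1 'df -h /gluster/brick1'     # bricks are 100% used
gluster volume heal myvol info        # per the report, healing cannot be triggered in this state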

Expected results:
The self-heal process should be blocked by storage.reserve, the brick processes should keep running, and the volume should remain accessible.

Additional info:
See attached file

The above scenario was not only reproduced on a VM cluster; we could also observe it on a real hardware cluster.

Comment 1 sankarshan 2019-12-23 04:35:51 UTC
Questions for the assigned maintainer/developer: (1) can this be reproduced in a newer release? (2) is this something that was known for this specific release as reported? Please review (2) in terms of how, if at all, a recovery sequence can be made available so as not to cause this space-exhaustion issue.

Comment 2 Ravishankar N 2019-12-23 11:09:17 UTC
I think this behaviour is peculiar to arbiter volumes (as opposed to replica 3), as the arbiter does not store data. If it had been a normal replica 3 volume, then step 6 in the description would have failed because node 3 would have been full. Mohit, what is your take on the bug?

Comment 3 Mohit Agrawal 2019-12-23 12:15:28 UTC
The storage.reserve restriction check applies only to external clients, not to internal clients.
I think it is the internal client's responsibility to check the available disk space before writing data.
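For illustration, the configured reserve and the actual free space on a brick can be compared like this (placeholder volume/brick names); if internal clients such as self-heal or rebalance write past the reserve, df shows less free space than storage.reserve is meant to keep:

gluster volume get myvol storage.reserve   # percentage of brick space reserved for external clients
df -h /gluster/brick1                      # actual free space left on the brick filesystem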

Comment 4 Ravishankar N 2019-12-24 04:46:03 UTC
That would be a leak in abstraction for an option that is per-brick specific. It looks like you added the check for internal clients via BZ 1506083, but I can't find any specific problem described in that BZ. One problem is that if we subject writes from self-heal to the same check, then in the case described in this bug, heals would never be able to complete. But that is no different from the case where this option is *not* enabled and I/O was pumped until the disk was full. So maybe we should allow internal clients as well?

Comment 5 Ravishankar N 2019-12-24 04:49:24 UTC
(In reply to Ravishankar N from comment #4)
> So maybe we should allow internal clients as well?
Sorry, I meant we should *not* allow internal clients either.

Comment 6 Mohit Agrawal 2019-12-24 05:12:13 UTC
We can't block fops from internal clients; otherwise there would have been no need to implement this feature.
We excluded internal clients from the restriction because the feature was primarily implemented for the rebalance daemon:
at the time of adding a brick, the rebalance daemon needs some space on the backend for rebalancing the data, so
we put in a check that exempts internal clients.
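For context, the rebalance scenario referred to here is the one started after expanding a volume; a minimal sketch with the placeholder volume name (the add-brick invocation itself is omitted because it depends on the layout):

# The rebalance daemon is an internal client and must be able to move
# data even on bricks that are already at the storage.reserve limit.
gluster volume rebalance myvol start
gluster volume rebalance myvol status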

Comment 7 Worker Ant 2020-03-12 12:19:08 UTC
This bug has been moved to https://github.com/gluster/glusterfs/issues/869 and will be tracked there from now on. Visit the GitHub issue URL for further details.

Comment 8 Red Hat Bugzilla 2023-09-14 05:48:52 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days

