Bug 1572561 - Heketi fails to allocate a new volume even though there is space for the arbiter, when one brick is supported and all other bricks are disabled.
Status: CLOSED ERRATA
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: heketi
Version: 3.3
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: CNS 3.10
Assigned To: John Mulligan
QA Contact: Nitin Goyal
Depends On:
Blocks: 1568862
Reported: 2018-04-27 06:28 EDT by Nitin Goyal
Modified: 2018-09-12 05:23 EDT
CC List: 9 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, the arbiter brick was created after the data bricks and could fail to be placed if the device available for the arbiter brick had already received a data brick. The fix prioritizes arbiter brick placement, increasing the likelihood that it is placed on the most appropriate device.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-09-12 05:22:12 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
node info and device info (5.69 KB, text/plain)
2018-04-27 06:28 EDT, Nitin Goyal


External Trackers
Tracker ID Priority Status Summary Last Updated
Github heketi/heketi/pull/1172 None None None 2018-05-09 11:48 EDT
Red Hat Product Errata RHEA-2018:2686 None None None 2018-09-12 05:23 EDT

Description Nitin Goyal 2018-04-27 06:28:49 EDT
Created attachment 1427611 [details]
node info and device info

Description of problem: I have 3 nodes, all in different zones. All devices on node1 and node2 are tagged arbiter:disabled. All devices on node3 except one are also tagged disabled; the remaining device on node3 is tagged arbiter:supported. Arbiter volume creation was attempted 10 times and succeeded only 3 times out of 10.

Version-Release number of selected component (if applicable): 6.0.0-11

How reproducible: 7/10

Steps to Reproduce:
1. Mark all devices of node1 as arbiter:disabled.
2. Mark all devices of node2 as arbiter:disabled.
3. Mark all devices of node3 as arbiter:disabled, except one device.
4. Mark that one remaining device on node3 as arbiter:supported.
5. Create an arbiter volume.

Actual results: volume is not created 7 out of 10 times.

Expected results: volume should be created 10 out of 10 times.
Comment 2 John Mulligan 2018-05-07 11:04:35 EDT
Please list the exact commands that you used.
Comment 3 Nitin Goyal 2018-05-08 02:10:55 EDT
See this scenario:
N1          N2          N3
#100gb-d    #100gb-d    #100gb-d
#100gb-d    #100gb-d    #100gb-d
#100gb-d    #100gb-d    #100gb-s

N stands for node
# stands for device
d stands for arbiter:disabled
s stands for arbiter:supported

Commands I used:

1. Apply the disabled tag to the devices on the nodes.
# heketi-cli device settags device_id arbiter:disabled

2. Apply the supported tag to the last device of the third node.
# heketi-cli device settags device_id arbiter:supported

3. Create volume.
# heketi-cli volume create --size=10  --gluster-volume-options='user.heketi.arbiter true'

4. Try to create at least 10 volumes.

5. Check that it is actually working according to tags.
# heketi-cli topology info
# gluster v info

You will see that sometimes volume creation succeeds and sometimes it fails.
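The repeated creation in steps 3-4 can be scripted. A minimal sketch, assuming heketi-cli is on PATH and already configured against the test cluster (e.g. via HEKETI_CLI_SERVER), counting how many of 10 attempts succeed:

```shell
# Hedged repro sketch: attempt 10 arbiter volume creations, count successes.
# Assumes heketi-cli is installed and pointed at the cluster under test.
created=0
for i in $(seq 1 10); do
  if heketi-cli volume create --size=10 \
       --gluster-volume-options='user.heketi.arbiter true' >/dev/null 2>&1; then
    created=$((created + 1))
  fi
done
echo "created $created of 10 volumes"
```

On the topology described above you would expect "created 10 of 10", but per this report only about 3 of 10 attempts succeed.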
Comment 4 Nitin Goyal 2018-05-08 02:14:06 EDT
In these two scenarios, volume creation also succeeds only 4 times out of 10:

N1          N2          N3
#100gb-d    #100gb-d    #100gb-d
#100gb-d    #100gb-d    #100gb-d
#100gb-d    #100gb-d    #100gb

-------------------------------------

N1          N2          N3
#100gb-d    #100gb-d    #100gb-d
#100gb-d    #100gb-d    #100gb-d
#100gb-d    #100gb-d    #100gb-r
Comment 5 John Mulligan 2018-05-08 10:54:10 EDT
I've looked into this a bit and think I understand what you're running into. 

There's a random element to how devices are selected for bricks, and this can lead to placements where all the constraints are impossible to satisfy. This is true of regular volumes as well as arbiter volumes. However, it gets worse with arbiter once you start tagging devices, because the tags add further restrictions on where bricks can go.

The problem is triggered when devices are picked in a way that cannot simultaneously satisfy the constraint that no two bricks in a brick set share a node, the free-space constraints, and the tag constraints.

This can't be made 100% reliable in the current version of Heketi, as the random selection is baked in at the moment. However, I will look into making the likelihood of successful placement higher.
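The failure mode can be illustrated with a toy Monte Carlo simulation. This is an assumption-laden sketch for illustration only, not Heketi's actual allocator: it models the reporter's topology (3 nodes of 3 devices, with node 3 / device 3 the only arbiter:supported device), places the two data bricks first on random distinct nodes and random devices, and treats the attempt as failed when a data brick has already taken the one device the arbiter needs:

```shell
# Toy Monte Carlo of the placement race (illustration only, NOT Heketi code).
# Topology: 3 nodes x 3 devices; only node 3, device 3 is arbiter:supported.
# Data bricks are placed first on two distinct random nodes, one random
# device each; the arbiter brick then needs node 3 / device 3 to be free.
rand3() { echo $(( $(od -An -N2 -tu2 /dev/urandom) % 3 + 1 )); }

ok=0; fail=0; i=0
while [ "$i" -lt 300 ]; do
  i=$((i + 1))
  a=$(rand3)                                   # node for first data brick
  b=$(rand3)                                   # node for second data brick
  while [ "$b" -eq "$a" ]; do b=$(rand3); done # bricks must not share a node
  bad=0
  for n in "$a" "$b"; do
    d=$(rand3)                                 # device chosen on that node
    # the only arbiter:supported device is node 3, device 3
    if [ "$n" -eq 3 ] && [ "$d" -eq 3 ]; then bad=1; fi
  done
  if [ "$bad" -eq 1 ]; then fail=$((fail + 1)); else ok=$((ok + 1)); fi
done
echo "placed: $ok  failed: $fail (of 300 attempts)"
```

Under this toy model roughly 2 in 9 attempts fail; Heketi's real allocator has more constraints (free space, retries), so the observed rate in the report differs, but the intermittent character of the failure is the same.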
Comment 10 Anjana 2018-08-30 09:23:25 EDT
Updated doc text in the Doc Text field. Please review for technical accuracy.
Comment 13 errata-xmlrpc 2018-09-12 05:22:12 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2686
