Bug 1572561 - Heketi fails to allocate a new volume even though there is space for the arbiter, when one device is tagged supported and all other devices are tagged disabled.
Summary: Heketi fails to allocate a new volume even though there is space for the arbiter...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: heketi
Version: rhgs-3.3
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: CNS 3.10
Assignee: John Mulligan
QA Contact: Nitin Goyal
URL:
Whiteboard:
Depends On:
Blocks: 1568862
Reported: 2018-04-27 10:28 UTC by Nitin Goyal
Modified: 2018-09-12 09:23 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, the arbiter brick was created after data bricks and could fail to be placed if the device available for the arbiter brick had already got a data brick. The fix ensures arbiter brick placement is prioritised, to increase the likelihood that it is placed on the most appropriate device.
Clone Of:
Environment:
Last Closed: 2018-09-12 09:22:12 UTC
Embargoed:


Attachments
node info and device info (5.69 KB, text/plain)
2018-04-27 10:28 UTC, Nitin Goyal


Links
System                  ID                       Private  Priority  Status  Summary  Last Updated
Github                  heketi heketi pull 1172  0        None      None    None     2018-05-09 15:48:55 UTC
Red Hat Product Errata  RHEA-2018:2686           0        None      None    None     2018-09-12 09:23:14 UTC

Description Nitin Goyal 2018-04-27 10:28:49 UTC
Created attachment 1427611: node info and device info

Description of problem: I have 3 nodes, each in a different zone. All devices on node1 and node2 are tagged arbiter:disabled. All devices on node3 except one are also tagged arbiter:disabled; the remaining device on node3 is tagged arbiter:supported. I attempted to create an arbiter volume 10 times, and creation succeeded only 3 times out of 10.

Version-Release number of selected component (if applicable): 6.0.0-11

How reproducible: 7/10 (7 of 10 attempts fail)

Steps to Reproduce:
1. Tag all devices of node1 as arbiter:disabled.
2. Tag all devices of node2 as arbiter:disabled.
3. Tag all devices of node3 except one as arbiter:disabled.
4. Tag the one remaining device on node3 as arbiter:supported.
5. Create an arbiter volume.

Actual results: Volume creation fails 7 out of 10 times.

Expected results: Volume creation should succeed 10 out of 10 times.

Comment 2 John Mulligan 2018-05-07 15:04:35 UTC
Please list the exact commands that you used.

Comment 3 Nitin Goyal 2018-05-08 06:10:55 UTC
See this scenario:
N1          N2          N3
#100gb-d    #100gb-d    #100gb-d
#100gb-d    #100gb-d    #100gb-d
#100gb-d    #100gb-d    #100gb-s

N stands for node
# stands for device
d stands for arbiter disabled
s stands for arbiter supported

Commands that I used:

1. Tag the devices on each node as arbiter disabled (repeat for each device_id).
# heketi-cli device settags device_id arbiter:disabled

2. Tag the last device of the third node as arbiter supported.
# heketi-cli device settags device_id arbiter:supported

3. Create a volume.
# heketi-cli volume create --size=10 --gluster-volume-options='user.heketi.arbiter true'

4. Repeat the volume creation at least 10 times.

5. Check that brick placement actually follows the tags.
# heketi-cli topology info
# gluster v info

You will see that volume creation sometimes succeeds and sometimes fails.
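
The repeated attempts in steps 3 and 4 could be scripted along the following lines. This is only a sketch: it assumes heketi-cli is on the PATH and already configured for the server, and that it exits with a non-zero status when volume creation fails. The names CREATE_CMD and count_successes are made up for this example.

```python
import subprocess

# The volume-create command from step 3, split into arguments.
CREATE_CMD = [
    "heketi-cli", "volume", "create", "--size=10",
    "--gluster-volume-options=user.heketi.arbiter true",
]


def count_successes(attempts=10, run=subprocess.run):
    """Run the create command `attempts` times and count zero exit codes.

    Assumption: a non-zero exit status means the volume was not created.
    `run` is injectable so the loop can be exercised without a cluster.
    """
    successes = 0
    for _ in range(attempts):
        result = run(CREATE_CMD, capture_output=True, text=True)
        if result.returncode == 0:
            successes += 1
    return successes
```

Calling count_successes() against a live setup returns how many of the 10 attempts succeeded. The volumes it creates are not cleaned up.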

Comment 4 Nitin Goyal 2018-05-08 06:14:06 UTC
In these two scenarios as well, volumes are created only 4 times out of 10:

N1          N2          N3
#100gb-d    #100gb-d    #100gb-d
#100gb-d    #100gb-d    #100gb-d
#100gb-d    #100gb-d    #100gb

-------------------------------------

N1          N2          N3
#100gb-d    #100gb-d    #100gb-d
#100gb-d    #100gb-d    #100gb-d
#100gb-d    #100gb-d    #100gb-r

Comment 5 John Mulligan 2018-05-08 14:54:10 UTC
I've looked into this a bit and think I understand what you're running into. 

There's a random element to how the devices are selected for bricks, and this can lead to placements where the constraints are impossible to satisfy. This is true of regular volumes as well as arbiter volumes. However, it gets a bit worse with arbiter when you start tagging the devices, because the tags add further restrictions on where bricks can go.

The problem is triggered when Heketi picks devices in a way that cannot satisfy both the constraint that no two bricks in a brick set share a node and the free-size and tag constraints.

This can't be made 100% reliable in the current version of Heketi, as the random selection is baked in at the moment. However, I will look into trying to make the likelihood of successful placement higher.
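
The failure mode described here can be illustrated with a toy model of the topology from Comment 3. The sketch below is not Heketi's allocator. It assumes that the two data bricks are assigned to two distinct nodes first, with every ordered pair of nodes equally likely, and that the arbiter brick then needs a third node with a device not tagged disabled. It ignores free space, and all function names are made up for the example. It also checks the arbiter-first ordering mentioned in the Doc Text.

```python
from fractions import Fraction
from itertools import permutations

# Toy topology from Comment 3: node name -> arbiter tag of each device.
TOPOLOGY = {
    "N1": ["disabled", "disabled", "disabled"],
    "N2": ["disabled", "disabled", "disabled"],
    "N3": ["disabled", "disabled", "supported"],
}


def arbiter_nodes(topology):
    """Nodes with at least one device that may hold an arbiter brick."""
    return {n for n, tags in topology.items() if any(t != "disabled" for t in tags)}


def data_nodes(topology):
    """Nodes with at least one device that may hold a data brick."""
    return {n for n, tags in topology.items() if any(t != "required" for t in tags)}


def data_first_success_rate(topology):
    """Share of data-brick node choices that leave a usable arbiter node."""
    pairs = list(permutations(sorted(data_nodes(topology)), 2))
    usable = sum(1 for pair in pairs if arbiter_nodes(topology) - set(pair))
    return Fraction(usable, len(pairs))


def arbiter_first_succeeds(topology):
    """True if some arbiter node leaves two other nodes for the data bricks."""
    return any(
        len(data_nodes(topology) - {node}) >= 2 for node in arbiter_nodes(topology)
    )


print(data_first_success_rate(TOPOLOGY))  # 1/3
print(arbiter_first_succeeds(TOPOLOGY))   # True
```

Under these assumptions, only the placements that put both data bricks on N1 and N2 leave N3 free for the arbiter. That is one third of the cases, roughly in line with the 3 and 4 successes out of 10 reported above. Placing the arbiter brick first always leaves N1 and N2 for the data bricks in this topology.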

Comment 10 Anjana KD 2018-08-30 13:23:25 UTC
Updated doc text in the Doc Text field. Please review for technical accuracy.

Comment 13 errata-xmlrpc 2018-09-12 09:22:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2686

