Bug 1578658 - Volume creation fails when the first node is tagged as arbiter:required
Summary: Volume creation fails when the first node is tagged as arbiter:required
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: heketi
Version: rhgs-3.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: CNS 3.10
Assignee: Michael Adam
QA Contact: Nitin Goyal
URL:
Whiteboard:
Depends On:
Blocks: 1568862
 
Reported: 2018-05-16 06:13 UTC by Nitin Goyal
Modified: 2018-09-12 09:23 UTC
CC List: 10 users

Fixed In Version: heketi-6.0.0-14.el7rhgs
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-09-12 09:22:13 UTC
Embargoed:


Attachments
heketi logs (856.75 KB, text/plain), 2018-05-17 10:35 UTC, Nitin Goyal


Links
Red Hat Product Errata RHEA-2018:2686 (last updated 2018-09-12 09:23:23 UTC)

Description Nitin Goyal 2018-05-16 06:13:15 UTC
Description of problem:
Volume creation fails when the first node is tagged as arbiter:required and there are no tags on the other nodes. It fails roughly 4 times out of 50.

There are two scenarios in which it fails.

First scenario:

N1-r      N2        N3
#200gb    #200gb    #200gb
#200gb    #200gb    #200gb
#200gb    #200gb    #200gb

Second scenario:

N1          N2        N3
#200gb-r    #200gb    #200gb
#200gb-r    #200gb    #200gb
#200gb-r    #200gb    #200gb

N stands for nodes.
# stands for devices.
r stands for arbiter:required.
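
For reference, the tags for the two layouts above were presumably applied along the following lines. This is only a sketch: the node and device IDs are placeholders, and it assumes the heketi-cli node settags / device settags subcommands available in this heketi build.

# Scenario 1: tag only the first node as arbiter:required
heketi-cli node settags <node1-id> arbiter:required

# Scenario 2: tag every device on the first node as arbiter:required instead
heketi-cli device settags <node1-dev1-id> arbiter:required
heketi-cli device settags <node1-dev2-id> arbiter:required
heketi-cli device settags <node1-dev3-id> arbiter:required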

Version-Release number of selected component (if applicable):
6.0.0-12

How reproducible:
Roughly 4 out of 50 volume create attempts fail.

Steps to Reproduce:
$ for i in {1..50}; do heketi-cli volume create --size=2  --gluster-volume-options='user.heketi.arbiter true' ; done

Actual results:
Error: Unable to execute command on glusterfs-storage-qlfbq: volume create: vol27: failed: Multiple bricks of a replicate volume are present on the same server. This setup is not optimal. Use 'force' at the end of the command if you want to override this behavior.

Expected results:
The volume should be created every time.

Comment 3 Michael Adam 2018-05-16 09:52:35 UTC
Do you have heketi logs?

If this is specific to arbiter, it's not a regression, since arbiter is a new feature...

Comment 4 Michael Adam 2018-05-16 09:54:56 UTC
This is likely already fixed by https://github.com/heketi/heketi/pull/1182 and shares the same root cause as the other BZs that have been raised.

Comment 5 Nitin Goyal 2018-05-16 10:39:46 UTC
(In reply to Michael Adam from comment #3)
> Do you have heketi logs?
I will gather the heketi logs.

> If this is specific for arbiter, it's not a regression, since this is a new
> feature...

It was working fine in the previous build (6.0.0-11).

Comment 6 Nitin Goyal 2018-05-16 11:35:03 UTC
Link to heketi logs.

http://rhsqe-repo.lab.eng.blr.redhat.com/cns/logs/1578658/

Comment 10 Nitin Goyal 2018-05-17 10:35:02 UTC
It is still not working. Sometimes volume creation succeeds and sometimes it fails.

It fails with the error:
Error: Failed to allocate new volume: No space.

Comment 11 Nitin Goyal 2018-05-17 10:35:37 UTC
Created attachment 1437882 [details]
heketi logs

Comment 12 Nitin Goyal 2018-05-17 10:48:03 UTC
[root@dhcp47-64 home]# for i in {11..20}; do ./volume_create.sh $i 4 ; done

Error: Failed to allocate new volume: No space

Error: Failed to allocate new volume: No space

Error: Failed to allocate new volume: No space

Name: vol14
Size: 4
Volume Id: 37d3640c529c7efe94e7c89137f10e35
Cluster Id: 4536dcde74709294cf36201467f45812
Mount: 10.70.46.73:vol14
Mount Options: backup-volfile-servers=10.70.46.184,10.70.46.80,10.70.46.148,10.70.46.152
Block: false
Free Size: 0
Block Volumes: []
Durability Type: replicate
Distributed+Replica: 3

Error: Failed to allocate new volume: No space

Name: vol16
Size: 4
Volume Id: 2ba7957450be4457763b27c299542449
Cluster Id: 4536dcde74709294cf36201467f45812
Mount: 10.70.46.73:vol16
Mount Options: backup-volfile-servers=10.70.46.184,10.70.46.80,10.70.46.148,10.70.46.152
Block: false
Free Size: 0
Block Volumes: []
Durability Type: replicate
Distributed+Replica: 3

Error: Failed to allocate new volume: No space

Error: Failed to allocate new volume: No space

Error: Failed to allocate new volume: No space

Error: Failed to allocate new volume: No space

Comment 13 Nitin Goyal 2018-05-17 10:49:47 UTC
This is my "volume_create.sh" script:

#!/bin/bash
# Usage: ./volume_create.sh <volume-number> <size-in-GB>
echo -e "\n"
heketi-cli volume create --name=vol$1 --size=$2 --gluster-volume-options='user.heketi.arbiter true'
echo -e "\n"

Comment 14 John Mulligan 2018-05-17 14:36:44 UTC
Which scenario are you using when this fails? First, second or both?

Is the cluster empty of volumes when you start the loop? The starting number of 11 makes me suspect that the cluster may not be empty.

Am I correct in thinking you are not seeing error messages that contain the text "Multiple bricks of a replicate volume are present on the same server. This setup is not optimal. Use 'force' at the end of the command if you want to override this behavior" any more?

Comment 15 John Mulligan 2018-05-17 17:14:30 UTC
I've been digging into this some more today. I noticed that the tagging pattern is essentially node1(arbiter:required) node2(*) node3(*) in both scenarios.

This can be translated to:
node1(no data bricks) node2(any brick type) node3(any brick type)

Thus, when heketi places the arbiter brick for the volume, it can land on any node. However, it must then place two data bricks, and it can only place them on node 2 or node 3. If heketi picks node 2 or node 3 for the arbiter brick, that leaves only one valid node for the two data bricks, and the placement fails.
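
As a rough illustration of the above (a toy model only, not heketi's allocator, which retries and does not pick nodes uniformly at random): a single-pass placement fails whenever the arbiter brick lands on node 2 or node 3, because the two data bricks would then have to share the one remaining data-capable node.

#!/bin/bash
# Toy model only -- NOT heketi's allocator. Node 1 accepts only arbiter
# bricks; nodes 2 and 3 accept any brick type. Count how often a
# single-pass placement would fail over 50 attempts.
fail=0
for i in $(seq 1 50); do
    arbiter_node=$((RANDOM % 3 + 1))   # arbiter brick may land on any node
    if [ "$arbiter_node" -ne 1 ]; then
        fail=$((fail + 1))             # only one node left for two data bricks
    fi
done
echo "single-pass placements that would fail: $fail/50"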

I'm currently looking into adding (another) retry as a workaround, but this cannot guarantee a successful placement, so I'm skeptical of the approach; still, I am experimenting with it.

The important thing for this bug is that we're no longer attempting to place more than one brick on the same node and triggering that error from the gluster command.

We may need to resort to better documentation around the nature of tagging and placement with regard to arbiter.

Comment 16 Michael Adam 2018-05-17 22:41:59 UTC
Let me phrase it this way:


The setup [n1(arbiter brick only) n2(any brick) n3(any brick)]
can be seen as an *invalid* config in the sense that placements
with arbiter bricks on n2 or n3 will fail.


As John said, we might do retries.

For the time being, let's say the only valid configs for a 3-node cluster
are of the following two forms:

1. (data|any) (data|any) (data|any)

2. (arbiter)  (data)     (data)
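
As a hedged sketch (placeholder node IDs; assumes the settags subcommand and the arbiter:disabled tag value), form 2 would be expressed with tags roughly like this:

heketi-cli node settags <node1-id> arbiter:required   # arbiter bricks only
heketi-cli node settags <node2-id> arbiter:disabled   # data bricks only
heketi-cli node settags <node3-id> arbiter:disabled   # data bricks only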


Cheers - Michael

Comment 17 Nitin Goyal 2018-05-18 05:39:22 UTC
(In reply to John Mulligan from comment #14)
> Which scenario are you using when this fails? First, second or both?
I was using both scenarios; it fails in both.

> Is the cluster empty of volumes when you start the loop? The starting number
> of 11 makes me suspect that the cluster may not be empty.
It was empty.
 
> Am I correct in thinking you are not seeing error messages that contain the
> text "Multiple bricks of a replicate volume are present on the same server.
> This setup is not optimal. Use 'force' at the end of the command if you want
> to override this behavior" any more?
No, I am not seeing that error.

Comment 18 Michael Adam 2018-05-18 09:25:23 UTC
(In reply to Nitin Goyal from comment #17)
> (In reply to John Mulligan from comment #14)
> > Which scenario are you using when this fails? First, second or both?
> I was using both the scenarios. It is failing in both scenarios.

Please see comment #16: this is somewhat by design, and both are to be considered invalid configs currently...

Comment 19 John Mulligan 2018-05-18 18:33:49 UTC
Michael, Talur, and I discussed this in the context of usability and user expectations. We came up with a basic design that gives "priority" to devices that only take arbiter bricks (or only data bricks) when placing arbiter or data bricks, respectively.

This is a tweak to the design of arbiter but one that we think will make the system more usable. In short, if arbiter:required is applied to some devices but arbiter:disabled is not applied anywhere, it will work better because arbiter bricks "will prefer" being placed on the arbiter:required devices.

I'm developing this feature upstream now.

Comment 20 John Mulligan 2018-05-18 19:42:55 UTC
https://github.com/heketi/heketi/pull/1191

Comment 22 errata-xmlrpc 2018-09-12 09:22:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2686

