Description of problem:
Volume creation fails when the first node is tagged arbiter:required and the other nodes carry no tags. It fails roughly 4 times out of 50 attempts. There are two scenarios in which it fails.

First scenario (tag on the node):

N1-r     N2       N3
#200gb   #200gb   #200gb
#200gb   #200gb   #200gb
#200gb   #200gb   #200gb

Second scenario (tags on the devices):

N1         N2       N3
#200gb-r   #200gb   #200gb
#200gb-r   #200gb   #200gb
#200gb-r   #200gb   #200gb

N stands for nodes. # stands for devices. r stands for arbiter:required.

Version-Release number of selected component (if applicable):
6.0.0-12

How reproducible:
Roughly 4 out of 50 volume create attempts fail.

Steps to Reproduce:
$ for i in {1..50}; do heketi-cli volume create --size=2 --gluster-volume-options='user.heketi.arbiter true' ; done

Actual results:
Error: Unable to execute command on glusterfs-storage-qlfbq: volume create: vol27: failed: Multiple bricks of a replicate volume are present on the same server. This setup is not optimal. Use 'force' at the end of the command if you want to override this behavior.

Expected results:
It should create the volume every time.
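For reference, a minimal sketch of how the first scenario can be set up and the failure rate tallied. This assumes the heketi-cli settags subcommand from the node/device tagging feature is available in this build and that heketi-cli exits non-zero on failure; <N1_ID> is a placeholder node id:

# assumption: settags is available in this heketi-cli build
heketi-cli node settags <N1_ID> arbiter:required

# count failures over 50 attempts, mirroring the reproducer above
fail=0
for i in {1..50}; do
    heketi-cli volume create --size=2 \
        --gluster-volume-options='user.heketi.arbiter true' || fail=$((fail + 1))
done
echo "failed attempts: $fail / 50"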
Do you have heketi logs? If this is specific to arbiter, it's not a regression, since this is a new feature...
This is likely fixed by https://github.com/heketi/heketi/pull/1182 already and part of the same root cause as other BZs raised.
(In reply to Michael Adam from comment #3)
> Do you have heketi logs?
I will attach the heketi logs.

> If this is specific to arbiter, it's not a regression, since this is a new feature...
It was working fine in the previous build (6.0.0-11).
Link to heketi logs. http://rhsqe-repo.lab.eng.blr.redhat.com/cns/logs/1578658/
It is still not working. Sometimes it creates volumes and sometimes it does not. When it fails, the error is:

Error: Failed to allocate new volume: No space
Created attachment 1437882 [details] heketi logs
[root@dhcp47-64 home]# for i in {11..20}; do ./volume_create.sh $i 4 ; done
Error: Failed to allocate new volume: No space
Error: Failed to allocate new volume: No space
Error: Failed to allocate new volume: No space

Name: vol14
Size: 4
Volume Id: 37d3640c529c7efe94e7c89137f10e35
Cluster Id: 4536dcde74709294cf36201467f45812
Mount: 10.70.46.73:vol14
Mount Options: backup-volfile-servers=10.70.46.184,10.70.46.80,10.70.46.148,10.70.46.152
Block: false
Free Size: 0
Block Volumes: []
Durability Type: replicate
Distributed+Replica: 3

Error: Failed to allocate new volume: No space

Name: vol16
Size: 4
Volume Id: 2ba7957450be4457763b27c299542449
Cluster Id: 4536dcde74709294cf36201467f45812
Mount: 10.70.46.73:vol16
Mount Options: backup-volfile-servers=10.70.46.184,10.70.46.80,10.70.46.148,10.70.46.152
Block: false
Free Size: 0
Block Volumes: []
Durability Type: replicate
Distributed+Replica: 3

Error: Failed to allocate new volume: No space
Error: Failed to allocate new volume: No space
Error: Failed to allocate new volume: No space
Error: Failed to allocate new volume: No space
This is my "volume_create.sh" script:

echo -e "\n"
heketi-cli volume create --name=vol$1 --size=$2 --gluster-volume-options='user.heketi.arbiter true'
echo -e "\n"
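A minimal sketch of the same script with failure counting added, assuming heketi-cli exits non-zero when volume creation fails (the log file path is arbitrary):

#!/bin/bash
# $1 = volume name suffix, $2 = size in GB (same arguments as volume_create.sh)
echo -e "\n"
if ! heketi-cli volume create --name=vol$1 --size=$2 \
        --gluster-volume-options='user.heketi.arbiter true'; then
    # record the failed attempt so the failure rate can be tallied later
    echo "vol$1 failed" >> /tmp/arbiter_failures.log
fi
echo -e "\n"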
Which scenario are you using when this fails? First, second or both?

Is the cluster empty of volumes when you start the loop? The starting number of 11 makes me suspect that the cluster may not be empty.

Am I correct in thinking you are not seeing error messages that contain the text "Multiple bricks of a replicate volume are present on the same server. This setup is not optimal. Use 'force' at the end of the command if you want to override this behavior" any more?
I've been digging into this some more today.

I noticed that the tagging pattern is essentially node1(arbiter:required) node2(*) node3(*) in both scenarios. This can be translated to: node1(no data bricks) node2(any brick type) node3(any brick type). Thus when heketi picks a brick for the arbiter volume it can land on any node. However, it must then place two data bricks, and it can only place them on node 2 or node 3. If heketi picks node 2 or 3 for the arbiter brick, that leaves only one valid node for two data bricks, and the placement fails (see the sketch below).

I'm currently looking into adding (another) retry as a workaround, but this cannot guarantee a successful placement, so I'm skeptical of this approach, though I am still experimenting with it.

The important thing for this bug is that we're no longer attempting to place >1 brick on the same node and triggering that error from the gluster command. We may need to resort to better documentation around the nature of tagging and placement with regards to arbiter.
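To illustrate the constraint described above, here is a rough simulation (not heketi's placement code) of what happens when the arbiter brick lands on a uniformly random node in this topology; any placement that does not start on node 1 cannot fit two data bricks on distinct remaining nodes:

# rough illustration only, not heketi code
ok=0; bad=0
for i in $(seq 1 1000); do
    arb=$(( RANDOM % 3 + 1 ))       # arbiter brick may land on any of the 3 nodes
    if [ "$arb" -eq 1 ]; then
        ok=$(( ok + 1 ))            # nodes 2 and 3 remain free for the two data bricks
    else
        bad=$(( bad + 1 ))          # only one node left that can take data bricks
    fi
done
echo "placements needing a retry: $bad / 1000"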
Let me phrase it this way:

The setup [n1(arbiter brick only) n2(any brick) n3(any brick)] can be seen as an *invalid* config in the sense that placements with arbiter bricks on n2 or n3 will fail.

As John said, we might do retries. For the time being, let's say the only valid configs for a 3-node cluster are of the following two forms:

1. (data|any) (data|any) (data|any)
2. (arbiter) (data) (data)

Cheers - Michael
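For form 2 above, a sketch of how the tags could be made explicit, assuming the heketi-cli settags subcommand is available and using placeholder node IDs:

# placeholder node IDs; adjust to your topology
heketi-cli node settags <N1_ID> arbiter:required    # arbiter bricks only
heketi-cli node settags <N2_ID> arbiter:disabled    # data bricks only
heketi-cli node settags <N3_ID> arbiter:disabled    # data bricks only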
(In reply to John Mulligan from comment #14) > Which scenario are you using when this fails? First, second or both? I was using both the scenarios. It is failing in both scenarios. > Is the cluster empty of volumes when you start the loop? The starting number > of 11 makes me suspect that the cluster may not be empty. it was empty. > Am I correct in thinking you are not seeing error messages that contain the > text "Multiple bricks of a replicate volume are present on the same server. > This setup is not optimal. Use 'force' at the end of the command if you want > to override this behavior" any more? No i am not seeing that error.
(In reply to Nitin Goyal from comment #17)
> (In reply to John Mulligan from comment #14)
> > Which scenario are you using when this fails? First, second or both?
> I was using both scenarios. It is failing in both.

Please see comment #16: this is kind of by design, and both are to be considered invalid configs currently...

> > Is the cluster empty of volumes when you start the loop? The starting number
> > of 11 makes me suspect that the cluster may not be empty.
> It was empty.

> > Am I correct in thinking you are not seeing error messages that contain the
> > text "Multiple bricks of a replicate volume are present on the same server.
> > This setup is not optimal. Use 'force' at the end of the command if you want
> > to override this behavior" any more?
> No, I am not seeing that error.
Michael, Talur and I discussed this in the context of usability and user expectations. We came up with a basic design that grants "priority" to devices that only take arbiter or data bricks when placing arbiter or data bricks, respectively. This is a tweak to the design of arbiter, but one that we think will make the system more usable.

In short, if arbiter:required is applied to some devices but arbiter:disabled is not applied anywhere, it will work better because arbiter bricks "will prefer" being placed on the arbiter:required devices.

I'm developing this feature upstream now.
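A rough sketch of the ordering idea described above, not the actual heketi implementation; the device lists here are placeholder shell variables standing in for devices grouped by tag:

# placeholder device id lists; in heketi these come from the device tags
required_devices="dev1 dev2 dev3"     # devices tagged arbiter:required
untagged_devices="dev4 dev5 dev6"     # devices with no arbiter tag

# when placing an arbiter brick, try the required devices first,
# then fall back to the untagged ones
arbiter_candidates="$required_devices $untagged_devices"

# when placing a data brick, only untagged devices are candidates;
# arbiter:required devices take no data bricks
data_candidates="$untagged_devices"

echo "arbiter candidate order: $arbiter_candidates"
echo "data candidate order:    $data_candidates"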
https://github.com/heketi/heketi/pull/1191
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2018:2686