Red Hat Bugzilla – Bug 1388868
Numerous unnecessary volumes created on automatic endpoints creation failure
Last modified: 2017-04-19 22:52:27 EDT
Description of problem:
Set up heketi and create a StorageClass for dynamic provisioning. When endpoints/service creation fails, lots of unnecessary volumes are provisioned. On the heketi server, disk space is quickly used up because the provisioner keeps retrying the provision and more volumes keep being created.

Version-Release number of selected component (if applicable):
openshift v3.4.0.15+9c963ec
kubernetes v1.4.0+776c994
etcd 3.1.0-alpha.1

How reproducible:
Always

Steps to Reproduce:
1. Set up heketi and create a StorageClass. In my setup, I was using HOSTNAMEs for node.hostnames.storage in the heketi topology. The correct configuration is IPs, because the provisioner uses this field to create the endpoints. (Illustrative manifests follow below.)
2. Create a PVC to provision volumes. At this point the endpoints, service and PV are not created, but the volume does get created on the storage server.
3. Leave it for a while, then go to the heketi server and list volumes.

Actual results:
After step 3: found lots of volumes had been created; disk space was used up.

Expected results:
On provision failure, none of the to-be-provisioned resources (volume, endpoints, service, PV) should be created.

Additional info:
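For illustration only (not part of the original report), minimal manifests for the setup in steps 1-2. The StorageClass name matches the "glusterprovisioner" seen in later comments; the resturl value is a placeholder, and API versions reflect the OpenShift 3.4 / Kubernetes 1.4 era:

apiVersion: storage.k8s.io/v1beta1
kind: StorageClass
metadata:
  name: glusterprovisioner
provisioner: kubernetes.io/glusterfs
parameters:
  # heketi REST endpoint used by the provisioner (placeholder URL)
  resturl: "http://heketi.example.com:8080"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: glusterpvc
  annotations:
    # beta annotation used before spec.storageClassName existed
    volume.beta.kubernetes.io/storage-class: glusterprovisioner
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi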
The fix for this issue (https://github.com/kubernetes/kubernetes/pull/35285) is in the merge queue of upstream Kubernetes. I will backport the patch to OCP as soon as it's done.
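For readers unfamiliar with the change, below is a minimal Go sketch (not the actual Kubernetes code; all function names are illustrative stand-ins) of the cleanup-on-failure pattern the PR introduces: once the Gluster volume exists, any failure to create the endpoints/service rolls the volume back instead of leaving it orphaned.

package main

import (
	"errors"
	"fmt"
)

// Stand-ins for the heketi and Kubernetes API calls.
func createGlusterVolume(sizeGB int) (string, error) { return "vol_abc123", nil }
func deleteGlusterVolume(volumeID string) error      { return nil }
func createEndpointAndService(volumeID string) error {
	// Fails e.g. when node.hostnames.storage holds hostnames instead of IPs.
	return errors.New("failed to create endpoint/service")
}

// provision shows the fixed control flow: a failure after the volume has
// been created triggers an immediate delete, so controller retries no
// longer accumulate orphaned volumes on the heketi server.
func provision(sizeGB int) error {
	volumeID, err := createGlusterVolume(sizeGB)
	if err != nil {
		return fmt.Errorf("glusterfs: create volume err: %v", err)
	}
	if err := createEndpointAndService(volumeID); err != nil {
		// The fix: roll the volume back before surfacing the error.
		if delErr := deleteGlusterVolume(volumeID); delErr != nil {
			return fmt.Errorf("create endpoint/service failed (%v); volume cleanup also failed: %v", err, delErr)
		}
		return fmt.Errorf("glusterfs: create volume err: %v", err)
	}
	return nil
}

func main() {
	if err := provision(5); err != nil {
		fmt.Println(err) // provisioning fails, but no volume is left behind
	}
}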
merged upstream, waiting on OSE PR
(In reply to Bradley Childs from comment #2)
> merged upstream, waiting on OSE PR

I have filed https://github.com/openshift/origin/pull/11722
This has been merged into OSE and is in OSE v3.4.0.22 or newer.
Verified on:
openshift v3.4.0.22+5c56720
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0

1. Make endpoints creation fail:
Failed to provision volume with StorageClass "glusterprovisioner": glusterfs: create volume err: failed to create endpoint/service <nil>.

2. Go to the heketi server and list volumes:
Found no volumes there. Listing repeatedly, I found one volume was created but then immediately deleted.

Considering this is an edge scenario that only happens when the heketi topology is misconfigured, the above fix is acceptable. Marking this one as verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2017:0066
(In reply to Jianwei Hou from comment #8)
> Verified on:
> openshift v3.4.0.22+5c56720
> kubernetes v1.4.0+776c994
> etcd 3.1.0-rc.0
>
> 1. Make endpoints creation fail:
> Failed to provision volume with StorageClass "glusterprovisioner":
> glusterfs: create volume err: failed to create endpoint/service <nil>.
>
> 2. Go to the heketi server and list volumes:
> Found no volumes there. Listing repeatedly, I found one volume was created
> but then immediately deleted.
>
> Considering this is an edge scenario that only happens when the heketi
> topology is misconfigured, the above fix is acceptable. Marking this one
> as verified.

Hi, I am experiencing the same behavior (as originally stated) on:
OpenShift Master: v3.5.0.53
Kubernetes Master: v1.5.2+43a9be4

I have a heketi server addressed by the StorageClass (volumetype=replicate:3). After create -f pvc.yaml, the PVC switches to the Pending state. Several volumes are created, none of them connected to the PVC, and all space is used up.

In the descriptions I encountered messages like:
- Token used before issued (~ in heketi.json I added exact iat and exp to prevent this)
- No space (~ no more space for the next volume causes this whole bug)
- failed to create endpoint/service <nil> (~ I think IP vs. gluster node name causes this. With /etc/hosts it is unable to recognize the node when creating the endpoint, why?)
- Id not found (~ it was overloaded?)
- Host 'IP' is not in 'Peer in Cluster' state (~ I see it is: name + IP, both are there)

My questions: what was your "wrong" topology configuration? Does it make sense to try this on a version not yet containing the fix: https://github.com/kubernetes/kubernetes/commit/fc62687b2c4924c9f1b95c7d1314787bc7b7cada

PS: I tried replica:2 and a one-node solution ... it creates and deletes volumes one by one ... but with a secret it uses up all space (volumes persist but are not bound).
@penehyba I had my glusterfs hosted on EC2. I found that by using hostnames (in topology.json, node.hostnames.storage) my endpoints could not be created while lots of volumes were created; they quickly used up the space. However, after the fix, even if the endpoints are not created, the volume should be immediately deleted. After I replaced the hostnames with private_dns_name, the endpoints could be created and I did not see this issue again. By the time I reported the bug, the volumetype parameter in the StorageClass was not supported yet.
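To make the working configuration concrete, an illustrative topology.json fragment follows (addresses and device names are placeholders): the point is that node.hostnames.storage carries an IP address, e.g. the EC2 private address, rather than a hostname, so the provisioner can build the endpoints from it.

{
  "clusters": [
    {
      "nodes": [
        {
          "node": {
            "hostnames": {
              "manage": ["node0.example.com"],
              "storage": ["10.0.0.11"]
            },
            "zone": 1
          },
          "devices": ["/dev/xvdb"]
        }
      ]
    }
  ]
}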