Bug 1388868 - Large numbers of unnecessary volumes created when automatic endpoints creation fails
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Storage
Version: 3.4.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Assigned To: hchen
QA Contact: Jianwei Hou
Depends On:
Blocks:
Reported: 2016-10-26 06:40 EDT by Jianwei Hou
Modified: 2017-04-19 22:52 EDT
CC List: 7 users

See Also:
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-01-18 07:46:23 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


External Trackers:
  Tracker ID: Red Hat Product Errata RHBA-2017:0066
  Priority: normal
  Status: SHIPPED_LIVE
  Summary: Red Hat OpenShift Container Platform 3.4 RPM Release Advisory
  Last Updated: 2017-01-18 12:23:26 EST

Description Jianwei Hou 2016-10-26 06:40:16 EDT
Description of problem:
Set up heketi and create a StorageClass for dynamic provisioning. When the endpoints/service creation failed, lots of unnecessary volumes were provisioned. On the heketi server, disk space was quickly used up because the provisioner kept retrying the provision and creating volume after volume.
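For context, a StorageClass of the kind described here, using the GlusterFS/heketi provisioner, looks roughly like the sketch below. The class name "glusterprovisioner" is taken from the error message in comment 8; the resturl and secret values are made-up placeholders:

  apiVersion: storage.k8s.io/v1beta1
  kind: StorageClass
  metadata:
    name: glusterprovisioner
  provisioner: kubernetes.io/glusterfs
  parameters:
    # URL of the heketi REST server that provisions the volumes
    resturl: "http://heketi.example.com:8080"
    # heketi credentials; secret name/namespace are placeholders
    restuser: "admin"
    secretNamespace: "default"
    secretName: "heketi-secret"

Every PVC that references this class makes heketi create a new GlusterFS volume, which is why a failure later in the flow can leak volumes on the heketi side.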

Version-Release number of selected component (if applicable):
openshift v3.4.0.15+9c963ec
kubernetes v1.4.0+776c994
etcd 3.1.0-alpha.1

How reproducible:
Always

Steps to Reproduce:
1. Set up heketi and create a StorageClass. In my setup, I was using HOSTNAMEs for node.hostnames.storage in the heketi topology. The correct configuration is IPs, because the provisioner uses this field to create the endpoints (a topology sketch follows these steps).
2. Create a PVC to provision a volume. At this point the endpoints, service, and PV are not created, but a volume is created on the storage server.
3. Leave it for a while, then go to the heketi server and list volumes.
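For step 1, a heketi topology.json fragment of the shape in question is sketched below; all names, IPs, and devices are made up. The point is that node.hostnames.storage must hold IPs, since the provisioner copies these values verbatim into the addresses of the Endpoints object:

  {
    "clusters": [
      {
        "nodes": [
          {
            "node": {
              "hostnames": {
                "manage": ["node1.example.com"],
                "storage": ["192.168.10.11"]
              },
              "zone": 1
            },
            "devices": ["/dev/sdb"]
          }
        ]
      }
    ]
  }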

Actual results:
After step 3, lots of volumes have been created and disk space is used up.

Expected results:
On provisioning failure, none of the to-be-provisioned resources (volume, endpoints, service, PV) should be left behind.

Additional info:
Comment 1 Humble Chirammal 2016-10-26 08:47:48 EDT
The fix for this issue (https://github.com/kubernetes/kubernetes/pull/35285) is in the merge queue of upstream Kubernetes. I will backport the patch to OCP as soon as it's done.
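The general shape of such a fix is to roll the volume back when the follow-up objects cannot be created. A minimal Go sketch of that pattern follows; the types and helper names are hypothetical stand-ins, not the actual code from the PR:

  package main

  import (
      "errors"
      "fmt"
  )

  // Hypothetical stand-ins for the provisioner's heketi client.
  type volume struct{ name string }

  type client struct{ endpointsBroken bool }

  func (c *client) createVolume(sizeGiB int) (*volume, error) {
      return &volume{name: fmt.Sprintf("vol_%dGiB", sizeGiB)}, nil
  }

  func (c *client) createEndpointService(v *volume) error {
      if c.endpointsBroken {
          return errors.New("failed to create endpoint/service")
      }
      return nil
  }

  func (c *client) deleteVolume(v *volume) error { return nil }

  // provision rolls back the freshly created volume if the
  // Endpoints/Service cannot be created, so controller retries
  // do not leak one orphaned volume per attempt.
  func (c *client) provision(sizeGiB int) (*volume, error) {
      v, err := c.createVolume(sizeGiB)
      if err != nil {
          return nil, err
      }
      if err := c.createEndpointService(v); err != nil {
          if delErr := c.deleteVolume(v); delErr != nil {
              return nil, fmt.Errorf("endpoint/service creation failed: %v; cleanup also failed: %v", err, delErr)
          }
          return nil, fmt.Errorf("endpoint/service creation failed: %v (volume rolled back)", err)
      }
      return v, nil
  }

  func main() {
      c := &client{endpointsBroken: true}
      if _, err := c.provision(5); err != nil {
          fmt.Println("provision failed cleanly:", err)
      }
  }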
Comment 2 Bradley Childs 2016-10-31 14:32:46 EDT
merged upstream, waiting on OSE PR
Comment 3 Humble Chirammal 2016-11-02 05:17:25 EDT
(In reply to Bradley Childs from comment #2)
> merged upstream, waiting on OSE PR

I have filed https://github.com/openshift/origin/pull/11722
Comment 6 Troy Dawson 2016-11-04 14:45:42 EDT
This has been merged into OSE and is in OSE v3.4.0.22 or newer.
Comment 8 Jianwei Hou 2016-11-07 01:06:50 EST
Verified on 
openshift v3.4.0.22+5c56720
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0

1. Made endpoints creation fail; the provisioner reported: Failed to provision volume with StorageClass "glusterprovisioner": glusterfs: create volume err: failed to create endpoint/service <nil>.

2. Went to the heketi server and listed volumes: found no volumes there. Listing repeatedly, I saw one volume get created and then immediately deleted.

Considering this is an edge case that only happens with a wrong heketi topology configuration, the above fix is acceptable. Marking this as verified.
Comment 10 errata-xmlrpc 2017-01-18 07:46:23 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0066
Comment 11 penehyba 2017-04-19 09:40:02 EDT
(In reply to Jianwei Hou from comment #8)
> Verified on 
> openshift v3.4.0.22+5c56720
> kubernetes v1.4.0+776c994
> etcd 3.1.0-rc.0
> 
> 1. Made endpoints creation fail; the provisioner reported: Failed to
> provision volume with StorageClass "glusterprovisioner": glusterfs: create
> volume err: failed to create endpoint/service <nil>.
> 
> 2. Went to the heketi server and listed volumes: found no volumes there.
> Listing repeatedly, I saw one volume get created and then immediately
> deleted.
> 
> Considering this is an edge case that only happens with a wrong heketi
> topology configuration, the above fix is acceptable. Marking this as
> verified.

Hi, I am experiencing the same behavior (as originally stated) on:
OpenShift Master:
    v3.5.0.53
Kubernetes Master:
    v1.5.2+43a9be4 

I have a heketi server addressed by a StorageClass (volumetype=replicate:3).
After running create -f pvc.yaml, the PVC switches to the Pending state.
Several volumes are created, none of them is connected to the PVC, and all space is used up.
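For reference, the pvc.yaml in question would be an ordinary claim against such a class; the claim name and size below are made up, the class name is reused from comment 8 for illustration, and on Kubernetes 1.5 the class is still selected via the beta annotation rather than a spec field:

  apiVersion: v1
  kind: PersistentVolumeClaim
  metadata:
    name: gluster-claim
    annotations:
      # pre-1.6 way of selecting a StorageClass
      volume.beta.kubernetes.io/storage-class: glusterprovisioner
  spec:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 5Gi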

In the PVC description I encountered messages like:
- Token used before issued (~ in heketi.json I added exact iat and exp values to prevent this)

- No space (~ no more space for the next volume causes this whole bug)

- failed to create endpoint/service <nil> (~ I think IP vs. gluster node name causes this; via /etc/hosts it is unable to recognize the node when creating the endpoint, why?)

- Id not found (~ was it overloaded?)

- Host 'IP' is not in 'Peer in Cluster' state (~ I see it is: both name and IP are there)

My questions: what was your "wrong" topology configuration?
Does it make sense to try this on a version that does not yet contain the fix:
https://github.com/kubernetes/kubernetes/commit/fc62687b2c4924c9f1b95c7d1314787bc7b7cada

PS: I tried replica:2 and a one-node solution ... it creates and deletes volumes one by one ... but with a secret it uses up all space (volumes persist but are not bound).
Comment 12 Jianwei Hou 2017-04-19 22:52:27 EDT
@penehyba I had my GlusterFS hosted on EC2. I found that when using hostnames (in topology.json, node.hostnames.storage), my endpoints could not be created while lots of volumes were being created; they quickly used up the space. However, after the fix, even if the endpoints are not created, the volume should be immediately deleted. After I replaced the hostnames with private_dns_name, the endpoints could be created and I did not see this issue again.

At the time I reported the bug, the volumetype parameter in the StorageClass was not supported yet.
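On versions where it is supported, that parameter goes into the StorageClass parameters block, e.g. (fragment only; values are illustrative):

  parameters:
    resturl: "http://heketi.example.com:8080"
    # replica-3 volumes; supported by the glusterfs provisioner in later releases
    volumetype: "replicate:3"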
