| Summary: | Unnecessary volumes are created when automatic endpoint creation fails | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Jianwei Hou <jhou> |
| Component: | Storage | Assignee: | hchen |
| Status: | CLOSED ERRATA | QA Contact: | Jianwei Hou <jhou> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | | |
| Version: | 3.4.0 | CC: | aos-bugs, bchilds, hchiramm, jhou, penehyba, rcyriac, tdawson |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-01-18 12:46:23 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
Description (Jianwei Hou, 2016-10-26 10:40:16 UTC)
The fix for this issue (https://github.com/kubernetes/kubernetes/pull/35285) is in the merge queue of upstream Kubernetes. I will backport the patch to OCP as soon as it is done.

---

merged upstream, waiting on OSE PR

---

(In reply to Bradley Childs from comment #2)
> merged upstream, waiting on OSE PR

I have filed https://github.com/openshift/origin/pull/11722

---

This has been merged into OSE and is in OSE v3.4.0.22 or newer.

---

Verified on:
openshift v3.4.0.22+5c56720
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0

1. Make endpoint creation fail: `Failed to provision volume with StorageClass "glusterprovisioner": glusterfs: create volume err: failed to create endpoint/service <nil>.`

2. Go to the heketi server and list volumes: no volumes were found. Listing volumes repeatedly showed one volume being created and then immediately deleted.

Considering this is an edge scenario which only happens when there is a wrong heketi topology configuration, the above fix is acceptable. Marking this one as verified.

---

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0066

---

(In reply to Jianwei Hou from comment #8)

Hi, I am experiencing the same behavior (as originally stated) on:
OpenShift Master: v3.5.0.53
Kubernetes Master: v1.5.2+43a9be4

I have a heketi server addressed by a StorageClass (volumetype=replicate:3). After `create -f pvc.yaml`, the claim switches to the Pending state. Several volumes get created, none of them is connected to the PVC, and all space is used up. In the description I encountered messages like:

- Token used before issued (in heketi.json I added exact iat and exp values to prevent this)
- No space (no more space for the next volume causes this whole bug)
- failed to create endpoint/service <nil> (I think IP vs. Gluster node name causes this; via /etc/hosts it is unable to recognize the node when creating the endpoint. Why?)
- Id not found (was it overloaded?)
- Host 'IP' is not in 'Peer in Cluster' state (I see it is: both the name and the IP are there)

My questions: what was your "wrong" topology configuration? Does it make sense to try this on a version that does not yet contain the fix https://github.com/kubernetes/kubernetes/commit/fc62687b2c4924c9f1b95c7d1314787bc7b7cada?

PS: I tried replica:2 and a one-node setup; it creates and deletes volumes one by one, but with a secret it uses up all the space (the volumes persist but are not bound).
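---

For reference, a minimal sketch of the kind of StorageClass and PVC setup described in the comment above, using objects from the OpenShift 3.4/3.5 (Kubernetes 1.4/1.5) era. The heketi resturl, restuser, secret names, claim name, and size are placeholder assumptions, not values from this report; only the class name "glusterprovisioner" and volumetype "replicate:3" appear in the thread, and whether volumetype is honored depends on the version, as the next comment notes.

```yaml
# Sketch only: GlusterFS dynamic provisioning through heketi.
apiVersion: storage.k8s.io/v1beta1
kind: StorageClass
metadata:
  name: glusterprovisioner
provisioner: kubernetes.io/glusterfs
parameters:
  resturl: "http://heketi.example.com:8080"   # placeholder heketi endpoint
  restuser: "admin"                           # placeholder user
  secretName: "heketi-secret"                 # placeholder secret holding the heketi key
  secretNamespace: "default"
  volumetype: "replicate:3"                   # replica setting used in the comment above
---
# Sketch of a claim such as the one created with `create -f pvc.yaml`.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: glusterpvc                            # placeholder name
  annotations:
    volume.beta.kubernetes.io/storage-class: glusterprovisioner
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi                            # placeholder size
```

If provisioning fails partway (for example, endpoint/service creation fails), the claim stays Pending; the fix referenced earlier in this thread is meant to ensure the backing Gluster volume is deleted instead of accumulating.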
---

@penehyba I had my GlusterFS hosted on EC2. I found that, by using hostnames (in topology.json, node.hostnames.storage), my endpoints could not be created while lots of volumes were being created; they quickly used up the space. However, after the fix, even if the endpoints were not created, the volume should have been immediately deleted.

After I replaced the hostnames with the private_dns_name, the endpoints could be created and I did not see this issue again. By the time I reported the bug, the volumetype parameter in StorageClass was not supported yet.
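---

For context, a minimal sketch of the heketi topology.json shape being discussed, assuming a single EC2 node; the DNS name, zone, and device path are placeholder assumptions, not values from this report.

```json
{
  "clusters": [
    {
      "nodes": [
        {
          "node": {
            "hostnames": {
              "manage":  ["ip-10-0-0-11.ec2.internal"],
              "storage": ["ip-10-0-0-11.ec2.internal"]
            },
            "zone": 1
          },
          "devices": ["/dev/xvdb"]
        }
      ]
    }
  ]
}
```

The node.hostnames.storage entries are generally what the provisioner relies on when building the Endpoints, which is consistent with the comment above: replacing them with the EC2 private_dns_name allowed endpoint creation to succeed.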