Bug 1664365

Summary: Unable to get a gluster-block when using 2 gluster clusters (heketi-timeout)
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Radomir Ludva <rludva>
Component: kubernetes
Assignee: Humble Chirammal <hchiramm>
Status: CLOSED DUPLICATE
QA Contact: Prasanth <pprakash>
Severity: medium
Docs Contact:
Priority: unspecified
Version: ocs-3.11
CC: aos-bugs, aos-storage-staff, dmoessne, gquites, hchiramm, jarrpa, kramdoss, madam, mbagnara, pasik, puebele, rhs-bugs, sankarshan
Target Milestone: ---
Keywords: ZStream
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-07-09 09:41:30 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Radomir Ludva 2019-01-08 15:20:15 UTC
Description of problem:
On an OCP deployment with two Gluster clusters providing gluster-block storage, PVC provisioning through heketi times out.
It looks like OCP is not using the right gluster provisioner, so gluster-block volume creation never completes.

$ oc get ds
NAME                 DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE SELECTOR             AGE
glusterfs-registry   3         3         3         3            3           glusterfs=registry-host   5h

$ oc get dc
NAME                                   REVISION   DESIRED   CURRENT   TRIGGERED BY
glusterblock-registry-provisioner-dc   1          1         1         config
heketi-registry                        1          1         1         config

$ oc get pod
NAME                                           READY     STATUS    RESTARTS   AGE
glusterblock-registry-provisioner-dc-1-79mzx   1/1       Running   0          5h
glusterfs-registry-6t9xs                       1/1       Running   0          5h
glusterfs-registry-kgh7x                       1/1       Running   0          5h
glusterfs-registry-km7n5                       1/1       Running   0          5h
heketi-registry-1-v2fwj                        1/1       Running   0          5h

$ oc get pvc
NAME                                 STATUS    VOLUME    CAPACITY   ACCESS MODES   STORAGECLASS               AGE
prometheus-k8s-db-prometheus-k8s-0   Pending                                       glusterfs-registry-block   5h

$ oc describe pvc prometheus-k8s-db-prometheus-k8s-0
Normal   Provisioning          56m (x477 over 5h)  gluster.org/glusterblock 7554b7cf-ff93-11e8-bfbe-0a580a810403  External provisioner is provisioning volume for claim "openshift-monitoring/prometheus-k8s-db-prometheus-k8s-0"
Normal   ExternalProvisioning  2m (x8192 over 4h)  persistentvolume-controller                                    waiting for a volume to be created, either by external provisioner "gluster.org/glusterblock" or manually created by system administrator
Warning  ProvisioningFailed    1m (x576 over 5h)   gluster.org/glusterblock 7554b7cf-ff93-11e8-bfbe-0a580a810403  Failed to provision volume with StorageClass "glusterfs-registry-block": failed to create volume: heketi block volume creation failed: [heketi] failed to create volume: Post http://heketi-registry.openshift-glusterfs-infra.svc:8080/blockvolumes: dial tcp 172.30.210.102:8080: i/o timeout 
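Since the failure is a `dial tcp ... i/o timeout` against the heketi service, a useful first check is whether that service actually resolves and is reachable from inside the provisioner pod. A rough diagnostic sketch, reusing the pod and service names shown above (the `/hello` endpoint is heketi's standard health check):

```shell
# Does the heketi-registry service have any endpoints behind it?
oc get endpoints heketi-registry -n openshift-glusterfs-infra

# From the provisioner pod, test DNS resolution and TCP reachability
# of the heketi REST URL used by the StorageClass.
oc rsh glusterblock-registry-provisioner-dc-1-79mzx \
  curl -sS -m 10 http://heketi-registry.openshift-glusterfs-infra.svc:8080/hello
```

If the curl also times out, the problem is network/service-level (endpoints, SDN, NetworkPolicy) rather than in heketi itself.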

$ oc logs glusterblock-registry-provisioner-dc-1-79mzx | grep -i error
# various errors on lock operations
I1214 11:39:05.527185       1 leaderelection.go:156] attempting to acquire leader lease...
E1214 11:39:05.565884       1 leaderelection.go:273] Failed to update lock: Operation cannot be fulfilled on persistentvolumeclaims "prometheus-k8s-db-prometheus-k8s-0": the object has been modified; please apply your changes to the latest version and try again
I1214 11:41:48.206955       1 leaderelection.go:156] attempting to acquire leader lease...
E1214 11:41:48.225538       1 leaderelection.go:273] Failed to update lock: Operation cannot be fulfilled on persistentvolumeclaims "metrics-cassandra-1": the object has been modified; please apply your changes to the latest version and try again
W1214 11:44:09.557049       1 reflector.go:341] github.com/kubernetes-incubator/external-storage/lib/controller/controller.go:644: watch of *v1.PersistentVolume ended with: The resourceVersion for the provided watch is too old.
I1214 11:45:29.515567       1 leaderelection.go:156] attempting to acquire leader lease...
E1214 11:45:29.540499       1 leaderelection.go:273] Failed to update lock: Operation cannot be fulfilled on persistentvolumeclaims "logging-es-0": the object has been modified; please apply your changes to the latest version and try again
I1214 11:46:33.784631       1 leaderelection.go:156] attempting to acquire leader lease...
E1214 11:46:33.811019       1 leaderelection.go:273] Failed to update lock: Operation cannot be fulfilled on persistentvolumeclaims "logging-es-1": the object has been modified; please apply your changes to the latest version and try again

## Gluster application provisioner logs:


 Failed to provision volume for claim "openshift-logging/logging-es-2" with StorageClass "glusterfs-registry-block": failed to create volume: heketi block volume creation failed: [heketi] failed to create volume: Post http://heketi-registry.openshift-glusterfs-infra.svc:8080/blockvolumes: dial tcp 172.30.210.102:8080: i/o timeout
I1214 17:46:22.067862       1 event.go:221] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"openshift-logging", Name:"logging-es-2", UID:"0a7402a9-ff96-11e8-b73d-0050568e86fb", APIVersion:"v1", ResourceVersion:"134479", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' Failed to provision volume with StorageClass "glusterfs-registry-block": failed to create volume: heketi block volume creation failed: [heketi] failed to create volume: Post http://heketi-registry.openshift-glusterfs-infra.svc:8080/blockvolumes: dial tcp 172.30.210.102:8080: i/o timeout
I1214 17:46:22.356874       1 leaderelection.go:204] stopped trying to renew lease to provision for pvc openshift-logging/logging-es-2, timeout reached
W1214 17:46:22.356933       1 controller.go:686] retrying syncing claim "openshift-logging/logging-es-2" because failures 0 < threshold 15
E1214 17:46:22.356974       1 controller.go:701] error syncing claim "openshift-logging/logging-es-2": failed to create volume: heketi block volume creation failed: [heketi] failed to create volume: Post http://heketi-registry.openshift-glusterfs-infra.svc:8080/blockvolumes: dial tcp 172.30.210.102:8080: i/o timeout

Version-Release number of selected component (if applicable):
atomic-openshift-clients-3.11.43-1.git.0.647ac05.el7.x86_64
redhat-release-server-7.5-8.el7.x86_64
kernel-3.10.0-862.14.4.el7.x86_64 
kernel-3.10.0-862.14.4.el7.x86_64
kernel-3.10.0-862.14.4.el7.x86_64 
kernel-3.10.0-862.14.4.el7.x86_64 
kernel-3.10.0-862.14.4.el7.x86_64
OCS 3.12.2 (glusterd)


How reproducible:
Clean installation with two gluster clusters providing block storage.

Expected results:
All PVCs are bound to PVs.

StorageClass Dump (if StorageClass used by PV/PVC):
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  creationTimestamp: null
  name: glusterfs-registry-block
parameters:
  chapauthenabled: "true"
  hacount: "3"
  restsecretname: heketi-registry-admin-secret-block
  restsecretnamespace: openshift-glusterfs-infra
  resturl: http://heketi-registry.openshift-glusterfs-infra.svc:8080
  restuser: admin
provisioner: gluster.org/glusterblock
reclaimPolicy: Delete
volumeBindingMode: Immediate
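For context: in a two-cluster setup, each gluster cluster normally gets its own StorageClass pointing at its own heketi REST endpoint, so requests for one cluster never hit the other cluster's heketi. A hypothetical sketch of what the second (application) cluster's StorageClass would look like; every name below is an assumption for illustration, not taken from this environment:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: glusterfs-storage-block                      # hypothetical name
parameters:
  chapauthenabled: "true"
  hacount: "3"
  restsecretname: heketi-storage-admin-secret-block  # hypothetical secret
  restsecretnamespace: openshift-glusterfs-storage   # hypothetical namespace
  resturl: http://heketi-storage.openshift-glusterfs-storage.svc:8080
  restuser: admin
provisioner: gluster.org/glusterblock
reclaimPolicy: Delete
volumeBindingMode: Immediate
```

If both StorageClasses were instead pointed at the same `resturl`, or both provisioner deployments competed for the same claims, symptoms like the leader-election churn above would be expected.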

Comment 8 Matt Bagnara 2019-03-12 17:51:47 UTC
I am also experiencing this same issue with a similar environment and setup of 2 gluster clusters.

Comment 10 Humble Chirammal 2019-07-09 09:41:30 UTC
It looks to me that this is a duplicate of bug #1703239, so I am closing it as such. Please feel free to reopen if that's not the case.

*** This bug has been marked as a duplicate of bug 1703239 ***