Description of problem:
When creating storage using multiple storage classes in OCP 3.11, we are seeing communication issues (timeouts) when attempting to make calls to the heketi svc provided by OpenShift. The cluster is using the ovs-multitenant plugin for host networks.

Version-Release number of selected component (if applicable):
RHGS 3.11
OCP 3.11

How reproducible:
Intermittent

Steps to Reproduce:
1. Create storage using either the app-storage or infra-storage storage class.
2. Observe namespace events. You will see connection timeouts talking to the heketi svc.
3.

Actual results:
Connection timeouts

Expected results:
Block storage carved out by the storage class

Additional info:
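For reference, a minimal sketch of step 1, assuming a block storage class named glusterfs-storage-block and a test project named my-app (both names are placeholders for whatever exists in the affected cluster):

    # Request a block PV from the storage class, then watch the namespace events
    cat <<EOF | oc create -n my-app -f -
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: test-block-pvc
    spec:
      accessModes:
      - ReadWriteOnce
      storageClassName: glusterfs-storage-block
      resources:
        requests:
          storage: 1Gi
    EOF

    # Provisioning failures (including the heketi timeouts) show up as events in the project
    oc get events -n my-app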
Hi @John, would you be able to provide some insight into the analysis so far, so we can keep the customer updated? Thank you!
Switching needinfo to Talur who was going to do some investigation. Talur, any thoughts on the potential workarounds we discussed previously?
Root cause:
When glusterblock external provisioners are deployed, all of them start with the "gluster.org/glusterblock" id. If the storageclass is not in the same namespace as the external provisioner selected by the storage controller, the communication might not go through, depending on the SDN settings.

Solution(s), choose either:
a. Deploy only one glusterblock external provisioner (in any project) and enable a network connection from the project where the storageclass is defined to the project where the provisioner pod runs.
b. Deploy each glusterblock external provisioner with its own id/PROVISIONER_NAME and adjust the storageclass definition accordingly.

In my environment, I had to make the following changes:

1. oc edit dc glusterblock-storage-provisioner-dc

   # the container spec was changed to look like this
   spec:
     containers:
     - args:
       - -id
       - rtalur
       env:
       - name: PROVISIONER_NAME
         value: gluster.org/glusterblock-app-storage

2. created a storageclass with the yaml contents as shown

   apiVersion: storage.k8s.io/v1
   kind: StorageClass
   metadata:
     name: glusterfs-storage-block
     namespace: ""
   parameters:
     chapauthenabled: "true"
     hacount: "3"
     restsecretname: heketi-storage-admin-secret-block
     restsecretnamespace: app-storage
     resturl: http://heketi-storage.app-storage.svc:8080
     restuser: admin
   provisioner: gluster.org/glusterblock-app-storage
   reclaimPolicy: Delete
   volumeBindingMode: Immediate

NOTE:
a. The name of the provisioner in the storageclass and the value of the env PROVISIONER_NAME in the deploymentconfig must match.
b. The requirement of the -id argument in the container spec seems like a bug to me, as the code at https://github.com/kubernetes-incubator/external-storage/blob/master/gluster/block/cmd/glusterblock-provisioner/glusterblock-provisioner.go#L909 is not clear.
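For solution (a) on the ovs-multitenant plugin, one way to enable the cross-project communication is to join the pod networks of the two projects. A sketch, with example project names (substitute the projects actually used in the cluster):

    # Join the consuming project's network to the one hosting the provisioner/heketi
    oc adm pod-network join-projects --to=app-storage my-app

    # Alternatively, make the storage project's network reachable from every project:
    # oc adm pod-network make-projects-global app-storage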
A couple of quick questions here to understand the requirement:

Why do we need to deploy 2 block provisioners?

If that's really a requirement: why can't we deploy the provisioner with a different name and serve out PVCs from 2 different SCs?
(In reply to Humble Chirammal from comment #17)
> A couple of quick questions here to understand the requirement:
>
> Why do we need to deploy 2 block provisioners?
>
> If that's really a requirement: why can't we deploy the provisioner with a
> different name and serve out PVCs from 2 different SCs?

It could be customer/user preference to have multiple provisioners. And as described in comment 15, creating them with different names is the solution. The question is: what is the right way to create provisioners with different names? I believe it should be possible to do so by just changing the value of PROVISIONER_NAME in the DC, but it is not.
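For clarity, this is the change that intuitively should be sufficient on its own (a sketch of the DC fragment, reusing the provisioner name from comment 15); per comment 15, the -id argument currently has to be set as well:

    spec:
      containers:
      - name: glusterblock-provisioner
        env:
        - name: PROVISIONER_NAME
          value: gluster.org/glusterblock-app-storage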
John, Talur, the PR below should address the limitation discussed here in the bug:
https://github.com/kubernetes-incubator/external-storage/pull/1170/
When I edit the DC to include the lines below:

spec:
  containers:
  - args:
    - -id
    - gluster.org/glusterblock-app-storage
  - env:
    - name: PROVISIONER_NAME
      value: gluster.org/glusterblock-app-storage

I receive the following message:

# oc edit dc glusterblock-storage-provisioner-dc
A copy of your changes has been stored to "/tmp/oc-edit-kynqo.yaml"
error: map: map[args:[-id gluster.org/glusterblock-app-storage]] does not contain declared merge key: name

# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: apps.openshift.io/v1
kind: DeploymentConfig
metadata:
  annotations:
    description: Defines how to deploy the glusterblock provisioner pod.
  creationTimestamp: 2019-03-18T20:50:15Z
  generation: 3
  labels:
    glusterblock: storage-provisioner-dc
    glusterfs: block-storage-provisioner-dc
  name: glusterblock-storage-provisioner-dc
  namespace: app-storage
  resourceVersion: "18818403"
  selfLink: /apis/apps.openshift.io/v1/namespaces/app-storage/deploymentconfigs/glusterblock-storage-provisioner-dc
  uid: 6ea4d65f-49bf-11e9-a508-0050568f1a87
spec:
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    glusterfs: block-storage-provisioner-pod
  strategy:
    activeDeadlineSeconds: 21600
    recreateParams:
      timeoutSeconds: 600
    resources: {}
    type: Recreate
  template:
    metadata:
      creationTimestamp: null
      labels:
        glusterfs: block-storage-provisioner-pod
      name: glusterblock-provisioner
    spec:
      containers:
      - args:
        - -id
        - gluster.org/glusterblock-app-storage
      - env:
        - name: PROVISIONER_NAME
          value: gluster.org/glusterblock-app-storage
        image: registry.redhat.io/rhgs3/rhgs-gluster-block-prov-rhel7:v3.11
        imagePullPolicy: IfNotPresent
        name: glusterblock-provisioner
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: glusterblock-storage-provisioner
      serviceAccountName: glusterblock-storage-provisioner
      terminationGracePeriodSeconds: 30
  test: false
  triggers:
  - type: ConfigChange
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: 2019-04-30T19:07:25Z
    lastUpdateTime: 2019-04-30T19:07:25Z
    message: Deployment config has minimum availability.
    status: "True"
    type: Available
  - lastTransitionTime: 2019-04-30T19:07:22Z
    lastUpdateTime: 2019-04-30T19:07:26Z
    message: replication controller "glusterblock-storage-provisioner-dc-3" successfully rolled out
    reason: NewReplicationControllerAvailable
    status: "True"
    type: Progressing
  details:
    causes:
    - type: ConfigChange
    message: config change
  latestVersion: 3
  observedGeneration: 3
  readyReplicas: 1
  replicas: 1
  unavailableReplicas: 0
  updatedReplicas: 1

Is it "- -id" or only "id"? The "does not contain declared merge key: name" error seems to be a YAML issue, right? I may be wrong, so please correct and help me.
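The error above is indeed a YAML structure issue: the stray "-" in front of env: starts a second entry in the containers list, which leaves the first entry (the one holding args) without a name field, and name is the merge key oc uses for the containers list. Both args and env belong under the same container entry, for example (a sketch following comment 15; the -id value itself does not have to equal the provisioner name):

    spec:
      containers:
      - name: glusterblock-provisioner
        args:
        - -id
        - glusterblock-app-storage
        env:
        - name: PROVISIONER_NAME
          value: gluster.org/glusterblock-app-storage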
VERIFICATION STEPS:

If you deployed OCS with infra storage enabled, you already have one gluster block provisioner. Now switch to the app-storage project and deploy a second gluster block provisioner using the templates from either cns-deploy or openshift-ansible. Before deploying, change the env PROVISIONER_NAME to something different from what you see in the `oc get pods` output for the existing block provisioner. Then create a storageclass which points to this new provisioner.

Example: I had provided the name gluster.org/glusterblock-app-storage for the provisioner, and I created a storageclass with a definition like this:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: glusterfs-storage-block
  namespace: ""
parameters:
  chapauthenabled: "true"
  hacount: "3"
  restsecretname: heketi-storage-admin-secret-block
  restsecretnamespace: app-storage
  resturl: http://heketi-storage.app-storage.svc:8080
  restuser: admin
provisioner: gluster.org/glusterblock-app-storage
reclaimPolicy: Delete
volumeBindingMode: Immediate

NOTE: you might have to create the secret mentioned in the storageclass.

Then create a PVC that refers to this new storageclass, and it should be able to create the volume.
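A sketch of the last two steps (creating the rest secret and a test PVC). The secret type shown follows the usual glusterblock provisioner convention, and the admin key must match the one configured for heketi in app-storage; adjust both if your deployment differs:

    # Secret referenced by restsecretname/restsecretnamespace in the storageclass
    oc create secret generic heketi-storage-admin-secret-block \
        -n app-storage \
        --type="gluster.org/glusterblock" \
        --from-literal=key=<heketi admin key>

    # PVC against the new storageclass to verify provisioning end to end
    cat <<EOF | oc create -n app-storage -f -
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: verify-block-pvc
    spec:
      accessModes:
      - ReadWriteOnce
      storageClassName: glusterfs-storage-block
      resources:
        requests:
          storage: 1Gi
    EOF

    # Watch the claim until it binds
    oc get pvc verify-block-pvc -n app-storage -w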
*** Bug 1664365 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:3260