Bug 1480835

Summary: [RFE] Support dynamic provisioning on Gluster for metrics and logging in openshift-ansible
Product: OpenShift Container Platform
Reporter: Nicholas Schuetz <nick>
Component: Installer
Assignee: Jose A. Rivera <jrivera>
Status: CLOSED ERRATA
QA Contact: Anping Li <anli>
Severity: high
Docs Contact:
Priority: medium
Version: 3.6.1
CC: aos-bugs, aos-storage-staff, dmoessne, dsutherland1492, jokerman, jrivera, jsafrane, mmccomas, nschuetz, pprakash, sdodson, sudo, weshi
Target Milestone: ---
Target Release: 3.9.0
Hardware: x86_64
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Enhancement
Doc Text:
Following a successful deployment of CNS/CRS with glusterblock, OpenShift Logging and Metrics can be deployed using glusterblock as their backend storage for fault-tolerant, distributed persistent storage.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-03-28 14:06:20 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Nicholas Schuetz 2017-08-12 04:33:42 UTC
Description of problem:

Kick off a fresh install of OCP 3.6 using openshift-ansible. Configure GlusterFS in the Ansible hosts file and set your docker-registry, metrics, and logging to be deployed on dynamic PVs. Only the docker registry gets deployed. Metrics and logging do not deploy because they are unable to mount their Cassandra and Elasticsearch PVs (respectively). The PVCs get created but the PVs do not get dynamically provisioned. After the install completes, if you 'oc edit sc glusterfs-storage', add an annotation making it the "default" StorageClass, and then re-kick off the metrics and logging install, everything works as expected.

annotations:
   storageclass.kubernetes.io/is-default-class: "true"

Can we look at making the gluster-storage StorageClass the default SC if set to be deployed?
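
For reference, the same default-class annotation can be applied non-interactively; a minimal sketch, assuming the StorageClass created by this inventory is named glusterfs-storage:

oc patch storageclass glusterfs-storage -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "true"}}}'
oc get storageclass

The second command should then show glusterfs-storage marked as (default).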


Ansible hosts file:

[OSEv3:children]
masters
nodes
new_nodes
etcd
lb
glusterfs
[OSEv3:vars]
deployment_type=openshift-enterprise
openshift_disable_check=disk_availability,docker_storage,memory_availability
ansible_ssh_user=root
openshift_clock_enabled=true

openshift_master_default_subdomain=apps.ocp.nicknach.net
openshift_master_cluster_method=native
openshift_master_cluster_hostname=api.ocp.nicknach.net
openshift_master_cluster_public_hostname=console.ocp.nicknach.net

#########################################
openshift_hosted_metrics_deploy=true
openshift_hosted_metrics_storage_kind=dynamic

openshift_hosted_logging_deploy=true
openshift_hosted_logging_storage_kind=dynamic

openshift_hosted_manage_registry=true
openshift_hosted_registry_storage_kind=glusterfs

openshift_storage_glusterfs_namespace=glusterfs
openshift_storage_glusterfs_name=storage
##########################################

[lb]
lb.ocp.nicknach.net
[etcd]
master01.ocp.nicknach.net
master02.ocp.nicknach.net
master03.ocp.nicknach.net
[masters]
master01.ocp.nicknach.net
master02.ocp.nicknach.net
master03.ocp.nicknach.net
[nodes]
master01.ocp.nicknach.net openshift_node_labels="{'region': 'masters', 'zone': 'a'}" openshift_schedulable=false
master02.ocp.nicknach.net openshift_node_labels="{'region': 'masters', 'zone': 'b'}" openshift_schedulable=false
master03.ocp.nicknach.net openshift_node_labels="{'region': 'masters', 'zone': 'c'}" openshift_schedulable=false
infra01.ocp.nicknach.net openshift_node_labels="{'region': 'infra', 'zone': 'a'}"
infra02.ocp.nicknach.net openshift_node_labels="{'region': 'infra', 'zone': 'b'}"
infra03.ocp.nicknach.net openshift_node_labels="{'region': 'infra', 'zone': 'c'}"
node01.ocp.nicknach.net openshift_node_labels="{'region': 'primary', 'zone': 'a'}"
node02.ocp.nicknach.net openshift_node_labels="{'region': 'primary', 'zone': 'b'}"
node03.ocp.nicknach.net openshift_node_labels="{'region': 'primary', 'zone': 'c'}"
[glusterfs]
infra01.ocp.nicknach.net glusterfs_devices='["/dev/vdc"]'
infra02.ocp.nicknach.net glusterfs_devices='["/dev/vdc"]'
infra03.ocp.nicknach.net glusterfs_devices='["/dev/vdc"]'
node01.ocp.nicknach.net glusterfs_devices='["/dev/vdc"]'
node02.ocp.nicknach.net glusterfs_devices='["/dev/vdc"]'
node03.ocp.nicknach.net glusterfs_devices='["/dev/vdc"]'


Comment 1 Jianwei Hou 2017-08-14 03:10:45 UTC
I think we can add a parameter to let the user decide whether they want it installed as the default StorageClass.

Comment 2 Jan Safranek 2017-08-14 12:15:33 UTC
There is code in openshift-ansible that installs a default storage class; it should be updated to make the Gluster one the default if there is no other default (e.g. on AWS or GCE).

Comment 3 Scott Dodson 2017-08-14 12:36:32 UTC
Jose,

What should we do here? We need to consider both the technical details and whether enabling this without certain subscriptions is meant to be supported.

Comment 5 Scott Dodson 2017-08-14 13:34:42 UTC
We'll consider making CNS a default storage provider in 3.7.

Comment 6 Jose A. Rivera 2017-09-08 13:27:06 UTC
It should definitely be technically feasible to update the openshift_default_storage_class role to support setting glusterfs as the default SC. I would officially be wary of this, however, as we don't support logging and metrics on raw GlusterFS. Metadata-heavy workloads, in general, are some of the worst-case scenarios for GlusterFS. That support will come with gluster-block (which would be its own SC), which basically transforms metadata operations into reads and writes. gluster-block also scales better than raw GlusterFS, so I'd be more inclined to make that a default SC. The catch, of course, is that gluster-block is ReadWriteOnce instead of ReadWriteMany.
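
To illustrate the access-mode difference, here is a minimal PVC sketch (the StorageClass name glusterfs-storage-block is a placeholder for whatever the gluster-block SC would be called); a gluster-block claim would request ReadWriteOnce, whereas a file-based GlusterFS claim could request ReadWriteMany:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-block-claim
spec:
  accessModes:
    - ReadWriteOnce        # gluster-block is RWO; file-based GlusterFS supports RWX
  resources:
    requests:
      storage: 2Gi
  storageClassName: glusterfs-storage-block   # placeholder SC name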

Comment 7 Jose A. Rivera 2017-09-20 13:42:56 UTC
I'm rewording the title of the BZ to reflect what I believe is its current state, which is as an RFE. 

To be clear, while I want to give the ability to set a GlusterFS SC as the default SC, this would not be something I'd recommend as a general practice, and it should definitely not happen automatically.

The more specific goal I'd want to achieve for this BZ would be to allow openshift_hosted_metrics_storage_kind and openshift_hosted_logging_storage_kind to be "glusterfs" or "gluster-block". This would work like it does for the registry, where there are some default configuration parameters set such that openshift-ansible can provision its own GlusterFS or gluster-block volumes.
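
As a rough sketch of what such an inventory could look like once this lands (variable names follow the pattern used later in this bug, e.g. comment 20, and are illustrative rather than a confirmed final interface):

[OSEv3:vars]
openshift_storage_glusterfs_block_deploy=true
openshift_storage_glusterfs_block_storageclass=true
openshift_storage_glusterfs_block_host_vol_size=100

openshift_metrics_install_metrics=true
openshift_metrics_cassandra_storage_type=dynamic

openshift_logging_install_logging=true
openshift_logging_es_pvc_dynamic=true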

Comment 12 Magnus Glantz 2017-11-18 23:19:03 UTC
To apply the annotation and then install metrics and logging afterwards, do:

oc annotate sc glusterfs-storage storageclass.kubernetes.io/is-default-class="true"

and then run metrics, logging playbooks, like:

ansible-playbook -i /etc/ansible/hosts /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/openshift-logging.yml -e openshift_logging_install_logging=True -e openshift_hosted_logging_storage_kind=dynamic -e openshift_master_logging_public_url=https://kibana.mydomain.suffix -e openshift_logging_es_pvc_dynamic=True

ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/openshift-metrics.yml -e openshift_metrics_install_metrics=True -e openshift_metrics_hawkular_hostname=hawkular-metrics.mydomain.suffix -e openshift_metrics_cassandra_storage_type=pv

Comment 13 Magnus Glantz 2017-11-18 23:20:49 UTC
Doing the above in OCP 3.6 is not supported, as it means logging will consume non-block storage from Gluster. But yeah.

Comment 15 dsutherland1492 2017-12-11 16:23:43 UTC
So what is the proper way to accomplish this in 3.6?  It sounds like this isn't feasible at all based on Magnus's comments.

Comment 17 Jose A. Rivera 2017-12-11 16:30:28 UTC
The proper method is to deploy OCP without Logging and Metrics, follow the CNS 3.6 admin guide[1] to deploy CNS, then follow that same guide to create a glusterblock StorageClass, mark it as default, then deploy Logging and Metrics using the storage type "dynamic" for both.

OCP 3.7 will, sometime this week, support a much smoother deployment mechanism. We hope to iron it out even further by the CNS 3.7 release.

[1] https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.3/html-single/container-native_storage_for_openshift_container_platform/
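
For reference, the glusterblock StorageClass described in that guide would look roughly like the following (resturl, secret names, and namespace are placeholders to be taken from the actual heketi deployment):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: glusterfs-storage-block
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: gluster.org/glusterblock
parameters:
  resturl: "http://heketi-storage.glusterfs.svc:8080"
  restuser: "admin"
  restsecretnamespace: "glusterfs"
  restsecretname: "heketi-storage-admin-secret"
  hacount: "3"
  chapauthenabled: "true"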

Comment 18 Jose A. Rivera 2018-02-14 13:34:39 UTC
Merged PR: https://github.com/openshift/openshift-ansible/pull/6922

Comment 20 Anping Li 2018-03-01 08:19:47 UTC
The PVCs logging-es-0 and metrics-cassandra-1 are created and attached to pods, but logging-es-ops-0 could not be provisioned. heketi-registry-1-7n4jl reports the following error. The requested size is 2Gi.

[asynchttp] INFO 2018/03/01 06:49:29 asynchttp.go:125: Started job 559faf75576b1408eebe1c26017bfcc6
[heketi] INFO 2018/03/01 06:49:29 Creating block volume 04690a8576ded4ab2e0e8438207b52a3
[negroni] Started GET /queue/559faf75576b1408eebe1c26017bfcc6
[negroni] Completed 200 OK in 91.177µs
[heketi] WARNING 2018/03/01 06:49:29 Free size is lesser than the block volume requested
[heketi] INFO 2018/03/01 06:49:29 No block hosting volumes found in the cluster list
[heketi] INFO 2018/03/01 06:49:29 brick_num: 0
[heketi] INFO 2018/03/01 06:49:29 brick_num: 0
[heketi] INFO 2018/03/01 06:49:29 brick_num: 1
[heketi] INFO 2018/03/01 06:49:29 brick_num: 0
[heketi] INFO 2018/03/01 06:49:29 brick_num: 1
[heketi] INFO 2018/03/01 06:49:29 brick_num: 2
[heketi] ERROR 2018/03/01 06:49:29 /src/github.com/heketi/heketi/apps/glusterfs/volume_entry_allocate.go:38: Minimum brick size limit reached.  Out of space.
[heketi] ERROR 2018/03/01 06:49:29 /src/github.com/heketi/heketi/apps/glusterfs/block_volume_entry.go:58: Failed to create Block Hosting Volume: No space
[heketi] ERROR 2018/03/01 06:49:29 /src/github.com/heketi/heketi/apps/glusterfs/app_block_volume.go:90: Failed to create block volume: No space
[asynchttp] INFO 2018/03/01 06:49:29 asynchttp.go:129: Completed job 559faf75576b1408eebe1c26017bfcc6 in 551.978807ms
[negroni] Started GET /queue/559faf75576b1408eebe1c26017bfcc6
[negroni] Completed 500 Internal Server Error in 137.17µs



openshift_ansible_vars:
  openshift_storage_glusterfs_registry_block_storageclass: true
  openshift_storage_glusterfs_registry_block_storageclass_default: true
  openshift_storageclass_default: false
  openshift_storage_glusterfs_block_host_vol_size: 5
  openshift_logging_install_logging: true
  openshift_logging_image_version: v3.9
  openshift_logging_es_pvc_dynamic: true
  openshift_logging_es_memory_limit: 2Gi
  openshift_logging_es_pvc_size: 2Gi
  openshift_logging_use_ops: true
  openshift_logging_es_ops_memory_limit: 2Gi
  openshift_logging_es_ops_pvc_size: 2Gi
  openshift_metrics_cassandra_replicas: 1
  openshift_metrics_cassandra_pvc_size:  256Mi
  openshift_metrics_cassandra_storage_type: dynamic
  openshift_metrics_image_version: v3.9
  openshift_metrics_install_metrics: true


[root@qe-anligbmaster-etcd-1 ~]# oc get pvc
NAME               STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS               AGE
logging-es-0       Bound     pvc-2d8c620c-1d1a-11e8-9d0c-42010af00012   2Gi        RWO            glusterfs-registry-block   1h
logging-es-ops-0   Pending                                                                        glusterfs-registry-block   1h
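
To see why the claim is stuck, the provisioning events on the pending PVC and the provisioner/heketi pod logs are the usual places to look (standard commands; pod and namespace names are placeholders):

oc describe pvc logging-es-ops-0 -n logging
oc logs <glusterblock provisioner pod> -n <glusterfs namespace>
oc logs <heketi pod> -n <glusterfs namespace>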

Comment 21 Jose A. Rivera 2018-03-01 16:21:06 UTC
Is there a block-hosting volume with sufficient space to hold the requested block volumes? If not, is there enough space to create a new block-hosting volume to do so? If so, is that space enough for heketi to create a new block-hosting volume automatically of size 5GB?
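
One way to check is a sketch using standard heketi-cli commands from inside the heketi pod (pod and namespace names are placeholders, and heketi-cli may additionally need --server/--user/--secret depending on the deployment):

oc rsh -n <glusterfs namespace> <heketi pod> heketi-cli topology info
oc rsh -n <glusterfs namespace> <heketi pod> heketi-cli volume list
oc rsh -n <glusterfs namespace> <heketi pod> heketi-cli blockvolume list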

Comment 22 Anping Li 2018-03-01 16:36:10 UTC
There was only 6 GiB of free space at the beginning. Deployed again with the following variables; all PVCs are provisioned, so moving the bug to verified.


@weshi, 60Mi of data is written to the disk per minute; to keep the disk from filling up, I had to finish the test within 10 minutes. Could we allocate more disk to the instances?

openshift_storage_glusterfs_block_host_vol_size: 1 
openshift_logging_es_pvc_size: 512Mi
openshift_logging_es_ops_pvc_size: 512Mi
openshift_metrics_cassandra_pvc_size:  512Mi

Devices:
Id:e25c71c82be0c485dc0aad44ddf05520   Name:/dev/vsda           State:online    Size (GiB):14      Used (GiB):8      Free (GiB):6

Comment 23 Wenkai Shi 2018-03-05 02:26:34 UTC
@anli Sure, you can simply feed a larger value during installation.

Comment 28 errata-xmlrpc 2018-03-28 14:06:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0489