Bug 1620222 - openshift-metering install: cannot control the PVC class name and size for hdfs pods
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.11.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 3.11.0
Assignee: Chance Zibolski
QA Contact: Johnny Liu
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-08-22 17:53 UTC by Hongkai Liu
Modified: 2018-10-10 15:24 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-10-10 15:24:37 UTC
Target Upstream Version:
Embargoed:



Description Hongkai Liu 2018-08-22 17:53:16 UTC
Description of problem:

Version-Release number of the following components:
# yum list installed | grep openshift
atomic-openshift.x86_64         3.11.0-0.16.0.git.0.6bcbde8.el7

rpm -q openshift-ansible
# git log --oneline -1
ec6d8ca Automatic commit of package [openshift-ansible] release [3.11.0-0.20.0].
# rpm -q ansible
ansible-2.6.2-1.el7ae.noarch
# ansible --version
ansible 2.6.2
  config file = /etc/ansible/ansible.cfg
  configured module search path = [u'/root/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.5 (default, Jul 16 2018, 19:52:45) [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)]

How reproducible: Always

Steps to Reproduce:
1. the content of /tmp/openshift_metering_config/metering.yaml is here: https://raw.githubusercontent.com/hongkailiu/svt-case-doc/master/files/metering.yaml

Reference: https://github.com/operator-framework/operator-metering/blob/master/manifests/metering-config/custom-storageclass-values.yaml

2. # ansible-playbook -i aaa/ openshift-ansible/playbooks/openshift-metering/config.yml --extra-vars "@/tmp/openshift_metering_config/metering.yaml"
3.

Actual results:
Please include the entire output from the last TASK line through the end of output if an error is generated

Expected results:

Additional info:
Please attach logs from ansible-playbook with the -vvv flag


# oc get all
NAME                                      READY     STATUS    RESTARTS   AGE
pod/hdfs-datanode-0                       1/1       Running   0          1m
pod/hdfs-namenode-0                       1/1       Running   0          1m
pod/hive-metastore-0                      1/1       Running   0          1m
pod/hive-server-0                         1/1       Running   0          1m
pod/metering-operator-df67bb6cb-f6sk6     2/2       Running   0          2m
pod/presto-coordinator-554bd785b8-mbtlt   1/1       Running   0          1m
pod/reporting-operator-787ddc9dcd-5xjg7   0/1       Running   0          1m

NAME                                               TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)               AGE
service/glusterfs-dynamic-hive-metastore-db-data   ClusterIP   172.25.67.187    <none>        1/TCP                 1m
service/hdfs-datanode                              ClusterIP   None             <none>        50010/TCP             1m
service/hdfs-datanode-web                          ClusterIP   172.27.95.26     <none>        50075/TCP             1m
service/hdfs-namenode                              ClusterIP   None             <none>        8020/TCP              1m
service/hdfs-namenode-proxy                        ClusterIP   172.24.89.142    <none>        8020/TCP              1m
service/hdfs-namenode-web                          ClusterIP   172.27.172.202   <none>        50070/TCP             1m
service/hive-metastore                             ClusterIP   172.24.117.93    <none>        9083/TCP              1m
service/hive-server                                ClusterIP   172.25.47.121    <none>        10000/TCP,10002/TCP   1m
service/presto                                     ClusterIP   172.24.80.183    <none>        8080/TCP,8082/TCP     1m
service/reporting-operator                         ClusterIP   172.27.190.42    <none>        8080/TCP              1m
service/reporting-operator-metrics                 ClusterIP   172.27.223.145   <none>        8082/TCP              1m

NAME                                 DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/metering-operator    1         1         1            1           2m
deployment.apps/presto-coordinator   1         1         1            1           1m
deployment.apps/presto-worker        0         0         0            0           1m
deployment.apps/reporting-operator   1         1         1            0           1m

NAME                                            DESIRED   CURRENT   READY     AGE
replicaset.apps/metering-operator-df67bb6cb     1         1         1         2m
replicaset.apps/presto-coordinator-554bd785b8   1         1         1         1m
replicaset.apps/presto-worker-78694b57b6        0         0         0         1m
replicaset.apps/reporting-operator-787ddc9dcd   1         1         0         1m

NAME                              DESIRED   CURRENT   AGE
statefulset.apps/hdfs-datanode    1         1         1m
statefulset.apps/hdfs-namenode    1         1         1m
statefulset.apps/hive-metastore   1         1         1m
statefulset.apps/hive-server      1         1         1m

# oc get pvc
NAME                                 STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS        AGE
hdfs-datanode-data-hdfs-datanode-0   Bound     pvc-91055d7b-a632-11e8-8dc1-026cee33ed60   5Gi        RWO            gp2                 1m
hdfs-namenode-data-hdfs-namenode-0   Bound     pvc-9117a4f5-a632-11e8-8dc1-026cee33ed60   5Gi        RWO            gp2                 1m
hive-metastore-db-data               Bound     pvc-90d84ea8-a632-11e8-8dc1-026cee33ed60   15Gi       RWO            glusterfs-storage   1m


# oc rsh metering-operator-df67bb6cb-f6sk6
Defaulting container name to metering-operator.
Use 'oc describe pod/metering-operator-df67bb6cb-f6sk6 -n openshift-metering' to see all of the containers in this pod.
sh-4.2$ cat /tmp/openshift-metering-values.yaml
{"presto":{"spec":{"hdfs":{"datanode":{"storage":{"class":"glusterfs-storage","size":"15Gi"}},"namenode":{"storage":{"class":"glusterfs-storage","size":"15Gi"}}},"hive":{"metastore":{"storage":{"class":"glusterfs-storage","size":"15Gi"}}}}}}

===================
The file inside the metering-operator pod shows that the variable value was passed through correctly, but the hdfs pods do not seem to pick it up.
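The mismatch can be made concrete with a small Python sketch. It takes the rendered values shown above and moves `presto.spec.hdfs` to a top-level `hdfs.spec`, which is the layout that the working definition later in this thread (Comment 4) uses. The function name `relocate_hdfs` is hypothetical and exists only for illustration; nothing here is part of the metering operator or openshift-ansible.

```python
import copy
import json


def relocate_hdfs(values):
    """Return a copy of `values` with presto.spec.hdfs moved to hdfs.spec.

    Hypothetical helper: it only reshapes a dictionary and does not
    talk to the cluster or to the operator.
    """
    fixed = copy.deepcopy(values)
    presto_spec = fixed.get("presto", {}).get("spec", {})
    misplaced = presto_spec.pop("hdfs", None)
    if misplaced is not None:
        # Merge the datanode/namenode settings under a top-level hdfs.spec.
        fixed.setdefault("hdfs", {}).setdefault("spec", {}).update(misplaced)
    return fixed


# The content of /tmp/openshift-metering-values.yaml as shown above.
rendered = json.loads(
    '{"presto":{"spec":{"hdfs":{"datanode":{"storage":{"class":"glusterfs-storage","size":"15Gi"}},'
    '"namenode":{"storage":{"class":"glusterfs-storage","size":"15Gi"}}},'
    '"hive":{"metastore":{"storage":{"class":"glusterfs-storage","size":"15Gi"}}}}}}'
)

fixed = relocate_hdfs(rendered)
print(json.dumps(fixed, indent=2, sort_keys=True))
```

The hive settings stay under `presto.spec`, which matches the observation that only the hive PVC came up with the requested class and size.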

Comment 1 Hongkai Liu 2018-08-22 17:55:00 UTC
# oc get pvc
NAME                                 STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS        AGE
hdfs-datanode-data-hdfs-datanode-0   Bound     pvc-91055d7b-a632-11e8-8dc1-026cee33ed60   5Gi        RWO            gp2                 1m
hdfs-namenode-data-hdfs-namenode-0   Bound     pvc-9117a4f5-a632-11e8-8dc1-026cee33ed60   5Gi        RWO            gp2                 1m
hive-metastore-db-data               Bound     pvc-90d84ea8-a632-11e8-8dc1-026cee33ed60   15Gi       RWO            glusterfs-storage   1m


The PVC used by the hive pod has the right size and storage class.

Comment 2 Scott Dodson 2018-08-22 18:48:25 UTC
Chance,

Can you take a look at this and determine whether you feel like it's a 3.11 blocker or not? A blocker would need to be fixed and tested by QE within the next two weeks.

Comment 3 Chance Zibolski 2018-08-22 21:41:25 UTC
Both of these can be controlled, and we even have docs on this. https://github.com/operator-framework/operator-metering/blob/master/Documentation/metering-config.md#dynamically-provisioning-persistent-volumes-using-storage-classes

The docs have some minor ordering/section issues that I'm going to fix now, but it mentions configuring the storage class here: https://github.com/operator-framework/operator-metering/blob/master/Documentation/metering-config.md#configuring-the-storage-class-for-metering and the sizing of the volumes is (incorrectly) in the "manually creating persistent volumes" section just below the section linked.

As for anything metering-related, it isn't a 3.11 blocker because we're not officially part of the 3.11 release; we're targeting 4.0 for tech preview.

Comment 4 Hongkai Liu 2018-08-23 12:45:35 UTC
Thanks for the info.

I was misled by the example in the doc.
https://github.com/operator-framework/operator-metering/blob/master/manifests/metering-config/custom-storageclass-values.yaml
It would be nice if you can fix that too.

It is working now.
This is the working var definition:

--- 
openshift_metering_config: 
  presto: 
    spec:
      hive:
        metastore: 
          storage: 
            class: glusterfs-storage
            size: 15Gi

  hdfs:
    spec:
      datanode:
        storage:
          class: glusterfs-storage
          size: 15Gi
      namenode:
        storage:
          class: glusterfs-storage
          size: 15Gi

Comment 5 Hongkai Liu 2018-08-23 12:46:44 UTC
# oc get pvc
NAME                                 STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS        AGE
hdfs-datanode-data-hdfs-datanode-0   Bound     pvc-dc0220c1-a6d0-11e8-b49c-0279bbe13b54   15Gi       RWO            glusterfs-storage   2m
hdfs-namenode-data-hdfs-namenode-0   Bound     pvc-dc11ccb9-a6d0-11e8-b49c-0279bbe13b54   15Gi       RWO            glusterfs-storage   2m
hive-metastore-db-data               Bound     pvc-dbd5f44c-a6d0-11e8-b49c-0279bbe13b54   15Gi       RWO            glusterfs-storage   2m
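
A check like the one done by eye here can be scripted. The following is a minimal Python sketch that reads the text of `oc get pvc` and lists the claims whose capacity or storage class differs from what was requested. It assumes the column order shown in this report (NAME, STATUS, VOLUME, CAPACITY, ACCESS MODES, STORAGECLASS, AGE) and a single access mode per row; the function name `pvc_mismatches` is hypothetical.

```python
def pvc_mismatches(table_text, want_class, want_size):
    """Return names of PVCs whose class or capacity differs from the request."""
    bad = []
    for line in table_text.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 7:
            continue  # not a complete PVC row
        name, capacity, storage_class = fields[0], fields[3], fields[5]
        if capacity != want_size or storage_class != want_class:
            bad.append(name)
    return bad


# Output from the original description (before the fix).
before = """
NAME                                 STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS        AGE
hdfs-datanode-data-hdfs-datanode-0   Bound     pvc-91055d7b-a632-11e8-8dc1-026cee33ed60   5Gi        RWO            gp2                 1m
hdfs-namenode-data-hdfs-namenode-0   Bound     pvc-9117a4f5-a632-11e8-8dc1-026cee33ed60   5Gi        RWO            gp2                 1m
hive-metastore-db-data               Bound     pvc-90d84ea8-a632-11e8-8dc1-026cee33ed60   15Gi       RWO            glusterfs-storage   1m
"""

# Output from this comment (after the fix).
after = """
NAME                                 STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS        AGE
hdfs-datanode-data-hdfs-datanode-0   Bound     pvc-dc0220c1-a6d0-11e8-b49c-0279bbe13b54   15Gi       RWO            glusterfs-storage   2m
hdfs-namenode-data-hdfs-namenode-0   Bound     pvc-dc11ccb9-a6d0-11e8-b49c-0279bbe13b54   15Gi       RWO            glusterfs-storage   2m
hive-metastore-db-data               Bound     pvc-dbd5f44c-a6d0-11e8-b49c-0279bbe13b54   15Gi       RWO            glusterfs-storage   2m
"""

print(pvc_mismatches(before, "glusterfs-storage", "15Gi"))
print(pvc_mismatches(after, "glusterfs-storage", "15Gi"))
```

With the tables from this report, the first call lists the two hdfs claims and the second returns an empty list.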

Comment 6 Chance Zibolski 2018-08-23 18:21:33 UTC
Sorry, I misunderstood the original issue. That is also covered in our documentation, though it's within the ansible installation doc: https://github.com/operator-framework/operator-metering/blob/master/Documentation/ansible-install.md#configuration. Perhaps I can link back to this from configuring-metering to make it clearer.

Comment 7 Hongkai Liu 2018-08-23 18:47:20 UTC
The example in the doc is wrong:
https://github.com/operator-framework/operator-metering/blob/master/manifests/metering-config/custom-storageclass-values.yaml

Uncommenting the lines, as the example says to do, does not work.
The values need to be in the format shown in Comment 4.

Comment 8 Chance Zibolski 2018-08-24 17:22:18 UTC
I'm not sure bugzilla is the best medium for this discussion, but let me know if there's a better spot.

If you read the instructions in https://github.com/operator-framework/operator-metering/blob/master/Documentation/ansible-install.md#configuration, it specifically says: "To supply custom configuration options set the openshift_metering_config variable to a dictionary containing the contents of the Metering spec field you wish to set."

The example is just an example of a `Metering` resource in general. When using ansible to adjust it, you need to put the content of the `Metering` resource's `.spec` into `openshift_metering_config`.
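
As a rough illustration of that mapping, the Python sketch below takes a `Metering` resource represented as a dictionary and wraps only its `.spec` in `openshift_metering_config`. The function name `to_ansible_vars` and the metadata name are hypothetical, the spec values are the ones from Comment 4, and the resource is deliberately abbreviated (no `apiVersion` is shown because it is not given in this thread).

```python
import json


def to_ansible_vars(metering_resource):
    """Wrap the .spec of a Metering resource for use as Ansible extra vars.

    Hypothetical helper: only the spec is carried over; kind and
    metadata are dropped.
    """
    return {"openshift_metering_config": metering_resource.get("spec", {})}


storage = {"class": "glusterfs-storage", "size": "15Gi"}

# Abbreviated Metering resource; the metadata name is an assumption.
metering = {
    "kind": "Metering",
    "metadata": {"name": "example-metering"},
    "spec": {
        "presto": {"spec": {"hive": {"metastore": {"storage": dict(storage)}}}},
        "hdfs": {
            "spec": {
                "datanode": {"storage": dict(storage)},
                "namenode": {"storage": dict(storage)},
            }
        },
    },
}

extra_vars = to_ansible_vars(metering)
print(json.dumps(extra_vars, indent=2, sort_keys=True))
```

The printed JSON has the same shape as the working definition in Comment 4, so it could be saved to a file and passed with `--extra-vars "@<file>"` in the same way as the YAML file in the reproduction steps.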


Additionally, the long-term plan is to install using OLM, which makes most of this a moot issue. We're supporting ansible so that internal Red Hat users and alpha testers can install onto 3.11, so I think actively adding more docs specific to Ansible is unnecessary for now.

Comment 9 Chance Zibolski 2018-08-24 17:24:47 UTC
Ohhh, I see what you're referring to: in the example, the contents are all incorrectly indented under spec.presto. I'll fix that.

Comment 10 Chance Zibolski 2018-08-24 17:27:46 UTC
Here's a PR to fix the example manifest: https://github.com/operator-framework/operator-metering/pull/358

Comment 11 Hongkai Liu 2018-08-24 17:45:45 UTC
Thanks for the fix, Chance. ^_^

Comment 12 Chance Zibolski 2018-08-24 17:55:52 UTC
Yep! Sorry about the confusion. I didn't quite realize you were referring to the example itself having problems; I thought you were saying the docs were the issue.

I'm new to bugzilla, do we update the status now? I merged the docs change.

Comment 14 Chance Zibolski 2018-08-24 18:15:47 UTC
Awesome thanks!

Comment 16 Chance Zibolski 2018-10-10 15:24:37 UTC
Closing as we've made a release since.

