Description of problem:
I am installing OCP v3.4 on AWS using the cloud provider, and using Ansible to install metrics with dynamic storage. The cassandra pod is stuck in a pending state:

```
[ec2-user@ip-172-31-21-210 ~]$ oc get pods
NAME                         READY     STATUS    RESTARTS   AGE
hawkular-cassandra-1-xxhif   0/1       Pending   0          7m
hawkular-metrics-q6eph       0/1       Running   0          7m
heapster-52m24               0/1       Running   0          7m
metrics-deployer-qcvzl       1/1       Running   0          7m
```

Looking at the logs, it shows a zone issue:

```
[ec2-user@ip-172-31-21-210 ~]$ journalctl -f | grep cass
Jan 20 14:21:30 ip-172-31-21-210.us-west-1.compute.internal atomic-openshift-master[25687]: I0120 14:21:30.133534 25687 predicates.go:410] Won't schedule pod "hawkular-cassandra-1-xxhif" onto node "ip-172-31-21-210.us-west-1.compute.internal" due to volume "pvc-002fa207-df45-11e6-b58c-02b055e60362" (mismatch on "failure-domain.beta.kubernetes.io/zone")
```

This leaves the cassandra pod permanently in a "pending" state. I did notice that no storage class was set up:

```
[ec2-user@ip-172-31-21-210 ~]$ oc get storageclass
No resources found.
```

I used the following in my Ansible host file:

```
openshift_hosted_metrics_public_url=https://hawkular.apps.54.67.62.196.xip.io/hawkular/metrics
openshift_hosted_metrics_deploy=true
openshift_hosted_metrics_storage_kind=dynamic
openshift_hosted_metrics_deployer_version=v3.4
```

Here is my full Ansible host file: https://paste.fedoraproject.org/531769/84940752/

Version-Release number of selected component (if applicable):
oc v3.4.0.39
kubernetes v1.4.0+776c994
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://ip-172-31-21-210.us-west-1.compute.internal:8443
openshift v3.4.0.39
kubernetes v1.4.0+776c994

How reproducible:
Always

Steps to Reproduce:
1. Spin up an AWS instance
2. Set up the Ansible host file with the AWS cloud provider and dynamic storage for metrics and logging
3. Run the installer

Actual results:
Metrics and logging pods stay in a "pending" state.

Expected results:
Metrics and logging pods use EBS storage provided by OCP and AWS.

Additional info:
The same appears to happen for logging:

```
Jan 20 14:30:42 ip-172-31-21-210.us-west-1.compute.internal atomic-openshift-master[50163]: I0120 14:30:42.730405 50163 predicates.go:410] Won't schedule pod "logging-es-ce8leui7-1-u39se" onto node "ip-172-31-21-210.us-west-1.compute.internal" due to volume "pvc-a0add851-df46-11e6-b7a8-02b055e60362" (mismatch on "failure-domain.beta.kubernetes.io/zone")
```
Updated info:
If I let the Ansible installer fail, then clean up the openshift-infra and logging projects (basically do the "cleanup" steps outlined in the docs), and then create the following YAML file:

```
kind: StorageClass
apiVersion: storage.k8s.io/v1beta1
metadata:
  name: aws-ebs-slow
  annotations:
    storageclass.beta.kubernetes.io/is-default-class: "true"
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
  zone: us-west-1b
  iopsPerGB: "100"
  encrypted: "false"
```

...create the storage class with `oc create -f storage-class-default.yaml`, and then restart the installer, the install goes as expected.
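A note on the workaround above: hard-coding `zone: us-west-1b` pins every provisioned EBS volume to that availability zone, which can itself trigger the `failure-domain.beta.kubernetes.io/zone` mismatch if the nodes run elsewhere. A minimal sketch of a zone-agnostic default StorageClass (the name `aws-ebs-default` is my own; omitting `zone` lets the aws-ebs provisioner choose a zone from those the cluster spans):

```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1beta1
metadata:
  name: aws-ebs-default
  annotations:
    # Marks this class as the default so PVCs without an explicit
    # class annotation are provisioned from it.
    storageclass.beta.kubernetes.io/is-default-class: "true"
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
```

This is an illustrative variant, not the exact manifest used in the report.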
QE has tested Metrics deployment with a cloud-provider-enabled AWS instance on version 3.3.1 and later, and could bind a dynamic PV to the cassandra pod.
I am trying to figure out what the issue is here exactly. This does not sound like a metrics or logging issue, but an installation problem.

@sdodson: why did you move this to metrics? From https://bugzilla.redhat.com/show_bug.cgi?id=1415297#c1 it sounds like the problem is that the StorageClass was not set up properly. Once it is set up correctly, the metrics and logging installation goes without issue. Is this supposed to be done automatically in the Ansible install, or is this supposed to be a separate step that the user is required to perform?

I don't know if this is an installation issue, a problem where our docs don't outline that you have to do this step, or just that the user forgot a step. I am reassigning this to the installer component. The metrics component is not responsible for setting up things like StorageClass.
Additional thoughts. @mwringe I agree this is an installation issue.

@all
1) I believe the installer should create a StorageClass if we are setting dynamic storage in the Ansible host file; IF NOT
2) The Ansible installer should provide a mechanism for setting up StorageClasses; IF NOT
3) At the VERY least, the docs should state that you need a StorageClass. HOWEVER, that's a "chicken/egg" problem, because how can I set up a StorageClass if OCP isn't installed yet?
Need to add configuration validation to ensure that if *_storage_kind=dynamic then we also have a cloud provider defined.
Yes both were set.
Comment 8 has details; implement in the openshift_sanitize_inventory role.
Per https://github.com/openshift/openshift-ansible/pull/5566 we should now fail the install if we are setting any storage kind == 'dynamic' without enabling dynamic provisioning.
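The task output quoted below shows the check firing from openshift_sanitize_inventory/tasks/unsupported.yml. As a rough illustration of what such a check looks like (this is my own sketch of the pattern, not the exact upstream task from PR 5566):

```yaml
# Sketch of an inventory-validation task in the style of
# openshift_sanitize_inventory. Variable names are taken from the
# failure message below; the exact conditions upstream may differ.
- name: Ensure that dynamic provisioning is set if using dynamic storage
  fail:
    msg: >-
      Using a storage kind of 'dynamic' without enabling dynamic
      provisioning nor setting a cloud provider will cause generated
      PVCs to not be able to bind as intended. Either update to not
      use a dynamic storage or set
      openshift_master_dynamic_provisioning_enabled to True and set
      an openshift_cloudprovider_kind. You can disable this check
      with 'dynamic_volumes_check=False'.
  when:
    - dynamic_volumes_check | default(true) | bool
    - openshift_hosted_metrics_storage_kind | default('') == 'dynamic'
      or openshift_hosted_logging_storage_kind | default('') == 'dynamic'
    - not (openshift_master_dynamic_provisioning_enabled | default(false) | bool)
```

The `dynamic_volumes_check=False` escape hatch matches the opt-out mentioned in the failure message.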
1. The messages [1] [2] will be reported if we set openshift_hosted_logging_storage_kind=dynamic or openshift_hosted_metrics_storage_kind=dynamic

[1]
```
ok: [openshift-181.lab.eng.nay.redhat.com] => {
    "msg": [
        "[DEPRECATION WARNING]: openshift_hosted_logging_storage_kind is a deprecated variable and will be no longer be used in the next minor release. Please update your inventory accordingly.",
        "[DEPRECATION WARNING]: openshift_hosted_metrics_storage_kind is a deprecated variable and will be no longer be used in the next minor release. Please update your inventory accordingly."
    ]
}
```

[2]
```
TASK [openshift_sanitize_inventory : Ensure that dynamic provisioning is set if using dynamic storage] ************************************************************************************************************
task path: /usr/share/ansible/openshift-ansible/roles/openshift_sanitize_inventory/tasks/unsupported.yml:24
fatal: [openshift-182.lab.eng.nay.redhat.com]: FAILED! => {"changed": false, "failed": true, "msg": "Using a storage kind of 'dynamic' without enabling dynamic provisioning nor\nsetting a cloud provider will cause generated PVCs to not be able to bind as\nintended. Either update to not use a dynamic storage or set\nopenshift_master_dynamic_provisioning_enabled to True and set an\nopenshift_cloudprovider_kind. You can disable this check with\n'dynamic_volumes_check=False'."}
fatal: [openshift-181.lab.eng.nay.redhat.com]: FAILED! => {"changed": false, "failed": true, "msg": "Using a storage kind of 'dynamic' without enabling dynamic provisioning nor\nsetting a cloud provider will cause generated PVCs to not be able to bind as\nintended. Either update to not use a dynamic storage or set\nopenshift_master_dynamic_provisioning_enabled to True and set an\nopenshift_cloudprovider_kind. You can disable this check with\n'dynamic_volumes_check=False'."}
fatal: [openshift-217.lab.eng.nay.redhat.com]: FAILED! => {"changed": false, "failed": true, "msg": "Using a storage kind of 'dynamic' without enabling dynamic provisioning nor\nsetting a cloud provider will cause generated PVCs to not be able to bind as\nintended. Either update to not use a dynamic storage or set\nopenshift_master_dynamic_provisioning_enabled to True and set an\nopenshift_cloudprovider_kind. You can disable this check with\n'dynamic_volumes_check=False'."}
fatal: [openshift-210.lab.eng.nay.redhat.com]: FAILED! => {"changed": false, "failed": true, "msg": "Using a storage kind of 'dynamic' without enabling dynamic provisioning nor\nsetting a cloud provider will cause generated PVCs to not be able to bind as\nintended. Either update to not use a dynamic storage or set\nopenshift_master_dynamic_provisioning_enabled to True and set an\nopenshift_cloudprovider_kind. You can disable this check with\n'dynamic_volumes_check=False'."}
fatal: [openshift-226.lab.eng.nay.redhat.com]: FAILED! => {"changed": false, "failed": true, "msg": "Using a storage kind of 'dynamic' without enabling dynamic provisioning nor\nsetting a cloud provider will cause generated PVCs to not be able to bind as\nintended. Either update to not use a dynamic storage or set\nopenshift_master_dynamic_provisioning_enabled to True and set an\nopenshift_cloudprovider_kind. You can disable this check with\n'dynamic_volumes_check=False'."}
```
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2017:3188