Bug 1415297 - Metrics does not install with cloud-provider and dynamic storage
Summary: Metrics does not install with cloud-provider and dynamic storage
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.4.0
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 3.7.0
Assignee: ewolinet
QA Contact: Anping Li
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-01-20 19:35 UTC by Christian Hernandez
Modified: 2018-07-19 02:54 UTC
CC List: 8 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-11-28 21:52:23 UTC
Target Upstream Version:
Embargoed:




Links:
Red Hat Product Errata RHSA-2017:3188 (normal, SHIPPED_LIVE) - Moderate: Red Hat OpenShift Container Platform 3.7 security, bug, and enhancement update - Last updated 2017-11-29 02:34:54 UTC

Description Christian Hernandez 2017-01-20 19:35:22 UTC
Description of problem:

I am installing OCP v3.4 on AWS with the cloud provider enabled, using ansible to install metrics with dynamic storage.

The cassandra pod is stuck in a "Pending" state:

```
[ec2-user@ip-172-31-21-210 ~]$ oc get pods
NAME                         READY     STATUS    RESTARTS   AGE
hawkular-cassandra-1-xxhif   0/1       Pending   0          7m
hawkular-metrics-q6eph       0/1       Running   0          7m
heapster-52m24               0/1       Running   0          7m
metrics-deployer-qcvzl       1/1       Running   0          7m
```

Looking at the logs, there is a zone mismatch issue:

```
[ec2-user@ip-172-31-21-210 ~]$ journalctl -f | grep cass
Jan 20 14:21:30 ip-172-31-21-210.us-west-1.compute.internal atomic-openshift-master[25687]: I0120 14:21:30.133534   25687 predicates.go:410] Won't schedule pod "hawkular-cassandra-1-xxhif" onto node "ip-172-31-21-210.us-west-1.compute.internal" due to volume "pvc-002fa207-df45-11e6-b58c-02b055e60362" (mismatch on "failure-domain.beta.kubernetes.io/zone")
```

This leaves the cassandra pod stuck in a "Pending" state indefinitely. I also noticed that no storage class was set up:

```
[ec2-user@ip-172-31-21-210 ~]$ oc get storageclass 
No resources found.
```

I used the following in my ansible host file:

```
openshift_hosted_metrics_public_url=https://hawkular.apps.54.67.62.196.xip.io/hawkular/metrics
openshift_hosted_metrics_deploy=true
openshift_hosted_metrics_storage_kind=dynamic
openshift_hosted_metrics_deployer_version=v3.4
```

Here is my full ansible host file: https://paste.fedoraproject.org/531769/84940752/

Version-Release number of selected component (if applicable):

oc v3.4.0.39
kubernetes v1.4.0+776c994
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://ip-172-31-21-210.us-west-1.compute.internal:8443
openshift v3.4.0.39
kubernetes v1.4.0+776c994


How reproducible:

Always


Steps to Reproduce:
1. Spin up AWS instance
2. Set up ansible host file with aws cloud-provider with dynamic storage for metrics and logging
3. Run the installer

Actual results:

Metrics and logging pods stay in a "Pending" state.

Expected results:

Metrics and logging pods use EBS storage provided by OCP and AWS

Additional info:

The same thing appears to happen for logging:

```
Jan 20 14:30:42 ip-172-31-21-210.us-west-1.compute.internal atomic-openshift-master[50163]: I0120 14:30:42.730405   50163 predicates.go:410] Won't schedule pod "logging-es-ce8leui7-1-u39se" onto node "ip-172-31-21-210.us-west-1.compute.internal" due to volume "pvc-a0add851-df46-11e6-b7a8-02b055e60362" (mismatch on "failure-domain.beta.kubernetes.io/zone")
```

Comment 1 Christian Hernandez 2017-01-20 20:24:56 UTC
Updated info:

If I let the ansible installer fail, clean up the openshift-infra and logging projects (basically doing the "cleanup" steps outlined in the docs), and then create the following yaml file:

```
kind: StorageClass
apiVersion: storage.k8s.io/v1beta1
metadata:
  name: aws-ebs-slow
  annotations:
    storageclass.beta.kubernetes.io/is-default-class: "true"
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
  zone: us-west-1b
  iopsPerGB: "100" 
  encrypted: "false"
```

Create the storageclass with `oc create -f storage-class-default.yaml` and restart the installer; at that point, the install goes as expected.
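
For context on why this workaround helps: the `storageclass.beta.kubernetes.io/is-default-class: "true"` annotation makes `aws-ebs-slow` the cluster's default StorageClass, so PVCs that don't name a class fall back to it and get dynamically provisioned EBS volumes in the specified zone. A minimal sketch of such a claim (name and size are hypothetical, not taken from the actual deployer):

```
# Sketch: a PVC with no storage class specified (name/size hypothetical).
# It binds via the default StorageClass (aws-ebs-slow above).
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: metrics-cassandra-1
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```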

Comment 4 Peng Li 2017-02-13 09:52:20 UTC
QE has tested the Metrics deployment on a cloud-provider-enabled AWS instance in version 3.3.1 and later, and the dynamic PV could be bound to the cassandra pod.

Comment 5 Matt Wringe 2017-02-17 15:42:23 UTC
I am trying to figure out what the issue is here exactly.

This does not sound like a metrics or logging issue, but an installation problem.

@sdodson: why did you move this to metrics?

From https://bugzilla.redhat.com/show_bug.cgi?id=1415297#c1 it sounds like the problem is that the StorageClass was not set up properly. Once it is created correctly, the metrics and logging installation goes without issue.

Is this supposed to be done automatically by the ansible install, or is it supposed to be a separate step that the user is required to perform?

I don't know if this is an installation issue, a problem where our docs don't outline that you have to do this step, or just that the user forgot a step.

I am reassigning this to the installer component. The metrics component is not responsible for setting up things like StorageClass.

Comment 6 Christian Hernandez 2017-02-17 19:01:34 UTC
Additional thoughts.

@mwringe I agree this is an installation issue.

@all

1) I believe that the installer should create a storageclass if we are setting dynamic storage in the ansible host file

IF NOT

2) The ansible installer should provide a mechanism for setting up storageclasses

IF NOT

3) At the VERY least, it should be stated in the docs that you need a storageclass

HOWEVER, that's a "chicken/egg" problem: how can I set up a storage class if OCP isn't installed?

Comment 8 Scott Dodson 2017-06-12 14:25:26 UTC
Need to add configuration validation to ensure that if *_storage_kind=dynamic then we also have a cloud provider defined.
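
For illustration, an inventory that would satisfy such a validation pairs the dynamic storage kind with the cloud provider settings later named in the check's error message (see comment 15). A sketch:

```
openshift_hosted_metrics_storage_kind=dynamic
openshift_cloudprovider_kind=aws
openshift_master_dynamic_provisioning_enabled=True
```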

Comment 10 Christian Hernandez 2017-06-12 15:56:41 UTC
Yes both were set.

Comment 12 Scott Dodson 2017-06-26 14:15:17 UTC
Comment 8 has details; implement in the openshift_sanitize_inventory role.

Comment 13 ewolinet 2017-09-27 22:08:16 UTC
Per https://github.com/openshift/openshift-ansible/pull/5566, we should now fail the install if any storage kind is set to 'dynamic' without dynamic provisioning enabled.
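
As a rough sketch, such a check could be expressed as an Ansible task like the one below. The actual task lives in roles/openshift_sanitize_inventory/tasks/ (per comment 15) and may differ in detail; the condition and message here are reconstructed from the failure output shown below.

```
# Sketch only; approximates the validation added by openshift-ansible PR 5566.
- name: Ensure that dynamic provisioning is set if using dynamic storage
  fail:
    msg: >-
      Using a storage kind of 'dynamic' without enabling dynamic provisioning
      nor setting a cloud provider will cause generated PVCs to not be able
      to bind as intended. You can disable this check with
      'dynamic_volumes_check=False'.
  when:
    - dynamic_volumes_check | default(true) | bool
    - openshift_hosted_metrics_storage_kind | default('') == 'dynamic' or
      openshift_hosted_logging_storage_kind | default('') == 'dynamic'
    - not (openshift_master_dynamic_provisioning_enabled | default(false) | bool)
```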

Comment 15 Anping Li 2017-10-17 15:49:01 UTC
1. The messages [1] and [2] are reported if we set openshift_hosted_logging_storage_kind=dynamic or openshift_hosted_metrics_storage_kind=dynamic.

[1]
ok: [openshift-181.lab.eng.nay.redhat.com] => {
    "msg": [
        "[DEPRECATION WARNING]: openshift_hosted_logging_storage_kind is a deprecated variable and will be no longer be used in the next minor release. Please update your inventory accordingly.", 
        "[DEPRECATION WARNING]: openshift_hosted_metrics_storage_kind is a deprecated variable and will be no longer be used in the next minor release. Please update your inventory accordingly."
    ]
}


[2]
TASK [openshift_sanitize_inventory : Ensure that dynamic provisioning is set if using dynamic storage] ************************************************************************************************************
task path: /usr/share/ansible/openshift-ansible/roles/openshift_sanitize_inventory/tasks/unsupported.yml:24
fatal: [openshift-182.lab.eng.nay.redhat.com]: FAILED! => {"changed": false, "failed": true, "msg": "Using a storage kind of 'dynamic' without enabling dynamic provisioning nor\nsetting a cloud provider will cause generated PVCs to not be able to bind as\nintended. Either update to not use a dynamic storage or set\nopenshift_master_dynamic_provisioning_enabled to True and set an\nopenshift_cloudprovider_kind. You can disable this check with\n'dynamic_volumes_check=False'."}
fatal: [openshift-181.lab.eng.nay.redhat.com]: FAILED! => {"changed": false, "failed": true, "msg": "Using a storage kind of 'dynamic' without enabling dynamic provisioning nor\nsetting a cloud provider will cause generated PVCs to not be able to bind as\nintended. Either update to not use a dynamic storage or set\nopenshift_master_dynamic_provisioning_enabled to True and set an\nopenshift_cloudprovider_kind. You can disable this check with\n'dynamic_volumes_check=False'."}
fatal: [openshift-217.lab.eng.nay.redhat.com]: FAILED! => {"changed": false, "failed": true, "msg": "Using a storage kind of 'dynamic' without enabling dynamic provisioning nor\nsetting a cloud provider will cause generated PVCs to not be able to bind as\nintended. Either update to not use a dynamic storage or set\nopenshift_master_dynamic_provisioning_enabled to True and set an\nopenshift_cloudprovider_kind. You can disable this check with\n'dynamic_volumes_check=False'."}
fatal: [openshift-210.lab.eng.nay.redhat.com]: FAILED! => {"changed": false, "failed": true, "msg": "Using a storage kind of 'dynamic' without enabling dynamic provisioning nor\nsetting a cloud provider will cause generated PVCs to not be able to bind as\nintended. Either update to not use a dynamic storage or set\nopenshift_master_dynamic_provisioning_enabled to True and set an\nopenshift_cloudprovider_kind. You can disable this check with\n'dynamic_volumes_check=False'."}
fatal: [openshift-226.lab.eng.nay.redhat.com]: FAILED! => {"changed": false, "failed": true, "msg": "Using a storage kind of 'dynamic' without enabling dynamic provisioning nor\nsetting a cloud provider will cause generated PVCs to not be able to bind as\nintended. Either update to not use a dynamic storage or set\nopenshift_master_dynamic_provisioning_enabled to True and set an\nopenshift_cloudprovider_kind. You can disable this check with\n'dynamic_volumes_check=False'."}

Comment 19 errata-xmlrpc 2017-11-28 21:52:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3188

