Bug 1565405
| Summary: | the Prometheus ansible installer playbook does not take into account a multi AZ deployment | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | raffaele spazzoli <rspazzol> |
| Component: | Monitoring | Assignee: | Zohar Gal-Or <zgalor> |
| Status: | CLOSED WONTFIX | QA Contact: | Junqi Zhao <juzhao> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.9.0 | CC: | aos-bugs, fbranczy, spasquie |
| Target Milestone: | --- | | |
| Target Release: | 3.11.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-09-05 20:44:59 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1622945, 1625817 | | |
| Bug Blocks: | | | |
Description
raffaele spazzoli
2018-04-10 00:45:34 UTC
In 3.11 the cluster-monitoring stack deploys all components as separate Pods, so this will no longer be an issue. The issue is not fixed: the prometheus and prometheus-alertmanager PVs are provisioned in the same zone, but the prometheus-alertbuffer PV is in another zone.
Prometheus image version: v3.11.0-0.25.0
openshift-ansible-3.11.0-0.25.0.git.0.7497e69.el7.noarch
# oc get pod -n openshift-metrics
NAME READY STATUS RESTARTS AGE
prometheus-0 0/6 Pending 0 16m
prometheus-node-exporter-6bndx 1/1 Running 0 16m
prometheus-node-exporter-lnsx9 1/1 Running 0 16m
prometheus-node-exporter-m78zx 1/1 Running 0 16m
prometheus-node-exporter-mlzcc 1/1 Running 0 16m
prometheus-node-exporter-v74tk 1/1 Running 0 16m
# oc describe po prometheus-0 -n openshift-metrics
************************snip**************************************************
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 2m (x484 over 17m) default-scheduler 0/5 nodes are available: 1 node(s) didn't match node selector, 4 node(s) had no available volume zone.
************************snip**************************************************
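The scheduler message above ("no available volume zone") means the pod is pinned to the zones of its already-bound PVs, and no schedulable node exists in all of those zones at once. One way to confirm the zone mismatch is to print the zone label for the nodes and PVs side by side (a diagnostic sketch; it assumes the same `failure-domain.beta.kubernetes.io/zone` label shown in the PV output below, and `-L` simply adds that label's value as an extra output column):

```shell
# Compare node zones against PV zones; the prometheus pod can only be
# scheduled in a zone that contains ALL three of its bound volumes.
oc get nodes -L failure-domain.beta.kubernetes.io/zone
oc get pv -L failure-domain.beta.kubernetes.io/zone
```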
# oc get pv | grep prometheus
pvc-d25f0e84-ab76-11e8-9515-0ede6b3c22da 10Gi RWO Delete Bound openshift-metrics/prometheus gp2 18m
pvc-d43e82be-ab76-11e8-9515-0ede6b3c22da 10Gi RWO Delete Bound openshift-metrics/prometheus-alertmanager gp2 18m
pvc-d66a1d85-ab76-11e8-9515-0ede6b3c22da 10Gi RWO Delete Bound openshift-metrics/prometheus-alertbuffer gp2 18m
# oc get pv pvc-d25f0e84-ab76-11e8-9515-0ede6b3c22da -o yaml
************************snip**************************************************
labels:
failure-domain.beta.kubernetes.io/region: us-east-1
failure-domain.beta.kubernetes.io/zone: us-east-1d
************************snip**************************************************
# oc get pv pvc-d43e82be-ab76-11e8-9515-0ede6b3c22da -o yaml
************************snip**************************************************
labels:
failure-domain.beta.kubernetes.io/region: us-east-1
failure-domain.beta.kubernetes.io/zone: us-east-1d
************************snip**************************************************
# oc get pv pvc-d66a1d85-ab76-11e8-9515-0ede6b3c22da -o yaml
************************snip**************************************************
labels:
failure-domain.beta.kubernetes.io/region: us-east-1
failure-domain.beta.kubernetes.io/zone: us-east-1c
************************snip**************************************************
# oc get sc
NAME PROVISIONER AGE
gp2 (default) kubernetes.io/aws-ebs 2h
# oc get sc gp2 -o yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
annotations:
storageclass.beta.kubernetes.io/is-default-class: "true"
creationTimestamp: 2018-08-29T08:17:50Z
name: gp2
resourceVersion: "2075"
selfLink: /apis/storage.k8s.io/v1/storageclasses/gp2
uid: 04ecf704-ab64-11e8-9515-0ede6b3c22da
parameters:
encrypted: "false"
kmsKeyId: ""
type: gp2
provisioner: kubernetes.io/aws-ebs
reclaimPolicy: Delete
volumeBindingMode: Immediate
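The `volumeBindingMode: Immediate` setting above is the root of the symptom: each PVC is provisioned in an arbitrary zone as soon as it is created, before the pod is scheduled, so the three volumes can land in different zones. On clusters whose Kubernetes version supports it, a StorageClass with `WaitForFirstConsumer` delays provisioning until the pod is placed, so the volume is created in that pod's zone. A minimal sketch (the class name `gp2-wffc` is hypothetical, not something the installer creates):

```yaml
# Hypothetical zone-aware StorageClass sketch: with WaitForFirstConsumer,
# the EBS volume is provisioned only after the consuming pod is scheduled,
# in the zone of the node chosen for that pod.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp2-wffc   # hypothetical name
parameters:
  type: gp2
provisioner: kubernetes.io/aws-ebs
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
```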
For a workaround, please see https://bugzilla.redhat.com/show_bug.cgi?id=1554921#c20. I'm marking this WONTFIX, as we are not going to fix this for the deprecated Tech Preview stack; it is going to be solved in the new Prometheus-based cluster monitoring stack.