Bug 1565405
Summary: | the Prometheus ansible installer playbook does not take into account a multi-AZ deployment | |
---|---|---|---
Product: | OpenShift Container Platform | Reporter: | raffaele spazzoli <rspazzol>
Component: | Monitoring | Assignee: | Zohar Gal-Or <zgalor>
Status: | CLOSED WONTFIX | QA Contact: | Junqi Zhao <juzhao>
Severity: | unspecified | Docs Contact: |
Priority: | unspecified | |
Version: | 3.9.0 | CC: | aos-bugs, fbranczy, spasquie
Target Milestone: | --- | |
Target Release: | 3.11.0 | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2018-09-05 20:44:59 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | 1622945, 1625817 | |
Bug Blocks: | | |
Description raffaele spazzoli 2018-04-10 00:45:34 UTC
In 3.11 the cluster-monitoring stack deploys all components as separate Pods, so this will not be an issue anymore.

The issue is not fixed: the prometheus and prometheus-alertmanager PVs are provisioned in the same zone, but the prometheus-alertbuffer PV is in another zone.

prometheus image version: v3.11.0-0.25.0
openshift-ansible-3.11.0-0.25.0.git.0.7497e69.el7.noarch

```
# oc get pod -n openshift-metrics
NAME                             READY   STATUS    RESTARTS   AGE
prometheus-0                     0/6     Pending   0          16m
prometheus-node-exporter-6bndx   1/1     Running   0          16m
prometheus-node-exporter-lnsx9   1/1     Running   0          16m
prometheus-node-exporter-m78zx   1/1     Running   0          16m
prometheus-node-exporter-mlzcc   1/1     Running   0          16m
prometheus-node-exporter-v74tk   1/1     Running   0          16m

# oc describe po prometheus-0 -n openshift-metrics
************snip************
Events:
  Type     Reason            Age                 From               Message
  ----     ------            ----                ----               -------
  Warning  FailedScheduling  2m (x484 over 17m)  default-scheduler  0/5 nodes are available: 1 node(s) didn't match node selector, 4 node(s) had no available volume zone.
```
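The "no available volume zone" message reflects the scheduler's volume-zone predicate: a pod that mounts several zonal PVs can only land on a node whose zone matches every PV's failure-domain zone label. A minimal sketch of that constraint (the function and node names are illustrative, not from the report):

```python
def schedulable_nodes(node_zones, pv_zones):
    """Return nodes whose zone matches every PV's zone.

    node_zones: mapping of node name -> availability zone
    pv_zones:   list of the zones the pod's PVs were provisioned in
    """
    required = set(pv_zones)
    if len(required) != 1:
        # PVs provisioned in different AZs can never be mounted together,
        # so no node can satisfy the predicate.
        return []
    zone = next(iter(required))
    return [n for n, z in node_zones.items() if z == zone]

# Zones from the report: the three PVs landed in us-east-1d, us-east-1d,
# and us-east-1c, so no single node can mount all of them.
pvs = ["us-east-1d", "us-east-1d", "us-east-1c"]
nodes = {"node-a": "us-east-1c", "node-b": "us-east-1d"}  # hypothetical nodes
print(schedulable_nodes(nodes, pvs))  # prints []
```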
```
# oc get pv | grep prometheus
pvc-d25f0e84-ab76-11e8-9515-0ede6b3c22da   10Gi   RWO   Delete   Bound   openshift-metrics/prometheus                gp2   18m
pvc-d43e82be-ab76-11e8-9515-0ede6b3c22da   10Gi   RWO   Delete   Bound   openshift-metrics/prometheus-alertmanager   gp2   18m
pvc-d66a1d85-ab76-11e8-9515-0ede6b3c22da   10Gi   RWO   Delete   Bound   openshift-metrics/prometheus-alertbuffer    gp2   18m

# oc get pv pvc-d25f0e84-ab76-11e8-9515-0ede6b3c22da -o yaml
************snip************
  labels:
    failure-domain.beta.kubernetes.io/region: us-east-1
    failure-domain.beta.kubernetes.io/zone: us-east-1d
************snip************

# oc get pv pvc-d43e82be-ab76-11e8-9515-0ede6b3c22da -o yaml
************snip************
  labels:
    failure-domain.beta.kubernetes.io/region: us-east-1
    failure-domain.beta.kubernetes.io/zone: us-east-1d
************snip************

# oc get pv pvc-d66a1d85-ab76-11e8-9515-0ede6b3c22da -o yaml
************snip************
  labels:
    failure-domain.beta.kubernetes.io/region: us-east-1
    failure-domain.beta.kubernetes.io/zone: us-east-1c
************snip************

# oc get sc
NAME            PROVISIONER             AGE
gp2 (default)   kubernetes.io/aws-ebs   2h

# oc get sc gp2 -o yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  annotations:
    storageclass.beta.kubernetes.io/is-default-class: "true"
  creationTimestamp: 2018-08-29T08:17:50Z
  name: gp2
  resourceVersion: "2075"
  selfLink: /apis/storage.k8s.io/v1/storageclasses/gp2
  uid: 04ecf704-ab64-11e8-9515-0ede6b3c22da
parameters:
  encrypted: "false"
  kmsKeyId: ""
  type: gp2
provisioner: kubernetes.io/aws-ebs
reclaimPolicy: Delete
volumeBindingMode: Immediate
```

For a workaround, see https://bugzilla.redhat.com/show_bug.cgi?id=1554921#c20.

I'm marking this WONTFIX as we are not going to fix this for the deprecated Tech Preview stack; it will be solved in the new Prometheus-based cluster monitoring stack.
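For context: the `gp2` StorageClass above uses `volumeBindingMode: Immediate`, so each EBS volume is provisioned in a zone chosen at PVC-creation time, before the pod is scheduled, which is how the three PVs ended up split across `us-east-1d` and `us-east-1c`. On cluster versions that support topology-aware provisioning, a StorageClass with `WaitForFirstConsumer` defers volume creation until the pod is scheduled, so the volume lands in that pod's zone. A sketch (the class name is illustrative):

```yaml
# Illustrative StorageClass: WaitForFirstConsumer delays EBS volume
# creation until the consuming pod is scheduled, so the volume is
# provisioned in that pod's availability zone.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp2-topology-aware   # hypothetical name
parameters:
  type: gp2
provisioner: kubernetes.io/aws-ebs
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
```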