Bug 1848833
| Summary: | 1 of 2 prometheus-user-workload pods fails to run | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Daneyon Hansen <dhansen> |
| Component: | Monitoring | Assignee: | Pawel Krupa <pkrupa> |
| Status: | CLOSED DUPLICATE | QA Contact: | Junqi Zhao <juzhao> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.3.z | CC: | alegrand, anpicker, erooth, kakkoyun, lcosic, mloibl, pkrupa, surbania |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-06-19 07:01:27 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
*** This bug has been marked as a duplicate of bug 1848450 *** |
Description of problem:

1 of 2 prometheus-user-workload pods fails to run when trying to monitor my own service. The 2nd pod

Version-Release number of selected component (if applicable):

4.3.19

How reproducible:

Always

Steps to Reproduce:
1. Create a cluster
2. Follow product docs [1] to monitor my own service

Actual results:

```
$ oc -n openshift-user-workload-monitoring get pod
NAME                                   READY   STATUS    RESTARTS   AGE
prometheus-operator-756b9cbd89-zx6w6   1/1     Running   0          5h28m
prometheus-user-workload-0             0/5     Pending   0          4h23m
prometheus-user-workload-1             5/5     Running   1          4h23m
```

Expected results:

```
$ oc -n openshift-user-workload-monitoring get pod
NAME                                   READY   STATUS    RESTARTS   AGE
prometheus-operator-756b9cbd89-zx6w6   1/1     Running   0          5h28m
prometheus-user-workload-0             5/5     Running   0          4h23m
prometheus-user-workload-1             5/5     Running   1          4h23m
```

Additional info:

```
$ oc -n openshift-user-workload-monitoring describe po/prometheus-user-workload-0
<SNIP>
  Warning  FailedScheduling  <unknown>  default-scheduler  0/6 nodes are available: 3 Insufficient cpu, 3 node(s) had taints that the pod didn't tolerate.
```

Worker nodes have no taints, and master nodes have the following taint:

```yaml
taints:
- effect: NoSchedule
  key: node-role.kubernetes.io/master
```

Other than control plane pods, only my test app pod is running:

```
$ oc get po -o wide
NAME                            READY   STATUS    RESTARTS   AGE     IP            NODE                                        NOMINATED NODE   READINESS GATES
network-check-9ffbdd476-8lp96   1/1     Running   0          4h57m   10.131.0.20   ip-10-0-159-58.us-west-2.compute.internal   <none>           <none>
```

No worker node is reporting memory, disk, or PID pressure. See [2] for details of the worker nodes.

The 2nd pod runs after making the master nodes schedulable:

```
$ oc -n openshift-user-workload-monitoring get pod
NAME                                   READY   STATUS    RESTARTS   AGE
prometheus-operator-756b9cbd89-zx6w6   1/1     Running   0          5h45m
prometheus-user-workload-0             5/5     Running   1          4h40m
prometheus-user-workload-1             5/5     Running   1          4h40m
```

[1] https://docs.openshift.com/container-platform/4.3/monitoring/monitoring-your-own-services.html#creating-a-role-for-setting-up-metrics-collection_monitoring-your-own-services
[2] https://gist.githubusercontent.com/danehans/3b075e36cf65184ffacc0569103e25d1/raw/7b874d05c15c6eb0f2515fe65549db31645459c3/02_worker_node_details
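For reference, the workaround the reporter applied ("making master nodes schedulable") is controlled through the cluster Scheduler resource in OpenShift 4.x; a minimal sketch, assuming the standard `schedulers.config.openshift.io/cluster` object:

```
# Setting mastersSchedulable removes the node-role.kubernetes.io/master:NoSchedule
# taint from the master nodes, letting regular workloads land on them.
$ oc patch schedulers.config.openshift.io cluster --type merge \
    --patch '{"spec":{"mastersSchedulable":true}}'
```

Note that this opens the masters to all workloads, not just the pending Prometheus pod, so it is a broad workaround rather than a fix.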
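Since the scheduler reports `Insufficient cpu` on the three workers, a narrower alternative may be to shrink the CPU requests of the user-workload Prometheus, or give it a toleration for the master taint, through its monitoring config map. A hypothetical sketch, assuming the 4.3 tech-preview `user-workload-monitoring-config` ConfigMap honors the `resources` and `tolerations` keys the way later OpenShift releases document them:

```yaml
# Assumption: keys follow the documented cluster-monitoring config schema of
# later releases; values here are illustrative, not recommendations.
apiVersion: v1
kind: ConfigMap
metadata:
  name: user-workload-monitoring-config
  namespace: openshift-user-workload-monitoring
data:
  config.yaml: |
    prometheus:
      resources:
        requests:
          cpu: 100m          # lower the request so the pod fits on a worker
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule   # or let it tolerate the master taint instead
```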