Bug 1698201
| Summary: | Prometheus is unable to scrape control plane components | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Derek Carr <decarr> |
| Component: | Master | Assignee: | Michal Fojtik <mfojtik> |
| Status: | CLOSED ERRATA | QA Contact: | Xingxing Xia <xxia> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 4.1.0 | CC: | anpicker, aos-bugs, erooth, fbranczy, jokerman, juzhao, mloibl, mmccomas, pkrupa, sjenning, surbania, xxia |
| Target Milestone: | --- | ||
| Target Release: | 4.1.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-06-04 10:47:18 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Derek Carr
2019-04-09 19:43:45 UTC
We should create a BZ for each of these components, as each team owns its respective monitoring components. We already have an open PR to add checks to the e2e-aws test suite so this doesn't happen again once fixed: https://github.com/openshift/origin/pull/22513.

The controller-manager/scheduler failure is due to the insecure port being disabled without a migration. This is being fixed in https://github.com/openshift/cluster-monitoring-operator/pull/316 on the cluster-monitoring-operator, and the final fix for scraping to work is https://github.com/openshift/installer/pull/1576.

I'm opening separate bugs for OLM, catalog and SDN, and will keep this Bugzilla dedicated to scheduler/controller-manager:

- SDN: https://bugzilla.redhat.com/show_bug.cgi?id=1698525
- catalog-operator: https://bugzilla.redhat.com/show_bug.cgi?id=1698530
- olm-operator: https://bugzilla.redhat.com/show_bug.cgi?id=1698533

Reassigning this to the master team, as we've done everything on our side and we need their support now.

*** Bug 1698722 has been marked as a duplicate of this bug. ***

As part of the fix for this, re-enable the e2e tests for it in the cluster-monitoring-operator repo that were disabled here: https://github.com/openshift/cluster-monitoring-operator/pull/318

In the future we will also make sure that this configuration is handled by the controller-manager-operator and scheduler-operator respectively, and remove any dependency on the cluster-monitoring-operator for it, as each component owns its own monitoring configuration. That's planned for 4.2; for 4.1 we just want working metrics for core components back :)

The controller manager moved to the secure port after the rebase:

`openshift-monitoring/kube-controller-manager/0 (0/3 up) Get http://10.0.139.202:10252/metrics: dial tcp 10.0.139.202:10252: connect: connection refused`

It is 10257 now (and HTTPS).

*** Bug 1700060 has been marked as a duplicate of this bug. ***

Moving to POST, as all the PRs required to fix this have been opened.
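The `(0/3 up)` summary quoted for `openshift-monitoring/kube-controller-manager/0` is how Prometheus reports a scrape pool in which every target is failing. As a minimal sketch (not part of the original report), the same per-pool summary can be recomputed in Python from the `data.activeTargets` list returned by the Prometheus HTTP API `/api/v1/targets` endpoint, assuming each entry carries `scrapePool`, `scrapeUrl` and `health` fields. The helper names and the sample targets are hypothetical; only 10.0.139.202 comes from the report, the other two addresses are made up.

```python
from collections import defaultdict


def summarize_targets(active_targets):
    """Count healthy targets per scrape pool.

    `active_targets` is assumed to be the parsed `data.activeTargets` list
    from Prometheus' /api/v1/targets endpoint.
    Returns {scrape_pool: (up_count, total_count)}.
    """
    counts = defaultdict(lambda: [0, 0])
    for target in active_targets:
        pool_counts = counts[target["scrapePool"]]
        pool_counts[1] += 1  # every target counts toward the total
        if target["health"] == "up":
            pool_counts[0] += 1
    return {pool: (up, total) for pool, (up, total) in counts.items()}


def format_summary(summary):
    """Render each pool in the "pool (up/total up)" style seen in the report."""
    return [f"{pool} ({up}/{total} up)" for pool, (up, total) in sorted(summary.items())]


# Hypothetical sample: three controller-manager targets still scraped on the
# old insecure port 10252, all failing with "connection refused".
POOL = "openshift-monitoring/kube-controller-manager/0"
sample_targets = [
    {
        "scrapePool": POOL,
        "scrapeUrl": f"http://{ip}:10252/metrics",
        "health": "down",
        "lastError": f"dial tcp {ip}:10252: connect: connection refused",
    }
    for ip in ("10.0.139.202", "10.0.0.11", "10.0.0.12")
]

if __name__ == "__main__":
    for line in format_summary(summarize_targets(sample_targets)):
        print(line)  # openshift-monitoring/kube-controller-manager/0 (0/3 up)
```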
Status update: All PRs on the monitoring repos have been merged. The installer PR to open the new ports is still outstanding (https://github.com/openshift/installer/pull/1576). Once that's merged, the e2e tests that verify scheduler and controller-manager metrics are collected have to be re-enabled. As monitoring's involvement is done, moving back to the Master component. Manually testing a cluster built from the above installer PR shows that this is the last thing needed for this to be resolved, but should there be any more issues on the cluster-monitoring-operator side, please feel free to move this back to us.

Thanks Frederic, I will move this to ON_QA once the installer PR merges.

Verified in an environment with payload 4.1.0-0.nightly-2019-04-25-002910. The openshift-monitoring/kube-controller-manager targets are now UP on port 10257, and the issue no longer occurs.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0758
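The verification step in this thread amounted to checking that every kube-controller-manager target is UP and scraped over HTTPS on port 10257 rather than plain HTTP on 10252. A minimal Python sketch of that check, again assuming `activeTargets` entries from Prometheus' `/api/v1/targets` with `scrapePool`, `scrapeUrl` and `health` fields; the function names and sample URLs are hypothetical and not part of any OpenShift tooling.

```python
from urllib.parse import urlsplit

SECURE_PORT = 10257  # secure (HTTPS) kube-controller-manager metrics port


def is_scraped_securely(target, expected_port=SECURE_PORT):
    """True if the target is healthy and scraped over HTTPS on expected_port."""
    url = urlsplit(target["scrapeUrl"])
    return target["health"] == "up" and url.scheme == "https" and url.port == expected_port


def pool_verified(active_targets, pool):
    """True if the pool has at least one target and every target passes."""
    targets = [t for t in active_targets if t["scrapePool"] == pool]
    return bool(targets) and all(is_scraped_securely(t) for t in targets)


if __name__ == "__main__":
    pool = "openshift-monitoring/kube-controller-manager/0"
    # Hypothetical before/after states of a single target.
    before = [{"scrapePool": pool, "scrapeUrl": "http://10.0.0.10:10252/metrics", "health": "down"}]
    after = [{"scrapePool": pool, "scrapeUrl": "https://10.0.0.10:10257/metrics", "health": "up"}]
    print(pool_verified(before, pool), pool_verified(after, pool))  # False True
```

Requiring at least one target keeps a pool that has disappeared entirely from being reported as fixed, since `all()` over an empty list is `True`.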