Created attachment 1493168
Endpoints for alertmanager and alert-buffer are down

Description of problem:
Deployed prometheus v3.9.45-1.

# oc -n openshift-metrics get pod -o wide
NAME                             READY   STATUS    RESTARTS   AGE   IP               NODE
prometheus-0                     6/6     Running   0          3h    10.2.2.4         share3-wmengr76o39-master-etcd-2
prometheus-node-exporter-25v67   1/1     Running   0          3h    192.168.100.14   share3-wmengr76o39-nrri-1
prometheus-node-exporter-9v6gs   1/1     Running   0          3h    192.168.100.12   share3-wmengr76o39-master-etcd-3
prometheus-node-exporter-bkn67   1/1     Running   0          3h    192.168.100.20   share3-wmengr76o39-node-primary-3
prometheus-node-exporter-d9wfc   1/1     Running   0          3h    192.168.100.8    share3-wmengr76o39-node-primary-1
prometheus-node-exporter-fnngw   1/1     Running   0          3h    192.168.100.9    share3-wmengr76o39-nrri-2
prometheus-node-exporter-g7km9   1/1     Running   0          3h    192.168.100.4    share3-wmengr76o39-master-etcd-1
prometheus-node-exporter-jlf2v   1/1     Running   0          3h    192.168.100.16   share3-wmengr76o39-node-primary-2
prometheus-node-exporter-k986p   1/1     Running   0          3h    192.168.100.7    share3-wmengr76o39-master-etcd-2

Checked the targets: the endpoints for alertmanager and alert-buffer are down, and both targets report that the server gave an HTTP response to an HTTPS client.

# oc -n openshift-metrics rsh prometheus-0
sh-4.2$ curl -k https://10.2.2.4:9093/metrics
curl: (35) SSL received a record that exceeded the maximum permissible length.

Testing with http returns the metrics output:

sh-4.2$ curl -k http://10.2.2.4:9093/metrics
# HELP alertmanager_alerts How many alerts by state.
# TYPE alertmanager_alerts gauge
alertmanager_alerts{state="active"} 0
alertmanager_alerts{state="suppressed"} 0
# HELP alertmanager_alerts_invalid_total The total number of received alerts that were invalid.
# TYPE alertmanager_alerts_invalid_total counter
alertmanager_alerts_invalid_total 0
# HELP alertmanager_build_info A metric with a constant '1' value labeled by version, revision, branch, and goversion from which alertmanager was built.
# TYPE alertmanager_build_info gauge
...

Version-Release number of selected component (if applicable):
prometheus v3.9.45-1

How reproducible:
Always

Steps to Reproduce:
1. Deploy prometheus v3.9.45-1 and check the /targets page.

Actual results:
The endpoints for alertmanager and alert-buffer fail with "gave HTTP response to HTTPS client".

Expected results:
The endpoints should be in the UP state.
This issue only happens with prometheus 3.9; versions 3.10 and above do not scrape alertmanager and alert-buffer.
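This is caused by Prometheus automatically discovering the container ports listed in the StatefulSet config and scraping each of them over HTTPS, including the plain-HTTP alertmanager and alert-buffer ports. See https://github.com/openshift/openshift-ansible/pull/10424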
The endpoints for alertmanager and alert-buffer are removed in openshift-ansible-3.9.49-1.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:3748