Description of problem:
After launching an IPI cluster on OSP, the following endpoints are DOWN:
1936/tcp (router)
9537/tcp (crio metrics)
9101/tcp (sdn metrics)

Version-Release number of the following components:
4.2.0-0.nightly-2019-09-05-234433

How reproducible:
Always

Steps to Reproduce:
1. Launch an IPI cluster on OSP
2. Check the Prometheus targets dashboard

Actual results:
The 1936/tcp (router), 9537/tcp (crio metrics) and 9101/tcp (sdn metrics) endpoints are down.

Expected results:

Additional info:
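The dashboard check in the reproduction steps can also be scripted against the Prometheus HTTP API. The sketch below assumes the JSON body of `GET /api/v1/targets` has already been fetched (for example with an `oc exec ... curl` call like the one in the verification comment) and decoded; it lists every active target whose `health` field is not `up`. The function name `down_targets` is hypothetical.

```python
from typing import Any, Dict, List, Tuple


def down_targets(targets_response: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return sorted (job, scrapeUrl) pairs for active targets that are not healthy.

    Expects the decoded body of Prometheus' GET /api/v1/targets, where each
    entry in data.activeTargets has a "health" of "up", "down" or "unknown".
    """
    if targets_response.get("status") != "success":
        raise ValueError("Prometheus API call did not succeed")
    unhealthy = []
    for target in targets_response["data"]["activeTargets"]:
        # Treat anything other than "up" (i.e. "down" or "unknown") as a problem.
        if target.get("health") != "up":
            job = target.get("labels", {}).get("job", "<no job>")
            unhealthy.append((job, target.get("scrapeUrl", "")))
    return sorted(unhealthy)
```

Reporting "unknown" alongside "down" is a deliberate choice here: a target that has never been scraped successfully is just as suspicious as one that failed.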
We recently aligned the OpenStack security group rules with the AWS ones. The 9537/tcp (crio metrics) and 9101/tcp (sdn metrics) rules should have been added as part of https://github.com/openshift/installer/pull/2304, which merged a couple of days ago. For 1936/tcp (router), neither the AWS nor the GCP security groups open this port. Does it need to be open?
I've opened port 1936 for the compute nodes in https://github.com/openshift/installer/pull/2347 and also tightened the security group rules to more closely match AWS.
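The reasoning in the two comments above amounts to a set comparison: every port Prometheus scrapes on a node must be allowed by the security group for that node's role. A minimal sketch, assuming a simplified, hypothetical representation of ingress rules as (role, protocol, port_min, port_max) tuples; this is not the installer's actual data model, and the role-to-port mapping only covers the ports named in this bug (crio and sdn metrics on all nodes, router metrics on compute nodes).

```python
from typing import Dict, Iterable, NamedTuple, Set


class IngressRule(NamedTuple):
    """Hypothetical, simplified view of one security group ingress rule."""
    role: str      # "master" or "worker"
    protocol: str  # e.g. "tcp"
    port_min: int  # inclusive
    port_max: int  # inclusive


# Metrics ports named in this bug, keyed by the node role that exposes them.
REQUIRED_TCP_PORTS: Dict[str, Set[int]] = {
    "master": {9537, 9101},        # crio and sdn metrics
    "worker": {9537, 9101, 1936},  # plus router metrics on compute nodes
}


def missing_ports(rules: Iterable[IngressRule]) -> Dict[str, Set[int]]:
    """Return, per role, the required TCP ports that no rule allows."""
    rules = list(rules)
    missing: Dict[str, Set[int]] = {}
    for role, ports in REQUIRED_TCP_PORTS.items():
        uncovered = {
            port
            for port in ports
            if not any(
                r.role == role and r.protocol == "tcp" and r.port_min <= port <= r.port_max
                for r in rules
            )
        }
        if uncovered:
            missing[role] = uncovered
    return missing
```

Rules are matched by inclusive port range, so a single range rule can satisfy several required ports, while a rule for the wrong protocol or role satisfies none.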
Verified on 4.2.0-0.nightly-2019-09-15-052022. Every crio, sdn and router-internal-default target now returns up = 1:

➜ ~ oc -n openshift-monitoring exec prometheus-k8s-0 -c prometheus -- curl -s http://localhost:9090/api/v1/query\?query\=up%7Bjob%3D%22crio%22%7D%20or%20up%7Bjob%3D%22sdn%22%7D%20or%20up%7Bjob%3D%22router-internal-default%22%7D | json_reformat
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      { "metric": { "__name__": "up", "endpoint": "crio", "instance": "192.168.0.15:9537", "job": "crio", "namespace": "kube-system", "node": "share-0916c-8vp8z-master-2", "service": "kubelet" }, "value": [ 1568613183.383, "1" ] },
      { "metric": { "__name__": "up", "endpoint": "crio", "instance": "192.168.0.17:9537", "job": "crio", "namespace": "kube-system", "node": "share-0916c-8vp8z-worker-sc4nc", "service": "kubelet" }, "value": [ 1568613183.383, "1" ] },
      { "metric": { "__name__": "up", "endpoint": "crio", "instance": "192.168.0.18:9537", "job": "crio", "namespace": "kube-system", "node": "share-0916c-8vp8z-worker-ntvnv", "service": "kubelet" }, "value": [ 1568613183.383, "1" ] },
      { "metric": { "__name__": "up", "endpoint": "crio", "instance": "192.168.0.25:9537", "job": "crio", "namespace": "kube-system", "node": "share-0916c-8vp8z-master-1", "service": "kubelet" }, "value": [ 1568613183.383, "1" ] },
      { "metric": { "__name__": "up", "endpoint": "crio", "instance": "192.168.0.33:9537", "job": "crio", "namespace": "kube-system", "node": "share-0916c-8vp8z-worker-sv7x8", "service": "kubelet" }, "value": [ 1568613183.383, "1" ] },
      { "metric": { "__name__": "up", "endpoint": "crio", "instance": "192.168.0.39:9537", "job": "crio", "namespace": "kube-system", "node": "share-0916c-8vp8z-master-0", "service": "kubelet" }, "value": [ 1568613183.383, "1" ] },
      { "metric": { "__name__": "up", "endpoint": "metrics", "instance": "192.168.0.15:9101", "job": "sdn", "namespace": "openshift-sdn", "pod": "sdn-pnbxm", "service": "sdn" }, "value": [ 1568613183.383, "1" ] },
      { "metric": { "__name__": "up", "endpoint": "metrics", "instance": "192.168.0.17:9101", "job": "sdn", "namespace": "openshift-sdn", "pod": "sdn-75l56", "service": "sdn" }, "value": [ 1568613183.383, "1" ] },
      { "metric": { "__name__": "up", "endpoint": "metrics", "instance": "192.168.0.18:9101", "job": "sdn", "namespace": "openshift-sdn", "pod": "sdn-rkm9w", "service": "sdn" }, "value": [ 1568613183.383, "1" ] },
      { "metric": { "__name__": "up", "endpoint": "metrics", "instance": "192.168.0.25:9101", "job": "sdn", "namespace": "openshift-sdn", "pod": "sdn-c4vbz", "service": "sdn" }, "value": [ 1568613183.383, "1" ] },
      { "metric": { "__name__": "up", "endpoint": "metrics", "instance": "192.168.0.33:9101", "job": "sdn", "namespace": "openshift-sdn", "pod": "sdn-ngvhb", "service": "sdn" }, "value": [ 1568613183.383, "1" ] },
      { "metric": { "__name__": "up", "endpoint": "metrics", "instance": "192.168.0.39:9101", "job": "sdn", "namespace": "openshift-sdn", "pod": "sdn-zm8r9", "service": "sdn" }, "value": [ 1568613183.383, "1" ] },
      { "metric": { "__name__": "up", "endpoint": "metrics", "instance": "192.168.0.17:1936", "job": "router-internal-default", "namespace": "openshift-ingress", "pod": "router-default-594bb9c7cc-lb2v7", "service": "router-internal-default" }, "value": [ 1568613183.383, "1" ] },
      { "metric": { "__name__": "up", "endpoint": "metrics", "instance": "192.168.0.18:1936", "job": "router-internal-default", "namespace": "openshift-ingress", "pod": "router-default-594bb9c7cc-qkhlv", "service": "router-internal-default" }, "value": [ 1568613183.383, "1" ] }
    ]
  }
}
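The verification command URL-encodes the PromQL expression `up{job="crio"} or up{job="sdn"} or up{job="router-internal-default"}` by hand. The sketch below uses only the Python standard library to build the same query string with `urllib.parse.urlencode` (passing `quote_via=quote` so spaces become `%20`, as in the command) and to summarize a decoded `/api/v1/query` response per job. The helper names `build_up_query` and `summarize_up` are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple
from urllib.parse import quote, urlencode

# Jobs checked in this bug; the expression matches the verification command.
JOBS: Tuple[str, ...] = ("crio", "sdn", "router-internal-default")


def build_up_query(jobs: Tuple[str, ...] = JOBS) -> str:
    """Return the percent-encoded query string for /api/v1/query."""
    expr = " or ".join(f'up{{job="{job}"}}' for job in jobs)
    # quote (not the default quote_plus) encodes spaces as %20 instead of "+".
    return urlencode({"query": expr}, quote_via=quote)


def summarize_up(response: Dict[str, Any]) -> Dict[str, Tuple[int, List[str]]]:
    """Map each job to (number of targets, instances whose up value is not "1")."""
    if response.get("status") != "success":
        raise ValueError("Prometheus query did not succeed")
    total: Dict[str, int] = defaultdict(int)
    down: Dict[str, List[str]] = defaultdict(list)
    for sample in response["data"]["result"]:
        job = sample["metric"]["job"]
        total[job] += 1
        # Instant-vector values are [timestamp, "value as a string"].
        if sample["value"][1] != "1":
            down[job].append(sample["metric"]["instance"])
    return {job: (total[job], sorted(down[job])) for job in total}
```

Appending the output of `build_up_query()` to `http://localhost:9090/api/v1/query?` yields the URL from the verification command without the backslash escapes the interactive shell needed.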
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:2922