Bug 1749714

Summary: [IPI] [OSP] sg-worker lacks 1936/tcp (router), 9537/tcp (crio metrics) and 9101/tcp (sdn metrics) rules for prometheus pods
Product: OpenShift Container Platform Reporter: weiwei jiang <wjiang>
Component: Installer Assignee: Martin André <m.andre>
Installer sub component: OpenShift on OpenStack QA Contact: David Sanz <dsanzmor>
Status: CLOSED ERRATA Docs Contact:
Severity: medium    
Priority: medium CC: eduen, xtian
Version: 4.2.0   
Target Milestone: ---   
Target Release: 4.2.0   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2019-10-16 06:40:35 UTC Type: Bug

Description weiwei jiang 2019-09-06 10:04:05 UTC
Description of problem:
After launching an IPI cluster on OSP, the Prometheus targets on the following ports are DOWN:
1936/tcp (router), 9537/tcp (crio metrics), 9101/tcp (sdn metrics)

Version-Release number of the following components:
4.2.0-0.nightly-2019-09-05-234433

How reproducible:
Always

Steps to Reproduce:
1. Launch an IPI cluster on OSP.
2. Check the Prometheus targets dashboard.

Actual results:
The targets on 1936/tcp (router), 9537/tcp (crio metrics) and 9101/tcp (sdn metrics) are down.
Expected results:
The targets on all three ports are up.

Additional info:

Comment 3 Martin André 2019-09-11 18:18:28 UTC
We recently aligned the OpenStack security group rules with the AWS ones.
The 9537/tcp (crio metrics) and 9101/tcp (sdn metrics) rules should have been added as part of https://github.com/openshift/installer/pull/2304 that merged a couple of days ago.

For 1936/tcp (router), there is no such port open in AWS or GCP security groups. Does it need to be open?

Comment 4 Martin André 2019-09-12 08:54:25 UTC
I've opened port 1936 for the compute nodes in https://github.com/openshift/installer/pull/2347 and also tightened the security group rules to match AWS better.
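
Once the security group rules are in place, a plain TCP connect from another machine on the cluster network is a quick way to confirm that the three ports are reachable. The following is a minimal sketch, not part of the installer: the names `port_open` and `check_node` are hypothetical, and it assumes Python 3 is available on the host you test from.

```python
import socket

# Ports named in this bug: router metrics, CRI-O metrics, SDN metrics.
PORTS = {1936: "router", 9537: "crio metrics", 9101: "sdn metrics"}


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_node(host: str) -> dict:
    """Map each port of interest to its reachability on the given host."""
    return {port: port_open(host, port) for port in PORTS}
```

For example, `check_node("192.168.0.17")` would probe a worker address like the ones in the verification output. A port blocked by a security group usually shows up as a timeout rather than an immediate refusal, so the check can take up to `timeout` seconds per port.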

Comment 6 weiwei jiang 2019-09-16 05:54:58 UTC
Verified on 4.2.0-0.nightly-2019-09-15-052022

➜  ~ oc -n openshift-monitoring exec prometheus-k8s-0 -c prometheus -- curl -s http://localhost:9090/api/v1/query\?query\=up%7Bjob%3D%22crio%22%7D%20or%20up%7Bjob%3D%22sdn%22%7D%20or%20up%7Bjob%3D%22router-internal-default%22%7D | json_reformat 
{
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {
                "metric": {
                    "__name__": "up",
                    "endpoint": "crio",
                    "instance": "192.168.0.15:9537",
                    "job": "crio",
                    "namespace": "kube-system",
                    "node": "share-0916c-8vp8z-master-2",
                    "service": "kubelet"
                },
                "value": [
                    1568613183.383,
                    "1"
                ]
            },
            {
                "metric": {
                    "__name__": "up",
                    "endpoint": "crio",
                    "instance": "192.168.0.17:9537",
                    "job": "crio",
                    "namespace": "kube-system",
                    "node": "share-0916c-8vp8z-worker-sc4nc",
                    "service": "kubelet"
                },
                "value": [
                    1568613183.383,
                    "1"
                ]
            },
            {
                "metric": {
                    "__name__": "up",
                    "endpoint": "crio",
                    "instance": "192.168.0.18:9537",
                    "job": "crio",
                    "namespace": "kube-system",
                    "node": "share-0916c-8vp8z-worker-ntvnv",
                    "service": "kubelet"
                },
                "value": [
                    1568613183.383,
                    "1"
                ]
            },
            {
                "metric": {
                    "__name__": "up",
                    "endpoint": "crio",
                    "instance": "192.168.0.25:9537",
                    "job": "crio",
                    "namespace": "kube-system",
                    "node": "share-0916c-8vp8z-master-1",
                    "service": "kubelet"
                },
                "value": [
                    1568613183.383,
                    "1"
                ]
            },
            {
                "metric": {
                    "__name__": "up",
                    "endpoint": "crio",
                    "instance": "192.168.0.33:9537",
                    "job": "crio",
                    "namespace": "kube-system",
                    "node": "share-0916c-8vp8z-worker-sv7x8",
                    "service": "kubelet"
                },
                "value": [
                    1568613183.383,
                    "1"
                ]
            },
            {
                "metric": {
                    "__name__": "up",
                    "endpoint": "crio",
                    "instance": "192.168.0.39:9537",
                    "job": "crio",
                    "namespace": "kube-system",
                    "node": "share-0916c-8vp8z-master-0",
                    "service": "kubelet"
                },
                "value": [
                    1568613183.383,
                    "1"
                ]
            },
            {
                "metric": {
                    "__name__": "up",
                    "endpoint": "metrics",
                    "instance": "192.168.0.15:9101",
                    "job": "sdn",
                    "namespace": "openshift-sdn",
                    "pod": "sdn-pnbxm",
                    "service": "sdn"
                },
                "value": [
                    1568613183.383,
                    "1"
                ]
            },
            {
                "metric": {
                    "__name__": "up",
                    "endpoint": "metrics",
                    "instance": "192.168.0.17:9101",
                    "job": "sdn",
                    "namespace": "openshift-sdn",
                    "pod": "sdn-75l56",
                    "service": "sdn"
                },
                "value": [
                    1568613183.383,
                    "1"
                ]
            },
            {
                "metric": {
                    "__name__": "up",
                    "endpoint": "metrics",
                    "instance": "192.168.0.18:9101",
                    "job": "sdn",
                    "namespace": "openshift-sdn",
                    "pod": "sdn-rkm9w",
                    "service": "sdn"
                },
                "value": [
                    1568613183.383,
                    "1"
                ]
            },
            {
                "metric": {
                    "__name__": "up",
                    "endpoint": "metrics",
                    "instance": "192.168.0.25:9101",
                    "job": "sdn",
                    "namespace": "openshift-sdn",
                    "pod": "sdn-c4vbz",
                    "service": "sdn"
                },
                "value": [
                    1568613183.383,
                    "1"
                ]
            },
            {
                "metric": {
                    "__name__": "up",
                    "endpoint": "metrics",
                    "instance": "192.168.0.33:9101",
                    "job": "sdn",
                    "namespace": "openshift-sdn",
                    "pod": "sdn-ngvhb",
                    "service": "sdn"
                },
                "value": [
                    1568613183.383,
                    "1"
                ]
            },
            {
                "metric": {
                    "__name__": "up",
                    "endpoint": "metrics",
                    "instance": "192.168.0.39:9101",
                    "job": "sdn",
                    "namespace": "openshift-sdn",
                    "pod": "sdn-zm8r9",
                    "service": "sdn"
                },
                "value": [
                    1568613183.383,
                    "1"
                ]
            },
            {
                "metric": {
                    "__name__": "up",
                    "endpoint": "metrics",
                    "instance": "192.168.0.17:1936",
                    "job": "router-internal-default",
                    "namespace": "openshift-ingress",
                    "pod": "router-default-594bb9c7cc-lb2v7",
                    "service": "router-internal-default"
                },
                "value": [
                    1568613183.383,
                    "1"
                ]
            },
            {
                "metric": {
                    "__name__": "up",
                    "endpoint": "metrics",
                    "instance": "192.168.0.18:1936",
                    "job": "router-internal-default",
                    "namespace": "openshift-ingress",
                    "pod": "router-default-594bb9c7cc-qkhlv",
                    "service": "router-internal-default"
                },
                "value": [
                    1568613183.383,
                    "1"
                ]
            }
        ]
    }
}
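
The URL-encoded query in the command above decodes to `up{job="crio"} or up{job="sdn"} or up{job="router-internal-default"}`. For a response of this size it can be easier to group the instances by job than to read the raw JSON. The following is a minimal sketch under that assumption; the helper name `summarize` is hypothetical, and the input is expected to be the parsed JSON body of the `/api/v1/query` response shown above.

```python
from collections import defaultdict
from urllib.parse import unquote

# The query string exactly as it appears in the curl command.
ENCODED = (
    "up%7Bjob%3D%22crio%22%7D%20or%20"
    "up%7Bjob%3D%22sdn%22%7D%20or%20"
    "up%7Bjob%3D%22router-internal-default%22%7D"
)


def summarize(response: dict) -> dict:
    """Group scraped instances by job into 'up' and 'down' lists."""
    summary = defaultdict(lambda: {"up": [], "down": []})
    for sample in response["data"]["result"]:
        job = sample["metric"]["job"]
        instance = sample["metric"]["instance"]
        # Prometheus returns the sample value as a string: "1" means up.
        state = "up" if sample["value"][1] == "1" else "down"
        summary[job][state].append(instance)
    return dict(summary)


# Show the PromQL expression behind the encoded query string.
print(unquote(ENCODED))
```

Applied to the output above, this would list six instances under `crio`, six under `sdn` and two under `router-internal-default`, all in the `up` list.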

Comment 7 errata-xmlrpc 2019-10-16 06:40:35 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922