Bug 1830158
Summary: OpenShift 3.11 with Kuryr is missing a few security group rules to allow monitoring to work properly
Product: OpenShift Container Platform
Component: Networking
Networking sub component: kuryr
Status: CLOSED ERRATA
Severity: low
Priority: medium
Version: 3.11.0
Target Milestone: ---
Target Release: 3.11.z
Hardware: x86_64
OS: Linux
Reporter: Mohammad <mahmad>
Assignee: Michał Dulko <mdulko>
QA Contact: Jon Uriarte <juriarte>
CC: alegrand, anpicker, bshirren, erooth, juriarte, kakkoyun, lcosic, ltomasbo, mdulko, mloibl, pkrupa, surbania
Doc Type: Enhancement
Doc Text:
Feature: "Global" namespaces are those that, in the multitenant mode of openshift-sdn, are accessible from all other namespaces and can access all other namespaces. To emulate that behavior with Kuryr's namespace isolation mode, a [namespace_sg]global_namespaces option was added to kuryr.conf. All namespace names listed there will be configured as global. This can be configured at the openshift-ansible level using the kuryr_openstack_global_namespaces inventory setting.
Reason: This was done to allow the openshift-monitoring namespace to act as a global namespace on deployments with kuryr-kubernetes.
Result: By default, kuryr_openstack_global_namespaces is set to "default,openshift-monitoring". If you have logging enabled, you should add openshift-logging there too.
Story Points: ---
Type: Bug
Last Closed: 2020-05-28 05:44:13 UTC
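The Doc Text above can be sketched as two configuration fragments: the openshift-ansible inventory knob and the section it renders into kuryr.conf. This is illustrative only; the values mirror the defaults quoted above, with openshift-logging shown as the optional extra entry for logging-enabled clusters.

```ini
# openshift-ansible inventory (the [OSEv3:vars] group used by the 3.11
# playbooks); openshift-logging is the optional addition for deployments
# with logging enabled.
[OSEv3:vars]
kuryr_openstack_global_namespaces=default,openshift-monitoring,openshift-logging

# Resulting section rendered into kuryr.conf by the playbooks:
# [namespace_sg]
# global_namespaces = default,openshift-monitoring,openshift-logging
```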
Description
Mohammad
2020-04-30 23:41:26 UTC
Some further output, regarding point 1:

```
[root@master-2 ~]# oc logs -f console-6d9485c899-8qxlv
2020/04/30 10:57:25 http: proxy error: dial tcp XX.XX.176.88:9091: i/o timeout
2020/04/30 10:57:26 http: proxy error: dial tcp XX.XX.176.88:9091: i/o timeout
2020/04/30 10:57:26 http: proxy error: dial tcp XX.XX.176.88:9091: i/o timeout
2020/04/30 10:57:26 http: proxy error: dial tcp XX.XX.176.88:9091: i/o timeout
2020/04/30 10:57:26 http: proxy error: dial tcp XX.XX.176.88:9091: i/o timeout

[root@master-2 ~]# oc get svc --all-namespaces | grep XX.XX.176.88
openshift-monitoring   prometheus-k8s   ClusterIP   XX.XX.176.88   <none>   9091/TCP   1h
```

Can you please help me understand what exactly the blocked communication is? In particular, is this inter-namespace connectivity? I don't have a 3.11 setup handy, and looking at the outputs I'm a bit confused. IIUC the blocked traffic in question is:

1. From console pods in the openshift-console namespace to the prometheus-k8s service in the openshift-monitoring namespace on port 9091? Or are the console pods in the default namespace?
2. What is trying to reach logging-es-data-master in the openshift-logging namespace? Console pods?
3. Is opening 443 egress on the console LB meant to make the console accessible to the user?

Sorry for the questions about topology, but recent changes broke our automation that deploys 3.11 and we're in the process of fixing it, so I don't have access to a 3.11 environment right now. It is possible that this is another instance of the Neutron issues with remote_group_id.

Regarding your questions:

1. It is the console pods in the openshift-console namespace reaching the prometheus-k8s service in the openshift-monitoring namespace on port 9091.
2. No, it is the kube-controllers Prometheus targets. This can be observed from https://prometheus-k8s-openshift-monitoring.apps.openshift.mydomain/targets#job-kube-controllers
3. No, it seems the console needs to access the API IP/port, and this is why it was opened.
To elaborate further, after a clean install:

The first rule is needed when the kube-controllers endpoints cannot be accessed (https://prometheus-k8s-openshift-monitoring.apps.openshift.mydomain/targets#job-kube-controllers). This fixes it:

```
openstack security group rule create ns/openshift-logging-sg --ingress --protocol tcp --dst-port 4443 --ethertype IPv4
```

The second rule addresses errors when attempting to get "alerts firing" and "Crashlooping pods" (as per the attached image), which are shown from https://console.apps.openshift-mydomain/status/all-namespaces. After checking the logs in openshift-console, the issue is fixed by adding the rule:

```
# this worked
openstack security group rule create $(openstack security group show lb-$(openstack loadbalancer show openshift-monitoring/prometheus-k8s -f value -c id) -f value -c id) --ingress --protocol tcp --dst-port 9091 --ethertype IPv4
```

Created attachment 1689510 [details] BZ1830158_Cluster_Console_Error
Created attachment 1691135 [details] Cluster console
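The nested command substitution in the second rule is hard to audit at a glance. The dry-run sketch below decomposes it step by step; the placeholder ID is illustrative, and on a live cloud the two commented lookups would supply the real values.

```shell
#!/bin/sh
# Dry-run decomposition of the one-liner above. On a live cloud the real
# IDs would come from:
#   lb_id=$(openstack loadbalancer show openshift-monitoring/prometheus-k8s -f value -c id)
#   sg_id=$(openstack security group show "lb-${lb_id}" -f value -c id)
# A placeholder ID stands in here so the command assembly can be inspected
# without contacting OpenStack.
lb_id="11111111-2222-3333-4444-555555555555"  # placeholder, not a real LB ID

# The load balancer's security group is named "lb-<loadbalancer-id>".
sg_name="lb-${lb_id}"

# Assemble the rule-creation command and print it for review.
rule_cmd="openstack security group rule create ${sg_name} --ingress --protocol tcp --dst-port 9091 --ethertype IPv4"
echo "${rule_cmd}"
```

Printing the command before running it makes it easy to confirm the resolved security group name and port are the intended ones.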
Verified in openshift-ansible-3.11.219 (2020-05-20.1) on top of OSP 13 2020-05-19.2.

The openshift-monitoring namespace is now set as a global namespace:

```
[openshift@master-0 ~]$ oc -n kuryr get cm kuryr-config -o yaml
...
[namespace_sg]
sg_allow_from_namespaces = 7f8a582d-4ad5-455a-ae67-777e4ea447dd
sg_allow_from_default = 7bb544a0-b7e7-432b-a030-d6ae668ba2a0
global_namespaces = default,openshift-monitoring
lbaas_activation_timeout = 1200
...
```

The allow-from-default security group now allows traffic from the default namespace/subnet and from the monitoring namespace/subnet:

```
$ openstack security group rule list 7bb544a0-b7e7-432b-a030-d6ae668ba2a0
+--------------------------------------+-------------+---------------+------------+-----------------------+
| ID                                   | IP Protocol | IP Range      | Port Range | Remote Security Group |
+--------------------------------------+-------------+---------------+------------+-----------------------+
| 0d1db300-82a2-4e55-b7ea-e2ba3875a410 | None        | 10.11.10.0/24 |            | None                  |
| 4cee77bf-001f-4a30-917d-f692d54de3b2 | None        | None          |            | None                  |
| 60ecd42c-4a4e-4ef5-8e07-f7392fb89af9 | None        | None          |            | None                  |
| 62849e68-dc84-42ea-a97e-dc152b663a67 | None        | 10.11.7.0/24  |            | None                  |
+--------------------------------------+-------------+---------------+------------+-----------------------+

$ openstack subnet list
+--------------------------------------+--------------------------------+--------------------------------------+---------------+
| ID                                   | Name                           | Network                              | Subnet        |
+--------------------------------------+--------------------------------+--------------------------------------+---------------+
...
| 10f253e3-25ab-4da8-a0cf-a16d829971ca | ns/openshift-monitoring-subnet | 21d5b0e0-f8e3-4509-9a06-d847208ad626 | 10.11.10.0/24 |
| fcab5cd1-2a47-4d34-959f-3a94e511f2ba | ns/default-subnet              | 14670064-c0ce-4689-8450-7f98a0ebdefb | 10.11.7.0/24  |
+--------------------------------------+--------------------------------+--------------------------------------+---------------+
```

The prometheus-k8s LB is now reachable from the namespaces/subnets in the subnet pool 10.11.0.0/16:

```
$ openstack security group rule list f3571022-f1a4-43d2-acb2-a3a18673977b
+--------------------------------------+-------------+-----------------+------------+-----------------------+
| ID                                   | IP Protocol | IP Range        | Port Range | Remote Security Group |
+--------------------------------------+-------------+-----------------+------------+-----------------------+
| 0c594df8-2a38-4412-adb6-06045a92a758 | None        | None            |            | None                  |
| 3bee49d1-1654-4148-a649-6c749d11d3e3 | tcp         | 192.168.99.0/24 | 9091:9091  | None                  |
| 401c4750-3715-4476-ba18-e210f0b88035 | tcp         | 10.11.0.0/16    | 9091:9091  | None                  | <------
| 66aec901-11c0-4dec-92d5-c30acfb735e0 | tcp         | 10.11.10.0/24   | 9091:9091  | None                  |
| 7f5f68d4-1514-4289-839e-3438c66355dc | tcp         | None            | 1025:1025  | None                  |
| 9044c2d0-6fe9-4565-a2e7-8e397d9d6823 | None        | None            |            | None                  |
| f28f62dc-eb09-4d3b-9f01-6e836172295c | tcp         | 172.30.0.0/16   | 9091:9091  | None                  |
+--------------------------------------+-------------+-----------------+------------+-----------------------+
```

Console pod logs:

```
[openshift@master-0 ~]$ oc -n openshift-console logs -f console-5984d988c4-c7lwp
2020/05/22 08:51:16 cmd/main: cookies are secure!
2020/05/22 08:51:16 cmd/main: Binding to 0.0.0.0:8443...
2020/05/22 08:51:16 cmd/main: using TLS
2020/05/22 10:44:01 server: authentication failed: http: named cookie not present
2020/05/22 10:44:02 server: authentication failed: http: named cookie not present
(previous message repeated several times)
2020/05/22 10:44:07 auth: oauth success, redirecting to: "https://console.apps.openshift.example.com/"
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2215
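As a footnote to the verification above: the single 10.11.0.0/16 rule covers both namespace subnets because each /24 sits inside that /16. A minimal pure-shell check of this (the helper name is illustrative; since the pool is a /16 aligned on an octet boundary, containment reduces to matching the first two octets):

```shell
#!/bin/sh
# Containment check for the 10.11.0.0/16 subnet pool. Because the pool is
# a /16 aligned on an octet boundary, a CIDR falls inside it exactly when
# its address starts with "10.11.".
in_pool() {
    case "$1" in
        10.11.*) echo yes ;;
        *)       echo no ;;
    esac
}

in_pool 10.11.10.0/24    # ns/openshift-monitoring-subnet
in_pool 10.11.7.0/24     # ns/default-subnet
in_pool 192.168.99.0/24  # outside the pool
```

This mirrors why a single pool-wide 9091 rule on the prometheus-k8s LB security group suffices for all namespace subnets carved from 10.11.0.0/16.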