Bug 1830158 - OpenShift 3.11 with Kuryr is missing a few security group rules to allow monitoring to work properly
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 3.11.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: low
Target Milestone: ---
Target Release: 3.11.z
Assignee: Michał Dulko
QA Contact: Jon Uriarte
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-04-30 23:41 UTC by Mohammad
Modified: 2023-10-06 19:51 UTC (History)
CC List: 12 users

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Feature: "Global" namespaces are those that, in the multitenant mode of openshift-sdn, are accessible from all other namespaces and can access all other namespaces. To emulate that behavior with Kuryr's namespace isolation mode, a [namespace_sg]global_namespaces option was added to kuryr.conf. All namespaces listed there are configured as global. At the openshift-ansible level, this is set with the kuryr_openstack_global_namespaces inventory setting. Reason: This allows the openshift-monitoring namespace to act as a global one on deployments with kuryr-kubernetes. Result: By default, kuryr_openstack_global_namespaces is set to "default,openshift-monitoring". If you have logging enabled, you should add openshift-logging there too.
Clone Of:
Environment:
Last Closed: 2020-05-28 05:44:13 UTC
Target Upstream Version:
Embargoed:


Attachments
BZ1830158_Cluster_Console_Error (37.58 KB, image/png)
2020-05-18 04:56 UTC, Mohammad
Cluster console (28.40 KB, image/png)
2020-05-22 16:04 UTC, Jon Uriarte


Links
System ID Private Priority Status Summary Last Updated
Github openshift kuryr-kubernetes pull 232 0 None closed Bug 1838001: Remove remote_group_id usage at loadbalancer SGs 2020-09-16 11:51:56 UTC
Github openshift openshift-ansible pull 12163 0 None closed Bug 1830158: Add support for global namespaces and removal of remote_group_id 2020-07-02 01:08:39 UTC
Red Hat Product Errata RHBA-2020:2215 0 None None None 2020-05-28 05:44:20 UTC

Description Mohammad 2020-04-30 23:41:26 UTC
Description of problem: After installing OpenShift 3.11 with Kuryr on OpenStack 13, a few ports are blocked, preventing openshift-monitoring from functioning fully.

So far, we have identified the following:

1. Access to port 9091 on the prometheus-k8s service IP (resolved when we added port 9091 to the service IP load balancer security group).

2. Access to port 4443 on logging-es-data-master pods running in openshift-logging (resolved when a security group rule for port 4443 was added to ns/openshift-logging-sg).

We are still testing, but these are the two issues that stood out.

Version-Release number of selected component (if applicable): The installation was done with 3.11.200


How reproducible: Installed OpenShift 3.11.200 with Kuryr and the config in [1].


Steps to Reproduce:
1. Follow the instructions at https://docs.openshift.com/container-platform/4.2/installing/installing_openstack/installing-openstack-installer-kuryr.html with the config in [1] to install OpenShift.


Actual results:

OpenShift installs successfully, but a few ports remain closed, preventing Prometheus monitoring from working fully.

Expected results:

The OpenShift monitoring suite should be fully functional.


Additional info:


[1] Kuryr config used:

```

## Kuryr configuration START
openshift_use_kuryr: True
openshift_use_openshift_sdn: False
use_trunk_ports: True
os_sdn_network_plugin_name: cni
openshift_node_proxy_mode: userspace
kuryr_openstack_pool_driver: nested
openshift_kuryr_precreate_subports: 5

kuryr_openstack_ca: "{{custom_certs_dir}}/CA_Bundle.txt"

openshift_openstack_kuryr_controller_image: "satellite.mydomain:5000/myorg-ocp311-openshift3_kuryr-controller:{{ openshift_image_tag }}"
openshift_openstack_kuryr_cni_image: "satellite.mydomain:5000/myorg-ocp311-openshift3_kuryr-cni:{{ openshift_image_tag }}"

kuryr_openstack_public_net_id: 123456789

# To disable namespace isolation, comment out the next 2 lines
openshift_kuryr_subnet_driver: namespace
openshift_kuryr_sg_driver: namespace

openshift_master_open_ports:
- service: dns tcp
  port: 53/tcp
- service: dns udp
  port: 53/udp
openshift_node_open_ports:
- service: dns tcp
  port: 53/tcp
- service: dns udp
  port: 53/udp
# End of Kuryr configuration

#SERVICE
openshift_openstack_kuryr_service_subnet_cidr: XX.XX.128.0/18
openshift_openstack_kuryr_service_pool_start: XX.XX.128.100
openshift_openstack_kuryr_service_pool_end: XX.XX.191.200
#POD
openshift_openstack_kuryr_pod_subnet_cidr: XX.XX.0.0/17

## Kuryr configuration END


```

Comment 1 Mohammad 2020-05-01 00:37:06 UTC
Some further output:

Regarding point 1:

```

[root@master-2 ~]# oc logs -f console-6d9485c899-8qxlv
2020/04/30 10:57:25 http: proxy error: dial tcp XX.XX.176.88:9091: i/o timeout
2020/04/30 10:57:26 http: proxy error: dial tcp XX.XX.176.88:9091: i/o timeout
2020/04/30 10:57:26 http: proxy error: dial tcp XX.XX.176.88:9091: i/o timeout
2020/04/30 10:57:26 http: proxy error: dial tcp XX.XX.176.88:9091: i/o timeout
2020/04/30 10:57:26 http: proxy error: dial tcp XX.XX.176.88:9091: i/o timeout

[root@master-2 ~]# oc get svc --all-namespaces |grep XX.XX.176.88
openshift-monitoring       prometheus-k8s                ClusterIP   XX.XX.176.88    <none>        9091/TCP                  1h
```

Comment 3 Michał Dulko 2020-05-13 15:55:16 UTC
Can you please help me understand what exactly the blocked communication is? In particular, is this inter-namespace connectivity? I don't have a 3.11 setup handy, and looking at the outputs I'm a bit confused.

IIUC the blocked traffic in question is:

1. From console pods in the openshift-console namespace to the prometheus-k8s service in the openshift-monitoring namespace on port 9091? Or are the console pods in the default namespace?

2. What is trying to reach logging-es-data-master in the openshift-logging namespace? Console pods?

3. Is 443 egress opened on the console LB to make the console accessible to the user?

Sorry for the questions about topology, but recent changes broke our automation that deploys 3.11 and we're in the process of fixing it, so I don't have access to a 3.11 env right now.

It is possible that this is another instance of the Neutron issues with remote_group_id.

Comment 4 Mohammad 2020-05-18 04:55:14 UTC
Regarding your questions:

1. It is the console pods in the openshift-console namespace accessing the prometheus-k8s service in the openshift-monitoring namespace on port 9091.
2. No, it is the kube-controllers Prometheus targets. This can be observed at https://prometheus-k8s-openshift-monitoring.apps.openshift.mydomain/targets#job-kube-controllers
3. No, it seems the console needs to access the API IP/port, which is why it was opened.

To elaborate, after a clean install:

The need for the first rule is observed when the kube-controllers endpoints cannot be accessed:
https://prometheus-k8s-openshift-monitoring.apps.openshift.mydomain/targets#job-kube-controllers

This fixes it:
openstack security group rule create ns/openshift-logging-sg --ingress --protocol tcp --dst-port 4443 --ethertype IPv4

The second rule addresses the errors shown when attempting to get "alerts firing" and "Crashlooping pods", as per the attached image, at the URL:

https://console.apps.openshift-mydomain/status/all-namespaces

After checking the logs in openshift-console, the issue is fixed by adding the rule:
openstack security group rule create $(openstack security group show lb-$(openstack loadbalancer show openshift-monitoring/prometheus-k8s -f value -c id) -f value -c id)  --ingress --protocol tcp --dst-port 9091 --ethertype IPv4 # this worked
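
The two workaround rules in this comment could be wrapped in small shell helpers so they are easier to reapply after a reinstall. This is only a sketch: the function names `open_tcp_port` and `lb_security_group` are hypothetical, the `openstack` invocations are the ones quoted above, and an authenticated `openstack` CLI is assumed.

```shell
# Sketch only: helper names are hypothetical; the openstack commands are the
# ones quoted in this comment. Assumes an authenticated openstack CLI.

# Open one ingress TCP port (IPv4) on a security group.
# $1 = security group name or ID, $2 = TCP port
open_tcp_port() {
    openstack security group rule create "$1" \
        --ingress --protocol tcp --dst-port "$2" --ethertype IPv4
}

# Print the ID of the "lb-<load balancer ID>" security group.
# $1 = load balancer name, e.g. openshift-monitoring/prometheus-k8s
lb_security_group() {
    lb_id=$(openstack loadbalancer show "$1" -f value -c id) || return 1
    openstack security group show "lb-${lb_id}" -f value -c id
}
```

With these, the two rules would be `open_tcp_port ns/openshift-logging-sg 4443` and `open_tcp_port "$(lb_security_group openshift-monitoring/prometheus-k8s)" 9091`. Manually added rules like these are a workaround only.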

Comment 5 Mohammad 2020-05-18 04:56:04 UTC
Created attachment 1689510
BZ1830158_Cluster_Console_Error

Comment 8 Jon Uriarte 2020-05-22 16:04:46 UTC
Created attachment 1691135
Cluster console

Comment 9 Jon Uriarte 2020-05-22 16:05:46 UTC
Verified in openshift-ansible-3.11.219 (2020-05-20.1) on top of OSP 13 2020-05-19.2.

The openshift-monitoring namespace is now set as a global namespace:

[openshift@master-0 ~]$ oc -n kuryr get cm kuryr-config -o yaml
    ...
    [namespace_sg]
    sg_allow_from_namespaces = 7f8a582d-4ad5-455a-ae67-777e4ea447dd
    sg_allow_from_default = 7bb544a0-b7e7-432b-a030-d6ae668ba2a0
    global_namespaces = default,openshift-monitoring
    lbaas_activation_timeout = 1200
    ...
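
The ConfigMap check above can be scripted, for example to confirm that a namespace such as openshift-logging was picked up after changing kuryr_openstack_global_namespaces. A minimal sketch, assuming `oc` is logged in and the option appears as `global_namespaces = a,b` with no spaces between entries, as in the output above; the helper names are hypothetical.

```shell
# Sketch only: helper names are hypothetical. Assumes "oc" is logged in and the
# option is rendered as "global_namespaces = a,b" without spaces after commas.

# Print the comma-separated global_namespaces value from kuryr-config.
global_namespaces() {
    oc -n kuryr get cm kuryr-config -o yaml |
        sed -n 's/^[[:space:]]*global_namespaces[[:space:]]*=[[:space:]]*//p'
}

# Succeed if namespace $1 is listed in global_namespaces.
is_global_namespace() {
    case ",$(global_namespaces)," in
        *",$1,"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For the configuration shown here, `is_global_namespace openshift-monitoring` would succeed and `is_global_namespace openshift-logging` would fail until that namespace is added to the inventory setting.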


Allow from default security group:
$ openstack security group rule list 7bb544a0-b7e7-432b-a030-d6ae668ba2a0
+--------------------------------------+-------------+---------------+------------+-----------------------+
| ID                                   | IP Protocol | IP Range      | Port Range | Remote Security Group |
+--------------------------------------+-------------+---------------+------------+-----------------------+
| 0d1db300-82a2-4e55-b7ea-e2ba3875a410 | None        | 10.11.10.0/24 |            | None                  |
| 4cee77bf-001f-4a30-917d-f692d54de3b2 | None        | None          |            | None                  |
| 60ecd42c-4a4e-4ef5-8e07-f7392fb89af9 | None        | None          |            | None                  |
| 62849e68-dc84-42ea-a97e-dc152b663a67 | None        | 10.11.7.0/24  |            | None                  |
+--------------------------------------+-------------+---------------+------------+-----------------------+

It now allows the traffic from the default namespace/subnet and from the monitoring namespace/subnet.

$ openstack subnet list
+--------------------------------------+--------------------------------+--------------------------------------+-----------------+
| ID                                   | Name                           | Network                              | Subnet          |
+--------------------------------------+--------------------------------+--------------------------------------+-----------------+
...
| 10f253e3-25ab-4da8-a0cf-a16d829971ca | ns/openshift-monitoring-subnet | 21d5b0e0-f8e3-4509-9a06-d847208ad626 | 10.11.10.0/24   |
| fcab5cd1-2a47-4d34-959f-3a94e511f2ba | ns/default-subnet              | 14670064-c0ce-4689-8450-7f98a0ebdefb | 10.11.7.0/24    |
+--------------------------------------+--------------------------------+--------------------------------------+-----------------+


The prometheus-k8s LB is now reachable from the namespaces/subnets in the subnet pool 10.11.0.0/16:

$ openstack security group rule list f3571022-f1a4-43d2-acb2-a3a18673977b
+--------------------------------------+-------------+-----------------+------------+-----------------------+
| ID                                   | IP Protocol | IP Range        | Port Range | Remote Security Group |
+--------------------------------------+-------------+-----------------+------------+-----------------------+
| 0c594df8-2a38-4412-adb6-06045a92a758 | None        | None            |            | None                  |
| 3bee49d1-1654-4148-a649-6c749d11d3e3 | tcp         | 192.168.99.0/24 | 9091:9091  | None                  |
| 401c4750-3715-4476-ba18-e210f0b88035 | tcp         | 10.11.0.0/16    | 9091:9091  | None                  | <------
| 66aec901-11c0-4dec-92d5-c30acfb735e0 | tcp         | 10.11.10.0/24   | 9091:9091  | None                  |
| 7f5f68d4-1514-4289-839e-3438c66355dc | tcp         | None            | 1025:1025  | None                  |
| 9044c2d0-6fe9-4565-a2e7-8e397d9d6823 | None        | None            |            | None                  |
| f28f62dc-eb09-4d3b-9f01-6e836172295c | tcp         | 172.30.0.0/16   | 9091:9091  | None                  |
+--------------------------------------+-------------+-----------------+------------+-----------------------+
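
The presence of the marked rule can also be checked non-interactively. The sketch below is an assumption-laden helper, not part of the fix: `sg_allows` is a hypothetical name, and it assumes that `-f value` prints the selected "IP Range" and "Port Range" columns of each rule separated by a single space. It matches on CIDR and port only and does not check direction or protocol.

```shell
# Sketch only: sg_allows is a hypothetical helper. Assumes an authenticated
# openstack CLI and that "-f value" prints the selected columns of each rule
# separated by a single space, e.g. "10.11.0.0/16 9091:9091".

# Succeed if the security group has a rule for the given CIDR and single port.
# Direction and protocol are not checked.
# $1 = security group ID, $2 = source CIDR, $3 = port
sg_allows() {
    openstack security group rule list "$1" \
        -f value -c "IP Range" -c "Port Range" |
        grep -qxF "$2 $3:$3"
}
```

For the group listed above, `sg_allows f3571022-f1a4-43d2-acb2-a3a18673977b 10.11.0.0/16 9091` would be expected to succeed.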


[openshift@master-0 ~]$ oc -n openshift-console logs -f console-5984d988c4-c7lwp
2020/05/22 08:51:16 cmd/main: cookies are secure!
2020/05/22 08:51:16 cmd/main: Binding to 0.0.0.0:8443...
2020/05/22 08:51:16 cmd/main: using TLS
2020/05/22 10:44:01 server: authentication failed: http: named cookie not present
2020/05/22 10:44:01 server: authentication failed: http: named cookie not present
2020/05/22 10:44:01 server: authentication failed: http: named cookie not present
2020/05/22 10:44:01 server: authentication failed: http: named cookie not present
2020/05/22 10:44:01 server: authentication failed: http: named cookie not present
2020/05/22 10:44:01 server: authentication failed: http: named cookie not present
2020/05/22 10:44:02 server: authentication failed: http: named cookie not present
2020/05/22 10:44:02 server: authentication failed: http: named cookie not present
2020/05/22 10:44:02 server: authentication failed: http: named cookie not present
2020/05/22 10:44:02 server: authentication failed: http: named cookie not present
2020/05/22 10:44:02 server: authentication failed: http: named cookie not present
2020/05/22 10:44:07 auth: oauth success, redirecting to: "https://console.apps.openshift.example.com/"

Comment 11 errata-xmlrpc 2020-05-28 05:44:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2215

