Bug 1830158
Summary: OpenShift 3.11 with Kuryr is missing a few security group rules to allow monitoring to work properly
Product: OpenShift Container Platform
Component: Networking
Networking sub component: kuryr
Status: CLOSED ERRATA
Severity: low
Priority: medium
Version: 3.11.0
Target Milestone: ---
Target Release: 3.11.z
Hardware: x86_64
OS: Linux
Reporter: Mohammad <mahmad>
Assignee: Michał Dulko <mdulko>
QA Contact: Jon Uriarte <juriarte>
CC: alegrand, anpicker, bshirren, erooth, juriarte, kakkoyun, lcosic, ltomasbo, mdulko, mloibl, pkrupa, surbania
Doc Type: Enhancement
Doc Text:
Feature: "Global" namespaces are those that, in the multitenant mode of openshift-sdn, are accessible from all other namespaces and can access all other namespaces. To emulate that behavior with Kuryr's namespace isolation mode, a [namespace_sg]global_namespaces option was added to kuryr.conf. All namespace names listed there will be configured as global. This can be configured at the openshift-ansible level using the kuryr_openstack_global_namespaces inventory setting.
Reason: This was done to allow the openshift-monitoring namespace to act as a global namespace on deployments with kuryr-kubernetes.
Result: By default, kuryr_openstack_global_namespaces is set to "default,openshift-monitoring". If you have logging enabled, you should add openshift-logging there too.
Story Points: ---
Type: Bug
Last Closed: 2020-05-28 05:44:13 UTC
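The Doc Text above can be sketched as two configuration fragments: the openshift-ansible inventory knob and the section it renders into kuryr.conf. This is illustrative only; the values mirror the defaults quoted above, with openshift-logging shown as the optional extra entry for logging-enabled clusters.

```ini
# openshift-ansible inventory (the [OSEv3:vars] group used by the 3.11
# playbooks); openshift-logging is the optional addition for deployments
# with logging enabled.
[OSEv3:vars]
kuryr_openstack_global_namespaces=default,openshift-monitoring,openshift-logging

# Resulting section rendered into kuryr.conf by the playbooks:
# [namespace_sg]
# global_namespaces = default,openshift-monitoring,openshift-logging
```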
Description
Mohammad
2020-04-30 23:41:26 UTC
Some further output, regarding point 1:

```
[root@master-2 ~]# oc logs -f console-6d9485c899-8qxlv
2020/04/30 10:57:25 http: proxy error: dial tcp XX.XX.176.88:9091: i/o timeout
2020/04/30 10:57:26 http: proxy error: dial tcp XX.XX.176.88:9091: i/o timeout
2020/04/30 10:57:26 http: proxy error: dial tcp XX.XX.176.88:9091: i/o timeout
2020/04/30 10:57:26 http: proxy error: dial tcp XX.XX.176.88:9091: i/o timeout
2020/04/30 10:57:26 http: proxy error: dial tcp XX.XX.176.88:9091: i/o timeout

[root@master-2 ~]# oc get svc --all-namespaces | grep XX.XX.176.88
openshift-monitoring   prometheus-k8s   ClusterIP   XX.XX.176.88   <none>   9091/TCP   1h
```

Can you please help me understand what exactly the blocked communication is? In particular, is this inter-namespace connectivity? I don't have a 3.11 setup handy, and looking at the outputs I'm a bit confused. IIUC the blocked traffic in question is:

1. From console pods in the openshift-console namespace to the prometheus-k8s service in the openshift-monitoring namespace on port 9091? Or are the console pods in the default namespace?
2. What is trying to reach logging-es-data-master in the openshift-logging namespace? Console pods?
3. Is opening 443 egress on the console LB meant to make the console accessible to the user?

Sorry for the questions about topology, but recent changes broke our automation that deploys 3.11 and we're in the process of fixing it, so I don't have access to a 3.11 environment right now. It is possible that this is another instance of the Neutron issues with remote_group_id.

Regarding your questions:

1. It is the console pods in the openshift-console namespace reaching the prometheus-k8s service in the openshift-monitoring namespace on port 9091.
2. No, it is the kube-controllers Prometheus targets. This can be observed from https://prometheus-k8s-openshift-monitoring.apps.openshift.mydomain/targets#job-kube-controllers
3. No, it seems the console needs to access the API IP/port, and this is why it was opened.
To elaborate further, after a clean install:

The first rule is needed when the kube-controllers endpoints cannot be accessed (https://prometheus-k8s-openshift-monitoring.apps.openshift.mydomain/targets#job-kube-controllers). This fixes it:

```
openstack security group rule create ns/openshift-logging-sg --ingress --protocol tcp --dst-port 4443 --ethertype IPv4
```

The second rule addresses errors when attempting to get "alerts firing" and "Crashlooping pods" (as per the attached image), which are shown from https://console.apps.openshift-mydomain/status/all-namespaces. After checking the logs in openshift-console, the issue is fixed by adding the rule:

```
# this worked
openstack security group rule create $(openstack security group show lb-$(openstack loadbalancer show openshift-monitoring/prometheus-k8s -f value -c id) -f value -c id) --ingress --protocol tcp --dst-port 9091 --ethertype IPv4
```

Created attachment 1689510 [details] BZ1830158_Cluster_Console_Error
Created attachment 1691135 [details] Cluster console
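The nested command substitution in the second rule is hard to audit at a glance. The dry-run sketch below decomposes it step by step; the placeholder ID is illustrative, and on a live cloud the two commented lookups would supply the real values.

```shell
#!/bin/sh
# Dry-run decomposition of the one-liner above. On a live cloud the real
# IDs would come from:
#   lb_id=$(openstack loadbalancer show openshift-monitoring/prometheus-k8s -f value -c id)
#   sg_id=$(openstack security group show "lb-${lb_id}" -f value -c id)
# A placeholder ID stands in here so the command assembly can be inspected
# without contacting OpenStack.
lb_id="11111111-2222-3333-4444-555555555555"  # placeholder, not a real LB ID

# The load balancer's security group is named "lb-<loadbalancer-id>".
sg_name="lb-${lb_id}"

# Assemble the rule-creation command and print it for review.
rule_cmd="openstack security group rule create ${sg_name} --ingress --protocol tcp --dst-port 9091 --ethertype IPv4"
echo "${rule_cmd}"
```

Printing the command before running it makes it easy to confirm the resolved security group name and port are the intended ones.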
Verified in openshift-ansible-3.11.219 (2020-05-20.1) on top of OSP 13 2020-05-19.2.

The openshift-monitoring namespace is now set as a global namespace:

```
[openshift@master-0 ~]$ oc -n kuryr get cm kuryr-config -o yaml
...
[namespace_sg]
sg_allow_from_namespaces = 7f8a582d-4ad5-455a-ae67-777e4ea447dd
sg_allow_from_default = 7bb544a0-b7e7-432b-a030-d6ae668ba2a0
global_namespaces = default,openshift-monitoring
lbaas_activation_timeout = 1200
...
```

The allow-from-default security group now allows traffic from the default namespace/subnet and from the monitoring namespace/subnet:

```
$ openstack security group rule list 7bb544a0-b7e7-432b-a030-d6ae668ba2a0
+--------------------------------------+-------------+---------------+------------+-----------------------+
| ID                                   | IP Protocol | IP Range      | Port Range | Remote Security Group |
+--------------------------------------+-------------+---------------+------------+-----------------------+
| 0d1db300-82a2-4e55-b7ea-e2ba3875a410 | None        | 10.11.10.0/24 |            | None                  |
| 4cee77bf-001f-4a30-917d-f692d54de3b2 | None        | None          |            | None                  |
| 60ecd42c-4a4e-4ef5-8e07-f7392fb89af9 | None        | None          |            | None                  |
| 62849e68-dc84-42ea-a97e-dc152b663a67 | None        | 10.11.7.0/24  |            | None                  |
+--------------------------------------+-------------+---------------+------------+-----------------------+

$ openstack subnet list
+--------------------------------------+--------------------------------+--------------------------------------+---------------+
| ID                                   | Name                           | Network                              | Subnet        |
+--------------------------------------+--------------------------------+--------------------------------------+---------------+
...
| 10f253e3-25ab-4da8-a0cf-a16d829971ca | ns/openshift-monitoring-subnet | 21d5b0e0-f8e3-4509-9a06-d847208ad626 | 10.11.10.0/24 |
| fcab5cd1-2a47-4d34-959f-3a94e511f2ba | ns/default-subnet              | 14670064-c0ce-4689-8450-7f98a0ebdefb | 10.11.7.0/24  |
+--------------------------------------+--------------------------------+--------------------------------------+---------------+
```

The prometheus-k8s LB is now reachable from the namespaces/subnets in the subnet pool 10.11.0.0/16:

```
$ openstack security group rule list f3571022-f1a4-43d2-acb2-a3a18673977b
+--------------------------------------+-------------+-----------------+------------+-----------------------+
| ID                                   | IP Protocol | IP Range        | Port Range | Remote Security Group |
+--------------------------------------+-------------+-----------------+------------+-----------------------+
| 0c594df8-2a38-4412-adb6-06045a92a758 | None        | None            |            | None                  |
| 3bee49d1-1654-4148-a649-6c749d11d3e3 | tcp         | 192.168.99.0/24 | 9091:9091  | None                  |
| 401c4750-3715-4476-ba18-e210f0b88035 | tcp         | 10.11.0.0/16    | 9091:9091  | None                  | <------
| 66aec901-11c0-4dec-92d5-c30acfb735e0 | tcp         | 10.11.10.0/24   | 9091:9091  | None                  |
| 7f5f68d4-1514-4289-839e-3438c66355dc | tcp         | None            | 1025:1025  | None                  |
| 9044c2d0-6fe9-4565-a2e7-8e397d9d6823 | None        | None            |            | None                  |
| f28f62dc-eb09-4d3b-9f01-6e836172295c | tcp         | 172.30.0.0/16   | 9091:9091  | None                  |
+--------------------------------------+-------------+-----------------+------------+-----------------------+
```

Console pod logs:

```
[openshift@master-0 ~]$ oc -n openshift-console logs -f console-5984d988c4-c7lwp
2020/05/22 08:51:16 cmd/main: cookies are secure!
2020/05/22 08:51:16 cmd/main: Binding to 0.0.0.0:8443...
2020/05/22 08:51:16 cmd/main: using TLS
2020/05/22 10:44:01 server: authentication failed: http: named cookie not present
2020/05/22 10:44:02 server: authentication failed: http: named cookie not present
(previous message repeated several times)
2020/05/22 10:44:07 auth: oauth success, redirecting to: "https://console.apps.openshift.example.com/"
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2215
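As a footnote to the verification above: the single 10.11.0.0/16 rule covers both namespace subnets because each /24 sits inside that /16. A minimal pure-shell check of this (the helper name is illustrative; since the pool is a /16 aligned on an octet boundary, containment reduces to matching the first two octets):

```shell
#!/bin/sh
# Containment check for the 10.11.0.0/16 subnet pool. Because the pool is
# a /16 aligned on an octet boundary, a CIDR falls inside it exactly when
# its address starts with "10.11.".
in_pool() {
    case "$1" in
        10.11.*) echo yes ;;
        *)       echo no ;;
    esac
}

in_pool 10.11.10.0/24    # ns/openshift-monitoring-subnet
in_pool 10.11.7.0/24     # ns/default-subnet
in_pool 192.168.99.0/24  # outside the pool
```

This mirrors why a single pool-wide 9091 rule on the prometheus-k8s LB security group suffices for all namespace subnets carved from 10.11.0.0/16.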