Description of problem:
When installing OpenShift 3.11 with Kuryr on OpenStack 13, a few ports are left blocked after installation, preventing openshift-monitoring from fully functioning. So far these have been identified:
1. Access to port 9091 on the prometheus-k8s service IP (resolved by adding port 9091 to the service IP load balancer security group).
2. Access to port 4443 on the logging-es-data-master pods running in openshift-logging (resolved by adding a port 4443 security rule to ns/openshift-logging-sg).
We are still testing, but these are the two issues that stood out.

Version-Release number of selected component (if applicable):
3.11.200

How reproducible:
Always. Installed OpenShift 3.11.200 with Kuryr and the config in [1].

Steps to Reproduce:
1. Follow https://docs.openshift.com/container-platform/4.2/installing/installing_openstack/installing-openstack-installer-kuryr.html with the config above to install OpenShift.

Actual results:
Even though OpenShift installs, a few ports are closed, preventing Prometheus monitoring from working 100%.

Expected results:
The OpenShift monitoring suite should be fully functional.
Additional info:
1- Kuryr config used:
```
## Kuryr configuration START
openshift_use_kuryr: True
openshift_use_openshift_sdn: False
use_trunk_ports: True
os_sdn_network_plugin_name: cni
openshift_node_proxy_mode: userspace
kuryr_openstack_pool_driver: nested
openshift_kuryr_precreate_subports: 5
kuryr_openstack_ca: "{{custom_certs_dir}}/CA_Bundle.txt"
openshift_openstack_kuryr_controller_image: "satellite.mydomain:5000/myorg-ocp311-openshift3_kuryr-controller:{{ openshift_image_tag }}"
openshift_openstack_kuryr_cni_image: "satellite.mydomain:5000/myorg-ocp311-openshift3_kuryr-cni:{{ openshift_image_tag }}"
kuryr_openstack_public_net_id: 123456789

# To disable namespace isolation, comment out the next 2 lines
openshift_kuryr_subnet_driver: namespace
openshift_kuryr_sg_driver: namespace

openshift_master_open_ports:
- service: dns tcp
  port: 53/tcp
- service: dns udp
  port: 53/udp
openshift_node_open_ports:
- service: dns tcp
  port: 53/tcp
- service: dns udp
  port: 53/udp
# End of Kuryr configuration

#SERVICE
openshift_openstack_kuryr_service_subnet_cidr: XX.XX.128.0/18
openshift_openstack_kuryr_service_pool_start: XX.XX.128.100
openshift_openstack_kuryr_service_pool_end: XX.XX.191.200
#POD
openshift_openstack_kuryr_pod_subnet_cidr: XX.XX.0.0/17
## Kuryr configuration END
```
Some further output, regarding point 1:
```
[root@master-2 ~]# oc logs -f console-6d9485c899-8qxlv
2020/04/30 10:57:25 http: proxy error: dial tcp XX.XX.176.88:9091: i/o timeout
2020/04/30 10:57:26 http: proxy error: dial tcp XX.XX.176.88:9091: i/o timeout
2020/04/30 10:57:26 http: proxy error: dial tcp XX.XX.176.88:9091: i/o timeout
2020/04/30 10:57:26 http: proxy error: dial tcp XX.XX.176.88:9091: i/o timeout
2020/04/30 10:57:26 http: proxy error: dial tcp XX.XX.176.88:9091: i/o timeout

[root@master-2 ~]# oc get svc --all-namespaces | grep XX.XX.176.88
openshift-monitoring   prometheus-k8s   ClusterIP   XX.XX.176.88   <none>   9091/TCP   1h
```
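The `i/o timeout` above is the symptom of a plain TCP connect to the blocked ClusterIP being dropped by the security group. A minimal Python probe can reproduce that symptom class from inside a pod (the IP is masked as XX.XX.176.88 in the logs, so the call below is left as a comment rather than run against a real address):

```python
import socket

def probe(host, port, timeout=3.0):
    """Attempt a TCP connect; True on success, False on timeout or refusal."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Hypothetical: while the SG rule for port 9091 is missing, a probe against
# the prometheus-k8s ClusterIP would hang until the timeout and return False:
# probe("XX.XX.176.88", 9091)
```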
Can you please help me understand what exactly the blocked communication is? In particular, is this inter-namespace connectivity? I don't have a 3.11 setup handy, and looking at the outputs I'm a bit confused. IIUC the blocked traffic in question is:
1. From console pods in the openshift-console namespace to the prometheus-k8s service in the openshift-monitoring namespace on port 9091? Or are the console pods in the default namespace?
2. What is trying to reach logging-es-data-master in the openshift-logging namespace? Console pods?
3. Is opening 443 egress on the console LB meant to make the console accessible to the user?

Sorry for the questions about topology, but recent changes broke our automation that deploys 3.11 and we're in the process of fixing it, so I don't have access to a 3.11 env right now. It is possible that this is another instance of the Neutron issues with remote_group_id.
Regarding your questions:
1. Yes, it is the console pods in the openshift-console namespace reaching the prometheus-k8s service in the openshift-monitoring namespace on port 9091.
2. No, it is the kube-controllers Prometheus targets. This can be observed from https://prometheus-k8s-openshift-monitoring.apps.openshift.mydomain/targets#job-kube-controllers
3. No, it seems the console needs to access the API IP/port, and this is why it was opened.

To elaborate further, after a clean install:

The first issue is observed when the kube-controllers endpoints cannot be accessed: https://prometheus-k8s-openshift-monitoring.apps.openshift.mydomain/targets#job-kube-controllers
This fixes it:
```
openstack security group rule create ns/openshift-logging-sg --ingress --protocol tcp --dst-port 4443 --ethertype IPv4
```
The second issue shows errors when attempting to get "alerts firing" and "crashlooping pods" as per the image attached, which are shown from https://console.apps.openshift-mydomain/status/all-namespaces. After checking the logs in openshift-console, it is fixed by adding the rule:
```
openstack security group rule create $(openstack security group show lb-$(openstack loadbalancer show openshift-monitoring/prometheus-k8s -f value -c id) -f value -c id) --ingress --protocol tcp --dst-port 9091 --ethertype IPv4
# this worked
```
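For reference, the ingress matching that these `openstack security group rule create` commands rely on can be sketched in Python. This is illustrative only: the dict fields below are a simplified stand-in for Neutron's rule schema, not its actual API, and the IP is a made-up pod address.

```python
import ipaddress

def rule_admits(rule, src_ip, dst_port, protocol="tcp"):
    """Return True if a simplified security-group rule admits the traffic.

    `rule` uses 'protocol', 'port_min', 'port_max' and 'remote_cidr' keys --
    a hypothetical simplification of a Neutron ingress rule.
    """
    if rule["protocol"] != protocol:
        return False
    if not (rule["port_min"] <= dst_port <= rule["port_max"]):
        return False
    return ipaddress.ip_address(src_ip) in ipaddress.ip_network(rule["remote_cidr"])

# Shape of the rule added for the prometheus-k8s LB: tcp/9091 from any source.
rule_9091 = {"protocol": "tcp", "port_min": 9091, "port_max": 9091,
             "remote_cidr": "0.0.0.0/0"}

print(rule_admits(rule_9091, "10.11.7.5", 9091))  # True: matching port
print(rule_admits(rule_9091, "10.11.7.5", 9090))  # False: port not in range
```

This also shows why the missing rules manifested as timeouts only on specific ports: everything else on the load balancer was already admitted by other rules.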
Created attachment 1689510 [details] BZ1830158_Cluster_Console_Error
Created attachment 1691135 [details] Cluster console
Verified in openshift-ansible-3.11.219 (2020-05-20.1) on top of OSP 13 2020-05-19.2.

The openshift-monitoring namespace is now set as a global namespace:
```
[openshift@master-0 ~]$ oc -n kuryr get cm kuryr-config -o yaml
...
[namespace_sg]
sg_allow_from_namespaces = 7f8a582d-4ad5-455a-ae67-777e4ea447dd
sg_allow_from_default = 7bb544a0-b7e7-432b-a030-d6ae668ba2a0
global_namespaces = default,openshift-monitoring
lbaas_activation_timeout = 1200
...
```
Allow-from-default security group:
```
$ openstack security group rule list 7bb544a0-b7e7-432b-a030-d6ae668ba2a0
+--------------------------------------+-------------+---------------+------------+-----------------------+
| ID                                   | IP Protocol | IP Range      | Port Range | Remote Security Group |
+--------------------------------------+-------------+---------------+------------+-----------------------+
| 0d1db300-82a2-4e55-b7ea-e2ba3875a410 | None        | 10.11.10.0/24 |            | None                  |
| 4cee77bf-001f-4a30-917d-f692d54de3b2 | None        | None          |            | None                  |
| 60ecd42c-4a4e-4ef5-8e07-f7392fb89af9 | None        | None          |            | None                  |
| 62849e68-dc84-42ea-a97e-dc152b663a67 | None        | 10.11.7.0/24  |            | None                  |
+--------------------------------------+-------------+---------------+------------+-----------------------+
```
It now allows traffic from the default namespace/subnet and from the monitoring namespace/subnet:
```
$ openstack subnet list
+--------------------------------------+--------------------------------+--------------------------------------+---------------+
| ID                                   | Name                           | Network                              | Subnet        |
+--------------------------------------+--------------------------------+--------------------------------------+---------------+
...
| 10f253e3-25ab-4da8-a0cf-a16d829971ca | ns/openshift-monitoring-subnet | 21d5b0e0-f8e3-4509-9a06-d847208ad626 | 10.11.10.0/24 |
| fcab5cd1-2a47-4d34-959f-3a94e511f2ba | ns/default-subnet              | 14670064-c0ce-4689-8450-7f98a0ebdefb | 10.11.7.0/24  |
+--------------------------------------+--------------------------------+--------------------------------------+---------------+
```
The prometheus-k8s LB is now reachable from the namespaces/subnets in the subnet pool 10.11.0.0/16:
```
$ openstack security group rule list f3571022-f1a4-43d2-acb2-a3a18673977b
+--------------------------------------+-------------+-----------------+------------+-----------------------+
| ID                                   | IP Protocol | IP Range        | Port Range | Remote Security Group |
+--------------------------------------+-------------+-----------------+------------+-----------------------+
| 0c594df8-2a38-4412-adb6-06045a92a758 | None        | None            |            | None                  |
| 3bee49d1-1654-4148-a649-6c749d11d3e3 | tcp         | 192.168.99.0/24 | 9091:9091  | None                  |
| 401c4750-3715-4476-ba18-e210f0b88035 | tcp         | 10.11.0.0/16    | 9091:9091  | None                  | <------
| 66aec901-11c0-4dec-92d5-c30acfb735e0 | tcp         | 10.11.10.0/24   | 9091:9091  | None                  |
| 7f5f68d4-1514-4289-839e-3438c66355dc | tcp         | None            | 1025:1025  | None                  |
| 9044c2d0-6fe9-4565-a2e7-8e397d9d6823 | None        | None            |            | None                  |
| f28f62dc-eb09-4d3b-9f01-6e836172295c | tcp         | 172.30.0.0/16   | 9091:9091  | None                  |
+--------------------------------------+-------------+-----------------+------------+-----------------------+
```
Console pod logs:
```
[openshift@master-0 ~]$ oc -n openshift-console logs -f console-5984d988c4-c7lwp
2020/05/22 08:51:16 cmd/main: cookies are secure!
2020/05/22 08:51:16 cmd/main: Binding to 0.0.0.0:8443...
2020/05/22 08:51:16 cmd/main: using TLS
2020/05/22 10:44:01 server: authentication failed: http: named cookie not present
2020/05/22 10:44:01 server: authentication failed: http: named cookie not present
2020/05/22 10:44:01 server: authentication failed: http: named cookie not present
2020/05/22 10:44:01 server: authentication failed: http: named cookie not present
2020/05/22 10:44:01 server: authentication failed: http: named cookie not present
2020/05/22 10:44:01 server: authentication failed: http: named cookie not present
2020/05/22 10:44:02 server: authentication failed: http: named cookie not present
2020/05/22 10:44:02 server: authentication failed: http: named cookie not present
2020/05/22 10:44:02 server: authentication failed: http: named cookie not present
2020/05/22 10:44:02 server: authentication failed: http: named cookie not present
2020/05/22 10:44:02 server: authentication failed: http: named cookie not present
2020/05/22 10:44:07 auth: oauth success, redirecting to: "https://console.apps.openshift.example.com/"
```
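As a sanity check on the rule marked with the arrow above, Python's `ipaddress` module confirms that the single 10.11.0.0/16 rule covers both per-namespace subnets listed earlier (a small sketch, not part of the verification itself):

```python
import ipaddress

# Subnet pool admitted by the new 9091 security group rule.
pool = ipaddress.ip_network("10.11.0.0/16")

# Per-namespace subnets taken from the `openstack subnet list` output.
namespace_subnets = {
    "ns/openshift-monitoring-subnet": "10.11.10.0/24",
    "ns/default-subnet": "10.11.7.0/24",
}

for name, cidr in namespace_subnets.items():
    print(name, ipaddress.ip_network(cidr).subnet_of(pool))  # both print True
```

Any subnet later allocated to a new namespace out of the 10.11.0.0/16 pool is therefore covered by the same rule, without per-namespace rules being required.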
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2215