Created attachment 1738748 [details]
ceph_dashboard_default

Description of problem:

After a fresh deployment of RHOSP 16.1 with Ceph, the Grafana and Ceph dashboards are inaccessible. HAProxy binds the dashboard frontends to the ctlplane VIP, and the backend server IPs it is given are also on the ctlplane network; the ceph-ansible playbooks, however, configure the dashboards to listen on the storage network instead of ctlplane.

Here are the observations.

haproxy binds on the ctlplane IP for the ceph and grafana dashboards:
+++
[root@overcloud-controller-0 ~]# netstat -tunpl | grep -i 8444
tcp        0      0 192.168.24.215:8444     0.0.0.0:*               LISTEN      752542/haproxy
[root@overcloud-controller-0 ~]# netstat -tunpl | grep -i 3100
tcp        0      0 192.168.24.215:3100     0.0.0.0:*               LISTEN      752542/haproxy
+++

the backend services are running on the storage network:
+++
[root@overcloud-controller-0 ~]# podman exec -it d1ea7a60368a ceph mgr services
{
    "dashboard": "http://overcloud-controller-0.storage.labrh2251.com:8444/",
    "prometheus": "http://overcloud-controller-0.labrh2251.com:9283/"
}
[root@overcloud-controller-0 ~]#
+++

If we curl the ctlplane VIP on ports 8444 and 3100, where haproxy is listening, we get 503.

ceph dashboard:
+++
[root@overcloud-controller-0 ~]# curl -vL http://192.168.24.215:8444
* Rebuilt URL to: http://192.168.24.215:8444/
*   Trying 192.168.24.215...
* TCP_NODELAY set
* Connected to 192.168.24.215 (192.168.24.215) port 8444 (#0)
> GET / HTTP/1.1
> Host: 192.168.24.215:8444
> User-Agent: curl/7.61.1
> Accept: */*
>
* HTTP 1.0, assume close after body
< HTTP/1.0 503 Service Unavailable
< Cache-Control: no-cache
< Connection: close
< Content-Type: text/html
<
<html><body><h1>503 Service Unavailable</h1>
No server is available to handle this request.
</body></html>
* Closing connection 0
[root@overcloud-controller-0 ~]#
+++

same for port 3100 (grafana):
+++
[root@overcloud-controller-0 ~]# curl -vL http://192.168.24.215:3100
* Rebuilt URL to: http://192.168.24.215:3100/
*   Trying 192.168.24.215...
* TCP_NODELAY set
* Connected to 192.168.24.215 (192.168.24.215) port 3100 (#0)
> GET / HTTP/1.1
> Host: 192.168.24.215:3100
> User-Agent: curl/7.61.1
> Accept: */*
>
* HTTP 1.0, assume close after body
< HTTP/1.0 503 Service Unavailable
< Cache-Control: no-cache
< Connection: close
< Content-Type: text/html
<
<html><body><h1>503 Service Unavailable</h1>
No server is available to handle this request.
</body></html>
* Closing connection 0
[root@overcloud-controller-0 ~]#
+++
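Side note: it may also be worth confirming on the controller that nothing is actually listening on the node's own ctlplane IP on these ports. A minimal check (a sketch; output will vary, and the dashboard listener only exists on the controller running the active mgr):

+++
[root@overcloud-controller-0 ~]# ss -tnlp | grep -e ':8444' -e ':3100'
+++

Given the `ceph mgr services` output above, one would expect ceph-mgr and grafana to be bound only to the storage IP, with nothing on the node's ctlplane address; that would explain why haproxy's HTTP health checks against its ctlplane backends (shown in the config below) fail and the frontends return 503.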
Here is the haproxy config which is deployed by default:
+++
[root@overcloud-controller-0 ~]# cat /var/lib/config-data/puppet-generated/haproxy/etc/haproxy/haproxy.cfg | grep -i ceph_dashboard -A10
listen ceph_dashboard
  bind 192.168.24.215:8444 transparent
  mode http
  balance source
  http-check expect rstatus 2[0-9][0-9]
  http-request set-header X-Forwarded-Proto https if { ssl_fc }
  http-request set-header X-Forwarded-Proto http if !{ ssl_fc }
  http-request set-header X-Forwarded-Port %[dst_port]
  option httpchk HEAD /
  server overcloud-controller-0.ctlplane.labrh2251.com 192.168.24.213:8444 check fall 5 inter 2000 rise 2

[root@overcloud-controller-0 ~]# cat /var/lib/config-data/puppet-generated/haproxy/etc/haproxy/haproxy.cfg | grep -i ceph_grafana -A10
listen ceph_grafana
  bind 192.168.24.215:3100 transparent
  mode http
  balance source
  http-request set-header X-Forwarded-Proto https if { ssl_fc }
  http-request set-header X-Forwarded-Proto http if !{ ssl_fc }
  http-request set-header X-Forwarded-Port %[dst_port]
  option httpchk HEAD /
  server overcloud-controller-0.ctlplane.labrh2251.com 192.168.24.213:3100 check fall 5 inter 2000 rise 2
+++

The root cause is in service_net_map.j2.yaml: the network name these services look up is 'storage_dashboard', and since no network by that name exists, the lookup falls back to its ctlplane default:
+++
[root@undercloud16 ~]# grep -e CephDashboar -e CephGra /usr/share/openstack-tripleo-heat-templates/network/service_net_map.j2.yaml
  CephDashboardNetwork: {{ _service_nets.get('storage_dashboard', 'ctlplane') }}
  CephGrafanaNetwork: {{ _service_nets.get('storage_dashboard', 'ctlplane') }}
+++

Here are the networks in this lab:
+++
(undercloud) [stack@undercloud16 ~]$ neutron net-list
neutron CLI is deprecated and will be removed in the future. Use openstack CLI instead.
+--------------------------------------+--------------+----------------------------------+-------------------------------------------------------+
| id                                   | name         | tenant_id                        | subnets                                               |
+--------------------------------------+--------------+----------------------------------+-------------------------------------------------------+
| 1b2da26a-6ff5-4664-af65-d015bd178a2b | ctlplane     | d73920b84d9043fea60173e8c27d3bfd | dbef2f39-75ef-41f8-8dda-49a7308e16a7 192.168.24.0/24  |
| 4759c631-d429-4fa4-b952-f6c9c0ae5e23 | internal_api | d73920b84d9043fea60173e8c27d3bfd | 1d6a347d-ee60-423f-af03-5b9b28adfcbf 172.16.20.0/24   |
| 67dd666a-51a6-4967-a9e8-22235e5d0d9b | tenant       | d73920b84d9043fea60173e8c27d3bfd | e3fd7eaa-5b8a-4339-a95b-0d1208c286fa 172.16.50.0/24   |
| acdb84c6-aee6-4d50-b8d0-08bdf568fe58 | storage      | d73920b84d9043fea60173e8c27d3bfd | fdc0d41f-76dc-47cc-9979-42c38f200276 172.16.40.0/24   |
| e3ab2496-32f7-47ef-a943-b7103727110f | external     | d73920b84d9043fea60173e8c27d3bfd | faed5981-dccc-455b-91a1-7b8aace621b4 192.168.122.0/24 |
| edc047be-22e3-4317-b203-f35dd3d41a46 | storage_mgmt | d73920b84d9043fea60173e8c27d3bfd | 7dfe3a0f-863d-4b6c-995d-1e117da652a7 172.16.60.0/24   |
+--------------------------------------+--------------+----------------------------------+-------------------------------------------------------+
+++
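Note that no network named storage_dashboard appears in this list, so the Jinja lookup quoted above always takes its fallback value. Illustratively, the template's _service_nets.get() behaves like a plain Python dict lookup with a default (a toy example, not the actual template code):

+++
$ python3 -c "print({'storage': 'storage', 'storage_mgmt': 'storage_mgmt'}.get('storage_dashboard', 'ctlplane'))"
ctlplane
+++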
If we try to directly reach the backend IPs where the ceph and grafana dashboards are actually listening, both respond with 200.

ceph dashboard:
+++
[root@overcloud-controller-0 ~]# curl -vL http://172.16.40.16:8444
* Rebuilt URL to: http://172.16.40.16:8444/
*   Trying 172.16.40.16...
* TCP_NODELAY set
* Connected to 172.16.40.16 (172.16.40.16) port 8444 (#0)
> GET / HTTP/1.1
> Host: 172.16.40.16:8444
> User-Agent: curl/7.61.1
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Type: text/html;charset=utf-8
< Server: CherryPy/8.9.1
< Date: Sun, 13 Dec 2020 14:22:06 GMT
< Content-Language: en-US
< Vary: Accept-Language, Accept-Encoding
< Last-Modified: Tue, 24 Nov 2020 19:19:12 GMT
< Accept-Ranges: bytes
< Content-Length: 1193
<
<!doctype html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Red Hat Ceph Storage</title>
  <script>
    document.write('<base href="' + document.location + '" />');
  </script>
+++

and the same for grafana:
+++
[root@overcloud-controller-0 ~]# curl -vL http://172.16.40.16:3100
* Rebuilt URL to: http://172.16.40.16:3100/
*   Trying 172.16.40.16...
* TCP_NODELAY set
* Connected to 172.16.40.16 (172.16.40.16) port 3100 (#0)
> GET / HTTP/1.1
> Host: 172.16.40.16:3100
> User-Agent: curl/7.61.1
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Type: text/html; charset=UTF-8
< Set-Cookie: grafana_sess=54e469e89ae5c194; Path=/; HttpOnly
< Date: Sun, 13 Dec 2020 15:13:58 GMT
< Transfer-Encoding: chunked
<
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
  <meta name="viewport" content="width=device-width">
  <meta name="theme-color" content="#000">
+++

Both services are reachable when we hit the correct (storage network) IPs directly, so the default haproxy configuration generated for them needs to be fixed.

Version-Release number of selected component (if applicable):
+++
[root@undercloud16 ~]# rpm -qa | grep -i tripleo-heat
openstack-tripleo-heat-templates-11.3.2-1.20200914170156.el8ost.noarch
+++

How reproducible:
Always

Steps to Reproduce:
1. Deploy RHOSP 16.1 with Ceph and the Ceph dashboard enabled, using the default ServiceNetMap.
2. Try to reach the Ceph dashboard (port 8444) or Grafana (port 3100) through the haproxy VIP.

Actual results:
haproxy returns 503 Service Unavailable for both dashboards.

Expected results:
Both dashboards are reachable through the haproxy VIP.

Additional info:
Screenshot attached; accessing the dashboard through the haproxy VIP ends with a 503.
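As an extra confirmation of the fallback, one can check that the network data used for the deployment defines no network whose name_lower would render as storage_dashboard (a sketch, assuming the stock network_data.yaml was used):

+++
[root@undercloud16 ~]# grep -i -e storage_dashboard -e StorageDashboard /usr/share/openstack-tripleo-heat-templates/network_data.yaml
[root@undercloud16 ~]#
+++

With no match, _service_nets.get('storage_dashboard', 'ctlplane') can only ever return 'ctlplane' here.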
Here are some more details. The issue comes down to the ServiceNetMapDefaults parameter; if we override it as follows:
+++
(undercloud) [stack@undercloud16 ~]$ grep -i ServiceNetMap ~/templates/network-environment.yaml -A2
  ServiceNetMap:
    CephDashboardNetwork: storage
    CephGrafanaNetwork: storage
(undercloud) [stack@undercloud16 ~]$
+++

and then run a stack update:
+++
Wait for puppet host configuration to finish --------------------------- 29.19s
Wait for puppet host configuration to finish --------------------------- 29.14s
Wait for puppet host configuration to finish --------------------------- 29.14s
Run tripleo-container-image-prepare logged to: /var/log/tripleo-container-image-prepare.log -- 27.63s
Wait for container-puppet tasks (bootstrap tasks) for step 3 to finish -- 25.31s
Wait for puppet host configuration to finish --------------------------- 22.11s
Wait for containers to start for step 4 using paunch ------------------- 21.81s
Pre-fetch all the containers ------------------------------------------- 20.52s
Run puppet on the host to apply IPtables rules ------------------------- 16.58s
tripleo-hieradata : Render hieradata from template --------------------- 14.94s
tripleo-kernel : Set extra sysctl options ------------------------------- 9.96s
tripleo-keystone-resources : Async creation of Keystone user ------------ 9.35s
tripleo-keystone-resources : Async creation of Keystone admin endpoint --- 8.97s

Ansible passed.
Overcloud configuration completed.
Overcloud Endpoint: http://192.168.122.202:5000
Overcloud Horizon Dashboard URL: http://192.168.122.202:80/dashboard
Overcloud rc file: /home/stack/overcloudrc
Overcloud Deployed without error
sys:1: ResourceWarning: unclosed <ssl.SSLSocket fd=4, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.24.205', 45238)>
sys:1: ResourceWarning: unclosed <ssl.SSLSocket fd=5, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.24.205', 60856)>
sys:1: ResourceWarning: unclosed <ssl.SSLSocket fd=7, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.24.205', 38006), raddr=('192.168.24.205', 13989)>

real    62m50.039s
user    0m18.178s
sys     0m2.159s
+++

this sets the backend IPs correctly on the storage network, although the VIP still binds to the ctlplane network:
+++
[root@overcloud-controller-0 ~]# cat /var/lib/config-data/puppet-generated/haproxy/etc/haproxy/haproxy.cfg | grep -i ceph_dashboard -A10
listen ceph_dashboard
  bind 192.168.24.215:8444 transparent
  mode http
  balance source
  http-check expect rstatus 2[0-9][0-9]
  http-request set-header X-Forwarded-Proto https if { ssl_fc }
  http-request set-header X-Forwarded-Proto http if !{ ssl_fc }
  http-request set-header X-Forwarded-Port %[dst_port]
  option httpchk HEAD /
  server overcloud-controller-0.storage.labrh2251.com 172.16.40.16:8444 check fall 5 inter 2000 rise 2

[root@overcloud-controller-0 ~]# cat /var/lib/config-data/puppet-generated/haproxy/etc/haproxy/haproxy.cfg | grep -i ceph_grafana -A10
listen ceph_grafana
  bind 192.168.24.215:3100 transparent
  mode http
  balance source
  http-request set-header X-Forwarded-Proto https if { ssl_fc }
  http-request set-header X-Forwarded-Proto http if !{ ssl_fc }
  http-request set-header X-Forwarded-Port %[dst_port]
  option httpchk HEAD /
  server overcloud-controller-0.storage.labrh2251.com 172.16.40.16:3100 check fall 5 inter 2000 rise 2
+++
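As a quick sanity check after the update (a sketch; the 200s assume haproxy's health checks against the storage-network backends now pass), the same VIP URLs that returned 503 before should now answer:

+++
[root@overcloud-controller-0 ~]# curl -s -o /dev/null -w '%{http_code}\n' http://192.168.24.215:8444
200
[root@overcloud-controller-0 ~]# curl -s -o /dev/null -w '%{http_code}\n' http://192.168.24.215:3100
200
+++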
With this in place we are now able to access the dashboard through the VIP. However, while browsing through it we get code 500 errors at many different points, for example:

Cluster -> Hosts: code 500
Cluster -> Monitor: code 500

The data still appears to be displayed, but we keep getting these alerts. I am attaching a screenshot of the same.

Regards,
Punit
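Follow-up debugging idea for the 500s: it may help to capture the raw API error behind the UI alerts rather than the rendered page. A rough sketch against the Ceph dashboard REST API (the admin username, the password placeholder, and the exact shape of the /api/auth response are assumptions; adjust to this deployment):

+++
# hypothetical credentials -- substitute the dashboard admin password chosen at deploy time
TOKEN=$(curl -s -H 'Content-Type: application/json' \
  -d '{"username": "admin", "password": "<dashboard_password>"}' \
  http://172.16.40.16:8444/api/auth | python3 -c 'import sys, json; print(json.load(sys.stdin)["token"])')

# "Cluster -> Hosts" in the UI is backed by /api/host; -i keeps the status line so the 500 is visible
curl -si -H "Authorization: Bearer ${TOKEN}" http://172.16.40.16:8444/api/host | head -n 20
+++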
Created attachment 1738772 [details]
ceph_dashboard_stack_upda