Bug 1907198 - ceph and grafana dashboards are inaccessible post deployment with code 503
Summary: ceph and grafana dashboards are inaccessible post deployment with code 503
Keywords:
Status: CLOSED DUPLICATE of bug 1856999
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 16.1 (Train)
Hardware: x86_64
OS: Linux
unspecified
medium
Target Milestone: ---
Assignee: RHOS Maint
QA Contact: Joe H. Rahme
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-12-13 15:19 UTC by Punit Kundal
Modified: 2020-12-18 16:57 UTC (History)
4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-12-18 16:57:29 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
ceph_dashboard_default (54.24 KB, image/png)
2020-12-13 15:19 UTC, Punit Kundal
ceph_dashboard_stack_upda (106.39 KB, image/png)
2020-12-13 16:59 UTC, Punit Kundal

Description Punit Kundal 2020-12-13 15:19:25 UTC
Created attachment 1738748 [details]
ceph_dashboard_default

Description of problem:

After a fresh deployment of RHOSP 16.1 with Ceph, the Grafana and Ceph dashboards are inaccessible.

We notice that haproxy binds to the ctlplane VIP for the Ceph and Grafana dashboards.

It is also seen that the backend server IP(s) are on the ctlplane network.

The ceph-ansible playbooks, however, configure the dashboards to run on the storage network instead of ctlplane.

Here are the observations:

haproxy binds on the ctlplane VIP for the Ceph and Grafana dashboards:

+++
[root@overcloud-controller-0 ~]# netstat -tunpl | grep -i 8444
tcp        0      0 192.168.24.215:8444     0.0.0.0:*               LISTEN      752542/haproxy      
[root@overcloud-controller-0 ~]# netstat -tunpl | grep -i 3100
tcp        0      0 192.168.24.215:3100     0.0.0.0:*               LISTEN      752542/haproxy  
+++

the backend servers are running on the storage network:

+++
[root@overcloud-controller-0 ~]# podman exec -it d1ea7a60368a ceph mgr services
{
    "dashboard": "http://overcloud-controller-0.storage.labrh2251.com:8444/",
    "prometheus": "http://overcloud-controller-0.labrh2251.com:9283/"
}
[root@overcloud-controller-0 ~]# 
+++
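The mgr output above can be parsed to see which network the backends actually advertise. A minimal sketch (the JSON is copied verbatim from the transcript above):

```python
import json
from urllib.parse import urlparse

# Output of `ceph mgr services`, copied from the transcript above.
mgr_services = json.loads("""
{
    "dashboard": "http://overcloud-controller-0.storage.labrh2251.com:8444/",
    "prometheus": "http://overcloud-controller-0.labrh2251.com:9283/"
}
""")

# The dashboard is advertised on the .storage hostname, i.e. the storage
# network -- not on ctlplane, where haproxy points its backends.
dashboard = urlparse(mgr_services["dashboard"])
print(dashboard.hostname)  # overcloud-controller-0.storage.labrh2251.com
print(dashboard.port)      # 8444
```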

If we curl the ctlplane VIP on ports 8444 and 3100, where haproxy is listening:

ceph dashboard:

+++
[root@overcloud-controller-0 ~]# curl -vL http://192.168.24.215:8444
* Rebuilt URL to: http://192.168.24.215:8444/
*   Trying 192.168.24.215...
* TCP_NODELAY set
* Connected to 192.168.24.215 (192.168.24.215) port 8444 (#0)
> GET / HTTP/1.1
> Host: 192.168.24.215:8444
> User-Agent: curl/7.61.1
> Accept: */*
> 
* HTTP 1.0, assume close after body
< HTTP/1.0 503 Service Unavailable
< Cache-Control: no-cache
< Connection: close
< Content-Type: text/html
< 
<html><body><h1>503 Service Unavailable</h1>
No server is available to handle this request.
</body></html>
* Closing connection 0
[root@overcloud-controller-0 ~]# 
+++

The same happens on port 3100:

+++
[root@overcloud-controller-0 ~]# curl -vL http://192.168.24.215:3100
* Rebuilt URL to: http://192.168.24.215:3100/
*   Trying 192.168.24.215...
* TCP_NODELAY set
* Connected to 192.168.24.215 (192.168.24.215) port 3100 (#0)
> GET / HTTP/1.1
> Host: 192.168.24.215:3100
> User-Agent: curl/7.61.1
> Accept: */*
> 
* HTTP 1.0, assume close after body
< HTTP/1.0 503 Service Unavailable
< Cache-Control: no-cache
< Connection: close
< Content-Type: text/html
< 
<html><body><h1>503 Service Unavailable</h1>
No server is available to handle this request.
</body></html>
* Closing connection 0
[root@overcloud-controller-0 ~]# 
+++

Here is the haproxy config that is deployed by default:

+++
[root@overcloud-controller-0 ~]# cat /var/lib/config-data/puppet-generated/haproxy/etc/haproxy/haproxy.cfg | grep -i ceph_dashboard -A10
listen ceph_dashboard
  bind 192.168.24.215:8444 transparent
  mode http
  balance source
  http-check expect rstatus 2[0-9][0-9]
  http-request set-header X-Forwarded-Proto https if { ssl_fc }
  http-request set-header X-Forwarded-Proto http if !{ ssl_fc }
  http-request set-header X-Forwarded-Port %[dst_port]
  option httpchk HEAD /
  server overcloud-controller-0.ctlplane.labrh2251.com 192.168.24.213:8444 check fall 5 inter 2000 rise 2

[root@overcloud-controller-0 ~]# cat /var/lib/config-data/puppet-generated/haproxy/etc/haproxy/haproxy.cfg | grep -i ceph_grafana -A10
listen ceph_grafana
  bind 192.168.24.215:3100 transparent
  mode http
  balance source
  http-request set-header X-Forwarded-Proto https if { ssl_fc }
  http-request set-header X-Forwarded-Proto http if !{ ssl_fc }
  http-request set-header X-Forwarded-Port %[dst_port]
  option httpchk HEAD /
  server overcloud-controller-0.ctlplane.labrh2251.com 192.168.24.213:3100 check fall 5 inter 2000 rise 2
+++
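The 503 follows directly from this config: haproxy health-checks the backend at its ctlplane address (192.168.24.213:8444), but nothing listens there, so the backend is marked down and haproxy answers "No server is available". A small, hypothetical helper mirroring what the health check effectively determines (plain TCP reachability; a sketch, not TripleO code):

```python
import socket

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# In the broken deployment, a check against the ctlplane backend address
# (192.168.24.213:8444) fails, while the storage address (172.16.40.16:8444)
# accepts connections -- exactly the curl behaviour shown in this report.
```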

The root cause is in service_net_map.j2.yaml: the network used for these services is named 'storage_dashboard', and since no network by that name exists in this deployment, the mapping defaults to ctlplane:

+++
[root@undercloud16 ~]# grep -e CephDashboar -e CephGra /usr/share/openstack-tripleo-heat-templates/network/service_net_map.j2.yaml 
      CephDashboardNetwork: {{ _service_nets.get('storage_dashboard', 'ctlplane') }}
      CephGrafanaNetwork: {{ _service_nets.get('storage_dashboard', 'ctlplane') }}
+++
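The Jinja2 `_service_nets.get(...)` call behaves like Python's `dict.get`: if the key is missing, the default is returned. A minimal illustration using the networks actually deployed in this lab:

```python
# Networks deployed in this lab (see the net-list below) --
# note there is no 'storage_dashboard' network among them.
service_nets = {
    "ctlplane": "ctlplane",
    "internal_api": "internal_api",
    "tenant": "tenant",
    "storage": "storage",
    "external": "external",
    "storage_mgmt": "storage_mgmt",
}

# Mirrors service_net_map.j2.yaml: the missing key triggers the fallback.
ceph_dashboard_network = service_nets.get("storage_dashboard", "ctlplane")
print(ceph_dashboard_network)  # ctlplane
```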

Here are the networks in this lab:

+++
(undercloud) [stack@undercloud16 ~]$ neutron net-list
neutron CLI is deprecated and will be removed in the future. Use openstack CLI instead.
+--------------------------------------+--------------+----------------------------------+-------------------------------------------------------+
| id                                   | name         | tenant_id                        | subnets                                               |
+--------------------------------------+--------------+----------------------------------+-------------------------------------------------------+
| 1b2da26a-6ff5-4664-af65-d015bd178a2b | ctlplane     | d73920b84d9043fea60173e8c27d3bfd | dbef2f39-75ef-41f8-8dda-49a7308e16a7 192.168.24.0/24  |
| 4759c631-d429-4fa4-b952-f6c9c0ae5e23 | internal_api | d73920b84d9043fea60173e8c27d3bfd | 1d6a347d-ee60-423f-af03-5b9b28adfcbf 172.16.20.0/24   |
| 67dd666a-51a6-4967-a9e8-22235e5d0d9b | tenant       | d73920b84d9043fea60173e8c27d3bfd | e3fd7eaa-5b8a-4339-a95b-0d1208c286fa 172.16.50.0/24   |
| acdb84c6-aee6-4d50-b8d0-08bdf568fe58 | storage      | d73920b84d9043fea60173e8c27d3bfd | fdc0d41f-76dc-47cc-9979-42c38f200276 172.16.40.0/24   |
| e3ab2496-32f7-47ef-a943-b7103727110f | external     | d73920b84d9043fea60173e8c27d3bfd | faed5981-dccc-455b-91a1-7b8aace621b4 192.168.122.0/24 |
| edc047be-22e3-4317-b203-f35dd3d41a46 | storage_mgmt | d73920b84d9043fea60173e8c27d3bfd | 7dfe3a0f-863d-4b6c-995d-1e117da652a7 172.16.60.0/24   |
+--------------------------------------+--------------+----------------------------------+-------------------------------------------------------+
+++

If we reach the backend server IP directly, where the Ceph and Grafana dashboards are listening:

+++
[root@overcloud-controller-0 ~]# curl -vL http://172.16.40.16:8444
* Rebuilt URL to: http://172.16.40.16:8444/
*   Trying 172.16.40.16...
* TCP_NODELAY set
* Connected to 172.16.40.16 (172.16.40.16) port 8444 (#0)
> GET / HTTP/1.1
> Host: 172.16.40.16:8444
> User-Agent: curl/7.61.1
> Accept: */*
> 
< HTTP/1.1 200 OK
< Content-Type: text/html;charset=utf-8
< Server: CherryPy/8.9.1
< Date: Sun, 13 Dec 2020 14:22:06 GMT
< Content-Language: en-US
< Vary: Accept-Language, Accept-Encoding
< Last-Modified: Tue, 24 Nov 2020 19:19:12 GMT
< Accept-Ranges: bytes
< Content-Length: 1193
< 
<!doctype html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Red Hat Ceph Storage</title>

  <script>
    document.write('<base href="' + document.location+ '" />');
  </script>
+++

And the same directly against Grafana on port 3100:

+++
[root@overcloud-controller-0 ~]# curl -vL http://172.16.40.16:3100
* Rebuilt URL to: http://172.16.40.16:3100/
*   Trying 172.16.40.16...
* TCP_NODELAY set
* Connected to 172.16.40.16 (172.16.40.16) port 3100 (#0)
> GET / HTTP/1.1
> Host: 172.16.40.16:3100
> User-Agent: curl/7.61.1
> Accept: */*
> 
< HTTP/1.1 200 OK
< Content-Type: text/html; charset=UTF-8
< Set-Cookie: grafana_sess=54e469e89ae5c194; Path=/; HttpOnly
< Date: Sun, 13 Dec 2020 15:13:58 GMT
< Transfer-Encoding: chunked
< 
<!DOCTYPE html>
<html lang="en">

<head>
  <meta charset="utf-8">
  <meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
  <meta name="viewport" content="width=device-width">
  <meta name="theme-color" content="#000">
+++

We can see that the dashboards are reachable when we hit the correct IP directly, so the default haproxy configuration needs to be fixed.

Version-Release number of selected component (if applicable):

+++
[root@undercloud16 ~]# rpm -qa | grep -i tripleo-heat
openstack-tripleo-heat-templates-11.3.2-1.20200914170156.el8ost.noarch
+++

How reproducible:
Always

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:
Screenshots show that accessing the dashboards via the haproxy VIP ends in a 503.

Comment 1 Punit Kundal 2020-12-13 16:58:54 UTC
Here are some more details.

This issue is caused by the ServiceNetMapDefaults parameter. If we set:

+++
(undercloud) [stack@undercloud16 ~]$ grep -i ServiceNetMap ~/templates/network-environment.yaml -A2
  ServiceNetMap:
    CephDashboardNetwork: storage
    CephGrafanaNetwork: storage
(undercloud) [stack@undercloud16 ~]$ 
+++
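A quick, hypothetical pre-deployment sanity check in the same spirit: verify that every network referenced by a ServiceNetMap override actually exists in the deployment (network names taken from this lab):

```python
# Networks deployed in this lab.
deployed_networks = {"ctlplane", "internal_api", "tenant",
                     "storage", "external", "storage_mgmt"}

# The override from network-environment.yaml above.
service_net_map = {
    "CephDashboardNetwork": "storage",
    "CephGrafanaNetwork": "storage",
}

# Any service mapped to a network that does not exist would silently
# fall back to ctlplane, which is exactly the bug described here.
missing = {svc: net for svc, net in service_net_map.items()
           if net not in deployed_networks}
print(missing)  # {} -- both services now map to an existing network
```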

then run a stack update:

+++
Wait for puppet host configuration to finish --------------------------- 29.19s
Wait for puppet host configuration to finish --------------------------- 29.14s
Wait for puppet host configuration to finish --------------------------- 29.14s
Run tripleo-container-image-prepare logged to: /var/log/tripleo-container-image-prepare.log -- 27.63s
Wait for container-puppet tasks (bootstrap tasks) for step 3 to finish -- 25.31s
Wait for puppet host configuration to finish --------------------------- 22.11s
Wait for containers to start for step 4 using paunch ------------------- 21.81s
Pre-fetch all the containers ------------------------------------------- 20.52s
Run puppet on the host to apply IPtables rules ------------------------- 16.58s
tripleo-hieradata : Render hieradata from template --------------------- 14.94s
tripleo-kernel : Set extra sysctl options ------------------------------- 9.96s
tripleo-keystone-resources : Async creation of Keystone user ------------ 9.35s
tripleo-keystone-resources : Async creation of Keystone admin endpoint --- 8.97s

Ansible passed.
Overcloud configuration completed.
Overcloud Endpoint: http://192.168.122.202:5000
Overcloud Horizon Dashboard URL: http://192.168.122.202:80/dashboard
Overcloud rc file: /home/stack/overcloudrc
Overcloud Deployed without error
sys:1: ResourceWarning: unclosed <ssl.SSLSocket fd=4, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.24.205', 45238)>
sys:1: ResourceWarning: unclosed <ssl.SSLSocket fd=5, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.24.205', 60856)>
sys:1: ResourceWarning: unclosed <ssl.SSLSocket fd=7, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.24.205', 38006), raddr=('192.168.24.205', 13989)>

real    62m50.039s
user    0m18.178s
sys     0m2.159s
+++

This sets the backend IP(s) correctly on the storage network; the VIP still points to the ctlplane network:

+++
[root@overcloud-controller-0 ~]# cat /var/lib/config-data/puppet-generated/haproxy/etc/haproxy/haproxy.cfg | grep -i ceph_dashboard -A10
listen ceph_dashboard
  bind 192.168.24.215:8444 transparent
  mode http
  balance source
  http-check expect rstatus 2[0-9][0-9]
  http-request set-header X-Forwarded-Proto https if { ssl_fc }
  http-request set-header X-Forwarded-Proto http if !{ ssl_fc }
  http-request set-header X-Forwarded-Port %[dst_port]
  option httpchk HEAD /
  server overcloud-controller-0.storage.labrh2251.com 172.16.40.16:8444 check fall 5 inter 2000 rise 2

[root@overcloud-controller-0 ~]# cat /var/lib/config-data/puppet-generated/haproxy/etc/haproxy/haproxy.cfg | grep -i ceph_grafana -A10
listen ceph_grafana
  bind 192.168.24.215:3100 transparent
  mode http
  balance source
  http-request set-header X-Forwarded-Proto https if { ssl_fc }
  http-request set-header X-Forwarded-Proto http if !{ ssl_fc }
  http-request set-header X-Forwarded-Port %[dst_port]
  option httpchk HEAD /
  server overcloud-controller-0.storage.labrh2251.com 172.16.40.16:3100 check fall 5 inter 2000 rise 2
+++
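After the update, the regenerated config can also be checked programmatically. A sketch that pulls the backend `server` addresses out of a haproxy `listen` section (config text abbreviated from the output above):

```python
# Abbreviated from the regenerated haproxy.cfg shown above.
HAPROXY_CFG = """\
listen ceph_dashboard
  bind 192.168.24.215:8444 transparent
  mode http
  option httpchk HEAD /
  server overcloud-controller-0.storage.labrh2251.com 172.16.40.16:8444 check fall 5 inter 2000 rise 2
"""

def backend_addrs(cfg: str, section: str) -> list:
    """Collect the ip:port of 'server' lines inside the named listen section."""
    addrs, in_section = [], False
    for line in cfg.splitlines():
        stripped = line.strip()
        if stripped.startswith("listen "):
            in_section = stripped.split()[1] == section
        elif in_section and stripped.startswith("server "):
            addrs.append(stripped.split()[2])
    return addrs

# The backend now sits on the storage network (172.16.40.x).
print(backend_addrs(HAPROXY_CFG, "ceph_dashboard"))  # ['172.16.40.16:8444']
```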

We are now able to access the dashboard; however, while browsing through it we get code 500 errors at many different points. Some examples:

cluster -> hosts  code 500
cluster -> monitor code 500

The data still appears to be displayed, but we keep getting these alerts.

I am attaching a screenshot of the same.

Regards,
Punit

Comment 2 Punit Kundal 2020-12-13 16:59:44 UTC
Created attachment 1738772 [details]
ceph_dashboard_stack_upda

