Description of problem:

I am using the ansible method to install a containerized Ceph 4 cluster in a disconnected environment. The Grafana container cannot start normally and restarts repeatedly. I fetched the container logs with "docker logs -f <ContainerID>" to catch the output before the container disappears:

(1) Wait for the grafana container to start:

```
[root@ceph-metrics-01 ~]# docker ps
CONTAINER ID   IMAGE                                                        COMMAND                  CREATED                  STATUS                  PORTS   NAMES
e7c7c6e60486   registry.example.internal:5000/grafana/grafana:5.2.4         "/run.sh"                Less than a second ago   Up Less than a second           grafana-server
9daf808d7780   registry.example.internal:5000/prom/alertmanager:v0.16.2     "/bin/alertmanager..."   18 hours ago             Up 18 hours                     alertmanager
21ade096d5ec   registry.example.internal:5000/prom/prometheus:v2.7.2        "/bin/prometheus -..."   18 hours ago             Up 18 hours                     prometheus
08d461dd707f   registry.example.internal:5000/prom/node-exporter:v0.17.0    "/bin/node_exporte..."   18 hours ago             Up 18 hours                     node-exporter
```

(2) When the grafana container starts, I fetch the logs immediately. I found that grafana wants to install two plugins:

```
#grafana_plugins:
#  - vonage-status-panel
#  - grafana-piechart-panel
```

But in a disconnected environment this fails, and the logs show that grafana cannot be set up normally:

```
[root@ceph-metrics-01 ~]# docker logs -f e7
Failed to send requesterrorGet https://grafana.com/api/plugins/repo/vonage-status-panel: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Error: ✗ Failed to send request. error: Get https://grafana.com/api/plugins/repo/vonage-status-panel: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
NAME:
   Grafana cli plugins install - install <plugin id> <plugin version (optional)>
USAGE:
   Grafana cli plugins install [arguments...]
```

(3) The grafana container removes itself after a very short time:

```
[root@ceph-metrics-01 ~]# docker ps -a
CONTAINER ID   IMAGE                                                        COMMAND                  CREATED        STATUS        PORTS   NAMES
9daf808d7780   registry.example.internal:5000/prom/alertmanager:v0.16.2     "/bin/alertmanager..."   18 hours ago   Up 18 hours           alertmanager
21ade096d5ec   registry.example.internal:5000/prom/prometheus:v2.7.2        "/bin/prometheus -..."   18 hours ago   Up 18 hours           prometheus
08d461dd707f   registry.example.internal:5000/prom/node-exporter:v0.17.0    "/bin/node_exporte..."   18 hours ago   Up 18 hours           node-exporter
```

Version-Release number of selected component (if applicable):
Ceph 4.15

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:
I want to install the metrics and dashboard in a disconnected environment for Ceph 4

Additional info:
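One possible workaround in a fully disconnected setup would be to override the grafana_plugins variable quoted above with an empty list, so that the dashboard role never invokes grafana-cli against grafana.com. This is only a sketch, not a verified fix: the group_vars path, the playbook name, and the inventory group are assumptions about a typical RHCS 4 ceph-ansible layout.

```
# Sketch of a possible air-gapped workaround (not a verified fix):
# override grafana_plugins so the dashboard role skips
# "grafana-cli plugins install ..." entirely.
# The group_vars path below assumes the default RHCS ceph-ansible location.
cat >> /usr/share/ceph-ansible/group_vars/all.yml <<'EOF'
# Do not try to fetch dashboard plugins from grafana.com (air-gapped site)
grafana_plugins: []
EOF

# Then re-run the dashboard/monitoring part of the playbook, e.g.:
# (the playbook name depends on the ceph-ansible version in use)
# ansible-playbook -i hosts site-container.yml --limit grafana-server
```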
Let's please assess this for 4.1; disconnected configurations are important to RHCS customers. Resetting the assignee to the default, since the current assignee is invalid.
Hello, sorry for reopening this closed bug.

We have a CU upgrading from RHCS 4 to RHCS 5 in an air-gapped, secure environment; the CU does use the Red Hat images through their Satellite setup. Here is part of the image inspect from the sos report for the image:

```
"Labels": {
    "architecture": "x86_64",
    "build-date": "2024-06-18T21:15:48",
    "com.redhat.component": "grafana-container",
    "com.redhat.license_terms": "https://www.redhat.com/agreements",
    "description": "Red Hat Ceph Storage 5 Grafana container",
    "distribution-scope": "public",
    "io.buildah.version": "1.29.0",
    "io.k8s.description": "Red Hat Ceph Storage 5 Grafana container",
    "io.k8s.display-name": "Grafana on RHEL 8",
    "io.openshift.expose-services": "",
    "io.openshift.tags": "rhceph ceph dashboard grafana",
    "maintainer": "Nizamudeen A <nia>",
    "name": "grafana",
    "release": "87",
    "summary": "Provides the Grafana container on RHEL 8 for Red Hat Ceph Storage 5.",
    "url": "https://access.redhat.com/containers/#/registry.access.redhat.com/grafana/images/5-87",
    "vcs-ref": "27ee260ccdac41b6ea45ac0ee6ebaa2f50904e5c",
    "vcs-type": "git",
    "vendor": "Red Hat, Inc.",
    "version": "5"
},
```

Still, the CU hits this error when upgrading, and the upgrade fails:

```
Aug 23 16:39:03 jdc-1f-ras26-cmon01 grafana-server[307750]: Error: ✗ Get "https://grafana.com/api/plugins/vonage-status-panel/versions": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
```

Can you please advise? The CU is blocked from upgrading their production environment (the log above is from a test environment).

Best regards
Raimund
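One way to narrow this down might be to check whether the dashboard image's startup path still shells out to grafana-cli, and whether the panels are already baked into the image. This is only a sketch; the image reference and the 5-87 tag below are assumptions based on the labels quoted above, and the commands would need to run wherever the image is mirrored locally.

```
# Sketch of a diagnostic step (image reference/tag are assumptions):
# does the image entrypoint/command still reference grafana-cli?
podman inspect --format '{{.Config.Entrypoint}} {{.Config.Cmd}}' \
    registry.redhat.io/rhceph/rhceph-5-dashboard-rhel8:5-87

# Are the vonage-status-panel / grafana-piechart-panel plugins already
# shipped inside the image, so no download should be needed?
podman run --rm --entrypoint ls \
    registry.redhat.io/rhceph/rhceph-5-dashboard-rhel8:5-87 /var/lib/grafana/plugins
```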
Hi, just as an FYI, I tried to reproduce this in my lab on a 5.3 test cluster:

```
[root@mgmt-0 ceph-ansible]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
# BEGIN ANSIBLE MANAGED BLOCK
# I want to simulate CU sites which can not access download.eng.bos.redhat.com
#127.0.0.10 download.eng.bos.redhat.com
#127.0.0.11 registry-proxy.engineering.redhat.com
# END ANSIBLE MANAGED BLOCK
#
127.0.0.234 grafana.com
```

After restarting grafana I could see that grafana just put out an error message, but did proceed with the startup:

```
Aug 29 05:18:24 mgmt-0.rsachereceph536.lab.upshift.rdu2.redhat.com ceph-f6c193e4-5ee8-11ef-bb6b-fa163eacb7c6-grafana-mgmt-0[4119638]: logger=grafana.update.checker t=2024-08-29T09:18:24.055776503Z level=error msg="Update check failed" error="failed to get stable version from grafana.com: Get \"https://grafana.com/api/grafana/versions/stable\": dial tcp 127.0.0.234:443: connect: connection refused" duration=24.044572ms
Aug 29 05:18:24 mgmt-0.rsachereceph536.lab.upshift.rdu2.redhat.com ceph-f6c193e4-5ee8-11ef-bb6b-fa163eacb7c6-grafana-mgmt-0[4119638]: logger=provisioning.dashboard t=2024-08-29T09:18:24.076347986Z level=info msg="starting to provision dashboards"
```

Also, with a cephadm-deployed grafana server, we do not actually see instructions for the plugins in the unit file:

```
[root@mgmt-0 ceph-ansible]# cat /var/lib/ceph/f6c193e4-5ee8-11ef-bb6b-fa163eacb7c6/grafana.mgmt-0/unit.run
set -e
# grafana.mgmt-0
! /bin/podman rm -f ceph-f6c193e4-5ee8-11ef-bb6b-fa163eacb7c6-grafana.mgmt-0 2> /dev/null
! /bin/podman rm -f ceph-f6c193e4-5ee8-11ef-bb6b-fa163eacb7c6-grafana-mgmt-0 2> /dev/null
! /bin/podman rm -f --storage ceph-f6c193e4-5ee8-11ef-bb6b-fa163eacb7c6-grafana-mgmt-0 2> /dev/null
! /bin/podman rm -f --storage ceph-f6c193e4-5ee8-11ef-bb6b-fa163eacb7c6-grafana.mgmt-0 2> /dev/null
/bin/podman run --rm --ipc=host --stop-signal=SIGTERM --authfile=/etc/ceph/podman-auth.json --net=host --init --name ceph-f6c193e4-5ee8-11ef-bb6b-fa163eacb7c6-grafana-mgmt-0 --user 472 -d --log-driver journald --conmon-pidfile /run/ceph-f6c193e4-5ee8-11ef-bb6b-fa163eacb7c6.service-pid --cidfile /run/ceph-f6c193e4-5ee8-11ef-bb6b-fa163eacb7c6.service-cid --cgroups=split -e CONTAINER_IMAGE=registry.redhat.io/rhceph/rhceph-5-dashboard-rhel8:latest -e NODE_NAME=mgmt-0.rsachereceph536.lab.upshift.rdu2.redhat.com -e CEPH_USE_RANDOM_NONCE=1 -v /var/lib/ceph/f6c193e4-5ee8-11ef-bb6b-fa163eacb7c6/grafana.mgmt-0/etc/grafana/grafana.ini:/etc/grafana/grafana.ini:Z -v /var/lib/ceph/f6c193e4-5ee8-11ef-bb6b-fa163eacb7c6/grafana.mgmt-0/etc/grafana/provisioning/datasources:/etc/grafana/provisioning/datasources:Z -v /var/lib/ceph/f6c193e4-5ee8-11ef-bb6b-fa163eacb7c6/grafana.mgmt-0/etc/grafana/certs:/etc/grafana/certs:Z -v /var/lib/ceph/f6c193e4-5ee8-11ef-bb6b-fa163eacb7c6/grafana.mgmt-0/data/grafana.db:/var/lib/grafana/grafana.db:Z -v /etc/hosts:/etc/hosts:ro registry.redhat.io/rhceph/rhceph-5-dashboard-rhel8:latest
```

The error message I get in my lab is similar to the one the customer receives, but I get `connection refused` while the customer gets a `timeout`. The issue might be that their firewall does not send a fast connection reset but simply drops the packets without responding. So in a disconnected environment it may work fine when the firewall resets the connection and the update check can fail fast, but if the update check has to wait for a timeout, it may bring grafana down?
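To validate that theory, here is a rough sketch of how the two firewall behaviours could be simulated and compared on a lab node. Assumptions: iptables is available on the host, grafana.com resolves normally (the /etc/hosts override above removed), and the cephadm-managed grafana service is restarted between tests.

```
# Sketch: compare "connection refused" vs "silent drop" behaviour for grafana.com

# 1) Fast-fail case: answer outgoing HTTPS to grafana.com with a TCP reset
iptables -I OUTPUT -p tcp --dport 443 -d grafana.com -j REJECT --reject-with tcp-reset

# 2) Slow case: silently drop the packets, forcing the client to wait for
#    its own timeout (what the customer's firewall seems to be doing)
iptables -I OUTPUT -p tcp --dport 443 -d grafana.com -j DROP

# Restart grafana between the two tests and compare how long startup takes,
# e.g. with the cephadm-generated unit (fsid/host are placeholders):
# systemctl restart ceph-<fsid>@grafana.<host>.service
# journalctl -u ceph-<fsid>@grafana.<host>.service -f
```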
Is executing the update check actually needed or wanted in our RHCS grafana image? Do we want plugins to be upgraded from upstream?

BR
Raimund
Hi, not sure whether this was the actual resolution, but the CU changed the image tag from 5-87 to 5-79, and with that it seemed to work in the disconnected environment. Are there differences between those two images in how the vonage plugin or the update check works?

Thank you,
BR
Raimund
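For anyone else landing here: if pinning to a specific dashboard image tag turns out to be the workaround, this is roughly how it could be done with cephadm. A sketch only; the registry hostname below is a placeholder for the site's local mirror (reusing the example registry from the original report), and the tag is the one the CU reported as working.

```
# Sketch: pin the grafana container image used by cephadm to a specific tag
# (registry hostname is a placeholder for the customer's local mirror)
ceph config set mgr mgr/cephadm/container_image_grafana \
    registry.example.internal:5000/rhceph/rhceph-5-dashboard-rhel8:5-79

# Redeploy the grafana daemon so it picks up the pinned image
ceph orch redeploy grafana
```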
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat Ceph Storage 5.3 security and bug fix updates), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2025:1478