Description of problem:

I am using the ansible method to install a containerized Ceph 4 cluster in a disconnected environment. The Grafana container cannot start normally and restarts repeatedly. I fetched the container logs with "docker logs -f <ContainerID>" to catch the output before the container disappears:

(1) Wait for the grafana container to start:

```
[root@ceph-metrics-01 ~]# docker ps
CONTAINER ID   IMAGE                                                        COMMAND                  CREATED                  STATUS                  PORTS   NAMES
e7c7c6e60486   registry.example.internal:5000/grafana/grafana:5.2.4         "/run.sh"                Less than a second ago   Up Less than a second           grafana-server
9daf808d7780   registry.example.internal:5000/prom/alertmanager:v0.16.2     "/bin/alertmanager..."   18 hours ago             Up 18 hours                     alertmanager
21ade096d5ec   registry.example.internal:5000/prom/prometheus:v2.7.2        "/bin/prometheus -..."   18 hours ago             Up 18 hours                     prometheus
08d461dd707f   registry.example.internal:5000/prom/node-exporter:v0.17.0    "/bin/node_exporte..."   18 hours ago             Up 18 hours                     node-exporter
```

(2) When the grafana container starts, I fetch the logs immediately. I found that grafana wants to install two plugins:

```
#grafana_plugins:
#  - vonage-status-panel
#  - grafana-piechart-panel
```

But in a disconnected environment this fails, and the logs show that grafana cannot be set up normally:

```
[root@ceph-metrics-01 ~]# docker logs -f e7
Failed to send requesterrorGet https://grafana.com/api/plugins/repo/vonage-status-panel: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Error: ✗ Failed to send request. error: Get https://grafana.com/api/plugins/repo/vonage-status-panel: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
NAME:
   Grafana cli plugins install - install <plugin id> <plugin version (optional)>
USAGE:
   Grafana cli plugins install [arguments...]
```

(3) The grafana container removes itself after a very short time:

```
[root@ceph-metrics-01 ~]# docker ps -a
CONTAINER ID   IMAGE                                                        COMMAND                  CREATED        STATUS        PORTS   NAMES
9daf808d7780   registry.example.internal:5000/prom/alertmanager:v0.16.2     "/bin/alertmanager..."   18 hours ago   Up 18 hours           alertmanager
21ade096d5ec   registry.example.internal:5000/prom/prometheus:v2.7.2        "/bin/prometheus -..."   18 hours ago   Up 18 hours           prometheus
08d461dd707f   registry.example.internal:5000/prom/node-exporter:v0.17.0    "/bin/node_exporte..."   18 hours ago   Up 18 hours           node-exporter
```

Version-Release number of selected component (if applicable):
Ceph 4.15

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:
I want to install the metrics and dashboard in a disconnected environment for Ceph 4

Additional info:
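One possible workaround in a fully disconnected setup would be to override the grafana_plugins variable quoted above with an empty list, so that the dashboard role never invokes grafana-cli against grafana.com. This is only a sketch, not a verified fix: the group_vars path, the playbook name, and the inventory group are assumptions about a typical RHCS 4 ceph-ansible layout.

```
# Sketch of a possible air-gapped workaround (not a verified fix):
# override grafana_plugins so the dashboard role skips
# "grafana-cli plugins install ..." entirely.
# The group_vars path below assumes the default RHCS ceph-ansible location.
cat >> /usr/share/ceph-ansible/group_vars/all.yml <<'EOF'
# Do not try to fetch dashboard plugins from grafana.com (air-gapped site)
grafana_plugins: []
EOF

# Then re-run the dashboard/monitoring part of the playbook, e.g.:
# (the playbook name depends on the ceph-ansible version in use)
# ansible-playbook -i hosts site-container.yml --limit grafana-server
```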
Let's please assess this for 4.1; disconnected configurations are important to RHCS customers. Resetting the assignee to the default, since the current assignee is invalid.
Hello, sorry for reopening this closed bug.

We have a CU upgrading from RHCS 4 to RHCS 5 in an air-gapped, secure environment; the CU does use the Red Hat images through their Satellite setup. Here is part of the image inspect from the sos report for the image:

```
"Labels": {
    "architecture": "x86_64",
    "build-date": "2024-06-18T21:15:48",
    "com.redhat.component": "grafana-container",
    "com.redhat.license_terms": "https://www.redhat.com/agreements",
    "description": "Red Hat Ceph Storage 5 Grafana container",
    "distribution-scope": "public",
    "io.buildah.version": "1.29.0",
    "io.k8s.description": "Red Hat Ceph Storage 5 Grafana container",
    "io.k8s.display-name": "Grafana on RHEL 8",
    "io.openshift.expose-services": "",
    "io.openshift.tags": "rhceph ceph dashboard grafana",
    "maintainer": "Nizamudeen A <nia>",
    "name": "grafana",
    "release": "87",
    "summary": "Provides the Grafana container on RHEL 8 for Red Hat Ceph Storage 5.",
    "url": "https://access.redhat.com/containers/#/registry.access.redhat.com/grafana/images/5-87",
    "vcs-ref": "27ee260ccdac41b6ea45ac0ee6ebaa2f50904e5c",
    "vcs-type": "git",
    "vendor": "Red Hat, Inc.",
    "version": "5"
},
```

Still, the CU hits this error when upgrading, and the upgrade fails:

```
Aug 23 16:39:03 jdc-1f-ras26-cmon01 grafana-server[307750]: Error: ✗ Get "https://grafana.com/api/plugins/vonage-status-panel/versions": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
```

Can you please advise? The CU is blocked from upgrading their production environment (the log above is from a test environment).

Best regards
Raimund
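One way to narrow this down might be to check whether the dashboard image's startup path still shells out to grafana-cli, and whether the panels are already baked into the image. This is only a sketch; the image reference and the 5-87 tag below are assumptions based on the labels quoted above, and the commands would need to run wherever the image is mirrored locally.

```
# Sketch of a diagnostic step (image reference/tag are assumptions):
# does the image entrypoint/command still reference grafana-cli?
podman inspect --format '{{.Config.Entrypoint}} {{.Config.Cmd}}' \
    registry.redhat.io/rhceph/rhceph-5-dashboard-rhel8:5-87

# Are the vonage-status-panel / grafana-piechart-panel plugins already
# shipped inside the image, so no download should be needed?
podman run --rm --entrypoint ls \
    registry.redhat.io/rhceph/rhceph-5-dashboard-rhel8:5-87 /var/lib/grafana/plugins
```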
Hi, just as an FYI, I tried to reproduce this in my lab on a 5.3 test cluster:

```
[root@mgmt-0 ceph-ansible]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
# BEGIN ANSIBLE MANAGED BLOCK
# I want to simulate CU sites which can not access download.eng.bos.redhat.com
#127.0.0.10 download.eng.bos.redhat.com
#127.0.0.11 registry-proxy.engineering.redhat.com
# END ANSIBLE MANAGED BLOCK
#
127.0.0.234 grafana.com
```

After restarting grafana I could see that grafana just put out an error message, but did proceed with the startup:

```
Aug 29 05:18:24 mgmt-0.rsachereceph536.lab.upshift.rdu2.redhat.com ceph-f6c193e4-5ee8-11ef-bb6b-fa163eacb7c6-grafana-mgmt-0[4119638]: logger=grafana.update.checker t=2024-08-29T09:18:24.055776503Z level=error msg="Update check failed" error="failed to get stable version from grafana.com: Get \"https://grafana.com/api/grafana/versions/stable\": dial tcp 127.0.0.234:443: connect: connection refused" duration=24.044572ms
Aug 29 05:18:24 mgmt-0.rsachereceph536.lab.upshift.rdu2.redhat.com ceph-f6c193e4-5ee8-11ef-bb6b-fa163eacb7c6-grafana-mgmt-0[4119638]: logger=provisioning.dashboard t=2024-08-29T09:18:24.076347986Z level=info msg="starting to provision dashboards"
```

Also, with a cephadm-deployed grafana server, we do not actually see instructions for the plugins in the unit file:

```
[root@mgmt-0 ceph-ansible]# cat /var/lib/ceph/f6c193e4-5ee8-11ef-bb6b-fa163eacb7c6/grafana.mgmt-0/unit.run
set -e
# grafana.mgmt-0
! /bin/podman rm -f ceph-f6c193e4-5ee8-11ef-bb6b-fa163eacb7c6-grafana.mgmt-0 2> /dev/null
! /bin/podman rm -f ceph-f6c193e4-5ee8-11ef-bb6b-fa163eacb7c6-grafana-mgmt-0 2> /dev/null
! /bin/podman rm -f --storage ceph-f6c193e4-5ee8-11ef-bb6b-fa163eacb7c6-grafana-mgmt-0 2> /dev/null
! /bin/podman rm -f --storage ceph-f6c193e4-5ee8-11ef-bb6b-fa163eacb7c6-grafana.mgmt-0 2> /dev/null
/bin/podman run --rm --ipc=host --stop-signal=SIGTERM --authfile=/etc/ceph/podman-auth.json --net=host --init --name ceph-f6c193e4-5ee8-11ef-bb6b-fa163eacb7c6-grafana-mgmt-0 --user 472 -d --log-driver journald --conmon-pidfile /run/ceph-f6c193e4-5ee8-11ef-bb6b-fa163eacb7c6.service-pid --cidfile /run/ceph-f6c193e4-5ee8-11ef-bb6b-fa163eacb7c6.service-cid --cgroups=split -e CONTAINER_IMAGE=registry.redhat.io/rhceph/rhceph-5-dashboard-rhel8:latest -e NODE_NAME=mgmt-0.rsachereceph536.lab.upshift.rdu2.redhat.com -e CEPH_USE_RANDOM_NONCE=1 -v /var/lib/ceph/f6c193e4-5ee8-11ef-bb6b-fa163eacb7c6/grafana.mgmt-0/etc/grafana/grafana.ini:/etc/grafana/grafana.ini:Z -v /var/lib/ceph/f6c193e4-5ee8-11ef-bb6b-fa163eacb7c6/grafana.mgmt-0/etc/grafana/provisioning/datasources:/etc/grafana/provisioning/datasources:Z -v /var/lib/ceph/f6c193e4-5ee8-11ef-bb6b-fa163eacb7c6/grafana.mgmt-0/etc/grafana/certs:/etc/grafana/certs:Z -v /var/lib/ceph/f6c193e4-5ee8-11ef-bb6b-fa163eacb7c6/grafana.mgmt-0/data/grafana.db:/var/lib/grafana/grafana.db:Z -v /etc/hosts:/etc/hosts:ro registry.redhat.io/rhceph/rhceph-5-dashboard-rhel8:latest
```

The error message I get in my lab is similar to the one the customer receives, but I get `connection refused` while the customer gets a `timeout`. The issue might be that their firewall does not send a fast connection reset but simply drops the packets without responding. So in a disconnected environment it may work fine when the firewall resets the connection and the update check can fail fast, but if the update check has to wait for a timeout, it may bring grafana down?
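To validate that theory, here is a rough sketch of how the two firewall behaviours could be simulated and compared on a lab node. Assumptions: iptables is available on the host, grafana.com resolves normally (the /etc/hosts override above removed), and the cephadm-managed grafana service is restarted between tests.

```
# Sketch: compare "connection refused" vs "silent drop" behaviour for grafana.com

# 1) Fast-fail case: answer outgoing HTTPS to grafana.com with a TCP reset
iptables -I OUTPUT -p tcp --dport 443 -d grafana.com -j REJECT --reject-with tcp-reset

# 2) Slow case: silently drop the packets, forcing the client to wait for
#    its own timeout (what the customer's firewall seems to be doing)
iptables -I OUTPUT -p tcp --dport 443 -d grafana.com -j DROP

# Restart grafana between the two tests and compare how long startup takes,
# e.g. with the cephadm-generated unit (fsid/host are placeholders):
# systemctl restart ceph-<fsid>@grafana.<host>.service
# journalctl -u ceph-<fsid>@grafana.<host>.service -f
```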
Is executing the update check actually needed or wanted in our RHCS grafana image? Do we want plugins to be upgraded from upstream?

BR
Raimund
Hi, not sure whether this was the actual resolution, but the CU changed the image tag from 5-87 to 5-79, and with that it seemed to work in the disconnected environment. Are there differences between those two images in how the vonage plugin or the update check works?

Thank you,
BR
Raimund
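For anyone else landing here: if pinning to a specific dashboard image tag turns out to be the workaround, this is roughly how it could be done with cephadm. A sketch only; the registry hostname below is a placeholder for the site's local mirror (reusing the example registry from the original report), and the tag is the one the CU reported as working.

```
# Sketch: pin the grafana container image used by cephadm to a specific tag
# (registry hostname is a placeholder for the customer's local mirror)
ceph config set mgr mgr/cephadm/container_image_grafana \
    registry.example.internal:5000/rhceph/rhceph-5-dashboard-rhel8:5-79

# Redeploy the grafana daemon so it picks up the pinned image
ceph orch redeploy grafana
```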
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat Ceph Storage 5.3 security and bug fix updates), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2025:1478