Description of problem:
[6.1z5][Monitoring Service][Prometheus] For the latest 6.1z5 builds, Prometheus service deployment is failing.

For the latest 6.1z5 builds (17.2.6-205), Prometheus service deployment fails as part of regression runs.

Spec yaml file content:
---
service_type: prometheus
service_name: prometheus
placement:
  count: 1
  hosts: ['ceph-regression-fugkar-t4lm8t-node1-installer']
---
service_type: grafana
service_name: grafana
placement:
  hosts: ['ceph-regression-fugkar-t4lm8t-node1-installer']
---
service_type: alertmanager
service_name: alertmanager
placement:
  count: 1
---
service_type: node-exporter
service_name: node-exporter
placement:
  host_pattern: '*'
---
service_type: crash
service_name: crash
placement:
  host_pattern: '*'

# cephadm -v shell --mount /tmp:/tmp -- ceph orch apply -i /tmp/tmp_ewe7dvf.yaml

2024-03-15 12:26:04,110 (cephci.test_cephadm) [INFO] - cephci.Regression.rgw.40.cephci.ceph.ceph.py:1597 - Command completed successfully
2024-03-15 12:26:04,111 (cephci.test_cephadm) [INFO] - cephci.Regression.rgw.40.cephci.ceph.ceph_admin.helper.py:1020 - 0/1 prometheus up... retrying
2024-03-15 12:26:04,111 (cephci.test_cephadm) [INFO] - cephci.Regression.rgw.40.cephci.ceph.ceph.py:1563 - Running command cephadm shell -- ceph orch ls --service_type prometheus --format json --refresh on 10.0.195.125 timeout 600
2024-03-15 12:26:06,172 (cephci.test_cephadm) [INFO] - cephci.Regression.rgw.40.cephci.ceph.ceph.py:1597 - Command completed successfully
2024-03-15 12:26:06,173 (cephci.test_cephadm) [ERROR] - cephci.Regression.rgw.40.cephci.ceph.ceph_admin.helper.py:1037 - prometheus failed with ['2024-03-15T06:27:16.645743Z service:prometheus [INFO] "service was created"']
2024-03-15 12:26:06,174 (cephci.test_cephadm) [ERROR] - cephci.Regression.rgw.40.cephci.tests.ceph_installer.test_cephadm.py:176 - prometheus service deployment failed!!!
Traceback (most recent call last):
  File "/home/jenkins/ceph-builds/17.2.6-205/Regression/rgw/40/cephci/tests/ceph_installer/test_cephadm.py", line 160, in run
    func(cfg)
  File "/home/jenkins/ceph-builds/17.2.6-205/Regression/rgw/40/cephci/ceph/ceph_admin/orch.py", line 198, in apply_spec
    validate_spec_services(
  File "/home/jenkins/ceph-builds/17.2.6-205/Regression/rgw/40/cephci/ceph/ceph_admin/helper.py", line 1069, in validate_spec_services
    raise Exception(f"{svc_name or svc_type} service deployment failed!!!")
Exception: prometheus service deployment failed!!!

Note: With the previous 6.1z5 build (17.2.6-202), Prometheus service deployment works as expected.

Version-Release number of selected component (if applicable):
17.2.6-205

How reproducible:
Always

Steps to Reproduce:
1. Bootstrap the cluster.
2. Deploy monitoring services (prometheus) using the spec file.

Actual results:
Prometheus service deployment fails.

Expected results:
Prometheus service deployment passes.
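
For anyone triaging this by hand, the "0/1 prometheus up" check in the log can be approximated with a minimal sketch like the one below. It is not the cephci helper: the function names are hypothetical, and it assumes that each entry in the `ceph orch ls --format json` output carries `service_name` and a `status` object with `running` and `size` counts. The sample payload is invented to mirror the failing state reported above.

```python
import json
from typing import Tuple


def service_counts(orch_ls_output: str, service_name: str) -> Tuple[int, int]:
    """Return (running, size) for one service from `ceph orch ls --format json` text.

    Assumption: each list entry has "service_name" and a "status" dict with
    "running" and "size" keys; missing keys are treated as 0.
    """
    for svc in json.loads(orch_ls_output):
        if svc.get("service_name") == service_name:
            status = svc.get("status", {})
            return status.get("running", 0), status.get("size", 0)
    raise LookupError(f"{service_name} not found in orchestrator output")


def service_is_up(orch_ls_output: str, service_name: str) -> bool:
    """True only when at least one daemon is expected and all expected daemons run."""
    running, size = service_counts(orch_ls_output, service_name)
    return size > 0 and running == size


# Hypothetical sample mirroring the "0/1 prometheus up" state from the log.
sample = json.dumps([
    {
        "service_type": "prometheus",
        "service_name": "prometheus",
        "status": {"running": 0, "size": 1},
        "events": [
            '2024-03-15T06:27:16.645743Z service:prometheus [INFO] "service was created"'
        ],
    }
])

running, size = service_counts(sample, "prometheus")
print(f"{running}/{size} prometheus up")
print("up" if service_is_up(sample, "prometheus") else "not up")
```

Feeding it the real output of `cephadm shell -- ceph orch ls --service_type prometheus --format json --refresh` in place of `sample` should show whether the daemon count ever reaches the requested placement count on the affected build.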
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat Ceph Storage 6.1 Bug Fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2024:1580
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days