Bug 2303338

Summary: Ceph Cluster 8.0 connection fails in the vSphere plugin during the Add Storage System workflow when mTLS is not configured
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Krishna Ramaswamy <kramaswa>
Component: vSphere plugin
Assignee: Ernesto Puerta <epuertat>
Status: VERIFIED
QA Contact: Krishna Ramaswamy <kramaswa>
Severity: urgent
Docs Contact: ceph-docs <ceph-docs>
Priority: urgent
Version: 8.0
CC: ceph-eng-bugs, epuertat, hberrisf, nia
Target Milestone: ---
Keywords: External
Target Release: 8.0
Flags: hberrisf: needinfo+
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 2306778
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 2303116, 2306778    
Bug Blocks:    

Description Krishna Ramaswamy 2024-08-07 05:14:57 UTC
Description of problem:

Ceph Cluster 8.0 connection fails in the vSphere plugin during the Add Storage System workflow when mTLS is not configured.

Version-Release number of selected component (if applicable):


 cp.stg.icr.io/cp/ibm-ceph/ceph-8-rhel9:8-13
 cp.stg.icr.io/cp/ibm-ceph/nvmeof-cli-rhel9:1.2.17-8
 cp.stg.icr.io/cp/ibm-ceph/nvmeof-rhel9:1.2.17-6


Pre-Req Configured:

[root@cephqe-node1 ~]# ceph auth ls | grep nvmeof
client.nvmeof.rbd.cephqe-node2.ycbwfr
client.nvmeof.rbd.cephqe-node3.unadxm
client.nvmeof.rbd.cephqe-node5.yagutr
client.nvmeof.rbd.cephqe-node7.jiunhe
[root@cephqe-node1 ~]# ceph osd get-require-min-compat-client
mimic
[root@cephqe-node1 ~]# ceph dashboard nvmeof-gateway-list
{"gateways": {"cephqe-node2": {"service_url": "10.70.39.49:5500"}, "cephqe-node3": {"service_url": "10.70.39.50:5500"}, "cephqe-node5": {"service_url": "10.70.39.52:5500"}, "cephqe-node7": {"service_url": "10.70.39.54:5500"}}}
[root@cephqe-node1 ~]# 
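
The gateway map returned by "ceph dashboard nvmeof-gateway-list" is plain JSON, so the configured gateways can be sanity-checked with a short script. A minimal sketch, assuming it runs on an admin node with a working ceph CLI; the hostnames and service URLs are the ones from the listing above, everything else is illustrative:

import json
import subprocess

def list_nvmeof_gateways() -> dict:
    # Fetch the same gateway map shown above and return it as a dict.
    out = subprocess.run(
        ["ceph", "dashboard", "nvmeof-gateway-list"],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)["gateways"]

if __name__ == "__main__":
    for name, info in list_nvmeof_gateways().items():
        # e.g. cephqe-node2 -> 10.70.39.49:5500
        print(f"{name} -> {info['service_url']}")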


Plugin Error Log:

2024-08-07 04:54:52,469 - endpoints.py[line:177] - vsphere-plugin.endpoints - INFO : GET /api/cephclusters/9
2024-08-07 04:54:52,471 - ceph_manager.py[line:508] - vsphere-plugin.ceph_manager - INFO : Sending command: https://cephqe-node1.lab.eng.blr.redhat.com:8443/api/summary
2024-08-07 04:54:52,490 - ceph_manager.py[line:515] - vsphere-plugin.ceph_manager - INFO : Storage system bf73c41c-541a-11ef-a88e-4c5262033c3d response for command https://cephqe-node1.lab.eng.blr.redhat.com:8443/api/summary
2024-08-07 04:54:52,490 - ceph_manager.py[line:508] - vsphere-plugin.ceph_manager - INFO : Sending command: https://cephqe-node1.lab.eng.blr.redhat.com:8443/api/health/get_cluster_capacity
2024-08-07 04:54:52,501 - ceph_manager.py[line:515] - vsphere-plugin.ceph_manager - INFO : Storage system bf73c41c-541a-11ef-a88e-4c5262033c3d response for command https://cephqe-node1.lab.eng.blr.redhat.com:8443/api/health/get_cluster_capacity
2024-08-07 04:54:52,502 - ceph_manager.py[line:508] - vsphere-plugin.ceph_manager - INFO : Sending command: https://cephqe-node1.lab.eng.blr.redhat.com:8443/api/pool/rbd?stats=true
2024-08-07 04:54:52,518 - ceph_manager.py[line:515] - vsphere-plugin.ceph_manager - INFO : Storage system bf73c41c-541a-11ef-a88e-4c5262033c3d response for command https://cephqe-node1.lab.eng.blr.redhat.com:8443/api/pool/rbd?stats=true
2024-08-07 04:54:52,519 - ceph_manager.py[line:508] - vsphere-plugin.ceph_manager - INFO : Sending command: https://cephqe-node1.lab.eng.blr.redhat.com:8443/api/nvmeof/gateway
2024-08-07 04:54:52,539 - ceph_manager.py[line:515] - vsphere-plugin.ceph_manager - INFO : Storage system bf73c41c-541a-11ef-a88e-4c5262033c3d response for command https://cephqe-node1.lab.eng.blr.redhat.com:8443/api/nvmeof/gateway
2024-08-07 04:54:52,540 - ceph_manager.py[line:531] - vsphere-plugin.ceph_manager - ERROR : Caught HTTPStatusError with status_code 400 and detail {"detail": "Failed to get nvmeof_server_cert for cephqe-node2: No secret found for entity nvmeof_server_cert with service name cephqe-node2", "component": null}
2024-08-07 04:54:52,540 - ceph_exception_manager.py[line:57] - vsphere-plugin.ceph_exception_manager - ERROR : Status code: 400, detail: {"detail": "Failed to get nvmeof_server_cert for cephqe-node2: No secret found for entity nvmeof_server_cert with service name cephqe-node2", "component": null}
Traceback (most recent call last):
  File "/app/ceph_manager.py", line 520, in _make_get_request
    response.raise_for_status()
  File "/usr/local/lib/python3.11/site-packages/httpx/_models.py", line 758, in raise_for_status
    raise HTTPStatusError(message, request=request, response=self)
httpx.HTTPStatusError: Client error '400 Bad Request' for url 'https://cephqe-node1.lab.eng.blr.redhat.com:8443/api/nvmeof/gateway'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/400

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/app/ceph_exception_manager.py", line 53, in wrapper
    return await fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/endpoints.py", line 58, in make_basic_request
    return await fs.make_basic_request(command)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/ceph_manager.py", line 367, in make_basic_request
    response = await self._make_get_request(request, headers)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/ceph_manager.py", line 534, in _make_get_request
    raise ceph_exception.ConnectionErrorException(status_code, detail) from err
ceph_exception_manager.ConnectionErrorException: (400, '{"detail": "Failed to get nvmeof_server_cert for cephqe-node2: No secret found for entity nvmeof_server_cert with service name cephqe-node2", "component": null}')
^C
root@ibm-storage-ceph-plugin-for-vsphere-1 [ /opt/persistent ]#
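
The failing request can be reproduced outside the plugin by calling the Ceph Dashboard REST API directly. A minimal sketch, assuming dashboard credentials for the host shown in the log above; the /api/auth token endpoint and the versioned Accept header are the standard Dashboard REST API, and the password is a placeholder:

import asyncio
import httpx

BASE = "https://cephqe-node1.lab.eng.blr.redhat.com:8443"   # host from the log above
HEADERS = {"Accept": "application/vnd.ceph.api.v1.0+json"}

async def main() -> None:
    # verify=False because the dashboard typically runs with a self-signed certificate
    async with httpx.AsyncClient(verify=False) as client:
        # Obtain a Dashboard API token (credentials are placeholders).
        auth = await client.post(
            f"{BASE}/api/auth", headers=HEADERS,
            json={"username": "admin", "password": "<dashboard-password>"},
        )
        auth.raise_for_status()
        token = auth.json()["token"]

        # The same request the plugin sends last before failing: with no
        # nvmeof_server_cert secret stored for the gateways, the dashboard
        # answers 400 and raise_for_status() throws httpx.HTTPStatusError.
        resp = await client.get(
            f"{BASE}/api/nvmeof/gateway",
            headers={**HEADERS, "Authorization": f"Bearer {token}"},
        )
        resp.raise_for_status()

asyncio.run(main())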

Comment 1 Hannah Berrisford 2024-09-05 14:22:32 UTC
Need some more info - do we need to do any work alongside the Dashboard work?
Will the Dashboard team be working on this fix?