Bug 2303338

Summary: Ceph Cluster 8.0 connection fails in the vSphere plugin during the Add Storage System workflow when mTLS is not configured
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Krishna Ramaswamy <kramaswa>
Component: vSphere plugin
Assignee: Ernesto Puerta <epuertat>
Status: VERIFIED
QA Contact: Krishna Ramaswamy <kramaswa>
Severity: urgent
Docs Contact: ceph-docs <ceph-docs>
Priority: urgent
Version: 8.0
CC: ceph-eng-bugs, epuertat, hberrisf, nia
Target Milestone: ---
Keywords: External
Target Release: 8.0
Flags: hberrisf: needinfo+
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 2306778
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 2303116, 2306778    
Bug Blocks:    

Description Krishna Ramaswamy 2024-08-07 05:14:57 UTC
Description of problem:

Ceph Cluster 8.0 connection fails in the vSphere plugin during the Add Storage System workflow when mTLS is not configured.

Version-Release number of selected component (if applicable):


 cp.stg.icr.io/cp/ibm-ceph/ceph-8-rhel9:8-13
 cp.stg.icr.io/cp/ibm-ceph/nvmeof-cli-rhel9:1.2.17-8
 cp.stg.icr.io/cp/ibm-ceph/nvmeof-rhel9:1.2.17-6


Pre-Req Configured:

[root@cephqe-node1 ~]# ceph auth ls | grep nvmeof
client.nvmeof.rbd.cephqe-node2.ycbwfr
client.nvmeof.rbd.cephqe-node3.unadxm
client.nvmeof.rbd.cephqe-node5.yagutr
client.nvmeof.rbd.cephqe-node7.jiunhe
[root@cephqe-node1 ~]# ceph osd get-require-min-compat-client
mimic
[root@cephqe-node1 ~]# ceph dashboard nvmeof-gateway-list
{"gateways": {"cephqe-node2": {"service_url": "10.70.39.49:5500"}, "cephqe-node3": {"service_url": "10.70.39.50:5500"}, "cephqe-node5": {"service_url": "10.70.39.52:5500"}, "cephqe-node7": {"service_url": "10.70.39.54:5500"}}}
[root@cephqe-node1 ~]# 
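
The gateway map returned by "ceph dashboard nvmeof-gateway-list" is plain JSON, so the configured gateways can be sanity-checked with a short script. A minimal sketch, assuming it runs on an admin node with a working ceph CLI; the hostnames and service URLs are the ones from the listing above, everything else is illustrative:

import json
import subprocess

def list_nvmeof_gateways() -> dict:
    # Fetch the same gateway map shown above and return it as a dict.
    out = subprocess.run(
        ["ceph", "dashboard", "nvmeof-gateway-list"],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)["gateways"]

if __name__ == "__main__":
    for name, info in list_nvmeof_gateways().items():
        # e.g. cephqe-node2 -> 10.70.39.49:5500
        print(f"{name} -> {info['service_url']}")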


Plugin Error Log:

2024-08-07 04:54:52,469 - endpoints.py[line:177] - vsphere-plugin.endpoints - INFO : GET /api/cephclusters/9
2024-08-07 04:54:52,471 - ceph_manager.py[line:508] - vsphere-plugin.ceph_manager - INFO : Sending command: https://cephqe-node1.lab.eng.blr.redhat.com:8443/api/summary
2024-08-07 04:54:52,490 - ceph_manager.py[line:515] - vsphere-plugin.ceph_manager - INFO : Storage system bf73c41c-541a-11ef-a88e-4c5262033c3d response for command https://cephqe-node1.lab.eng.blr.redhat.com:8443/api/summary
2024-08-07 04:54:52,490 - ceph_manager.py[line:508] - vsphere-plugin.ceph_manager - INFO : Sending command: https://cephqe-node1.lab.eng.blr.redhat.com:8443/api/health/get_cluster_capacity
2024-08-07 04:54:52,501 - ceph_manager.py[line:515] - vsphere-plugin.ceph_manager - INFO : Storage system bf73c41c-541a-11ef-a88e-4c5262033c3d response for command https://cephqe-node1.lab.eng.blr.redhat.com:8443/api/health/get_cluster_capacity
2024-08-07 04:54:52,502 - ceph_manager.py[line:508] - vsphere-plugin.ceph_manager - INFO : Sending command: https://cephqe-node1.lab.eng.blr.redhat.com:8443/api/pool/rbd?stats=true
2024-08-07 04:54:52,518 - ceph_manager.py[line:515] - vsphere-plugin.ceph_manager - INFO : Storage system bf73c41c-541a-11ef-a88e-4c5262033c3d response for command https://cephqe-node1.lab.eng.blr.redhat.com:8443/api/pool/rbd?stats=true
2024-08-07 04:54:52,519 - ceph_manager.py[line:508] - vsphere-plugin.ceph_manager - INFO : Sending command: https://cephqe-node1.lab.eng.blr.redhat.com:8443/api/nvmeof/gateway
2024-08-07 04:54:52,539 - ceph_manager.py[line:515] - vsphere-plugin.ceph_manager - INFO : Storage system bf73c41c-541a-11ef-a88e-4c5262033c3d response for command https://cephqe-node1.lab.eng.blr.redhat.com:8443/api/nvmeof/gateway
2024-08-07 04:54:52,540 - ceph_manager.py[line:531] - vsphere-plugin.ceph_manager - ERROR : Caught HTTPStatusError with status_code 400 and detail {"detail": "Failed to get nvmeof_server_cert for cephqe-node2: No secret found for entity nvmeof_server_cert with service name cephqe-node2", "component": null}
2024-08-07 04:54:52,540 - ceph_exception_manager.py[line:57] - vsphere-plugin.ceph_exception_manager - ERROR : Status code: 400, detail: {"detail": "Failed to get nvmeof_server_cert for cephqe-node2: No secret found for entity nvmeof_server_cert with service name cephqe-node2", "component": null}
Traceback (most recent call last):
  File "/app/ceph_manager.py", line 520, in _make_get_request
    response.raise_for_status()
  File "/usr/local/lib/python3.11/site-packages/httpx/_models.py", line 758, in raise_for_status
    raise HTTPStatusError(message, request=request, response=self)
httpx.HTTPStatusError: Client error '400 Bad Request' for url 'https://cephqe-node1.lab.eng.blr.redhat.com:8443/api/nvmeof/gateway'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/400

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/app/ceph_exception_manager.py", line 53, in wrapper
    return await fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/endpoints.py", line 58, in make_basic_request
    return await fs.make_basic_request(command)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/ceph_manager.py", line 367, in make_basic_request
    response = await self._make_get_request(request, headers)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/ceph_manager.py", line 534, in _make_get_request
    raise ceph_exception.ConnectionErrorException(status_code, detail) from err
ceph_exception_manager.ConnectionErrorException: (400, '{"detail": "Failed to get nvmeof_server_cert for cephqe-node2: No secret found for entity nvmeof_server_cert with service name cephqe-node2", "component": null}')
^C
root@ibm-storage-ceph-plugin-for-vsphere-1 [ /opt/persistent ]#
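
The failing request can be reproduced outside the plugin by calling the Ceph Dashboard REST API directly. A minimal sketch, assuming dashboard credentials for the host shown in the log above; the /api/auth token endpoint and the versioned Accept header are the standard Dashboard REST API, and the password is a placeholder:

import asyncio
import httpx

BASE = "https://cephqe-node1.lab.eng.blr.redhat.com:8443"   # host from the log above
HEADERS = {"Accept": "application/vnd.ceph.api.v1.0+json"}

async def main() -> None:
    # verify=False because the dashboard typically runs with a self-signed certificate
    async with httpx.AsyncClient(verify=False) as client:
        # Obtain a Dashboard API token (credentials are placeholders).
        auth = await client.post(
            f"{BASE}/api/auth", headers=HEADERS,
            json={"username": "admin", "password": "<dashboard-password>"},
        )
        auth.raise_for_status()
        token = auth.json()["token"]

        # The same request the plugin sends last before failing: with no
        # nvmeof_server_cert secret stored for the gateways, the dashboard
        # answers 400 and raise_for_status() throws httpx.HTTPStatusError.
        resp = await client.get(
            f"{BASE}/api/nvmeof/gateway",
            headers={**HEADERS, "Authorization": f"Bearer {token}"},
        )
        resp.raise_for_status()

asyncio.run(main())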

Comment 1 Hannah Berrisford 2024-09-05 14:22:32 UTC
Need some more info - do we need to do any work alongside the Dashboard work?
Will the Dashboard team be working on this fix?