Description of problem:
With mTLS configured, the NVMe gateways stay in the CREATED state and both GWs remain in standby mode.

Version-Release number of selected component (if applicable):
Ceph 7.1, build 18.2.1-188

Steps Performed:
1. Deployed a Ceph cluster on 5 bare-metal servers.
2. Configured the NVMe-oF service with mTLS on the cluster.
3. Created subsystem, host, listener, and namespace successfully (a sketch of the kind of commands used appears at the end of this comment).
4. However, the NVMe GWs are still in standby mode, as shown below.

Error Log:
[ceph: root@cephqe-node1 /]# ceph nvme-gw show rbd ''
{
    "epoch": 11,
    "pool": "rbd",
    "group": "",
    "num gws": 2,
    "Anagrp list": "[ 1 2 ]"
}
{
    "gw-id": "client.nvmeof.rbd.cephqe-node2.tfngbt",
    "anagrp-id": 1,
    "performed-full-startup": 1,
    "Availability": "CREATED",
    "ana states": " 1: STANDBY , 2: STANDBY ,"
}
{
    "gw-id": "client.nvmeof.rbd.cephqe-node3.fpbdir",
    "anagrp-id": 2,
    "performed-full-startup": 1,
    "Availability": "CREATED",
    "ana states": " 1: STANDBY , 2: STANDBY ,"
}
[ceph: root@cephqe-node1 /]#

Impact:
Because of this issue, the namespaces cannot be used on the client side (ESXi or RHEL initiator): the paths stay in standby mode instead of becoming active.

ESXi initiator log:
[root@rhocs-bm8:~] esxcli storage core path list | grep -A 10 vmhba65
   Runtime Name: vmhba65:C0:T3:L0
   Device: No associated device
   Device Display Name: No associated device
   Adapter: vmhba65
   Channel: 0
   Target: 3
   LUN: 0
   Plugin: HPP
   State: standby
   Transport: tcp
   Adapter Identifier: tcp.vmnic1:0c:42:a1:6c:a8:ed
   Target Identifier: tcp.unknown
   Adapter Transport Details: Unavailable or path is unclaimed
   Target Transport Details: Unavailable or path is unclaimed
--
   Runtime Name: vmhba65:C0:T2:L0
   Device: No associated device
   Device Display Name: No associated device
   Adapter: vmhba65
   Channel: 0
   Target: 2
   LUN: 0
   Plugin: HPP
   State: standby
   Transport: tcp
   Adapter Identifier: tcp.vmnic1:0c:42:a1:6c:a8:ed
   Target Identifier: tcp.unknown
   Adapter Transport Details: Unavailable or path is unclaimed
   Target Transport Details: Unavailable or path is unclaimed
[root@rhocs-bm8:~]
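Command sketch for step 3: the resources were created through the nvmeof CLI container pointed at the gateway's gRPC port (5500). The "nvmeof-cli" alias, the image tag, and the exact flag names below are assumptions (they vary between ceph-nvmeof releases), so treat this as illustrative rather than the exact reproduction commands:

# hypothetical alias for the CLI container; adjust the image and gateway address as needed
alias nvmeof-cli='podman run -it quay.io/ceph/nvmeof-cli:latest --server-address 10.70.39.49 --server-port 5500'
nvmeof-cli subsystem add --subsystem nqn.2016-06.io.spdk:cnode1
nvmeof-cli namespace add --subsystem nqn.2016-06.io.spdk:cnode1 --rbd-pool rbd --rbd-image image1
nvmeof-cli host add --subsystem nqn.2016-06.io.spdk:cnode1 --host "*"
nvmeof-cli listener add --subsystem nqn.2016-06.io.spdk:cnode1 --host-name cephqe-node2 --traddr 10.70.39.49 --trsvcid 4420
nvmeof-cli listener add --subsystem nqn.2016-06.io.spdk:cnode1 --host-name cephqe-node3 --traddr 10.70.39.50 --trsvcid 4420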
Tried with RHEL Initiator: Same issue. [root@ceph-sunilkumar-01-2d2ssd-node8 ~]# nvme connect-all --trsvcid 8009 --transport tcp --traddr 10.70.39.49 --ctrl-loss-tmo 3600 [root@ceph-sunilkumar-01-2d2ssd-node8 ~]# nvme list Node Generic SN Model Namespace Usage Format FW Rev --------------------- --------------------- -------------------- ---------------------------------------- ---------- -------------------------- ---------------- -------- [root@ceph-sunilkumar-01-2d2ssd-node8 ~]# Gate Way Node Log: [root@cephqe-node2 nvmeof-client.nvmeof.rbd.cephqe-node2.tfngbt]# tail -f nvmeof-log [23-May-2024 06:54:36] INFO grpc.py:2648: Received request to get gateway's info [23-May-2024 06:54:36] INFO prometheus.py:131: Stats for all bdevs will be provided [23-May-2024 06:54:36] INFO grpc.py:914: Received request to add a namespace using NSID 1 and UUID 5834f3cc-8dd6-4d37-97b3-8a3769fe9b89 to nqn.2016-06.io.spdk:cnode1, ana group 1 context: None [23-May-2024 06:54:36] INFO grpc.py:274: Received request to create bdev bdev_5834f3cc-8dd6-4d37-97b3-8a3769fe9b89 from rbd/image1 (size 0 bytes) with block size 512, will not create image if doesn't exist [23-May-2024 06:54:36] INFO grpc.py:250: Allocated cluster name='cluster_context_1_0' nonce='10.70.39.49:0/312171512' anagrp=1 [23-May-2024 06:54:36] INFO grpc.py:209: get_cluster cluster_name='cluster_context_1_0' number bdevs: 1 [23-May-2024 06:54:36] INFO grpc.py:760: Received request to add bdev_5834f3cc-8dd6-4d37-97b3-8a3769fe9b89 to nqn.2016-06.io.spdk:cnode1 with ANA group id 1 using NSID 1 and UUID 5834f3cc-8dd6-4d37-97b3-8a3769fe9b89 [23-May-2024 06:54:36] INFO grpc.py:1755: Received request to allow any host access for nqn.2016-06.io.spdk:cnode1, context: None [23-May-2024 06:54:36] INFO grpc.py:2099: Received request to create cephqe-node2 TCP ipv4 listener for nqn.2016-06.io.spdk:cnode1 at 10.70.39.49:4420, context: None [23-May-2024 06:54:36] INFO grpc.py:2099: Received request to create cephqe-node3 TCP ipv4 listener for nqn.2016-06.io.spdk:cnode1 at 10.70.39.50:4420, context: None
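Two quick initiator-side checks that can help narrow this down (standard nvme-cli commands; the address and port below just reuse the gateway values from this report):

nvme discover --transport tcp --traddr 10.70.39.49 --trsvcid 8009
nvme list-subsys

`nvme discover` should list the subsystem entries exposed through the gateways' discovery service, and `nvme list-subsys` shows the controllers created by connect-all together with their path states. Depending on the nvme-cli version, `nvme list-subsys` also reports the ANA state per path, which in this case should mirror the STANDBY-only states shown by `ceph nvme-gw show`.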
This looks like a configuration issue rather than something mTLS-specific: mTLS only changes the transport security between the CLI and the gateway, and the CLI commands themselves are the same. Please verify you did not miss a step. What were the exact commands you used to configure the gateways? Please provide the full details in the BZ, and please retry without mTLS to verify. Thanks.
Hi Thomas, please attach this BZ to the 7.1 errata.
Verified the mTLS configuration on the latest build: 190
ceph version 18.2.1-190.el9cp (5eee8f17de7cfe7a752abc74828d97473040534e) reef (stable)

The following behaviour was observed.

Scenario 1: with mTLS configuration

Steps:
1. Deployed the Ceph cluster with the latest build:
   ceph version 18.2.1-190.el9cp (5eee8f17de7cfe7a752abc74828d97473040534e) reef (stable)
2. Configured mTLS on the Ceph cluster:
   [root@cephqe-node1 ~]# cat gw-conf-with-mtls.yaml | grep auth
   enable_auth: true
3. Applied the changes:
   [root@cephqe-node1 ~]# ceph orch apply -i gw-conf-with-mtls.yaml
   Scheduled nvmeof.rbd update...
   [root@cephqe-node1 ~]# ceph orch reconfig nvmeof.rbd
   Scheduled to reconfig nvmeof.rbd.cephqe-node2.jebusx on host 'cephqe-node2'
   Scheduled to reconfig nvmeof.rbd.cephqe-node3.gykptt on host 'cephqe-node3'

[root@cephqe-node1 ~]# ceph orch ls
NAME                       PORTS             RUNNING  REFRESHED  AGE  PLACEMENT
alertmanager               ?:9093,9094           1/1  7m ago     4d   count:1
ceph-exporter                                    4/4  7m ago     4d   *
crash                                            4/4  7m ago     4d   *
grafana                    ?:3000                1/1  7m ago     4d   count:1
mgr                                              2/2  7m ago     4d   count:2
mon                                              3/3  7m ago     4d   label:mon
node-exporter              ?:9100                4/4  7m ago     3h   *
node-proxy                                       0/0  -          4d   *
nvmeof.rbd                 ?:4420,5500,8009      2/2  5m ago     23s  cephqe-node2;cephqe-node3
osd.all-available-devices                         15  7m ago     4d   *
prometheus                 ?:9095                1/1  7m ago     4d   count:1

[root@cephqe-node1 ~]# ceph nvme-gw show rbd ''
{
    "epoch": 51,
    "pool": "rbd",
    "group": "",
    "num gws": 2,
    "Anagrp list": "[ 1 4 ]"
}
{
    "gw-id": "client.nvmeof.rbd.cephqe-node2.jebusx",
    "anagrp-id": 1,
    "performed-full-startup": 0,
    "Availability": "UNAVAILABLE",
    "ana states": " 1: STANDBY , 4: STANDBY ,"
}
{
    "gw-id": "client.nvmeof.rbd.cephqe-node3.gykptt",
    "anagrp-id": 4,
    "performed-full-startup": 0,
    "Availability": "UNAVAILABLE",
    "ana states": " 1: STANDBY , 4: STANDBY ,"
}

Scenario 2: without mTLS configuration

Steps:
[root@cephqe-node1 ~]# cat gw-conf-with-mtls.yaml | grep auth
enable_auth: false

[root@cephqe-node1 ~]# ceph nvme-gw show rbd ''
{
    "epoch": 57,
    "pool": "rbd",
    "group": "",
    "num gws": 2,
    "Anagrp list": "[ 1 4 ]"
}
{
    "gw-id": "client.nvmeof.rbd.cephqe-node2.jebusx",
    "anagrp-id": 1,
    "performed-full-startup": 1,
    "Availability": "AVAILABLE",
    "ana states": " 1: ACTIVE , 4: STANDBY ,"
}
{
    "gw-id": "client.nvmeof.rbd.cephqe-node3.gykptt",
    "anagrp-id": 4,
    "performed-full-startup": 1,
    "Availability": "AVAILABLE",
    "ana states": " 1: STANDBY , 4: ACTIVE ,"
}
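For context, only the grepped enable_auth line of gw-conf-with-mtls.yaml is shown above; an illustrative spec would look roughly like the sketch below. The field names (enable_auth plus the certificate/key fields) follow the cephadm nvmeof service spec, but the exact names and the inline-certificate layout are assumptions here and should be checked against the spec file that was actually applied:

service_type: nvmeof
service_id: rbd
placement:
  hosts:
    - cephqe-node2
    - cephqe-node3
spec:
  pool: rbd
  enable_auth: true
  # certificates and keys pasted inline (placeholders, not real material)
  root_ca_cert: |
    -----BEGIN CERTIFICATE-----
    ...
    -----END CERTIFICATE-----
  server_cert: |
    -----BEGIN CERTIFICATE-----
    ...
    -----END CERTIFICATE-----
  server_key: |
    -----BEGIN PRIVATE KEY-----
    ...
    -----END PRIVATE KEY-----
  client_cert: |
    -----BEGIN CERTIFICATE-----
    ...
    -----END CERTIFICATE-----
  client_key: |
    -----BEGIN PRIVATE KEY-----
    ...
    -----END PRIVATE KEY-----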
[root@cephqe-node2 ~]# nvm gw info
Enable server auth since both --client-key and --client-cert are provided
CLI's version: 1.2.10
Gateway's version: 1.2.10
Gateway's name: client.nvmeof.rbd.cephqe-node2.jebusx
Gateway's host name: cephqe-node2
Gateway's load balancing group: 1
Gateway's address: 10.70.39.49
Gateway's port: 5500
SPDK version: 24.01.1
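The "Enable server auth since both --client-key and --client-cert are provided" line shows the CLI was invoked with client certificate/key flags, so "nvm" above is presumably a local alias for the nvmeof CLI container. A hedged sketch of such an alias follows; the image name, mount paths, and the --server-cert flag are assumptions, while --client-key and --client-cert come from the log line above:

alias nvm='podman run -it -v /etc/mtls:/etc/mtls quay.io/ceph/nvmeof-cli:latest \
    --server-address 10.70.39.49 --server-port 5500 \
    --client-key /etc/mtls/client.key --client-cert /etc/mtls/client.crt --server-cert /etc/mtls/server.crt'
nvm gw info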
mTLS is deferred to a 7.1 z-stream release.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 7.1 security and bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2024:5080