Bug 2282833 - With mTLS configuration, the NVMe Gateway is in CREATED state and the GWs are in standby mode
Summary: With mTLS configuration, the NVMe Gateway is in CREATED state and the GWs are in standby mode
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: NVMeOF
Version: 7.1
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 7.1z1
Assignee: Aviv Caro
QA Contact: Krishna Ramaswamy
Docs Contact: Akash Raj
URL:
Whiteboard:
Depends On:
Blocks: 2267614 2298578 2298579
 
Reported: 2024-05-23 06:38 UTC by Krishna Ramaswamy
Modified: 2024-08-07 11:21 UTC
CC List: 11 users

Fixed In Version: ceph-18.2.1-201.el9cp
Doc Type: Known Issue
Doc Text:
When using NVMe-oF gateway CLI commands, mTLS options are shown as available. mTLS is not currently supported and cannot be used.
Clone Of:
Environment:
Last Closed: 2024-08-07 11:21:26 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHCEPH-9081 0 None None None 2024-05-23 06:41:06 UTC
Red Hat Product Errata RHBA-2024:5080 0 None None None 2024-08-07 11:21:37 UTC

Description Krishna Ramaswamy 2024-05-23 06:38:29 UTC
Description of problem:
With mTLS configuration, the NVMe Gateway is in CREATED state and the GWs are in standby mode.

Version-Release number of selected component (if applicable): ceph 7.1
Build Version: 18.2.1-188

Steps Performed: 
1. Deployed a Ceph cluster on 5 bare-metal servers.
2. Configured the NVMe-oF service with mTLS in the Ceph cluster (a sketch of such a spec is shown after this list).
3. Creation of the subsystem, host, listener, and namespace was successful.
4. But the NVMe GWs are still in standby mode, as shown below.
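
For reference, here is a minimal sketch of what a gw-conf-with-mtls.yaml along these lines might look like. The enable_auth toggle and the pool/placement values are taken from the output later in this bug; the certificate/key field names are assumptions based on the cephadm NVMe-oF service spec and the --client-key/--client-cert options that show up in the CLI output, so check the spec documentation for the exact names:

service_type: nvmeof
service_id: rbd
placement:
  hosts:
    - cephqe-node2
    - cephqe-node3
spec:
  pool: rbd
  enable_auth: true            # mTLS on; set to false to disable (see scenario 2 in comment 9)
  # The field names below are assumed and are not confirmed anywhere in this BZ:
  server_cert: <server certificate>
  server_key: <server private key>
  client_cert: <client certificate>
  client_key: <client private key>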

Error Log:

[ceph: root@cephqe-node1 /]# ceph nvme-gw show rbd ''
{
    "epoch": 11,
    "pool": "rbd",
    "group": "",
    "num gws": 2,
    "Anagrp list": "[ 1 2 ]"
}
{
    "gw-id": "client.nvmeof.rbd.cephqe-node2.tfngbt",
    "anagrp-id": 1,
    "performed-full-startup": 1,
    "Availability": "CREATED",
    "ana states": " 1: STANDBY , 2: STANDBY ,"
}
{
    "gw-id": "client.nvmeof.rbd.cephqe-node3.fpbdir",
    "anagrp-id": 2,
    "performed-full-startup": 1,
    "Availability": "CREATED",
    "ana states": " 1: STANDBY , 2: STANDBY ,"
}
[ceph: root@cephqe-node1 /]# 



Impact: 

Due to this issue, the namespaces cannot be used on the client side (ESXi or RHEL initiator), because the paths are in standby mode instead of active mode.

Here is the Initiator ESXi Log:

[root@rhocs-bm8:~] esxcli storage core path list | grep -A 10 vmhba65
   Runtime Name: vmhba65:C0:T3:L0
   Device: No associated device
   Device Display Name: No associated device
   Adapter: vmhba65
   Channel: 0
   Target: 3
   LUN: 0
   Plugin: HPP
   State: standby
   Transport: tcp
   Adapter Identifier: tcp.vmnic1:0c:42:a1:6c:a8:ed
   Target Identifier: tcp.unknown
   Adapter Transport Details: Unavailable or path is unclaimed
   Target Transport Details: Unavailable or path is unclaimed
--
   Runtime Name: vmhba65:C0:T2:L0
   Device: No associated device
   Device Display Name: No associated device
   Adapter: vmhba65
   Channel: 0
   Target: 2
   LUN: 0
   Plugin: HPP
   State: standby
   Transport: tcp
   Adapter Identifier: tcp.vmnic1:0c:42:a1:6c:a8:ed
   Target Identifier: tcp.unknown
   Adapter Transport Details: Unavailable or path is unclaimed
   Target Transport Details: Unavailable or path is unclaimed
[root@rhocs-bm8:~]

Comment 2 Krishna Ramaswamy 2024-05-23 07:18:09 UTC
Tried with RHEL Initiator: Same issue.

[root@ceph-sunilkumar-01-2d2ssd-node8 ~]# nvme connect-all  --trsvcid 8009 --transport tcp --traddr 10.70.39.49 --ctrl-loss-tmo 3600
[root@ceph-sunilkumar-01-2d2ssd-node8 ~]# nvme list
Node                  Generic               SN                   Model                                    Namespace  Usage                      Format           FW Rev  
--------------------- --------------------- -------------------- ---------------------------------------- ---------- -------------------------- ---------------- --------
[root@ceph-sunilkumar-01-2d2ssd-node8 ~]# 
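
Since nvme list comes back empty, a possible follow-up on the initiator (not part of the original run, suggested here only as a sketch) is to list the connected subsystems; recent nvme-cli versions also print the ANA state next to each path, so with both gateways stuck in STANDBY no path would be expected to show up as "live optimized":

# Suggested follow-up check on the RHEL initiator (assumes a recent nvme-cli):
nvme list-subsys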


Gateway Node Log:

[root@cephqe-node2 nvmeof-client.nvmeof.rbd.cephqe-node2.tfngbt]# tail -f nvmeof-log 
[23-May-2024 06:54:36] INFO grpc.py:2648: Received request to get gateway's info
[23-May-2024 06:54:36] INFO prometheus.py:131: Stats for all bdevs will be provided
[23-May-2024 06:54:36] INFO grpc.py:914: Received request to add a namespace using NSID 1 and UUID 5834f3cc-8dd6-4d37-97b3-8a3769fe9b89 to nqn.2016-06.io.spdk:cnode1,  ana group 1  context: None
[23-May-2024 06:54:36] INFO grpc.py:274: Received request to create bdev bdev_5834f3cc-8dd6-4d37-97b3-8a3769fe9b89 from rbd/image1 (size 0 bytes) with block size 512, will not create image if doesn't exist
[23-May-2024 06:54:36] INFO grpc.py:250: Allocated cluster name='cluster_context_1_0' nonce='10.70.39.49:0/312171512' anagrp=1
[23-May-2024 06:54:36] INFO grpc.py:209: get_cluster cluster_name='cluster_context_1_0' number bdevs: 1
[23-May-2024 06:54:36] INFO grpc.py:760: Received request to add bdev_5834f3cc-8dd6-4d37-97b3-8a3769fe9b89 to nqn.2016-06.io.spdk:cnode1 with ANA group id 1 using NSID 1 and UUID 5834f3cc-8dd6-4d37-97b3-8a3769fe9b89
[23-May-2024 06:54:36] INFO grpc.py:1755: Received request to allow any host access for nqn.2016-06.io.spdk:cnode1, context: None
[23-May-2024 06:54:36] INFO grpc.py:2099: Received request to create cephqe-node2 TCP ipv4 listener for nqn.2016-06.io.spdk:cnode1 at 10.70.39.49:4420, context: None
[23-May-2024 06:54:36] INFO grpc.py:2099: Received request to create cephqe-node3 TCP ipv4 listener for nqn.2016-06.io.spdk:cnode1 at 10.70.39.50:4420, context: None

Comment 3 Orit Wasserman 2024-05-23 07:29:42 UTC
It looks like a configuration issue, not related to mTLS. mTLS only changes the transport security; the CLI commands are the same.
Please verify that you did not miss a step.
What were the commands you used to configure the gateways? Please provide the full details in the BZ.
Please retry without mTLS to verify.

Thanks

Comment 6 Veera Raghava Reddy 2024-05-24 11:15:16 UTC
Hi Thomas,
Please attach this BZ to 7.1 Errata.

Comment 9 Krishna Ramaswamy 2024-05-27 08:41:23 UTC
Verified the mTLS configuration on the latest build (190): ceph version 18.2.1-190.el9cp (5eee8f17de7cfe7a752abc74828d97473040534e) reef (stable)

The following behaviour was observed.

Scenario 1:

With mTLS configuration:

1. Deployed the Ceph cluster with the latest build: ceph version 18.2.1-190.el9cp (5eee8f17de7cfe7a752abc74828d97473040534e) reef (stable)
2. Configured mTLS on the Ceph cluster:
[root@cephqe-node1 ~]# cat gw-conf-with-mtls.yaml | grep auth
enable_auth: true
3. Applied the changes.
[root@cephqe-node1 ~]# ceph orch apply -i gw-conf-with-mtls.yaml 
Scheduled nvmeof.rbd update...
[root@cephqe-node1 ~]# ceph orch reconfig nvmeof.rbd
Scheduled to reconfig nvmeof.rbd.cephqe-node2.jebusx on host 'cephqe-node2'
Scheduled to reconfig nvmeof.rbd.cephqe-node3.gykptt on host 'cephqe-node3'
[root@cephqe-node1 ~]# ceph orch ls
NAME                       PORTS             RUNNING  REFRESHED  AGE  PLACEMENT                  
alertmanager               ?:9093,9094           1/1  7m ago     4d   count:1                    
ceph-exporter                                    4/4  7m ago     4d   *                          
crash                                            4/4  7m ago     4d   *                          
grafana                    ?:3000                1/1  7m ago     4d   count:1                    
mgr                                              2/2  7m ago     4d   count:2                    
mon                                              3/3  7m ago     4d   label:mon                  
node-exporter              ?:9100                4/4  7m ago     3h   *                          
node-proxy                                       0/0  -          4d   *                          
nvmeof.rbd                 ?:4420,5500,8009      2/2  5m ago     23s  cephqe-node2;cephqe-node3  
osd.all-available-devices                         15  7m ago     4d   *                          
prometheus                 ?:9095                1/1  7m ago     4d   count:1 

[root@cephqe-node1 ~]# ceph nvme-gw show rbd ''
{
    "epoch": 51,
    "pool": "rbd",
    "group": "",
    "num gws": 2,
    "Anagrp list": "[ 1 4 ]"
}
{
    "gw-id": "client.nvmeof.rbd.cephqe-node2.jebusx",
    "anagrp-id": 1,
    "performed-full-startup": 0,
    "Availability": "UNAVAILABLE",
    "ana states": " 1: STANDBY , 4: STANDBY ,"
}
{
    "gw-id": "client.nvmeof.rbd.cephqe-node3.gykptt",
    "anagrp-id": 4,
    "performed-full-startup": 0,
    "Availability": "UNAVAILABLE",
    "ana states": " 1: STANDBY , 4: STANDBY ,"
}    
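
With both gateways stuck in UNAVAILABLE/STANDBY, a reasonable next step (not captured in this run, suggested only as a sketch) is to check the daemon state and container logs on the gateway hosts, using the daemon names reported by the orchestrator above:

# Suggested checks on a cephadm-deployed cluster:
ceph orch ps --daemon_type nvmeof
# On cephqe-node2, dump the gateway daemon's container logs:
cephadm logs --name nvmeof.rbd.cephqe-node2.jebusx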

Scenario 2:

Without mTLS configuration:

[root@cephqe-node1 ~]# cat gw-conf-with-mtls.yaml | grep auth
  enable_auth: false
[root@cephqe-node1 ~]# ceph nvme-gw show rbd ''
{
    "epoch": 57,
    "pool": "rbd",
    "group": "",
    "num gws": 2,
    "Anagrp list": "[ 1 4 ]"
}
{
    "gw-id": "client.nvmeof.rbd.cephqe-node2.jebusx",
    "anagrp-id": 1,
    "performed-full-startup": 1,
    "Availability": "AVAILABLE",
    "ana states": " 1: ACTIVE , 4: STANDBY ,"
}
{
    "gw-id": "client.nvmeof.rbd.cephqe-node3.gykptt",
    "anagrp-id": 4,
    "performed-full-startup": 1,
    "Availability": "AVAILABLE",
    "ana states": " 1: STANDBY , 4: ACTIVE ,"
}

Comment 10 Krishna Ramaswamy 2024-05-27 08:43:10 UTC
[root@cephqe-node2 ~]# nvm gw info
Enable server auth since both --client-key and --client-cert are provided
CLI's version: 1.2.10
Gateway's version: 1.2.10
Gateway's name: client.nvmeof.rbd.cephqe-node2.jebusx
Gateway's host name: cephqe-node2
Gateway's load balancing group: 1
Gateway's address: 10.70.39.49
Gateway's port: 5500
SPDK version: 24.01.1

Comment 11 Aviv Caro 2024-05-27 12:46:43 UTC
mTLS support is deferred to 7.1z1.

Comment 18 errata-xmlrpc 2024-08-07 11:21:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 7.1 security and bug fix update.), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2024:5080

