Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 2282833

Summary: With mTLS configured, the NVMe-oF gateways are in CREATED state and remain in standby mode
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Krishna Ramaswamy <kramaswa>
Component: NVMeOF
Assignee: Aviv Caro <acaro>
Status: CLOSED ERRATA
QA Contact: Krishna Ramaswamy <kramaswa>
Severity: urgent
Docs Contact: Akash Raj <akraj>
Priority: urgent
Version: 7.1
CC: akraj, cephqe-warriors, jcaratza, mburkhar, mmurthy, owasserm, rlepaksh, rpollack, sunnagar, tserlin, vereddy
Target Milestone: ---
Target Release: 7.1z1
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: ceph-18.2.1-201.el9cp
Doc Type: Known Issue
Doc Text:
When using NVMe-oF gateway CLI commands, mTLS options are shown as available. mTLS is not currently supported and cannot be used.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2024-08-07 11:21:26 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 2267614, 2298578, 2298579

Description Krishna Ramaswamy 2024-05-23 06:38:29 UTC
Description of problem:
With mTLS configured, the NVMe-oF gateways stay in CREATED state and both GWs are in standby mode.

Version-Release number of selected component (if applicable): ceph 7.1
Build Version : 18.2.1-188

Steps Performed: 
1. Deployed a Ceph cluster on 5-node bare-metal servers.
2. Configured the NVMe-oF service with mTLS in the Ceph cluster.
3. Creating the subsystem, host, listener, and namespace succeeds.
4. However, the NVMe gateways remain in standby mode, as shown below:
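
For reference, the mTLS-enabled gateway service in step 2 can be sketched as a cephadm service spec along these lines. Only `enable_auth: true`, the pool `rbd`, and the hostnames are confirmed by this report (see comment 9); the remaining fields follow the standard cephadm nvmeof spec and are illustrative:

```yaml
# Hypothetical sketch of gw-conf-with-mtls.yaml; only enable_auth: true,
# the rbd pool, and the placement hosts are confirmed in this report.
service_type: nvmeof
service_id: rbd
placement:
  hosts:
    - cephqe-node2
    - cephqe-node3
spec:
  pool: rbd
  enable_auth: true   # enables mTLS between the CLI and the gateway
```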

Error Log:

[ceph: root@cephqe-node1 /]# ceph nvme-gw show rbd ''
{
    "epoch": 11,
    "pool": "rbd",
    "group": "",
    "num gws": 2,
    "Anagrp list": "[ 1 2 ]"
}
{
    "gw-id": "client.nvmeof.rbd.cephqe-node2.tfngbt",
    "anagrp-id": 1,
    "performed-full-startup": 1,
    "Availability": "CREATED",
    "ana states": " 1: STANDBY , 2: STANDBY ,"
}
{
    "gw-id": "client.nvmeof.rbd.cephqe-node3.fpbdir",
    "anagrp-id": 2,
    "performed-full-startup": 1,
    "Availability": "CREATED",
    "ana states": " 1: STANDBY , 2: STANDBY ,"
}
[ceph: root@cephqe-node1 /]# 



Impact: 

Due to this issue, the namespaces cannot be used on the client side (ESXi or RHEL initiator), because the paths stay in standby mode instead of active mode.
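
A quick way to spot this state from a script is to parse the output of `ceph nvme-gw show`, which (as above) prints a sequence of concatenated JSON objects rather than a single JSON document. A minimal sketch, assuming that output format; the helper names are ours, not part of any Ceph tooling:

```python
import json


def parse_gw_show(text):
    """Split the concatenated JSON objects printed by `ceph nvme-gw show`."""
    decoder = json.JSONDecoder()
    objs, idx, text = [], 0, text.strip()
    while idx < len(text):
        obj, end = decoder.raw_decode(text, idx)
        objs.append(obj)
        idx = end
        # skip whitespace between consecutive JSON objects
        while idx < len(text) and text[idx].isspace():
            idx += 1
    return objs


def unhealthy_gateways(text):
    """Return gw-ids whose Availability is anything other than AVAILABLE."""
    return [o["gw-id"] for o in parse_gw_show(text)
            if "gw-id" in o and o.get("Availability") != "AVAILABLE"]
```

On a healthy deployment, both gateways report `"Availability": "AVAILABLE"` with one ACTIVE ANA group each; in the failure above, both would be flagged.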

Here is the ESXi initiator log:

[root@rhocs-bm8:~] esxcli storage core path list | grep -A 10 vmhba65
   Runtime Name: vmhba65:C0:T3:L0
   Device: No associated device
   Device Display Name: No associated device
   Adapter: vmhba65
   Channel: 0
   Target: 3
   LUN: 0
   Plugin: HPP
   State: standby
   Transport: tcp
   Adapter Identifier: tcp.vmnic1:0c:42:a1:6c:a8:ed
   Target Identifier: tcp.unknown
   Adapter Transport Details: Unavailable or path is unclaimed
   Target Transport Details: Unavailable or path is unclaimed
--
   Runtime Name: vmhba65:C0:T2:L0
   Device: No associated device
   Device Display Name: No associated device
   Adapter: vmhba65
   Channel: 0
   Target: 2
   LUN: 0
   Plugin: HPP
   State: standby
   Transport: tcp
   Adapter Identifier: tcp.vmnic1:0c:42:a1:6c:a8:ed
   Target Identifier: tcp.unknown
   Adapter Transport Details: Unavailable or path is unclaimed
   Target Transport Details: Unavailable or path is unclaimed
[root@rhocs-bm8:~]

Comment 2 Krishna Ramaswamy 2024-05-23 07:18:09 UTC
Tried with RHEL Initiator: Same issue.

[root@ceph-sunilkumar-01-2d2ssd-node8 ~]# nvme connect-all  --trsvcid 8009 --transport tcp --traddr 10.70.39.49 --ctrl-loss-tmo 3600
[root@ceph-sunilkumar-01-2d2ssd-node8 ~]# nvme list
Node                  Generic               SN                   Model                                    Namespace  Usage                      Format           FW Rev  
--------------------- --------------------- -------------------- ---------------------------------------- ---------- -------------------------- ---------------- --------
[root@ceph-sunilkumar-01-2d2ssd-node8 ~]# 


Gateway node log:

[root@cephqe-node2 nvmeof-client.nvmeof.rbd.cephqe-node2.tfngbt]# tail -f nvmeof-log 
[23-May-2024 06:54:36] INFO grpc.py:2648: Received request to get gateway's info
[23-May-2024 06:54:36] INFO prometheus.py:131: Stats for all bdevs will be provided
[23-May-2024 06:54:36] INFO grpc.py:914: Received request to add a namespace using NSID 1 and UUID 5834f3cc-8dd6-4d37-97b3-8a3769fe9b89 to nqn.2016-06.io.spdk:cnode1,  ana group 1  context: None
[23-May-2024 06:54:36] INFO grpc.py:274: Received request to create bdev bdev_5834f3cc-8dd6-4d37-97b3-8a3769fe9b89 from rbd/image1 (size 0 bytes) with block size 512, will not create image if doesn't exist
[23-May-2024 06:54:36] INFO grpc.py:250: Allocated cluster name='cluster_context_1_0' nonce='10.70.39.49:0/312171512' anagrp=1
[23-May-2024 06:54:36] INFO grpc.py:209: get_cluster cluster_name='cluster_context_1_0' number bdevs: 1
[23-May-2024 06:54:36] INFO grpc.py:760: Received request to add bdev_5834f3cc-8dd6-4d37-97b3-8a3769fe9b89 to nqn.2016-06.io.spdk:cnode1 with ANA group id 1 using NSID 1 and UUID 5834f3cc-8dd6-4d37-97b3-8a3769fe9b89
[23-May-2024 06:54:36] INFO grpc.py:1755: Received request to allow any host access for nqn.2016-06.io.spdk:cnode1, context: None
[23-May-2024 06:54:36] INFO grpc.py:2099: Received request to create cephqe-node2 TCP ipv4 listener for nqn.2016-06.io.spdk:cnode1 at 10.70.39.49:4420, context: None
[23-May-2024 06:54:36] INFO grpc.py:2099: Received request to create cephqe-node3 TCP ipv4 listener for nqn.2016-06.io.spdk:cnode1 at 10.70.39.50:4420, context: None

Comment 3 Orit Wasserman 2024-05-23 07:29:42 UTC
It looks like a configuration issue not related to mTLS; mTLS only changes the transport, and the CLI commands are the same.
Please verify that you did not miss a step.
What were the commands you used to configure the gateways? Please provide the full details in the BZ.
Please also retry without mTLS to verify.

Thanks

Comment 6 Veera Raghava Reddy 2024-05-24 11:15:16 UTC
Hi Thomas,
Please attach this BZ to 7.1 Errata.

Comment 9 Krishna Ramaswamy 2024-05-27 08:41:23 UTC
Verified the mTLS configuration on the latest build (190): ceph version 18.2.1-190.el9cp (5eee8f17de7cfe7a752abc74828d97473040534e) reef (stable)

The following behaviour was observed.

Scenario 1:

With mTLS configuration:

1. Deployed ceph cluster with latest build: ceph version 18.2.1-190.el9cp (5eee8f17de7cfe7a752abc74828d97473040534e) reef (stable)
2. Configured the mTLS on ceph cluster
[root@cephqe-node1 ~]# cat gw-conf-with-mtls.yaml | grep auth
enable_auth: true
3. Applied the changes.
[root@cephqe-node1 ~]# ceph orch apply -i gw-conf-with-mtls.yaml 
Scheduled nvmeof.rbd update...
[root@cephqe-node1 ~]# ceph orch reconfig nvmeof.rbd
Scheduled to reconfig nvmeof.rbd.cephqe-node2.jebusx on host 'cephqe-node2'
Scheduled to reconfig nvmeof.rbd.cephqe-node3.gykptt on host 'cephqe-node3'
[root@cephqe-node1 ~]# ceph orch ls
NAME                       PORTS             RUNNING  REFRESHED  AGE  PLACEMENT                  
alertmanager               ?:9093,9094           1/1  7m ago     4d   count:1                    
ceph-exporter                                    4/4  7m ago     4d   *                          
crash                                            4/4  7m ago     4d   *                          
grafana                    ?:3000                1/1  7m ago     4d   count:1                    
mgr                                              2/2  7m ago     4d   count:2                    
mon                                              3/3  7m ago     4d   label:mon                  
node-exporter              ?:9100                4/4  7m ago     3h   *                          
node-proxy                                       0/0  -          4d   *                          
nvmeof.rbd                 ?:4420,5500,8009      2/2  5m ago     23s  cephqe-node2;cephqe-node3  
osd.all-available-devices                         15  7m ago     4d   *                          
prometheus                 ?:9095                1/1  7m ago     4d   count:1 

[root@cephqe-node1 ~]# ceph nvme-gw show rbd ''
{
    "epoch": 51,
    "pool": "rbd",
    "group": "",
    "num gws": 2,
    "Anagrp list": "[ 1 4 ]"
}
{
    "gw-id": "client.nvmeof.rbd.cephqe-node2.jebusx",
    "anagrp-id": 1,
    "performed-full-startup": 0,
    "Availability": "UNAVAILABLE",
    "ana states": " 1: STANDBY , 4: STANDBY ,"
}
{
    "gw-id": "client.nvmeof.rbd.cephqe-node3.gykptt",
    "anagrp-id": 4,
    "performed-full-startup": 0,
    "Availability": "UNAVAILABLE",
    "ana states": " 1: STANDBY , 4: STANDBY ,"
}    

Scenario 2:

Without mTLS configuration:

[root@cephqe-node1 ~]# cat gw-conf-with-mtls.yaml | grep auth
  enable_auth: false
[root@cephqe-node1 ~]# ceph nvme-gw show rbd ''
{
    "epoch": 57,
    "pool": "rbd",
    "group": "",
    "num gws": 2,
    "Anagrp list": "[ 1 4 ]"
}
{
    "gw-id": "client.nvmeof.rbd.cephqe-node2.jebusx",
    "anagrp-id": 1,
    "performed-full-startup": 1,
    "Availability": "AVAILABLE",
    "ana states": " 1: ACTIVE , 4: STANDBY ,"
}
{
    "gw-id": "client.nvmeof.rbd.cephqe-node3.gykptt",
    "anagrp-id": 4,
    "performed-full-startup": 1,
    "Availability": "AVAILABLE",
    "ana states": " 1: STANDBY , 4: ACTIVE ,"
}

Comment 10 Krishna Ramaswamy 2024-05-27 08:43:10 UTC
[root@cephqe-node2 ~]# nvm gw info

Enable server auth since both --client-key and --client-cert are provided
CLI's version: 1.2.10
Gateway's version: 1.2.10
Gateway's name: client.nvmeof.rbd.cephqe-node2.jebusx
Gateway's host name: cephqe-node2
Gateway's load balancing group: 1
Gateway's address: 10.70.39.49
Gateway's port: 5500
SPDK version: 24.01.1

Comment 11 Aviv Caro 2024-05-27 12:46:43 UTC
mTLS is deferred to 7.1 z

Comment 18 errata-xmlrpc 2024-08-07 11:21:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 7.1 security and bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2024:5080