Bug 2239892

Summary: Deploy NVMEoF containers
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Justin Caratzas <jcaratza>
Component: Cephadm
Assignee: Adam King <adking>
Status: CLOSED ERRATA
QA Contact: Sunil Kumar Nagaraju <sunnagar>
Severity: high
Docs Contact: Rivka Pollack <rpollack>
Priority: unspecified
Version: 7.0
CC: adking, akraj, cephqe-warriors, mobisht, sunnagar, tserlin, vereddy
Target Milestone: ---
Target Release: 7.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: ceph-18.2.0-56.el9cp
Doc Type: Enhancement
Doc Text:
.Deploy NVMe-oF Gateway using `cephadm`
With this release, you can now deploy NVMe over Fabrics (NVMe-oF) either by using the `ceph orch apply nvmeof` command or by applying a service specification with `service_type: nvmeof`.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2023-12-13 15:23:26 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2237662    

Description Justin Caratzas 2023-09-20 16:19:02 UTC
Description of problem:

Cephadm needs to support deploying NVMe-oF containers. Please backport the changes from upstream to RHCS 7.
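For context, the two deployment paths this enhancement provides (per the Doc Text above) look roughly as follows. This is a minimal sketch: the pool name "rbd" matches the one used in the QA steps below, while the host name "node2" and the spec file name are placeholders.

# Path 1: direct orchestrator command (pool first, then placement)
ceph orch apply nvmeof rbd --placement="node2"

# Path 2: apply a service specification file
cat > nvmeof.yaml <<EOF
service_type: nvmeof
service_id: rbd
placement:
  hosts:
    - node2
spec:
  pool: rbd
EOF
ceph orch apply -i nvmeof.yaml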



Comment 8 Sunil Kumar Nagaraju 2023-09-26 05:38:30 UTC
https://bugzilla.redhat.com/show_bug.cgi?id=2240588

NVMe-oF deployment failed with a RADOS permission error, as shown below.

Gateway container image used:
registry-proxy.engineering.redhat.com/rh-osbs/ceph-nvmeof:0.0.3-1


>>>Sep 25 07:13:14 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]: INFO:control.server:SPDK Target Path: /usr/local/bin/nvmf_tgt
>>>Sep 25 07:13:14 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]: INFO:control.server:SPDK Socket: /var/tmp/spdk.sock
>>>Sep 25 07:13:14 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]: INFO:control.server:Starting /usr/local/bin/nvmf_tgt -u -r /var/tmp/spdk.sock
>>>Sep 25 07:13:14 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]: INFO:control.server:Attempting to initialize SPDK: rpc_socket: /var/tmp/spdk.sock, conn_retries: 300, timeout: 60.0
>>>Sep 25 07:13:14 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]: INFO: Setting log level to WARN
>>>Sep 25 07:13:14 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]: INFO:JSONRPCClient(/var/tmp/spdk.sock):Setting log level to WARN
>>>Sep 25 07:13:14 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]: [2023-09-25 11:13:14.788580] Starting SPDK v23.01.1 / DPDK 22.11.0 initialization...
>>>Sep 25 07:13:14 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]: [2023-09-25 11:13:14.788679] [ DPDK EAL parameters: nvmf --no-shconf -c 0x1 --no-pci --huge-unlink --log-level=lib.eal:6 --log-level=lib.cryptodev:5 --log-level=user1:6 --iova-mode=pa --base-virtaddr=0x200000000000 --match-allocations --file-prefix=spdk_pid3 ]
>>>Sep 25 07:13:14 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]: TELEMETRY: No legacy callbacks, legacy socket not created
>>>Sep 25 07:13:14 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]: [2023-09-25 11:13:14.921881] app.c: 712:spdk_app_start: *NOTICE*: Total cores available: 1
>>>Sep 25 07:13:14 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]: [2023-09-25 11:13:14.967586] reactor.c: 926:reactor_run: *NOTICE*: Reactor started on core 0
>>>Sep 25 07:13:15 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]: [2023-09-25 11:13:15.004660] accel_sw.c: 681:sw_accel_module_init: *NOTICE*: Accel framework software module initialized.
>>>Sep 25 07:13:15 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]: DEBUG:control.server:create_transport: tcp options: {"in_capsule_data_size" : 8192, "max_io_qpairs_per_ctrlr" : 7}
>>>Sep 25 07:13:15 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]: [2023-09-25 11:13:15.149482] tcp.c: 629:nvmf_tcp_create: *NOTICE*: *** TCP Transport Init ***
>>>Sep 25 07:13:15 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]: ERROR:control.state:Unable to create omap: [errno 13] RADOS permission denied (error connecting to the cluster). Exiting!
>>>Sep 25 07:13:15 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]: ERROR:control.server:GatewayServer exception occurred:
>>>Sep 25 07:13:15 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]: Traceback (most recent call last):
>>>Sep 25 07:13:15 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]:   File "/remote-source/ceph-nvmeof/app/control/__main__.py", line 35, in <module>
>>>Sep 25 07:13:15 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]:     gateway.serve()
>>>Sep 25 07:13:15 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]:   File "/remote-source/ceph-nvmeof/app/control/server.py", line 98, in serve
>>>Sep 25 07:13:15 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]:     omap_state = OmapGatewayState(self.config)
>>>Sep 25 07:13:15 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]:   File "/remote-source/ceph-nvmeof/app/control/state.py", line 179, in __init__
>>>Sep 25 07:13:15 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]:     conn.connect()
>>>Sep 25 07:13:15 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]:   File "rados.pyx", line 690, in rados.Rados.connect
>>>Sep 25 07:13:15 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]: rados.PermissionDeniedError: [errno 13] RADOS permission denied (error connecting to the cluster)
>>>Sep 25 07:13:15 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]: INFO:control.server:Terminating SPDK(ceph-2sunilkumar-lxztyv-node2) pid 3...
>>>
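The "RADOS permission denied" above points at the cephx capabilities of the keyring that cephadm generates for the gateway daemon (the issue tracked in bug 2240588 linked above). A hedged way to inspect what the daemon was actually given; the entity name in the second command is illustrative of the client.nvmeof.<service_id>.<host>.<suffix> naming pattern, not copied from the log:

# List auth entities for nvmeof daemons together with their caps
ceph auth ls | grep -A 4 nvmeof

# Inspect a specific gateway entity (name is illustrative)
ceph auth get client.nvmeof.rbd.ceph-2sunilkumar-lxztyv-node2.huaunh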

Version-Release number of selected component (if applicable):

# ceph version 
ceph version 18.2.0-47.el9cp (4b54338fe0c515edb779f25acdcc35dcee28c821) reef (stable)
NVMe image: registry-proxy.engineering.redhat.com/rh-osbs/ceph-nvmeof:0.0.3-1


How reproducible: always


Steps to Reproduce:
1. Deploy the cluster with the latest downstream Ceph image.

2. Set the NVMe-oF container image configuration on the cluster with the command below:

ceph config set mgr mgr/cephadm/container_image_nvmeof registry-proxy.engineering.redhat.com/rh-osbs/ceph-nvmeof:0.0.3-1

3. Deploy the NVMe-oF gateway using the orchestrator (a verification sketch follows these steps):

ceph orch apply nvmeof rbd --placement="ceph-2sunilkumar-lxztyv-node2"
Scheduled nvmeof.rbd update...
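A hedged sketch of how the apply can be verified before checking daemon state (standard cephadm/orchestrator commands; output omitted):

# Confirm the image override took effect
ceph config get mgr mgr/cephadm/container_image_nvmeof

# Check the nvmeof service and its daemons
ceph orch ls nvmeof
ceph orch ps --daemon_type nvmeof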


Actual results:
# ceph orch ps | grep nvm 
nvmeof.rbd.ceph-2sunilkumar-lxztyv-node2.huaunh        ceph-2sunilkumar-lxztyv-node2            *:5500,4420,8009  error                  -  10m        -        -  <unknown>        <unknown>     <unknown>   


Expected results:
The gateway should be deployed successfully and the daemon should report a running state.
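Once the daemon is running, one way to confirm the gateway is actually serving is an NVMe-oF discovery from an initiator host. This is a hedged sketch that assumes nvme-cli is installed on the initiator; the gateway IP is a placeholder, and 8009 is the discovery port listed in the `ceph orch ps` output above:

# Discover subsystems exposed by the gateway over TCP
nvme discover -t tcp -a <gateway-node-ip> -s 8009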

Comment 11 Sunil Kumar Nagaraju 2023-10-27 04:51:57 UTC
Deployment of the NVMe-oF gateway is successful in the most recent builds available. Hence, marking this BZ as verified.

http://magna002.ceph.redhat.com/cephci-jenkins/cephci-run-8FJUUA/Basic_E2ETest_Ceph_NVMEoF_GW_sanity_test_0.log

Comment 14 errata-xmlrpc 2023-12-13 15:23:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 7.0 Bug Fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:7780