Bug 2239892 - Deploy NVMEoF containers
Summary: Deploy NVMEoF containers
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Cephadm
Version: 7.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 7.0
Assignee: Adam King
QA Contact: Sunil Kumar Nagaraju
Docs Contact: Rivka Pollack
URL:
Whiteboard:
Depends On:
Blocks: 2237662
 
Reported: 2023-09-20 16:19 UTC by Justin Caratzas
Modified: 2023-12-13 15:23 UTC
CC List: 7 users

Fixed In Version: ceph-18.2.0-56.el9cp
Doc Type: Enhancement
Doc Text:
.Deploy NVMe-oF Gateway using `cephadm`
With this release, you can now deploy NVMe over Fabrics (NVMe-oF) either by using the `ceph orch apply nvmeof` command or by applying a service specification with `service_type: nvmeof`.
Clone Of:
Environment:
Last Closed: 2023-12-13 15:23:26 UTC
Embargoed:




Links:
Red Hat Issue Tracker RHCEPH-7497 (last updated 2023-09-20 16:20:29 UTC)
Red Hat Product Errata RHBA-2023:7780 (last updated 2023-12-13 15:23:30 UTC)

Description Justin Caratzas 2023-09-20 16:19:02 UTC
Description of problem:

Cephadm needs to support deploying NVMe-oF containers. Please backport the changes from upstream to RHCS 7.
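
For reference, the two deployment forms called out in the Doc Text look roughly like this (host and pool names are illustrative, and the spec keys follow the upstream NVMe-oF service spec, so they may differ slightly in this build):

# CLI form: deploy a gateway for pool "rbd" on a chosen host
ceph orch apply nvmeof rbd --placement="my-host"

# Spec form: write a service specification file and apply it
cat > nvmeof.yaml <<EOF
service_type: nvmeof
service_id: rbd
placement:
  hosts:
  - my-host
spec:
  pool: rbd
EOF
ceph orch apply -i nvmeof.yaml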



Comment 8 Sunil Kumar Nagaraju 2023-09-26 05:38:30 UTC
https://bugzilla.redhat.com/show_bug.cgi?id=2240588

NVMe-oF deployment failed with a RADOS permission error, as shown below.

Gateway container image:
registry-proxy.engineering.redhat.com/rh-osbs/ceph-nvmeof:0.0.3-1


>>>Sep 25 07:13:14 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]: INFO:control.server:SPDK Target Path: /usr/local/bin/nvmf_tgt
>>>Sep 25 07:13:14 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]: INFO:control.server:SPDK Socket: /var/tmp/spdk.sock
>>>Sep 25 07:13:14 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]: INFO:control.server:Starting /usr/local/bin/nvmf_tgt -u -r /var/tmp/spdk.sock
>>>Sep 25 07:13:14 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]: INFO:control.server:Attempting to initialize SPDK: rpc_socket: /var/tmp/spdk.sock, conn_retries: 300, timeout: 60.0
>>>Sep 25 07:13:14 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]: INFO: Setting log level to WARN
>>>Sep 25 07:13:14 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]: INFO:JSONRPCClient(/var/tmp/spdk.sock):Setting log level to WARN
>>>Sep 25 07:13:14 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]: [2023-09-25 11:13:14.788580] Starting SPDK v23.01.1 / DPDK 22.11.0 initialization...
>>>Sep 25 07:13:14 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]: [2023-09-25 11:13:14.788679] [ DPDK EAL parameters: nvmf --no-shconf -c 0x1 --no-pci --huge-unlink --log-level=lib.eal:6 --log-level=lib.cryptodev:5 --log-level=user1:6 --iova-mode=pa --base-virtaddr=0x200000000000 --match-allocations --file-prefix=spdk_pid3 ]
>>>Sep 25 07:13:14 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]: TELEMETRY: No legacy callbacks, legacy socket not created
>>>Sep 25 07:13:14 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]: [2023-09-25 11:13:14.921881] app.c: 712:spdk_app_start: *NOTICE*: Total cores available: 1
>>>Sep 25 07:13:14 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]: [2023-09-25 11:13:14.967586] reactor.c: 926:reactor_run: *NOTICE*: Reactor started on core 0
>>>Sep 25 07:13:15 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]: [2023-09-25 11:13:15.004660] accel_sw.c: 681:sw_accel_module_init: *NOTICE*: Accel framework software module initialized.
>>>Sep 25 07:13:15 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]: DEBUG:control.server:create_transport: tcp options: {"in_capsule_data_size" : 8192, "max_io_qpairs_per_ctrlr" : 7}
>>>Sep 25 07:13:15 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]: [2023-09-25 11:13:15.149482] tcp.c: 629:nvmf_tcp_create: *NOTICE*: *** TCP Transport Init ***
>>>Sep 25 07:13:15 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]: ERROR:control.state:Unable to create omap: [errno 13] RADOS permission denied (error connecting to the cluster). Exiting!
>>>Sep 25 07:13:15 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]: ERROR:control.server:GatewayServer exception occurred:
>>>Sep 25 07:13:15 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]: Traceback (most recent call last):
>>>Sep 25 07:13:15 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]:   File "/remote-source/ceph-nvmeof/app/control/__main__.py", line 35, in <module>
>>>Sep 25 07:13:15 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]:     gateway.serve()
>>>Sep 25 07:13:15 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]:   File "/remote-source/ceph-nvmeof/app/control/server.py", line 98, in serve
>>>Sep 25 07:13:15 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]:     omap_state = OmapGatewayState(self.config)
>>>Sep 25 07:13:15 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]:   File "/remote-source/ceph-nvmeof/app/control/state.py", line 179, in __init__
>>>Sep 25 07:13:15 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]:     conn.connect()
>>>Sep 25 07:13:15 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]:   File "rados.pyx", line 690, in rados.Rados.connect
>>>Sep 25 07:13:15 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]: rados.PermissionDeniedError: [errno 13] RADOS permission denied (error connecting to the cluster)
>>>Sep 25 07:13:15 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]: INFO:control.server:Terminating SPDK(ceph-2sunilkumar-lxztyv-node2) pid 3...
>>>
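
The permission-denied error above is raised by librados while the gateway opens its omap state, which points at the cephx credentials the container was deployed with. A rough way to inspect them (the exact client entity name cephadm generates is an assumption based on the daemon name):

# List auth entities and look for the gateway's key
ceph auth ls | grep -A 3 nvmeof

# Show the caps on the gateway's key (entity name is illustrative)
ceph auth get client.nvmeof.rbd.ceph-2sunilkumar-lxztyv-node2.huaunh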

Version-Release number of selected component (if applicable):

# ceph version 
ceph version 18.2.0-47.el9cp (4b54338fe0c515edb779f25acdcc35dcee28c821) reef (stable)
NVMe image: registry-proxy.engineering.redhat.com/rh-osbs/ceph-nvmeof:0.0.3-1


How reproducible: always


Steps to Reproduce:
1. Deploy the cluster with the latest downstream Ceph image.

2. Set the NVMe-oF container image configuration on the cluster with the command below:

ceph config set mgr mgr/cephadm/container_image_nvmeof registry-proxy.engineering.redhat.com/rh-osbs/ceph-nvmeof:0.0.3-1

3. Deploy the NVMe-oF gateway using the orchestrator:

 ceph orch apply nvmeof rbd --placement="ceph-2sunilkumar-lxztyv-node2"
Scheduled nvmeof.rbd update...


Actual results:
# ceph orch ps | grep nvm 
nvmeof.rbd.ceph-2sunilkumar-lxztyv-node2.huaunh        ceph-2sunilkumar-lxztyv-node2            *:5500,4420,8009  error                  -  10m        -        -  <unknown>        <unknown>     <unknown>   
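
To get more detail than the one-line "error" state, two rough options (daemon name taken from the output above; run the second on the gateway host):

# Full daemon status from the orchestrator, including its error message
ceph orch ps --daemon_type nvmeof --format yaml

# Container/unit logs for the failing daemon
cephadm logs --name nvmeof.rbd.ceph-2sunilkumar-lxztyv-node2.huaunh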


Expected results:
Gateway should be deployed

Comment 11 Sunil Kumar Nagaraju 2023-10-27 04:51:57 UTC
Deployment of the NVMe-oF gateway is successful in the most recent builds available. Hence marking this BZ as verified.

http://magna002.ceph.redhat.com/cephci-jenkins/cephci-run-8FJUUA/Basic_E2ETest_Ceph_NVMEoF_GW_sanity_test_0.log
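
For anyone re-verifying, a quick sketch of checks that the service and its daemons are up (service type as used in the earlier comments):

# Service-level view
ceph orch ls nvmeof

# Daemon-level view; all gateways should report "running"
ceph orch ps --daemon_type nvmeof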

Comment 14 errata-xmlrpc 2023-12-13 15:23:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 7.0 Bug Fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:7780

