Description of problem:
Cephadm needs to support deploying NVMe-oF containers. Please backport the changes from upstream to RHCS 7.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
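For reference, a minimal sketch of what the deployment could look like once the backport is in place, assuming the upstream cephadm nvmeof service spec. The field names follow the upstream spec and the pool name "rbd" and host below are only placeholders taken from this report, so the final RHCS 7 syntax may differ:

# cat nvmeof.yaml
service_type: nvmeof
service_id: rbd
placement:
  hosts:
  - ceph-2sunilkumar-lxztyv-node2
spec:
  pool: rbd

# ceph orch apply -i nvmeof.yaml

The one-line form "ceph orch apply nvmeof <pool> --placement=..." used later in this bug should be equivalent to applying such a spec file.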
https://bugzilla.redhat.com/show_bug.cgi?id=2240588

NVMe-oF deployment failed with a RADOS permission error as shown below. The first log excerpt is from the gateway container: registry-proxy.engineering.redhat.com/rh-osbs/ceph-nvmeof:0.0.3-1

>>>Sep 25 07:13:14 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]: INFO:control.server:SPDK Target Path: /usr/local/bin/nvmf_tgt
>>>Sep 25 07:13:14 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]: INFO:control.server:SPDK Socket: /var/tmp/spdk.sock
>>>Sep 25 07:13:14 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]: INFO:control.server:Starting /usr/local/bin/nvmf_tgt -u -r /var/tmp/spdk.sock
>>>Sep 25 07:13:14 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]: INFO:control.server:Attempting to initialize SPDK: rpc_socket: /var/tmp/spdk.sock, conn_retries: 300, timeout: 60.0
>>>Sep 25 07:13:14 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]: INFO: Setting log level to WARN
>>>Sep 25 07:13:14 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]: INFO:JSONRPCClient(/var/tmp/spdk.sock):Setting log level to WARN
>>>Sep 25 07:13:14 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]: [2023-09-25 11:13:14.788580] Starting SPDK v23.01.1 / DPDK 22.11.0 initialization...
>>>Sep 25 07:13:14 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]: [2023-09-25 11:13:14.788679] [ DPDK EAL parameters: nvmf --no-shconf -c 0x1 --no-pci --huge-unlink --log-level=lib.eal:6 --log-level=lib.cryptodev:5 --log-level=user1:6 --iova-mode=pa --base-virtaddr=0x200000000000 --match-allocations --file-prefix=spdk_pid3 ]
>>>Sep 25 07:13:14 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]: TELEMETRY: No legacy callbacks, legacy socket not created
>>>Sep 25 07:13:14 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]: [2023-09-25 11:13:14.921881] app.c: 712:spdk_app_start: *NOTICE*: Total cores available: 1
>>>Sep 25 07:13:14 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]: [2023-09-25 11:13:14.967586] reactor.c: 926:reactor_run: *NOTICE*: Reactor started on core 0
>>>Sep 25 07:13:15 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]: [2023-09-25 11:13:15.004660] accel_sw.c: 681:sw_accel_module_init: *NOTICE*: Accel framework software module initialized.
>>>Sep 25 07:13:15 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]: DEBUG:control.server:create_transport: tcp options: {"in_capsule_data_size" : 8192, "max_io_qpairs_per_ctrlr" : 7}
>>>Sep 25 07:13:15 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]: [2023-09-25 11:13:15.149482] tcp.c: 629:nvmf_tcp_create: *NOTICE*: *** TCP Transport Init ***
>>>Sep 25 07:13:15 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]: ERROR:control.state:Unable to create omap: [errno 13] RADOS permission denied (error connecting to the cluster). Exiting!
>>>Sep 25 07:13:15 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]: ERROR:control.server:GatewayServer exception occurred:
>>>Sep 25 07:13:15 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]: Traceback (most recent call last):
>>>Sep 25 07:13:15 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]: File "/remote-source/ceph-nvmeof/app/control/__main__.py", line 35, in <module>
>>>Sep 25 07:13:15 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]: gateway.serve()
>>>Sep 25 07:13:15 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]: File "/remote-source/ceph-nvmeof/app/control/server.py", line 98, in serve
>>>Sep 25 07:13:15 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]: omap_state = OmapGatewayState(self.config)
>>>Sep 25 07:13:15 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]: File "/remote-source/ceph-nvmeof/app/control/state.py", line 179, in __init__
>>>Sep 25 07:13:15 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]: conn.connect()
>>>Sep 25 07:13:15 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]: File "rados.pyx", line 690, in rados.Rados.connect
>>>Sep 25 07:13:15 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]: rados.PermissionDeniedError: [errno 13] RADOS permission denied (error connecting to the cluster)
>>>Sep 25 07:13:15 ceph-2sunilkumar-lxztyv-node2 ceph-c6963904-5b8d-11ee-924e-fa163e19eae4-nvmeof-rbd-ceph-2sunilkumar-lxztyv-node2-huaunh[15284]: INFO:control.server:Terminating SPDK(ceph-2sunilkumar-lxztyv-node2) pid 3...

Version-Release number of selected component (if applicable):
# ceph version
ceph version 18.2.0-47.el9cp (4b54338fe0c515edb779f25acdcc35dcee28c821) reef (stable)
NVMe image: registry-proxy.engineering.redhat.com/rh-osbs/ceph-nvmeof:0.0.3-1

How reproducible:
always

Steps to Reproduce:
1. Deploy the cluster with the latest downstream Ceph image.
2. Set the nvmeof container image configuration on the cluster with the command below:
# ceph config set mgr mgr/cephadm/container_image_nvmeof registry-proxy.engineering.redhat.com/rh-osbs/ceph-nvmeof:0.0.3-1
3. Deploy the nvmeof gateway using the orchestrator:
# ceph orch apply nvmeof rbd --placement="ceph-2sunilkumar-lxztyv-node2"
Scheduled nvmeof.rbd update...

Actual results:
# ceph orch ps | grep nvm
nvmeof.rbd.ceph-2sunilkumar-lxztyv-node2.huaunh  ceph-2sunilkumar-lxztyv-node2  *:5500,4420,8009  error  -  10m  -  -  <unknown>  <unknown>  <unknown>

Expected results:
The gateway should be deployed.
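Additional info:
Since the gateway fails while connecting to RADOS (errno 13), it may help to capture the gateway daemon log and the auth caps cephadm generated for the daemon when triaging. A sketch of such checks, using the daemon name from the "Actual results" above; the auth entity name shown is an assumption, so the exact name should be confirmed with "ceph auth ls":

On the gateway host (add --fsid if more than one cluster is present):
# cephadm logs --name nvmeof.rbd.ceph-2sunilkumar-lxztyv-node2.huaunh

From a node with an admin keyring:
# ceph auth ls | grep nvmeof
# ceph auth get client.nvmeof.rbd.ceph-2sunilkumar-lxztyv-node2.huaunh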
Deployment of the NVMe-oF GW is successful in most of the recent builds available. Hence marking this BZ as verified.

http://magna002.ceph.redhat.com/cephci-jenkins/cephci-run-8FJUUA/Basic_E2ETest_Ceph_NVMEoF_GW_sanity_test_0.log
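As a quick spot check, the same status query used in the "Actual results" above should now report the gateway daemon as running rather than error:
# ceph orch ps | grep nvmeof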
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat Ceph Storage 7.0 Bug Fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2023:7780