Description of problem:
Observing a namespace count mismatch between the two gateways, as below:

```
# ceph orch ps | grep nvmeof
nvmeof.nvmeof_pool.argo023.xkyblu  argo023  *:5500,4420,8009  running (105m)  105s ago  105m  550M  -  b09894a2fc25  fede1b63c50e
nvmeof.nvmeof_pool.argo024.xrmciw  argo024  *:5500,4420,8009  running (105m)  105s ago  105m  663M  -  b09894a2fc25  9d5c575f2be9

# podman run --rm cp.stg.icr.io/cp/ibm-ceph/nvmeof-cli-rhel9:1.2.0-1 --server-address 10.8.128.223 --server-port 5500 subsystem list
Subsystems:
╒═══════════╤════════════════════════════╤════════════╤══════════╤══════════════════╤═════════════╤══════════════╕
│ Subtype   │ NQN                        │ HA State   │ Serial   │ Controller IDs   │ Namespace   │ Max          │
│           │                            │            │ Number   │                  │ Count       │ Namespaces   │
╞═══════════╪════════════════════════════╪════════════╪══════════╪══════════════════╪═════════════╪══════════════╡
│ NVMe      │ nqn.2016-06.io.spdk:cnode1 │ enabled    │ 1        │ 1-2040           │ 0           │ 2048         │
├───────────┼────────────────────────────┼────────────┼──────────┼──────────────────┼─────────────┼──────────────┤
│ NVMe      │ nqn.2016-06.io.spdk:cnode2 │ enabled    │ 2        │ 1-2040           │ 200         │ 2048         │
╘═══════════╧════════════════════════════╧════════════╧══════════╧══════════════════╧═════════════╧══════════════╛

# podman run --rm cp.stg.icr.io/cp/ibm-ceph/nvmeof-cli-rhel9:1.2.0-1 --server-address 10.8.128.224 --server-port 5500 subsystem list
Subsystems:
╒═══════════╤════════════════════════════╤════════════╤══════════╤══════════════════╤═════════════╤══════════════╕
│ Subtype   │ NQN                        │ HA State   │ Serial   │ Controller IDs   │ Namespace   │ Max          │
│           │                            │            │ Number   │                  │ Count       │ Namespaces   │
╞═══════════╪════════════════════════════╪════════════╪══════════╪══════════════════╪═════════════╪══════════════╡
│ NVMe      │ nqn.2016-06.io.spdk:cnode1 │ enabled    │ 1        │ 2041-4080        │ 299         │ 2048         │
├───────────┼────────────────────────────┼────────────┼──────────┼──────────────────┼─────────────┼──────────────┤
│ NVMe      │ nqn.2016-06.io.spdk:cnode2 │ enabled    │ 2        │ 2041-4080        │ 200         │ 2048         │
╘═══════════╧════════════════════════════╧════════════╧══════════╧══════════════════╧═════════════╧══════════════╛
```

```
[root@argo023]# podman run --rm cp.stg.icr.io/cp/ibm-ceph/nvmeof-cli-rhel9:1.2.0-1 --server-address 10.8.128.223 --server-port 5500 namespace list -n nqn.2016-06.io.spdk:cnode1
No namespaces in subsystem nqn.2016-06.io.spdk:cnode1

[root@argo024 ~]# podman run --rm cp.stg.icr.io/cp/ibm-ceph/nvmeof-cli-rhel9:1.2.0-1 --server-address 10.8.128.224 --server-port 5500 namespace list -n nqn.2016-06.io.spdk:cnode1
Namespaces in subsystem nqn.2016-06.io.spdk:cnode1:
╒════════╤════════════════════════╤════════╤═══════════════╤═════════╤═════════╤═════════════════════╤═════════════╤═══════════╤═══════════╤════════════╤═════════════╕
│   NSID │ Bdev                   │ RBD    │ RBD           │ Image   │ Block   │ UUID                │ Load        │ R/W IOs   │ R/W MBs   │ Read MBs   │ Write MBs   │
│        │ Name                   │ Pool   │ Image         │ Size    │ Size    │                     │ Balancing   │ per       │ per       │ per        │ per         │
│        │                        │        │               │         │         │                     │ Group       │ second    │ second    │ second     │ second      │
╞════════╪════════════════════════╪════════╪═══════════════╪═════════╪═════════╪═════════════════════╪═════════════╪═══════════╪═══════════╪════════════╪═════════════╡
│      1 │ bdev_f2783d15-41bc-    │ rbd    │ ZVWD-image1   │ 1 TiB   │ 512 B   │ f2783d15-41bc-4bb5- │ 1           │ unlimited │ unlimited │ unlimited  │ unlimited   │
│        │ 4bb5-896b-3925e28e44dd │        │               │         │         │ 896b-3925e28e44dd   │             │           │           │            │             │
├────────┼────────────────────────┼────────┼───────────────┼─────────┼─────────┼─────────────────────┼─────────────┼───────────┼───────────┼────────────┼─────────────┤
│      2 │ bdev_4f6c0f8d-2ef1-    │ rbd    │ ZVWD-image2   │ 1 TiB   │ 512 B   │ 4f6c0f8d-2ef1-4d82- │ 1           │ unlimited │ unlimited │ unlimited  │ unlimited   │
│        │ 4d82-9435-cd03668e26dc │        │               │         │         │ 9435-cd03668e26dc   │             │           │           │            │             │
. . .
│    298 │ bdev_58eff0f6-d273-    │ rbd    │ OHN3-image98  │ 1 TiB   │ 512 B   │ 58eff0f6-d273-4b99- │ 1           │ unlimited │ unlimited │ unlimited  │ unlimited   │
│        │ 4b99-95c4-ee2e6a5cb96c │        │               │         │         │ 95c4-ee2e6a5cb96c   │             │           │           │            │             │
├────────┼────────────────────────┼────────┼───────────────┼─────────┼─────────┼─────────────────────┼─────────────┼───────────┼───────────┼────────────┼─────────────┤
│    299 │ bdev_d13690e8-ece5-    │ rbd    │ OHN3-image99  │ 1 TiB   │ 512 B   │ d13690e8-ece5-4bd5- │ 1           │ unlimited │ unlimited │ unlimited  │ unlimited   │
│        │ 4bd5-bf59-eddef0a6bc72 │        │               │         │         │ bf59-eddef0a6bc72   │             │           │           │            │             │
╘════════╧════════════════════════╧════════╧═══════════════╧═════════╧═════════╧═════════════════════╧═════════════╧═══════════╧═══════════╧════════════╧═════════════╛
```

Version-Release number of selected component (if applicable):
# ceph version
ceph version 18.2.1-136.el9cp (e7edde2b655d0dd9f860dda675f9d7954f07e6e3) reef (stable)
cp.stg.icr.io/cp/ibm-ceph/nvmeof-rhel9:1.2.0-1

How reproducible:
Once so far

Steps to Reproduce:
1. Deploy the nvmeof service with cp.stg.icr.io/cp/ibm-ceph/nvmeof-rhel9:1.2.0-1.
2. Configure 2 subsystems and scale to 400 namespaces (200 per subsystem); this succeeds.
3. With IO running to the earlier namespaces, scale subsystem1 by a further 100 namespaces; adding the 299th namespace on that subsystem fails (a sketch of such a scale loop is included under Additional info below).
4. Run subsystem list against each gateway; the reported namespace counts do not match.

Actual results:
The two gateways report different namespace counts for the same subsystem: gateway argo023 shows 0 namespaces in cnode1, while gateway argo024 shows 299.

Expected results:
Both gateways' state should be up to date and consistent.

Additional info:
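For reference, the namespace scaling in steps 2 and 3 was driven through the nvmeof CLI container. Below is a minimal sketch of such a loop; GW_ADDR, the image${i} naming, and the nsid range are illustrative placeholders (the actual run used prefixes like ZVWD-/OHN3-), while the CLI image and flags are the ones used elsewhere in this report:

```
# Hypothetical scale-out sketch for step 3: add namespaces 201-300 to
# subsystem1 on one gateway. GW_ADDR and the image names are placeholders;
# the RBD images are assumed to already exist in the "rbd" pool.
GW_ADDR=10.8.128.223
CLI_IMAGE=cp.stg.icr.io/cp/ibm-ceph/nvmeof-cli-rhel9:1.2.0-1
for i in $(seq 201 300); do
    podman run --quiet --rm "$CLI_IMAGE" \
        --server-address "$GW_ADDR" --server-port 5500 \
        namespace add --rbd-image "image${i}" --nsid "$i" \
        --rbd-pool rbd --subsystem nqn.2016-06.io.spdk:cnode1
done
```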
Rahul, is it still happening on the latest downstream build?
Fixed in Ceph 7.1 Build (IBM-CEPH-7.1-202404190257.ci.0).
Closing this BZ as the issue was not seen with the latest builds.

Pass logs:
http://magna002.ceph.redhat.com/cephci-jenkins/test-runs/openstack/IBM/7.1/rhel-9/Regression/18.2.1-149/nvmeotcp/105/tier-3_2-nvmeof-gw_8-sub_ns/
http://magna002.ceph.redhat.com/cephci-jenkins/test-runs/openstack/IBM/7.1/rhel-9/Regression/18.2.1-149/nvmeotcp/105/tier-3_2-nvmeof-gw_2-sub_ns/

```
2024-04-25 13:46:22,406 (cephci.test_ceph_nvmeof_gateway_sub_scale) [INFO] - cephci.IBM.7.1.rhel-9.Regression.18.2.1-149.nvmeotcp.105.cephci.ceph.nvmegw_cli.execute.py:16 - NVMe CLI command : namespace add
2024-04-25 13:46:22,407 (cephci.test_ceph_nvmeof_gateway_sub_scale) [INFO] - cephci.IBM.7.1.rhel-9.Regression.18.2.1-149.nvmeotcp.105.cephci.ceph.ceph.py:1568 - Running command podman run --quiet --rm cp.stg.icr.io/cp/ibm-ceph/nvmeof-cli-rhel9:1.2.4-1 --server-address 10.0.195.98 --server-port 5500 namespace add --rbd-image L5N6-image200 --nsid 200 --rbd-pool rbd --subsystem nqn.2016-06.io.spdk:cnode2 on 10.0.195.98 timeout 600
2024-04-25 13:46:23,540 (cephci.test_ceph_nvmeof_gateway_sub_scale) [INFO] - cephci.IBM.7.1.rhel-9.Regression.18.2.1-149.nvmeotcp.105.cephci.ceph.ceph.py:1602 - Command completed successfully
2024-04-25 13:46:23,548 (cephci.test_ceph_nvmeof_gateway_sub_scale) [INFO] - cephci.IBM.7.1.rhel-9.Regression.18.2.1-149.nvmeotcp.105.cephci.ceph.nvmegw_cli.execute.py:36 - ('', 'Adding namespace 200 to nqn.2016-06.io.spdk:cnode2, load balancing group 0: Successful\n')
2024-04-25 13:46:23,549 (cephci.test_ceph_nvmeof_gateway_sub_scale) [INFO] - cephci.IBM.7.1.rhel-9.Regression.18.2.1-149.nvmeotcp.105.cephci.ceph.nvmegw_cli.execute.py:16 - NVMe CLI command : namespace list
2024-04-25 13:46:23,550 (cephci.test_ceph_nvmeof_gateway_sub_scale) [INFO] - cephci.IBM.7.1.rhel-9.Regression.18.2.1-149.nvmeotcp.105.cephci.ceph.ceph.py:1568 - Running command podman run --quiet --rm cp.stg.icr.io/cp/ibm-ceph/nvmeof-cli-rhel9:1.2.4-1 --format json --server-address 10.0.195.98 --server-port 5500 namespace list --nsid 200 --subsystem nqn.2016-06.io.spdk:cnode2 on 10.0.195.98 timeout 600
2024-04-25 13:46:24,895 (cephci.test_ceph_nvmeof_gateway_sub_scale) [INFO] - cephci.IBM.7.1.rhel-9.Regression.18.2.1-149.nvmeotcp.105.cephci.ceph.ceph.py:1602 - Command completed successfully
2024-04-25 13:46:24,896 (cephci.test_ceph_nvmeof_gateway_sub_scale) [INFO] - cephci.IBM.7.1.rhel-9.Regression.18.2.1-149.nvmeotcp.105.cephci.ceph.nvmegw_cli.execute.py:36 - ('', '{\n    "error_message": "Success",\n    "subsystem_nqn": "nqn.2016-06.io.spdk:cnode2",\n    "namespaces": [\n        {\n            "nsid": 200,\n            "bdev_name": "bdev_84e30207-7a60-4657-b126-b2a59d036b76",\n            "rbd_image_name": "L5N6-image200",\n            "rbd_pool_name": "rbd",\n            "load_balancing_group": 1,\n            "block_size": 512,\n            "rbd_image_size": "1099511627776",\n            "uuid": "84e30207-7a60-4657-b126-b2a59d036b76",\n            "rw_ios_per_second": "0",\n            "rw_mbytes_per_second": "0",\n            "r_mbytes_per_second": "0",\n            "w_mbytes_per_second": "0"\n        }\n    ],\n    "status": 0\n}\n')
```
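As a quick cross-check, the namespace count each gateway reports for a subsystem can be compared by parsing the CLI's JSON output, as in the log above. A sketch, assuming jq is available on the host and using the gateway addresses from the original report:

```
# Hypothetical verification sketch: ask both gateways for the namespace list
# of the same subsystem and compare the counts they report.
CLI_IMAGE=cp.stg.icr.io/cp/ibm-ceph/nvmeof-cli-rhel9:1.2.0-1
SUBSYS=nqn.2016-06.io.spdk:cnode1
for gw in 10.8.128.223 10.8.128.224; do
    count=$(podman run --quiet --rm "$CLI_IMAGE" --format json \
        --server-address "$gw" --server-port 5500 \
        namespace list --subsystem "$SUBSYS" | jq '.namespaces | length')
    echo "$gw reports $count namespace(s) in $SUBSYS"
done
```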
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Critical: Red Hat Ceph Storage 7.1 security, enhancements, and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2024:3925