Description of problem:
I had a ceph-nvmeof setup with one subsystem and nearly 150 namespaces, and all of them were visible with the get_subsystems command. After an nvmeof service restart, the restart itself succeeds, but the GW is unable to load the configured entities (subsystem and namespaces) even though OMAP still holds all entries and updates; get_subsystems lists none of them.

Version-Release number of selected component (if applicable):
ceph version 18.0.0-6175-g38f11f28 (38f11f28b05bfd80aaf2644cc1660ef8b51dd272) reef (dev)

How reproducible:
Always

Steps to Reproduce:
1. Deploy the NVMeOF GW on a Ceph cluster and scale its entities: 1 subsystem and nearly 150 namespaces.
2. Restart the nvmeof service: "ceph orch restart nvmeof.rbd"
3. Run get_subsystems; no entities are listed.

Actual results:
The GW fails to load the OMAP entries after a restart.

Expected results:
The GW should load the OMAP entries correctly after a restart.

Additional info:
GW journalctl log - http://magna002.ceph.redhat.com/cephci-jenkins/nvmeof_restart_GW.log
OMAP - http://magna002.ceph.redhat.com/cephci-jenkins/nvmeof_restart_omap.log
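For reference, a minimal sketch of how the OMAP state can be inspected independently of the GW after a restart, using the Python rados bindings. The pool name "rbd" and the state object name "nvmeof.state" are assumptions here (they should match the rbd pool and OMAP state object configured for this gateway in ceph-nvmeof.conf), not values taken from this report:

# Hedged sketch: list the OMAP keys on the gateway state object to confirm
# the subsystem/namespace/bdev entries survived the restart.
import rados

POOL = "rbd"                # assumed pool used by the nvmeof.rbd service
STATE_OBJ = "nvmeof.state"  # assumed name of the GW OMAP state object

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
try:
    ioctx = cluster.open_ioctx(POOL)
    with rados.ReadOpCtx() as read_op:
        # No prefix filter; fetch up to 10000 keys in one pass.
        omap_iter, ret = ioctx.get_omap_vals(read_op, "", "", 10000)
        ioctx.operate_read_op(read_op, STATE_OBJ)
        keys = [k for k, _ in omap_iter]
    print(f"{len(keys)} OMAP entries on {POOL}/{STATE_OBJ}")
    for key in keys[:10]:
        print(key)  # keys look roughly like bdev_..., subsystem_..., namespace_... in this GW version
    ioctx.close()
finally:
    cluster.shutdown()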
Another instance - the OMAP entries are intact after the nvmeof service restart, but it looks like the GW is unable to consume these entries upon restart - http://pastebin.test.redhat.com/1109748
Tested this again with a newer GW and ceph version: ceph version 18.2.0-72.el9cp (3f281315a9c7d4bb2281729a5f3c3366ad99193d) reef (stable), registry-proxy.engineering.redhat.com/rh-osbs/ceph-nvmeof:0.0.4-1. The issue seems to be intermittent: the GW fails to load upon restart, and the get_subsystems command was issued while the service was still coming up. Logs at http://magna002.ceph.redhat.com/cephci-jenkins/nvmeof_gw_restart1.log

Oct 04 06:42:15 ceph-nvmf3-hrhd31-node5 ceph-33fddd3a-626e-11ee-8bcb-fa163e0c7e19-nvmeof-rbd-ceph-nvmf3-hrhd31-node5-begrve[207500]: INFO:control.grpc:Received request to create bdev U7U9-bdev245 from rbd/U7U9-image245 with block size 4096
Oct 04 06:42:15 ceph-nvmf3-hrhd31-node5 ceph-33fddd3a-626e-11ee-8bcb-fa163e0c7e19-nvmeof-rbd-ceph-nvmf3-hrhd31-node5-begrve[207500]: [2023-10-04 10:42:15.388799] bdev_rbd.c:1199:bdev_rbd_create: *NOTICE*: Add U7U9-bdev245 rbd disk to lun
Oct 04 06:42:15 ceph-nvmf3-hrhd31-node5 ceph-33fddd3a-626e-11ee-8bcb-fa163e0c7e19-nvmeof-rbd-ceph-nvmf3-hrhd31-node5-begrve[207500]: INFO:control.grpc:create_bdev: U7U9-bdev245
Oct 04 06:42:15 ceph-nvmf3-hrhd31-node5 ceph-33fddd3a-626e-11ee-8bcb-fa163e0c7e19-nvmeof-rbd-ceph-nvmf3-hrhd31-node5-begrve[207500]: INFO:control.grpc:Received request to create bdev U7U9-bdev195 from rbd/U7U9-image195 with block size 4096
Oct 04 06:42:15 ceph-nvmf3-hrhd31-node5 ceph-33fddd3a-626e-11ee-8bcb-fa163e0c7e19-nvmeof-rbd-ceph-nvmf3-hrhd31-node5-begrve[207500]: INFO:control.grpc:Allocating cluster name='cluster_context_4'
Oct 04 06:42:15 ceph-nvmf3-hrhd31-node5 ceph-33fddd3a-626e-11ee-8bcb-fa163e0c7e19-nvmeof-rbd-ceph-nvmf3-hrhd31-node5-begrve[207500]: [2023-10-04 10:42:15.440532] bdev_rbd.c:1199:bdev_rbd_create: *NOTICE*: Add U7U9-bdev195 rbd disk to lun
Oct 04 06:42:15 ceph-nvmf3-hrhd31-node5 ceph-33fddd3a-626e-11ee-8bcb-fa163e0c7e19-nvmeof-rbd-ceph-nvmf3-hrhd31-node5-begrve[207500]: INFO:control.grpc:create_bdev: U7U9-bdev195
Oct 04 06:42:15 ceph-nvmf3-hrhd31-node5 ceph-33fddd3a-626e-11ee-8bcb-fa163e0c7e19-nvmeof-rbd-ceph-nvmf3-hrhd31-node5-begrve[207500]: INFO:control.grpc:Received request to create bdev U7U9-bdev49 from rbd/U7U9-image49 with block size 4096
Oct 04 06:42:15 ceph-nvmf3-hrhd31-node5 ceph-33fddd3a-626e-11ee-8bcb-fa163e0c7e19-nvmeof-rbd-ceph-nvmf3-hrhd31-node5-begrve[207500]: INFO:control.grpc:Received request to get subsystems
Oct 04 06:42:15 ceph-nvmf3-hrhd31-node5 ceph-33fddd3a-626e-11ee-8bcb-fa163e0c7e19-nvmeof-rbd-ceph-nvmf3-hrhd31-node5-begrve[207500]: [2023-10-04 10:42:15.461168] bdev_rbd.c:1199:bdev_rbd_create: *NOTICE*: Add U7U9-bdev49 rbd disk to lun
Oct 04 06:42:15 ceph-nvmf3-hrhd31-node5 ceph-33fddd3a-626e-11ee-8bcb-fa163e0c7e19-nvmeof-rbd-ceph-nvmf3-hrhd31-node5-begrve[207500]: INFO:control.grpc:create_bdev: U7U9-bdev49
Oct 04 06:42:15 ceph-nvmf3-hrhd31-node5 ceph-33fddd3a-626e-11ee-8bcb-fa163e0c7e19-nvmeof-rbd-ceph-nvmf3-hrhd31-node5-begrve[207500]: INFO:control.grpc:Received request to create bdev U7U9-bdev243 from rbd/U7U9-image243 with block size 4096
Oct 04 06:42:15 ceph-nvmf3-hrhd31-node5 ceph-33fddd3a-626e-11ee-8bcb-fa163e0c7e19-nvmeof-rbd-ceph-nvmf3-hrhd31-node5-begrve[207500]: INFO:control.grpc:create_bdev: []
Oct 04 06:42:15 ceph-nvmf3-hrhd31-node5 ceph-33fddd3a-626e-11ee-8bcb-fa163e0c7e19-nvmeof-rbd-ceph-nvmf3-hrhd31-node5-begrve[207500]: Exception in thread Thread-1:
Oct 04 06:42:15 ceph-nvmf3-hrhd31-node5 ceph-33fddd3a-626e-11ee-8bcb-fa163e0c7e19-nvmeof-rbd-ceph-nvmf3-hrhd31-node5-begrve[207500]: Traceback (most recent call last):
Oct 04 06:42:15 ceph-nvmf3-hrhd31-node5 ceph-33fddd3a-626e-11ee-8bcb-fa163e0c7e19-nvmeof-rbd-ceph-nvmf3-hrhd31-node5-begrve[207500]: File "/usr/lib64/python3.9/threading.py", line 980, in _bootstrap_inner
Oct 04 06:42:15 ceph-nvmf3-hrhd31-node5 ceph-33fddd3a-626e-11ee-8bcb-fa163e0c7e19-nvmeof-rbd-ceph-nvmf3-hrhd31-node5-begrve[207500]: self.run()
Oct 04 06:42:15 ceph-nvmf3-hrhd31-node5 ceph-33fddd3a-626e-11ee-8bcb-fa163e0c7e19-nvmeof-rbd-ceph-nvmf3-hrhd31-node5-begrve[207500]: File "/usr/lib64/python3.9/threading.py", line 917, in run
Oct 04 06:42:15 ceph-nvmf3-hrhd31-node5 ceph-33fddd3a-626e-11ee-8bcb-fa163e0c7e19-nvmeof-rbd-ceph-nvmf3-hrhd31-node5-begrve[207500]: self._target(*self._args, **self._kwargs)
Oct 04 06:42:15 ceph-nvmf3-hrhd31-node5 ceph-33fddd3a-626e-11ee-8bcb-fa163e0c7e19-nvmeof-rbd-ceph-nvmf3-hrhd31-node5-begrve[207500]: File "/remote-source/ceph-nvmeof/app/control/state.py", line 420, in _update_caller
Oct 04 06:42:15 ceph-nvmf3-hrhd31-node5 ceph-33fddd3a-626e-11ee-8bcb-fa163e0c7e19-nvmeof-rbd-ceph-nvmf3-hrhd31-node5-begrve[207500]: self.update()
Oct 04 06:42:15 ceph-nvmf3-hrhd31-node5 ceph-33fddd3a-626e-11ee-8bcb-fa163e0c7e19-nvmeof-rbd-ceph-nvmf3-hrhd31-node5-begrve[207500]: File "/remote-source/ceph-nvmeof/app/control/state.py", line 465, in update
Oct 04 06:42:15 ceph-nvmf3-hrhd31-node5 ceph-33fddd3a-626e-11ee-8bcb-fa163e0c7e19-nvmeof-rbd-ceph-nvmf3-hrhd31-node5-begrve[207500]: self._update_call_rpc(grouped_added, True, prefix_list)
Oct 04 06:42:15 ceph-nvmf3-hrhd31-node5 ceph-33fddd3a-626e-11ee-8bcb-fa163e0c7e19-nvmeof-rbd-ceph-nvmf3-hrhd31-node5-begrve[207500]: File "/remote-source/ceph-nvmeof/app/control/state.py", line 487, in _update_call_rpc
Oct 04 06:42:15 ceph-nvmf3-hrhd31-node5 ceph-33fddd3a-626e-11ee-8bcb-fa163e0c7e19-nvmeof-rbd-ceph-nvmf3-hrhd31-node5-begrve[207500]: self.gateway_rpc_caller(component_update, True)
Oct 04 06:42:15 ceph-nvmf3-hrhd31-node5 ceph-33fddd3a-626e-11ee-8bcb-fa163e0c7e19-nvmeof-rbd-ceph-nvmf3-hrhd31-node5-begrve[207500]: File "/remote-source/ceph-nvmeof/app/control/server.py", line 297, in gateway_rpc_caller
Oct 04 06:42:15 ceph-nvmf3-hrhd31-node5 ceph-33fddd3a-626e-11ee-8bcb-fa163e0c7e19-nvmeof-rbd-ceph-nvmf3-hrhd31-node5-begrve[207500]: self.gateway_rpc.create_bdev(req)
Oct 04 06:42:15 ceph-nvmf3-hrhd31-node5 ceph-33fddd3a-626e-11ee-8bcb-fa163e0c7e19-nvmeof-rbd-ceph-nvmf3-hrhd31-node5-begrve[207500]: File "/remote-source/ceph-nvmeof/app/control/grpc.py", line 128, in create_bdev
Oct 04 06:42:15 ceph-nvmf3-hrhd31-node5 ceph-33fddd3a-626e-11ee-8bcb-fa163e0c7e19-nvmeof-rbd-ceph-nvmf3-hrhd31-node5-begrve[207500]: return pb2.bdev(bdev_name=bdev_name, status=True)
Oct 04 06:42:15 ceph-nvmf3-hrhd31-node5 ceph-33fddd3a-626e-11ee-8bcb-fa163e0c7e19-nvmeof-rbd-ceph-nvmf3-hrhd31-node5-begrve[207500]: TypeError: bad argument type for built-in operation
Oct 04 06:42:15 ceph-nvmf3-hrhd31-node5 ceph-33fddd3a-626e-11ee-8bcb-fa163e0c7e19-nvmeof-rbd-ceph-nvmf3-hrhd31-node5-begrve[207500]: [2023-10-04 10:42:15.478168] bdev_rbd.c:1199:bdev_rbd_create: *NOTICE*: Add U7U9-bdev243 rbd disk to lun
Oct 04 06:42:15 ceph-nvmf3-hrhd31-node5 ceph-33fddd3a-626e-11ee-8bcb-fa163e0c7e19-nvmeof-rbd-ceph-nvmf3-hrhd31-node5-begrve[207500]: INFO:control.grpc:get_subsystems: U7U9-bdev243
Oct 04 06:42:17 ceph-nvmf3-hrhd31-node5 ceph-33fddd3a-626e-11ee-8bcb-fa163e0c7e19-nvmeof-rbd-ceph-nvmf3-hrhd31-node5-begrve[207500]: INFO:control.grpc:Received request to get subsystems
Oct 04 06:42:17 ceph-nvmf3-hrhd31-node5 ceph-33fddd3a-626e-11ee-8bcb-fa163e0c7e19-nvmeof-rbd-ceph-nvmf3-hrhd31-node5-begrve[207500]: INFO:control.grpc:get_subsystems: []
Oct 04 06:42:19 ceph-nvmf3-hrhd31-node5 ceph-33fddd3a-626e-11ee-8bcb-fa163e0c7e19-nvmeof-rbd-ceph-nvmf3-hrhd31-node5-begrve[207500]: INFO:control.grpc:Received request to get subsystems
Oct 04 06:42:19 ceph-nvmf3-hrhd31-node5 ceph-33fddd3a-626e-11ee-8bcb-fa163e0c7e19-nvmeof-rbd-ceph-nvmf3-hrhd31-node5-begrve[207500]: INFO:control.grpc:get_subsystems: []
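From the journal above, the thread replaying the OMAP state dies with "TypeError: bad argument type for built-in operation" in create_bdev at pb2.bdev(bdev_name=bdev_name, status=True), immediately after "create_bdev: []" is logged, i.e. bdev_name appears to be a list rather than a string when a get_subsystems request races with the OMAP replay. The snippet below is only an illustrative guard for that failure mode under that assumption; it is NOT the actual ceph-nvmeof change (the real fix is tracked in the issue linked in the next comment):

# Hypothetical guard, not the actual ceph-nvmeof code:
def coerce_bdev_name(rpc_result):
    """Return a bdev name string from the SPDK RPC result, or None if unusable."""
    if isinstance(rpc_result, str) and rpc_result:
        return rpc_result
    if isinstance(rpc_result, (list, tuple)) and rpc_result:
        return rpc_result[0]   # some RPC paths wrap the created name in a list
    return None                # empty list / unexpected type: no bdev name to report

# With the values seen in the journal above:
assert coerce_bdev_name([]) is None                          # instead of raising inside pb2.bdev(...)
assert coerce_bdev_name(["U7U9-bdev243"]) == "U7U9-bdev243"
assert coerce_bdev_name("U7U9-bdev245") == "U7U9-bdev245"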
Handled in https://github.com/ceph/ceph-nvmeof/issues/255
Rahul, I believe this is fixed with 0.0.5. Please validate.
Yes Aviv, it is, but it is not downstream yet for us to validate.
This will be tested again once it is downstream, and then the BZ will be marked closed.
Fixed in 0.0.5. Please verify.
No need to add this to the Release Notes; it is fixed in the build we have for 7.0.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat Ceph Storage 7.0 Bug Fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2023:7780
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days