Bug 2240169 - Gateway fails to load subsystems upon nvmeof service restart
Summary: Gateway fails to load subsystems upon nvmeof service restart
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: NVMeOF
Version: 7.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 7.0
Assignee: Aviv Caro
QA Contact: Manohar Murthy
Docs Contact: Rivka Pollack
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2023-09-22 08:26 UTC by Rahul Lepakshi
Modified: 2024-04-12 04:25 UTC

Fixed In Version: ceph-nvmeof-container-0.0.5-1
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-12-13 15:23:56 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHCEPH-7523 0 None None None 2023-09-22 08:27:26 UTC
Red Hat Product Errata RHBA-2023:7780 0 None None None 2023-12-13 15:23:59 UTC

Description Rahul Lepakshi 2023-09-22 08:26:14 UTC
Description of problem:
I had a ceph-nvmeof setup with one subsystem and nearly 150 namespaces and was able to see them with the get_subsystems command.

Upon nvmeof service restart, the restart itself succeeds, but the GW is unable to load the configured targets (subsystem and namespaces): the OMAP object still holds all entries and updates, yet get_subsystems lists none of them.
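
For reference, the OMAP-versus-gateway comparison above can be scripted. Below is a minimal sketch using the python-rados bindings to count the entries in the gateway's state object; the pool name "rbd" matches this setup, while the state object name "nvmeof.state" is an assumption and may differ per deployment.

import rados

POOL = "rbd"                 # pool used by this setup (service nvmeof.rbd)
STATE_OBJ = "nvmeof.state"   # assumed name of the gateway state object

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
try:
    ioctx = cluster.open_ioctx(POOL)
    try:
        with rados.ReadOpCtx() as read_op:
            # Fetch up to 1024 OMAP key/value pairs from the state object.
            it, _ = ioctx.get_omap_vals(read_op, "", "", 1024)
            ioctx.operate_read_op(read_op, STATE_OBJ)
            keys = [k for k, _ in it]
        print(f"{len(keys)} OMAP entries in {POOL}/{STATE_OBJ}")
    finally:
        ioctx.close()
finally:
    cluster.shutdown()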

Version-Release number of selected component (if applicable):
ceph version 18.0.0-6175-g38f11f28 (38f11f28b05bfd80aaf2644cc1660ef8b51dd272) reef (dev)

How reproducible:
always

Steps to Reproduce:
1. Deploy the NVMeOF GW on a ceph cluster and scale its entities - 1 subsystem and nearly 150 namespaces
2. Restart the nvmeof service - "ceph orch restart nvmeof.rbd"
3. Issue get_subsystems - no entities are listed

Actual results:
GW fails to load OMAP entries upon a restart 

Expected results:
GW should load OMAP entries correctly upon a restart 

Additional info:
GW journalctl log - http://magna002.ceph.redhat.com/cephci-jenkins/nvmeof_restart_GW.log

OMAP log - http://magna002.ceph.redhat.com/cephci-jenkins/nvmeof_restart_omap.log

Comment 1 Rahul Lepakshi 2023-09-22 08:42:33 UTC
Another instance: OMAP entries are intact after the nvmeof service restart, but it looks like the GW is unable to consume these entries upon restart - http://pastebin.test.redhat.com/1109748

Comment 3 Rahul Lepakshi 2023-10-04 11:47:25 UTC
Tested this again with a newer GW and ceph version:
ceph version 18.2.0-72.el9cp (3f281315a9c7d4bb2281729a5f3c3366ad99193d) reef (stable)
registry-proxy.engineering.redhat.com/rh-osbs/ceph-nvmeof:0.0.4-1

This issue seems to be intermittent: the GW fails to load upon restart when the get_subsystems command is issued while the service is still coming up (see the sketch after the log excerpt below).
Logs at http://magna002.ceph.redhat.com/cephci-jenkins/nvmeof_gw_restart1.log

Oct 04 06:42:15 ceph-nvmf3-hrhd31-node5 ceph-33fddd3a-626e-11ee-8bcb-fa163e0c7e19-nvmeof-rbd-ceph-nvmf3-hrhd31-node5-begrve[207500]: INFO:control.grpc:Received request to create bdev U7U9-bdev245 from rbd/U7U9-image245 with block size 4096
Oct 04 06:42:15 ceph-nvmf3-hrhd31-node5 ceph-33fddd3a-626e-11ee-8bcb-fa163e0c7e19-nvmeof-rbd-ceph-nvmf3-hrhd31-node5-begrve[207500]: [2023-10-04 10:42:15.388799] bdev_rbd.c:1199:bdev_rbd_create: *NOTICE*: Add U7U9-bdev245 rbd disk to lun
Oct 04 06:42:15 ceph-nvmf3-hrhd31-node5 ceph-33fddd3a-626e-11ee-8bcb-fa163e0c7e19-nvmeof-rbd-ceph-nvmf3-hrhd31-node5-begrve[207500]: INFO:control.grpc:create_bdev: U7U9-bdev245
Oct 04 06:42:15 ceph-nvmf3-hrhd31-node5 ceph-33fddd3a-626e-11ee-8bcb-fa163e0c7e19-nvmeof-rbd-ceph-nvmf3-hrhd31-node5-begrve[207500]: INFO:control.grpc:Received request to create bdev U7U9-bdev195 from rbd/U7U9-image195 with block size 4096
Oct 04 06:42:15 ceph-nvmf3-hrhd31-node5 ceph-33fddd3a-626e-11ee-8bcb-fa163e0c7e19-nvmeof-rbd-ceph-nvmf3-hrhd31-node5-begrve[207500]: INFO:control.grpc:Allocating cluster name='cluster_context_4'
Oct 04 06:42:15 ceph-nvmf3-hrhd31-node5 ceph-33fddd3a-626e-11ee-8bcb-fa163e0c7e19-nvmeof-rbd-ceph-nvmf3-hrhd31-node5-begrve[207500]: [2023-10-04 10:42:15.440532] bdev_rbd.c:1199:bdev_rbd_create: *NOTICE*: Add U7U9-bdev195 rbd disk to lun
Oct 04 06:42:15 ceph-nvmf3-hrhd31-node5 ceph-33fddd3a-626e-11ee-8bcb-fa163e0c7e19-nvmeof-rbd-ceph-nvmf3-hrhd31-node5-begrve[207500]: INFO:control.grpc:create_bdev: U7U9-bdev195
Oct 04 06:42:15 ceph-nvmf3-hrhd31-node5 ceph-33fddd3a-626e-11ee-8bcb-fa163e0c7e19-nvmeof-rbd-ceph-nvmf3-hrhd31-node5-begrve[207500]: INFO:control.grpc:Received request to create bdev U7U9-bdev49 from rbd/U7U9-image49 with block size 4096
Oct 04 06:42:15 ceph-nvmf3-hrhd31-node5 ceph-33fddd3a-626e-11ee-8bcb-fa163e0c7e19-nvmeof-rbd-ceph-nvmf3-hrhd31-node5-begrve[207500]: INFO:control.grpc:Received request to get subsystems
Oct 04 06:42:15 ceph-nvmf3-hrhd31-node5 ceph-33fddd3a-626e-11ee-8bcb-fa163e0c7e19-nvmeof-rbd-ceph-nvmf3-hrhd31-node5-begrve[207500]: [2023-10-04 10:42:15.461168] bdev_rbd.c:1199:bdev_rbd_create: *NOTICE*: Add U7U9-bdev49 rbd disk to lun
Oct 04 06:42:15 ceph-nvmf3-hrhd31-node5 ceph-33fddd3a-626e-11ee-8bcb-fa163e0c7e19-nvmeof-rbd-ceph-nvmf3-hrhd31-node5-begrve[207500]: INFO:control.grpc:create_bdev: U7U9-bdev49
Oct 04 06:42:15 ceph-nvmf3-hrhd31-node5 ceph-33fddd3a-626e-11ee-8bcb-fa163e0c7e19-nvmeof-rbd-ceph-nvmf3-hrhd31-node5-begrve[207500]: INFO:control.grpc:Received request to create bdev U7U9-bdev243 from rbd/U7U9-image243 with block size 4096
Oct 04 06:42:15 ceph-nvmf3-hrhd31-node5 ceph-33fddd3a-626e-11ee-8bcb-fa163e0c7e19-nvmeof-rbd-ceph-nvmf3-hrhd31-node5-begrve[207500]: INFO:control.grpc:create_bdev: []
Oct 04 06:42:15 ceph-nvmf3-hrhd31-node5 ceph-33fddd3a-626e-11ee-8bcb-fa163e0c7e19-nvmeof-rbd-ceph-nvmf3-hrhd31-node5-begrve[207500]: Exception in thread Thread-1:
Oct 04 06:42:15 ceph-nvmf3-hrhd31-node5 ceph-33fddd3a-626e-11ee-8bcb-fa163e0c7e19-nvmeof-rbd-ceph-nvmf3-hrhd31-node5-begrve[207500]: Traceback (most recent call last):
Oct 04 06:42:15 ceph-nvmf3-hrhd31-node5 ceph-33fddd3a-626e-11ee-8bcb-fa163e0c7e19-nvmeof-rbd-ceph-nvmf3-hrhd31-node5-begrve[207500]:   File "/usr/lib64/python3.9/threading.py", line 980, in _bootstrap_inner
Oct 04 06:42:15 ceph-nvmf3-hrhd31-node5 ceph-33fddd3a-626e-11ee-8bcb-fa163e0c7e19-nvmeof-rbd-ceph-nvmf3-hrhd31-node5-begrve[207500]:     self.run()
Oct 04 06:42:15 ceph-nvmf3-hrhd31-node5 ceph-33fddd3a-626e-11ee-8bcb-fa163e0c7e19-nvmeof-rbd-ceph-nvmf3-hrhd31-node5-begrve[207500]:   File "/usr/lib64/python3.9/threading.py", line 917, in run
Oct 04 06:42:15 ceph-nvmf3-hrhd31-node5 ceph-33fddd3a-626e-11ee-8bcb-fa163e0c7e19-nvmeof-rbd-ceph-nvmf3-hrhd31-node5-begrve[207500]:     self._target(*self._args, **self._kwargs)
Oct 04 06:42:15 ceph-nvmf3-hrhd31-node5 ceph-33fddd3a-626e-11ee-8bcb-fa163e0c7e19-nvmeof-rbd-ceph-nvmf3-hrhd31-node5-begrve[207500]:   File "/remote-source/ceph-nvmeof/app/control/state.py", line 420, in _update_caller
Oct 04 06:42:15 ceph-nvmf3-hrhd31-node5 ceph-33fddd3a-626e-11ee-8bcb-fa163e0c7e19-nvmeof-rbd-ceph-nvmf3-hrhd31-node5-begrve[207500]:     self.update()
Oct 04 06:42:15 ceph-nvmf3-hrhd31-node5 ceph-33fddd3a-626e-11ee-8bcb-fa163e0c7e19-nvmeof-rbd-ceph-nvmf3-hrhd31-node5-begrve[207500]:   File "/remote-source/ceph-nvmeof/app/control/state.py", line 465, in update
Oct 04 06:42:15 ceph-nvmf3-hrhd31-node5 ceph-33fddd3a-626e-11ee-8bcb-fa163e0c7e19-nvmeof-rbd-ceph-nvmf3-hrhd31-node5-begrve[207500]:     self._update_call_rpc(grouped_added, True, prefix_list)
Oct 04 06:42:15 ceph-nvmf3-hrhd31-node5 ceph-33fddd3a-626e-11ee-8bcb-fa163e0c7e19-nvmeof-rbd-ceph-nvmf3-hrhd31-node5-begrve[207500]:   File "/remote-source/ceph-nvmeof/app/control/state.py", line 487, in _update_call_rpc
Oct 04 06:42:15 ceph-nvmf3-hrhd31-node5 ceph-33fddd3a-626e-11ee-8bcb-fa163e0c7e19-nvmeof-rbd-ceph-nvmf3-hrhd31-node5-begrve[207500]:     self.gateway_rpc_caller(component_update, True)
Oct 04 06:42:15 ceph-nvmf3-hrhd31-node5 ceph-33fddd3a-626e-11ee-8bcb-fa163e0c7e19-nvmeof-rbd-ceph-nvmf3-hrhd31-node5-begrve[207500]:   File "/remote-source/ceph-nvmeof/app/control/server.py", line 297, in gateway_rpc_caller
Oct 04 06:42:15 ceph-nvmf3-hrhd31-node5 ceph-33fddd3a-626e-11ee-8bcb-fa163e0c7e19-nvmeof-rbd-ceph-nvmf3-hrhd31-node5-begrve[207500]:     self.gateway_rpc.create_bdev(req)
Oct 04 06:42:15 ceph-nvmf3-hrhd31-node5 ceph-33fddd3a-626e-11ee-8bcb-fa163e0c7e19-nvmeof-rbd-ceph-nvmf3-hrhd31-node5-begrve[207500]:   File "/remote-source/ceph-nvmeof/app/control/grpc.py", line 128, in create_bdev
Oct 04 06:42:15 ceph-nvmf3-hrhd31-node5 ceph-33fddd3a-626e-11ee-8bcb-fa163e0c7e19-nvmeof-rbd-ceph-nvmf3-hrhd31-node5-begrve[207500]:     return pb2.bdev(bdev_name=bdev_name, status=True)
Oct 04 06:42:15 ceph-nvmf3-hrhd31-node5 ceph-33fddd3a-626e-11ee-8bcb-fa163e0c7e19-nvmeof-rbd-ceph-nvmf3-hrhd31-node5-begrve[207500]: TypeError: bad argument type for built-in operation
Oct 04 06:42:15 ceph-nvmf3-hrhd31-node5 ceph-33fddd3a-626e-11ee-8bcb-fa163e0c7e19-nvmeof-rbd-ceph-nvmf3-hrhd31-node5-begrve[207500]: [2023-10-04 10:42:15.478168] bdev_rbd.c:1199:bdev_rbd_create: *NOTICE*: Add U7U9-bdev243 rbd disk to lun
Oct 04 06:42:15 ceph-nvmf3-hrhd31-node5 ceph-33fddd3a-626e-11ee-8bcb-fa163e0c7e19-nvmeof-rbd-ceph-nvmf3-hrhd31-node5-begrve[207500]: INFO:control.grpc:get_subsystems: U7U9-bdev243
Oct 04 06:42:17 ceph-nvmf3-hrhd31-node5 ceph-33fddd3a-626e-11ee-8bcb-fa163e0c7e19-nvmeof-rbd-ceph-nvmf3-hrhd31-node5-begrve[207500]: INFO:control.grpc:Received request to get subsystems
Oct 04 06:42:17 ceph-nvmf3-hrhd31-node5 ceph-33fddd3a-626e-11ee-8bcb-fa163e0c7e19-nvmeof-rbd-ceph-nvmf3-hrhd31-node5-begrve[207500]: INFO:control.grpc:get_subsystems: []
Oct 04 06:42:19 ceph-nvmf3-hrhd31-node5 ceph-33fddd3a-626e-11ee-8bcb-fa163e0c7e19-nvmeof-rbd-ceph-nvmf3-hrhd31-node5-begrve[207500]: INFO:control.grpc:Received request to get subsystems
Oct 04 06:42:19 ceph-nvmf3-hrhd31-node5 ceph-33fddd3a-626e-11ee-8bcb-fa163e0c7e19-nvmeof-rbd-ceph-nvmf3-hrhd31-node5-begrve[207500]: INFO:control.grpc:get_subsystems: []
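
For illustration only (a hedged sketch, not the actual ceph-nvmeof code or the upstream fix referenced in comment 4 below): one way a failure like the "create_bdev: []" line and the TypeError above can arise is when the state-replay thread and the gRPC handlers touch shared bookkeeping without serialization, so a get_subsystems call arriving mid-replay sees an empty or partial view. A minimal Python sketch of serializing the two paths with a lock:

import threading

class GatewayStateSketch:
    # Hypothetical illustration only -- not the ceph-nvmeof implementation
    # or the upstream fix. It shows why a get_subsystems call racing with
    # OMAP replay can observe an empty or partial view unless both paths
    # are serialized.

    def __init__(self):
        self._lock = threading.Lock()  # guards the in-memory view below
        self._subsystems = {}          # nqn -> list of bdev names

    def replay_omap_entry(self, nqn, bdev_name):
        # Runs in the background update thread while state is restored
        # from OMAP after a restart.
        with self._lock:
            self._subsystems.setdefault(nqn, []).append(bdev_name)

    def get_subsystems(self):
        # Runs in a gRPC handler; without the lock this could execute
        # mid-replay and return an inconsistent snapshot.
        with self._lock:
            return {nqn: list(bdevs) for nqn, bdevs in self._subsystems.items()}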

Comment 4 Aviv Caro 2023-10-05 11:38:38 UTC
Handled in https://github.com/ceph/ceph-nvmeof/issues/255

Comment 5 Aviv Caro 2023-10-25 12:17:37 UTC
Rahul, I believe this is fixed with 0.0.5. Please validate.

Comment 6 Rahul Lepakshi 2023-10-26 04:40:50 UTC
Yes Aviv, it is, but the fix is not downstream yet for me to validate.

Comment 7 Rahul Lepakshi 2023-10-31 09:27:37 UTC
This will be tested again once the fix is downstream, and then the BZ will be marked closed.

Comment 8 Aviv Caro 2023-11-01 11:09:41 UTC
Fixed in 0.0.5. Please verify.

Comment 15 Aviv Caro 2023-11-21 08:09:53 UTC
No need to add to RN, it is fixed in the build we have for 7.0.

Comment 16 errata-xmlrpc 2023-12-13 15:23:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 7.0 Bug Fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:7780

Comment 17 Red Hat Bugzilla 2024-04-12 04:25:31 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days

