Created attachment 2029954 [details]
NVMe service log

Description of problem:

NVMe-oF deployment failed with Ceph 18.2.1-155 and NVMe-oF 1.2.4-1. The gateway crashes on startup with a ValueError while parsing its configuration:

Apr 28 02:37:59 ceph-sunilkumar-00-pvlfdn-node7 ceph-3c4aaa88-0528-11ef-a216-fa163e4f1077-nvmeof-rbd-ceph-sunilkumar-00-pvlfdn-node7-xxjgor[14769]: [2024-04-28 06:37:59.596636] app.c: 712:spdk_app_start: *NOTICE*: Total cores available: 4
Apr 28 02:37:59 ceph-sunilkumar-00-pvlfdn-node7 ceph-3c4aaa88-0528-11ef-a216-fa163e4f1077-nvmeof-rbd-ceph-sunilkumar-00-pvlfdn-node7-xxjgor[14769]: [2024-04-28 06:37:59.650074] reactor.c: 926:reactor_run: *NOTICE*: Reactor started on core 1
Apr 28 02:37:59 ceph-sunilkumar-00-pvlfdn-node7 ceph-3c4aaa88-0528-11ef-a216-fa163e4f1077-nvmeof-rbd-ceph-sunilkumar-00-pvlfdn-node7-xxjgor[14769]: [2024-04-28 06:37:59.650143] reactor.c: 926:reactor_run: *NOTICE*: Reactor started on core 2
Apr 28 02:37:59 ceph-sunilkumar-00-pvlfdn-node7 ceph-3c4aaa88-0528-11ef-a216-fa163e4f1077-nvmeof-rbd-ceph-sunilkumar-00-pvlfdn-node7-xxjgor[14769]: [2024-04-28 06:37:59.650203] reactor.c: 926:reactor_run: *NOTICE*: Reactor started on core 3
Apr 28 02:37:59 ceph-sunilkumar-00-pvlfdn-node7 ceph-3c4aaa88-0528-11ef-a216-fa163e4f1077-nvmeof-rbd-ceph-sunilkumar-00-pvlfdn-node7-xxjgor[14769]: [2024-04-28 06:37:59.650207] reactor.c: 926:reactor_run: *NOTICE*: Reactor started on core 0
Apr 28 02:37:59 ceph-sunilkumar-00-pvlfdn-node7 ceph-3c4aaa88-0528-11ef-a216-fa163e4f1077-nvmeof-rbd-ceph-sunilkumar-00-pvlfdn-node7-xxjgor[14769]: [2024-04-28 06:37:59.692980] accel_sw.c: 681:sw_accel_module_init: *NOTICE*: Accel framework software module initialized.
Apr 28 02:37:59 ceph-sunilkumar-00-pvlfdn-node7 ceph-3c4aaa88-0528-11ef-a216-fa163e4f1077-nvmeof-rbd-ceph-sunilkumar-00-pvlfdn-node7-xxjgor[14769]: [2024-04-28 06:37:59.838632] tcp.c: 629:nvmf_tcp_create: *NOTICE*: *** TCP Transport Init ***
Apr 28 02:37:59 ceph-sunilkumar-00-pvlfdn-node7 ceph-3c4aaa88-0528-11ef-a216-fa163e4f1077-nvmeof-rbd-ceph-sunilkumar-00-pvlfdn-node7-xxjgor[14769]: [28-Apr-2024 06:37:59] INFO server.py:249: Discovery service process id: 63
Apr 28 02:37:59 ceph-sunilkumar-00-pvlfdn-node7 ceph-3c4aaa88-0528-11ef-a216-fa163e4f1077-nvmeof-rbd-ceph-sunilkumar-00-pvlfdn-node7-xxjgor[14769]: [28-Apr-2024 06:37:59] INFO server.py:245: Starting ceph nvmeof discovery service
Apr 28 02:37:59 ceph-sunilkumar-00-pvlfdn-node7 ceph-3c4aaa88-0528-11ef-a216-fa163e4f1077-nvmeof-rbd-ceph-sunilkumar-00-pvlfdn-node7-xxjgor[14769]: [28-Apr-2024 06:37:59] ERROR server.py:108: GatewayServer exception occurred:
Apr 28 02:37:59 ceph-sunilkumar-00-pvlfdn-node7 ceph-3c4aaa88-0528-11ef-a216-fa163e4f1077-nvmeof-rbd-ceph-sunilkumar-00-pvlfdn-node7-xxjgor[14769]: Traceback (most recent call last):
Apr 28 02:37:59 ceph-sunilkumar-00-pvlfdn-node7 ceph-3c4aaa88-0528-11ef-a216-fa163e4f1077-nvmeof-rbd-ceph-sunilkumar-00-pvlfdn-node7-xxjgor[14769]:   File "/remote-source/ceph-nvmeof/app/control/__main__.py", line 43, in <module>
Apr 28 02:37:59 ceph-sunilkumar-00-pvlfdn-node7 ceph-3c4aaa88-0528-11ef-a216-fa163e4f1077-nvmeof-rbd-ceph-sunilkumar-00-pvlfdn-node7-xxjgor[14769]:     gateway.serve()
Apr 28 02:37:59 ceph-sunilkumar-00-pvlfdn-node7 ceph-3c4aaa88-0528-11ef-a216-fa163e4f1077-nvmeof-rbd-ceph-sunilkumar-00-pvlfdn-node7-xxjgor[14769]:   File "/remote-source/ceph-nvmeof/app/control/server.py", line 177, in serve
Apr 28 02:37:59 ceph-sunilkumar-00-pvlfdn-node7 ceph-3c4aaa88-0528-11ef-a216-fa163e4f1077-nvmeof-rbd-ceph-sunilkumar-00-pvlfdn-node7-xxjgor[14769]:     omap_lock = OmapLock(omap_state, gateway_state)
Apr 28 02:37:59 ceph-sunilkumar-00-pvlfdn-node7 ceph-3c4aaa88-0528-11ef-a216-fa163e4f1077-nvmeof-rbd-ceph-sunilkumar-00-pvlfdn-node7-xxjgor[14769]:   File "/remote-source/ceph-nvmeof/app/control/state.py", line 201, in __init__
Apr 28 02:37:59 ceph-sunilkumar-00-pvlfdn-node7 ceph-3c4aaa88-0528-11ef-a216-fa163e4f1077-nvmeof-rbd-ceph-sunilkumar-00-pvlfdn-node7-xxjgor[14769]:     self.omap_file_lock_retry_sleep_interval = self.omap_state.config.getint_with_default("gateway",
Apr 28 02:37:59 ceph-sunilkumar-00-pvlfdn-node7 ceph-3c4aaa88-0528-11ef-a216-fa163e4f1077-nvmeof-rbd-ceph-sunilkumar-00-pvlfdn-node7-xxjgor[14769]:   File "/remote-source/ceph-nvmeof/app/control/config.py", line 47, in getint_with_default
Apr 28 02:37:59 ceph-sunilkumar-00-pvlfdn-node7 ceph-3c4aaa88-0528-11ef-a216-fa163e4f1077-nvmeof-rbd-ceph-sunilkumar-00-pvlfdn-node7-xxjgor[14769]:     return self.config.getint(section, param, fallback=value)
Apr 28 02:37:59 ceph-sunilkumar-00-pvlfdn-node7 ceph-3c4aaa88-0528-11ef-a216-fa163e4f1077-nvmeof-rbd-ceph-sunilkumar-00-pvlfdn-node7-xxjgor[14769]:   File "/usr/lib64/python3.9/configparser.py", line 818, in getint
Apr 28 02:37:59 ceph-sunilkumar-00-pvlfdn-node7 ceph-3c4aaa88-0528-11ef-a216-fa163e4f1077-nvmeof-rbd-ceph-sunilkumar-00-pvlfdn-node7-xxjgor[14769]:     return self._get_conv(section, option, int, raw=raw, vars=vars,
Apr 28 02:37:59 ceph-sunilkumar-00-pvlfdn-node7 ceph-3c4aaa88-0528-11ef-a216-fa163e4f1077-nvmeof-rbd-ceph-sunilkumar-00-pvlfdn-node7-xxjgor[14769]:   File "/usr/lib64/python3.9/configparser.py", line 808, in _get_conv
Apr 28 02:37:59 ceph-sunilkumar-00-pvlfdn-node7 ceph-3c4aaa88-0528-11ef-a216-fa163e4f1077-nvmeof-rbd-ceph-sunilkumar-00-pvlfdn-node7-xxjgor[14769]:     return self._get(section, conv, option, raw=raw, vars=vars,
Apr 28 02:37:59 ceph-sunilkumar-00-pvlfdn-node7 ceph-3c4aaa88-0528-11ef-a216-fa163e4f1077-nvmeof-rbd-ceph-sunilkumar-00-pvlfdn-node7-xxjgor[14769]:   File "/usr/lib64/python3.9/configparser.py", line 803, in _get
Apr 28 02:37:59 ceph-sunilkumar-00-pvlfdn-node7 ceph-3c4aaa88-0528-11ef-a216-fa163e4f1077-nvmeof-rbd-ceph-sunilkumar-00-pvlfdn-node7-xxjgor[14769]:     return conv(self.get(section, option, **kwargs))
Apr 28 02:37:59 ceph-sunilkumar-00-pvlfdn-node7 ceph-3c4aaa88-0528-11ef-a216-fa163e4f1077-nvmeof-rbd-ceph-sunilkumar-00-pvlfdn-node7-xxjgor[14769]: ValueError: invalid literal for int() with base 10: '1.0'

Version-Release number of selected component (if applicable):

nvmeof_image=registry-proxy.engineering.redhat.com/rh-osbs/ceph-nvmeof:1.2.4-1
nvmeof_cli_image=registry-proxy.engineering.redhat.com/rh-osbs/ceph-nvmeof-cli:1.2.4-1
ceph_repo=http://download.devel.redhat.com/rhel-9/composes/auto/ceph-7.1-rhel-9/RHCEPH-7.1-RHEL-9-20240424.ci.3
ceph_image=registry-proxy.engineering.redhat.com/rh-osbs/rhceph:ceph-7.1-rhel-9-containers-candidate-86483-20240424220941

How reproducible:

Steps to Reproduce:
1. Bootstrap a Ceph cluster and add the core daemons (MON, MGR, OSD).
2. Create an RBD pool and deploy the NVMe-oF service.
3. Observe that the nvmeof gateway daemons fail to start with the traceback above.
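The root cause is visible in the last two traceback frames: getint_with_default() forwards to configparser's getint(), which applies int() to the raw string, and int('1.0') is not a valid base-10 literal. A minimal standalone sketch of the failure follows; the config fragment is hypothetical, and the option name is inferred from the attribute assigned in state.py:

import configparser

# Hypothetical config fragment: a float-formatted value for an option that
# the gateway reads with getint(). '1.0' is the raw value the traceback reports.
config = configparser.ConfigParser()
config.read_string("""
[gateway]
omap_file_lock_retry_sleep_interval = 1.0
""")

# configparser applies int() to the raw string; the fallback only covers a
# *missing* option, not a failed conversion, so this raises the same
# ValueError seen in the service log above.
try:
    config.getint("gateway", "omap_file_lock_retry_sleep_interval", fallback=1)
except ValueError as err:
    print(err)  # invalid literal for int() with base 10: '1.0'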
Fixed in:
https://pkgs.devel.redhat.com/cgit/containers/ceph-nvmeof/commit/?h=ceph-7.1-rhel-9&id=3912329fbf36a7355510fe5786046e302139d66e

Please verify with GW/CLI version >= 1.2.5.
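For reference, a conversion-tolerant getter would avoid this class of startup crash. Below is a minimal sketch modeled on the getint_with_default() signature from the traceback; it is an illustration only and not necessarily what the linked commit does (the actual fix may instead read the interval as a float or change the shipped default value):

import configparser

def getint_with_default(config: configparser.ConfigParser,
                        section: str, param: str, value: int) -> int:
    # Hypothetical tolerant variant: accept "1" as well as "1.0".
    try:
        return config.getint(section, param, fallback=value)
    except ValueError:
        # Float-formatted string such as "1.0": parse as float, then truncate.
        return int(config.getfloat(section, param, fallback=value))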
NVMe-oF deployment works with the new build (Ceph 18.2.1-159 and NVMe-oF 1.2.5-2). Attaching HA sanity logs for reference.
[ceph: root@ceph-sunilkumar-00-bjcvqj-node1-installer /]# ceph orch ps --daemon_type nvmeof
NAME                                               HOST                             PORTS             STATUS        REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID
nvmeof.rbd.ceph-sunilkumar-00-bjcvqj-node6.vtwjfa  ceph-sunilkumar-00-bjcvqj-node6  *:5500,4420,8009  running (8m)  8m ago     20h  116M     -                 fe96956aabcd  d1c6bed9582f
nvmeof.rbd.ceph-sunilkumar-00-bjcvqj-node7.vglfty  ceph-sunilkumar-00-bjcvqj-node7  *:5500,4420,8009  running (8m)  8m ago     20h  117M     -                 fe96956aabcd  a2a276907e3f
nvmeof.rbd.ceph-sunilkumar-00-bjcvqj-node8.hnmfps  ceph-sunilkumar-00-bjcvqj-node8  *:5500,4420,8009  running (8m)  8m ago     20h  116M     -                 fe96956aabcd  c7dad16fe4b1
nvmeof.rbd.ceph-sunilkumar-00-bjcvqj-node9.guwzdw  ceph-sunilkumar-00-bjcvqj-node9  *:5500,4420,8009  running (8m)  8m ago     20h  48.0M    -                 fe96956aabcd  ef3345747f1a

[ceph: root@ceph-sunilkumar-00-bjcvqj-node1-installer /]# ceph orch ls --service_type nvmeof
NAME        PORTS             RUNNING  REFRESHED  AGE  PLACEMENT
nvmeof.rbd  ?:4420,5500,8009  4/4      8m ago     20h  ceph-sunilkumar-00-bjcvqj-node6;ceph-sunilkumar-00-bjcvqj-node7;ceph-sunilkumar-00-bjcvqj-node8;ceph-sunilkumar-00-bjcvqj-node9
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Critical: Red Hat Ceph Storage 7.1 security, enhancements, and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:3925