Bug 2277699 - NVMe Deployment failed with Ceph 18.2.1-155 and NVMeoF 1.2.4-1
Summary: NVMe Deployment failed with Ceph 18.2.1-155 and NVMeoF 1.2.4-1
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: NVMeOF
Version: 7.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 7.1
Assignee: Aviv Caro
QA Contact: Manohar Murthy
Docs Contact: ceph-doc-bot
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2024-04-29 09:14 UTC by Sunil Kumar Nagaraju
Modified: 2024-06-13 14:32 UTC
CC: 2 users

Fixed In Version: ceph-nvmeof-container-1.2.5-3
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2024-06-13 14:32:20 UTC
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHCEPH-8911 0 None None None 2024-04-30 14:08:12 UTC
Red Hat Product Errata RHSA-2024:3925 0 None None None 2024-06-13 14:32:22 UTC

Description Sunil Kumar Nagaraju 2024-04-29 09:14:40 UTC
Created attachment 2029954 [details]
NVMe service log

Description of problem:

NVMe-oF deployment fails with Ceph 18.2.1-155 and NVMeoF 1.2.4-1. The nvmeof gateway crashes during startup with the traceback below:


Apr 28 02:37:59 ceph-sunilkumar-00-pvlfdn-node7 ceph-3c4aaa88-0528-11ef-a216-fa163e4f1077-nvmeof-rbd-ceph-sunilkumar-00-pvlfdn-node7-xxjgor[14769]: [2024-04-28 06:37:59.596636] app.c: 712:spdk_app_start: *NOTICE*: Total cores available: 4
Apr 28 02:37:59 ceph-sunilkumar-00-pvlfdn-node7 ceph-3c4aaa88-0528-11ef-a216-fa163e4f1077-nvmeof-rbd-ceph-sunilkumar-00-pvlfdn-node7-xxjgor[14769]: [2024-04-28 06:37:59.650074] reactor.c: 926:reactor_run: *NOTICE*: Reactor started on core 1
Apr 28 02:37:59 ceph-sunilkumar-00-pvlfdn-node7 ceph-3c4aaa88-0528-11ef-a216-fa163e4f1077-nvmeof-rbd-ceph-sunilkumar-00-pvlfdn-node7-xxjgor[14769]: [2024-04-28 06:37:59.650143] reactor.c: 926:reactor_run: *NOTICE*: Reactor started on core 2
Apr 28 02:37:59 ceph-sunilkumar-00-pvlfdn-node7 ceph-3c4aaa88-0528-11ef-a216-fa163e4f1077-nvmeof-rbd-ceph-sunilkumar-00-pvlfdn-node7-xxjgor[14769]: [2024-04-28 06:37:59.650203] reactor.c: 926:reactor_run: *NOTICE*: Reactor started on core 3
Apr 28 02:37:59 ceph-sunilkumar-00-pvlfdn-node7 ceph-3c4aaa88-0528-11ef-a216-fa163e4f1077-nvmeof-rbd-ceph-sunilkumar-00-pvlfdn-node7-xxjgor[14769]: [2024-04-28 06:37:59.650207] reactor.c: 926:reactor_run: *NOTICE*: Reactor started on core 0
Apr 28 02:37:59 ceph-sunilkumar-00-pvlfdn-node7 ceph-3c4aaa88-0528-11ef-a216-fa163e4f1077-nvmeof-rbd-ceph-sunilkumar-00-pvlfdn-node7-xxjgor[14769]: [2024-04-28 06:37:59.692980] accel_sw.c: 681:sw_accel_module_init: *NOTICE*: Accel framework software module initialized.
Apr 28 02:37:59 ceph-sunilkumar-00-pvlfdn-node7 ceph-3c4aaa88-0528-11ef-a216-fa163e4f1077-nvmeof-rbd-ceph-sunilkumar-00-pvlfdn-node7-xxjgor[14769]: [2024-04-28 06:37:59.838632] tcp.c: 629:nvmf_tcp_create: *NOTICE*: *** TCP Transport Init ***
Apr 28 02:37:59 ceph-sunilkumar-00-pvlfdn-node7 ceph-3c4aaa88-0528-11ef-a216-fa163e4f1077-nvmeof-rbd-ceph-sunilkumar-00-pvlfdn-node7-xxjgor[14769]: [28-Apr-2024 06:37:59] INFO server.py:249: Discovery service process id: 63
Apr 28 02:37:59 ceph-sunilkumar-00-pvlfdn-node7 ceph-3c4aaa88-0528-11ef-a216-fa163e4f1077-nvmeof-rbd-ceph-sunilkumar-00-pvlfdn-node7-xxjgor[14769]: [28-Apr-2024 06:37:59] INFO server.py:245: Starting ceph nvmeof discovery service
Apr 28 02:37:59 ceph-sunilkumar-00-pvlfdn-node7 ceph-3c4aaa88-0528-11ef-a216-fa163e4f1077-nvmeof-rbd-ceph-sunilkumar-00-pvlfdn-node7-xxjgor[14769]: [28-Apr-2024 06:37:59] ERROR server.py:108: GatewayServer exception occurred:
Apr 28 02:37:59 ceph-sunilkumar-00-pvlfdn-node7 ceph-3c4aaa88-0528-11ef-a216-fa163e4f1077-nvmeof-rbd-ceph-sunilkumar-00-pvlfdn-node7-xxjgor[14769]: Traceback (most recent call last):
Apr 28 02:37:59 ceph-sunilkumar-00-pvlfdn-node7 ceph-3c4aaa88-0528-11ef-a216-fa163e4f1077-nvmeof-rbd-ceph-sunilkumar-00-pvlfdn-node7-xxjgor[14769]:   File "/remote-source/ceph-nvmeof/app/control/__main__.py", line 43, in <module>
Apr 28 02:37:59 ceph-sunilkumar-00-pvlfdn-node7 ceph-3c4aaa88-0528-11ef-a216-fa163e4f1077-nvmeof-rbd-ceph-sunilkumar-00-pvlfdn-node7-xxjgor[14769]:     gateway.serve()
Apr 28 02:37:59 ceph-sunilkumar-00-pvlfdn-node7 ceph-3c4aaa88-0528-11ef-a216-fa163e4f1077-nvmeof-rbd-ceph-sunilkumar-00-pvlfdn-node7-xxjgor[14769]:   File "/remote-source/ceph-nvmeof/app/control/server.py", line 177, in serve
Apr 28 02:37:59 ceph-sunilkumar-00-pvlfdn-node7 ceph-3c4aaa88-0528-11ef-a216-fa163e4f1077-nvmeof-rbd-ceph-sunilkumar-00-pvlfdn-node7-xxjgor[14769]:     omap_lock = OmapLock(omap_state, gateway_state)
Apr 28 02:37:59 ceph-sunilkumar-00-pvlfdn-node7 ceph-3c4aaa88-0528-11ef-a216-fa163e4f1077-nvmeof-rbd-ceph-sunilkumar-00-pvlfdn-node7-xxjgor[14769]:   File "/remote-source/ceph-nvmeof/app/control/state.py", line 201, in __init__
Apr 28 02:37:59 ceph-sunilkumar-00-pvlfdn-node7 ceph-3c4aaa88-0528-11ef-a216-fa163e4f1077-nvmeof-rbd-ceph-sunilkumar-00-pvlfdn-node7-xxjgor[14769]:     self.omap_file_lock_retry_sleep_interval = self.omap_state.config.getint_with_default("gateway",
Apr 28 02:37:59 ceph-sunilkumar-00-pvlfdn-node7 ceph-3c4aaa88-0528-11ef-a216-fa163e4f1077-nvmeof-rbd-ceph-sunilkumar-00-pvlfdn-node7-xxjgor[14769]:   File "/remote-source/ceph-nvmeof/app/control/config.py", line 47, in getint_with_default
Apr 28 02:37:59 ceph-sunilkumar-00-pvlfdn-node7 ceph-3c4aaa88-0528-11ef-a216-fa163e4f1077-nvmeof-rbd-ceph-sunilkumar-00-pvlfdn-node7-xxjgor[14769]:     return self.config.getint(section, param, fallback=value)
Apr 28 02:37:59 ceph-sunilkumar-00-pvlfdn-node7 ceph-3c4aaa88-0528-11ef-a216-fa163e4f1077-nvmeof-rbd-ceph-sunilkumar-00-pvlfdn-node7-xxjgor[14769]:   File "/usr/lib64/python3.9/configparser.py", line 818, in getint
Apr 28 02:37:59 ceph-sunilkumar-00-pvlfdn-node7 ceph-3c4aaa88-0528-11ef-a216-fa163e4f1077-nvmeof-rbd-ceph-sunilkumar-00-pvlfdn-node7-xxjgor[14769]:     return self._get_conv(section, option, int, raw=raw, vars=vars,
Apr 28 02:37:59 ceph-sunilkumar-00-pvlfdn-node7 ceph-3c4aaa88-0528-11ef-a216-fa163e4f1077-nvmeof-rbd-ceph-sunilkumar-00-pvlfdn-node7-xxjgor[14769]:   File "/usr/lib64/python3.9/configparser.py", line 808, in _get_conv
Apr 28 02:37:59 ceph-sunilkumar-00-pvlfdn-node7 ceph-3c4aaa88-0528-11ef-a216-fa163e4f1077-nvmeof-rbd-ceph-sunilkumar-00-pvlfdn-node7-xxjgor[14769]:     return self._get(section, conv, option, raw=raw, vars=vars,
Apr 28 02:37:59 ceph-sunilkumar-00-pvlfdn-node7 ceph-3c4aaa88-0528-11ef-a216-fa163e4f1077-nvmeof-rbd-ceph-sunilkumar-00-pvlfdn-node7-xxjgor[14769]:   File "/usr/lib64/python3.9/configparser.py", line 803, in _get
Apr 28 02:37:59 ceph-sunilkumar-00-pvlfdn-node7 ceph-3c4aaa88-0528-11ef-a216-fa163e4f1077-nvmeof-rbd-ceph-sunilkumar-00-pvlfdn-node7-xxjgor[14769]:     return conv(self.get(section, option, **kwargs))
Apr 28 02:37:59 ceph-sunilkumar-00-pvlfdn-node7 ceph-3c4aaa88-0528-11ef-a216-fa163e4f1077-nvmeof-rbd-ceph-sunilkumar-00-pvlfdn-node7-xxjgor[14769]: ValueError: invalid literal for int() with base 10: '1.0'
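
The crash comes from Python's configparser.getint() being handed a non-integer literal ('1.0'). A minimal reproduction and one tolerant workaround are sketched below; the option name is inferred from the attribute assignment in the traceback, the tolerant_getint() helper is hypothetical (not part of ceph-nvmeof), and the actual fix shipped in ceph-nvmeof-container-1.2.5-3 may differ.

# Sketch: reproduce the ValueError seen in the gateway log.
import configparser

cfg = configparser.ConfigParser()
cfg.read_string("""
[gateway]
omap_file_lock_retry_sleep_interval = 1.0
""")

try:
    # This mirrors what config.py's getint_with_default() ends up calling.
    cfg.getint("gateway", "omap_file_lock_retry_sleep_interval", fallback=1)
except ValueError as err:
    print(f"reproduced: {err}")   # invalid literal for int() with base 10: '1.0'

# Hypothetical tolerant variant: fall back to float parsing, then truncate.
def tolerant_getint(config, section, option, fallback):
    try:
        return config.getint(section, option, fallback=fallback)
    except ValueError:
        return int(config.getfloat(section, option, fallback=fallback))

print(tolerant_getint(cfg, "gateway", "omap_file_lock_retry_sleep_interval", 1))  # -> 1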


Version-Release number of selected component (if applicable):

nvmeof_image=registry-proxy.engineering.redhat.com/rh-osbs/ceph-nvmeof:1.2.4-1 
nvmeof_cli_image=registry-proxy.engineering.redhat.com/rh-osbs/ceph-nvmeof-cli:1.2.4-1 
ceph-repo http://download.devel.redhat.com/rhel-9/composes/auto/ceph-7.1-rhel-9/RHCEPH-7.1-RHEL-9-20240424.ci.3 
Ceph-image= registry-proxy.engineering.redhat.com/rh-osbs/rhceph:ceph-7.1-rhel-9-containers-candidate-86483-20240424220941


How reproducible:


Steps to Reproduce:
1. Bootstrap a Ceph cluster and add all core daemons (MON, MGR, OSD).
2. Create an RBD pool and deploy the NVMe-oF service (a command sketch follows this list).
3. The nvmeof daemons fail to start with the traceback above.
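
A command sequence along these lines should cover step 2 (the pool and host names here are placeholders, and the exact "ceph orch apply nvmeof" syntax can vary by release):

ceph osd pool create rbd
rbd pool init rbd
ceph orch apply nvmeof rbd --placement="node6 node7 node8 node9"
# the gateway daemons can then be checked with:
ceph orch ps --daemon_type nvmeof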

Comment 6 Sunil Kumar Nagaraju 2024-05-03 10:54:19 UTC
NVMe-oF deployment works with the new build:
Ceph 18.2.1-159 and NVMeoF 1.2.5-2.

Attaching HA sanity logs for reference.

Comment 8 Sunil Kumar Nagaraju 2024-05-03 10:55:41 UTC
[ceph: root@ceph-sunilkumar-00-bjcvqj-node1-installer /]# ceph orch ps --daemon_type nvmeof
NAME                                               HOST                             PORTS             STATUS        REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID      CONTAINER ID
nvmeof.rbd.ceph-sunilkumar-00-bjcvqj-node6.vtwjfa  ceph-sunilkumar-00-bjcvqj-node6  *:5500,4420,8009  running (8m)     8m ago  20h     116M        -           fe96956aabcd  d1c6bed9582f
nvmeof.rbd.ceph-sunilkumar-00-bjcvqj-node7.vglfty  ceph-sunilkumar-00-bjcvqj-node7  *:5500,4420,8009  running (8m)     8m ago  20h     117M        -           fe96956aabcd  a2a276907e3f
nvmeof.rbd.ceph-sunilkumar-00-bjcvqj-node8.hnmfps  ceph-sunilkumar-00-bjcvqj-node8  *:5500,4420,8009  running (8m)     8m ago  20h     116M        -           fe96956aabcd  c7dad16fe4b1
nvmeof.rbd.ceph-sunilkumar-00-bjcvqj-node9.guwzdw  ceph-sunilkumar-00-bjcvqj-node9  *:5500,4420,8009  running (8m)     8m ago  20h    48.0M        -           fe96956aabcd  ef3345747f1a
[ceph: root@ceph-sunilkumar-00-bjcvqj-node1-installer /]# ceph orch ls --service_type nvmeof
NAME        PORTS             RUNNING  REFRESHED  AGE  PLACEMENT
nvmeof.rbd  ?:4420,5500,8009      4/4  8m ago     20h  ceph-sunilkumar-00-bjcvqj-node6;ceph-sunilkumar-00-bjcvqj-node7;ceph-sunilkumar-00-bjcvqj-node8;ceph-sunilkumar-00-bjcvqj-node9

Comment 9 errata-xmlrpc 2024-06-13 14:32:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Critical: Red Hat Ceph Storage 7.1 security, enhancements, and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:3925

