Bug 2246306
Summary: | Unable to delete block devices from GW | | |
---|---|---|---|
Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | Sunil Kumar Nagaraju <sunnagar> |
Component: | NVMeOF | Assignee: | Aviv Caro <acaro> |
Status: | CLOSED COMPLETED | QA Contact: | Sunil Kumar Nagaraju <sunnagar> |
Severity: | high | Docs Contact: | ceph-doc-bot <ceph-doc-bugzilla> |
Priority: | unspecified | | |
Version: | 7.0 | CC: | akraj, cephqe-warriors, gbregman, idryomov, mmurthy, owasserm, sostapov, tserlin, vereddy |
Target Milestone: | --- | Keywords: | Automation, Regression |
Target Release: | 7.0z3 | | |
Hardware: | Unspecified | | |
OS: | Unspecified | | |
Whiteboard: | | | |
Fixed In Version: | ceph-nvmeof-container-0.0.5-1 | Doc Type: | Known Issue |
Doc Text: |
.When using the Ceph NVMe-oF gateway, `bdevs` are not deleted during service removal
In the Ceph NVMe-oF gateway, the `podman run -it cp.icr.io/cp/ibm-ceph/nvmeof-cli-rhel9:latest --server-address GATEWAY_IP --server-port 5500 delete_bdev` command fails to delete block devices.
As a workaround, skip this step during NVMe-oF service removal.
|
Story Points: | --- | | |
Clone Of: | | Environment: | |
Last Closed: | 2024-06-13 12:59:31 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Bug Depends On: | | | |
Bug Blocks: | 2237662 | | |
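The Doc Text above recommends skipping the failing `delete_bdev` call when tearing down the NVMe-oF service. As a rough sketch of what that looks like from automation, the cleanup can simply stop after `delete_subsystem`. This is illustrative only: the image tag, gateway address, subsystem NQN, and the `--subnqn` flag name are assumptions for the example, not values mandated by this bug.

```python
# Illustrative sketch only: tear down an NVMe-oF subsystem with the nvmeof-cli
# container while skipping the failing delete_bdev step, per the workaround.
# The image tag, gateway address, subsystem NQN, and the --subnqn flag name
# below are placeholders/assumptions.
import subprocess

CLI_IMAGE = "registry-proxy.engineering.redhat.com/rh-osbs/ceph-nvmeof-cli:0.0.5-1"
GATEWAY_IP = "10.0.211.237"                  # example gateway address from the logs
SUBSYSTEM_NQN = "nqn.2016-06.io.spdk:cnode1"


def run_cli(*args: str) -> str:
    """Run one nvmeof-cli subcommand against the gateway and return stdout."""
    cmd = [
        "podman", "run", CLI_IMAGE,
        "--server-address", GATEWAY_IP,
        "--server-port", "5500",
        *args,
    ]
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


def teardown_subsystem(nqn: str) -> None:
    # Listener/host/namespace removal is omitted for brevity; the point is that
    # no delete_bdev call follows, since it currently fails with
    # "Exception calling application: 'namespaces'".
    print(run_cli("delete_subsystem", "--subnqn", nqn))


if __name__ == "__main__":
    teardown_subsystem(SUBSYSTEM_NQN)
```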
Description
Sunil Kumar Nagaraju
2023-10-26 09:26:07 UTC
Marking this as a regression, since it worked in 18.2.0-70: http://magna002.ceph.redhat.com/cephci-jenkins/test-runs/18.2.0-70/Sanity/69/tier-0_nvmeof_sanity/Manage_nvmeof_gateway_entities_0.log

[root@ceph-1sunilkumar-0qegc7-node6 ~]# podman run registry-proxy.engineering.redhat.com/rh-osbs/ceph-nvmeof-cli:0.0.4-1 --server-address 10.0.207.37 --server-port 5500 get_subsystems
INFO:__main__:Get subsystems:
[
  {
    "nqn": "nqn.2014-08.org.nvmexpress.discovery",
    "subtype": "Discovery",
    "listen_addresses": [],
    "allow_any_host": true,
    "hosts": []
  }
]

[ceph: root@ceph-1sunilkumar-0qegc7-node1-installer /]# rados -p rbd listomapvals nvmeof.None.state
bdev_bdev1
value (153 bytes) :
00000000  7b 0a 20 20 22 62 64 65 76 5f 6e 61 6d 65 22 3a  |{.  "bdev_name":|
00000010  20 22 62 64 65 76 31 22 2c 0a 20 20 22 72 62 64  | "bdev1",.  "rbd|
00000020  5f 70 6f 6f 6c 5f 6e 61 6d 65 22 3a 20 22 72 62  |_pool_name": "rb|
00000030  64 22 2c 0a 20 20 22 72 62 64 5f 69 6d 61 67 65  |d",.  "rbd_image|
00000040  5f 6e 61 6d 65 22 3a 20 22 69 6d 61 67 65 31 22  |_name": "image1"|
00000050  2c 0a 20 20 22 62 6c 6f 63 6b 5f 73 69 7a 65 22  |,.  "block_size"|
00000060  3a 20 35 31 32 2c 0a 20 20 22 75 75 69 64 22 3a  |: 512,.  "uuid":|
00000070  20 22 30 62 62 65 61 65 37 39 2d 62 30 63 36 2d  | "0bbeae79-b0c6-|
00000080  34 63 35 39 2d 38 39 66 62 2d 31 66 33 32 32 37  |4c59-89fb-1f3227|
00000090  61 65 33 61 65 39 22 0a 7d                       |ae3ae9".}|
00000099

omap_version
value (2 bytes) :
00000000  31 30  |10|
00000002

This looks like a problem with some old code that is no longer there; it was rewritten in PR 270. Looking at the log, get_subsystems returned:

[{'nqn': 'nqn.2014-08.org.nvmexpress.discovery', 'subtype': 'Discovery', 'listen_addresses': [], 'allow_any_host': True, 'hosts': []}]

There is no "namespaces" section, which caused a KeyError exception in the code when we tried iterating through the namespaces. The subsystem we have here is a discovery subsystem, which we no longer show. I'll try to go back to commit 5b936c613571209c5d28b920eaccb82abff6ac7c, the one before we deleted the discovery subsystem from the get_subsystems output, and see if I can reproduce the issue.

I made sure that the current code no longer has this issue, both when "enable_spdk_discovery_controller" is True and when it is False. Sunil, this is fixed in 0.0.5.

Fixed in 0.0.5. Please verify.

The issue still exists with newer versions of the ceph and nvmeof images.
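The failure pattern described above can be reduced to a few lines. The following is a minimal, self-contained illustration (assumed data shape, not the actual ceph-nvmeof gateway code) of why indexing `namespaces` directly raises `KeyError` for a discovery-only subsystem, and how a defensive lookup avoids it:

```python
# Minimal illustration of the failure mode (not the actual gateway code):
# delete_bdev walked the subsystems returned by get_subsystems and indexed
# "namespaces" directly, but a discovery-only subsystem carries no such key,
# so the lookup raised KeyError('namespaces').
subsystems = [
    {
        "nqn": "nqn.2014-08.org.nvmexpress.discovery",
        "subtype": "Discovery",
        "listen_addresses": [],
        "allow_any_host": True,
        "hosts": [],
        # note: no "namespaces" key at all
    }
]


def bdev_in_use_buggy(bdev_name: str) -> bool:
    # Mirrors the failing pattern: raises KeyError when "namespaces" is missing.
    return any(
        ns["bdev_name"] == bdev_name
        for subsys in subsystems
        for ns in subsys["namespaces"]          # KeyError: 'namespaces'
    )


def bdev_in_use_fixed(bdev_name: str) -> bool:
    # Defensive variant: treat a missing "namespaces" key as an empty list.
    return any(
        ns.get("bdev_name") == bdev_name
        for subsys in subsystems
        for ns in subsys.get("namespaces", [])
    )


print(bdev_in_use_fixed("bdev1"))   # False
# bdev_in_use_buggy("bdev1") would raise KeyError: 'namespaces'
```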
2023-11-07 13:20:12,015 (cephci.test_nvme_cli) [INFO] - cephci.ceph.ceph.py:1591 - Command completed successfully
2023-11-07 13:20:12,016 (cephci.test_nvme_cli) [DEBUG] - cephci.ceph.nvmeof.nvmeof_gwcli.py:54 - ('', 'INFO:__main__:Deleted subsystem nqn.2016-06.io.spdk:cnode1: True\n')
2023-11-07 13:20:12,018 (cephci.test_nvme_cli) [INFO] - cephci.ceph.ceph.py:1557 - Running command podman run registry-proxy.engineering.redhat.com/rh-osbs/ceph-nvmeof-cli:0.0.5-1 --server-address 10.0.211.237 --server-port 5500 delete_bdev --bdev bdev1 on 10.0.211.237 timeout 600
2023-11-07 13:20:13,705 (cephci.test_nvme_cli) [ERROR] - cephci.ceph.ceph.py:1593 - Error 2 during cmd, timeout 600
2023-11-07 13:20:13,707 (cephci.test_nvme_cli) [ERROR] - cephci.ceph.ceph.py:1594 - usage: python3 -m control.cli [-h] [--server-address SERVER_ADDRESS] [--server-port SERVER_PORT] [--client-key CLIENT_KEY] [--client-cert CLIENT_CERT] [--server-cert SERVER_CERT] {create_bdev,delete_bdev,create_subsystem,delete_subsystem,add_namespace,remove_namespace,add_host,remove_host,create_listener,delete_listener,get_subsystems} ...
python3 -m control.cli: error: delete_bdev failed: code=StatusCode.UNKNOWN message=Exception calling application: 'namespaces'
2023-11-07 13:20:13,709 (cephci.test_nvme_cli) [ERROR] - cephci.tests.nvmeof.test_nvme_cli.py:100 - podman run registry-proxy.engineering.redhat.com/rh-osbs/ceph-nvmeof-cli:0.0.5-1 --server-address 10.0.211.237 --server-port 5500 delete_bdev --bdev bdev1 Error: usage: python3 -m control.cli [-h] [--server-address SERVER_ADDRESS] [--server-port SERVER_PORT] [--client-key CLIENT_KEY] [--client-cert CLIENT_CERT] [--server-cert SERVER_CERT] {create_bdev,delete_bdev,create_subsystem,delete_subsystem,add_namespace,remove_namespace,add_host,remove_host,create_listener,delete_listener,get_subsystems} ...
python3 -m control.cli: error: delete_bdev failed: code=StatusCode.UNKNOWN message=Exception calling application: 'namespaces' 10.0.211.237

Traceback (most recent call last):
  File "/home/sunilkumar/workspace/cephci/tests/nvmeof/test_nvme_cli.py", line 98, in run
    func(**cfg["args"])
  File "/home/sunilkumar/workspace/cephci/ceph/nvmeof/nvmeof_gwcli.py", line 71, in delete_block_device
    return self.run_control_cli("delete_bdev", **args)
  File "/home/sunilkumar/workspace/cephci/ceph/nvmeof/nvmeof_gwcli.py", line 50, in run_control_cli
    out = self.node.exec_command(
  File "/home/sunilkumar/workspace/cephci/ceph/ceph.py", line 1595, in exec_command
    raise CommandFailed(
ceph.ceph.CommandFailed: podman run registry-proxy.engineering.redhat.com/rh-osbs/ceph-nvmeof-cli:0.0.5-1 --server-address 10.0.211.237 --server-port 5500 delete_bdev --bdev bdev1 Error: usage: python3 -m control.cli [-h] [--server-address SERVER_ADDRESS] [--server-port SERVER_PORT] [--client-key CLIENT_KEY] [--client-cert CLIENT_CERT] [--server-cert SERVER_CERT] {create_bdev,delete_bdev,create_subsystem,delete_subsystem,add_namespace,remove_namespace,add_host,remove_host,create_listener,delete_listener,get_subsystems} ...
python3 -m control.cli: error: delete_bdev failed: code=StatusCode.UNKNOWN message=Exception calling application: 'namespaces'

[ceph: root@ceph-1sunilkumar-4q4o0k-node1-installer /]# ceph version
ceph version 18.2.0-117.el9cp (7e71aaeb77dd63a7bf8cc3f39dd69b7d151298b0) reef (stable)

[ceph: root@ceph-1sunilkumar-4q4o0k-node1-installer /]# ceph config dump | grep nvme
mgr  advanced  mgr/cephadm/container_image_nvmeof  registry-proxy.engineering.redhat.com/rh-osbs/ceph-nvmeof:0.0.5-1

(In reply to Sunil Kumar Nagaraju from comment #13)
This code was removed. You are still using an old version. Can you send us the contents of the log you see when you start the system? We should see the exact version there.

@sunnagar, looking at the history I see that the version was changed to 0.0.5 on 18-Oct, but the change for PR #270, which should fix this issue, was made on 19-Oct. So it is not enough to use version 0.0.5; it has to be a build that includes the PR #270 fix. As I said above, the log file should show the exact version of the files used: not only the 0.0.5 version, but also the exact changes included in that code.

Hi Gil,
As we discussed, let the BZ stay in the assigned state until the PR gets merged and is verifiable in downstream builds.
-Thanks, Sunil

It looks like we may have missed the window for merging this PR and still getting it into 7.0. Is this even a blocker for 7.0?

It should not be a blocker because this is a Technology Preview. We should fix it in 7.0z1, but we should include it in the release notes of 7.0. Who is taking care of that?

And again, moving to the next z-stream, 7.0z3.
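Once a build that actually contains the PR #270 change is available, one way to double-check that no stale block-device records remain is to inspect the gateway state object shown earlier in this bug (`nvmeof.None.state` in the `rbd` pool, where the `bdev_bdev1` key was seen). The helper below is an illustrative sketch, not part of cephci; it assumes `rados` is available on the node and that the pool and object names match this deployment.

```python
# Illustrative verification helper: list the omap keys of the NVMe-oF gateway
# state object and report any leftover bdev_* entries. Pool and object names
# are taken from the rados output earlier in this bug; adjust as needed.
import subprocess

POOL = "rbd"
STATE_OBJECT = "nvmeof.None.state"


def stale_bdev_keys() -> list[str]:
    out = subprocess.run(
        ["rados", "-p", POOL, "listomapkeys", STATE_OBJECT],
        check=True, capture_output=True, text=True,
    ).stdout
    return [key for key in out.splitlines() if key.startswith("bdev_")]


if __name__ == "__main__":
    leftovers = stale_bdev_keys()
    if leftovers:
        print("bdev entries still present in gateway state:", leftovers)
    else:
        print("no bdev entries left in", STATE_OBJECT)
```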