Bug 2210716 - [rbd_support] empty error message for `ceph rbd task add flatten` when client is blocklisted
Summary: [rbd_support] empty error message for `ceph rbd task add flatten` when client is blocklisted
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: RBD
Version: 6.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: low
Target Milestone: ---
Target Release: 6.1z1
Assignee: Ram Raja
QA Contact: Sunil Angadi
Docs Contact: Akash Raj
URL:
Whiteboard:
Depends On:
Blocks: 2221020
 
Reported: 2023-05-29 09:15 UTC by Vasishta
Modified: 2023-08-03 16:45 UTC
CC: 11 users

Fixed In Version: ceph-17.2.6-84.el9cp
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-08-03 16:45:09 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Ceph Project Bug Tracker 61688 0 None None None 2023-06-14 21:01:02 UTC
Red Hat Issue Tracker RHCEPH-6757 0 None None None 2023-05-29 09:18:26 UTC
Red Hat Product Errata RHBA-2023:4473 0 None None None 2023-08-03 16:45:54 UTC

Description Vasishta 2023-05-29 09:15:59 UTC
Description of problem:
While testing a scenario where the rbd_support client is blocklisted, the error message for `ceph rbd task add flatten` was empty.

Version-Release number of selected component (if applicable):
17.2.6-50.el9cp

How reproducible:
Many times in this scenario

Steps to Reproduce:
1. Blocklist the rbd_support client
2. Try to add a ceph task to flatten a clone

Actual results:
# ceph rbd task add flatten  mirror_pool/r_clone
Error EAGAIN:
#

Expected results:
A user-understandable error message

Additional info:
- Created snapshot mirror_pool/r@snap and clone mirror_pool/r_clone
- Blocklisted the rbd_support client

# CLIENT_ADDR=$(ceph mgr dump | jq .active_clients[] |
      jq 'select(.name == "rbd_support")' |
      jq -r '[.addrvec[0].addr, "/", .addrvec[0].nonce|tostring] | add')
# ceph osd blocklist add $CLIENT_ADDR
# ceph osd blocklist ls | grep $CLIENT_ADDR
blocklisting 1x.x.1xx.2x:0/3400216612 until 2023-05-29T08:48:03.770899+0000 (3600 sec)
listed 16 entries
1x.x.1xx.2x:0/3400216612 2023-05-29T08:48:03.770899+0000

- Tried adding a ceph task to flatten the rbd clone a few times before the rbd_support client recovered
# date;ceph rbd task add flatten  mirror_pool/r_clone
Mon May 29 07:48:24 AM UTC 2023
Error EAGAIN:
# date;ceph rbd task add flatten  mirror_pool/r_clone
Mon May 29 07:48:36 AM UTC 2023
Error EAGAIN:

- Finally, after the rbd_support client recovered:
# date;ceph rbd task add flatten  mirror_pool/r_clone
Mon May 29 07:49:02 AM UTC 2023
{"sequence": 1, "id": "0b0bff02-440a-4a64-a449-708275a7894b", "message": "Flattening image mirror_pool/r_clone", "refs": {"action": "flatten", "pool_name": "mirror_pool", "pool_namespace": "", "image_name": "r_clone", "image_id": "607c134e4c879"}}

Comment 8 Sunil Angadi 2023-07-20 13:39:10 UTC
Tested using
ceph version 17.2.6-98.el9cp (b53d9dfff6b1021fbab8e28f2b873d7d49cf5665) quincy (stable)

Created a pool, an image, and snapshots of that image
Cloned a snapshot to a new image

[ceph: root@ceph-rbd1-s-r-avxinc-node1-installer /]# rbd snap ls rep_pool_jDNsGQuutX/rep_image_KJXaGSnDfq
SNAPID  NAME   SIZE   PROTECTED  TIMESTAMP
     5  snap1  1 GiB  yes        Thu Jul 20 12:48:46 2023
     6  snap2  1 GiB             Thu Jul 20 13:09:39 2023

Attempted the clone before protecting the snapshot; cloning requires the parent snapshot to be protected first:
[ceph: root@ceph-rbd1-s-r-avxinc-node1-installer /]# rbd clone rep_pool_jDNsGQuutX/rep_image_KJXaGSnDfq@snap2 pool2/image4
2023-07-20T13:10:58.488+0000 7f88b9c5d640 -1 librbd::image::CloneRequest: 0x5561a89d1a40 validate_parent: parent snapshot must be protected

rbd: clone error: (22) Invalid argument
[ceph: root@ceph-rbd1-s-r-avxinc-node1-installer /]# rbd snap protect rep_pool_jDNsGQuutX/rep_image_KJXaGSnDfq@snap2
[ceph: root@ceph-rbd1-s-r-avxinc-node1-installer /]#
[ceph: root@ceph-rbd1-s-r-avxinc-node1-installer /]# rbd clone rep_pool_jDNsGQuutX/rep_image_KJXaGSnDfq@snap2 pool2/image4
[ceph: root@ceph-rbd1-s-r-avxinc-node1-installer /]#
[ceph: root@ceph-rbd1-s-r-avxinc-node1-installer /]# rbd du pool2/image4
NAME    PROVISIONED  USED
image4        1 GiB   0 B

Performed client block list using this script http://pastebin.test.redhat.com/1101663
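The pastebin is internal; for reference, a minimal sketch of what such a script might look like, built from the blocklist commands in the description (the loop structure and sleep interval are assumptions, not the actual script contents):

#!/bin/bash
# Hypothetical reconstruction of block_client.sh: repeatedly blocklist
# the rbd_support module's RADOS client so it cannot finish recovering.
while true; do
    CLIENT_ADDR=$(ceph mgr dump | jq .active_clients[] |
        jq 'select(.name == "rbd_support")' |
        jq -r '[.addrvec[0].addr, "/", .addrvec[0].nonce|tostring] | add')
    echo "Blocking ${CLIENT_ADDR}"
    ceph osd blocklist add "${CLIENT_ADDR}"
    ceph osd blocklist ls | grep "${CLIENT_ADDR}" &&
        echo "Confirmed ${CLIENT_ADDR} got blocklisted"
    echo "**************************************************"
    sleep 5  # assumed interval between re-blocklist attempts
done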


[root@ceph-rbd1-s-r-avxinc-node2 cephuser]# ./block_client.sh
Blocking 10.0.211.92:0/3556506075
blocklisting 10.0.211.92:0/3556506075 until 2023-07-20T14:12:53.762610+0000 (3600 sec)
listed 3 entries
Confirmed 10.0.211.92:0/3556506075 got blocklisted
**************************************************
Blocking 10.0.211.92:0/3556506075
blocklisting 10.0.211.92:0/3556506075 until 2023-07-20T14:13:00.631901+0000 (3600 sec)
listed 3 entries
Confirmed 10.0.211.92:0/3556506075 got blocklisted
**************************************************


While this script was running, trying to add a task to flatten the child image threw the error below:

[ceph: root@ceph-rbd1-s-r-avxinc-node1-installer /]# ceph rbd task add flatten pool2/image4
Error EAGAIN: [errno 108] RBD connection was shutdown (error opening image b'image4' at snapshot None)

@Ram Raja, is this the expected error message added for it?

Can you please suggest the steps to see the error message added in the PR
https://github.com/ceph/ceph/pull/52064/files:
"rbd_support module is not ready, try again"?


Just a note: after stopping the script, I was able to add a task to flatten the child image.

[ceph: root@ceph-rbd1-s-r-avxinc-node1-installer /]# ceph rbd task add flatten pool2/image4
{"sequence": 1, "id": "7865a8f9-44b2-4c0a-9e5e-f07b84a83e8b", "message": "Flattening image pool2/image4", "refs": {"action": "flatten", "pool_name": "pool2", "pool_namespace": "", "image_name": "image4", "image_id": "3bac5a603593"}}

Comment 9 Ram Raja 2023-07-20 23:23:10 UTC
(In reply to Sunil Angadi from comment #8)
> Tested using
> ceph version 17.2.6-98.el9cp (b53d9dfff6b1021fbab8e28f2b873d7d49cf5665)
> quincy (stable)
> 
> Created pool, image and snaphot of that image
> Cloned the snapshot to new image
> 
> [ceph: root@ceph-rbd1-s-r-avxinc-node1-installer /]# rbd snap ls
> rep_pool_jDNsGQuutX/rep_image_KJXaGSnDfq
> SNAPID  NAME   SIZE   PROTECTED  TIMESTAMP
>      5  snap1  1 GiB  yes        Thu Jul 20 12:48:46 2023
>      6  snap2  1 GiB             Thu Jul 20 13:09:39 2023
> 
> Protected the snapshot before cloning to child image
> [ceph: root@ceph-rbd1-s-r-avxinc-node1-installer /]# rbd clone
> rep_pool_jDNsGQuutX/rep_image_KJXaGSnDfq@snap2 pool2/image4
> 2023-07-20T13:10:58.488+0000 7f88b9c5d640 -1 librbd::image::CloneRequest:
> 0x5561a89d1a40 validate_parent: parent snapshot must be protected
> 
> rbd: clone error: (22) Invalid argument
> [ceph: root@ceph-rbd1-s-r-avxinc-node1-installer /]# rbd snap protect
> rep_pool_jDNsGQuutX/rep_image_KJXaGSnDfq@snap2
> [ceph: root@ceph-rbd1-s-r-avxinc-node1-installer /]#
> [ceph: root@ceph-rbd1-s-r-avxinc-node1-installer /]# rbd clone
> rep_pool_jDNsGQuutX/rep_image_KJXaGSnDfq@snap2 pool2/image4
> [ceph: root@ceph-rbd1-s-r-avxinc-node1-installer /]#
> [ceph: root@ceph-rbd1-s-r-avxinc-node1-installer /]# rbd du pool2/image4
> NAME    PROVISIONED  USED
> image4        1 GiB   0 B
> 
> Performed client block list using this script
> http://pastebin.test.redhat.com/1101663
> 
> 
> [root@ceph-rbd1-s-r-avxinc-node2 cephuser]# ./block_client.sh
> Blocking 10.0.211.92:0/3556506075
> blocklisting 10.0.211.92:0/3556506075 until 2023-07-20T14:12:53.762610+0000
> (3600 sec)
> listed 3 entries
> Confirmed 10.0.211.92:0/3556506075 got blocklisted
> **************************************************
> Blocking 10.0.211.92:0/3556506075
> blocklisting 10.0.211.92:0/3556506075 until 2023-07-20T14:13:00.631901+0000
> (3600 sec)
> listed 3 entries
> Confirmed 10.0.211.92:0/3556506075 got blocklisted
> **************************************************
> 
> 
> During this ongoing script execution when i try to add a task to flatten the
> child image,
> It throws below error
> 
> [ceph: root@ceph-rbd1-s-r-avxinc-node1-installer /]# ceph rbd task add
> flatten pool2/image4
> Error EAGAIN: [errno 108] RBD connection was shutdown (error opening image
> b'image4' at snapshot None)
> 
> @Ram Raja, Is this the expected error message added for it?
> 
> Can you please suggest the steps to see ?
> Error message added in the PR
> https://github.com/ceph/ceph/pull/52064/files
> "rbd_support module is not ready, try again"
> 

You won't always see the above message when you execute a `rbd_support` module command after the module's client is blocklisted. Seeing "Error EAGAIN: [errno 108] RBD connection was shutdown (error opening image b'image4' at snapshot None)" is also expected behavior. The error message you see depends on when exactly the blocklist is detected by the rbd_support module and its handlers.
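For context, the fix (https://github.com/ceph/ceph/pull/52064) gates the module's command handlers on a readiness flag, so a still-recovering module reports a readable message instead of a bare EAGAIN. A simplified sketch of the pattern in Python (names like `module_ready` are illustrative, not the exact code from the PR):

import errno

class RbdSupportModule:
    def __init__(self):
        # Cleared when the module's RADOS client gets blocklisted;
        # set again once a replacement client has been bootstrapped.
        self.module_ready = False

    def handle_command(self, inbuf, cmd):
        # (retval, stdout, stderr) is the mgr module command convention.
        if not self.module_ready:
            # Before the fix this path could surface EAGAIN with an
            # empty message; now it carries a human-readable hint.
            return -errno.EAGAIN, "", "rbd_support module is not ready, try again"
        # ... dispatch to the task/mirror/perf handlers ...
        return 0, "", ""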

Instead of running the script, I suggest that you manually blocklist the `rbd_support` module's client as mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=2210716#c0. Immediately after that, execute `ceph rbd task add flatten <image>` several times in quick succession or in a loop like the one sketched below. You will eventually see the stderr output "rbd_support module is not ready, try again" as the rbd_support module tries to recover from blocklisting.
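For example, a quick retry loop (a sketch; pool2/image4 is the test image from comment 8, and the 1-second pause is arbitrary):

# while true; do date; ceph rbd task add flatten pool2/image4; sleep 1; done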

> 
> Just NOTE: after stopping the script able to add a task to flatten the child
> image
> 
> [ceph: root@ceph-rbd1-s-r-avxinc-node1-installer /]# ceph rbd task add
> flatten pool2/image4
> {"sequence": 1, "id": "7865a8f9-44b2-4c0a-9e5e-f07b84a83e8b", "message":
> "Flattening image pool2/image4", "refs": {"action": "flatten", "pool_name":
> "pool2", "pool_namespace": "", "image_name": "image4", "image_id":
> "3bac5a603593"}}

This is good. This means that the rbd_support module was able to recover from blocklisting.

Comment 10 Sunil Angadi 2023-07-21 06:50:03 UTC
Thanks for the info, Ram.

I was able to get the error message as per the PR:

[ceph: root@ceph-rbd1-s-r-avxinc-node1-installer /]# ceph rbd task add flatten pool2/image5
Error EAGAIN: rbd_support module is not ready, try again

This error message was also expected:
[ceph: root@ceph-rbd1-s-r-avxinc-node1-installer /]# ceph rbd task add flatten pool2/image4
Error EAGAIN: [errno 108] RBD connection was shutdown (error opening image b'image4' at snapshot None)

The fix is working as expected.

Verified using
ceph version 17.2.6-98.el9cp (b53d9dfff6b1021fbab8e28f2b873d7d49cf5665)

Comment 12 errata-xmlrpc 2023-08-03 16:45:09 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 6.1 Bug Fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:4473

