Bug 2034283

Summary: [Tracker for Ceph BZ #2056637] IO to an RBD image that has mirroring enabled reports several errors and quits when a mirroring snapshot is taken
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: Shyamsundar <srangana>
Component: ceph
ceph sub component: RBD
Assignee: Ilya Dryomov <idryomov>
QA Contact: Elad <ebenahar>
Docs Contact:
Status: ASSIGNED
Severity: high
Priority: unspecified
CC: bniver, idryomov, jdurgin, kramdoss, mmuench, muagarwa, odf-bz-bot, pnataraj
Version: 4.9
Keywords: Tracking
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Clones: 2056637 (view as bug list)
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2056637    
Attachments:
IO error log and scripts (no flags)

Description Shyamsundar 2021-12-20 15:06:40 UTC
Created attachment 1847075 [details]
IO error log and scripts

Description of problem (please be as detailed as possible and provide log
snippets):

A test executed as in the script rbd-mirror-io-test.sh (attached), where RBD mirroring is set up to mirror on a 1 minute schedule across 2 Ceph clusters, shows the following behaviour (a sketch of the equivalent mirroring setup follows the log snippet below):
- After the script (rbd-mirror-io-test.sh) is executed, start IO on the image against the primary cluster using the following command:
  - rbd bench --io-type write --io-size 4K --io-threads 2 --io-total 10M --io-pattern rand replicapool/test-demote-sb-1
- In parallel, watch in a loop for RBD snapshots being created for the image under test:
  - while true; do date -u; rbd snap ls replicapool/test-demote-sb-1 --all; echo "============================="; sleep 5; done
- When the snapshot schedule triggers (on the minute), the following is observed on the terminal performing IO on the image:
  - IO starts slowing down drastically
  - IO errors out with logs as follows (also attached as rbd-io-error.log)

Log snippet:
2021-12-20T14:08:04.660+0000 7f4ccd8c3700 -1 librbd::ImageWatcher: 0x7f4cb40078e0 image watch failed: 139967279926704, (107) Transport endpoint is not connected
2021-12-20T14:08:04.660+0000 7f4ccd8c3700 -1 librbd::Watcher: 0x7f4cb40078e0 handle_error: handle=139967279926704: (107) Transport endpoint is not connected
   30      1460   11.5875    46 KiB/s
   31      1468    10.425    42 KiB/s
2021-12-20T14:08:05.999+0000 7f4ccd8c3700 -1 librbd::io::ObjectRequest: 0x7f4cac007150 handle_write_object: rbd_data.1705844598362.000000000000024a failed to write object: (108) Cannot send after transport endpoint shutdown
2021-12-20T14:08:05.999+0000 7f4ccd8c3700 -1 librbd::cache::WriteAroundObjectDispatch: 0x7f4cac00aea0 handle_in_flight_io_complete: IO error encountered: tid=3: (108) Cannot send after transport endpoint shutdown
2021-12-20T14:08:06.024+0000 7f4ccd8c3700 -1 librbd::io::ObjectRequest: 0x7f4cc400abe0 handle_write_object: rbd_data.1705844598362.000000000000413a failed to write object: (108) Cannot send after transport endpoint shutdown
2021-12-20T14:08:06.024+0000 7f4ccd8c3700 -1 librbd::cache::WriteAroundObjectDispatch: 0x7f4cac00aea0 handle_in_flight_io_complete: IO error encountered: tid=4: (108) Cannot send after transport endpoint shutdown
2021-12-20T14:08:06.029+0000 7f4ccd8c3700 -1 librbd::io::ObjectRequest: 0x7f4cb401aa10 handle_write_object: rbd_data.1705844598362.0000000000003ad5 failed to write object: (108) Cannot send after transport endpoint shutdown
2021-12-20T14:08:06.029+0000 7f4ccd8c3700 -1 librbd::cache::WriteAroundObjectDispatch: 0x7f4cac00aea0 handle_in_flight_io_complete: IO error encountered: tid=6: (108) Cannot send after transport endpoint shutdown
2021-12-20T14:08:06.042+0000 7f4ccd8c3700 -1 librbd::io::ObjectRequest: 0x7f4cb401f9e0 handle_write_object: rbd_data.1705844598362.0000000000001915 failed to write object: (108) Cannot send after transport endpoint shutdown
2021-12-20T14:08:06.042+0000 7f4ccd8c3700 -1 librbd::cache::WriteAroundObjectDispatch: 0x7f4cac00aea0 handle_in_flight_io_complete: IO error encountered: tid=7: (108) Cannot send after transport endpoint shutdown
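
For reference, the attached rbd-mirror-io-test.sh is not reproduced here; a minimal sketch of an equivalent snapshot-based mirroring setup, assuming per-image mirroring on pool replicapool and that peering between the two clusters is already bootstrapped, would look like this (the 1 minute schedule matches the test description):

  # Hedged sketch, not the attached script: enable snapshot-based mirroring
  # for the image and schedule a mirror snapshot every minute.
  rbd mirror pool enable replicapool image
  rbd mirror image enable replicapool/test-demote-sb-1 snapshot
  rbd mirror snapshot schedule add --pool replicapool --image test-demote-sb-1 1m
  rbd mirror snapshot schedule ls --pool replicapool --recursive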

Further observations are as follows:
  - IO errors out even if the secondary cluster is shut down (i.e. image replay is not in flight)
  - IO slowness (~65 KiB/s) is observed only when mirroring is in play; it does not happen for a regular image for which a snapshot is taken during the same IO test (see the control-case sketch below)
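
A minimal sketch of that control case, assuming an illustrative non-mirrored image name (replicapool/test-plain-1 is not from the original report):

  # Hedged sketch: same bench workload against a non-mirrored image,
  # with a manual snapshot taken while IO is in flight.
  rbd create --size 100G replicapool/test-plain-1
  rbd bench --io-type write --io-size 4K --io-threads 2 --io-total 10M --io-pattern rand replicapool/test-plain-1 &
  sleep 10
  rbd snap create replicapool/test-plain-1@manual-snap-1
  wait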


Problems to look at:
  - IO errors
  - IO slowness when the image has mirroring enabled

Version of all relevant components (if applicable):
  - ceph version 16.2.6 (ee28fb57e47e9f88813e24bbf4c14496ca299d31) pacific (stable)

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?

- If IO errors out when mirroring is in effect, this would impact clients using the image
- The IO rate slowdown is quite drastic and would impact client access to the image

Is there any workaround available to the best of your knowledge?
  - No


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
  - 2

Is this issue reproducible?
  - Yes


Can this issue be reproduced from the UI?
  - NA


If this is a regression, please provide more details to justify this:
  - No

Steps to Reproduce:
  - As above

Comment 2 Shyamsundar 2021-12-20 21:56:35 UTC
Further observations:
- When IO is interrupted, the client performing the IO is blocklisted
- This was determined by watching the blocklist entries on the OSDs while IO was ongoing and a mirror snapshot schedule was triggered
- The output looks as follows:
while true; do date -u; rbd snap ls replicapool/test-demote-sb-1 --all; ceph osd blocklist ls; echo "============================="; sleep 5; done
...
Mon Dec 20 21:50:07 UTC 2021
SNAPID  NAME                                                                                       SIZE     PROTECTED  TIMESTAMP                 NAMESPACE                                                         
 11621  .mirror.primary.a2ef2d48-0cbb-4e05-8d29-d93706f42a22.6c67935b-5f0a-4490-ac65-55bca3ab24dc  100 GiB             Mon Dec 20 14:44:01 2021  mirror (primary peer_uuids:[49a8473a-9cfd-4797-8266-443eba34dfdf])
 11630  .mirror.primary.a2ef2d48-0cbb-4e05-8d29-d93706f42a22.6687c762-78d3-4e75-ad75-a88de4ea513b  100 GiB             Mon Dec 20 14:45:39 2021  mirror (primary peer_uuids:[49a8473a-9cfd-4797-8266-443eba34dfdf])
 13319  .mirror.primary.a2ef2d48-0cbb-4e05-8d29-d93706f42a22.e51f1b07-d2cb-4baa-9442-3914e5f5e989  100 GiB             Mon Dec 20 21:49:01 2021  mirror (primary peer_uuids:[49a8473a-9cfd-4797-8266-443eba34dfdf])
listed 0 entries
=============================
Mon Dec 20 21:50:14 UTC 2021
SNAPID  NAME                                                                                       SIZE     PROTECTED  TIMESTAMP                 NAMESPACE                                                         
 11621  .mirror.primary.a2ef2d48-0cbb-4e05-8d29-d93706f42a22.6c67935b-5f0a-4490-ac65-55bca3ab24dc  100 GiB             Mon Dec 20 14:44:01 2021  mirror (primary peer_uuids:[49a8473a-9cfd-4797-8266-443eba34dfdf])
 11630  .mirror.primary.a2ef2d48-0cbb-4e05-8d29-d93706f42a22.6687c762-78d3-4e75-ad75-a88de4ea513b  100 GiB             Mon Dec 20 14:45:39 2021  mirror (primary peer_uuids:[49a8473a-9cfd-4797-8266-443eba34dfdf])
 13319  .mirror.primary.a2ef2d48-0cbb-4e05-8d29-d93706f42a22.e51f1b07-d2cb-4baa-9442-3914e5f5e989  100 GiB             Mon Dec 20 21:49:01 2021  mirror (primary peer_uuids:[49a8473a-9cfd-4797-8266-443eba34dfdf])
172.17.0.2:0/4264307978 2021-12-20T22:50:12.108020+0000
listed 1 entries
=============================

NOTE: After the IO errors, the blocklist starts listing the client that was performing the IO

As the image has fast-diff (and hence exclusive-lock) enabled, it is possibly the client not giving up the exclusive lock that causes it to be blocklisted by the mirroring process, resulting in the IO errors.
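
One way to check this hypothesis (a sketch, not part of the original test) is to watch the exclusive-lock holder and watchers on the image, alongside the blocklist, while the scheduled mirror snapshot is taken:

  # Hedged sketch: inspect lock ownership, watchers and the blocklist
  # during the IO test.
  rbd status replicapool/test-demote-sb-1    # current watchers on the image
  rbd lock ls replicapool/test-demote-sb-1   # exclusive-lock holder, if any
  ceph osd blocklist ls                      # whether the IO client got blocklisted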

Comment 13 Deepika Upadhyay 2022-01-19 07:34:22 UTC
Thanks Shyam for providing the dev environment; I was able to reproduce this issue.

On debugging, I found the issue to be the following:

Since list_snaps is run in parallel with the I/Os, both operations go through
the same rados_api.execute / boost::asio execution path on the same object,
which makes the list_snaps operation block the running I/Os.

This should not be a blocking issue, as we do not generally expect a user to be
constantly monitoring the snapshot listing (`snap ls`).

The issue is not observed when there is no monitoring shell running `snap ls`
in parallel with the I/Os.
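
In other words, the behaviour can be confirmed with the commands from the original report (a sketch; the two runs use separate shells):

  # Run 1, no monitoring loop: the bench completes without errors.
  rbd bench --io-type write --io-size 4K --io-threads 2 --io-total 10M --io-pattern rand replicapool/test-demote-sb-1

  # Run 2, with the polling loop in a second shell: IO slows down and errors
  # out when the scheduled mirror snapshot is taken.
  while true; do rbd snap ls replicapool/test-demote-sb-1 --all; sleep 5; done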

Comment 14 Mudit Agarwal 2022-02-14 15:24:19 UTC
Deepika, what is the plan for this BZ? Are we going to fix it?
If yes, do you want to open a clone Ceph BZ?

Comment 15 Mudit Agarwal 2022-03-08 13:21:05 UTC
Not a 4.10 blocker, moving DR BZs to 4.10.z

Comment 16 Mudit Agarwal 2022-06-07 11:25:30 UTC
Ceph BZ is moved to 6.2