
Bug 2069720

Summary: [DR] rbd_support: a schedule may get lost due to load vs add race
Product: [Red Hat Storage] Red Hat Ceph Storage
Reporter: Scott Ostapovicz <sostapov>
Component: RBD-Mirror
Assignee: Ilya Dryomov <idryomov>
Status: CLOSED ERRATA
QA Contact: Vasishta <vashastr>
Severity: urgent
Docs Contact: Akash Raj <akraj>
Priority: unspecified
Version: 5.1
CC: akraj, asriram, bniver, ceph-eng-bugs, choffman, idryomov, jdurgin, kramdoss, kseeger, madam, mmuench, muagarwa, ocs-bugs, prsurve, srangana, tserlin, vereddy
Target Milestone: ---   
Target Release: 5.2   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: ceph-16.2.8-52.el8cp
Doc Type: Bug Fix
Doc Text:
Snapshot-based mirroring process no longer gets cancelled
Previously, as a result of an internal race condition, the `rbd mirror snapshot schedule add` command would be cancelled out. The snapshot-based mirroring process for the affected image would not start if no other existing schedules were applicable. With this release, the race condition is fixed and the snapshot-based mirroring process starts as expected.
Story Points: ---
Clone Of: 2067095
Clones: 2099799
Environment:
Last Closed: 2022-08-09 17:37:39 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2067095, 2102272    
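
For reference, the user-visible flow affected by this race (per the summary and Doc Text above) looks roughly like the following. This is a minimal sketch: the pool and image names and the 10-minute interval are placeholders, and pool-level mirroring is assumed to be configured already.

    # Enable snapshot-based mirroring on an image and add a per-image schedule.
    rbd mirror image enable pool1/image1 snapshot
    rbd mirror snapshot schedule add --pool pool1 --image image1 10m

    # Before the fix, the schedule added above could be lost to the load vs add
    # race in the rbd_support mgr module, so, absent any other applicable
    # schedule, snapshot-based mirroring for the image would not start.
    # Listing the schedule is a quick way to confirm it was actually retained.
    rbd mirror snapshot schedule ls --pool pool1 --image image1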

Comment 1 Josh Durgin 2022-03-29 15:24:14 UTC
Chris, can you take a look? It seems there are a number of rbd-mirror crashes with this backtrace:

    "assert_msg": "/builddir/build/BUILD/ceph-16.2.7/src/librbd/ImageWatcher.cc: In function 'void librbd::ImageWatcher<ImageCtxT>::schedule_request_lock(bool, int) [with ImageCtxT = librbd::ImageCtx]' thread 7f6ccc123700 time 2022-03-26T15:39:31.399999+0000\n/builddir/build/BUILD/ceph-16.2.7/src/librbd/ImageWatcher.cc: 580: FAILED ceph_assert(m_image_ctx.exclusive_lock && !m_image_ctx.exclusive_lock->is_lock_owner())\n",
    "assert_thread_name": "io_context_pool",
    "backtrace": [
        "/lib64/libpthread.so.0(+0x12c20) [0x7f6ce068ac20]",
        "gsignal()",
        "abort()",
        "(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a9) [0x7f6ce124ad4f]",
        "/usr/lib64/ceph/libceph-common.so.2(+0x276f18) [0x7f6ce124af18]",
        "(librbd::ImageWatcher<librbd::ImageCtx>::schedule_request_lock(bool, int)+0x3b6) [0x5617d288a596]",
        "(librbd::ImageWatcher<librbd::ImageCtx>::handle_request_lock(int)+0x486) [0x5617d288aae6]",
        "(librbd::image_watcher::NotifyLockOwner::finish(int)+0x2b) [0x5617d2a0f25b]",
        "(librbd::image_watcher::NotifyLockOwner::handle_notify(int)+0x9e4) [0x5617d2a10014]",
        "(Context::complete(int)+0xd) [0x5617d26e080d]",
        "(boost::asio::detail::completion_handler<boost::asio::detail::work_dispatcher<librbd::asio::ContextWQ::queue(Context*, int)::{lambda()#1}> >::do_complete(void*, boost::asio::detail::scheduler_operation*, boost::system::error_code const&, unsigned long)+0x66) [0x5617d26e0ca6]",
        "(boost::asio::detail::strand_service::do_complete(void*, boost::asio::detail::scheduler_operation*, boost::system::error_code const&, unsigned long)+0x85) [0x5617d2854435]",
        "/lib64/librados.so.2(+0xc12e2) [0x7f6cea87e2e2]",
        "/lib64/librados.so.2(+0xc6cea) [0x7f6cea883cea]",
        "/lib64/libstdc++.so.6(+0xc2ba3) [0x7f6cdf499ba3]",
        "/lib64/libpthread.so.0(+0x817a) [0x7f6ce068017a]",
        "clone()"
    ],

Comment 8 Scott Ostapovicz 2022-05-06 20:28:46 UTC
Done

Comment 15 Gopi 2022-07-01 04:33:29 UTC
Working as expected with the latest build, hence moving to the verified state.
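
A verification of this sort might look roughly like the following from the CLI (a sketch only; the pool and image names are placeholders):

    # Confirm the per-image schedule is in effect and mirror snapshots are being taken.
    rbd mirror snapshot schedule status --pool pool1 --image image1
    rbd mirror image status pool1/image1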

Comment 20 errata-xmlrpc 2022-08-09 17:37:39 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: Red Hat Ceph Storage Security, Bug Fix, and Enhancement Update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5997