Bug 2280636 - cephfs_mirror: fix crash in update_fs_mirrors()
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: CephFS
Version: 6.1
Hardware: All
OS: Linux
Priority: high
Severity: medium
Target Milestone: ---
Target Release: 6.1z7
Assignee: Jos Collin
QA Contact: Hemanth Kumar
Blocks: 2280662 2280665
 
Reported: 2024-05-15 13:00 UTC by Jos Collin
Modified: 2024-08-28 17:58 UTC
CC: 8 users

Fixed In Version: ceph-17.2.6-234
Doc Type: Bug Fix
Doc Text:
.Checks are now performed on `m_instance_watcher` and `m_mirror_watcher`
Previously, the `FSMirror::is_failed()` function called the member functions `InstanceWatcher::is_failed()` and `MirrorWatcher::is_failed()` through null `m_instance_watcher` and `m_mirror_watcher` pointers, crashing the cephfs-mirror daemon. With this fix, `m_instance_watcher` and `m_mirror_watcher` are checked before use.
Clones: 2280662 2280665
Last Closed: 2024-08-28 17:58:23 UTC




Links
Ceph Project Bug Tracker 65991 (last updated 2024-05-15 15:41:01 UTC)
Red Hat Issue Tracker RHCEPH-9021 (last updated 2024-05-15 13:02:21 UTC)
Red Hat Product Errata RHBA-2024:5960 (last updated 2024-08-28 17:58:34 UTC)

Description Jos Collin 2024-05-15 13:00:15 UTC
Description of problem:
cephfs-mirror crashes in Mirror::update_fs_mirrors() when it calls FSMirror::is_failed().
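
The doc text above ties the crash to FSMirror::is_failed() being invoked from the safe_timer callback Mirror::update_fs_mirrors() while m_instance_watcher and m_mirror_watcher are still null, for example when FSMirror::init() bails out early (note the handle_enable_mirroring ... r=-2 line in the log below). A minimal standalone sketch of that shape follows; the class and member names come from the doc text, but the code is simplified and is not the actual Ceph source:

  // Simplified, standalone sketch of the failure shape; these are not
  // the actual Ceph classes, only the pattern the doc text describes.
  #include <mutex>

  struct InstanceWatcher {
    std::mutex m_lock;
    bool m_failed = false;
    bool is_failed() {
      // Reached through a null pointer, this locks a mutex at a
      // near-null address, consistent with the __pthread_mutex_lock()
      // and std::mutex::lock() frames in the backtrace below.
      std::scoped_lock locker(m_lock);
      return m_failed;
    }
  };
  using MirrorWatcher = InstanceWatcher;  // same shape, for the sketch

  struct FSMirror {
    std::mutex m_lock;
    bool m_init_failed = false;
    InstanceWatcher *m_instance_watcher = nullptr;  // still null if init() bailed early
    MirrorWatcher *m_mirror_watcher = nullptr;      // still null if init() bailed early

    // Invoked from Mirror::update_fs_mirrors() on the safe_timer thread.
    bool is_failed() {
      std::scoped_lock locker(m_lock);
      return m_init_failed ||
             m_instance_watcher->is_failed() ||  // null dereference -> SIGSEGV
             m_mirror_watcher->is_failed();
    }
  };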

Version-Release number of selected component (if applicable):
RHCS6

How reproducible:
This cannot be reproduced manually. Run the tests in test_mirroring.py; the daemon crashes even after all tests have passed. The ceph-client.mirror logs show the following backtrace:

   -16> 2023-12-28T17:51:26.774+0000 7fdc39f63700 10 monclient: tick
   -15> 2023-12-28T17:51:26.900+0000 7fdc3d76a700 10 cephfs::mirror::Utils connect: connected to cluster=ceph using client=client.mirror
   -14> 2023-12-28T17:51:26.901+0000 7fdc3d76a700 20 cephfs::mirror::Utils mount: filesystem={fscid=56, fs_name=cephfs}
   -13> 2023-12-28T17:51:26.908+0000 7fdc3d76a700 10 cephfs::mirror::Utils mount: mounted filesystem={fscid=56, fs_name=cephfs}
   -12> 2023-12-28T17:51:26.908+0000 7fdc3d76a700 10 cephfs::mirror::FSMirror init: rados addrs=172.21.15.115:0/449872678
   -11> 2023-12-28T17:51:26.908+0000 7fdc3d76a700 20 cephfs::mirror::FSMirror init_instance_watcher
   -10> 2023-12-28T17:51:26.908+0000 7fdc3d76a700 20 cephfs::mirror::InstanceWatcher init
    -9> 2023-12-28T17:51:26.908+0000 7fdc3d76a700 20 cephfs::mirror::InstanceWatcher create_instance
    -8> 2023-12-28T17:51:26.908+0000 7fdc3d76a700 20 cephfs::mirror::Mirror handle_enable_mirroring: filesystem={fscid=54, fs_name=cephfs}, peers=, r=-2
    -7> 2023-12-28T17:51:26.908+0000 7fdc3ff6f700 -1 asok(0x5651f2796000) AdminSocket: error writing response length (32) Broken pipe
    -6> 2023-12-28T17:51:26.910+0000 7fdc31752700 20 cephfs::mirror::InstanceWatcher handle_create_instance: r=0
    -5> 2023-12-28T17:51:26.910+0000 7fdc31752700 20 cephfs::mirror::InstanceWatcher register_watcher
    -4> 2023-12-28T17:51:26.910+0000 7fdc31752700 20 cephfs::mirror::Watcher register_watch
    -3> 2023-12-28T17:51:26.911+0000 7fdc31f53700 20 cephfs::mirror::Watcher handle_register_watch: r=0
    -2> 2023-12-28T17:51:26.911+0000 7fdc31f53700 20 cephfs::mirror::InstanceWatcher handle_register_watcher: r=0
    -1> 2023-12-28T17:51:26.911+0000 7fdc31f53700 20 cephfs::mirror::FSMirror handle_init_instance_watcher: r=0
     0> 2023-12-28T17:51:26.912+0000 7fdc3cf69700 -1 *** Caught signal (Segmentation fault) **
 in thread 7fdc3cf69700 thread_name:safe_timer

 ceph version 16.2.14-417-gc5564c79 (c5564c7988cbaadc3382253af5843a8595347c2d) pacific (stable)
 1: /lib64/libpthread.so.0(+0x12ce0) [0x7fdc4476ace0]
 2: __pthread_mutex_lock()
 3: (std::mutex::lock()+0x17) [0x5651f0710357]
 4: (cephfs::mirror::Mirror::update_fs_mirrors()+0x827) [0x5651f070e7d7]
 5: (Context::complete(int)+0xd) [0x5651f070f6ed]
 6: (CommonSafeTimer<std::mutex>::timer_thread()+0x10f) [0x7fdc4563d65f]
 7: (CommonSafeTimerThread<std::mutex>::entry()+0x11) [0x7fdc4563e9f1]
 8: /lib64/libpthread.so.0(+0x81cf) [0x7fdc447601cf]
 9: clone()
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
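
Per the doc text, the fix checks both watcher pointers before calling through them. Continuing the simplified sketch from the description above (not the verbatim patch; see Ceph tracker 65991 in the links section for the actual change), the post-fix shape is:

  // Post-fix shape: only call is_failed() on a watcher that exists.
  bool FSMirror::is_failed() {
    std::scoped_lock locker(m_lock);
    return m_init_failed ||
           (m_instance_watcher && m_instance_watcher->is_failed()) ||
           (m_mirror_watcher && m_mirror_watcher->is_failed());
  }

With that guard, the timer callback can safely poll a mirror whose init() failed before the watchers were constructed.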

Steps to Reproduce:
See "How reproducible" above; there are no manual reproduction steps.

Actual results:
The cephfs-mirror daemon crashes with a segmentation fault in the safe_timer thread; the ceph-client.mirror logs show the backtrace above.

Expected results:
cephfs-mirror completes the test_mirroring.py run without crashing.

Additional info:

Comment 6 errata-xmlrpc 2024-08-28 17:58:23 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 6.1 security, bug fix, and enhancement updates) and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2024:5960

