Bug 2280636

Summary: cephfs_mirror: fix crash in update_fs_mirrors()
Product: [Red Hat Storage] Red Hat Ceph Storage
Component: CephFS
Version: 6.1
Status: CLOSED ERRATA
Severity: medium
Priority: high
Reporter: Jos Collin <jcollin>
Assignee: Jos Collin <jcollin>
QA Contact: Hemanth Kumar <hyelloji>
CC: ceph-eng-bugs, cephqe-warriors, gfarnum, hyelloji, jcaratza, rpollack, tserlin, vshankar
Target Milestone: ---
Target Release: 6.1z7
Hardware: All
OS: Linux
Fixed In Version: ceph-17.2.6-234
Doc Type: Bug Fix
Doc Text:
.Checks are now performed on `m_instance_watcher` and `m_mirror_watcher` with a new patch
Previously, the `FSMirror::is_failed()` function called the member functions `InstanceWatcher::is_failed()` and `MirrorWatcher::is_failed()` through null `m_instance_watcher` and `m_mirror_watcher` pointers, respectively. With this fix, `m_instance_watcher` and `m_mirror_watcher` are checked before use.
Clones: 2280662, 2280665
Last Closed: 2024-08-28 17:58:23 UTC
Type: Bug
Bug Blocks: 2280662, 2280665

Description Jos Collin 2024-05-15 13:00:15 UTC
Description of problem:
cephfs_mirror: fix crash in update_fs_mirrors(), when calling FSMirror::is_failed()

Version-Release number of selected component (if applicable):
RHCS6

How reproducible:
This cannot be reproduced manually. Run the tests in test_mirroring.py; the daemon crashes even after all tests pass. The ceph-client.mirror logs show the following backtrace:

   -16> 2023-12-28T17:51:26.774+0000 7fdc39f63700 10 monclient: tick
   -15> 2023-12-28T17:51:26.900+0000 7fdc3d76a700 10 cephfs::mirror::Utils connect: connected to cluster=ceph using client=client.mirror
   -14> 2023-12-28T17:51:26.901+0000 7fdc3d76a700 20 cephfs::mirror::Utils mount: filesystem={fscid=56, fs_name=cephfs}
   -13> 2023-12-28T17:51:26.908+0000 7fdc3d76a700 10 cephfs::mirror::Utils mount: mounted filesystem={fscid=56, fs_name=cephfs}
   -12> 2023-12-28T17:51:26.908+0000 7fdc3d76a700 10 cephfs::mirror::FSMirror init: rados addrs=172.21.15.115:0/449872678
   -11> 2023-12-28T17:51:26.908+0000 7fdc3d76a700 20 cephfs::mirror::FSMirror init_instance_watcher
   -10> 2023-12-28T17:51:26.908+0000 7fdc3d76a700 20 cephfs::mirror::InstanceWatcher init
    -9> 2023-12-28T17:51:26.908+0000 7fdc3d76a700 20 cephfs::mirror::InstanceWatcher create_instance
    -8> 2023-12-28T17:51:26.908+0000 7fdc3d76a700 20 cephfs::mirror::Mirror handle_enable_mirroring: filesystem={fscid=54, fs_name=cephfs}, peers=, r=-2
    -7> 2023-12-28T17:51:26.908+0000 7fdc3ff6f700 -1 asok(0x5651f2796000) AdminSocket: error writing response length (32) Broken pipe
    -6> 2023-12-28T17:51:26.910+0000 7fdc31752700 20 cephfs::mirror::InstanceWatcher handle_create_instance: r=0
    -5> 2023-12-28T17:51:26.910+0000 7fdc31752700 20 cephfs::mirror::InstanceWatcher register_watcher
    -4> 2023-12-28T17:51:26.910+0000 7fdc31752700 20 cephfs::mirror::Watcher register_watch
    -3> 2023-12-28T17:51:26.911+0000 7fdc31f53700 20 cephfs::mirror::Watcher handle_register_watch: r=0
    -2> 2023-12-28T17:51:26.911+0000 7fdc31f53700 20 cephfs::mirror::InstanceWatcher handle_register_watcher: r=0
    -1> 2023-12-28T17:51:26.911+0000 7fdc31f53700 20 cephfs::mirror::FSMirror handle_init_instance_watcher: r=0
     0> 2023-12-28T17:51:26.912+0000 7fdc3cf69700 -1 *** Caught signal (Segmentation fault) **
 in thread 7fdc3cf69700 thread_name:safe_timer

 ceph version 16.2.14-417-gc5564c79 (c5564c7988cbaadc3382253af5843a8595347c2d) pacific (stable)
 1: /lib64/libpthread.so.0(+0x12ce0) [0x7fdc4476ace0]
 2: __pthread_mutex_lock()
 3: (std::mutex::lock()+0x17) [0x5651f0710357]
 4: (cephfs::mirror::Mirror::update_fs_mirrors()+0x827) [0x5651f070e7d7]
 5: (Context::complete(int)+0xd) [0x5651f070f6ed]
 6: (CommonSafeTimer<std::mutex>::timer_thread()+0x10f) [0x7fdc4563d65f]
 7: (CommonSafeTimerThread<std::mutex>::entry()+0x11) [0x7fdc4563e9f1]
 8: /lib64/libpthread.so.0(+0x81cf) [0x7fdc447601cf]
 9: clone()
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Steps to Reproduce:
1. Run the tests in test_mirroring.py (the crash cannot be reproduced manually; see above).

Actual results:
The ceph-client.mirror logs show the above bt.

Expected results:
The cephfs-mirror daemon does not crash after the tests complete.

Additional info:

Comment 6 errata-xmlrpc 2024-08-28 17:58:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 6.1 security, bug fix, and enhancement updates), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2024:5960