Description of problem (please be as detailed as possible and provide log snippets):

ceph-mon crashed while running the e2e flow tests.

[root@compute-2 /]# ceph crash info 2021-05-21_15:32:56.924290Z_dca2e592-cf0b-4d65-97f0-02f8694f94ae
{
    "crash_id": "2021-05-21_15:32:56.924290Z_dca2e592-cf0b-4d65-97f0-02f8694f94ae",
    "timestamp": "2021-05-21 15:32:56.924290Z",
    "process_name": "ceph-mon",
    "entity_name": "mon.c",
    "ceph_version": "14.2.11-147.el8cp",
    "utsname_hostname": "rook-ceph-mon-c-597dc9bb4b-7jwvz",
    "utsname_sysname": "Linux",
    "utsname_release": "4.18.0-305.el8.x86_64",
    "utsname_version": "#1 SMP Thu Apr 29 08:54:30 EDT 2021",
    "utsname_machine": "x86_64",
    "os_name": "Red Hat Enterprise Linux",
    "os_id": "rhel",
    "os_version_id": "8.3",
    "os_version": "8.3 (Ootpa)",
    "assert_condition": "abort",
    "assert_func": "int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionRef)",
    "assert_file": "/builddir/build/BUILD/ceph-14.2.11/src/mon/MonitorDBStore.h",
    "assert_line": 354,
    "assert_thread_name": "safe_timer",
    "assert_msg": "/builddir/build/BUILD/ceph-14.2.11/src/mon/MonitorDBStore.h: In function 'int MonitorDBStore::apply_transaction(MonitorDBStore::TransactionRef)' thread 7fe07e800700 time 2021-05-21 15:32:56.921312\n/builddir/build/BUILD/ceph-14.2.11/src/mon/MonitorDBStore.h: 354: ceph_abort_msg(\"failed to write to db\")\n",
    "backtrace": [
        "(()+0x12b30) [0x7fe08880ab30]",
        "(gsignal()+0x10f) [0x7fe08747284f]",
        "(abort()+0x127) [0x7fe08745cc45]",
        "(ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x1b4) [0x7fe08bb01015]",
        "(MonitorDBStore::apply_transaction(std::shared_ptr<MonitorDBStore::Transaction>)+0xe8c) [0x5569eca613fc]",
        "(Elector::persist_connectivity_scores()+0xe9) [0x5569ecb29979]",
        "(ConnectionTracker::increase_version()+0x1bd) [0x5569ecb3576d]",
        "(ConnectionTracker::report_dead_connection(int, double)+0xf0) [0x5569ecb35f60]",
        "(Elector::dead_ping(int)+0x15e) [0x5569ecb28c4e]",
        "(C_MonContext::finish(int)+0x3d) [0x5569eca6479d]",
        "(Context::complete(int)+0xd) [0x5569ecaa951d]",
        "(SafeTimer::timer_thread()+0x1aa) [0x7fe08bbdd37a]",
        "(SafeTimerThread::entry()+0x11) [0x7fe08bbdecc1]",
        "(()+0x815a) [0x7fe08880015a]",
        "(clone()+0x43) [0x7fe087537f73]"
    ]
}

Version of all relevant components (if applicable):
ocp: 4.8.0-0.nightly-2021-05-19-123944
rook: 4.8-133.5daada2.release_4.8
ceph version 14.2.11-147.el8cp (1f54d52f20d93c1b91f1ec6af4c67a4b81402800) nautilus (stable)
ocs-operator.v4.8.0-399.ci

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
NA

Is there any workaround available to the best of your knowledge?
Yes. Run:
ceph crash archive 2021-05-21_15:32:56.924290Z_dca2e592-cf0b-4d65-97f0-02f8694f94ae
Ceph health will then return to OK.

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
2

Is this issue reproducible?
1/1

Can this issue be reproduced from the UI?
NA

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Run the e2e flow test added in https://github.com/red-hat-storage/ocs-ci/pull/4019/files#diff-2940650849aca221f3591c75030e3b4b3f68e7a7dd0f4389895bf93e0850f73dR174
   The test covers the following flow operations while running workloads in the background:
   1. Node reboot
   2. Device replacement
   3. NooBaa bucket policy: put, modify, delete
   4. Node drain
   5. Node network failure --> the crash occurred while this step was running
   6. NooBaa core pod delete with OBC IO
Actual results:
ceph-mon crash

Expected results:
ceph-mon should not have crashed

Additional info:
After running the ceph crash archive command (see the workaround sketch below), IO operations worked fine.
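A minimal sketch of the full workaround sequence, assuming it is run from the rook-ceph toolbox pod and using the crash ID reported above (the ID will differ for other occurrences):

    # list recorded crashes and confirm the crash ID
    ceph crash ls
    # inspect the crash details
    ceph crash info 2021-05-21_15:32:56.924290Z_dca2e592-cf0b-4d65-97f0-02f8694f94ae
    # archive the crash so it no longer raises the RECENT_CRASH health warning
    ceph crash archive 2021-05-21_15:32:56.924290Z_dca2e592-cf0b-4d65-97f0-02f8694f94ae
    # verify the cluster returns to HEALTH_OK
    ceph health

Note that archiving only silences the health warning for this crash; it does not address the underlying "failed to write to db" abort in MonitorDBStore::apply_transaction.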
See also https://bugzilla.redhat.com/show_bug.cgi?id=1967135 (not targeted for 5.0).