Description of problem (please be as detailed as possible and provide log snippets):

On a freshly deployed cluster, the mon pods are in CrashLoopBackOff with the message:

/builddir/build/BUILD/ceph-19.1.0-0-g9025b9024ba/src/osd/OSDMap.cc: 3286: FAILED ceph_assert(pg_upmap_primaries.empty())

Version of all relevant components (if applicable):
OCP version: 4.17.0-0.nightly-2024-07-31-035751
ODF version: 4.17.0-57
Ceph version: ceph version 19.1.0-0-g9025b9024ba (9025b9024baf597d63005552b5ee004013630404) squid (rc)
ACM version: 2.12.0-25
Submariner version: v0.18.0

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
Yes

Is there any workaround available to the best of your knowledge?

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
1

Can this issue be reproduced?

Can this issue be reproduced from the UI?

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Deploy a cluster over VMware
2. Install ODF 4.17
3.
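For context, the assert fires when an OSD map that carries `pg_upmap_primaries` entries (set by the read balancer introduced in recent Ceph releases) is re-encoded for a client feature set that predates them. A quick way to check whether a cluster has such entries is to inspect `ceph osd dump -f json`. The helper below is a sketch, not part of Ceph: the function name is ours, and it assumes the `pg_upmap_primaries` key as it appears in reef/squid-era JSON dumps (older releases may omit it):

```python
import json

def pg_upmap_primary_entries(osd_dump_json: str) -> list:
    """Return the pg_upmap_primaries entries from `ceph osd dump -f json` output.

    A non-empty result means the map carries read-balancer mappings that
    cannot be encoded for pre-feature clients -- the condition behind the
    ceph_assert(pg_upmap_primaries.empty()) failure in this report.
    """
    dump = json.loads(osd_dump_json)
    # Key name assumed from reef/squid-era dumps; default to empty if absent.
    return dump.get("pg_upmap_primaries", [])

# Example with a trimmed, hand-written dump:
sample = '{"epoch": 435, "pg_upmap_primaries": [{"pgid": "1.0", "primary_osd": 2}]}'
print(pg_upmap_primary_entries(sample))  # -> [{'pgid': '1.0', 'primary_osd': 2}]
```

If such entries exist, the Ceph read-balancer documentation describes `ceph osd rm-pg-upmap-primary <pgid>` for removing a mapping, which may serve as a workaround when the mons are still reachable.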
Actual results:

mgrc handle_mgr_map Active mgr is now [v2:10.129.2.31:6800/76971175,v1:10.129.2.31:6801/76971175]
debug -2> 2024-08-01T05:16:39.170+0000 7f41654cc640 5 mon.a@0(leader).paxos(paxos active c 1..435) is_readable = 1 - now=2024-08-01T05:16:39.172408+0000 lease_expire=2024-08-01T05:16:44.074475+0000 has v0 lc 435
debug -1> 2024-08-01T05:16:39.173+0000 7f41654cc640 -1 /builddir/build/BUILD/ceph-19.1.0-0-g9025b9024ba/src/osd/OSDMap.cc: In function 'void OSDMap::encode(ceph::buffer::v15_2_0::list&, uint64_t) const' thread 7f41654cc640 time 2024-08-01T05:16:39.173718+0000
/builddir/build/BUILD/ceph-19.1.0-0-g9025b9024ba/src/osd/OSDMap.cc: 3286: FAILED ceph_assert(pg_upmap_primaries.empty())
ceph version 19.1.0-0-g9025b9024ba (9025b9024baf597d63005552b5ee004013630404) squid (rc)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x12e) [0x7f416d9e9f62]
 2: /usr/lib64/ceph/libceph-common.so.2(+0x182120) [0x7f416d9ea120]
 3: /usr/lib64/ceph/libceph-common.so.2(+0x1c244b) [0x7f416da2a44b]
 4: (OSDMonitor::reencode_full_map(ceph::buffer::v15_2_0::list&, unsigned long)+0xdf) [0x5589340e22df]
 5: (OSDMonitor::get_version_full(unsigned long, unsigned long, ceph::buffer::v15_2_0::list&)+0x1e6) [0x5589340e3e26]
 6: (OSDMonitor::build_latest_full(unsigned long)+0x133) [0x5589340e3fb3]
 7: (OSDMonitor::check_osdmap_sub(Subscription*)+0x73) [0x5589340e6a73]
 8: (Monitor::handle_subscribe(boost::intrusive_ptr<MonOpRequest>)+0x1190) [0x558933f9f260]
 9: (Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0x796) [0x558933f98df6]
 10: (Monitor::_ms_dispatch(Message*)+0x42f) [0x558933f99f8f]
 11: ceph-mon(+0x260a6e) [0x558933f53a6e]
 12: (DispatchQueue::entry()+0x542) [0x7f416dbe4602]
 13: /usr/lib64/ceph/libceph-common.so.2(+0x410421) [0x7f416dc78421]
 14: /lib64/libc.so.6(+0x89c02) [0x7f416d171c02]
 15: /lib64/libc.so.6(+0x10ec40) [0x7f416d1f6c40]
debug 0> 2024-08-01T05:16:39.174+0000 7f41654cc640 -1 *** Caught signal (Aborted) ** in thread 7f41654cc640
thread_name:ms_dispatch

ceph version 19.1.0-0-g9025b9024ba (9025b9024baf597d63005552b5ee004013630404) squid (rc)
 1: /lib64/libc.so.6(+0x3e6f0) [0x7f416d1266f0]
 2: /lib64/libc.so.6(+0x8b94c) [0x7f416d17394c]
 3: raise()
 4: abort()
 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x188) [0x7f416d9e9fbc]
 6: /usr/lib64/ceph/libceph-common.so.2(+0x182120) [0x7f416d9ea120]
 7: /usr/lib64/ceph/libceph-common.so.2(+0x1c244b) [0x7f416da2a44b]
 8: (OSDMonitor::reencode_full_map(ceph::buffer::v15_2_0::list&, unsigned long)+0xdf) [0x5589340e22df]
 9: (OSDMonitor::get_version_full(unsigned long, unsigned long, ceph::buffer::v15_2_0::list&)+0x1e6) [0x5589340e3e26]
 10: (OSDMonitor::build_latest_full(unsigned long)+0x133) [0x5589340e3fb3]
 11: (OSDMonitor::check_osdmap_sub(Subscription*)+0x73) [0x5589340e6a73]
 12: (Monitor::handle_subscribe(boost::intrusive_ptr<MonOpRequest>)+0x1190) [0x558933f9f260]
 13: (Monitor::dispatch_op(boost::intrusive_ptr<MonOpRequest>)+0x796) [0x558933f98df6]
 14: (Monitor::_ms_dispatch(Message*)+0x42f) [0x558933f99f8f]
 15: ceph-mon(+0x260a6e) [0x558933f53a6e]
 16: (DispatchQueue::entry()+0x542) [0x7f416dbe4602]
 17: /usr/lib64/ceph/libceph-common.so.2(+0x410421) [0x7f416dc78421]
 18: /lib64/libc.so.6(+0x89c02) [0x7f416d171c02]
 19: /lib64/libc.so.6(+0x10ec40) [0x7f416d1f6c40]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
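Reading the backtrace: a client subscribes to the osdmap (`Monitor::handle_subscribe` → `OSDMonitor::check_osdmap_sub`), the monitor re-encodes the full map for that client's feature set (`reencode_full_map` → `OSDMap::encode`), and the encoder asserts because `pg_upmap_primaries` is non-empty while the target features lack read-balancer support. The following is an illustrative Python model of that guard, with a hypothetical feature bit; it is not the actual Ceph code:

```python
# Illustrative model of the guard that trips in OSDMap::encode (not Ceph code).
FEATURE_READ_BALANCER = 1 << 0  # hypothetical feature bit for pg_upmap_primaries


class OSDMapModel:
    def __init__(self, pg_upmap_primaries=None):
        # pgid -> primary osd id mappings set by the read balancer
        self.pg_upmap_primaries = dict(pg_upmap_primaries or {})

    def encode(self, client_features: int) -> bytes:
        # When encoding for a client that lacks the read-balancer feature,
        # the map must carry no pg_upmap_primaries entries -- this is the
        # condition behind `FAILED ceph_assert(pg_upmap_primaries.empty())`.
        if not (client_features & FEATURE_READ_BALANCER):
            assert not self.pg_upmap_primaries, \
                "cannot encode pg_upmap_primaries for a pre-feature client"
        # Real wire encoding elided; return a placeholder payload.
        return repr(self.pg_upmap_primaries).encode()
```

An empty map encodes for any client; a populated one only for feature-aware clients, which matches the report: the crash appears only once read-balancer mappings exist in the map.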
--- logging levels ---
0/ 5 none 0/ 1 lockdep 0/ 1 context 1/ 1 crush 1/ 5 mds 1/ 5 mds_balancer 1/ 5 mds_locker 1/ 5 mds_log 1/ 5 mds_log_expire 1/ 5 mds_migrator 3/ 5 mds_quiesce 0/ 1 buffer 0/ 1 timer 0/ 1 filer 0/ 1 striper 0/ 1 objecter 0/ 5 rados 0/ 5 rbd 0/ 5 rbd_mirror 0/ 5 rbd_replay 0/ 5 rbd_pwl 0/ 5 journaler 0/ 5 objectcacher 0/ 5 immutable_obj_cache 0/ 5 client 1/ 5 osd 0/ 5 optracker 0/ 5 objclass 1/ 3 filestore 1/ 3 journal 0/ 0 ms 1/ 5 mon 0/10 monc 1/ 5 paxos 0/ 5 tp 1/ 5 auth 1/ 5 crypto 1/ 1 finisher 1/ 1 reserver 1/ 5 heartbeatmap 1/ 5 perfcounter 1/ 5 rgw 1/ 5 rgw_sync 1/ 5 rgw_datacache 1/ 5 rgw_access 1/ 5 rgw_dbstore 1/ 5 rgw_flight 1/ 5 rgw_lifecycle 1/ 5 javaclient 1/ 5 asok 1/ 1 throttle 0/ 0 refs 1/ 5 compressor 1/ 5 bluestore 1/ 5 bluefs 1/ 3 bdev 1/ 5 kstore 4/ 5 rocksdb 1/ 5 fuse 2/ 5 mgr 1/ 5 mgrc 1/ 5 dpdk 1/ 5 eventtrace 1/ 5 prioritycache 0/ 5 test 0/ 5 cephfs_mirror 0/ 5 cephsqlite 0/ 5 crimson_interrupt 0/ 5 seastore 0/ 5 seastore_onode 0/ 5 seastore_odata 0/ 5 seastore_omap 0/ 5 seastore_tm 0/ 5 seastore_t 0/ 5 seastore_cleaner 0/ 5 seastore_epm 0/ 5 seastore_lba 0/ 5 seastore_fixedkv_tree 0/ 5 seastore_cache 0/ 5 seastore_journal 0/ 5 seastore_device 0/ 5 seastore_backref 0/ 5 alienstore 1/ 5 mclock 0/ 5 cyanstore 1/ 5 ceph_exporter 1/ 5 memstore 1/ 5 trace
-2/-2 (syslog threshold)
99/99 (stderr threshold)
--- pthread ID / name mapping for recent threads ---
7f4161cc5640 / ms_dispatch
7f41624c6640 / ceph-mon
7f4162cc7640 / fn_monstore
7f41634c8640 / msgr-worker-0
7f41634c8640 was preceded by 7f4163cc9640 / msgr-worker-1
7f41654cc640 / ms_dispatch
7f4167cd1640 / safe_timer
7f41694d4640 / msgr-worker-2
7f416b695640 / admin_socket
7f416c6fbb00 / ceph-mon
max_recent 10000
max_new 1000
log_file /var/lib/ceph/crash/2024-08-01T05:16:39.176055Z_488cc8b3-a9e8-4422-bccc-17c758fc1b3a/log
--- end dump of recent events ---

Expected results:
Pod should be in running state

Additional info:
Please update the RDT flag/text appropriately.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.17.0 Security, Enhancement, & Bug Fix Update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2024:8676
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days