Description of problem:

Previously, Crimson OSD deployments through cephadm required users to set these three vital config options before triggering OSD deployment:

- ceph config set global 'enable_experimental_unrecoverable_data_corrupting_features' crimson
- ceph osd set-allow-crimson --yes-i-really-mean-it
- ceph config set mon osd_pool_default_crimson true

Refer:
- https://docs.ceph.com/en/latest/dev/crimson/crimson/?highlight=crimson#enabling-crimson
- https://docs.redhat.com/en/documentation/red_hat_ceph_storage/7/html/administration_guide/crimson#configuring-crimson_admin
- https://www.ibm.com/docs/en/storage-ceph/7.1?topic=preview-configuring-crimson

However, with the changes introduced around Feb 2024 in PR https://github.com/ceph/ceph/pull/55276, the user is now also required to assign CPU cores on a per-OSD basis for optimal deployment and better performance. Notably, the parameters in this PR were supported in a vstart deployment but could not be set in a cephadm deployment out of the box, for the following reasons:

1. The code changes let the user define the CPU cores for each OSD on each host, but the user would need to know which OSD ID will be deployed on which host before deployment.
2. The user cannot anticipate the location of an OSD in the CRUSH map, so setting the CPU cores before OSD deployment is difficult due to the *unpredictability* of the allocation.

When the upstream Crimson image was picked up for testing a few months ago, deployment failed for the reasons above. To support cephadm deployments without reverting the new code, it was decided to introduce new config options that allow a "basic" deployment. Such a deployment may not yield the true power of Crimson OSDs, but it is enough to get started until per-OSD CPU core pinning is supported in containerized deployments.

Upstream tracker: https://tracker.ceph.com/issues/65752
Upstream PR (merged in main): https://github.com/ceph/ceph/pull/57593

Around June 2024, new parameters were introduced with PR https://github.com/ceph/ceph/pull/57593 to support the said 'basic' deployment for Crimson OSDs. The two newly introduced config options are (see the sketch at the end of this description):

- crimson_seastar_num_threads
- crimson_alien_op_num_threads

PR 57593 is yet to be backported to Squid, and the first Crimson downstream build, received on 09-Aug-2024, was based off the Squid branch and not 'main'; consequently, the required parameters are missing from this build.

Request: backport the upstream PR (https://github.com/ceph/ceph/pull/57593) to Squid and trigger a new downstream Crimson build off the latest Squid branch.

>>> Until then, containerized Crimson OSD deployment through cephadm will remain blocked, and so will any further testing.
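For illustration, a minimal sketch of the intended 'basic' configuration once the backport is in place (the thread counts below are placeholder values for illustration, not tuned recommendations):

  # ceph config set osd crimson_seastar_num_threads 3
  # ceph config set osd crimson_alien_op_num_threads 6

Because these are set at the global osd level, they sidestep the need to know OSD IDs or CRUSH locations ahead of deployment, at the cost of the per-OSD tuning that PR 55276 provides.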
Version-Release number of selected component (if applicable):
ceph version 19.1.0-22.0.crimson.el9cp

How reproducible:
1/1

Steps to Reproduce:
1. Deploy a ceph cluster with the crimson image:
   cephadm --image cp.stg.icr.io/cp/ibm-ceph/ceph-8-crimson-rhel9:latest bootstrap --mon-ip 172.20.20.21 --registry-json ~/crimson/ibm-registry.json --allow-fqdn-hostname --allow-overwrite --orphan-initial-daemons --initial-dashboard-password crimson-80-tp
2. Add hosts with labels
3. Deploy Monitors and Managers
4. Configure the required parameters to enable crimson:
   - # ceph config set global 'enable_experimental_unrecoverable_data_corrupting_features' crimson
   - # ceph osd set-allow-crimson --yes-i-really-mean-it
   - # ceph config set mon osd_pool_default_crimson true
5. Set the CPU core affinity at the global OSD level:
   - # ceph config set osd crimson_seastar_num_threads <N>
6. Deploy OSDs, either on all available devices or using a spec file (a sample spec is sketched below)
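For step 6, a minimal sketch of an OSD service spec (the service_id, file name, and the 'osd' placement label are illustrative assumptions, not values taken from this cluster):

  # cat osd_spec.yaml
  service_type: osd
  service_id: crimson_osds
  placement:
    label: osd
  spec:
    data_devices:
      all: true
  # ceph orch apply -i osd_spec.yaml

Alternatively, all available devices can be consumed without a spec file:

  # ceph orch apply osd --all-available-devices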
Actual results:
Crimson OSD deployment is not possible in containerized deployments through cephadm until PR https://github.com/ceph/ceph/pull/57593 is backported to Squid

Expected results:
Crimson OSD deployment with the necessary config parameters set should be possible with the cephadm orchestrator

Additional info:

>>> Build version -

[root@bruuni011 ubuntu]# cephadm shell -- ceph versions
Inferring fsid fe1e8690-603e-11ef-a532-0cc47af96462
Inferring config /var/lib/ceph/fe1e8690-603e-11ef-a532-0cc47af96462/mon.bruuni011/config
Using ceph image with id '7d2a9d161de6' and tag 'latest' created on 2024-08-08 19:51:29 +0000 UTC
cp.stg.icr.io/cp/ibm-ceph/ceph-8-crimson-rhel9@sha256:47d13be71934eb3a6d286333f8818628dce0f1bf43164a7d0b263d9840e5d300
2024-08-23T12:57:54.715+0000 7f70e563e640 -1 WARNING: the following dangerous and experimental features are enabled: crimson
2024-08-23T12:57:54.716+0000 7f70e563e640 -1 WARNING: the following dangerous and experimental features are enabled: crimson
{
    "mon": {
        "ceph version 19.1.0-22.0.crimson.el9cp (e5b7dfedb7d8a66d166eb0f98361f71bdb7905ad) squid (rc)": 5
    },
    "mgr": {
        "ceph version 19.1.0-22.0.crimson.el9cp (e5b7dfedb7d8a66d166eb0f98361f71bdb7905ad) squid (rc)": 5
    },
    "overall": {
        "ceph version 19.1.0-22.0.crimson.el9cp (e5b7dfedb7d8a66d166eb0f98361f71bdb7905ad) squid (rc)": 10
    }
}

>>> crimson_seastar_num_threads is unrecognized

[root@bruuni011 ubuntu]# ceph config get osd crimson_seastar_num_threads
2024-08-22T17:45:36.835+0000 7f64f0141640 -1 WARNING: the following dangerous and experimental features are enabled: crimson
2024-08-22T17:45:36.835+0000 7f64f0141640 -1 WARNING: the following dangerous and experimental features are enabled: crimson
Error ENOENT: unrecognized key 'crimson_seastar_num_threads'

[root@bruuni011 ubuntu]# ceph config get osd crimson_alien_op_num_threads
2024-08-22T17:45:54.723+0000 7f184748b640 -1 WARNING: the following dangerous and experimental features are enabled: crimson
2024-08-22T17:45:54.724+0000 7f184748b640 -1 WARNING: the following dangerous and experimental features are enabled: crimson
6

>>> Config dump

[root@bruuni011 ubuntu]# ceph config dump
2024-08-23T13:08:34.695+0000 7f7644f0b640 -1 WARNING: the following dangerous and experimental features are enabled: crimson
2024-08-23T13:08:34.696+0000 7f7644f0b640 -1 WARNING: the following dangerous and experimental features are enabled: crimson
WHO     MASK  LEVEL     OPTION                                                       VALUE  RO
global        basic     container_image                                              cp.stg.icr.io/cp/ibm-ceph/ceph-8-crimson-rhel9@sha256:47d13be71934eb3a6d286333f8818628dce0f1bf43164a7d0b263d9840e5d300  *
global        advanced  enable_experimental_unrecoverable_data_corrupting_features  crimson
mon           advanced  auth_allow_insecure_global_id_reclaim                       false
mon           advanced  osd_pool_default_crimson                                    true
mon           advanced  public_network                                              172.20.20.0/24  *
mgr           advanced  mgr/cephadm/container_init                                  True  *
mgr           advanced  mgr/cephadm/migration_current                               8  *
mgr           advanced  mgr/dashboard/ALERTMANAGER_API_HOST                         http://bruuni011.back.ceph.redhat.com:9093  *
mgr           advanced  mgr/dashboard/GRAFANA_API_SSL_VERIFY                        false  *
mgr           advanced  mgr/dashboard/GRAFANA_API_URL                               https://bruuni011.back.ceph.redhat.com:3000  *
mgr           advanced  mgr/dashboard/PROMETHEUS_API_HOST                           http://bruuni011.back.ceph.redhat.com:9095  *
mgr           advanced  mgr/dashboard/ssl_server_port                               8443  *
mgr           advanced  mgr/orchestrator/orchestrator                               cephadm
osd           advanced  osd_memory_target_autotune                                  true
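Once a rebuilt Squid-based image containing the backport is available, whether the new options are recognized can be sanity-checked before re-attempting OSD deployment, for example:

  # ceph config help crimson_seastar_num_threads
  # ceph config help crimson_alien_op_num_threads

(On the current build, crimson_seastar_num_threads is expected to be reported as unknown, matching the unrecognized key shown in the ceph config get output above.)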
>>> OSD log snippet

[root@bruuni003 fe1e8690-603e-11ef-a532-0cc47af96462]# tail -25 ceph-osd.3.log
INFO 2024-08-23 12:37:31,125 [shard 0:main] ms - 0x3000827f98c0 client.?(temp_mon_client) 172.20.20.13:0/2760811900@50120 >> mon.? v2:172.20.20.18:3300/0 protocol CONNECTING execute_connecting is aborted at inconsistent CLOSING -- negotiation failure
INFO 2024-08-23 12:37:31,125 [shard 0:main] ms - 0x3000827f9700 client.?(temp_mon_client) 172.20.20.13:0/2760811900@54376 >> mon.? v2:172.20.20.21:3300/0 protocol CONNECTING execute_connecting is aborted at inconsistent CLOSING -- negotiation failure
WARN 2024-08-23 12:37:31,126 [shard 0:main] ms - 0x3000827f9540 client.?(temp_mon_client) 172.20.20.13:0/2760811900@53424 >> mon.2 v2:172.20.20.22:3300/0 UPDATE Policy(lossy=true) from server flags
INFO 2024-08-23 12:37:31,126 [shard 0:main] ms - 0x3000827f9540 client.?(temp_mon_client) 172.20.20.13:0/2760811900@53424 >> mon.2 v2:172.20.20.22:3300/0 connected: gs=3, pgs=10349, cs=0, client_cookie=11050325961552303138, server_cookie=0, io(in_seq=0, is_out_queued=false, has_out_sent=false), new_sid=0, send 3 IOHandler::dispatch_connect()
INFO 2024-08-23 12:37:31,126 [shard 0:main] ms - 0x3000827f9540 client.?(temp_mon_client) 172.20.20.13:0/2760811900@53424 >> mon.2 v2:172.20.20.22:3300/0 do_out_dispatch: stop(delay...) at delay, no out_exit_dispatching
INFO 2024-08-23 12:37:31,126 [shard 0:main] ms - 0x3000827f9540 client.?(temp_mon_client) 172.20.20.13:0/2760811900@53424 >> mon.2 v2:172.20.20.22:3300/0 do_out_dispatch: stop(switched) at switched, no out_exit_dispatching
INFO 2024-08-23 12:37:31,126 [shard 0:main] monc - got monmap 5, mon.bruuni012, is now rank 2
INFO 2024-08-23 12:37:31,126 [shard 0:main] monc - handle_monmap: renewing tickets
INFO 2024-08-23 12:37:31,126 [shard 0:main] monc - renew_rotating_keyring renewing rotating keys (they expired before 1724416621.1261876)
INFO 2024-08-23 12:37:31,126 [shard 0:main] monc - renew_rotating_keyring called too often (last: 1724416651.1257656)
INFO 2024-08-23 12:37:31,126 [shard 0:main] monc - handle_mon_map: renewed tickets
INFO 2024-08-23 12:37:31,126 [shard 0:main] monc - handle_auth_reply [0x3000827f9540 client.?(temp_mon_client) 172.20.20.13:0/2760811900@53424 >> mon.2 v2:172.20.20.22:3300/0] returns auth_reply(proto 2 0 (0) Success) v1: 0
INFO 2024-08-23 12:37:31,126 [shard 0:main] monc - handle_auth_reply
INFO 2024-08-23 12:37:31,126 [shard 0:main] monc - renew_rotating_keyring renewing rotating keys (they expired before 1724416621.1262548)
INFO 2024-08-23 12:37:31,126 [shard 0:main] monc - renew_rotating_keyring called too often (last: 1724416651.1257656)
INFO 2024-08-23 12:37:31,126 [shard 0:main] monc - do_auth_single: [0x3000827f9540 client.?(temp_mon_client) 172.20.20.13:0/2760811900@53424 >> mon.2 v2:172.20.20.22:3300/0] returns auth_reply(proto 2 0 (0) Success) v1: 0
WARN 2024-08-23 12:37:31,126 [shard 0:main] monc - renew_subs - empty
INFO 2024-08-23 12:37:31,126 [shard 0:main] monc - set_mon_vals no callback set
INFO 2024-08-23 12:37:31,127 [shard 0:main] monc - stop
INFO 2024-08-23 12:37:31,127 [shard 0:main] monc - close
INFO 2024-08-23 12:37:31,127 [shard 0:main] ms - 0x3000827f9540 client.?(temp_mon_client) 172.20.20.13:0/2760811900@53424 >> mon.2 v2:172.20.20.22:3300/0 mark_down() at io_stat(io_state=open, in_seq=3, out_seq=3, out_pending_msgs_size=0, out_sent_msgs_size=0, need_ack=0, need_keepalive=0, need_keepalive_ack=0), send 1 notify_mark_down()
INFO 2024-08-23 12:37:31,127 [shard 0:main] ms - 0x3000827f9540 client.?(temp_mon_client) 172.20.20.13:0/2760811900@53424 >> mon.2 v2:172.20.20.22:3300/0 closing: reset no, replace no
INFO 2024-08-23 12:37:31,127 [shard 0:main] ms - 0x3000827f9540 client.?(temp_mon_client) 172.20.20.13:0/2760811900@53424 >> mon.2 v2:172.20.20.22:3300/0 do_in_dispatch(): fault at drop, io_stat(io_state=drop, in_seq=3, out_seq=3, out_pending_msgs_size=0, out_sent_msgs_size=0, need_ack=0, need_keepalive=0, need_keepalive_ack=0) -- read eof
ERROR 2024-08-23 12:37:31,129 [shard 0:main] none - /builddir/build/BUILD/ceph-19.1.0/src/crimson/os/alienstore/alien_store.cc:112 : In function 'virtual seastar::future<> crimson::os::AlienStore::start()', ceph_assert(%s) cpu_cores.has_value()
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 8.0 security, bug fix, and enhancement updates), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2024:10216