Description of problem:
-----------------------
We have been observing a failure where OSDs do not come up on RHCS 8.0 in a particular scenario, and the cephadm logs show the following error:

2024-11-07T14:45:04.634524+0000 mgr.ceph-saya-upgrade-g0llts-node1-installer.gvdvmd (mgr.14256) 160 : cephadm [WRN] Unable to set osd_memory_target on ceph-saya-upgrade-g0llts-node3 to 559624738: error parsing value: Value '559624738' is below minimum 939524096

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
RHCS 8.0
ceph version 19.2.0-52.el9cp (198c75de92aec6de59bc20028c0453bf3e4a0fa7) squid (stable)

How reproducible:
-----------------
Always

Steps to Reproduce:
-------------------
1. Run the automation suite for the RHCS 8.0 build - https://github.com/red-hat-storage/cephci/blob/master/suites/squid/cephadm/tier1-container-cli-args.yaml

Actual results:
---------------
OSD deployment is stuck and fails with the error message:

2024-11-07T14:45:04.634524+0000 mgr.ceph-saya-upgrade-g0llts-node1-installer.gvdvmd (mgr.14256) 160 : cephadm [WRN] Unable to set osd_memory_target on ceph-saya-upgrade-g0llts-node3 to 559624738: error parsing value: Value '559624738' is below minimum 939524096

Expected results:
-----------------
OSDs should be deployed successfully, all tests in the suite should pass, and no failures should be observed.

Additional info:
----------------
To provide some context: this regression suite was initially passing on RHCS 8.0 builds (up to ceph-2:19.1.0-22.el9cp), but since build ceph-2:19.1.0-41.el9cp it has been failing. According to the CI logs, the reason for the failure is the Prometheus service; please check http://magna002.ceph.redhat.com/cephci-jenkins/results/openstack/IBM/8.0/rhel-9/Regressi[…]r1-container-cli-args/Service_deployment_with_spec_0.log

However, QE tried the scenarios below:

(A) Run the automation suite, add the Prometheus service deployment, and skip OSD deployment; later deploy the OSD services.
    In this case, the Prometheus deployment succeeds and the OSD deployment fails.

(B) Run the automation suite, add OSD deployment, but skip the Prometheus service deployment.
    In this case as well, the OSD deployment fails.

Apart from the cephadm osd_memory_target error above, we are also seeing the following messages on the cluster:

"""
Broadcast message from root@ceph-saraut-tfa-blddyl-node3 (Thu 2024-11-07 10:57:35 UTC):

Password entry required for 'Please enter passphrase for disk ceph--4717011e--01e1--4b94--b5c2--c1f177fc3426-osd--block--4e4319fe--5533--4e96--9220--93e170495d39 (V0sLe0-wY7z-nQlV-2JAj-wonr-17YN-eyqRAz):' (PID 205).
Please enter password with the systemd-tty-ask-password-agent tool.

Broadcast message from root@ceph-saraut-tfa-blddyl-node3 (Thu 2024-11-07 10:57:35 UTC):

Password entry required for 'Please enter passphrase for disk ceph--9971ceb3--9dd8--45b2--b371--3034aba7c3de-osd--block--4bd61e2b--7741--4012--bbf8--dde1bccd0bb9 (Vvn1FF-khGg-FytF-QnnV-Klu5-etgZ-KcBuyc):' (PID 205).
Please enter password with the systemd-tty-ask-password-agent tool.
"""

This is similar to what I observed in https://bugzilla.redhat.com/show_bug.cgi?id=2304317

Moreover, this suite passes successfully on RHCS 7.1 builds - http://magna002.ceph.redhat.com/cephci-jenkins/results/openstack/RH/7.1/rhel-9/Regression/18.2.1-265/dmfg/181/tier1-container-cli-args-rh
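For reference, a possible diagnostic step (a sketch only, not verified against this cluster): the warning indicates that cephadm's memory autotuning computed a per-OSD target (559624738 bytes) below the osd_memory_target minimum of 939524096 bytes (896 MiB). Disabling autotuning and setting an explicit target above the minimum should allow the OSD config to be applied; the 4 GiB value below is just an example and should be chosen based on host RAM:

"""
# Stop cephadm from autotuning osd_memory_target on the affected hosts
ceph config set osd osd_memory_target_autotune false

# Set an explicit per-OSD memory target above the 939524096-byte minimum
# (example value: 4 GiB; adjust per available host memory)
ceph config set osd osd_memory_target 4294967296
"""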
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat Ceph Storage 8.0 security, bug fix, and enhancement updates), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2024:10216