Bug 2325928 - OSD deployment fails with osd_memory_target error
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Ceph-Volume
Version: 8.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 8.0
Assignee: Adam King
QA Contact: Sayalee
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2024-11-13 15:34 UTC by Sayalee
Modified: 2024-11-25 09:13 UTC
CC List: 8 users

Fixed In Version: ceph-19.2.0-53.el9cp
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2024-11-25 09:13:40 UTC
Embargoed:


Links:
- GitHub ceph/ceph pull 60727 (open): ceph-volume: fix dmcrypt activation regression (last updated 2024-11-14 07:43:10 UTC)
- Red Hat Issue Tracker RHCEPH-10231 (last updated 2024-11-13 15:35:58 UTC)
- Red Hat Product Errata RHBA-2024:10216 (last updated 2024-11-25 09:13:43 UTC)

Description Sayalee 2024-11-13 15:34:41 UTC
Description of problem:
-----------------------
We have been observing a failure where OSDs do not come up on RHCS 8.0 in a particular scenario, and the cephadm logs show the error below:

2024-11-07T14:45:04.634524+0000 mgr.ceph-saya-upgrade-g0llts-node1-installer.gvdvmd (mgr.14256) 160 : cephadm [WRN] Unable to set osd_memory_target on ceph-saya-upgrade-g0llts-node3 to 559624738: error parsing value: Value '559624738' is below minimum 939524096
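
For context on where these numbers come from: cephadm autotunes osd_memory_target by taking a fraction of the host's total RAM (mgr/cephadm/autotune_memory_target_ratio, default 0.7), subtracting an allowance for the other daemons on the host, and dividing the rest among the OSDs; the per-OSD result is rejected if it falls below the 939524096-byte (896 MiB) minimum cited in the warning. A rough Python sketch of that arithmetic (the per-daemon allowances are simplified here; this is not cephadm's exact code):

# Rough sketch of cephadm's osd_memory_target autotune arithmetic
# (simplified; the real code subtracts per-daemon allowances).

OSD_MEMORY_TARGET_MIN = 939_524_096  # 896 MiB, the minimum cited in the warning

def autotune_target(total_ram_bytes: int, num_osds: int,
                    ratio: float = 0.7, other_daemons_bytes: int = 0) -> int:
    """Split the autotune memory budget among the host's OSDs."""
    budget = int(total_ram_bytes * ratio) - other_daemons_bytes
    return budget // num_osds

# A small host lands below the minimum, e.g. 4 GiB of RAM shared by 4 OSDs:
target = autotune_target(total_ram_bytes=4 * 1024**3, num_osds=4)
print(target, target >= OSD_MEMORY_TARGET_MIN)  # 751619276 False

So a warning like the one above means the host's RAM divided across its OSDs falls under the hard minimum, and the config set is refused rather than applied.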


Version-Release number of selected component (if applicable):
-------------------------------------------------------------
RHCS 8.0
ceph version 19.2.0-52.el9cp (198c75de92aec6de59bc20028c0453bf3e4a0fa7) squid (stable)


How reproducible:
------------------
Always


Steps to Reproduce:
------------------
1. Run the automation suite for the RHCS 8.0 build: https://github.com/red-hat-storage/cephci/blob/master/suites/squid/cephadm/tier1-container-cli-args.yaml


Actual results:
------------------
OSD deployment is stuck and fails with the following error message:
2024-11-07T14:45:04.634524+0000 mgr.ceph-saya-upgrade-g0llts-node1-installer.gvdvmd (mgr.14256) 160 : cephadm [WRN] Unable to set osd_memory_target on ceph-saya-upgrade-g0llts-node3 to 559624738: error parsing value: Value '559624738' is below minimum 939524096


Expected results:
------------------
OSDs should be deployed successfully, all tests in the suite should pass, and no failures should be observed.


Additional info:
------------------

To provide some context:
This regression suite was initially passing on RHCS 8.0 builds (up to ceph-2:19.1.0-22.el9cp), but starting with build ceph-2:19.1.0-41.el9cp it fails.

Now, as per the CI logs, the reason for the failure is the Prometheus service; please check http://magna002.ceph.redhat.com/cephci-jenkins/results/openstack/IBM/8.0/rhel-9/Regressi[…]r1-container-cli-args/Service_deployment_with_spec_0.log

But QE tried the scenarios below:

(A)
Run the automation suite with Prometheus service deployment included and OSD deployment skipped, then deploy the OSD services afterwards.
In this case, Prometheus deployment succeeds and OSD deployment fails.

(B)
Run the automation suite with OSD deployment included but Prometheus service deployment skipped.
In this case as well, OSD deployment fails.
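
Since the OSDs fail the same way in both scenarios, one quick check is to look at what osd_memory_target (if any) actually got stored, and optionally take autotuning out of the picture for the affected host while debugging. A hedged sketch using standard ceph CLI commands from Python (the host name is taken from the log above; the wrapper itself is illustrative):

import subprocess

def ceph(*args: str) -> str:
    """Run a ceph CLI command and return its stdout."""
    return subprocess.run(["ceph", *args], check=True,
                          capture_output=True, text=True).stdout.strip()

# Show the osd_memory_target currently stored in the config database.
print(ceph("config", "get", "osd", "osd_memory_target"))

# Possible mitigation while debugging: the _no_autotune_memory host label
# tells cephadm to skip memory autotuning for that host entirely.
ceph("orch", "host", "label", "add",
     "ceph-saya-upgrade-g0llts-node3", "_no_autotune_memory")

This only silences the autotune warning, of course; it does not address the passphrase prompts described below.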


Apart from the above cephadm osd_memory_target error, we are also seeing the messages below on the cluster:

"""
Broadcast message from root@ceph-saraut-tfa-blddyl-node3 (Thu 2024-11-07 10:57:35 UTC):

Password entry required for 'Please enter passphrase for disk ceph--4717011e--01e1--4b94--b5c2--c1f177fc3426-osd--block--4e4319fe--5533--4e96--9220--93e170495d39 (V0sLe0-wY7z-nQlV-2JAj-wonr-17YN-eyqRAz):' (PID 205).
Please enter password with the systemd-tty-ask-password-agent tool.


Broadcast message from root@ceph-saraut-tfa-blddyl-node3 (Thu 2024-11-07 10:57:35 UTC):

Password entry required for 'Please enter passphrase for disk ceph--9971ceb3--9dd8--45b2--b371--3034aba7c3de-osd--block--4bd61e2b--7741--4012--bbf8--dde1bccd0bb9 (Vvn1FF-khGg-FytF-QnnV-Klu5-etgZ-KcBuyc):' (PID 205).
Please enter password with the systemd-tty-ask-password-agent tool.
"""

This is similar to what I observed in https://bugzilla.redhat.com/show_bug.cgi?id=2304317.
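
For reference, on a working cluster ceph-volume activates dmcrypt OSDs without any interactive prompt: it fetches the LUKS passphrase from the monitor config-key store (ceph-volume keeps dmcrypt keys under dm-crypt/osd/<osd_fsid>/luks) and feeds it to cryptsetup on stdin, which appears to be the behavior the linked pull request restores. A rough Python sketch of that flow (the device paths below are reconstructed from the first broadcast message purely for illustration, not taken from the fix itself):

import subprocess

def activate_dmcrypt_osd(osd_fsid: str, lv_path: str, mapper_name: str) -> None:
    """Open a dmcrypt OSD the way ceph-volume does: key fetched from the
    monitor config-key store and passed to cryptsetup on stdin, no TTY."""
    key = subprocess.run(
        ["ceph", "config-key", "get", f"dm-crypt/osd/{osd_fsid}/luks"],
        check=True, capture_output=True, text=True).stdout
    subprocess.run(
        ["cryptsetup", "--key-file", "-", "luksOpen", lv_path, mapper_name],
        check=True, input=key, text=True)

# Illustrative values decoded from the first broadcast message above
# (real paths come from `ceph-volume lvm list`):
activate_dmcrypt_osd(
    osd_fsid="4e4319fe-5533-4e96-9220-93e170495d39",
    lv_path="/dev/ceph-4717011e-01e1-4b94-b5c2-c1f177fc3426/"
            "osd-block-4e4319fe-5533-4e96-9220-93e170495d39",
    mapper_name="4e4319fe-5533-4e96-9220-93e170495d39")

If the passphrase is instead requested through systemd-tty-ask-password-agent, as in the broadcast messages above, this non-interactive path was not taken, which matches the regression the pull request describes.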


Moreover, this suite passes successfully on RHCS 7.1 builds:
http://magna002.ceph.redhat.com/cephci-jenkins/results/openstack/RH/7.1/rhel-9/Regression/18.2.1-265/dmfg/181/tier1-container-cli-args-rh

Comment 12 errata-xmlrpc 2024-11-25 09:13:40 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 8.0 security, bug fix, and enhancement updates), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2024:10216

