Bug 2325928 - OSD deployment fails with osd_memory_target error
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Ceph-Volume
Version: 8.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 8.0
Assignee: Adam King
QA Contact: Sayalee
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2024-11-13 15:34 UTC by Sayalee
Modified: 2024-11-25 09:13 UTC
CC List: 8 users

Fixed In Version: ceph-19.2.0-53.el9cp
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2024-11-25 09:13:40 UTC
Embargoed:


Links:
- GitHub ceph/ceph pull 60727 (open): ceph-volume: fix dmcrypt activation regression (last updated 2024-11-14 07:43:10 UTC)
- Red Hat Issue Tracker RHCEPH-10231 (last updated 2024-11-13 15:35:58 UTC)
- Red Hat Product Errata RHBA-2024:10216 (last updated 2024-11-25 09:13:43 UTC)

Description Sayalee 2024-11-13 15:34:41 UTC
Description of problem:
-----------------------
We have been observing a failure where OSDs do not come up on RHCS 8.0 in a particular scenario, and the cephadm logs show the error below:

2024-11-07T14:45:04.634524+0000 mgr.ceph-saya-upgrade-g0llts-node1-installer.gvdvmd (mgr.14256) 160 : cephadm [WRN] Unable to set osd_memory_target on ceph-saya-upgrade-g0llts-node3 to 559624738: error parsing value: Value '559624738' is below minimum 939524096
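
For context on where these numbers come from: cephadm autotunes osd_memory_target by taking a fraction of the host's total RAM (mgr/cephadm/autotune_memory_target_ratio, default 0.7), subtracting an allowance for the other daemons on the host, and dividing the rest among the OSDs; the per-OSD result is rejected if it falls below the 939524096-byte (896 MiB) minimum cited in the warning. A rough Python sketch of that arithmetic (the per-daemon allowances are simplified here; this is not cephadm's exact code):

# Rough sketch of cephadm's osd_memory_target autotune arithmetic
# (simplified; the real code subtracts per-daemon allowances).

OSD_MEMORY_TARGET_MIN = 939_524_096  # 896 MiB, the minimum cited in the warning

def autotune_target(total_ram_bytes: int, num_osds: int,
                    ratio: float = 0.7, other_daemons_bytes: int = 0) -> int:
    """Split the autotune memory budget among the host's OSDs."""
    budget = int(total_ram_bytes * ratio) - other_daemons_bytes
    return budget // num_osds

# A small host lands below the minimum, e.g. 4 GiB of RAM shared by 4 OSDs:
target = autotune_target(total_ram_bytes=4 * 1024**3, num_osds=4)
print(target, target >= OSD_MEMORY_TARGET_MIN)  # 751619276 False

So a warning like the one above means the host's RAM divided across its OSDs falls under the hard minimum, and the config set is refused rather than applied.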


Version-Release number of selected component (if applicable):
-------------------------------------------------------------
RHCS 8.0
ceph version 19.2.0-52.el9cp (198c75de92aec6de59bc20028c0453bf3e4a0fa7) squid (stable)


How reproducible:
------------------
Always


Steps to Reproduce:
------------------
1. Run the automation suite for the RHCS 8.0 build: https://github.com/red-hat-storage/cephci/blob/master/suites/squid/cephadm/tier1-container-cli-args.yaml


Actual results:
------------------
OSD deployment is stuck and fails with the following error message:
2024-11-07T14:45:04.634524+0000 mgr.ceph-saya-upgrade-g0llts-node1-installer.gvdvmd (mgr.14256) 160 : cephadm [WRN] Unable to set osd_memory_target on ceph-saya-upgrade-g0llts-node3 to 559624738: error parsing value: Value '559624738' is below minimum 939524096


Expected results:
------------------
OSDs should be deployed successfully, all tests in the suite should pass, and no failures should be observed.


Additional info:
------------------

To provide some context:
This regression suite was initially passing on RHCS 8.0 builds (up to ceph-2:19.1.0-22.el9cp), but starting with build ceph-2:19.1.0-41.el9cp it fails.

Now, as per the CI logs, the reason for the failure is the Prometheus service; please check http://magna002.ceph.redhat.com/cephci-jenkins/results/openstack/IBM/8.0/rhel-9/Regressi[…]r1-container-cli-args/Service_deployment_with_spec_0.log

But QE tried the scenarios below:

(A)
Run the automation suite with Prometheus service deployment included and OSD deployment skipped, then deploy the OSD services afterwards.
In this case, Prometheus deployment succeeds and OSD deployment fails.

(B)
Run the automation suite with OSD deployment included but Prometheus service deployment skipped.
In this case as well, OSD deployment fails.
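
Since the OSDs fail the same way in both scenarios, one quick check is to look at what osd_memory_target (if any) actually got stored, and optionally take autotuning out of the picture for the affected host while debugging. A hedged sketch using standard ceph CLI commands from Python (the host name is taken from the log above; the wrapper itself is illustrative):

import subprocess

def ceph(*args: str) -> str:
    """Run a ceph CLI command and return its stdout."""
    return subprocess.run(["ceph", *args], check=True,
                          capture_output=True, text=True).stdout.strip()

# Show the osd_memory_target currently stored in the config database.
print(ceph("config", "get", "osd", "osd_memory_target"))

# Possible mitigation while debugging: the _no_autotune_memory host label
# tells cephadm to skip memory autotuning for that host entirely.
ceph("orch", "host", "label", "add",
     "ceph-saya-upgrade-g0llts-node3", "_no_autotune_memory")

This only silences the autotune warning, of course; it does not address the passphrase prompts described below.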


Apart from the above cephadm osd_memory_target error, we are also seeing the messages below on the cluster:

"""
Broadcast message from root@ceph-saraut-tfa-blddyl-node3 (Thu 2024-11-07 10:57:35 UTC):

Password entry required for 'Please enter passphrase for disk ceph--4717011e--01e1--4b94--b5c2--c1f177fc3426-osd--block--4e4319fe--5533--4e96--9220--93e170495d39 (V0sLe0-wY7z-nQlV-2JAj-wonr-17YN-eyqRAz):' (PID 205).
Please enter password with the systemd-tty-ask-password-agent tool.


Broadcast message from root@ceph-saraut-tfa-blddyl-node3 (Thu 2024-11-07 10:57:35 UTC):

Password entry required for 'Please enter passphrase for disk ceph--9971ceb3--9dd8--45b2--b371--3034aba7c3de-osd--block--4bd61e2b--7741--4012--bbf8--dde1bccd0bb9 (Vvn1FF-khGg-FytF-QnnV-Klu5-etgZ-KcBuyc):' (PID 205).
Please enter password with the systemd-tty-ask-password-agent tool.
"""

This is similar to what I observed in https://bugzilla.redhat.com/show_bug.cgi?id=2304317.
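
For reference, on a working cluster ceph-volume activates dmcrypt OSDs without any interactive prompt: it fetches the LUKS passphrase from the monitor config-key store (ceph-volume keeps dmcrypt keys under dm-crypt/osd/<osd_fsid>/luks) and feeds it to cryptsetup on stdin, which appears to be the behavior the linked pull request restores. A rough Python sketch of that flow (the device paths below are reconstructed from the first broadcast message purely for illustration, not taken from the fix itself):

import subprocess

def activate_dmcrypt_osd(osd_fsid: str, lv_path: str, mapper_name: str) -> None:
    """Open a dmcrypt OSD the way ceph-volume does: key fetched from the
    monitor config-key store and passed to cryptsetup on stdin, no TTY."""
    key = subprocess.run(
        ["ceph", "config-key", "get", f"dm-crypt/osd/{osd_fsid}/luks"],
        check=True, capture_output=True, text=True).stdout
    subprocess.run(
        ["cryptsetup", "--key-file", "-", "luksOpen", lv_path, mapper_name],
        check=True, input=key, text=True)

# Illustrative values decoded from the first broadcast message above
# (real paths come from `ceph-volume lvm list`):
activate_dmcrypt_osd(
    osd_fsid="4e4319fe-5533-4e96-9220-93e170495d39",
    lv_path="/dev/ceph-4717011e-01e1-4b94-b5c2-c1f177fc3426/"
            "osd-block-4e4319fe-5533-4e96-9220-93e170495d39",
    mapper_name="4e4319fe-5533-4e96-9220-93e170495d39")

If the passphrase is instead requested through systemd-tty-ask-password-agent, as in the broadcast messages above, this non-interactive path was not taken, which matches the regression the pull request describes.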


Moreover, this suite passes successfully on RHCS 7.1 builds:
http://magna002.ceph.redhat.com/cephci-jenkins/results/openstack/RH/7.1/rhel-9/Regression/18.2.1-265/dmfg/181/tier1-container-cli-args-rh

Comment 12 errata-xmlrpc 2024-11-25 09:13:40 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Red Hat Ceph Storage 8.0 security, bug fix, and enhancement updates), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2024:10216

