Bug 1901897 - osd containers fail to start when sr0 device is detected but no media is inserted
Summary: osd containers fail to start when sr0 device is detected but no media is inserted
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat Storage
Component: Ceph-Volume
Version: 3.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.3z7
Assignee: Rishabh Dave
QA Contact: Ameena Suhani S H
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-11-26 11:53 UTC by Takashi Kajinami
Modified: 2024-06-13 23:31 UTC
CC List: 8 users

Fixed In Version: RHEL: ceph-12.2.12-136.el7cp Ubuntu: ceph_12.2.12-117redhat1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-05-06 18:32:04 UTC
Embargoed:




Links:
Red Hat Product Errata RHSA-2021:1518 (last updated 2021-05-06 18:32:28 UTC)

Description Takashi Kajinami 2020-11-26 11:53:21 UTC
Description of problem:

We observed that osd containers keep restarting and never start properly.

In /var/log/messages the following messages are recorded continuously;
systemd tries to restart the osd container every time it goes down:
~~~
Nov 22 03:42:02 ceph-0 ceph-osd-run.sh[573241]: No data partition found for OSD
Nov 22 03:42:02 ceph-0 systemd[1]: ceph-osd: Main process exited, code=exited, status=1/FAILURE
Nov 22 03:42:02 ceph-0 systemd[1]: ceph-osd: Failed with result 'exit-code'.
~~~
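For reference, a minimal sketch for inspecting such a restart loop on the node; the unit name ceph-osd@0 is an assumption here, substitute the real OSD id:
~~~
# Show the unit cycling through auto-restart (ceph-osd@0 is hypothetical).
systemctl status ceph-osd@0
# Review the recent failure messages for the unit.
journalctl -u ceph-osd@0 --since today | tail -n 20
~~~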

It turned out that the node has an sr0 device detected with no media inserted.
This causes unexpected output on stdout:
~~~
ceph-volume inventory --format json 2>/dev/null
 stderr: error: /dev/sr0: No medium found
[{"available": false, "rejected_reasons": ["Used by ceph-disk"], ...
~~~
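As a minimal illustration of the failure mode (an assumption about how the consumer breaks, not the shipped fix), the extra "stderr: ..." line makes the captured stdout invalid JSON:
~~~
# Capture stdout only; the stray "stderr: ..." prefix line still ends up
# there, so parsing the captured output as JSON fails.
out="$(ceph-volume inventory --format json 2>/dev/null)"
printf '%s\n' "$out" | python -c 'import sys, json; json.loads(sys.stdin.read())' \
    || echo "stdout is not valid JSON"
~~~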
That polluted stdout then causes a failure in the following logic in osd_volume_activate.sh:

https://github.com/ceph/ceph-container/blob/stable-3.2/src/daemon/osd_scenarios/osd_volume_activate.sh#L6
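For illustration only, a hedged workaround sketch, not the fix that shipped (the actual fix landed in ceph-volume itself, per the Fixed In Version above): drop everything before the first JSON line so stray messages cannot reach the parser:
~~~
# Hypothetical guard: keep only the lines starting from the first one that
# begins with '[' or '{', i.e. the JSON payload.
CEPH_VOLUME_JSON="$(ceph-volume inventory --format json 2>/dev/null | sed -n '/^[[{]/,$p')"
~~~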

This issue was fixed in RHCS 4.0 by the fix tracked in the following bugzilla [1],
and we need the same fix backported to RHCS 3.
 [1] https://bugzilla.redhat.com/show_bug.cgi?id=1738576


Version-Release number of selected component (if applicable):

The issue is observed in deployments that use the latest image:
 name: rhceph/rhceph-3-rhel7:3-48  48ca7dfe8752

How reproducible:
Always

Steps to Reproduce:
1. Prepare a node where sr0 is detected but no media is inserted (see the verification sketch after these steps)
2. Deploy RHCS 3.3 with osd_scenario: "lvm"
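A hedged sketch for verifying the precondition in step 1; exact tool behaviour varies, blkid is just one common way to surface the "No medium found" condition:
~~~
# The device node is present even without a medium.
lsblk /dev/sr0
# blkid typically exits non-zero and finds no metadata when the tray is empty.
blkid /dev/sr0 || true
~~~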

Actual results:
osd containers keep restarting

Expected results:
osd containers start without any error

Additional info:

Comment 4 Rishabh Dave 2021-02-24 17:42:36 UTC
Backports just got merged into the downstream branch - https://gitlab.cee.redhat.com/ceph/ceph/-/merge_requests/47

Comment 10 errata-xmlrpc 2021-05-06 18:32:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat Ceph Storage 3.3 Security and Bug Fix Update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:1518

