Description of problem:

We observed that OSD containers keep restarting and do not start properly. In /var/log/messages, the following messages are logged continuously, and systemd appears to restart the OSD container every time it goes down.

~~~
Nov 22 03:42:02 ceph-0 ceph-osd-run.sh[573241]: No data partition found for OSD
Nov 22 03:42:02 ceph-0 systemd[1]: ceph-osd: Main process exited, code=exited, status=1/FAILURE
Nov 22 03:42:02 ceph-0 systemd[1]: ceph-osd: Failed with result 'exit-code'.
~~~

It turned out that the node has sr0 detected with no media inserted. This causes an unexpected line to be written to stdout ahead of the JSON

~~~
ceph-volume inventory --format json 2>/dev/null
stderr: error: /dev/sr0: No medium found
[{"available": false, "rejected_reasons": ["Used by ceph-disk"], ...
~~~

and causes a failure in the following logic in osd_volume_activate.sh.

https://github.com/ceph/ceph-container/blob/stable-3.2/src/daemon/osd_scenarios/osd_volume_activate.sh#L6

This issue was fixed in OCS4.0 in the following bugzilla [1]

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1738576

and we need the same fix backported to OCS3.

Version-Release number of selected component (if applicable):

The issue is observed in a deployment that uses the latest image:
rhceph/rhceph-3-rhel7:3-48 48ca7dfe8752

How reproducible:

Always

Steps to Reproduce:
1. Prepare a node with sr0 detected but no media inserted
2. Deploy OCS3.3 with osd_scenario: "lvm"

Actual results:

OSD containers keep restarting

Expected results:

OSD containers start without any error

Additional info:
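For illustration only, the failure mode described above can be sketched in shell: a non-JSON "stderr: ..." line arrives on stdout before the JSON, so anything that parses the whole output as JSON fails. The sketch below is an assumption about one way a caller could tolerate this, not necessarily what the fix in [1] does. The function names extract_json and simulated_inventory are hypothetical, and the sample output is a shortened, made-up stand-in modeled on the output quoted in this report (the real output is truncated there).

```shell
#!/bin/sh
# Hypothetical helper: keep only lines that begin like a JSON document
# ('[' or '{') and drop anything else, such as the stray
# "stderr: error: /dev/sr0: No medium found" line.
extract_json() {
    grep -E '^[[{]'
}

# Simulated stand-in for `ceph-volume inventory --format json 2>/dev/null`
# on a node with an empty sr0 drive. The JSON here is a simplified,
# made-up sample, not real inventory output.
simulated_inventory() {
    printf '%s\n' 'stderr: error: /dev/sr0: No medium found'
    printf '%s\n' '[{"available": false, "rejected_reasons": ["Used by ceph-disk"]}]'
}

# Only the JSON line survives the filter and can be handed to a JSON parser.
simulated_inventory | extract_json
```

This only shows that the extra line is the problem. Filtering by the first character is fragile if the JSON is ever pretty-printed across several lines, so fixing the tool so that it does not print the message to stdout is the more robust approach.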
Backports just got merged into the downstream branch - https://gitlab.cee.redhat.com/ceph/ceph/-/merge_requests/47
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat Ceph Storage 3.3 Security and Bug Fix Update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:1518