Bug 1901897
| Summary: | osd containers fail to start when sr0 device is detected but no media is inserted | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | Takashi Kajinami <tkajinam> |
| Component: | Ceph-Volume | Assignee: | Rishabh Dave <ridave> |
| Status: | CLOSED ERRATA | QA Contact: | Ameena Suhani S H <amsyedha> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.3 | CC: | aschoen, ceph-eng-bugs, ceph-qe-bugs, gmeno, mmuench, ridave, tserlin, vashastr |
| Target Milestone: | --- | | |
| Target Release: | 3.3z7 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | RHEL: ceph-12.2.12-136.el7cp Ubuntu: ceph_12.2.12-117redhat1 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-05-06 18:32:04 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Backports just got merged into the downstream branch: https://gitlab.cee.redhat.com/ceph/ceph/-/merge_requests/47

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: Red Hat Ceph Storage 3.3 Security and Bug Fix Update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:1518
Description of problem:

We observed that the osd containers keep restarting and never start properly. In /var/log/messages the following messages are continuously recorded, and it seems systemd tries to restart the osd container every time it goes down.

~~~
Nov 22 03:42:02 ceph-0 ceph-osd-run.sh[573241]: No data partition found for OSD
Nov 22 03:42:02 ceph-0 systemd[1]: ceph-osd: Main process exited, code=exited, status=1/FAILURE
Nov 22 03:42:02 ceph-0 systemd[1]: ceph-osd: Failed with result 'exit-code'.
~~~

It turned out that the node has an sr0 device detected but no media inserted. This causes unexpected output on stdout:

~~~
ceph-volume inventory --format json 2>/dev/null
 stderr: error: /dev/sr0: No medium found
[{"available": false, "rejected_reasons": ["Used by ceph-disk"], ...
~~~

and causes a failure in the following logic in osd_volume_activate.sh:

https://github.com/ceph/ceph-container/blob/stable-3.2/src/daemon/osd_scenarios/osd_volume_activate.sh#L6

This issue was fixed in OCS 4.0 in the following bugzilla [1], and we need the same fix backported to OCS 3.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1738576

Version-Release number of selected component (if applicable):
The issue is observed in a deployment which uses the latest image:
rhceph/rhceph-3-rhel7:3-48 (48ca7dfe8752)

How reproducible:
Always

Steps to Reproduce:
1. Prepare a node with sr0 detected but no media inserted
2. Deploy OCS 3.3 with osd_scenario: "lvm"

Actual results:
osd containers keep restarting

Expected results:
osd containers start without any error

Additional info:
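To make the failure mode concrete, here is a minimal bash sketch, for illustration only: it is not the logic from osd_volume_activate.sh linked above and not the fix that shipped. The function name, the sed filter, and the use of jq are assumptions introduced here to show how the stray "stderr:" line corrupts the JSON that a wrapper script tries to parse, and one defensive way to strip it before parsing.

~~~
#!/bin/bash
# Illustrative sketch only -- NOT the code from osd_volume_activate.sh and
# NOT the shipped fix. Assumes jq is installed and that each entry in the
# `ceph-volume inventory` JSON report carries a "path" field.

list_inventory_paths() {
  # On an affected node the captured stdout looks like:
  #   stderr: error: /dev/sr0: No medium found
  #   [{"available": false, "rejected_reasons": ["Used by ceph-disk"], ...}]
  # The "stderr:" line is echoed by ceph-volume itself, so "2>/dev/null"
  # does not remove it, and a JSON parser fed the raw text fails.
  local raw
  raw=$(ceph-volume inventory --format json 2>/dev/null)

  # Defensive workaround: keep only the text starting from the line that
  # opens the JSON array before handing it to jq.
  echo "${raw}" | sed -n '/^\[/,$p' | jq -r '.[].path'
}

list_inventory_paths
~~~

The actual downstream fix tracked by this bug landed in the ceph-volume component itself (see the Fixed In Version field above); the snippet is only meant to illustrate the kind of parsing failure that leads to the "No data partition found for OSD" error in the container logs.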