Bug 1457231
| Summary: | osds are down after node restart | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat Ceph Storage | Reporter: | Martin Kudlej <mkudlej> |
| Component: | Ceph-Ansible | Assignee: | Sébastien Han <shan> |
| Status: | CLOSED CANTFIX | QA Contact: | ceph-qe-bugs <ceph-qe-bugs> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 2.3 | CC: | adeza, aschoen, ceph-eng-bugs, dzafman, gmeno, kchai, mkudlej, nthomas, sankarshan, seb, uboppana |
| Target Milestone: | rc | | |
| Target Release: | 2.5 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-08-02 15:31:42 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description

Martin Kudlej 2017-05-31 11:40:15 UTC
I need more info on this:

* can you check if the systemd unit is enabled?
* the title diverges from the description: are all of the OSDs down, or only some of them?

Thanks!

I do not have the cluster installed right now, so my answer to the first question is only a guess. I think that if Ceph is installed by ceph-ansible, all of the Ceph systemd units should be enabled. Also, because some OSDs were up after the restart, I assume the units are enabled. I am sure about the second answer: as I wrote in the description, many OSDs were down after the restart, and after the next restart a different set of OSDs was down.

ceph-disk is responsible for enabling the OSD unit files, so they 'should' be enabled.

Ok thanks, if you don't have the setup anymore that's going to be difficult to debug... :(

ceph-disk cannot guarantee that all OSDs will be up after a system reboot. This has nothing to do with systemd units (in this case): even if you manually enable all of the OSD units, they still might not come up correctly. It is also hard to reproduce; you might reboot a node and all OSDs might come up. See https://bugzilla.redhat.com/show_bug.cgi?id=1439210. From that ticket:

> We just have no idea what's going on or why at this point.

And:

> Right now I have no better theory than "udev events are not fired as they should".

Basically: this is a known issue with ceph-disk, it is not generally related to whether the OSD units are enabled, and there is no robust fix despite the numerous attempts at handling the udev/systemd/ceph-disk interaction when a system boots. This is *not* an issue with ceph-ansible.

Should we close this then?

Closing as 'Can't Fix'. It should really be a 'known issue', though.
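For anyone hitting this after a node reboot, here is a minimal shell sketch of the checks discussed above and the usual manual workaround, assuming a ceph-disk based deployment; the OSD id `0` is only a placeholder for whichever OSDs the node hosts:

```bash
# Check whether the OSD systemd units are enabled and which ones failed to start
# (replace 0 with each OSD id hosted on this node).
systemctl is-enabled ceph-osd@0.service
systemctl list-units 'ceph-osd@*' --all

# Compare against the cluster's view of which OSDs are up/down.
ceph osd tree

# If OSDs stayed down because udev did not trigger activation at boot,
# re-running ceph-disk activation by hand usually brings them back up.
# This is a workaround only; it does not fix the underlying udev timing issue.
ceph-disk list
ceph-disk activate-all
```

Note that this only re-activates the OSDs after the fact; the root cause (udev events not being fired as they should at boot) is tracked in bug 1439210.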