| Summary: | rhel-osp-director: 9.0 After minor update (includes rhel7.2->rhel7.3 switch) + reboot of overcloud nodes, ceph OSDs are down. | |||
|---|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Alexander Chuzhoy <sasha> | |
| Component: | documentation | Assignee: | RHOS Documentation Team <rhos-docs> | |
| Status: | CLOSED DUPLICATE | QA Contact: | RHOS Documentation Team <rhos-docs> | |
| Severity: | medium | Docs Contact: | Derek <dcadzow> | |
| Priority: | medium | |||
| Version: | 9.0 (Mitaka) | CC: | dbecker, ddomingo, gfidente, jcoufal, jefbrown, johfulto, jomurphy, mandreou, mburns, morazi, ohochman, rhel-osp-director-maint, sasha, srevivo, tvignaud | |
| Target Milestone: | async | Keywords: | Documentation | |
| Target Release: | 10.0 (Newton) | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | ||
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 1393474 (view as bug list) | Environment: | ||
| Last Closed: | 2017-07-12 00:31:57 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Bug Depends On: | ||||
| Bug Blocks: | 1393474 | |||
Description
Alexander Chuzhoy
2016-10-21 04:09:22 UTC
Reproduced.

Hi Alexander, is 'sudo chkconfig --list ceph' reporting ceph as enabled on boot on the Ceph storage nodes?

Hi Giulio,
yes - it's enabled.
[stack@undercloud72 ~]$ ssh heat-admin.0.8 "sudo chkconfig --list ceph"
Note: This output shows SysV services only and does not include native
systemd services. SysV configuration data might be overridden by native
systemd configuration.
If you want to list systemd services use 'systemctl list-unit-files'.
To see services enabled on particular target use
'systemctl list-dependencies [target]'.
ceph 0:off 1:off 2:on 3:on 4:on 5:on 6:off
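As the note in the output suggests, the same check can be cross-verified through systemd. This is only a sketch: it assumes the node may also carry ceph-osd@<id> units in addition to the SysV 'ceph' script, which depends on how the OSDs were packaged and deployed.

sudo systemctl list-unit-files | grep -i ceph     # anything ceph-related known to systemd, and whether it is enabled
sudo systemctl list-units 'ceph-osd@*'            # OSD instances currently loaded/active on this node (if managed as systemd units)
sudo systemctl is-enabled ceph                    # the SysV 'ceph' script as seen through the systemd compatibility layer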
From my tests, rebooting a cephstorage node after it has been upgraded to RHEL 7.3 is not an issue. The ceph-osd is started on boot and it re-joins the cluster as long as the Ceph monitors remain available. Upon restart, ceph-osd will try to reach one of the monitors for 1 minute, after which, if it could not, it will terminate itself. My understanding is that if the controller and cephstorage nodes are rebooted at roughly the same time, it is possible that none of the Ceph monitors is available when the Ceph OSDs attempt to start, causing them to terminate.

Alexander, can you confirm that by rebooting only the cephstorage nodes all the OSDs are brought back up? I will also check whether it is possible and feasible to increase the wait time of the OSDs.

Might be Documentation only - reboot in a particular order (following the same process as the update steps). Sasha will verify and, if it works, please send to doc_text.

This could be hit if all nodes went down at the same time, in which case, if the Ceph OSDs start before the MONs, they will terminate themselves after a 1 minute timeout. It should be sufficient to manually start the ceph-osd systemd units after the MONs are available. For example, to restart the osd.0 instance, log in on the node hosting osd.0 and run the following as root:

systemctl restart ceph-osd@0

As an alternative, the cephstorage nodes can be rebooted after the MONs are available.

(In reply to Jaromir Coufal from comment #10)
> Might be Documentation only - reboot in a particular order (following the
> same process as the update steps). Sasha will verify and, if it works,
> please send to doc_text.

(In reply to Giulio Fidente from comment #12)
According to comment #12, in case of a power outage we can manually start the ceph-osd systemd units. This issue is going to be documented. I'm removing the blocker flag and lowering the Priority/Severity.

Yeah, so I can confirm that after seeing this issue, I simply rebooted the Ceph OSD nodes and upon their return all the OSDs were up.
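Putting the above together, a rough recovery sketch (the ceph commands and ceph-osd@<id> unit names are standard, but the /var/lib/ceph/osd/ceph-<id> data layout and the loop below are assumptions about a default deployment, not the documented procedure):

# On a controller node: confirm the monitors are up and in quorum first
sudo ceph -s
sudo ceph quorum_status

# On each Ceph storage node: restart every OSD instance found under the
# default data directory /var/lib/ceph/osd/ceph-<id>
for id in $(ls /var/lib/ceph/osd/ | sed 's/^ceph-//'); do
    sudo systemctl restart ceph-osd@${id}
done

# Back on a controller: confirm all OSDs report up/in again
sudo ceph osd stat
sudo ceph osd tree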