| Summary: | osp-director-10: Major Upgrade OSP9 -> 10 with Ceph-node, fails on : "Ceph cluster status to go HEALTH_OK\nWARNING" | ||
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Omri Hochman <ohochman> |
| Component: | documentation | Assignee: | Dan Macpherson <dmacpher> |
| Status: | CLOSED NOTABUG | QA Contact: | RHOS Documentation Team <rhos-docs> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 10.0 (Newton) | CC: | dbecker, dmacpher, flucifre, jcoufal, jomurphy, josorior, lbopf, mburns, morazi, rhel-osp-director-maint, srevivo |
| Target Milestone: | --- | Keywords: | Documentation, Reopened |
| Target Release: | 10.0 (Newton) | ||
| Hardware: | x86_64 | ||
| OS: | All | ||
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-02-15 16:23:04 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
Description
Omri Hochman
2016-10-07 22:31:54 UTC
It might be that the Ceph warnings are the reason for this failure. If so, we would need a way to ignore them, because deployments with a single Ceph node will always have those warnings.
[stack@undercloud-0 ~]$ heat deployment-show 12d566de-88e4-47f8-9675-ffad9d01f76a
WARNING (shell) "heat deployment-show" is deprecated, please use "openstack software deployment show" instead
{
"status": "FAILED",
"server_id": "9bd3ba9f-2eaa-4b69-99f0-478bda88d090",
"config_id": "a1789b3c-1dbe-4b9c-bff3-80f7dfa1c404",
"output_values": {
"deploy_stdout": "INFO: starting a1789b3c-1dbe-4b9c-bff3-80f7dfa1c404\nWARNING: Waiting for Ceph cluster status to go HEALTH_OK\nWARNING: Waiting for Ceph cluster status to go HEALTH_OK\
ARNING: Waiting for Ceph cluster status to go HEALTH_OK\nWARNING: Waiting for Ceph cluster status to go HEALTH_OK\nWARNING: Waiting for Ceph cluster status to go HEALTH_OK\nWARNING: Waiting
r Ceph cluster status to go HEALTH_OK\nWARNING: Waiting for Ceph cluster status to go HEALTH_OK\nWARNING: Waiting for Ceph cluster status to go HEALTH_OK\nWARNING: Waiting for Ceph cluster s
tus to go HEALTH_OK\nWARNING: Waiting for Ceph cluster status to go HEALTH_OK\n",
"deploy_stderr": "",
"deploy_status_code": 124
},
"creation_time": "2016-10-10T20:04:08Z",
"updated_time": "2016-10-10T20:10:36Z",
"input_values": {
"update_identifier": "",
"deploy_identifier": "1476129604"
},
"action": "CREATE",
"status_reason": "deploy_status_code : Deployment exited with non-zero status code: 124",
"id": "12d566de-88e4-47f8-9675-ffad9d01f76a"
On Controller:
---------------
[root@controller-0 ~]# ceph health
HEALTH_WARN 192 pgs degraded; 192 pgs stuck degraded; 192 pgs stuck unclean; 192 pgs stuck undersized; 192 pgs undersized
[root@controller-0 ~]# ceph health status
status not valid: status not in detail
Invalid command: unused arguments: ['status']
health {detail} : show cluster health
Error EINVAL: invalid command
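(As the usage hint in the error above indicates, `ceph health status` is not a valid subcommand; the summary and detailed forms are:)

```
# Summary cluster health, and the detailed form that lists each degraded PG:
ceph health
ceph health detail
```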
[root@controller-0 ~]# ceph status
cluster 1a387610-8ce4-11e6-89aa-525400cc88d3
health HEALTH_WARN
192 pgs degraded
192 pgs stuck degraded
192 pgs stuck unclean
192 pgs stuck undersized
192 pgs undersized
monmap e1: 3 mons at {controller-0=172.17.3.13:6789/0,controller-1=172.17.3.11:6789/0,controller-2=172.17.3.15:6789/0}
election epoch 6, quorum 0,1,2 controller-1,controller-0,controller-2
osdmap e9: 1 osds: 1 up, 1 in
pgmap v18: 192 pgs, 5 pools, 0 bytes data, 0 objects
34980 kB used, 39881 MB / 39915 MB avail
192 active+undersized+degraded
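The 192 undersized/degraded PGs are expected with a single OSD: assuming the pools keep the usual replicated size of 3 (an assumption; the actual pool settings were not captured here), each placement group wants three copies but only one OSD exists to hold them, so every PG stays active+undersized+degraded and the cluster never reaches HEALTH_OK. A quick way to confirm would be something like:

```
# Check the replication size of a pool. The pool name 'vms' is only an
# example; substitute any of the five pools reported by 'ceph status'.
ceph osd pool get vms size
# -> size: 3   (typical default; with only 1 OSD the PGs for this pool can
#               never hold 3 replicas, hence undersized/degraded)
```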
Trying to work around it by:

cat > /home/stack/ignore-ceph.yaml <<EOF
parameter_defaults:
  IgnoreCephUpgradeWarnings: true
EOF

and adding the following to the upgrade steps: -e /home/stack/ignore-ceph.yaml

I don't think the puppet error of not finding the ::tripleo::trusted_cas class is related to Ceph. That class was not present in Mitaka and was introduced in OSP10. It seems that the manifests are old and need to be updated.

(In reply to Juan Antonio Osorio from comment #3)
> I don't think the puppet error of not finding the ::tripleo::trusted_cas
> class is related to Ceph. That class was not present in Mitaka and was
> introduced in OSP10. It seems that the manifests are old and need to be updated.

Please ignore the ::tripleo::trusted_cas class. The issue is: when there are Ceph warnings (which we *always* have with single-Ceph deployments):

[root@controller-0 ~]# ceph status
    cluster 1a387610-8ce4-11e6-89aa-525400cc88d3
     health HEALTH_WARN
            192 pgs degraded

the upgrade will fail on the step "Upgrade Controller and Block Storage".

Notes:
(1) This issue didn't happen during OSP8 -> OSP9 upgrades, since we didn't upgrade Ceph.
(2) Theoretically, with a 3-node Ceph cluster we should not have those warnings and the upgrade should pass.
(3) I found the workaround from comment #2 valid (adding: IgnoreCephUpgradeWarnings: true).

We would need a PM decision on whether we want to document it or find another solution. As I understand it, the 'IgnoreCephUpgradeWarnings' variable is for dev only, but we might use this exception for single-Ceph environment upgrades.

Ceph-related topic; the decision needs to come out from their DFG. Moving it there, raising urgency, targeting 10, raising the question of whether this is a blocker.

Ceph is operating as designed. Ceph requires 3 OSDs, and it is warning that an unhealthy cluster configuration is present, which it is, with one node.

There is a workaround in #2; I do not think we need anything further.

(In reply to Federico Lucifredi from comment #7)
> Ceph is operating as designed. Ceph requires 3 OSDs, and it is warning that
> an unhealthy cluster configuration is present, which it is, with one node.
>
> There is a workaround in #2, I do not think we need anything further.

Re-opening the bug to make sure it's going to be documented; adding requires_doc_text?

Moving to 'NEW' to be triaged as the schedule allows.
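For reference, a minimal sketch of how the workaround from comment #2 might look when documented. The surrounding `openstack overcloud deploy` arguments (templates path and the other environment files used during the OSP9 -> OSP10 major upgrade steps) are placeholders and would need to match the actual upgrade procedure:

```
# Create an environment file that tells the upgrade to tolerate Ceph health
# warnings (needed when the deployment has a single Ceph OSD node).
cat > /home/stack/ignore-ceph.yaml <<EOF
parameter_defaults:
  IgnoreCephUpgradeWarnings: true
EOF

# Include it in each 'openstack overcloud deploy' invocation of the upgrade
# steps. The <...> arguments below are placeholders, not real file names.
openstack overcloud deploy --templates \
    -e <existing-environment-files> \
    -e <major-upgrade-environment-file> \
    -e /home/stack/ignore-ceph.yaml
```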