Bug 1236167
Summary: | CephStorageNodesPostDeployment fails with "Deployment exited with non-zero status code: 6" | |
---|---|---|---
Product: | Red Hat OpenStack | Reporter: | Dan Sneddon <dsneddon>
Component: | rhosp-director | Assignee: | Jay Dobies <jason.dobies>
Status: | CLOSED ERRATA | QA Contact: | Amit Ugol <augol>
Severity: | unspecified | Docs Contact: |
Priority: | high | |
Version: | 7.0 (Kilo) | CC: | augol, calfonso, gfidente, jdonohue, jliberma, mburns, morazi, rhel-osp-director-maint, rrosa, shardy, ukalifon, yeylon
Target Milestone: | ga | Keywords: | Triaged
Target Release: | Director | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | python-rdomanager-oscplugin-0.0.8-42.el7ost | Doc Type: | Bug Fix
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2015-08-05 13:57:00 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 1191185, 1243520 | |
Description: Dan Sneddon, 2015-06-26 18:03:56 UTC
Note that I just tried doing a deployment with --ceph-storage-scale 0 and it still bombed out on CephStorageNodesPostDeployment. I confirmed that this is also happening on bare metal, at least with network isolation enabled. Here is the error from /var/log/messages on the Ceph node:

```
Jun 27 17:10:29 localhost os-collect-config: -prepare-/srv/data]/returns: + test -b /srv/data
Notice: /Stage[main]/Ceph::Osds/Ceph::Osd[/srv/data]/Exec[ceph-osd-prepare-/srv/data]/returns: + mkdir -p /srv/data
Notice: /Stage[main]/Ceph::Osds/Ceph::Osd[/srv/data]/Exec[ceph-osd-prepare-/srv/data]/returns: + ceph-disk prepare /srv/data
Notice: /Stage[main]/Ceph::Osds/Ceph::Osd[/srv/data]/Exec[ceph-osd-prepare-/srv/data]/returns: executed successfully
Notice: Finished catalog run in 302.67 seconds
", "deploy_stderr": "Error: Command exceeded timeout
Wrapped exception:
execution expired
Error: /Stage[main]/Ceph::Osds/Ceph::Osd[/srv/data]/Exec[ceph-osd-activate-/srv/data]/returns: change from notrun to 0 failed: Command exceeded timeout
", "deploy_status_code": 6}
Jun 27 17:10:29 localhost os-collect-config: [2015-06-27 17:10:29,895] (heat-config) [INFO] Error: Command exceeded timeout
Jun 27 17:10:29 localhost os-collect-config: Error: /Stage[main]/Ceph::Osds/Ceph::Osd[/srv/data]/Exec[ceph-osd-activate-/srv/data]/returns: change from notrun to 0 failed: Command exceeded timeout
Jun 27 17:10:29 localhost os-collect-config: [2015-06-27 17:10:29,895] (heat-config) [ERROR] Error running /var/lib/heat-config/heat-config-puppet/aab751ec-95dc-40e0-ae22-db4d452084b3.pp. [6]
```

*** Bug 1236969 has been marked as a duplicate of this bug. ***

Just did a bare metal deployment with keystone's auth token timeout increased to 7200 seconds. It got to CREATE_COMPLETE and the errors with Ceph were not seen. I think that means that when this patch lands we are golden: https://code.engineering.redhat.com/gerrit/#/c/51898/2 The other bug to track along with this (the patch should fix both) is https://bugzilla.redhat.com/show_bug.cgi?id=1235908

I'm not sure the "Command exceeded timeout" is actually related to a token (or heat) timeout. It looks to me more like the command on the box is timing out, e.g. due to either a puppet or ceph timeout. For example, see this upstream bug related to driving ceph-deploy via puppet: https://bugs.launchpad.net/fuel/+bug/1304268 It exhibits the same symptoms, so it may be that the command failure is unrelated to the heat/token timeouts.

This exact bug still happens when I deploy without tuskar: http://pastebin.test.redhat.com/299709

This happened to me also when I wasn't trying to use network isolation.

I was unable to reproduce. Can someone who did reproduce it check NTP on the nodes (controllers and cephstorage) and attach the output of ceph -s?
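For anyone gathering the diagnostics requested above, a minimal check run on each controller and Ceph storage node could look like the sketch below. It is not taken from the bug report: it assumes chrony is the time source on the node (substitute `ntpq -p` if the image runs ntpd) and that a working /etc/ceph/ceph.conf is already in place.

```bash
#!/bin/bash
# Sketch only: collect the NTP and Ceph status requested in the comment above.
# Assumes chrony for timekeeping; use `ntpq -p` instead if ntpd is running.

echo "== Time synchronization =="
timedatectl status           # reports whether the system clock is synchronized
chronyc sources -v || true   # lists configured time sources and their reachability

echo "== Ceph cluster status =="
sudo ceph -s                 # overall cluster health, monitors, and OSD counts
sudo ceph osd tree           # shows which OSDs actually came up on which node
```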
Should be fixed by https://github.com/rdo-management/python-rdomanager-oscplugin/commit/ae39af33200b171be4dbac72ee2b91ad83e85abd

The deployments work well now. Note that I tested with only a few nodes, and I have no idea what happens if we try to deploy a large number of nodes. The specific error does not reproduce, though, so this specific issue is verified from my POV.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2015:1549

You can see this error when deploying Ceph with tuskar, or when deploying with templates but missing the -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml parameter. Error in the undercloud heat-engine.log:

```
2015-08-07 01:01:09.691 17016 INFO heat.engine.stack [-] Stack CREATE FAILED (overcloud): Resource CREATE failed: ResourceUnknownStatus: Resource failed - Unknown status FAILED due to "Resource CREATE failed: ResourceUnknownStatus: Resource failed - Unknown status FAILED due to "Resource CREATE failed: Error: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 6"
```

Correct deployment command syntax:

```
openstack overcloud deploy -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e /home/stack/network-environment.yaml --control-flavor control --compute-flavor compute --ceph-storage-flavor ceph --ntp-server 10.16.255.2 --control-scale 3 --compute-scale 4 --ceph-storage-scale 4 --block-storage-scale 0 --swift-storage-scale 0 -t 90 --templates -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml
```

You can also get this error when using hiera to customize the Ceph OSD disks, if the existing disks are either tagged for LVM or have non-GPT disk labels.
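To illustrate that last point, the sketch below shows one way to inspect a candidate OSD disk for leftover LVM metadata or a non-GPT label and to wipe it before retrying the deployment. This is an assumption-laden example, not part of the bug report: /dev/sdb is a placeholder for whatever device your hieradata assigns to the OSDs, and the last two commands destroy all data on that disk.

```bash
# Sketch: inspect and clean a disk intended for a Ceph OSD.
# /dev/sdb is a placeholder -- substitute the device named in your OSD hieradata.
# WARNING: wipefs and sgdisk below erase all data and partitions on the disk.

sudo parted /dev/sdb print        # show the current disk label; OSD preparation expects to create GPT
sudo pvs 2>/dev/null | grep sdb   # check whether the disk is still registered as an LVM physical volume

sudo wipefs --all /dev/sdb        # remove LVM and filesystem signatures
sudo sgdisk --zap-all /dev/sdb    # clear MBR and GPT structures so the disk can be relabeled cleanly
```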