Bug 1250654
| Summary: | rhel-osp-director: overcloud deployment fails on " CephStorageDeployment_Step1" , Error: /Stage[main]/Ceph::Osds/Ceph::Osd[/srv/data]/Exec[ceph-osd-activate-/srv/data]/returns: change from notrun to 0 failed: Command exceeded timeout. | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Alexander Chuzhoy <sasha> | ||||||
| Component: | rhosp-director | Assignee: | Jiri Stransky <jstransk> | ||||||
| Status: | CLOSED DUPLICATE | QA Contact: | yeylon <yeylon> | ||||||
| Severity: | high | Docs Contact: | |||||||
| Priority: | high | ||||||||
| Version: | unspecified | CC: | djuran, jdonohue, jstransk, mburns, morazi, rhel-osp-director-maint, rnishtal, sasha, srevivo | ||||||
| Target Milestone: | y2 | Keywords: | Reopened, ZStream | ||||||
| Target Release: | 7.0 (Kilo) | ||||||||
| Hardware: | Unspecified | ||||||||
| OS: | Unspecified | ||||||||
| Whiteboard: | |||||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||||
| Doc Text: | Story Points: | --- | |||||||
| Clone Of: | Environment: | ||||||||
| Last Closed: | 2015-11-04 17:20:26 UTC | Type: | Bug | ||||||
| Regression: | --- | Mount Type: | --- | ||||||
| Documentation: | --- | CRM: | |||||||
| Verified Versions: | Category: | --- | |||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||
| Embargoed: | |||||||||
| Bug Depends On: | 1245737 | ||||||||
| Bug Blocks: | 1191185, 1243520 | ||||||||
| Attachments: |
Created attachment 1059571 [details]
heat-engine from the undercloud
Created attachment 1059572 [details]
messages file from ceph and heat logs from the undercloud.
I think this is related to the CLI/template changes that jistr put in for bug 1247585.

@mburns yeah it could be. @sasha what was the command line you used to deploy? Please try passing the environment file as described here: https://bugzilla.redhat.com/show_bug.cgi?id=1247585#c6

Here's the command I use (same as on the last puddle):
openstack overcloud deploy --plan overcloud --control-scale 1 --compute-scale 1 --ceph-storage-scale 1 --block-storage-scale 0 --swift-storage-scale 0 -e /home/stack/network-environment.yaml --ntp-server [IP] --timeout 90
No yaml file for cinder.

@sasha -- can you try passing -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml as well and see if that works?

Environment: openstack-tripleo-heat-templates-0.8.6-45.el7ost.noarch
The file /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml doesn't exist.

(In reply to Alexander Chuzhoy from comment #9)
> Environment: openstack-tripleo-heat-templates-0.8.6-45.el7ost.noarch
>
> The file /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml doesn't exist.
Note: this was resolved in a conversation. The fix requires 0.8.6-46, not -45.

Was able to deploy the overcloud using this command:
openstack overcloud deploy --templates --control-scale 1 --compute-scale 1 --ceph-storage-scale 1 --block-storage-scale 0 --swift-storage-scale 0 -e /home/stack/network-environment.yaml --ntp-server [IP] --timeout 90 -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml
Using this THT build: openstack-tripleo-heat-templates-0.8.6-46.el7ost.noarch

Based on comment 11, this is notabug.

Re-opening this bug. As discussed on IRC, if a user selects to install a ceph-node, we should provide a reasonable default, or at least point out that the template is needed. Failing the deployment with a non-obvious error message is not a good option.

We already had a smart default, but it wasn't overridable, causing a number of storage configurations to be impossible (see bug 1247585). We had to remove the smart default in favor of configurability. Re-adding that smart default should be possible once we have parameter overridability on CLI (bug 1245737).

We are planning on providing this functionality via the param override functionality in https://bugzilla.redhat.com/show_bug.cgi?id=1245737 and we should track it there. If this solution is insufficient, please feel free to reopen this bug so we can track it distinctly.

*** This bug has been marked as a duplicate of bug 1245737 ***

The following files in puppet/manifests were hardcoded for the ceph installation to go through.
overcloud_cephstorage.pp

    Exec {
      timeout => 9000,
    }

    if str2bool(hiera('ceph_osd_selinux_permissive', true)) {

overcloud_controller.pp

    Exec {
      timeout => 9000,
    }

overcloud_controller_pacemaker.pp

    Exec {
      timeout => 9000,
    }

    if hiera('step') >= 1 {

The timeout has been increased to 9000.
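
The request in the comments above to "at least point out that the template is needed" could be approximated by a pre-flight check on the deploy command line. The following is only a minimal sketch, not how the director CLI behaves: the function name `missing_storage_env` is hypothetical, and it assumes the only signals worth checking are the `--ceph-storage-scale` value and the `-e` environment files quoted in this report.

```python
import shlex

# Environment file named in the comments of this report.
STORAGE_ENV = (
    "/usr/share/openstack-tripleo-heat-templates/"
    "environments/storage-environment.yaml"
)


def missing_storage_env(command: str) -> bool:
    """Return True if Ceph nodes are requested but STORAGE_ENV is not passed.

    Hypothetical helper: it only inspects the command string and does not
    talk to Heat or to the undercloud.
    """
    args = shlex.split(command)
    ceph_scale = 0
    env_files = []
    # Stop one short of the end so args[i + 1] is always valid.
    for i, arg in enumerate(args[:-1]):
        if arg == "--ceph-storage-scale":
            ceph_scale = int(args[i + 1])
        elif arg == "-e":
            env_files.append(args[i + 1])
    return ceph_scale > 0 and STORAGE_ENV not in env_files
```

Run against the two commands quoted in the comments, the check would flag the first (no storage environment file) and accept the second, which is the distinction the reopening comment asks the tooling to surface.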
rhel-osp-director: overcloud deployment fails on " CephStorageDeployment_Step1" , Error: /Stage[main]/Ceph::Osds/Ceph::Osd[/srv/data]/Exec[ceph-osd-activate-/srv/data]/returns: change from notrun to 0 failed: Command exceeded timeout.

Environment:
ceph-osd-0.94.1-13.el7cp.x86_64
ceph-0.94.1-13.el7cp.x86_64
ceph-common-0.94.1-13.el7cp.x86_64
ceph-mon-0.94.1-13.el7cp.x86_64
instack-undercloud-2.1.2-22.el7ost.noarch

Steps to reproduce:
1. Deploy the undercloud.
2. Attempt to deploy the overcloud with 1 controller, 1 compute and 1 ceph storage.

Result:
The deployment fails.

+--------------------------------+--------------------------------------+----------------------------------------+-----------------+----------------------+--------------------------------+
| resource_name                  | physical_resource_id                 | resource_type                          | resource_status | updated_time         | parent_resource                |
+--------------------------------+--------------------------------------+----------------------------------------+-----------------+----------------------+--------------------------------+
| CephStorageNodesPostDeployment | ee3b8caa-6a5c-48e0-a3f6-5849bdeddb52 | OS::TripleO::CephStoragePostDeployment | CREATE_FAILED   | 2015-08-05T15:34:18Z |                                |
| CephStorageDeployment_Step1    | af269a91-6213-4f4b-9a83-694155b1d84b | OS::Heat::StructuredDeployments        | CREATE_FAILED   | 2015-08-05T15:59:37Z | CephStorageNodesPostDeployment |
| 0                              | d0114bb3-5bd9-487a-bb2d-67b3c4cc7336 | OS::Heat::StructuredDeployment         | CREATE_FAILED   | 2015-08-05T15:59:38Z | CephStorageDeployment_Step1    |
+--------------------------------+--------------------------------------+----------------------------------------+-----------------+----------------------+--------------------------------+

[root@overcloud-cephstorage-0 ~]# journalctl -u os-collect-config|grep -i error
Aug 05 12:05:09 overcloud-cephstorage-0.localdomain os-collect-config[4840]: d 'refresh' from 1 events\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph/Ceph_config[global/osd_pool_default_size]/ensure: created\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Keys/Ceph::Key[client.admin]/Exec[ceph-key-client.admin]/returns: + ceph-authtool /etc/ceph/ceph.client.admin.keyring --name client.admin --add-key AQDzLMJVAAAAABAAYgFxSJn0uFTEqet5IACsLw== --cap mon 'allow *' --cap osd 'allow *' --cap mds 'allow *'\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Keys/Ceph::Key[client.admin]/Exec[ceph-key-client.admin]/returns: added entity client.admin auth auth(auid = 18446744073709551615 key=AQDzLMJVAAAAABAAYgFxSJn0uFTEqet5IACsLw== with 0 caps)\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Keys/Ceph::Key[client.admin]/Exec[ceph-key-client.admin]/returns: executed successfully\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph/Ceph_config[global/osd_pool_default_pg_num]/ensure: created\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph/Ceph_config[global/public_network]/ensure: created\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/srv/data]/Exec[ceph-osd-prepare-/srv/data]/returns: + test -b /srv/data\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/srv/data]/Exec[ceph-osd-prepare-/srv/data]/returns: + mkdir -p /srv/data\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/srv/data]/Exec[ceph-osd-prepare-/srv/data]/returns: + ceph-disk prepare /srv/data\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/srv/data]/Exec[ceph-osd-prepare-/srv/data]/returns: executed successfully\u001b[0m\n\u001b[mNotice: Finished catalog run in 303.12 seconds\u001b[0m\n", "deploy_stderr": "\u001b[1;31mError: Command exceeded timeout\nWrapped exception:\nexecution expired\u001b[0m\n\u001b[1;31mError: /Stage[main]/Ceph::Osds/Ceph::Osd[/srv/data]/Exec[ceph-osd-activate-/srv/data]/returns: change from notrun to 0 failed: Command exceeded timeout\u001b[0m\n", "deploy_status_code": 6}
Aug 05 12:05:09 overcloud-cephstorage-0.localdomain os-collect-config[4840]: [2015-08-05 12:05:09,453] (heat-config) [INFO] Error: Command exceeded timeout
Aug 05 12:05:09 overcloud-cephstorage-0.localdomain os-collect-config[4840]: Error: /Stage[main]/Ceph::Osds/Ceph::Osd[/srv/data]/Exec[ceph-osd-activate-/srv/data]/returns: change from notrun to 0 failed: Command exceeded timeout
Aug 05 12:05:09 overcloud-cephstorage-0.localdomain os-collect-config[4840]: [2015-08-05 12:05:09,453] (heat-config) [ERROR] Error running /var/lib/heat-config/heat-config-puppet/8295d161-635a-4edb-8d4a-ede3e76b073c.pp. [6]

Expected result:
The deployment shouldn't fail on ceph storage.
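
The Puppet errors in the journal output above are buried in color escape sequences, which makes the failing resource hard to spot. As a rough illustration of how the `deploy_stderr` value could be reduced to its error lines, here is a small sketch. The function name `puppet_errors` is hypothetical, and the code assumes the string has already been JSON-decoded, so each `\u001b` shown in the log is a real ESC character.

```python
import re
from typing import List

# Matches SGR color sequences such as ESC[0m, ESC[m and ESC[1;31m.
ANSI_COLOR = re.compile(r"\x1b\[[0-9;]*m")


def puppet_errors(deploy_stderr: str) -> List[str]:
    """Return the lines of deploy_stderr that begin with 'Error:'.

    Hypothetical helper: color codes are stripped first so that a line
    such as ESC[1;31mError: ... is recognized as an error line.
    """
    clean = ANSI_COLOR.sub("", deploy_stderr)
    return [line for line in clean.splitlines() if line.startswith("Error:")]
```

Applied to the `deploy_stderr` from this report, this would leave two lines: the generic "Command exceeded timeout" message and the one naming `Exec[ceph-osd-activate-/srv/data]`.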