rhel-osp-director: overcloud deployment fails on "CephStorageDeployment_Step1", Error: /Stage[main]/Ceph::Osds/Ceph::Osd[/srv/data]/Exec[ceph-osd-activate-/srv/data]/returns: change from notrun to 0 failed: Command exceeded timeout.

Environment:
ceph-osd-0.94.1-13.el7cp.x86_64
ceph-0.94.1-13.el7cp.x86_64
ceph-common-0.94.1-13.el7cp.x86_64
ceph-mon-0.94.1-13.el7cp.x86_64
instack-undercloud-2.1.2-22.el7ost.noarch

Steps to reproduce:
1. Deploy the undercloud.
2. Attempt to deploy the overcloud with 1 controller, 1 compute and 1 ceph storage.

Result:
The deployment fails.

+--------------------------------+--------------------------------------+----------------------------------------+-----------------+----------------------+--------------------------------+
| resource_name                  | physical_resource_id                 | resource_type                          | resource_status | updated_time         | parent_resource                |
+--------------------------------+--------------------------------------+----------------------------------------+-----------------+----------------------+--------------------------------+
| CephStorageNodesPostDeployment | ee3b8caa-6a5c-48e0-a3f6-5849bdeddb52 | OS::TripleO::CephStoragePostDeployment | CREATE_FAILED   | 2015-08-05T15:34:18Z |                                |
| CephStorageDeployment_Step1    | af269a91-6213-4f4b-9a83-694155b1d84b | OS::Heat::StructuredDeployments        | CREATE_FAILED   | 2015-08-05T15:59:37Z | CephStorageNodesPostDeployment |
| 0                              | d0114bb3-5bd9-487a-bb2d-67b3c4cc7336 | OS::Heat::StructuredDeployment         | CREATE_FAILED   | 2015-08-05T15:59:38Z | CephStorageDeployment_Step1    |
+--------------------------------+--------------------------------------+----------------------------------------+-----------------+----------------------+--------------------------------+

[root@overcloud-cephstorage-0 ~]# journalctl -u os-collect-config|grep -i error
Aug 05 12:05:09 overcloud-cephstorage-0.localdomain os-collect-config[4840]: d 'refresh' from 1 events\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph/Ceph_config[global/osd_pool_default_size]/ensure: created\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Keys/Ceph::Key[client.admin]/Exec[ceph-key-client.admin]/returns: + ceph-authtool /etc/ceph/ceph.client.admin.keyring --name client.admin --add-key AQDzLMJVAAAAABAAYgFxSJn0uFTEqet5IACsLw== --cap mon 'allow *' --cap osd 'allow *' --cap mds 'allow *'\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Keys/Ceph::Key[client.admin]/Exec[ceph-key-client.admin]/returns: added entity client.admin auth auth(auid = 18446744073709551615 key=AQDzLMJVAAAAABAAYgFxSJn0uFTEqet5IACsLw== with 0 caps)\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Keys/Ceph::Key[client.admin]/Exec[ceph-key-client.admin]/returns: executed successfully\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph/Ceph_config[global/osd_pool_default_pg_num]/ensure: created\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph/Ceph_config[global/public_network]/ensure: created\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/srv/data]/Exec[ceph-osd-prepare-/srv/data]/returns: + test -b /srv/data\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/srv/data]/Exec[ceph-osd-prepare-/srv/data]/returns: + mkdir -p /srv/data\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/srv/data]/Exec[ceph-osd-prepare-/srv/data]/returns: + ceph-disk prepare /srv/data\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/srv/data]/Exec[ceph-osd-prepare-/srv/data]/returns: executed successfully\u001b[0m\n\u001b[mNotice: Finished catalog run in 303.12 seconds\u001b[0m\n", "deploy_stderr": "\u001b[1;31mError: Command exceeded timeout\nWrapped exception:\nexecution expired\u001b[0m\n\u001b[1;31mError: /Stage[main]/Ceph::Osds/Ceph::Osd[/srv/data]/Exec[ceph-osd-activate-/srv/data]/returns: change from notrun to 0 failed: Command exceeded timeout\u001b[0m\n", "deploy_status_code": 6}
Aug 05 12:05:09 overcloud-cephstorage-0.localdomain os-collect-config[4840]: [2015-08-05 12:05:09,453] (heat-config) [INFO] Error: Command exceeded timeout
Aug 05 12:05:09 overcloud-cephstorage-0.localdomain os-collect-config[4840]: Error: /Stage[main]/Ceph::Osds/Ceph::Osd[/srv/data]/Exec[ceph-osd-activate-/srv/data]/returns: change from notrun to 0 failed: Command exceeded timeout
Aug 05 12:05:09 overcloud-cephstorage-0.localdomain os-collect-config[4840]: [2015-08-05 12:05:09,453] (heat-config) [ERROR] Error running /var/lib/heat-config/heat-config-puppet/8295d161-635a-4edb-8d4a-ede3e76b073c.pp. [6]

Expected result:
The deployment shouldn't fail on ceph storage.
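The os-collect-config journal entry in the description is hard to read because the puppet output is JSON-escaped: newlines appear as a literal \n and colour codes as a literal \u001b[...m. A minimal sketch of a filter that makes such output readable follows. The function name decode_puppet_log is hypothetical, and the sketch assumes GNU sed (as shipped on RHEL), where \n in the replacement text produces a real newline.

```shell
# Hypothetical helper (not part of os-collect-config or heat-config):
# make JSON-escaped puppet output from the journal readable.
decode_puppet_log() {
  # 1st expression: drop literal "\u001b[...m" ANSI colour sequences.
  # 2nd expression: turn a literal backslash-n into a real newline
  #                 (relies on GNU sed handling of \n in the replacement).
  sed -e 's/\\u001b\[[0-9;]*m//g' -e 's/\\n/\n/g'
}

# Example use on the ceph storage node:
#   journalctl -u os-collect-config | grep -i error | decode_puppet_log
```

With the escapes decoded, the "Command exceeded timeout" error for Exec[ceph-osd-activate-/srv/data] shows up on a line of its own instead of being buried at the end of one long entry.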
Created attachment 1059571: heat-engine from the undercloud
Created attachment 1059572: messages file from ceph and heat logs from the undercloud.
I think this is related to the CLI/template changes that jistr put in for bug 1247585.
@mburns yeah it could be. @sasha what was the command line you used to deploy? Please try passing the environment file as described here: https://bugzilla.redhat.com/show_bug.cgi?id=1247585#c6
Here's the command I use (same as on the last puddle):
openstack overcloud deploy --plan overcloud --control-scale 1 --compute-scale 1 --ceph-storage-scale 1 --block-storage-scale 0 --swift-storage-scale 0 -e /home/stack/network-environment.yaml --ntp-server [IP] --timeout 90

No yaml file for cinder.
@sasha -- can you try passing -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml as well and see if that works?
Environment: openstack-tripleo-heat-templates-0.8.6-45.el7ost.noarch

The file /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml doesn't exist.
(In reply to Alexander Chuzhoy from comment #9)
> Environment: openstack-tripleo-heat-templates-0.8.6-45.el7ost.noarch
>
> The file
> /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml
> doesn't exist.

Note: this was resolved in a conversation. The fix requires 0.8.6-46, not -45.
Was able to deploy the overcloud using this command:
openstack overcloud deploy --templates --control-scale 1 --compute-scale 1 --ceph-storage-scale 1 --block-storage-scale 0 --swift-storage-scale 0 -e /home/stack/network-environment.yaml --ntp-server [IP] --timeout 90 -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml

Using this THT build: openstack-tripleo-heat-templates-0.8.6-46.el7ost.noarch
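Since the environment file only ships with the -46 build, a deploy attempted against -45 has nothing to pass to -e. A minimal sketch of a pre-flight check that stops before the deploy starts follows. The function name require_env_file and the STORAGE_ENV variable are hypothetical; only the file path comes from this report.

```shell
# Path quoted in this bug report; present in the 0.8.6-46 build, absent in -45.
STORAGE_ENV=/usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml

# Hypothetical helper: succeed only if the given environment file exists.
require_env_file() {
  if [ -f "$1" ]; then
    return 0
  fi
  echo "missing environment file: $1" >&2
  echo "check the installed build: rpm -q openstack-tripleo-heat-templates" >&2
  return 1
}

# Example use on the undercloud, before running the deploy command:
#   require_env_file "$STORAGE_ENV" || exit 1
```

Failing early here takes seconds, whereas the missing file otherwise surfaces only as a puppet timeout on the ceph storage node well into the deployment.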
Based on comment 11, this is notabug
Re-opening this bug. As discussed on IRC, if a user chooses to install a ceph node, we should provide a reasonable default, or at least point out that the template is needed. Failing the deployment with a non-obvious error message is not a good option.
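The "at least point out that the template is needed" option could be approximated outside the CLI with a small argument check. The sketch below is an illustration only: check_storage_env is a hypothetical wrapper function, not something the openstack client provides. It assumes the storage environment is always passed as a separate "-e <file>" pair whose file name ends in storage-environment.yaml.

```shell
# Hypothetical guard, not part of the openstack CLI: warn when a ceph
# storage node is requested but no storage environment file is passed.
check_storage_env() {
  _cse_scale=0
  _cse_have_env=no
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --ceph-storage-scale)
        # The value is the next argument, if there is one.
        [ "$#" -ge 2 ] && _cse_scale=$2 ;;
      -e)
        case "${2:-}" in
          *storage-environment.yaml) _cse_have_env=yes ;;
        esac ;;
    esac
    shift
  done
  if [ "$_cse_scale" -gt 0 ] && [ "$_cse_have_env" = no ]; then
    echo "warning: --ceph-storage-scale $_cse_scale given without -e .../storage-environment.yaml" >&2
    return 1
  fi
  return 0
}
```

Called with the same arguments as the deploy command from the earlier comment (the one without the storage environment), the function returns 1 and prints the warning; with the extra -e argument it returns 0.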
We already had a smart default, but it wasn't overridable, causing a number of storage configurations to be impossible (see bug 1247585). We had to remove the smart default in favor of configurability. Re-adding that smart default should be possible once we have parameter overridability on CLI (bug 1245737).
We are planning to provide this via the parameter override functionality in https://bugzilla.redhat.com/show_bug.cgi?id=1245737 and we should track it there. If this solution is insufficient, please feel free to reopen this bug so we can track it separately.

*** This bug has been marked as a duplicate of bug 1245737 ***
The following files in puppet/manifests were hardcoded with a higher Exec timeout so that the ceph installation could go through.

overcloud_cephstorage.pp:

    Exec {
      timeout => 9000,
    }

    if str2bool(hiera('ceph_osd_selinux_permissive', true)) {

overcloud_controller.pp:

    Exec {
      timeout => 9000,
    }

overcloud_controller_pacemaker.pp:

    Exec {
      timeout => 9000,
    }

    if hiera('step') >= 1 {

The timeout has been increased to 9000 seconds.
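To confirm which manifests on a given system carry this resource default, something like the sketch below can be used. The function name exec_default_timeout is hypothetical, and the sketch assumes the multi-line "Exec { ... }" layout quoted in this comment; a block written on a single line would not be detected. For context, Puppet's exec timeout is expressed in seconds and defaults to 300, which is consistent with the failed catalog run finishing after 303.12 seconds.

```shell
# Hypothetical helper: print the timeout value set by a top-level
# "Exec { timeout => N, }" resource default in a puppet manifest.
# Prints nothing if no such default is found.
exec_default_timeout() {
  awk '
    /^Exec[[:space:]]*[{]/ { in_block = 1; next }   # start of the default block
    in_block && /timeout[[:space:]]*=>/ {
      gsub(/[^0-9]/, "", $0)                         # keep only the digits
      print
      exit
    }
    in_block && /[}]/ { in_block = 0 }               # end of the block
  ' "$1"
}

# Example use (manifest names taken from this comment):
#   exec_default_timeout puppet/manifests/overcloud_cephstorage.pp
```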