Bug 1250654 - rhel-osp-director: overcloud deployment fails on "CephStorageDeployment_Step1", Error: /Stage[main]/Ceph::Osds/Ceph::Osd[/srv/data]/Exec[ceph-osd-activate-/srv/data]/returns: change from notrun to 0 failed: Command exceeded timeout.
Status: CLOSED DUPLICATE of bug 1245737
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-director
Version: unspecified
Hardware: Unspecified OS: Unspecified
Priority: high Severity: high
Target Milestone: y2
Target Release: 7.0 (Kilo)
Assigned To: Jiri Stransky
QA Contact: yeylon@redhat.com
Keywords: Reopened, ZStream
Depends On: 1245737
Blocks: 1191185 1243520
Reported: 2015-08-05 12:53 EDT by Alexander Chuzhoy
Modified: 2016-04-18 02:51 EDT

Doc Type: Bug Fix
Last Closed: 2015-11-04 12:20:26 EST
Type: Bug


Attachments
heat-engine from the undercloud (18.91 MB, application/x-gzip)
2015-08-05 13:02 EDT, Alexander Chuzhoy
messages file from ceph and heat logs from the undercloud. (2.89 MB, application/x-gzip)
2015-08-05 13:05 EDT, Alexander Chuzhoy

Description Alexander Chuzhoy 2015-08-05 12:53:37 EDT
rhel-osp-director: overcloud deployment fails on "CephStorageDeployment_Step1". Error: /Stage[main]/Ceph::Osds/Ceph::Osd[/srv/data]/Exec[ceph-osd-activate-/srv/data]/returns: change from notrun to 0 failed: Command exceeded timeout.


Environment:
ceph-osd-0.94.1-13.el7cp.x86_64
ceph-0.94.1-13.el7cp.x86_64
ceph-common-0.94.1-13.el7cp.x86_64
ceph-mon-0.94.1-13.el7cp.x86_64
instack-undercloud-2.1.2-22.el7ost.noarch

Steps to reproduce:
1. Deploy the undercloud.
2. Attempt to deploy the overcloud with 1 controller, 1 compute and 1 ceph storage.

Result:
The deployment fails.
+--------------------------------+--------------------------------------+----------------------------------------+-----------------+----------------------+--------------------------------+
| resource_name                  | physical_resource_id                 | resource_type                          | resource_status | updated_time         | parent_resource                |
+--------------------------------+--------------------------------------+----------------------------------------+-----------------+----------------------+--------------------------------+
| CephStorageNodesPostDeployment | ee3b8caa-6a5c-48e0-a3f6-5849bdeddb52 | OS::TripleO::CephStoragePostDeployment | CREATE_FAILED   | 2015-08-05T15:34:18Z |                                |
| CephStorageDeployment_Step1    | af269a91-6213-4f4b-9a83-694155b1d84b | OS::Heat::StructuredDeployments        | CREATE_FAILED   | 2015-08-05T15:59:37Z | CephStorageNodesPostDeployment |
| 0                              | d0114bb3-5bd9-487a-bb2d-67b3c4cc7336 | OS::Heat::StructuredDeployment         | CREATE_FAILED   | 2015-08-05T15:59:38Z | CephStorageDeployment_Step1    |
+--------------------------------+--------------------------------------+----------------------------------------+-----------------+----------------------+--------------------------------+




[root@overcloud-cephstorage-0 ~]# journalctl -u os-collect-config|grep -i error
Aug 05 12:05:09 overcloud-cephstorage-0.localdomain os-collect-config[4840]: d 'refresh' from 1 events\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph/Ceph_config[global/osd_pool_default_size]/ensure: created\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Keys/Ceph::Key[client.admin]/Exec[ceph-key-client.admin]/returns: + ceph-authtool /etc/ceph/ceph.client.admin.keyring --name client.admin --add-key AQDzLMJVAAAAABAAYgFxSJn0uFTEqet5IACsLw== --cap mon 'allow *' --cap osd 'allow *' --cap mds 'allow *'\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Keys/Ceph::Key[client.admin]/Exec[ceph-key-client.admin]/returns: added entity client.admin auth auth(auid = 18446744073709551615 key=AQDzLMJVAAAAABAAYgFxSJn0uFTEqet5IACsLw== with 0 caps)\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Keys/Ceph::Key[client.admin]/Exec[ceph-key-client.admin]/returns: executed successfully\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph/Ceph_config[global/osd_pool_default_pg_num]/ensure: created\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph/Ceph_config[global/public_network]/ensure: created\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/srv/data]/Exec[ceph-osd-prepare-/srv/data]/returns: + test -b /srv/data\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/srv/data]/Exec[ceph-osd-prepare-/srv/data]/returns: + mkdir -p /srv/data\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/srv/data]/Exec[ceph-osd-prepare-/srv/data]/returns: + ceph-disk prepare /srv/data\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/srv/data]/Exec[ceph-osd-prepare-/srv/data]/returns: executed successfully\u001b[0m\n\u001b[mNotice: Finished catalog run in 303.12 seconds\u001b[0m\n", "deploy_stderr": "\u001b[1;31mError: Command exceeded timeout\nWrapped exception:\nexecution expired\u001b[0m\n\u001b[1;31mError: /Stage[main]/Ceph::Osds/Ceph::Osd[/srv/data]/Exec[ceph-osd-activate-/srv/data]/returns: change from notrun to 0 failed: Command exceeded timeout\u001b[0m\n", "deploy_status_code": 6}
Aug 05 12:05:09 overcloud-cephstorage-0.localdomain os-collect-config[4840]: [2015-08-05 12:05:09,453] (heat-config) [INFO] Error: Command exceeded timeout
Aug 05 12:05:09 overcloud-cephstorage-0.localdomain os-collect-config[4840]: Error: /Stage[main]/Ceph::Osds/Ceph::Osd[/srv/data]/Exec[ceph-osd-activate-/srv/data]/returns: change from notrun to 0 failed: Command exceeded timeout
Aug 05 12:05:09 overcloud-cephstorage-0.localdomain os-collect-config[4840]: [2015-08-05 12:05:09,453] (heat-config) [ERROR] Error running /var/lib/heat-config/heat-config-puppet/8295d161-635a-4edb-8d4a-ede3e76b073c.pp. [6]
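
The deploy_stderr value in the journal output above is JSON-escaped and wrapped in ANSI color codes, which makes the actual Puppet failure hard to spot. A minimal Python sketch for pulling the error lines out of such a string is shown here. The function name puppet_errors and the regular expression are illustrative assumptions, not part of os-collect-config or any director tooling.

```python
import re

# Matches ANSI color sequences such as ESC[m, ESC[0m or ESC[1;31m.
ANSI_RE = re.compile(r"\x1b\[[0-9;]*m")


def puppet_errors(deploy_stderr):
    """Return the lines starting with 'Error:' from a Puppet stderr string,
    with ANSI color codes removed. (Hypothetical helper for illustration.)"""
    clean = ANSI_RE.sub("", deploy_stderr)
    return [line for line in clean.splitlines() if line.startswith("Error:")]


if __name__ == "__main__":
    # Sample copied from the deploy_stderr value in the log above.
    sample = (
        "\u001b[1;31mError: Command exceeded timeout\n"
        "Wrapped exception:\nexecution expired\u001b[0m\n"
        "\u001b[1;31mError: /Stage[main]/Ceph::Osds/Ceph::Osd[/srv/data]/"
        "Exec[ceph-osd-activate-/srv/data]/returns: change from notrun to 0 "
        "failed: Command exceeded timeout\u001b[0m\n"
    )
    for line in puppet_errors(sample):
        print(line)
```

The string has to be JSON-decoded first (so that \u001b becomes a real escape character) before it is passed to the helper.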


Expected result:
The deployment shouldn't fail on ceph storage.
Comment 3 Alexander Chuzhoy 2015-08-05 13:02:30 EDT
Created attachment 1059571 [details]
heat-engine from the undercloud
Comment 4 Alexander Chuzhoy 2015-08-05 13:05:50 EDT
Created attachment 1059572 [details]
messages file from ceph and heat logs from the undercloud.
Comment 5 Mike Burns 2015-08-05 17:17:50 EDT
I think this is related to the CLI/template changes that jistr put in for bug 1247585.
Comment 6 Jiri Stransky 2015-08-06 04:20:58 EDT
@mburns yeah it could be.

@sasha what was the command line you used to deploy? Please try passing the environment file as described here:

https://bugzilla.redhat.com/show_bug.cgi?id=1247585#c6
Comment 7 Alexander Chuzhoy 2015-08-06 09:06:49 EDT
Here's the command I use (same as on the last puddle):
openstack overcloud deploy --plan overcloud --control-scale 1  --compute-scale 1  --ceph-storage-scale 1 --block-storage-scale 0 --swift-storage-scale 0 -e /home/stack/network-environment.yaml --ntp-server [IP] --timeout 90

No yaml file for cinder.
Comment 8 Mike Burns 2015-08-06 09:11:35 EDT
@sasha -- can you try passing -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml as well and see if that works?
Comment 9 Alexander Chuzhoy 2015-08-06 09:22:38 EDT
Environment: openstack-tripleo-heat-templates-0.8.6-45.el7ost.noarch

The file /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml doesn't exist.
Comment 10 Mike Burns 2015-08-06 17:51:07 EDT
(In reply to Alexander Chuzhoy from comment #9)
> Environment: openstack-tripleo-heat-templates-0.8.6-45.el7ost.noarch
> 
> The file /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml
> doesn't exist.

Note:  this was resolved in a conversation.  The fix requires 0.8.6-46, not -45.
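
For anyone hitting the same thing, a rough way to check whether an installed openstack-tripleo-heat-templates build is at least 0.8.6-46 is sketched below in Python. It assumes a package string in the name-version-release form shown in comment 9, and it uses a naive numeric comparison rather than full RPM version ordering. The function names are made up for this example.

```python
import re

# name-version-release, where the release starts with a number
# (for example "46.el7ost.noarch").
NVR_RE = re.compile(r"^(?P<name>.+)-(?P<version>[^-]+)-(?P<release>\d+)[^-]*$")


def version_release(nvr):
    """Split a package string into a comparable (version tuple, release int)."""
    match = NVR_RE.match(nvr)
    if match is None:
        raise ValueError("unrecognized package string: %r" % nvr)
    version = tuple(int(part) for part in match.group("version").split("."))
    return version, int(match.group("release"))


def meets_minimum_build(nvr, minimum=((0, 8, 6), 46)):
    """True if the build is at least 0.8.6-46, the build named in this comment.
    Naive comparison: numeric version parts first, then the leading release number."""
    return version_release(nvr) >= minimum


if __name__ == "__main__":
    for pkg in (
        "openstack-tripleo-heat-templates-0.8.6-45.el7ost.noarch",
        "openstack-tripleo-heat-templates-0.8.6-46.el7ost.noarch",
    ):
        print(pkg, meets_minimum_build(pkg))
```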
Comment 11 Alexander Chuzhoy 2015-08-07 12:15:08 EDT
Was able to deploy the overcloud using this command:
openstack overcloud deploy --templates --control-scale 1  --compute-scale 1  --ceph-storage-scale 1 --block-storage-scale 0 --swift-storage-scale 0 -e /home/stack/network-environment.yaml --ntp-server [IP] --timeout 90 -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml


Using this THT build:
openstack-tripleo-heat-templates-0.8.6-46.el7ost.noarch
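
Since the earlier failure came down to an environment file that was either not passed or not present on the undercloud, a small pre-flight check can catch the second case before the deploy starts. The Python sketch below collects every path given with -e in a deploy command line like the one above and reports those that do not exist on the host. The helper names are hypothetical, and the check only covers the "-e PATH" form used in this comment.

```python
import os
import shlex


def environment_files(argv):
    """Collect the paths passed with -e in a deploy command's argument list."""
    paths = []
    for i, arg in enumerate(argv):
        if arg == "-e" and i + 1 < len(argv):
            paths.append(argv[i + 1])
    return paths


def missing_environment_files(argv, exists=os.path.exists):
    """Return the -e paths that are not present on this host."""
    return [path for path in environment_files(argv) if not exists(path)]


if __name__ == "__main__":
    # Command line copied from this comment; [IP] is left as the placeholder.
    command = (
        "openstack overcloud deploy --templates --control-scale 1 "
        "--compute-scale 1 --ceph-storage-scale 1 --block-storage-scale 0 "
        "--swift-storage-scale 0 -e /home/stack/network-environment.yaml "
        "--ntp-server [IP] --timeout 90 "
        "-e /usr/share/openstack-tripleo-heat-templates/environments/"
        "storage-environment.yaml"
    )
    for path in missing_environment_files(shlex.split(command)):
        print("missing environment file:", path)
```

The exists parameter is there only so the check can be exercised without touching the filesystem.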
Comment 12 Mike Burns 2015-08-13 02:51:05 EDT
Based on comment 11, this is notabug
Comment 13 David Juran 2015-09-04 09:46:20 EDT
Re-opening this bug. As discussed on IRC, if a user chooses to install a Ceph node, we should provide a reasonable default, or at least point out that the template is needed. Failing the deployment with a non-obvious error message is not a good option.
Comment 14 Jiri Stransky 2015-09-15 08:53:12 EDT
We already had a smart default, but it wasn't overridable, causing a number of storage configurations to be impossible (see bug 1247585). We had to remove the smart default in favor of configurability. Re-adding that smart default should be possible once we have parameter overridability on CLI (bug 1245737).
Comment 15 Mike Orazi 2015-11-04 12:20:26 EST
We are planning on providing this functionality via the param override functionality in https://bugzilla.redhat.com/show_bug.cgi?id=1245737, and we should track it there. If this solution is insufficient, please feel free to reopen this bug so we can track it separately.

*** This bug has been marked as a duplicate of bug 1245737 ***
Comment 16 Rama 2015-11-04 12:27:04 EST
The following files in puppet/manifests were hardcoded with a larger Exec timeout so that the Ceph installation could go through.

overcloud_cephstorage.pp:

Exec {
  timeout => 9000,
}

if str2bool(hiera('ceph_osd_selinux_permissive', true)) {

overcloud_controller.pp:

Exec {
  timeout => 9000,
}

overcloud_controller_pacemaker.pp:

Exec {
  timeout => 9000,
}

if hiera('step') >= 1 {

The timeout has been increased to 9000 seconds in each of these files.
