Bug 1250654

Summary:

rhel-osp-director: overcloud deployment fails on "CephStorageDeployment_Step1", Error: /Stage[main]/Ceph::Osds/Ceph::Osd[/srv/data]/Exec[ceph-osd-activate-/srv/data]/returns: change from notrun to 0 failed: Command exceeded timeout.

Product:

Red Hat OpenStack

Reporter:

Alexander Chuzhoy <sasha>

Component:

rhosp-director

Assignee:

Jiri Stransky <jstransk>

Status:

CLOSED DUPLICATE

QA Contact:

yeylon <yeylon>

Severity:

high

Priority:

high

Version:

unspecified

CC:

djuran, jdonohue, jstransk, mburns, morazi, rhel-osp-director-maint, rnishtal, sasha, srevivo

Keywords:

Reopened, ZStream

Target Release:

7.0 (Kilo)

Hardware:

Unspecified

OS:

Unspecified

Doc Type:

Bug Fix

Last Closed:

2015-11-04 17:20:26 UTC

Type:

Bug

Bug Depends On:

1245737

Bug Blocks:

1191185, 1243520

Attachments:

Description	Flags
heat-engine from the undercloud	none
messages file from ceph and heat logs from the undercloud.	none

Description Alexander Chuzhoy 2015-08-05 16:53:37 UTC

rhel-osp-director: overcloud deployment fails on "CephStorageDeployment_Step1", Error: /Stage[main]/Ceph::Osds/Ceph::Osd[/srv/data]/Exec[ceph-osd-activate-/srv/data]/returns: change from notrun to 0 failed: Command exceeded timeout.


Environment:
ceph-osd-0.94.1-13.el7cp.x86_64
ceph-0.94.1-13.el7cp.x86_64
ceph-common-0.94.1-13.el7cp.x86_64
ceph-mon-0.94.1-13.el7cp.x86_64
instack-undercloud-2.1.2-22.el7ost.noarch

Steps to reproduce:
1. Deploy the undercloud.
2. Attempt to deploy the overcloud with 1 controller, 1 compute and 1 ceph storage.

Result:
The deployment fails.
+---------------------------------------------+-----------------------------------------------+---------------------------------------------------+-----------------+----------------------+---------------------------------------------+
| resource_name                               | physical_resource_id                          | resource_type                                     | resource_status | updated_time         | parent_resource                             |
+---------------------------------------------+-----------------------------------------------+---------------------------------------------------+-----------------+----------------------+---------------------------------------------+
| CephStorageNodesPostDeployment              | ee3b8caa-6a5c-48e0-a3f6-5849bdeddb52          | OS::TripleO::CephStoragePostDeployment            | CREATE_FAILED   | 2015-08-05T15:34:18Z |                                             |
| CephStorageDeployment_Step1                 | af269a91-6213-4f4b-9a83-694155b1d84b          | OS::Heat::StructuredDeployments                   | CREATE_FAILED   | 2015-08-05T15:59:37Z | CephStorageNodesPostDeployment              |
| 0                                           | d0114bb3-5bd9-487a-bb2d-67b3c4cc7336          | OS::Heat::StructuredDeployment                    | CREATE_FAILED   | 2015-08-05T15:59:38Z | CephStorageDeployment_Step1                 |
+---------------------------------------------+-----------------------------------------------+---------------------------------------------------+-----------------+----------------------+---------------------------------------------+




[root@overcloud-cephstorage-0 ~]# journalctl -u os-collect-config|grep -i error
Aug 05 12:05:09 overcloud-cephstorage-0.localdomain os-collect-config[4840]: d 'refresh' from 1 events\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph/Ceph_config[global/osd_pool_default_size]/ensure: created\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Keys/Ceph::Key[client.admin]/Exec[ceph-key-client.admin]/returns: + ceph-authtool /etc/ceph/ceph.client.admin.keyring --name client.admin --add-key AQDzLMJVAAAAABAAYgFxSJn0uFTEqet5IACsLw== --cap mon 'allow *' --cap osd 'allow *' --cap mds 'allow *'\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Keys/Ceph::Key[client.admin]/Exec[ceph-key-client.admin]/returns: added entity client.admin auth auth(auid = 18446744073709551615 key=AQDzLMJVAAAAABAAYgFxSJn0uFTEqet5IACsLw== with 0 caps)\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Keys/Ceph::Key[client.admin]/Exec[ceph-key-client.admin]/returns: executed successfully\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph/Ceph_config[global/osd_pool_default_pg_num]/ensure: created\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph/Ceph_config[global/public_network]/ensure: created\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/srv/data]/Exec[ceph-osd-prepare-/srv/data]/returns: + test -b /srv/data\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/srv/data]/Exec[ceph-osd-prepare-/srv/data]/returns: + mkdir -p /srv/data\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/srv/data]/Exec[ceph-osd-prepare-/srv/data]/returns: + ceph-disk prepare /srv/data\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/srv/data]/Exec[ceph-osd-prepare-/srv/data]/returns: executed successfully\u001b[0m\n\u001b[mNotice: Finished catalog run in 303.12 seconds\u001b[0m\n", "deploy_stderr": "\u001b[1;31mError: Command exceeded timeout\nWrapped exception:\nexecution expired\u001b[0m\n\u001b[1;31mError: /Stage[main]/Ceph::Osds/Ceph::Osd[/srv/data]/Exec[ceph-osd-activate-/srv/data]/returns: change from notrun to 0 failed: Command exceeded timeout\u001b[0m\n", "deploy_status_code": 6}
Aug 05 12:05:09 overcloud-cephstorage-0.localdomain os-collect-config[4840]: [2015-08-05 12:05:09,453] (heat-config) [INFO] Error: Command exceeded timeout
Aug 05 12:05:09 overcloud-cephstorage-0.localdomain os-collect-config[4840]: Error: /Stage[main]/Ceph::Osds/Ceph::Osd[/srv/data]/Exec[ceph-osd-activate-/srv/data]/returns: change from notrun to 0 failed: Command exceeded timeout
Aug 05 12:05:09 overcloud-cephstorage-0.localdomain os-collect-config[4840]: [2015-08-05 12:05:09,453] (heat-config) [ERROR] Error running /var/lib/heat-config/heat-config-puppet/8295d161-635a-4edb-8d4a-ede3e76b073c.pp. [6]
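
The deploy_stdout/deploy_stderr payload in the journal output above is JSON-escaped, so ANSI colour codes appear as literal \u001b[...m sequences and line breaks as literal \n. A minimal sketch for making such output readable, assuming a POSIX shell with awk available; clean_deploy_log is a hypothetical helper name and not part of os-collect-config:

```shell
# Hypothetical helper: reads os-collect-config log text on stdin and
# prints it with JSON-escaped ANSI colour codes removed and escaped
# newlines expanded. The name is illustrative only.
clean_deploy_log() {
    awk '{
        gsub(/\\u001b\[[0-9;]*m/, "")  # drop literal \u001b[...m sequences
        gsub(/\\n/, "\n")              # turn literal \n into real line breaks
        print
    }'
}
```

It could be appended to the pipeline used above, for example: journalctl -u os-collect-config | grep -i error | clean_deploy_log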


Expected result:
The deployment shouldn't fail on ceph storage.

Comment 3 Alexander Chuzhoy 2015-08-05 17:02:30 UTC

Created attachment 1059571 [details]
heat-engine from the undercloud

Comment 4 Alexander Chuzhoy 2015-08-05 17:05:50 UTC

Created attachment 1059572 [details]
messages file from ceph and heat logs from the undercloud.

Comment 5 Mike Burns 2015-08-05 21:17:50 UTC

I think this is related to the CLI/template changes that jistr put in for bug 1247585.

Comment 6 Jiri Stransky 2015-08-06 08:20:58 UTC

@mburns yeah it could be.

@sasha what was the command line you used to deploy? Please try passing the environment file as described here:

https://bugzilla.redhat.com/show_bug.cgi?id=1247585#c6

Comment 7 Alexander Chuzhoy 2015-08-06 13:06:49 UTC

Here's the command I use (same as on the last puddle):
openstack overcloud deploy --plan overcloud --control-scale 1  --compute-scale 1  --ceph-storage-scale 1 --block-storage-scale 0 --swift-storage-scale 0 -e /home/stack/network-environment.yaml --ntp-server [IP] --timeout 90

No yaml file for cinder.

Comment 8 Mike Burns 2015-08-06 13:11:35 UTC

@sasha -- can you try passing -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml as well and see if that works?

Comment 9 Alexander Chuzhoy 2015-08-06 13:22:38 UTC

Environment: openstack-tripleo-heat-templates-0.8.6-45.el7ost.noarch

The file /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml doesn't exist.

Comment 10 Mike Burns 2015-08-06 21:51:07 UTC

(In reply to Alexander Chuzhoy from comment #9)
> Environment: openstack-tripleo-heat-templates-0.8.6-45.el7ost.noarch
> 
> The file
> /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.
> yaml doesn't exist.

Note:  this was resolved in a conversation.  The fix requires 0.8.6-46, not -45.

Comment 11 Alexander Chuzhoy 2015-08-07 16:15:08 UTC

Was able to deploy the overcloud using this command:
openstack overcloud deploy --templates --control-scale 1  --compute-scale 1  --ceph-storage-scale 1 --block-storage-scale 0 --swift-storage-scale 0 -e /home/stack/network-environment.yaml --ntp-server [IP] --timeout 90 -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml


Using this THT build:
openstack-tripleo-heat-templates-0.8.6-46.el7ost.noarch
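
Since the earlier attempt (comment 9) failed because the environment file was not shipped in the installed templates package, a pre-flight check along these lines may catch that before a deployment is started. This is only a sketch: check_env_files is a hypothetical helper, and the paths in the usage comment are the ones quoted in this report.

```shell
# Hypothetical helper: returns 0 only if every environment file passed
# as an argument exists; otherwise reports the missing ones on stderr.
check_env_files() {
    missing=0
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "missing environment file: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Possible usage before running the deploy command from this comment:
#   check_env_files /home/stack/network-environment.yaml \
#       /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml \
#       || echo "fix the environment files (or update the templates package) first"
```

If the storage environment file is reported missing, the installed openstack-tripleo-heat-templates build is likely older than the 0.8.6-46 build mentioned in comment 10.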

Comment 12 Mike Burns 2015-08-13 06:51:05 UTC

Based on comment 11, this is notabug

Comment 13 David Juran 2015-09-04 13:46:20 UTC

Re-opening this bug. As discussed on IRC, if a user chooses to install a Ceph node, we should provide a reasonable default, or at least point out that the template is needed. Failing the deployment with a non-obvious error message is not a good option.

Comment 14 Jiri Stransky 2015-09-15 12:53:12 UTC

We already had a smart default, but it wasn't overridable, causing a number of storage configurations to be impossible (see bug 1247585). We had to remove the smart default in favor of configurability. Re-adding that smart default should be possible once we have parameter overridability on CLI (bug 1245737).

Comment 15 Mike Orazi 2015-11-04 17:20:26 UTC

We are planning on providing this functionality via the param override functionality in https://bugzilla.redhat.com/show_bug.cgi?id=1245737 and we should track it there. If this solution is insufficient, please feel free to reopen this bug so we can track it distinctly.

*** This bug has been marked as a duplicate of bug 1245737 ***

Comment 16 Rama 2015-11-04 17:27:04 UTC

The following files in puppet/manifests were hardcoded with a larger Exec timeout for the Ceph installation to go through.
overcloud_cephstorage.pp

Exec {
  timeout => 9000,
}

if str2bool(hiera('ceph_osd_selinux_permissive', true)) {

overcloud_controller.pp

Exec {
  timeout => 9000,
}

overcloud_controller_pacemaker.pp

Exec {
  timeout => 9000,
}

if hiera('step') >= 1 {

The timeout has been increased to 9000 seconds (Puppet's default Exec timeout is 300 seconds).
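
To see which manifests carry such an override, something like the following could be used. It is a rough sketch, not a Puppet parser: show_exec_timeout is a hypothetical helper, and it assumes the multi-line "Exec { ... }" layout shown above (a one-line block would be reported as unset).

```shell
# Hypothetical helper: for each manifest given, print the timeout found
# in a top-level "Exec { ... }" resource default, or "unset" if none.
show_exec_timeout() {
    for manifest in "$@"; do
        value=$(awk '
            /^[ \t]*Exec[ \t]*[{]/ { in_block = 1; next }
            in_block && /timeout[ \t]*=>/ { gsub(/[^0-9]/, ""); print; exit }
            in_block && /[}]/ { in_block = 0 }
        ' "$manifest")
        echo "$manifest: ${value:-unset}"
    done
}
```

For example, running show_exec_timeout puppet/manifests/overcloud_*.pp from the templates directory would list one line per manifest; "unset" means Puppet's default Exec timeout applies.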