Created attachment 1057339 [details]
heat logs tar gz

Description of problem:
Trying to deploy HA + Ceph via the UI and getting CREATE_FAILED / CREATE_ABORTED. Moreover, heat stack-list and nova list are empty after the failure, although I could see VMs up and running and was able to access one of the controllers.

Version-Release number of selected component (if applicable):
openstack-tuskar-ui-extras-0.0.4-1.el7ost.noarch
rhos-release-0.65-1.noarch
python-rdomanager-oscplugin-0.0.8-43.el7ost.noarch

How reproducible:
100% (reproduced it 3 times on the same env)

Steps to Reproduce:
1. Using the undercloud UI, create 4 flavors, upload images, associate flavors to roles, set the node count to 3 for controller, 1 for compute, and 1 for ceph, then deploy the overcloud.
2. Wait for the deployment to finish successfully.

Actual results:
CREATE_FAILED, CREATE_ABORTED

2015-07-29 10:19:51.170 21101 INFO heat.engine.resource [-] CREATE: StructuredDeployments "CephStorageCephDeployment" Stack "overcloud" [ec394940-95e1-495c-a3e9-8fed68d3e9cf]
2015-07-29 10:19:51.170 21101 TRACE heat.engine.resource Traceback (most recent call last):
2015-07-29 10:19:51.170 21101 TRACE heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line 500, in _action_recorder
2015-07-29 10:19:51.170 21101 TRACE heat.engine.resource     yield
2015-07-29 10:19:51.170 21101 TRACE heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line 570, in _do_action
2015-07-29 10:19:51.170 21101 TRACE heat.engine.resource     yield self.action_handler_task(action, args=handler_args)
2015-07-29 10:19:51.170 21101 TRACE heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/scheduler.py", line 296, in wrapper
2015-07-29 10:19:51.170 21101 TRACE heat.engine.resource     step = next(subtask)
2015-07-29 10:19:51.170 21101 TRACE heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line 541, in action_handler_task
2015-07-29 10:19:51.170 21101 TRACE heat.engine.resource     handler_data = handler(*args)
2015-07-29 10:19:51.170 21101 TRACE heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resources/openstack/heat/resource_group.py", line 263, in handle_create
2015-07-29 10:19:51.170 21101 TRACE heat.engine.resource     {}, self.stack.timeout_mins)
2015-07-29 10:19:51.170 21101 TRACE heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resources/stack_resource.py", line 265, in create_with_template
2015-07-29 10:19:51.170 21101 TRACE heat.engine.resource     self.raise_local_exception(ex)
2015-07-29 10:19:51.170 21101 TRACE heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resources/stack_resource.py", line 286, in raise_local_exception
2015-07-29 10:19:51.170 21101 TRACE heat.engine.resource     raise local_ex(message=message)
2015-07-29 10:19:51.170 21101 TRACE heat.engine.resource StackValidationFailed: Property error : 0: server Expecting to find username or userId in passwordCredentials - the server could not comply with the request since it is either malformed or otherwise incorrect. The client is assumed to be in error. (HTTP 400)
2015-07-29 10:19:51.170 21101 TRACE heat.engine.resource

Expected results:
deployment succeeds

Additional info:
logs attached
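For context: the "Expecting to find username or userId in passwordCredentials" text is Keystone's standard HTTP 400 response to a v2.0 token request whose body is missing the credential fields, i.e. an auth request was sent on behalf of the deployment with an empty or malformed passwordCredentials block. A minimal well-formed v2.0 token request looks like the following sketch (the endpoint, tenant, and credentials are placeholders, not values from this environment):

```shell
# Hypothetical example of a well-formed Keystone v2.0 token request.
# The HTTP 400 in the trace means the "username" (or "userId") key was
# missing or empty in the passwordCredentials block that was sent.
curl -s -X POST http://192.0.2.1:5000/v2.0/tokens \
  -H 'Content-Type: application/json' \
  -d '{
        "auth": {
          "passwordCredentials": {
            "username": "heat_stack_user",
            "password": "secret"
          },
          "tenantName": "admin"
        }
      }'
```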
Created attachment 1057340 [details] other logs
Ceph. Doh! Ola, are you hitting this always, or only with deployments that include Ceph?
(In reply to Hugh Brock from comment #5)
> Ceph. Doh! Ola, are you hitting this always, or only with deployments that
> include Ceph?

Always with Ceph :) Running a non-Ceph deployment at the moment.
OK, so without Ceph it was successfully deployed.
OK, we're missing some verkakte Ceph parameter. Probably something in Tuskar. I'm gonna reassign this to jdob and see if he has an idea.
Does the deployment with Ceph work using Tuskar from the CLI? In other words, without using the --templates flag.
My other question is for Ana: Is there a way to output and save the templates/environment generated by the UI so I can compare it?
I'm able to locally deploy a Ceph node using the Tuskar CLI (haven't tried UI yet), so I'm pretty sure the issue isn't in the templates or in Tuskar, but rather in something odd the GUI is doing.
I'm currently attempting to reprovision and duplicate error on same environment.
I tried to replicate the error on opavlenk's environment but I am unable to at this time. I consistently hit a stack creation timeout, but that's a different error than what has been reported.
I was trying to reproduce the issue today on the same host. Not sure if related, but I created 3 more flavors in the UI and associated the roles to these 3 different flavors: the control flavor for the controller role, the compute flavor for the compute role, and the ceph flavor for the ceph role. All the rest were associated to the flavor 'Flavor-1cpu-x86_64-4096MB-40GB', which was created by Ryan earlier.

Got create_failed, but now with 400:

./heat-engine.log:2015-08-02 09:16:59.635 21372 INFO heat.engine.stack [-] Stack CREATE FAILED (overcloud): Resource CREATE failed: ResourceUnknownStatus: Resource failed - Unknown status FAILED due to "Resource CREATE failed: StackValidationFailed: Property error : 1: server Expecting to find username or userId in passwordCredentials - the server could not comply with the request since it is either malformed or otherwise incorrect. The client is assumed to be in error. (HTTP 400)"

[root@instack heat]# grep TRACE ./*.log
./heat-engine.log:2015-08-02 09:12:57.332 21372 TRACE heat.engine.resource Traceback (most recent call last):
./heat-engine.log:2015-08-02 09:12:57.332 21372 TRACE heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line 500, in _action_recorder
./heat-engine.log:2015-08-02 09:12:57.332 21372 TRACE heat.engine.resource     yield
./heat-engine.log:2015-08-02 09:12:57.332 21372 TRACE heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line 570, in _do_action
./heat-engine.log:2015-08-02 09:12:57.332 21372 TRACE heat.engine.resource     yield self.action_handler_task(action, args=handler_args)
./heat-engine.log:2015-08-02 09:12:57.332 21372 TRACE heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/scheduler.py", line 296, in wrapper
./heat-engine.log:2015-08-02 09:12:57.332 21372 TRACE heat.engine.resource     step = next(subtask)
./heat-engine.log:2015-08-02 09:12:57.332 21372 TRACE heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line 541, in action_handler_task
./heat-engine.log:2015-08-02 09:12:57.332 21372 TRACE heat.engine.resource     handler_data = handler(*args)
./heat-engine.log:2015-08-02 09:12:57.332 21372 TRACE heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resources/openstack/heat/resource_group.py", line 263, in handle_create
./heat-engine.log:2015-08-02 09:12:57.332 21372 TRACE heat.engine.resource     {}, self.stack.timeout_mins)
./heat-engine.log:2015-08-02 09:12:57.332 21372 TRACE heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resources/stack_resource.py", line 265, in create_with_template
./heat-engine.log:2015-08-02 09:12:57.332 21372 TRACE heat.engine.resource     self.raise_local_exception(ex)
./heat-engine.log:2015-08-02 09:12:57.332 21372 TRACE heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resources/stack_resource.py", line 286, in raise_local_exception
./heat-engine.log:2015-08-02 09:12:57.332 21372 TRACE heat.engine.resource     raise local_ex(message=message)
./heat-engine.log:2015-08-02 09:12:57.332 21372 TRACE heat.engine.resource StackValidationFailed: Property error : 1: server Expecting to find username or userId in passwordCredentials - the server could not comply with the request since it is either malformed or otherwise incorrect. The client is assumed to be in error. (HTTP 400)
./heat-engine.log:2015-08-02 09:12:57.332 21372 TRACE heat.engine.resource
./heat-engine.log:2015-08-02 09:12:59.021 21372 TRACE heat.engine.resource Traceback (most recent call last):
./heat-engine.log:2015-08-02 09:12:59.021 21372 TRACE heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line 500, in _action_recorder
./heat-engine.log:2015-08-02 09:12:59.021 21372 TRACE heat.engine.resource     yield
./heat-engine.log:2015-08-02 09:12:59.021 21372 TRACE heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line 570, in _do_action
./heat-engine.log:2015-08-02 09:12:59.021 21372 TRACE heat.engine.resource     yield self.action_handler_task(action, args=handler_args)
./heat-engine.log:2015-08-02 09:12:59.021 21372 TRACE heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/scheduler.py", line 313, in wrapper
./heat-engine.log:2015-08-02 09:12:59.021 21372 TRACE heat.engine.resource     step = next(subtask)
./heat-engine.log:2015-08-02 09:12:59.021 21372 TRACE heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line 544, in action_handler_task
./heat-engine.log:2015-08-02 09:12:59.021 21372 TRACE heat.engine.resource     while not check(handler_data):
./heat-engine.log:2015-08-02 09:12:59.021 21372 TRACE heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resources/stack_resource.py", line 289, in check_create_complete
./heat-engine.log:2015-08-02 09:12:59.021 21372 TRACE heat.engine.resource     return self._check_status_complete(resource.Resource.CREATE)
./heat-engine.log:2015-08-02 09:12:59.021 21372 TRACE heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resources/stack_resource.py", line 332, in _check_status_complete
./heat-engine.log:2015-08-02 09:12:59.021 21372 TRACE heat.engine.resource     status_reason=nested.status_reason)
./heat-engine.log:2015-08-02 09:12:59.021 21372 TRACE heat.engine.resource ResourceUnknownStatus: Resource failed - Unknown status FAILED due to "Resource CREATE failed: StackValidationFailed: Property error : 1: server Expecting to find username or userId in passwordCredentials - the server could not comply with the request since it is either malformed or otherwise incorrect. The client is assumed to be in error. (HTTP 400)"
./heat-engine.log:2015-08-02 09:12:59.021 21372 TRACE heat.engine.resource
Have just now reproduced the issue on a different env, with SSL in the undercloud.

Scenario:
1. Deploy via UI with 1 controller, 1 compute, 1 ceph
2. Undeploy
3. Change the snmp password
4. Redeploy

Results:
1. Successfully deployed
2. Successfully undeployed
3. Password was changed to 'password'
4. Redeploy failed with:

2015-08-04 13:13:04.550 30210 INFO heat.engine.stack [-] Stack CREATE FAILED (overcloud-ControllerNodesPostDeployment-6vuxxz4odcge): Resource CREATE failed: StackValidationFailed: Property error : 0: server Expecting to find username or userId in passwordCredentials - the server could not comply with the request since it is either malformed or otherwise incorrect. The client is assumed to be in error. (HTTP 400)

Here is the trace I got in heat-engine.log:

2015-08-04 13:13:04.447 30210 INFO heat.engine.resource [-] CREATE: TemplateResource "ControllerNodesPostDeployment" [3da6a294-3f97-473f-9e21-25e684052384] Stack "overcloud" [f09a6f33-47a1-48a7-9a6e-79afb55b0f27]
2015-08-04 13:13:04.447 30210 TRACE heat.engine.resource Traceback (most recent call last):
2015-08-04 13:13:04.447 30210 TRACE heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line 500, in _action_recorder
2015-08-04 13:13:04.447 30210 TRACE heat.engine.resource     yield
2015-08-04 13:13:04.447 30210 TRACE heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line 570, in _do_action
2015-08-04 13:13:04.447 30210 TRACE heat.engine.resource     yield self.action_handler_task(action, args=handler_args)
2015-08-04 13:13:04.447 30210 TRACE heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/scheduler.py", line 313, in wrapper
2015-08-04 13:13:04.447 30210 TRACE heat.engine.resource     step = next(subtask)
2015-08-04 13:13:04.447 30210 TRACE heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line 544, in action_handler_task
2015-08-04 13:13:04.447 30210 TRACE heat.engine.resource     while not check(handler_data):
2015-08-04 13:13:04.447 30210 TRACE heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resources/stack_resource.py", line 289, in check_create_complete
2015-08-04 13:13:04.447 30210 TRACE heat.engine.resource     return self._check_status_complete(resource.Resource.CREATE)
2015-08-04 13:13:04.447 30210 TRACE heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resources/stack_resource.py", line 332, in _check_status_complete
2015-08-04 13:13:04.447 30210 TRACE heat.engine.resource     status_reason=nested.status_reason)
2015-08-04 13:13:04.447 30210 TRACE heat.engine.resource ResourceUnknownStatus: Resource failed - Unknown status FAILED due to "Resource CREATE failed: StackValidationFailed: Property error : 0: server Expecting to find username or userId in passwordCredentials - the server could not comply with the request since it is either malformed or otherwise incorrect. The client is assumed to be in error. (HTTP 400)"
2015-08-04 13:13:04.447 30210 TRACE heat.engine.resource
2015-08-04 13:13:04.550 30210 INFO heat.engine.stack [-] Stack CREATE FAILED (overcloud-ControllerNodesPostDeployment-6vuxxz4odcge): Resource CREATE failed: StackValidationFailed: Property error : 0: server Expecting to find username or userId in passwordCredentials - the server could not comply with the request since it is either malformed or otherwise incorrect. The client is assumed to be in error. (HTTP 400)

heat resource-list overcloud -n 5 | grep -iv complete
+---------------------------------------------+-----------------------------------------------+---------------------------------------------------+--------------------+----------------------+----------------------------------------+
| resource_name                               | physical_resource_id                          | resource_type                                      | resource_status    | updated_time         | parent_resource                        |
+---------------------------------------------+-----------------------------------------------+---------------------------------------------------+--------------------+----------------------+----------------------------------------+
| ComputeNodesPostDeployment                  | 8a6b092b-956b-4429-b2bc-8c85b7e1a14d          | OS::TripleO::ComputePostDeployment                 | CREATE_FAILED      | 2015-08-04T16:41:08Z |                                        |
| ControllerNodesPostDeployment               | 3da6a294-3f97-473f-9e21-25e684052384          | OS::TripleO::ControllerPostDeployment              | CREATE_FAILED      | 2015-08-04T16:41:08Z |                                        |
| ControllerOvercloudServicesDeployment_Step4 |                                               | OS::Heat::StructuredDeployments                    | CREATE_FAILED      | 2015-08-04T17:04:35Z | ControllerNodesPostDeployment          |
| ComputePuppetDeployment                     | e39f0443-ab86-4ad7-9fb2-730d20778bbb          | OS::Heat::StructuredDeployments                    | CREATE_IN_PROGRESS | 2015-08-04T17:05:30Z | ComputeNodesPostDeployment             |
| 0                                           | e33c0344-1706-43d5-9761-f1ff1b59593f          | OS::Heat::StructuredDeployment                     | CREATE_IN_PROGRESS | 2015-08-04T17:06:01Z | ComputePuppetDeployment                |
+---------------------------------------------+-----------------------------------------------+---------------------------------------------------+--------------------+----------------------+----------------------------------------+

env: lynx17.qa.lab.tlv.redhat.com | qum10net
It's virt, so the instack VM is: 192.168.122.88
3 different flavors were created via UI with identical params but different names.
I had a go at reproducing this today on request from jdob. I am going on the reproducer in comment 15 (using one flavor, the default 'baremetal'). I confirm that I could deploy the 1/1/1 compute/control/ceph with the CLI/Tuskar, undeploy, change the snmp password, and deploy again OK (even with the current ceph deploy considerations as described at https://bugzilla.redhat.com/show_bug.cgi?id=1251533 and https://bugzilla.redhat.com/show_bug.cgi?id=1247585#c7).

I tried to recreate the UI repro but was unable to, as I hit https://bugzilla.redhat.com/show_bug.cgi?id=1234745 (deploy once OK, undeploy, change snmp password in UI, deploy again ---> BZ 1234745). Testing details and sequencing below if interested. I think it would be good if someone from the UI team has a look here too, thanks.

TESTING:

1. Confirm deploy/undeploy/snmp_password/deploy OK with Tuskar/CLI:

# Set the required storage-environment:
openstack management plan set 005e0705-02a3-4628-abc4-817d02795157 -P Controller-1::CinderEnableIscsiBackend=false -P Controller-1::CinderEnableRbdBackend=true -P Controller-1::GlanceBackend=rbd -P Compute-1::NovaEnableRbdBackend=true

# Deploy:
openstack overcloud deploy --plan overcloud --control-scale 1 --compute-scale 1 --ceph-storage-scale 1

# Undeploy:
heat stack-delete overcloud

# Set SNMP password:
openstack management plan set 005e0705-02a3-4628-abc4-817d02795157 -P Compute-1::SnmpdReadonlyUserPassword="foo" -P Controller-1::SnmpdReadonlyUserPassword="foo" -P Swift-Storage-1::SnmpdReadonlyUserPassword="foo" -P Cinder-Storage-1::SnmpdReadonlyUserPassword="foo"

# Deploy again:
openstack overcloud deploy --plan overcloud --control-scale 1 --compute-scale 1 --ceph-storage-scale 1

OK.

2. Try deploy with UI: could deploy once fine (1 ceph/compute/control). Undeploy also fine. Set the snmp password to "foo" and try to deploy again. I get:

"Error: Unable to deploy overcloud. Reason: Value must be valid JSON: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)"

which looks like https://bugzilla.redhat.com/show_bug.cgi?id=1234745 - if someone else can confirm, that is a blocker bug for this.
Ryan, could you give it another go at reproducing this, please?
Unable to replicate the bug. Attempting to find a root cause by debugging the failed deployment.
deployment fails due to ceph issues. [stack@instack ~]$ heat deployment-show 892ac955-d585-4b3b-9ffc-6f99c40daf27 { "status": "FAILED", "server_id": "1f65750d-2de4-4582-9bae-70aaece85f96", "config_id": "9ba58305-1206-46f4-b518-1f0a843d2481", "output_values": { "deploy_stdout": "\u001b[mNotice: Compiled catalog for overcloud-cephstorage-0.localdomain in environment production in 1.28 seconds\u001b[0m\n\u001b[mNotice: /Stage[main]/Main/Exec[set selinux to permissive]/returns: executed successfully\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph/Ceph_config[global/osd_pool_default_pgp_num]/ensure: created\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Keys/Ceph::Key[client.openstack]/File[/etc/ceph/ceph.client.openstack.keyring]/ensure: created\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph/Ceph_config[global/osd_pool_default_min_size]/ensure: created\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph/Ceph_config[global/auth_service_required]/ensure: created\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph/Ceph_config[global/fsid]/ensure: created\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph/Ceph_config[global/cluster_network]/ensure: created\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph/Ceph_config[global/auth_supported]/ensure: created\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph/Ceph_config[global/auth_cluster_required]/ensure: created\u001b[0m\n\u001b[mNotice: /Stage[main]/Main/Exec[set selinux to permissive on boot]/returns: executed successfully\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Keys/Ceph::Key[client.admin]/File[/etc/ceph/ceph.client.admin.keyring]/ensure: created\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph/Ceph_config[global/mon_host]/ensure: created\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Keys/Ceph::Key[client.openstack]/Exec[ceph-key-client.openstack]/returns: + ceph-authtool /etc/ceph/ceph.client.openstack.keyring --name client.openstack --add-key '' --cap mon 'allow r' --cap osd 'allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rwx pool=vms, 
allow rwx pool=images'\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Keys/Ceph::Key[client.openstack]/Exec[ceph-key-client.openstack]/returns: executed successfully\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Keys/Ceph::Key[client.bootstrap-osd]/File[/var/lib/ceph/bootstrap-osd/ceph.keyring]/ensure: created\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Keys/Ceph::Key[client.bootstrap-osd]/Exec[ceph-key-client.bootstrap-osd]/returns: + ceph-authtool /var/lib/ceph/bootstrap-osd/ceph.keyring --name client.bootstrap-osd --add-key '' --cap mon 'allow profile bootstrap-osd'\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Keys/Ceph::Key[client.bootstrap-osd]/Exec[ceph-key-client.bootstrap-osd]/returns: executed successfully\u001b[0m\n\u001b[mNotice: /Stage[main]/Main/Sysctl::Value[net.ipv4.tcp_keepalive_intvl]/Sysctl[net.ipv4.tcp_keepalive_intvl]/ensure: created\u001b[0m\n\u001b[mNotice: /Stage[main]/Main/Sysctl::Value[net.ipv4.tcp_keepalive_intvl]/Exec[exec_sysctl_net.ipv4.tcp_keepalive_intvl]: Triggered 'refresh' from 1 events\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph/Ceph_config[global/auth_client_required]/ensure: created\u001b[0m\n\u001b[mNotice: /Stage[main]/Main/Sysctl::Value[net.ipv4.tcp_keepalive_time]/Sysctl[net.ipv4.tcp_keepalive_time]/ensure: created\u001b[0m\n\u001b[mNotice: /Stage[main]/Main/Sysctl::Value[net.ipv4.tcp_keepalive_time]/Exec[exec_sysctl_net.ipv4.tcp_keepalive_time]: Triggered 'refresh' from 1 events\u001b[0m\n\u001b[mNotice: /Stage[main]/Main/Sysctl::Value[net.ipv4.tcp_keepalive_probes]/Sysctl[net.ipv4.tcp_keepalive_probes]/ensure: created\u001b[0m\n\u001b[mNotice: /Stage[main]/Main/Sysctl::Value[net.ipv4.tcp_keepalive_probes]/Exec[exec_sysctl_net.ipv4.tcp_keepalive_probes]: Triggered 'refresh' from 1 events\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph/Ceph_config[global/osd_pool_default_size]/ensure: created\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Keys/Ceph::Key[client.admin]/Exec[ceph-key-client.admin]/returns: + ceph-authtool 
/etc/ceph/ceph.client.admin.keyring --name client.admin --add-key '' --cap mon 'allow *' --cap osd 'allow *' --cap mds 'allow *'\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Keys/Ceph::Key[client.admin]/Exec[ceph-key-client.admin]/returns: executed successfully\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph/Ceph_config[global/osd_pool_default_pg_num]/ensure: created\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph/Ceph_config[global/public_network]/ensure: created\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/srv/data]/Exec[ceph-osd-prepare-/srv/data]/returns: + test -b /srv/data\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/srv/data]/Exec[ceph-osd-prepare-/srv/data]/returns: + mkdir -p /srv/data\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/srv/data]/Exec[ceph-osd-prepare-/srv/data]/returns: + ceph-disk prepare /srv/data\u001b[0m\n\u001b[mNotice: /Stage[main]/Ceph::Osds/Ceph::Osd[/srv/data]/Exec[ceph-osd-prepare-/srv/data]/returns: executed successfully\u001b[0m\n\u001b[mNotice: Finished catalog run in 302.14 seconds\u001b[0m\n", "deploy_stderr": "\u001b[1;31mError: Command exceeded timeout\nWrapped exception:\nexecution expired\u001b[0m\n\u001b[1;31mError: /Stage[main]/Ceph::Osds/Ceph::Osd[/srv/data]/Exec[ceph-osd-activate-/srv/data]/returns: change from notrun to 0 failed: Command exceeded timeout\u001b[0m\n", "deploy_status_code": 6 }, "creation_time": "2015-08-20T14:54:07Z", "updated_time": "2015-08-20T14:59:36Z", "input_values": {}, "action": "CREATE", "status_reason": "deploy_status_code : Deployment exited with non-zero status code: 6", "id": "892ac955-d585-4b3b-9ffc-6f99c40daf27" }
Dan, can you help us identify the issue here?
Could we verify that this is broken when using the CLI with --templates as well?
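A --templates invocation that would exercise the same Ceph path without Tuskar might look like the following sketch (the environment file path and scale counts are assumptions based on the 1/1/1 repro in this bug, not a command anyone here has confirmed running):

```shell
# Hypothetical sketch: deploy from the stock tripleo-heat-templates
# (bypassing Tuskar) with the Ceph/RBD storage environment enabled.
# Scale counts mirror the 1 control / 1 compute / 1 ceph reproducer.
openstack overcloud deploy --templates \
  -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml \
  --control-scale 1 --compute-scale 1 --ceph-storage-scale 1
```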
Cleaned up the stdout and stderr in note 21 so they make *any sense whatever* to a human:

Notice: Compiled catalog for overcloud-cephstorage-0.localdomain in environment production in 1.28 seconds
Notice: /Stage[main]/Main/Exec[set selinux to permissive]/returns: executed successfully
Notice: /Stage[main]/Ceph/Ceph_config[global/osd_pool_default_pgp_num]/ensure: created
Notice: /Stage[main]/Ceph::Keys/Ceph::Key[client.openstack]/File[/etc/ceph/ceph.client.openstack.keyring]/ensure: created
Notice: /Stage[main]/Ceph/Ceph_config[global/osd_pool_default_min_size]/ensure: created
Notice: /Stage[main]/Ceph/Ceph_config[global/auth_service_required]/ensure: created
Notice: /Stage[main]/Ceph/Ceph_config[global/fsid]/ensure: created
Notice: /Stage[main]/Ceph/Ceph_config[global/cluster_network]/ensure: created
Notice: /Stage[main]/Ceph/Ceph_config[global/auth_supported]/ensure: created
Notice: /Stage[main]/Ceph/Ceph_config[global/auth_cluster_required]/ensure: created
Notice: /Stage[main]/Main/Exec[set selinux to permissive on boot]/returns: executed successfully
Notice: /Stage[main]/Ceph::Keys/Ceph::Key[client.admin]/File[/etc/ceph/ceph.client.admin.keyring]/ensure: created
Notice: /Stage[main]/Ceph/Ceph_config[global/mon_host]/ensure: created
Notice: /Stage[main]/Ceph::Keys/Ceph::Key[client.openstack]/Exec[ceph-key-client.openstack]/returns: + ceph-authtool /etc/ceph/ceph.client.openstack.keyring --name client.openstack --add-key '' --cap mon 'allow r' --cap osd 'allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rwx pool=vms, allow rwx pool=images'
Notice: /Stage[main]/Ceph::Keys/Ceph::Key[client.openstack]/Exec[ceph-key-client.openstack]/returns: executed successfully
Notice: /Stage[main]/Ceph::Keys/Ceph::Key[client.bootstrap-osd]/File[/var/lib/ceph/bootstrap-osd/ceph.keyring]/ensure: created
Notice: /Stage[main]/Ceph::Keys/Ceph::Key[client.bootstrap-osd]/Exec[ceph-key-client.bootstrap-osd]/returns: + ceph-authtool /var/lib/ceph/bootstrap-osd/ceph.keyring --name client.bootstrap-osd --add-key '' --cap mon 'allow profile bootstrap-osd'
Notice: /Stage[main]/Ceph::Keys/Ceph::Key[client.bootstrap-osd]/Exec[ceph-key-client.bootstrap-osd]/returns: executed successfully
Notice: /Stage[main]/Main/Sysctl::Value[net.ipv4.tcp_keepalive_intvl]/Sysctl[net.ipv4.tcp_keepalive_intvl]/ensure: created
Notice: /Stage[main]/Main/Sysctl::Value[net.ipv4.tcp_keepalive_intvl]/Exec[exec_sysctl_net.ipv4.tcp_keepalive_intvl]: Triggered 'refresh' from 1 events
Notice: /Stage[main]/Ceph/Ceph_config[global/auth_client_required]/ensure: created
Notice: /Stage[main]/Main/Sysctl::Value[net.ipv4.tcp_keepalive_time]/Sysctl[net.ipv4.tcp_keepalive_time]/ensure: created
Notice: /Stage[main]/Main/Sysctl::Value[net.ipv4.tcp_keepalive_time]/Exec[exec_sysctl_net.ipv4.tcp_keepalive_time]: Triggered 'refresh' from 1 events
Notice: /Stage[main]/Main/Sysctl::Value[net.ipv4.tcp_keepalive_probes]/Sysctl[net.ipv4.tcp_keepalive_probes]/ensure: created
Notice: /Stage[main]/Main/Sysctl::Value[net.ipv4.tcp_keepalive_probes]/Exec[exec_sysctl_net.ipv4.tcp_keepalive_probes]: Triggered 'refresh' from 1 events
Notice: /Stage[main]/Ceph/Ceph_config[global/osd_pool_default_size]/ensure: created
Notice: /Stage[main]/Ceph::Keys/Ceph::Key[client.admin]/Exec[ceph-key-client.admin]/returns: + ceph-authtool /etc/ceph/ceph.client.admin.keyring --name client.admin --add-key '' --cap mon 'allow *' --cap osd 'allow *' --cap mds 'allow *'
Notice: /Stage[main]/Ceph::Keys/Ceph::Key[client.admin]/Exec[ceph-key-client.admin]/returns: executed successfully
Notice: /Stage[main]/Ceph/Ceph_config[global/osd_pool_default_pg_num]/ensure: created
Notice: /Stage[main]/Ceph/Ceph_config[global/public_network]/ensure: created
Notice: /Stage[main]/Ceph::Osds/Ceph::Osd[/srv/data]/Exec[ceph-osd-prepare-/srv/data]/returns: + test -b /srv/data
Notice: /Stage[main]/Ceph::Osds/Ceph::Osd[/srv/data]/Exec[ceph-osd-prepare-/srv/data]/returns: + mkdir -p /srv/data
Notice: /Stage[main]/Ceph::Osds/Ceph::Osd[/srv/data]/Exec[ceph-osd-prepare-/srv/data]/returns: + ceph-disk prepare /srv/data
Notice: /Stage[main]/Ceph::Osds/Ceph::Osd[/srv/data]/Exec[ceph-osd-prepare-/srv/data]/returns: executed successfully
Notice: Finished catalog run in 302.14 seconds

stderr:
Error: Command exceeded timeout
Wrapped exception:
execution expired
Error: /Stage[main]/Ceph::Osds/Ceph::Osd[/srv/data]/Exec[ceph-osd-activate-/srv/data]/returns: change from notrun to 0 failed: Command exceeded timeout
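So the prepare step succeeded and the failing step is the OSD activation exceeding puppet's exec timeout. One way to investigate on the ceph node might be the following sketch (the /srv/data path comes from the logs above; these are generic ceph-disk/ceph commands, not steps anyone has run in this bug):

```shell
# Hypothetical debugging sketch on overcloud-cephstorage-0:
# re-run the activation that timed out under puppet, then check
# whether the monitors are reachable and the OSD registered.
ceph-disk activate /srv/data   # the step that exceeded the timeout
ceph -s                        # cluster status: is there mon quorum?
ceph osd tree                  # is the OSD registered and up/in?
```

If `ceph-disk activate` hangs here too, the monitors are the more likely culprit than the OSD itself.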
Attempting to deploy via CLI using Tuskar results in a stack timeout.
Succeeded in reproducing with 1 compute and 3 controllers via UI. Got create_aborted in heat-api.log.

Traceback in heat-api.log:

2015-08-24 10:48:19.656 30279 INFO eventlet.wsgi.server [req-5b138c43-fd14-45b9-b49b-0d6cfcd9d206 - service] Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/eventlet/wsgi.py", line 477, in handle_one_response
    write(b''.join(towrite))
  File "/usr/lib/python2.7/site-packages/eventlet/wsgi.py", line 426, in write
    _writelines(towrite)
  File "/usr/lib64/python2.7/socket.py", line 334, in writelines
    self.flush()
  File "/usr/lib64/python2.7/socket.py", line 303, in flush
    self._sock.sendall(view[write_offset:write_offset+buffer_size])
  File "/usr/lib/python2.7/site-packages/eventlet/greenio/base.py", line 376, in sendall
    tail = self.send(data, flags)
  File "/usr/lib/python2.7/site-packages/eventlet/greenio/base.py", line 359, in send
    total_sent += fd.send(data[total_sent:], flags)
error: [Errno 32] Broken pipe
2015-08-24 10:48:19.656 30279 INFO eventlet.wsgi.server [req-5b138c43-fd14-45b9-b49b-0d6cfcd9d206 - service] 192.0.2.1 - - [24/Aug/2015 10:48:19] "GET /v1/1a05e9ccca154e56bb9a2c594126ba1b/stacks/overcloud/dfca09d2-9e9d-47bb-ad2e-8c9ae70113c5/resources?nested_depth=5 HTTP/1.1" 200 0 18.888437

Trace in heat-engine.log:

2015-08-24 09:35:38.774 30244 INFO heat.engine.resource [-] CREATE: TemplateResource "ControllerNodesPostDeployment" [38550c75-7cba-48e1-a9ef-6f90bb06a45d] Stack "overcloud" [dfca09d2-9e9d-47bb-ad2e-8c9ae70113c5]
2015-08-24 09:35:38.774 30244 TRACE heat.engine.resource Traceback (most recent call last):
2015-08-24 09:35:38.774 30244 TRACE heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line 500, in _action_recorder
2015-08-24 09:35:38.774 30244 TRACE heat.engine.resource     yield
2015-08-24 09:35:38.774 30244 TRACE heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line 570, in _do_action
2015-08-24 09:35:38.774 30244 TRACE heat.engine.resource     yield self.action_handler_task(action, args=handler_args)
2015-08-24 09:35:38.774 30244 TRACE heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/scheduler.py", line 313, in wrapper
2015-08-24 09:35:38.774 30244 TRACE heat.engine.resource     step = next(subtask)
2015-08-24 09:35:38.774 30244 TRACE heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resource.py", line 544, in action_handler_task
2015-08-24 09:35:38.774 30244 TRACE heat.engine.resource     while not check(handler_data):
2015-08-24 09:35:38.774 30244 TRACE heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resources/stack_resource.py", line 281, in check_create_complete
2015-08-24 09:35:38.774 30244 TRACE heat.engine.resource     return self._check_status_complete(resource.Resource.CREATE)
2015-08-24 09:35:38.774 30244 TRACE heat.engine.resource   File "/usr/lib/python2.7/site-packages/heat/engine/resources/stack_resource.py", line 324, in _check_status_complete
2015-08-24 09:35:38.774 30244 TRACE heat.engine.resource     status_reason=nested.status_reason)
2015-08-24 09:35:38.774 30244 TRACE heat.engine.resource ResourceUnknownStatus: Resource failed - Unknown status FAILED due to "Resource CREATE failed: unicode: Property error : 0: server Expecting to find username or userId in passwordCredentials - the server could not comply with the request since it is either malformed or otherwise incorrect. The client is assumed to be in error. (HTTP 400)"
2015-08-24 09:35:38.774 30244 TRACE heat.engine.resource
Successfully deployed via UI today:
A) 1 control, 1 compute
B) 3 control, 1 compute

Still attempting to find the exact case where I can reproduce this bug.
Unable to reproduce this bug.
As Ryan wrote in comments 27 and 28, we are getting successful deployments (including HA deployments) using the configurations listed in comment 27. Ceph-based deployments are currently broken on Tuskar, and there is a separate bug to address that: https://bugzilla.redhat.com/show_bug.cgi?id=1253628.