Bug 1457495 - Unable to deploy 60 compute nodes
Summary: Unable to deploy 60 compute nodes
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: puppet-openstacklib
Version: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: z9
Target Release: 10.0 (Newton)
Assignee: Sergii Golovatiuk
QA Contact: nlevinki
 
Reported: 2017-05-31 19:49 UTC by bigswitch
Modified: 2021-02-18 12:55 UTC (History)
17 users

Fixed In Version: puppet-openstacklib-9.5.0-3.el7ost
Last Closed: 2018-09-17 16:54:10 UTC
sgolovat: needinfo-


Attachments
os-collect-config and os-apply-config log (1.13 MB, text/plain)
2017-06-02 10:40 UTC, bigswitch
messages from compute-0 and compute-3 (853.38 KB, application/zip)
2017-06-06 17:47 UTC, bigswitch
added new sos report from the compute node (10.39 MB, application/x-xz)
2017-09-16 00:02 UTC, bigswitch


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2018:2670 0 None None None 2018-09-17 16:55:29 UTC

Internal Links: 1640134

Description bigswitch 2017-05-31 19:49:15 UTC
Description of problem:
Unable to deploy 60 compute nodes

Version-Release number of selected component (if applicable):
RHOSP 10

How reproducible:
100%

Steps to Reproduce:
1. Deploy RHOSP 10 with 10 compute nodes at a time; when attempting to reach 60 compute nodes, the deployment times out and fails.
The initial deployment is 3 controller and 3 Ceph storage nodes; 10 compute nodes are then added in each subsequent deployment.
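The batch-scaling workflow above can be sketched as follows. This is illustrative only: it assumes the Newton-era `openstack overcloud deploy` flags `--compute-scale` and `--timeout` (in minutes), and omits the site-specific `-e` environment files, so the function just prints the commands it would run.

```shell
# Illustrative sketch of scaling computes in batches of 10 (assumption:
# Newton-era tripleoclient flags; site-specific environment files omitted).
# The function only prints the commands rather than running them.
print_scale_commands() {
    for count in 10 20 30 40 50 60; do
        echo "openstack overcloud deploy --templates --compute-scale $count --timeout 240"
    done
}
print_scale_commands
```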


Actual results:
Deployment failed

Expected results:
Should complete

Additional info:
From undercloud log:
2017-05-31 13:27:47Z [overcloud-ComputeAllNodesDeployment-ftrxasu4uksd.25]: SIGNAL_COMPLETE  Unknown
2017-05-31 13:27:47Z [overcloud-ComputeAllNodesDeployment-ftrxasu4uksd.31]: SIGNAL_COMPLETE  Unknown
2017-05-31 13:27:47Z [overcloud-ComputeAllNodesDeployment-ftrxasu4uksd.7]: SIGNAL_COMPLETE  Unknown
2017-05-31 13:27:47Z [overcloud-AllNodesDeploySteps-rlsgvaqzmbxg-ComputeDeployment_Step2-rg5l333yhpbc.12]: SIGNAL_IN_PROGRESS  Signal: deployment a253ce71-d446-43dd-92a5-5183a63e8c39 succeeded
2017-05-31 13:28:40Z [overcloud-AllNodesDeploySteps-rlsgvaqzmbxg-ComputeDeployment_Step2-rg5l333yhpbc.24]: SIGNAL_IN_PROGRESS  Signal: deployment 010945f7-a72c-471d-b308-77ade333cf14 succeeded
2017-05-31 13:28:40Z [overcloud-AllNodesDeploySteps-rlsgvaqzmbxg-ComputeDeployment_Step2-rg5l333yhpbc.1]: UPDATE_COMPLETE  state changed
2017-05-31 13:28:40Z [overcloud-AllNodesDeploySteps-rlsgvaqzmbxg-ComputeDeployment_Step2-rg5l333yhpbc.10]: SIGNAL_IN_PROGRESS  Signal: deployment b9aefe8f-30f6-452e-b543-0bcd17021abe succeeded
2017-05-31 13:28:47Z [overcloud-AllNodesDeploySteps-rlsgvaqzmbxg-ComputeDeployment_Step2-rg5l333yhpbc.57]: SIGNAL_IN_PROGRESS  Signal: deployment 3261f92f-e46b-411d-89c7-280a6b0c69ac succeeded
2017-05-31 13:28:47Z [overcloud-AllNodesDeploySteps-rlsgvaqzmbxg-ComputeDeployment_Step2-rg5l333yhpbc.17]: SIGNAL_IN_PROGRESS  Signal: deployment 87faa37c-980f-42cf-bfb9-ea6104652c18 succeeded
2017-05-31 13:28:48Z [overcloud-AllNodesDeploySteps-rlsgvaqzmbxg-ComputeDeployment_Step2-rg5l333yhpbc.38]: SIGNAL_IN_PROGRESS  Signal: deployment db1543f2-bafc-4c7c-a800-d79f112ab2be succeeded
2017-05-31 13:29:03Z [overcloud-AllNodesDeploySteps-rlsgvaqzmbxg-ComputeDeployment_Step2-rg5l333yhpbc.20]: SIGNAL_IN_PROGRESS  Signal: deployment 8b1f1ff7-28e3-4585-884b-ae5da1edd3f3 succeeded
2017-05-31 13:29:03Z [overcloud-AllNodesDeploySteps-rlsgvaqzmbxg-ComputeDeployment_Step2-rg5l333yhpbc.38]: UPDATE_COMPLETE  state changed
2017-05-31 13:29:03Z [overcloud-AllNodesDeploySteps-rlsgvaqzmbxg-ComputeDeployment_Step2-rg5l333yhpbc.57]: CREATE_COMPLETE  state changed
2017-05-31 13:29:31Z [overcloud-AllNodesDeploySteps-rlsgvaqzmbxg-ComputeDeployment_Step2-rg5l333yhpbc.6]: SIGNAL_IN_PROGRESS  Signal: deployment e2195d00-c438-4919-bb3a-ffa775403eac succeeded
2017-05-31 13:29:31Z [overcloud-AllNodesDeploySteps-rlsgvaqzmbxg-ComputeDeployment_Step2-rg5l333yhpbc.39]: SIGNAL_IN_PROGRESS  Signal: deployment 49e2e469-a059-4151-81f4-d8beeb78b58c succeeded
2017-05-31 13:29:51Z [overcloud-AllNodesDeploySteps-rlsgvaqzmbxg-ComputeDeployment_Step2-rg5l333yhpbc.47]: SIGNAL_IN_PROGRESS  Signal: deployment 391bb613-9576-44df-8c1c-32a942ce6eb7 succeeded
2017-05-31 13:29:51Z [overcloud-AllNodesDeploySteps-rlsgvaqzmbxg-ComputeDeployment_Step2-rg5l333yhpbc.35]: SIGNAL_IN_PROGRESS  Signal: deployment ee886b23-a1f1-4768-a941-1a7e0e3a37e4 succeeded
2017-05-31 13:29:51Z [overcloud-AllNodesDeploySteps-rlsgvaqzmbxg-ComputeDeployment_Step2-rg5l333yhpbc.34]: SIGNAL_IN_PROGRESS  Signal: deployment 80b4c880-7c80-45c6-ad32-e221689f0768 succeeded
2017-05-31 13:29:51Z [overcloud-AllNodesDeploySteps-rlsgvaqzmbxg-ComputeDeployment_Step2-rg5l333yhpbc.45]: SIGNAL_IN_PROGRESS  Signal: deployment dcdb5402-b8cf-4ab6-8b87-ec2d339f8f39 succeeded
2017-05-31 13:29:51Z [overcloud-AllNodesDeploySteps-rlsgvaqzmbxg-ComputeDeployment_Step2-rg5l333yhpbc.26]: SIGNAL_IN_PROGRESS  Signal: deployment ae4354c0-bbc3-458a-ab2b-281b9641c1b9 succeeded
2017-05-31 13:30:44Z [overcloud-ComputeAllNodesDeployment-ftrxasu4uksd.43]: SIGNAL_COMPLETE  Unknown
2017-05-31 13:30:44Z [overcloud-AllNodesDeploySteps-rlsgvaqzmbxg-ComputeDeployment_Step2-rg5l333yhpbc.27]: SIGNAL_IN_PROGRESS  Signal: deployment b85b09f3-eef7-4e9f-a7b6-2da86c69d76c succeeded
2017-05-31 13:30:44Z [overcloud-ComputeAllNodesDeployment-ftrxasu4uksd.21]: SIGNAL_COMPLETE  Unknown
2017-05-31 13:30:44Z [overcloud-AllNodesDeploySteps-rlsgvaqzmbxg-ComputeDeployment_Step2-rg5l333yhpbc.9]: SIGNAL_IN_PROGRESS  Signal: deployment f2f8d9c8-861d-4781-b2f6-7d41f45b9648 succeeded
2017-05-31 13:30:44Z [overcloud-ComputeAllNodesDeployment-ftrxasu4uksd.4]: SIGNAL_COMPLETE  Unknown
2017-05-31 13:31:09Z [overcloud-ComputeAllNodesDeployment-ftrxasu4uksd.41]: SIGNAL_COMPLETE  Unknown
2017-05-31 13:31:09Z [overcloud-ComputeAllNodesDeployment-ftrxasu4uksd.54]: SIGNAL_COMPLETE  Unknown
2017-05-31 13:31:13Z [overcloud-AllNodesDeploySteps-rlsgvaqzmbxg-ComputeDeployment_Step2-rg5l333yhpbc.44]: SIGNAL_IN_PROGRESS  Signal: deployment 74396c90-9d8c-48c6-b2af-978e69603dc7 succeeded
2017-05-31 13:31:19Z [overcloud-ComputeAllNodesDeployment-ftrxasu4uksd.50]: SIGNAL_COMPLETE  Unknown
2017-05-31 13:31:34Z [overcloud-ComputeAllNodesDeployment-ftrxasu4uksd.0]: SIGNAL_COMPLETE  Unknown
2017-05-31 13:31:34Z [overcloud-AllNodesDeploySteps-rlsgvaqzmbxg-ComputeDeployment_Step2-rg5l333yhpbc.36]: SIGNAL_IN_PROGRESS  Signal: deployment 79434e05-4c44-4160-9766-9969a105f54f succeeded
2017-05-31 13:31:38Z [overcloud-ComputeAllNodesDeployment-ftrxasu4uksd.30]: SIGNAL_COMPLETE  Unknown
2017-05-31 13:31:38Z [overcloud-AllNodesDeploySteps-rlsgvaqzmbxg-ComputeDeployment_Step2-rg5l333yhpbc.20]: UPDATE_COMPLETE  state changed
2017-05-31 13:31:38Z [overcloud-AllNodesDeploySteps-rlsgvaqzmbxg-ComputeDeployment_Step2-rg5l333yhpbc.9]: UPDATE_COMPLETE  state changed
2017-05-31 13:31:38Z [overcloud-AllNodesDeploySteps-rlsgvaqzmbxg-ComputeDeployment_Step2-rg5l333yhpbc.6]: UPDATE_COMPLETE  state changed
2017-05-31 13:31:38Z [overcloud-AllNodesDeploySteps-rlsgvaqzmbxg-ComputeDeployment_Step2-rg5l333yhpbc.44]: UPDATE_COMPLETE  state changed
2017-05-31 13:31:38Z [overcloud-AllNodesDeploySteps-rlsgvaqzmbxg-ComputeDeployment_Step2-rg5l333yhpbc.35]: UPDATE_COMPLETE  state changed
2017-05-31 13:31:39Z [overcloud-AllNodesDeploySteps-rlsgvaqzmbxg-ComputeDeployment_Step2-rg5l333yhpbc.22]: UPDATE_COMPLETE  state changed
2017-05-31 13:31:39Z [overcloud-AllNodesDeploySteps-rlsgvaqzmbxg-ComputeDeployment_Step2-rg5l333yhpbc.10]: UPDATE_COMPLETE  state changed
2017-05-31 13:31:39Z [overcloud-AllNodesDeploySteps-rlsgvaqzmbxg-ComputeDeployment_Step2-rg5l333yhpbc.26]: UPDATE_COMPLETE  state changed
2017-05-31 13:31:39Z [overcloud-AllNodesDeploySteps-rlsgvaqzmbxg-ComputeDeployment_Step2-rg5l333yhpbc.32]: UPDATE_COMPLETE  state changed
2017-05-31 13:31:39Z [overcloud-AllNodesDeploySteps-rlsgvaqzmbxg-ComputeDeployment_Step2-rg5l333yhpbc.47]: UPDATE_COMPLETE  state changed
2017-05-31 13:31:39Z [overcloud-AllNodesDeploySteps-rlsgvaqzmbxg-ComputeDeployment_Step2-rg5l333yhpbc.24]: UPDATE_COMPLETE  state changed
2017-05-31 13:31:40Z [overcloud-AllNodesDeploySteps-rlsgvaqzmbxg-ComputeDeployment_Step2-rg5l333yhpbc.34]: UPDATE_COMPLETE  state changed
2017-05-31 13:31:40Z [overcloud-AllNodesDeploySteps-rlsgvaqzmbxg-ComputeDeployment_Step2-rg5l333yhpbc.12]: UPDATE_COMPLETE  state changed
2017-05-31 13:31:40Z [overcloud-AllNodesDeploySteps-rlsgvaqzmbxg-ComputeDeployment_Step2-rg5l333yhpbc.39]: UPDATE_COMPLETE  state changed
2017-05-31 13:31:40Z [overcloud-AllNodesDeploySteps-rlsgvaqzmbxg-ComputeDeployment_Step2-rg5l333yhpbc.27]: UPDATE_COMPLETE  state changed
2017-05-31 13:31:40Z [overcloud-AllNodesDeploySteps-rlsgvaqzmbxg-ComputeDeployment_Step2-rg5l333yhpbc.36]: UPDATE_COMPLETE  state changed
2017-05-31 13:31:40Z [overcloud-AllNodesDeploySteps-rlsgvaqzmbxg-ComputeDeployment_Step2-rg5l333yhpbc.17]: UPDATE_COMPLETE  state changed
2017-05-31 13:31:40Z [overcloud-AllNodesDeploySteps-rlsgvaqzmbxg-ComputeDeployment_Step2-rg5l333yhpbc.45]: UPDATE_COMPLETE  state changed
2017-05-31 13:33:16Z [overcloud-ComputeAllNodesDeployment-ftrxasu4uksd.59]: SIGNAL_COMPLETE  Unknown
2017-05-31 13:33:22Z [overcloud-ComputeAllNodesDeployment-ftrxasu4uksd.51]: SIGNAL_COMPLETE  Unknown
2017-05-31 13:33:57Z [overcloud-AllNodesDeploySteps-rlsgvaqzmbxg-ComputeDeployment_Step2-rg5l333yhpbc.42]: SIGNAL_IN_PROGRESS  Signal: deployment 11d81794-9c4e-4281-bf02-aa5905c30bed succeeded
2017-05-31 13:33:58Z [overcloud-AllNodesDeploySteps-rlsgvaqzmbxg-ComputeDeployment_Step2-rg5l333yhpbc.42]: UPDATE_COMPLETE  state changed
2017-05-31 13:35:01Z [overcloud-ComputeHostsDeployment-wcfrwc3upvp4.8]: SIGNAL_COMPLETE  Unknown
2017-05-31 13:35:10Z [AllNodesDeploySteps]: UPDATE_FAILED  UPDATE aborted
2017-05-31 13:35:10Z [overcloud]: UPDATE_FAILED  Timed out
2017-05-31 13:35:11Z [overcloud-AllNodesDeploySteps-rlsgvaqzmbxg.ComputeDeployment_Step2]: UPDATE_FAILED  UPDATE aborted
2017-05-31 13:35:11Z [overcloud-AllNodesDeploySteps-rlsgvaqzmbxg]: UPDATE_FAILED  Operation cancelled
2017-05-31 13:35:12Z [overcloud-AllNodesDeploySteps-rlsgvaqzmbxg-ComputeDeployment_Step2-rg5l333yhpbc.2]: UPDATE_FAILED  UPDATE aborted
2017-05-31 13:35:12Z [overcloud-AllNodesDeploySteps-rlsgvaqzmbxg-ComputeDeployment_Step2-rg5l333yhpbc.28]: UPDATE_FAILED  UPDATE aborted
2017-05-31 13:35:12Z [overcloud-AllNodesDeploySteps-rlsgvaqzmbxg-ComputeDeployment_Step2-rg5l333yhpbc.31]: UPDATE_FAILED  UPDATE aborted
2017-05-31 13:35:12Z [overcloud-AllNodesDeploySteps-rlsgvaqzmbxg-ComputeDeployment_Step2-rg5l333yhpbc.18]: UPDATE_FAILED  UPDATE aborted
2017-05-31 13:35:12Z [overcloud-AllNodesDeploySteps-rlsgvaqzmbxg-ComputeDeployment_Step2-rg5l333yhpbc.15]: UPDATE_FAILED  UPDATE aborted
2017-05-31 13:35:12Z [overcloud-AllNodesDeploySteps-rlsgvaqzmbxg-ComputeDeployment_Step2-rg5l333yhpbc.46]: UPDATE_FAILED  UPDATE aborted
2017-05-31 13:35:12Z [overcloud-AllNodesDeploySteps-rlsgvaqzmbxg-ComputeDeployment_Step2-rg5l333yhpbc.14]: UPDATE_FAILED  UPDATE aborted
2017-05-31 13:35:12Z [overcloud-AllNodesDeploySteps-rlsgvaqzmbxg-ComputeDeployment_Step2-rg5l333yhpbc.48]: UPDATE_FAILED  UPDATE aborted
2017-05-31 13:35:13Z [overcloud-AllNodesDeploySteps-rlsgvaqzmbxg-ComputeDeployment_Step2-rg5l333yhpbc.11]: UPDATE_FAILED  UPDATE aborted
2017-05-31 13:35:13Z [overcloud-AllNodesDeploySteps-rlsgvaqzmbxg-ComputeDeployment_Step2-rg5l333yhpbc.19]: UPDATE_FAILED  UPDATE aborted
2017-05-31 13:35:13Z [overcloud-AllNodesDeploySteps-rlsgvaqzmbxg-ComputeDeployment_Step2-rg5l333yhpbc.23]: UPDATE_FAILED  UPDATE aborted
2017-05-31 13:35:13Z [overcloud-AllNodesDeploySteps-rlsgvaqzmbxg-ComputeDeployment_Step2-rg5l333yhpbc.5]: UPDATE_FAILED  UPDATE aborted

 Stack overcloud UPDATE_FAILED

Heat Stack update failed.

real    242m13.612s
user    0m5.377s
sys     0m0.397s


[stack@undercloud ~]$ heat resource-list overcloud -n5 | grep -i fail
WARNING (shell) "heat resource-list" is deprecated, please use "openstack stack resource list" instead
| AllNodesDeploySteps                           | 0f4be635-c0e6-40fe-8dd4-60aed0d9c48d                                            | OS::TripleO::PostDeploySteps                                                                                        | UPDATE_FAILED   | 2017-05-31T10:03:31Z | overcloud                                                                                                                                                                |
| ComputeDeployment_Step2                       | c92fa8be-1efc-4870-b361-739c796bf5d8                                            | OS::Heat::StructuredDeploymentGroup                                                                                 | UPDATE_FAILED   | 2017-05-31T12:27:51Z | overcloud-AllNodesDeploySteps-rlsgvaqzmbxg                                                                                                                               |
| 19                                            | ee5223ec-4faf-455c-b13c-7a9a8251e25b                                            | OS::Heat::StructuredDeployment                                                                                      | UPDATE_FAILED   | 2017-05-31T12:55:17Z | overcloud-AllNodesDeploySteps-rlsgvaqzmbxg-ComputeDeployment_Step2-rg5l333yhpbc                                                                                          |
| 2                                             | 83ef4eff-2acc-40c7-a751-3884bd9dcb6b                                            | OS::Heat::StructuredDeployment                                                                                      | UPDATE_FAILED   | 2017-05-31T12:55:18Z | overcloud-AllNodesDeploySteps-rlsgvaqzmbxg-ComputeDeployment_Step2-rg5l333yhpbc                                                                                          |
| 31                                            | 116d9aa3-f00d-4fb2-be6c-1f74ec7db64d                                            | OS::Heat::StructuredDeployment                                                                                      | UPDATE_FAILED   | 2017-05-31T12:56:37Z | overcloud-AllNodesDeploySteps-rlsgvaqzmbxg-ComputeDeployment_Step2-rg5l333yhpbc                                                                                          |
| 46                                            | de4f17c5-74b8-4bb3-b285-9e3b578d4a64                                            | OS::Heat::StructuredDeployment                                                                                      | UPDATE_FAILED   | 2017-05-31T12:56:37Z | overcloud-AllNodesDeploySteps-rlsgvaqzmbxg-ComputeDeployment_Step2-rg5l333yhpbc                                                                                          |
| 28                                            | f9ee643c-9407-4b21-a2fc-a08f0217be8a                                            | OS::Heat::StructuredDeployment                                                                                      | UPDATE_FAILED   | 2017-05-31T12:56:39Z | overcloud-AllNodesDeploySteps-rlsgvaqzmbxg-ComputeDeployment_Step2-rg5l333yhpbc                                                                                          |
| 23                                            | c1824911-6080-4079-816c-00c1bb950715                                            | OS::Heat::StructuredDeployment                                                                                      | UPDATE_FAILED   | 2017-05-31T12:56:40Z | overcloud-AllNodesDeploySteps-rlsgvaqzmbxg-ComputeDeployment_Step2-rg5l333yhpbc                                                                                          |
| 18                                            | 2ddb2a4a-8557-485d-9ac6-de219c50c6da                                            | OS::Heat::StructuredDeployment                                                                                      | UPDATE_FAILED   | 2017-05-31T12:56:41Z | overcloud-AllNodesDeploySteps-rlsgvaqzmbxg-ComputeDeployment_Step2-rg5l333yhpbc                                                                                          |
| 15                                            | 739f4ec3-bf08-464a-a787-2434619ae9a6                                            | OS::Heat::StructuredDeployment                                                                                      | UPDATE_FAILED   | 2017-05-31T12:56:44Z | overcloud-AllNodesDeploySteps-rlsgvaqzmbxg-ComputeDeployment_Step2-rg5l333yhpbc                                                                                          |
| 5                                             | 77cdc7b7-26b2-4662-8133-7034c0d4a93c                                            | OS::Heat::StructuredDeployment                                                                                      | UPDATE_FAILED   | 2017-05-31T12:56:44Z | overcloud-AllNodesDeploySteps-rlsgvaqzmbxg-ComputeDeployment_Step2-rg5l333yhpbc                                                                                          |
| 14                                            | d1f9ad50-c8a0-4a0f-bbfd-2f43aadf0be7                                            | OS::Heat::StructuredDeployment                                                                                      | UPDATE_FAILED   | 2017-05-31T12:56:46Z | overcloud-AllNodesDeploySteps-rlsgvaqzmbxg-ComputeDeployment_Step2-rg5l333yhpbc                                                                                          |
| 48                                            | 13a1ce11-19d0-4bb2-807c-ef6f9bc02dfc                                            | OS::Heat::StructuredDeployment                                                                                      | UPDATE_FAILED   | 2017-05-31T12:56:47Z | overcloud-AllNodesDeploySteps-rlsgvaqzmbxg-ComputeDeployment_Step2-rg5l333yhpbc                                                                                          |
| 11                                            | 370adbc8-0a1d-4b1e-af45-3c92821246cb                                            | OS::Heat::StructuredDeployment                                                                                      | UPDATE_FAILED   | 2017-05-31T12:56:48Z | overcloud-AllNodesDeploySteps-rlsgvaqzmbxg-ComputeDeployment_Step2-rg5l333yhpbc                                                                                          |
| 56                                            | 923e515a-62b3-404a-9020-4206ad4bac0a                                            | OS::Heat::StructuredDeployment                                                                                      | CREATE_FAILED   | 2017-05-31T12:56:49Z | overcloud-AllNodesDeploySteps-rlsgvaqzmbxg-ComputeDeployment_Step2-rg5l333yhpbc                                                                                          |
[stack@undercloud ~]$

Comment 1 bigswitch 2017-05-31 20:26:56 UTC
[stack@undercloud ~]$ heat deployment-show 923e515a-62b3-404a-9020-4206ad4bac0a
WARNING (shell) "heat deployment-show" is deprecated, please use "openstack software deployment show" instead
{
  "status": "COMPLETE",
  "server_id": "5b436053-0991-46f2-b9b5-70b3ba64eeac",
  "config_id": "aaa7de61-91dc-4a91-a0a6-3f625067983b",
  "output_values": {
    "deploy_stdout": "Matching apachectl 'Server version: Apache/2.4.6 (Red Hat Enterprise Linux)\nServer built:   Mar  8 2017 05:09:47'\n\u001b[mNotice: Scope(Class[Tripleo::Firewall::Post]): At this stage, all network traffic is blocked.\u001b[0m\n\u001b[mNotice: Compiled catalog for overcloud-compute-56.localdomain in environment production in 0.93 seconds\u001b[0m\n\u001b[mNotice: /Stage[setup]/Firewall::Linux::Redhat/File[/etc/sysconfig/iptables]/seluser: seluser changed 'system_u' to 'unconfined_u'\u001b[0m\n\u001b[mNotice: /File[/etc/sysconfig/iptables]/seltype: seltype changed 'etc_t' to 'system_conf_t'\u001b[0m\n\u001b[mNotice: /Stage[main]/Main/Package_manifest[/var/lib/tripleo/installed-packages/overcloud_compute2]/ensure: created\u001b[0m\n\u001b[mNotice: /File[/etc/localtime]/seltype: seltype changed 'locale_t' to 'etc_t'\u001b[0m\n\u001b[mNotice: Finished catalog run in 0.54 seconds\u001b[0m\n",
    "deploy_stderr": "exception: connect failed\n",
    "deploy_status_code": 0
  },
  "creation_time": "2017-05-31T12:58:59Z",
  "updated_time": "2017-05-31T13:38:02Z",
  "input_values": {
    "step": 2,
    "update_identifier": "1496223286"
  },
  "action": "CREATE",
  "status_reason": "Outputs received",
  "id": "923e515a-62b3-404a-9020-4206ad4bac0a"
}
[stack@undercloud ~]$ heat deployment-show 370adbc8-0a1d-4b1e-af45-3c92821246cb
WARNING (shell) "heat deployment-show" is deprecated, please use "openstack software deployment show" instead
{
  "status": "COMPLETE",
  "server_id": "4bcdd24c-7dee-4c4e-b4ee-0e82584c9152",
  "config_id": "c9802f53-abc5-41ae-bfc0-b748cdb2cc19",
  "output_values": {
    "deploy_stdout": "Matching apachectl 'Server version: Apache/2.4.6 (Red Hat Enterprise Linux)\nServer built:   Mar  8 2017 05:09:47'\n\u001b[mNotice: Scope(Class[Tripleo::Firewall::Post]): At this stage, all network traffic is blocked.\u001b[0m\n\u001b[mNotice: Compiled catalog for overcloud-compute-11.localdomain in environment production in 0.94 seconds\u001b[0m\n\u001b[mNotice: Finished catalog run in 0.50 seconds\u001b[0m\n",
    "deploy_stderr": "exception: connect failed\n",
    "deploy_status_code": 0
  },
  "creation_time": "2017-05-30T13:24:16Z",
  "updated_time": "2017-05-31T13:37:53Z",
  "input_values": {
    "step": 2,
    "update_identifier": "1496223286"
  },
  "action": "UPDATE",
  "status_reason": "Outputs received",
  "id": "370adbc8-0a1d-4b1e-af45-3c92821246cb"
}
[stack@undercloud ~]$

Comment 2 Mike Burns 2017-06-01 13:24:28 UTC
Can you please provide the logs from the nodes that are failing to deploy? 

os-collect-config and os-apply-config logs

Comment 3 bigswitch 2017-06-02 10:40:55 UTC
Created attachment 1284391 [details]
os-collect-config and os-apply-config log

Comment 4 bigswitch 2017-06-02 10:42:43 UTC
Hi Mike,
The logs are attached. This is from a second attempt at deploying, without the Bigswitch virtual switch plugin.

| 560a2fd8-c13c-45b6-b280-40269fb0959b | be25551d-e7f5-4248-9d06-e2c167eb2c13 | 62f720a5-acaa-4ab5-b48b-31fd18bf1bc1 | CREATE | IN_PROGRESS | 2017-06-02T01:34:39Z | Deploy data available |
+--------------------------------------+--------------------------------------+--------------------------------------+--------+-------------+----------------------+-----------------------+
[stack@undercloud ~]$ heat stack-list
WARNING (shell) "heat stack-list" is deprecated, please use "openstack stack list" instead
+--------------------------------------+------------+---------------+----------------------+----------------------+
| id                                   | stack_name | stack_status  | creation_time        | updated_time         |
+--------------------------------------+------------+---------------+----------------------+----------------------+
| ee70573c-03b5-4703-a1ec-1d156d503553 | overcloud  | UPDATE_FAILED | 2017-06-01T13:36:44Z | 2017-06-02T01:08:30Z |
+--------------------------------------+------------+---------------+----------------------+----------------------+
[stack@undercloud ~]$ heat deployment-show 560a2fd8-c13c-45b6-b280-40269fb0959b
WARNING (shell) "heat deployment-show" is deprecated, please use "openstack software deployment show" instead
{
  "status": "IN_PROGRESS",
  "server_id": "62f720a5-acaa-4ab5-b48b-31fd18bf1bc1",
  "config_id": "be25551d-e7f5-4248-9d06-e2c167eb2c13",
  "output_values": null,
  "creation_time": "2017-06-02T01:34:39Z",
  "input_values": {},
  "action": "CREATE",
  "status_reason": "Deploy data available",
  "id": "560a2fd8-c13c-45b6-b280-40269fb0959b"
}
[stack@undercloud ~]$
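Eyeballing `heat deployment-show` output for rows stuck IN_PROGRESS like the one above gets tedious at this scale. A small awk helper — a sketch, assuming the pipe-delimited table layout shown in these transcripts — prints the id column of every stuck row plus a count:

```shell
# count_in_progress: read `heat deployment-list` table rows on stdin,
# print the id of every IN_PROGRESS row, then a total.
# A sketch; assumes the pipe-delimited table layout shown in this report.
count_in_progress() {
    awk -F'|' '/IN_PROGRESS/ { gsub(/ /, "", $2); print $2; n++ }
               END          { print n + 0, "in progress" }'
}
# usage: heat deployment-list | count_in_progress
```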

Comment 5 Joe Talerico 2017-06-02 16:49:08 UTC
Hey Song - 

In the log I see : 
Jun 02 02:43:31 overcloud-compute-51.localdomain os-collect-config[3077]: HTTPConnectionPool(host='169.254.169.254', port=80): Read timed out. (read timeout=10.0)

Is this Undercloud a VM?

Comment 6 bigswitch 2017-06-02 16:52:05 UTC
Hi Joe,
No, the undercloud is not a VM. It's a bare-metal server with 260 GB of RAM. I have seen this message in a smaller setup as well, with a couple of compute nodes. I can't find 169.254.169.254 on the undercloud namespaces or interfaces, though.

Comment 7 bigswitch 2017-06-02 17:11:32 UTC
By the way, the same setup has scaled to 140 compute nodes, 3 controllers, and 3 ceph-storage nodes with RHOSP 9.

Comment 8 Joe Talerico 2017-06-02 18:01:24 UTC
Hey - Song,

# iptables -nvL -t nat | grep 169
    0     0 REDIRECT   tcp  --  br-ctlplane *       0.0.0.0/0            169.254.169.254      tcp dpt:80 redir ports 8775

At scale I have run into this: https://bugs.launchpad.net/tripleo/+bug/1674732

That bug doesn't seem to have a fix in Newton; however, you should be able to update the 51-hosts script if we think this could be the issue. Based on that bug, are you seeing the same symptoms?
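One quick way to act on this, sketched here as an assumption-laden diagnostic rather than a known fix: probe the metadata path with the same 10-second timeout os-collect-config reported in the log. The address and NAT rule are the ones quoted above; the `/openstack` URL path is illustrative.

```shell
# check_metadata: probe a metadata URL with the 10-second timeout that
# os-collect-config reported timing out on. A diagnostic sketch only.
check_metadata() {
    if curl -sf --max-time 10 "$1" >/dev/null 2>&1; then
        echo "metadata reachable"
    else
        echo "metadata request failed or timed out"
    fi
}
# On the undercloud, confirm the NAT rule first:
#   iptables -nvL -t nat | grep 169.254.169.254
# Then, from an affected compute node:
#   check_metadata http://169.254.169.254/openstack
```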

Comment 9 bigswitch 2017-06-02 18:21:59 UTC
Hi Joe,
I will edit the 51-hosts script with the changes and retry the deployment. Thank you so much.

Song

Comment 10 Joe Talerico 2017-06-02 19:02:23 UTC
Hey Song - Does the symptom mentioned in the BZ fit what you are seeing?

Comment 11 bigswitch 2017-06-02 19:14:04 UTC
Hi Joe,
No, I don't see the symptoms mentioned in the BZ. Most of the deployment is COMPLETE, except for one node.

Comment 12 bigswitch 2017-06-06 17:43:06 UTC
Hi Joe,
I edited the 51-hosts script and re-ran the deployment. It is still failing at 50 nodes.
I will attach /var/log/messages from both compute-0 and compute-3.

openstack stack list
+--------------------------------------+------------+---------------+----------------------+----------------------+
| ID                                   | Stack Name | Stack Status  | Creation Time        | Updated Time         |
+--------------------------------------+------------+---------------+----------------------+----------------------+
| d8dc9449-84d1-49f0-a2c8-550fd1588a62 | overcloud  | UPDATE_FAILED | 2017-06-05T16:41:26Z | 2017-06-05T23:50:21Z |
+--------------------------------------+------------+---------------+----------------------+----------------------+


[stack@undercloud ~]$ heat deployment-list | grep -v COMPLETE
WARNING (shell) "heat deployment-list" is deprecated, please use "openstack software deployment list" instead
+--------------------------------------+--------------------------------------+--------------------------------------+--------+-------------+----------------------+-----------------------+
| id                                   | config_id                            | server_id                            | action | status      | creation_time        | status_reason         |
+--------------------------------------+--------------------------------------+--------------------------------------+--------+-------------+----------------------+-----------------------+
| a62108d1-f2ea-491d-a9e8-0b39f2a7d7ab | 27afae20-294c-4ba7-9929-93131c715daf | 8cfea261-17cb-4be2-b11e-73ba4ca7d199 | UPDATE | IN_PROGRESS | 2017-06-05T18:38:01Z | Deploy data available |
| bdd55ad5-a91c-4977-ab78-4309c33d6f4d | ff1c2c3b-d1b9-47ae-baf9-c59689f15b36 | baa9de96-b8a7-428c-80f2-3e3533d69ab3 | UPDATE | IN_PROGRESS | 2017-06-05T18:38:03Z | Deploy data available |
| 670b94c0-4020-479f-963b-78631f04351e | d9bdbd83-4075-427f-9c23-0a3771bbeadb | 1f12a4e5-0d74-4909-8260-7bd2d30ae330 | UPDATE | IN_PROGRESS | 2017-06-05T18:38:04Z | Deploy data available |
| abac6f55-e55c-4b97-8489-a40588ef1997 | 45bab28e-01ff-46f0-8cb9-019a901976a9 | 3f085640-192b-44ea-918b-d77a75ff6900 | UPDATE | IN_PROGRESS | 2017-06-05T18:38:22Z | Deploy data available |
| d9289b34-3223-4493-900b-781d68778556 | d097ac23-9c7e-4157-a197-594e7cfca023 | a7f89ba3-97fd-4d98-a880-de17588f180e | UPDATE | IN_PROGRESS | 2017-06-05T18:38:25Z | Deploy data available |
| 6c598614-8289-4fc8-8246-3682febfbc84 | 165a36c0-a2a8-45cb-88f1-638b1ca6a075 | 47d6cc7d-fb36-409b-901c-606604233816 | UPDATE | IN_PROGRESS | 2017-06-05T22:51:00Z | Deploy data available |
| cfe1a74f-d1fc-4620-bd58-086cb09c9481 | 9b23e56e-588d-44ba-9231-b0d8c07df8d4 | e87bcb80-1ae6-4673-bd01-2fcfc6ec5b0f | CREATE | IN_PROGRESS | 2017-06-06T00:41:31Z | Deploy data available |
+--------------------------------------+--------------------------------------+--------------------------------------+--------+-------------+----------------------+-----------------------+
[stack@undercloud ~]$ heat deployment-show a62108d1-f2ea-491d-a9e8-0b39f2a7d7ab
WARNING (shell) "heat deployment-show" is deprecated, please use "openstack software deployment show" instead
{
  "status": "IN_PROGRESS",
  "server_id": "8cfea261-17cb-4be2-b11e-73ba4ca7d199",
  "config_id": "27afae20-294c-4ba7-9929-93131c715daf",
  "output_values": {
    "deploy_stdout": "os-apply-config deployment 2ae1ceee-8cbf-48b8-9f62-ddd25e56f6f9 completed",
    "deploy_stderr": null,
    "deploy_status_code": "0"
  },
  "creation_time": "2017-06-05T18:38:01Z",
  "updated_time": "2017-06-06T00:37:13Z",
  "input_values": {
    "bootstrap_nodeid": "overcloud-compute-0",
    "bootstrap_nodeid_ip": "192.0.2.12"
  },
  "action": "UPDATE",
  "status_reason": "Deploy data available",
  "id": "a62108d1-f2ea-491d-a9e8-0b39f2a7d7ab"
}
[stack@undercloud ~]$ heat deployment-show bdd55ad5-a91c-4977-ab78-4309c33d6f4d
WARNING (shell) "heat deployment-show" is deprecated, please use "openstack software deployment show" instead
{
  "status": "IN_PROGRESS",
  "server_id": "baa9de96-b8a7-428c-80f2-3e3533d69ab3",
  "config_id": "ff1c2c3b-d1b9-47ae-baf9-c59689f15b36",
  "output_values": {
    "deploy_stdout": "os-apply-config deployment f804e16d-141e-4a66-a40a-7d0a961f52bd completed",
    "deploy_stderr": null,
    "deploy_status_code": "0"
  },
  "creation_time": "2017-06-05T18:38:03Z",
  "updated_time": "2017-06-06T00:42:05Z",
  "input_values": {
    "bootstrap_nodeid": "overcloud-compute-0",
    "bootstrap_nodeid_ip": "192.0.2.12"
  },
  "action": "UPDATE",
  "status_reason": "Deploy data available",
  "id": "bdd55ad5-a91c-4977-ab78-4309c33d6f4d"
}
[stack@undercloud ~]$ heat deployment-show 670b94c0-4020-479f-963b-78631f04351e
WARNING (shell) "heat deployment-show" is deprecated, please use "openstack software deployment show" instead
{
  "status": "IN_PROGRESS",
  "server_id": "1f12a4e5-0d74-4909-8260-7bd2d30ae330",
  "config_id": "d9bdbd83-4075-427f-9c23-0a3771bbeadb",
  "output_values": {
    "deploy_stdout": "os-apply-config deployment 6e8ca884-6ac8-487f-b520-c7aec537051e completed",
    "deploy_stderr": null,
    "deploy_status_code": "0"
  },
  "creation_time": "2017-06-05T18:38:04Z",
  "updated_time": "2017-06-06T00:42:03Z",
  "input_values": {
    "bootstrap_nodeid": "overcloud-compute-0",
    "bootstrap_nodeid_ip": "192.0.2.12"
  },
  "action": "UPDATE",
  "status_reason": "Deploy data available",
  "id": "670b94c0-4020-479f-963b-78631f04351e"
}
[stack@undercloud ~]$ heat deployment-show abac6f55-e55c-4b97-8489-a40588ef1997
WARNING (shell) "heat deployment-show" is deprecated, please use "openstack software deployment show" instead
{
  "status": "IN_PROGRESS",
  "server_id": "3f085640-192b-44ea-918b-d77a75ff6900",
  "config_id": "45bab28e-01ff-46f0-8cb9-019a901976a9",
  "output_values": {
    "deploy_stdout": "os-apply-config deployment 9baf1192-f957-4d80-a043-5ca1d4b4a58d completed",
    "deploy_stderr": null,
    "deploy_status_code": "0"
  },
  "creation_time": "2017-06-05T18:38:22Z",
  "updated_time": "2017-06-06T00:39:12Z",
  "input_values": {
    "bootstrap_nodeid": "overcloud-compute-0",
    "bootstrap_nodeid_ip": "192.0.2.12"
  },
  "action": "UPDATE",
  "status_reason": "Deploy data available",
  "id": "abac6f55-e55c-4b97-8489-a40588ef1997"
}
[stack@undercloud ~]$ heat deployment-show d9289b34-3223-4493-900b-781d68778556
WARNING (shell) "heat deployment-show" is deprecated, please use "openstack software deployment show" instead
{
  "status": "IN_PROGRESS",
  "server_id": "a7f89ba3-97fd-4d98-a880-de17588f180e",
  "config_id": "d097ac23-9c7e-4157-a197-594e7cfca023",
  "output_values": {
    "deploy_stdout": "os-apply-config deployment 82c3b35d-7ac4-4004-a259-ca3db4850c21 completed",
    "deploy_stderr": null,
    "deploy_status_code": "0"
  },
  "creation_time": "2017-06-05T18:38:25Z",
  "updated_time": "2017-06-06T00:41:03Z",
  "input_values": {
    "bootstrap_nodeid": "overcloud-compute-0",
    "bootstrap_nodeid_ip": "192.0.2.12"
  },
  "action": "UPDATE",
  "status_reason": "Deploy data available",
  "id": "d9289b34-3223-4493-900b-781d68778556"
}
[stack@undercloud ~]$ heat deployment-show 6c598614-8289-4fc8-8246-3682febfbc84
WARNING (shell) "heat deployment-show" is deprecated, please use "openstack software deployment show" instead
{
  "status": "IN_PROGRESS",
  "server_id": "47d6cc7d-fb36-409b-901c-606604233816",
  "config_id": "165a36c0-a2a8-45cb-88f1-638b1ca6a075",
  "output_values": {
    "deploy_stdout": "os-apply-config deployment b49461ad-356a-4ae5-bcb1-028d696a5736 completed",
    "deploy_stderr": null,
    "deploy_status_code": "0"
  },
  "creation_time": "2017-06-05T22:51:00Z",
  "updated_time": "2017-06-06T00:37:09Z",
  "input_values": {
    "bootstrap_nodeid": "overcloud-compute-0",
    "bootstrap_nodeid_ip": "192.0.2.12"
  },
  "action": "UPDATE",
  "status_reason": "Deploy data available",
  "id": "6c598614-8289-4fc8-8246-3682febfbc84"
}
[stack@undercloud ~]$ heat deployment-show cfe1a74f-d1fc-4620-bd58-086cb09c9481
WARNING (shell) "heat deployment-show" is deprecated, please use "openstack software deployment show" instead
{
  "status": "IN_PROGRESS",
  "server_id": "e87bcb80-1ae6-4673-bd01-2fcfc6ec5b0f",
  "config_id": "9b23e56e-588d-44ba-9231-b0d8c07df8d4",
  "output_values": null,
  "creation_time": "2017-06-06T00:41:31Z",
  "input_values": {
    "bootstrap_nodeid": "overcloud-compute-0",
    "bootstrap_nodeid_ip": "192.0.2.12"
  },
  "action": "CREATE",
  "status_reason": "Deploy data available",
  "id": "cfe1a74f-d1fc-4620-bd58-086cb09c9481"
}
[stack@undercloud ~]$ nova list | grep "e87bcb80-1ae6-4673-bd01-2fcfc6ec5b0f"
| e87bcb80-1ae6-4673-bd01-2fcfc6ec5b0f | overcloud-compute-42    | ACTIVE | -          | Running     | ctlplane=192.0.2.62 |
[stack@undercloud ~]$ nova list | grep "47d6cc7d-fb36-409b-901c-606604233816"
| 47d6cc7d-fb36-409b-901c-606604233816 | overcloud-compute-23    | ACTIVE | -          | Running     | ctlplane=192.0.2.45 |
[stack@undercloud ~]$ nova list | grep "a7f89ba3-97fd-4d98-a880-de17588f180e"
| a7f89ba3-97fd-4d98-a880-de17588f180e | overcloud-compute-3     | ACTIVE | -          | Running     | ctlplane=192.0.2.34 |

Comment 13 bigswitch 2017-06-06 17:47:18 UTC
Created attachment 1285477 [details]
messages from compute-0 and compute-3

Comment 14 Joe Talerico 2017-06-06 20:56:12 UTC
On compute-3 I see :
Jun  6 01:24:24 localhost os-collect-config: + status=500
Jun  6 01:24:24 localhost os-collect-config: + cat /tmp/tmp.nv7ukt3Btd
Jun  6 01:24:24 localhost os-collect-config: <ErrorResponse><Error><Message>The request processing has failed due to an internal error:Timed out waiting for a reply to message ID 6108dc42dc7c418ea3d0af334634d44b</Message><Code>InternalFailure</Code><Type>Server</Type></Error></ErrorResponse>+ rm /tmp/tmp.nv7ukt3Btd
Jun  6 01:24:24 localhost os-collect-config: + '[' 500 '!=' 200 ']'
Jun  6 01:24:24 localhost os-collect-config: + exit 1
Jun  6 01:24:24 localhost os-collect-config: [2017-06-05 22:24:24,104] (os-refresh-config) [ERROR] during post-configure phase. [Command '['dib-run-parts', '/usr/libexec/os-refresh-config/post-configure.d']' returned non-zero exit status 1]
Jun  6 01:24:24 localhost os-collect-config: [2017-06-05 22:24:24,104] (os-refresh-config) [ERROR] Aborting...
Jun  6 01:24:24 localhost os-collect-config: Command failed, will not cache new data. Command 'os-refresh-config --timeout 14400' returned non-zero exit status 1
Jun  6 01:24:24 localhost os-collect-config: Sleeping 1.00 seconds before re-exec.
Jun  6 01:24:25 localhost os-collect-config: /var/lib/os-collect-config/local-data not found. Skipping

Can you share the openstack logs from your undercloud please?

Comment 15 Joe Talerico 2017-06-07 11:46:58 UTC
Song - Can you look at the swift logs? os-collect-config reads from Swift on the undercloud -- is there a chance the disk on your undercloud is saturated?

Comment 16 bigswitch 2017-06-07 17:54:57 UTC
Hi Joe,
Attached are the logs from the undercloud: one from journalctl -u openstack-swift-* and another from journalctl | grep swift.

Comment 17 bigswitch 2017-06-07 18:37:33 UTC
Uploaded logs to box

https://bigswitch.box.com/s/c7m26k0anayr0fvsngzjesjf9102yfmt

Comment 18 bigswitch 2017-06-10 19:38:27 UTC
I redeployed with NeutronWorkers set to 16, but am still not able to deploy 60 compute nodes. It failed with the error below:

[stack@undercloud scripts]$ heat deployment-list | grep -v COMPLETE
WARNING (shell) "heat deployment-list" is deprecated, please use "openstack software deployment list" instead
+--------------------------------------+--------------------------------------+--------------------------------------+--------+----------+----------------------+---------------------------------------------------------------------+
| id                                   | config_id                            | server_id                            | action | status   | creation_time        | status_reason                                                       |
+--------------------------------------+--------------------------------------+--------------------------------------+--------+----------+----------------------+---------------------------------------------------------------------+
| 89d4e97d-efdc-4150-bf76-306c9936d5ef | 59bd707c-a46a-4a60-86d3-8dcfa78f8277 | aca3b004-5f97-43f8-8df5-d61a88946fd7 | UPDATE | FAILED   | 2017-06-09T06:05:36Z | deploy_status_code : Deployment exited with non-zero status code: 6 |
+--------------------------------------+--------------------------------------+--------------------------------------+--------+----------+----------------------+---------------------------------------------------------------------+
[stack@undercloud scripts]$ heat deployment-show 89d4e97d-efdc-4150-bf76-306c9936d5ef
WARNING (shell) "heat deployment-show" is deprecated, please use "openstack software deployment show" instead
{
  "status": "FAILED", 
  "server_id": "aca3b004-5f97-43f8-8df5-d61a88946fd7", 
  "config_id": "59bd707c-a46a-4a60-86d3-8dcfa78f8277", 
  "output_values": {
    "deploy_stdout": "Matching apachectl 'Server version: Apache/2.4.6 (Red Hat Enterprise Linux)\nServer built:   Mar  8 2017 05:09:47'\n\u001b[mNotice: Scope(Class[Tripleo::Firewall::Post]): At this stage, all network traffic is blocked.\u001b[0m\n\u001b[mNotice: Compiled catalog for overcloud-controller-1.localdomain in environment production in 18.70 seconds\u001b[0m\n\u001b[mNotice: /Stage[main]/Tripleo::Profile::Pacemaker::Database::Mysql/Exec[galera-set-root-password]/returns: executed successfully\u001b[0m\n\u001b[mNotice: /Stage[main]/Neutron::Server::Notifications/Neutron_config[DEFAULT/notify_nova_on_port_status_changes]/ensure: created\u001b[0m\n\u001b[mNotice: /Stage[main]/Neutron::Server::Notifications/Neutron_config[nova/auth_type]/ensure: created\u001b[0m\n\u001b[mNotice: /Stage[main]/Neutron::Server/Neutron_config[DEFAULT/router_scheduler_driver]/ensure: created\u001b[0m\n\u001b[mNotice: /Stage[main]/Neutron::Keystone::Authtoken/Keystone::Resource::Authtoken[neutron_config]/Neutron_config[keystone_authtoken/auth_type]/ensure: created\u001b[0m\n\u001b[mNotice: /Stage[main]/Neutron::Server::Notifications/Neutron_config[nova/project_name]/ensure: created\u001b[0m\n\u001b[mNotice: /Stage[main]/Neutron::Keystone::Authtoken/Keystone::Resource::Authtoken[neutron_config]/Neutron_config[keystone_authtoken/project_name]/ensure: created\u001b[0m\n\u001b[mNotice: /Stage[main]/Neutron::Keystone::Authtoken/Keystone::Resource::Authtoken[neutron_config]/Neutron_config[keystone_authtoken/password]/ensure: created\u001b[0m\n\u001b[mNotice: /Stage[main]/Neutron::Server/Neutron_config[DEFAULT/allow_automatic_l3agent_failover]/ensure: created\u001b[0m\n\u001b[mNotice: /Stage[main]/Neutron::Keystone::Authtoken/Keystone::Resource::Authtoken[neutron_config]/Neutron_config[keystone_authtoken/username]/ensure: created\u001b[0m\n\u001b[mNotice: /Stage[main]/Neutron::Server::Notifications/Neutron_config[nova/username]/ensure: created\u001b[0m\n\u001b[mNotice: 
/Stage[main]/Neutron::Server/Neutron_config[DEFAULT/rpc_workers]/ensure: created\u001b[0m\n\u001b[mNotice: /Stage[main]/Neutron::Keystone::Authtoken/Keystone::Resource::Authtoken[neutron_config]/Neutron_config[keystone_authtoken/project_domain_name]/ensure: created\u001b[0m\n\u001b[mNotice: /Stage[main]/Neutron::Policy/Oslo::Policy[neutron_config]/Neutron_config[oslo_policy/policy_file]/ensure: created\u001b[0m\n\u001b[mNotice: /Stage[main]/Neutron::Keystone::Authtoken/Keystone::Resource::Authtoken[neutron_config]/Neutron_config[keystone_authtoken/auth_uri]/ensure: created\u001b[0m\n\u001b[mNotice: /Stage[main]/Neutron::Server/Neutron_config[DEFAULT/max_l3_agents_per_router]/ensure: created\u001b[0m\n\u001b[mNotice: /Stage[main]/Neutron::Server/Neutron_config[DEFAULT/api_workers]/ensure: created\u001b[0m\n\u001b[mNotice: /Stage[main]/Neutron::Server::Notifications/Neutron_config[nova/password]/ensure: created\u001b[0m\n\u001b[mNotice: /Stage[main]/Neutron::Server/Oslo::Middleware[neutron_config]/Neutron_config[oslo_middleware/enable_proxy_headers_parsing]/ensure: created\u001b[0m\n\u001b[mNotice: /Stage[main]/Neutron::Server::Notifications/Neutron_config[nova/project_domain_id]/ensure: created\u001b[0m\n\u001b[mNotice: /Stage[main]/Neutron::Server::Notifications/Neutron_config[DEFAULT/notify_nova_on_port_data_changes]/ensure: created\u001b[0m\n\u001b[mNotice: /Stage[main]/Neutron::Server::Notifications/Neutron_config[nova/tenant_name]/ensure: created\u001b[0m\n\u001b[mNotice: /Stage[main]/Neutron::Server/Neutron_config[DEFAULT/router_distributed]/ensure: created\u001b[0m\n\u001b[mNotice: /Stage[main]/Neutron::Keystone::Authtoken/Keystone::Resource::Authtoken[neutron_config]/Neutron_config[keystone_authtoken/user_domain_name]/ensure: created\u001b[0m\n\u001b[mNotice: /Stage[main]/Neutron::Server::Notifications/Neutron_config[nova/user_domain_id]/ensure: created\u001b[0m\n\u001b[mNotice: 
/Stage[main]/Neutron::Server::Notifications/Neutron_config[nova/auth_url]/ensure: created\u001b[0m\n\u001b[mNotice: /Stage[main]/Neutron::Server::Notifications/Neutron_config[DEFAULT/nova_url]/ensure: created\u001b[0m\n\u001b[mNotice: /Stage[main]/Tripleo::Profile::Base::Nova/Package[openstack-nova-migration]/ensure: removed\u001b[0m\n\u001b[mNotice: /Stage[main]/Neutron::Keystone::Authtoken/Keystone::Resource::Authtoken[neutron_config]/Neutron_config[keystone_authtoken/auth_url]/ensure: created\u001b[0m\n\u001b[mNotice: /Stage[main]/Neutron::Db/Oslo::Db[neutron_config]/Neutron_config[database/db_max_retries]/ensure: created\u001b[0m\n\u001b[mNotice: /Stage[main]/Neutron::Db/Oslo::Db[neutron_config]/Neutron_config[database/connection]/ensure: created\u001b[0m\n\u001b[mNotice: /Stage[main]/Neutron::Db/Oslo::Db[neutron_config]/Neutron_config[database/max_retries]/ensure: created\u001b[0m\n\u001b[mNotice: /Stage[main]/Nova::Scheduler::Filter/Nova_config[DEFAULT/scheduler_available_filters]/ensure: created\u001b[0m\n\u001b[mNotice: /Stage[main]/Neutron::Server/Neutron_config[DEFAULT/l3_ha]/ensure: created\u001b[0m\n\u001b[mNotice: /Stage[main]/Nova::Deps/Anchor[nova::config::end]: Triggered 'refresh' from 1 events\u001b[0m\n\u001b[mNotice: /Stage[main]/Nova::Deps/Anchor[nova::service::begin]: Triggered 'refresh' from 1 events\u001b[0m\n\u001b[mNotice: /Stage[main]/Nova::Vncproxy/Nova::Generic_service[vncproxy]/Service[nova-vncproxy]: Triggered 'refresh' from 1 events\u001b[0m\n\u001b[mNotice: /Stage[main]/Nova::Consoleauth/Nova::Generic_service[consoleauth]/Service[nova-consoleauth]: Triggered 'refresh' from 1 events\u001b[0m\n\u001b[mNotice: /Stage[main]/Nova::Api/Nova::Generic_service[api]/Service[nova-api]: Triggered 'refresh' from 1 events\u001b[0m\n\u001b[mNotice: /Stage[main]/Nova::Scheduler/Nova::Generic_service[scheduler]/Service[nova-scheduler]: Triggered 'refresh' from 1 events\u001b[0m\n\u001b[mNotice: 
/Stage[main]/Nova::Conductor/Nova::Generic_service[conductor]/Service[nova-conductor]: Triggered 'refresh' from 1 events\u001b[0m\n\u001b[mNotice: /Stage[main]/Nova::Deps/Anchor[nova::service::end]: Triggered 'refresh' from 5 events\u001b[0m\n\u001b[mNotice: /Stage[main]/Neutron::Deps/Anchor[neutron::config::end]: Triggered 'refresh' from 31 events\u001b[0m\n\u001b[mNotice: /Stage[main]/Neutron::Deps/Anchor[neutron::service::begin]: Triggered 'refresh' from 1 events\u001b[0m\n\u001b[mNotice: /Stage[main]/Neutron::Agents::Ml2::Ovs/Service[neutron-ovs-agent-service]: Triggered 'refresh' from 1 events\u001b[0m\n\u001b[mNotice: /Stage[main]/Neutron::Agents::Dhcp/Service[neutron-dhcp-service]: Triggered 'refresh' from 1 events\u001b[0m\n\u001b[mNotice: /Stage[main]/Neutron::Agents::Metadata/Service[neutron-metadata]: Triggered 'refresh' from 1 events\u001b[0m\n\u001b[mNotice: /Stage[main]/Neutron::Deps/Anchor[neutron::service::end]: Triggered 'refresh' from 3 events\u001b[0m\n\u001b[mNotice: Finished catalog run in 243.54 seconds\u001b[0m\n", 
    "deploy_stderr": "exception: connect failed\n\u001b[1;31mWarning: Scope(Class[Cinder::Api]): keystone_enabled is deprecated, use auth_strategy instead.\u001b[0m\n\u001b[1;31mWarning: Scope(Class[Keystone]): Fernet token is recommended in Mitaka release. The default for token_provider will be changed to 'fernet' in O release.\u001b[0m\n\u001b[1;31mWarning: Scope(Class[Glance::Api]): default_store not provided, it will be automatically set to glance.store.http.Store\u001b[0m\n\u001b[1;31mWarning: Scope(Class[Heat]): keystone_user_domain_id is deprecated, use the name option instead.\u001b[0m\n\u001b[1;31mWarning: Scope(Class[Heat]): keystone_project_domain_id is deprecated, use the name option instead.\u001b[0m\n\u001b[1;31mWarning: Scope(Class[Neutron::Server::Notifications]): nova_url is deprecated and will be removed after Newton cycle.\u001b[0m\n\u001b[1;31mWarning: Scope(Class[Neutron::Plugins::Ml2::Bigswitch]): python-networking-bigswitch package management is deprecated, it will be dropped in a future release.\u001b[0m\n\u001b[1;31mWarning: Scope(Class[Nova]): Could not look up qualified variable '::nova::scheduler::filter::cpu_allocation_ratio'; class ::nova::scheduler::filter has not been evaluated\u001b[0m\n\u001b[1;31mWarning: Scope(Class[Nova]): Could not look up qualified variable '::nova::scheduler::filter::ram_allocation_ratio'; class ::nova::scheduler::filter has not been evaluated\u001b[0m\n\u001b[1;31mWarning: Scope(Class[Nova]): Could not look up qualified variable '::nova::scheduler::filter::disk_allocation_ratio'; class ::nova::scheduler::filter has not been evaluated\u001b[0m\n\u001b[1;31mWarning: Scope(Class[Mongodb::Server]): Replset specified, but no replset_members or replset_config provided.\u001b[0m\n\u001b[1;31mWarning: Scope(Class[Nova::Keystone::Authtoken]): Could not look up qualified variable '::nova::api::admin_user'; class ::nova::api has not been evaluated\u001b[0m\n\u001b[1;31mWarning: Scope(Class[Nova::Keystone::Authtoken]): 
Could not look up qualified variable '::nova::api::admin_password'; class ::nova::api has not been evaluated\u001b[0m\n\u001b[1;31mWarning: Scope(Class[Nova::Keystone::Authtoken]): Could not look up qualified variable '::nova::api::admin_tenant_name'; class ::nova::api has not been evaluated\u001b[0m\n\u001b[1;31mWarning: Scope(Class[Nova::Keystone::Authtoken]): Could not look up qualified variable '::nova::api::auth_uri'; class ::nova::api has not been evaluated\u001b[0m\n\u001b[1;31mWarning: Scope(Class[Nova::Keystone::Authtoken]): Could not look up qualified variable '::nova::api::auth_version'; class ::nova::api has not been evaluated\u001b[0m\n\u001b[1;31mWarning: Scope(Class[Nova::Keystone::Authtoken]): Could not look up qualified variable '::nova::api::identity_uri'; class ::nova::api has not been evaluated\u001b[0m\n\u001b[1;31mWarning: Scope(Class[Nova::Vncproxy::Common]): Could not look up qualified variable '::nova::compute::vncproxy_host'; class ::nova::compute has not been evaluated\u001b[0m\n\u001b[1;31mWarning: Scope(Class[Nova::Vncproxy::Common]): Could not look up qualified variable '::nova::compute::vncproxy_protocol'; class ::nova::compute has not been evaluated\u001b[0m\n\u001b[1;31mWarning: Scope(Class[Nova::Vncproxy::Common]): Could not look up qualified variable '::nova::compute::vncproxy_port'; class ::nova::compute has not been evaluated\u001b[0m\n\u001b[1;31mWarning: Scope(Class[Nova::Vncproxy::Common]): Could not look up qualified variable '::nova::compute::vncproxy_path'; class ::nova::compute has not been evaluated\u001b[0m\n\u001b[1;31mWarning: Scope(Class[Ceilometer]): Both $metering_secret and $telemetry_secret defined, using $telemetry_secret\u001b[0m\n\u001b[1;31mWarning: You cannot collect exported resources without storeconfigs being set; the collection will be ignored on line 166 in file /etc/puppet/modules/gnocchi/manifests/api.pp\u001b[0m\n\u001b[1;31mWarning: Scope(Class[Gnocchi::Api]): gnocchi:api::keystone_identity_uri is 
deprecated, use gnocchi::keystone::authtoken::auth_url instead\u001b[0m\n\u001b[1;31mWarning: Scope(Class[Gnocchi::Api]): gnocchi::api::keystone_auth_uri is deprecated, use gnocchi::keystone::authtoken::auth_uri instead\u001b[0m\n\u001b[1;31mWarning: Not collecting exported resources without storeconfigs\u001b[0m\n\u001b[1;31mWarning: Not collecting exported resources without storeconfigs\u001b[0m\n\u001b[1;31mWarning: Scope(Haproxy::Config[haproxy]): haproxy: The $merge_options parameter will default to true in the next major release. Please review the documentation regarding the implications.\u001b[0m\n\u001b[1;31mWarning: Not collecting exported resources without storeconfigs\u001b[0m\n\u001b[1;31mWarning: Not collecting exported resources without storeconfigs\u001b[0m\n\u001b[1;31mWarning: Not collecting exported resources without storeconfigs\u001b[0m\n\u001b[1;31mError: /Stage[main]/Neutron::Server/Service[neutron-server]: Failed to call refresh: Could not restart Service[neutron-server]: Execution of '/usr/bin/systemctl restart neutron-server' returned 1: Job for neutron-server.service failed because the control process exited with error code. See \"systemctl status neutron-server.service\" and \"journalctl -xe\" for details.\u001b[0m\n\u001b[1;31mError: /Stage[main]/Neutron::Server/Service[neutron-server]: Could not restart Service[neutron-server]: Execution of '/usr/bin/systemctl restart neutron-server' returned 1: Job for neutron-server.service failed because the control process exited with error code. See \"systemctl status neutron-server.service\" and \"journalctl -xe\" for details.\u001b[0m\n", 
    "deploy_status_code": 6
  }, 
  "creation_time": "2017-06-09T06:05:36Z", 
  "updated_time": "2017-06-10T01:34:49Z", 
  "input_values": {
    "step": 5, 
    "update_identifier": "1497054399"
  }, 
  "action": "UPDATE", 
  "status_reason": "deploy_status_code : Deployment exited with non-zero status code: 6", 
  "id": "89d4e97d-efdc-4150-bf76-306c9936d5ef"
}
[stack@undercloud scripts]$  

I reran the same deploy command; it failed with a timeout after 4 hours, although heat deployment-list shows everything as COMPLETE.


2017-06-10 18:41:50Z [overcloud-AllNodesDeploySteps-xq7p2yzshd3h-ComputeDeployment_Step2-36momhijhrjp.39]: UPDATE_COMPLETE  state changed
2017-06-10 18:41:53Z [overcloud-ComputeAllNodesDeployment-sfjlzcj7ycuh.18]: SIGNAL_COMPLETE  Unknown
2017-06-10 18:41:53Z [overcloud-AllNodesDeploySteps-xq7p2yzshd3h-ComputeDeployment_Step2-36momhijhrjp.3]: UPDATE_COMPLETE  state changed
2017-06-10 18:41:53Z [overcloud-AllNodesDeploySteps-xq7p2yzshd3h-ComputeDeployment_Step2-36momhijhrjp.22]: UPDATE_COMPLETE  state changed
2017-06-10 18:41:55Z [overcloud-AllNodesDeploySteps-xq7p2yzshd3h-ComputeDeployment_Step2-36momhijhrjp.32]: UPDATE_COMPLETE  state changed
2017-06-10 18:41:55Z [overcloud-AllNodesDeploySteps-xq7p2yzshd3h-ComputeDeployment_Step2-36momhijhrjp.23]: UPDATE_COMPLETE  state changed
2017-06-10 18:41:55Z [overcloud-AllNodesDeploySteps-xq7p2yzshd3h-ComputeDeployment_Step2-36momhijhrjp.46]: UPDATE_COMPLETE  state changed
2017-06-10 18:41:56Z [overcloud-AllNodesDeploySteps-xq7p2yzshd3h-ComputeDeployment_Step2-36momhijhrjp.30]: UPDATE_COMPLETE  state changed
2017-06-10 18:41:56Z [overcloud-AllNodesDeploySteps-xq7p2yzshd3h-ComputeDeployment_Step2-36momhijhrjp.9]: UPDATE_COMPLETE  state changed
ERROR: Timed out waiting for a reply to message ID 28b7b6a7c48745c3a0ea4888d11b7fe5

real    244m47.145s
user    0m5.334s
sys     0m0.408s

Please advise on the next step.

Comment 19 bigswitch 2017-09-08 16:51:06 UTC
Hello,

We gave this another try by redeploying, and noticed that os-net-config doesn't apply once the node count reaches 50. Adding any nodes after that doesn't work.
Checking the compute nodes, everything looks good under /etc/puppet/hieradata/*.yaml. The config seems correct, but it appears that puppet didn't run on those nodes at all.
All the packages are installed, but no config changes are made - the part that's done by puppet.

Has something similar been observed earlier?

Comment 20 Steve Reichard 2017-09-08 17:28:18 UTC
Joe,


Was on a call and they are looking to understand the next steps.

Comment 21 Joe Talerico 2017-09-08 18:31:23 UTC
Are you seeing the same error mentioned in Comment 14?

Can you recreate the bug without any special plugins, i.e. with plain OVS ML2?

Comment 22 bigswitch 2017-09-14 16:06:09 UTC
We tried with an OVS deployment; the update still fails once we pass 50 nodes. I believe it is related to file size (maybe the hosts file). The error below was seen:

Sep 14 01:32:48 localhost os-collect-config: 172.18.0.32 scale-compute-32.storage.localdomain scale-compute-32.storage
Sep 14 01:32:48 localhost os-collect-config: 192.0.2.52 scale-compute-32.storagemgmt.localdomain scale-compute-32.storagemgmt
Sep 14 01:32:48 localhost os-collect-config: 172.16.0.47 scale-compute-32.tenant.localdomain scale-compute-32.tenant
Sep 14 01:32:48 localhost os-collect-config: 192.0.2.52 scale-compute-32.management.localdomain scale-compute-32.management
Sep 14 01:32:48 localhost os-collect-config: 192.0.2.52 scale-compute-32.ctlplane.localdomain scale-compute-32.ctlplane
Sep 14 01:32:48 localhost os-collect-config: \n172.17.0.55 scale-compute-33.localdomain scale-compute-33
Sep 14 01:42:56 localhost journal: Suppressed 1657 messages from /system.slice/os-collect-config.service
Sep 14 01:42:56 localhost os-collect-config: + status=500
Sep 14 01:42:56 localhost os-collect-config: + cat /tmp/tmp.dOaj7bQd22
Sep 14 01:42:56 localhost os-collect-config: <ErrorResponse><Error><Message>The request processing has failed due to an internal error:Timed out waiting for a reply to message ID c8161a0ad3094aeea095fb67ae88978e</Me
ssage><Code>InternalFailure</Code><Type>Server</Type></Error></ErrorResponse>+ rm /tmp/tmp.dOaj7bQd22
Sep 14 01:42:56 localhost os-collect-config: + '[' 500 '!=' 200 ']'
Sep 14 01:42:56 localhost os-collect-config: + exit 1
Sep 14 01:42:56 localhost os-collect-config: [2017-09-14 01:42:56,193] (os-refresh-config) [ERROR] during post-configure phase. [Command '['dib-run-parts', '/usr/libexec/os-refresh-config/post-configure.d']' returne
d non-zero exit status 1]
Sep 14 01:42:56 localhost os-collect-config: [2017-09-14 01:42:56,193] (os-refresh-config) [ERROR] Aborting...
Sep 14 01:42:56 localhost os-collect-config: Command failed, will not cache new data. Command 'os-refresh-config --timeout 14400' returned non-zero exit status 1
Sep 14 01:42:56 localhost os-collect-config: Sleeping 1.00 seconds before re-exec.
Sep 14 01:42:57 localhost os-collect-config: /var/lib/os-collect-config/local-data not found. Skipping
Sep 14 01:42:57 localhost os-collect-config: No local metadata found (['/var/lib/os-collect-config/local-data'])

Comment 23 bigswitch 2017-09-14 16:39:26 UTC
The update is failing at the following place.

if grep -q "^# HEAT_HOSTS_START" "$file"; then
       temp=$(mktemp)
       awk -v v="$entries" '/^# HEAT_HOSTS_START/ {
           print $0
           print v
           f=1
           }f &&!/^# HEAT_HOSTS_END$/{next}/^# HEAT_HOSTS_END$/{f=0}!f' "$file" > "$temp"
           echo "INFO: Updating hosts file $file, check below for changes"
           diff "$file" "$temp" || true
           cat "$temp" > "$file"
   else
       echo -ne "\n# HEAT_HOSTS_START - Do not edit manually within this section!\n" >> "$file"
       echo "$entries" >> "$file"
       echo -ne "# HEAT_HOSTS_END\n\n" >> "$file"
   fi

}


The script is 51-hosts, at /usr/libexec/os-refresh-config/configure.d/51-hosts.



Sep 14 01:54:35 localhost os-collect-config: <ErrorResponse><Error><Message>The request processing has failed due to an internal error:Timed out waiting for a reply to message ID 261263f8224a4565ba2145ab35a07635</Me
ssage><Code>InternalFailure</Code><Type>Server</Type></Error></ErrorResponse>+ rm /tmp/tmp.6iuSBduuAl
Sep 14 01:54:35 localhost os-collect-config: + '[' 500 '!=' 200 ']'
Sep 14 01:54:35 localhost os-collect-config: + exit 1
Sep 14 01:54:35 localhost os-collect-config: [2017-09-13 22:54:35,696] (os-refresh-config) [ERROR] during post-configure phase. [Command '['dib-run-parts', '/usr/libexec/os-refresh-config/post-configure.d']' returne
d non-zero exit status 1]
Sep 14 01:54:35 localhost os-collect-config: [2017-09-13 22:54:35,696] (os-refresh-config) [ERROR] Aborting...


Sep 14 01:44:20 localhost os-collect-config: dib-run-parts Wed Sep 13 22:44:20 PDT 2017 40-truncate-nova-config completed
Sep 14 01:44:20 localhost os-collect-config: dib-run-parts Wed Sep 13 22:44:20 PDT 2017 Running /usr/libexec/os-refresh-config/configure.d/51-hosts
Sep 14 01:44:20 localhost os-collect-config: + set -o pipefail
Sep 14 01:44:20 localhost os-collect-config: ++ os-apply-config --key hosts --type raw --key-default ''
Sep 14 01:44:20 localhost os-collect-config: ++ tr '[A-Z]' '[a-z]'
Sep 14 01:44:20 localhost os-collect-config: + ENTRIES='172.17.0.13 scale-controller-0.localdomain scale-controller-0
Sep 14 01:44:20 localhost os-collect-config: 10.9.28.29 scale-controller-0.external.localdomain scale-controller-0.external
Sep 14 01:44:20 localhost os-collect-config: 172.17.0.13 scale-controller-0.internalapi.localdomain scale-controller-0.inter


Sep 14 03:06:51 localhost journal: Suppressed 1909 messages from /system.slice/os-collect-config.service
Sep 14 03:06:51 localhost os-collect-config: + status=500
Sep 14 03:06:51 localhost os-collect-config: + cat /tmp/tmp.6vp4EQGiaD
Sep 14 03:06:51 localhost os-collect-config: <ErrorResponse><Error><Message>The request processing has failed due to an internal error:Timed out waiting for a reply to message ID 506cde9a572d46d1bab0738e3e162134</Me
ssage><Code>InternalFailure</Code><Type>Server</Type></Error></ErrorResponse>+ rm /tmp/tmp.6vp4EQGiaD

Comment 24 Steve Reichard 2017-09-14 17:11:52 UTC
Question - are you deploying all computes at the same time?  Or is this just when you have a total over 50 nodes?


I believe Joe Talerico already shared some recommendations with you, one being to limit deployments to 32 nodes at a time.

Comment 25 bigswitch 2017-09-14 17:15:58 UTC
We are updating the deployment 15 nodes at a time.

First 3 controllers and 5 computes; after that deployment succeeds, we update the deployment 15 nodes at a time.

Once the host count reaches 50+, it starts giving the 51-hosts file error and the update fails.

Comment 26 bigswitch 2017-09-14 17:28:25 UTC
We applied the changes mentioned in https://bugs.launchpad.net/tripleo/+bug/1674732 by editing /usr/share/tripleo-image-elements/hosts/os-refresh-config/configure.d/51-hosts (on the RHOSP director).

However, compute nodes deployed afterwards do not pick up the changes. Do you know how and where we need to apply them so the compute nodes get the new script?

Comment 27 Steve Reichard 2017-09-14 18:13:55 UTC
You can use virt-customize to make changes to the overcloud image before importing it into Glance.

https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/10/html/partner_integration/overcloud_images

With virt-customize, --copy-in and --run-command are options.
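A minimal sketch of that approach; the image name, the assumption that the patched script sits in ./51-hosts, and the re-import step are all illustrative, so adjust to your environment:

```shell
# Sketch: copy a patched 51-hosts into the overcloud image, then re-import it
# so newly deployed nodes boot with the fix. Paths and names are assumptions.
virt-customize -a overcloud-full.qcow2 \
    --copy-in 51-hosts:/usr/libexec/os-refresh-config/configure.d/

# Re-upload the modified image to Glance on the undercloud:
openstack overcloud image upload --update-existing
```

Already-deployed nodes keep their old copy of the script; only nodes provisioned from the updated image pick up the change.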

Comment 28 bigswitch 2017-09-15 05:27:16 UTC
We used virt-customize to apply the 51-hosts patch, and I can now see the compute node has it. However, with 25+ hosts, 51-hosts still does not complete.

If I run it manually on the compute node, the 51-hosts file does complete:

dib-run-parts /usr/libexec/os-refresh-config/configure.d/ --debug -- this completes.

During deployment, 51-hosts gets stuck and does not complete at all.


[root@scale-compute-25 heat-admin]# cat /usr/libexec/os-refresh-config/configure.d/51-hosts
#!/bin/bash

set -eux
set -o pipefail

write_entries() {
    local file="$1"
    local entries="$2"

    # Don't do anything if the file isn't there
    if [ ! -f "$file" ]; then
        return
    fi

    if grep -q "^# HEAT_HOSTS_START" "$file"; then
        temp=$(mktemp)
        (
        sed '/^# HEAT_HOSTS_START/,$d' "$file"
        echo -ne "\n# HEAT_HOSTS_START - Do not edit manually within this section!\n"
        echo "$entries"
        echo -ne "# HEAT_HOSTS_END\n\n"
        sed '1,/^# HEAT_HOSTS_END/d' "$file"
        ) > "$temp"
        echo "INFO: Updating hosts file $file, check below for changes"
        diff "$file" "$temp" || true
        cat "$temp" > "$file"
    else
        echo -ne "\n# HEAT_HOSTS_START - Do not edit manually within this section!\n" >> "$file"
        echo "$entries" >> "$file"
        echo -ne "# HEAT_HOSTS_END\n\n" >> "$file"
    fi

}

ENTRIES=$(os-apply-config --key hosts --type raw --key-default '' | tr '[A-Z]' '[a-z]' | sed -e 's/\\n/\n/g' -e '/^$/d')
if [ ! -z "$ENTRIES" ]; then
    # cloud-init files are /etc/cloud/templates/hosts.OSNAME.tmpl
    DIST=$(lsb_release -is | tr -s '[A-Z]' '[a-z]')
    case $DIST in
        fedora|redhatenterpriseserver)
            name="redhat"
            ;;
        *)
            name="$DIST"
            ;;
    esac
    write_entries "/etc/cloud/templates/hosts.${name}.tmpl" "$ENTRIES"
    write_entries "/etc/hosts" "$ENTRIES"
else
    echo "No hosts in Heat, nothing written."
fi
[root@scale-compute-25 heat-admin]#
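For reference, the marker-replacement branch of this patched script can be exercised in isolation on a throwaway file (the file contents and the host entry below are illustrative, not from this deployment):

```shell
# Reproduce the sed-based HEAT_HOSTS block replacement on a sample file.
file=$(mktemp); temp=$(mktemp)
printf 'preamble\n# HEAT_HOSTS_START\nold-entry\n# HEAT_HOSTS_END\ntrailer\n' > "$file"
entries='192.0.2.10 compute-0'
(
    sed '/^# HEAT_HOSTS_START/,$d' "$file"       # everything before the block
    echo -ne "\n# HEAT_HOSTS_START - Do not edit manually within this section!\n"
    echo "$entries"                              # regenerated entries
    echo -ne "# HEAT_HOSTS_END\n\n"
    sed '1,/^# HEAT_HOSTS_END/d' "$file"         # everything after the block
) > "$temp"
cat "$temp" > "$file"
```

The old-entry line between the markers is replaced while the surrounding preamble and trailer lines are preserved.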

Comment 29 bigswitch 2017-09-15 21:20:55 UTC
We cannot use RHOSP 10 at scale at all; 51-hosts execution will get stuck anywhere from 20- to 50-node deployments.

Unless we have a solution from Red Hat, we cannot recommend it at all.

Comment 30 bigswitch 2017-09-16 00:02:03 UTC
Created attachment 1326622 [details]
added new sos report from the compute node

added new sos report

Comment 31 Joe Talerico 2017-09-17 22:43:22 UTC
Did you review Comment 8?

Reviewing upstream, it seems like you are now trying to implement Comment 8? https://bugs.launchpad.net/tripleo/+bug/1674732/comments/12

I simply added exit 0 to the top of the script so the script would not execute. However, you can implement the fix that is upstream for this bug.
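The effect of that workaround can be sketched on a mock script (the real target would be /usr/libexec/os-refresh-config/configure.d/51-hosts; the temp file here is a stand-in):

```shell
# Mock a 51-hosts-style script, then neuter it by inserting "exit 0"
# right after the shebang so it bails out before doing any work.
f=$(mktemp)
printf '#!/bin/bash\nset -eux\necho "would rewrite /etc/hosts"\n' > "$f"
sed -i '1a exit 0' "$f"   # the workaround from this comment
out=$(bash "$f")          # runs the neutered script; produces no stdout
```

The trade-off is that /etc/hosts is never updated by Heat on those nodes, so name resolution entries have to come from somewhere else (e.g. DNS).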

Comment 32 bigswitch 2017-09-18 15:27:12 UTC
Yes, we tried the patch mentioned in comment #8, but we still see 51-hosts start and never finish; it stays stuck there.

Now I am thinking of adding exit 0 so 51-hosts does not execute, though I am not sure what the implications of that would be.

Comment 33 bigswitch 2017-09-20 18:07:55 UTC
Hi,

We've tried both changes: backporting the 51-hosts patch as well as doing a premature exit using exit 0. However, we still get stuck around the 50-node mark. We're wondering whether there is some configuration we're missing, due to which this doesn't proceed further.

I know some specific configurations have been shared here and there for particular issues (e.g. the mysqldb pool_size when we hit an issue related to it).
Is there a standard guide of undercloud/overcloud settings that you recommend for a scale setup? It would be great if you could share that.

Comment 34 bigswitch 2017-09-21 00:24:36 UTC
Can anyone let us know how the following works? I will also open a support case.

Some nodes are stuck during deployments; on those nodes the following command returns no value, so the post-configure.d scripts exit 0.

Not working node or stuck node

[root@overcloud-compute-20 post-configure.d]# os-apply-config --key instance-id --type raw --key-default ""

[root@overcloud-compute-20 post-configure.d]# exit


Working node

[root@overcloud-compute-23 heat-admin]# os-apply-config --key instance-id --type raw --key-default ""
i-00000701
[root@overcloud-compute-23 heat-admin]# exit
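[Editor's note] The difference between the two nodes reduces to whether the collected metadata contains an instance-id at all; a tiny sketch of that check (the function name is made up, not part of any tooling):

```shell
# Classify a node from the output of:
#   os-apply-config --key instance-id --type raw --key-default ""
# An empty result means the ec2 metadata source never delivered an
# instance-id, so the post-configure.d signalling scripts bail out.
check_instance_id() {
    if [ -z "$1" ]; then
        echo "stuck: no instance-id in collected metadata"
    else
        echo "ok: instance-id=$1"
    fi
}

check_instance_id ""            # like overcloud-compute-20 above
check_instance_id "i-00000701"  # like overcloud-compute-23 above
```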

Comment 35 bigswitch 2017-09-21 01:01:48 UTC
The following error was seen before the ID was reset:

nfig[3327]: + DEPLOY_URL='http://192.0.2.1:8000/v1/signal/arn%3Aopenstack%3Aheat%3A%3Aaa18eefdc57e47a8ad2035671f41792d%3Astacks%2Fovercloud-ComputeHost
Sep 20 14:53:32 overcloud-compute-20.bigswitch.com os-collect-config[3327]: + '[' '!' -f /var/lib/os-apply-config-deployments/deployed/479d7cec-977b-40c2-87e7-e49921853a40 ']'
Sep 20 14:53:32 overcloud-compute-20.bigswitch.com os-collect-config[3327]: + echo 'Signalling os-apply-config deployment 479d7cec-977b-40c2-87e7-e49921853a40 http://192.0.2.1:8000/v1/signal/arn%3Aopenstack%3Aheat%3
Sep 20 14:53:32 overcloud-compute-20.bigswitch.com os-collect-config[3327]: Signalling os-apply-config deployment 479d7cec-977b-40c2-87e7-e49921853a40 http://192.0.2.1:8000/v1/signal/arn%3Aopenstack%3Aheat%3A%3Aaa18
Sep 20 14:53:32 overcloud-compute-20.bigswitch.com os-collect-config[3327]: + call_curl_deployment POST 'http://192.0.2.1:8000/v1/signal/arn%3Aopenstack%3Aheat%3A%3Aaa18eefdc57e47a8ad2035671f41792d%3Astacks%2Fovercl
Sep 20 14:53:32 overcloud-compute-20.bigswitch.com os-collect-config[3327]: + local method=POST
Sep 20 14:53:32 overcloud-compute-20.bigswitch.com os-collect-config[3327]: + local 'url=http://192.0.2.1:8000/v1/signal/arn%3Aopenstack%3Aheat%3A%3Aaa18eefdc57e47a8ad2035671f41792d%3Astacks%2Fovercloud-ComputeHosts
Sep 20 14:53:32 overcloud-compute-20.bigswitch.com os-collect-config[3327]: + local 'stdout=os-apply-config deployment 479d7cec-977b-40c2-87e7-e49921853a40 completed'
Sep 20 14:53:32 overcloud-compute-20.bigswitch.com os-collect-config[3327]: ++ mktemp
Sep 20 14:53:32 overcloud-compute-20.bigswitch.com os-collect-config[3327]: + local output=/tmp/tmp.GTG0ZQmLGM
Sep 20 14:53:32 overcloud-compute-20.bigswitch.com os-collect-config[3327]: ++ curl -s -w '%{http_code}' -X POST -H 'Content-Type: application/json' -o /tmp/tmp.GTG0ZQmLGM --data-binary '{"deploy_stdout": "os-apply-
Sep 20 15:03:41 overcloud-compute-20.bigswitch.com os-collect-config[3327]: + status=500
Sep 20 15:03:41 overcloud-compute-20.bigswitch.com os-collect-config[3327]: + cat /tmp/tmp.GTG0ZQmLGM
Sep 20 15:03:41 overcloud-compute-20.bigswitch.com os-collect-config[3327]: <ErrorResponse><Error><Message>The request processing has failed due to an internal error:Timed out waiting for a reply to message ID e2872
Sep 20 15:03:41 overcloud-compute-20.bigswitch.com os-collect-config[3327]: + '[' 500 '!=' 200 ']'
Sep 20 15:03:41 overcloud-compute-20.bigswitch.com os-collect-config[3327]: + exit 1
Sep 20 15:03:41 overcloud-compute-20.bigswitch.com os-collect-config[3327]: [2017-09-20 15:03:41,666] (os-refresh-config) [ERROR] during post-configure phase. [Command '['dib-run-parts', '/usr/libexec/os-refresh-con
Sep 20 15:03:41 overcloud-compute-20.bigswitch.com os-collect-config[3327]: [2017-09-20 15:03:41,666] (os-refresh-config) [ERROR] Aborting...
Sep 20 15:03:41 overcloud-compute-20.bigswitch.com os-collect-config[3327]: Command failed, will not cache new data. Command 'os-refresh-config --timeout 14400' returned non-zero exit status 1
Sep 20 15:03:41 overcloud-compute-20.bigswitch.com os-collect-config[3327]: Sleeping 1.00 seconds before re-exec.
Sep 20 15:03:53 overcloud-compute-20.bigswitch.com os-collect-config[3327]: HTTPConnectionPool(host='169.254.169.254', port=80): Read timed out. (read timeout=10.0)
Sep 20 15:03:53 overcloud-compute-20.bigswitch.com os-collect-config[3327]: Source [ec2] Unavailable.
Sep 20 15:03:53 overcloud-compute-20.bigswitch.com os-collect-config[3327]: /var/lib/os-collect-config/local-data not found. Skipping
Sep 20 15:03:53 overcloud-compute-20.bigswitch.com os-collect-config[3327]: No local metadata found (['/var/lib/os-collect-config/local-data'])

It looks like the RHOSP director service listening on port 8000 is not able to respond in time; because of that, some nodes get stuck waiting for their progress update to go through.

It now looks like a director tuning problem. Is there any tuning parameter for the RHOSP director heat config?

Comment 36 Joe Talerico 2017-09-21 12:05:22 UTC
Could you verify the keystone worker count and the number of processes you have? I sent Song an upstream document that discusses the tunings/best practices; I can re-send it if necessary.

Example for keystone-admin (update both keystone-admin and keystone-main):

WSGIDaemonProcess keystone_admin display-name=keystone-admin group=keystone processes=8 threads=4 user=keystone
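[Editor's note] For completeness, the matching directive for keystone-main would look like this. The worker/thread counts are the suggested starting values, and the vhost file locations follow the packaged wsgi-keystone.conf layout — verify both on your undercloud:

```apache
# In the Apache vhost configs for keystone (admin and main vhosts):
WSGIDaemonProcess keystone_admin display-name=keystone-admin group=keystone processes=8 threads=4 user=keystone
WSGIDaemonProcess keystone_main  display-name=keystone-main  group=keystone processes=8 threads=4 user=keystone
```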

Comment 37 Steven Hardy 2017-09-21 12:17:48 UTC
> It looks like RHOSP director listening on 8000 not able to respond in time , because of that some nodes are stuck for the progress update..

Yes from this log output that does appear to be the case.  Port 8000 is the heat-api-cfn service, and it's used for signalling on completion of SoftwareDeployments to indicate success or failure (that's what the curl -s -w '%{http_code}' -X POST -H 'Content-Type: application/json' -o /tmp/tmp.GTG0ZQmLGM --data-binary '{"deploy_stdout": "os-apply- ... is doing, it's from https://github.com/openstack/tripleo-image-elements/blob/master/elements/os-refresh-config/os-refresh-config/post-configure.d/99-refresh-completed#L19)

The data flow here is:

curl -> heat_api_cfn -> rabbitmq -> heat-engine -> database

It may be that the large number of nodes all signalling at the same time is causing this problem, which could be due to hardware issues or tuning of the services - however Joe has indicated we've not previously seen any similar issues at this relatively low number of nodes, so it would be good to confirm the state of the hardware via some performance monitoring during the deploy.

Please can you run:

iostat 1 -x -t | tee iostat_log.txt

and

vmstat 1 -t | tee vmstat_log.txt

On the undercloud node in two terminals while the deploy is in-progress?

Also, please can you share the heat logs (/var/log/heat*) and the heat config (/etc/heat/heat.conf, sanitized to remove all passwords and the auth_encryption_key)?

Also please provide the output of ps axjf | grep heat to confirm the number of heat workers.

With this information hopefully we can clarify the best way to proceed, thanks!

Comment 38 bigswitch 2017-09-21 16:27:13 UTC
Thanks. Attached are the heat logs, heat.conf, and profiler outputs taken while the update was in progress.

ps axjf | grep heat

[root@undercloud heat]# ps axjf | grep heat
113145 113153 113145  24220 pts/0    113145 S+    1000   0:03  |               \_ /usr/bin/python2 /usr/bin/openstack overcloud deploy --templates -r /home/stack/templates/roles_data.yaml -e /home/stack/templates/node-info.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e /home/stack/templates/network-environment.yaml -e /home/stack/templates/bigswitch-config.yaml -e /home/stack/templates/timezone.yaml --neutron-disable-tunneling --ntp-server 0.rhel.pool.ntp.org --timeout 240
178530 178646 178646 178473 pts/3    178646 S+    1000   0:00  |               \_ ssh heat-admin.2.6
122392 128252 128251 122392 pts/6    128251 S+       0   0:00          \_ grep --color=auto heat
     1  41434  41434  41434 ?            -1 Ss     187   5:07 /usr/bin/python /usr/bin/heat-engine --config-file /usr/share/heat/heat-dist.conf --config-file /etc/heat/heat.conf
 41434  41444  41434  41434 ?            -1 S      187  84:43  \_ /usr/bin/python /usr/bin/heat-engine --config-file /usr/share/heat/heat-dist.conf --config-file /etc/heat/heat.conf
 41434  41445  41434  41434 ?            -1 S      187  66:31  \_ /usr/bin/python /usr/bin/heat-engine --config-file /usr/share/heat/heat-dist.conf --config-file /etc/heat/heat.conf
 41434  41446  41434  41434 ?            -1 S      187  65:15  \_ /usr/bin/python /usr/bin/heat-engine --config-file /usr/share/heat/heat-dist.conf --config-file /etc/heat/heat.conf
 41434  41447  41434  41434 ?            -1 S      187  86:57  \_ /usr/bin/python /usr/bin/heat-engine --config-file /usr/share/heat/heat-dist.conf --config-file /etc/heat/heat.conf
     1  41479  41479  41479 ?            -1 Ss     187   0:01 /usr/bin/python /usr/bin/heat-api --config-file /usr/share/heat/heat-dist.conf --config-file /etc/heat/heat.conf
 41479  41489  41479  41479 ?            -1 S      187   0:02  \_ /usr/bin/python /usr/bin/heat-api --config-file /usr/share/heat/heat-dist.conf --config-file /etc/heat/heat.conf
 41479  41490  41479  41479 ?            -1 S      187   0:00  \_ /usr/bin/python /usr/bin/heat-api --config-file /usr/share/heat/heat-dist.conf --config-file /etc/heat/heat.conf
 41479  41491  41479  41479 ?            -1 S      187   0:00  \_ /usr/bin/python /usr/bin/heat-api --config-file /usr/share/heat/heat-dist.conf --config-file /etc/heat/heat.conf
 41479  41492  41479  41479 ?            -1 S      187   0:00  \_ /usr/bin/python /usr/bin/heat-api --config-file /usr/share/heat/heat-dist.conf --config-file /etc/heat/heat.conf
 41479  41493  41479  41479 ?            -1 S      187   0:00  \_ /usr/bin/python /usr/bin/heat-api --config-file /usr/share/heat/heat-dist.conf --config-file /etc/heat/heat.conf
 41479  41494  41479  41479 ?            -1 S      187   0:00  \_ /usr/bin/python /usr/bin/heat-api --config-file /usr/share/heat/heat-dist.conf --config-file /etc/heat/heat.conf
 41479  41495  41479  41479 ?            -1 S      187   0:00  \_ /usr/bin/python /usr/bin/heat-api --config-file /usr/share/heat/heat-dist.conf --config-file /etc/heat/heat.conf
 41479  41496  41479  41479 ?            -1 S      187   0:00  \_ /usr/bin/python /usr/bin/heat-api --config-file /usr/share/heat/heat-dist.conf --config-file /etc/heat/heat.conf
 41479  41497  41479  41479 ?            -1 S      187   0:01  \_ /usr/bin/python /usr/bin/heat-api --config-file /usr/share/heat/heat-dist.conf --config-file /etc/heat/heat.conf
 41479  41498  41479  41479 ?            -1 S      187   0:00  \_ /usr/bin/python /usr/bin/heat-api --config-file /usr/share/heat/heat-dist.conf --config-file /etc/heat/heat.conf
 41479  41499  41479  41479 ?            -1 S      187   0:00  \_ /usr/bin/python /usr/bin/heat-api --config-file /usr/share/heat/heat-dist.conf --config-file /etc/heat/heat.conf
 41479  41500  41479  41479 ?            -1 S      187   0:00  \_ /usr/bin/python /usr/bin/heat-api --config-file /usr/share/heat/heat-dist.conf --config-file /etc/heat/heat.conf
 41479  41501  41479  41479 ?            -1 S      187   0:00  \_ /usr/bin/python /usr/bin/heat-api --config-file /usr/share/heat/heat-dist.conf --config-file /etc/heat/heat.conf
 41479  41502  41479  41479 ?            -1 S      187   0:21  \_ /usr/bin/python /usr/bin/heat-api --config-file /usr/share/heat/heat-dist.conf --config-file /etc/heat/heat.conf
 41479  41503  41479  41479 ?            -1 S      187   0:00  \_ /usr/bin/python /usr/bin/heat-api --config-file /usr/share/heat/heat-dist.conf --config-file /etc/heat/heat.conf
 41479  41504  41479  41479 ?            -1 S      187   0:21  \_ /usr/bin/python /usr/bin/heat-api --config-file /usr/share/heat/heat-dist.conf --config-file /etc/heat/heat.conf
 41479  41505  41479  41479 ?            -1 S      187   0:00  \_ /usr/bin/python /usr/bin/heat-api --config-file /usr/share/heat/heat-dist.conf --config-file /etc/heat/heat.conf
 41479  41506  41479  41479 ?            -1 S      187   0:00  \_ /usr/bin/python /usr/bin/heat-api --config-file /usr/share/heat/heat-dist.conf --config-file /etc/heat/heat.conf
 41479  41507  41479  41479 ?            -1 S      187   0:21  \_ /usr/bin/python /usr/bin/heat-api --config-file /usr/share/heat/heat-dist.conf --config-file /etc/heat/heat.conf
 41479  41508  41479  41479 ?            -1 S      187   0:00  \_ /usr/bin/python /usr/bin/heat-api --config-file /usr/share/heat/heat-dist.conf --config-file /etc/heat/heat.conf
 41479  41509  41479  41479 ?            -1 S      187   5:06  \_ /usr/bin/python /usr/bin/heat-api --config-file /usr/share/heat/heat-dist.conf --config-file /etc/heat/heat.conf
 41479  41510  41479  41479 ?            -1 S      187   0:00  \_ /usr/bin/python /usr/bin/heat-api --config-file /usr/share/heat/heat-dist.conf --config-file /etc/heat/heat.conf
 41479  41511  41479  41479 ?            -1 S      187   0:00  \_ /usr/bin/python /usr/bin/heat-api --config-file /usr/share/heat/heat-dist.conf --config-file /etc/heat/heat.conf
 41479  41512  41479  41479 ?            -1 S      187   0:00  \_ /usr/bin/python /usr/bin/heat-api --config-file /usr/share/heat/heat-dist.conf --config-file /etc/heat/heat.conf


Undercloud CPU information:

[root@undercloud heat]# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                24
On-line CPU(s) list:   0-23
Thread(s) per core:    2
Core(s) per socket:    12
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 79
Model name:            Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz
Stepping:              1
CPU MHz:               2522.781
CPU max MHz:           2900.0000
CPU min MHz:           1200.0000
BogoMIPS:              4394.90
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              30720K
NUMA node0 CPU(s):     0-23

Comment 39 bigswitch 2017-09-21 16:41:20 UTC
I could not attach the logs, so I have uploaded them to Google Drive:

https://drive.google.com/drive/u/0/folders/0B07f4p28b_XUTlBpbEdlcEVSQkE

Comment 40 bigswitch 2017-09-21 17:17:47 UTC
g[3341]: + DEPLOYMENTS='2837b61c-4c89-4f31-b44e-3f8c9c91a47chttp://192.0.2.1:8000/v1/signal/arn%3Aopenstack%3Aheat%3A%3Aaa18eefdc57e47a8ad2035671f41792d%3Astacks%2Fovercloud-ComputeAllNodesDeploymen
Sep 21 10:16:26 overcloud-compute-40.bigswitch.com os-collect-config[3341]: 3f1ed3d0-0119-4529-ac56-bcc7a45ff411http://192.0.2.1:8000/v1/signal/arn%3Aopenstack%3Aheat%3A%3Aaa18eefdc57e47a8ad2035671f41792d%3Astacks%2Fovercloud-ComputeHostsDeployment-d4cqfnvk5ney%2Fb
Sep 21 10:16:26 overcloud-compute-40.bigswitch.com os-collect-config[3341]: d03ad8ca-e402-4caa-9526-e08d9c12600ahttp://192.0.2.1:8000/v1/signal/arn%3Aopenstack%3Aheat%3A%3Aaa18eefdc57e47a8ad2035671f41792d%3Astacks%2Fovercloud-ComputeHostsDeployment-d4cqfnvk5ney%2Fb
Sep 21 10:16:26 overcloud-compute-40.bigswitch.com os-collect-config[3341]: cd1c7f6a-5dec-462f-b47d-b5dd37a33b47http://192.0.2.1:8000/v1/signal/arn%3Aopenstack%3Aheat%3A%3Aaa18eefdc57e47a8ad2035671f41792d%3Astacks%2Fovercloud-Compute-cp5hofkq5e7m-40-7aiayi5k3xin%2F
Sep 21 10:16:26 overcloud-compute-40.bigswitch.com os-collect-config[3341]: decf01da-443f-452a-9808-7bcf3678f3a1http://192.0.2.1:8000/v1/signal/arn%3Aopenstack%3Aheat%3A%3Aaa18eefdc57e47a8ad2035671f41792d%3Astacks%2Fovercloud-Compute-cp5hofkq5e7m-40-7aiayi5k3xin-Co
Sep 21 10:16:26 overcloud-compute-40.bigswitch.com os-collect-config[3341]: 0d715f97-6b6d-4e2c-b25a-33f20be77b09http://192.0.2.1:8000/v1/signal/arn%3Aopenstack%3Aheat%3A%3Aaa18eefdc57e47a8ad2035671f41792d%3Astacks%2Fovercloud-Compute-cp5hofkq5e7m-40-7aiayi5k3xin%2F
Sep 21 10:16:26 overcloud-compute-40.bigswitch.com os-collect-config[3341]: + DEPLOYED_DIR=/var/lib/os-apply-config-deployments/deployed
Sep 21 10:16:26 overcloud-compute-40.bigswitch.com os-collect-config[3341]: + '[' '!' -d /var/lib/os-apply-config-deployments/deployed ']'
Sep 21 10:16:26 overcloud-compute-40.bigswitch.com os-collect-config[3341]: + for dep in '${DEPLOYMENTS}'
Sep 21 10:16:26 overcloud-compute-40.bigswitch.com os-collect-config[3341]: ++ echo '2837b61c-4c89-4f31-b44e-3f8c9c91a47chttp://192.0.2.1:8000/v1/signal/arn%3Aopenstack%3Aheat%3A%3Aaa18eefdc57e47a8ad2035671f41792d%3Astacks%2Fovercloud-ComputeAllNodesDeployment-c6d5
Sep 21 10:16:26 overcloud-compute-40.bigswitch.com os-collect-config[3341]: ++ sed 's/http.*$//'
Sep 21 10:16:26 overcloud-compute-40.bigswitch.com os-collect-config[3341]: + DEPLOY_ID=2837b61c-4c89-4f31-b44e-3f8c9c91a47c
Sep 21 10:16:26 overcloud-compute-40.bigswitch.com os-collect-config[3341]: ++ sed 's/^.*http/http/'
Sep 21 10:16:26 overcloud-compute-40.bigswitch.com os-collect-config[3341]: ++ echo '2837b61c-4c89-4f31-b44e-3f8c9c91a47chttp://192.0.2.1:8000/v1/signal/arn%3Aopenstack%3Aheat%3A%3Aaa18eefdc57e47a8ad2035671f41792d%3Astacks%2Fovercloud-ComputeAllNodesDeployment-c6d5
Sep 21 10:16:26 overcloud-compute-40.bigswitch.com os-collect-config[3341]: + DEPLOY_URL='http://192.0.2.1:8000/v1/signal/arn%3Aopenstack%3Aheat%3A%3Aaa18eefdc57e47a8ad2035671f41792d%3Astacks%2Fovercloud-ComputeAllNodesDeployment-c6d5abqhvr76%2F84cdd635-20dc-4819-b
Sep 21 10:16:26 overcloud-compute-40.bigswitch.com os-collect-config[3341]: + '[' '!' -f /var/lib/os-apply-config-deployments/deployed/2837b61c-4c89-4f31-b44e-3f8c9c91a47c ']'
Sep 21 10:16:26 overcloud-compute-40.bigswitch.com os-collect-config[3341]: + echo 'Skipping 2837b61c-4c89-4f31-b44e-3f8c9c91a47c, already deployed'
Sep 21 10:16:26 overcloud-compute-40.bigswitch.com os-collect-config[3341]: Skipping 2837b61c-4c89-4f31-b44e-3f8c9c91a47c, already deployed
Sep 21 10:16:26 overcloud-compute-40.bigswitch.com os-collect-config[3341]: + for dep in '${DEPLOYMENTS}'
Sep 21 10:16:26 overcloud-compute-40.bigswitch.com os-collect-config[3341]: ++ echo '3f1ed3d0-0119-4529-ac56-bcc7a45ff411http://192.0.2.1:8000/v1/signal/arn%3Aopenstack%3Aheat%3A%3Aaa18eefdc57e47a8ad2035671f41792d%3Astacks%2Fovercloud-ComputeHostsDeployment-d4cqfnv
Sep 21 10:16:26 overcloud-compute-40.bigswitch.com os-collect-config[3341]: ++ sed 's/http.*$//'
Sep 21 10:16:26 overcloud-compute-40.bigswitch.com os-collect-config[3341]: + DEPLOY_ID=3f1ed3d0-0119-4529-ac56-bcc7a45ff411
Sep 21 10:16:26 overcloud-compute-40.bigswitch.com os-collect-config[3341]: ++ sed 's/^.*http/http/'
Sep 21 10:16:26 overcloud-compute-40.bigswitch.com os-collect-config[3341]: ++ echo '3f1ed3d0-0119-4529-ac56-bcc7a45ff411http://192.0.2.1:8000/v1/signal/arn%3Aopenstack%3Aheat%3A%3Aaa18eefdc57e47a8ad2035671f41792d%3Astacks%2Fovercloud-ComputeHostsDeployment-d4cqfnv
Sep 21 10:16:26 overcloud-compute-40.bigswitch.com os-collect-config[3341]: + DEPLOY_URL='http://192.0.2.1:8000/v1/signal/arn%3Aopenstack%3Aheat%3A%3Aaa18eefdc57e47a8ad2035671f41792d%3Astacks%2Fovercloud-ComputeHostsDeployment-d4cqfnvk5ney%2Fb0b78ffc-7770-440b-aa4d
Sep 21 10:16:26 overcloud-compute-40.bigswitch.com os-collect-config[3341]: + '[' '!' -f /var/lib/os-apply-config-deployments/deployed/3f1ed3d0-0119-4529-ac56-bcc7a45ff411 ']'
Sep 21 10:16:26 overcloud-compute-40.bigswitch.com os-collect-config[3341]: + echo 'Signalling os-apply-config deployment 3f1ed3d0-0119-4529-ac56-bcc7a45ff411 http://192.0.2.1:8000/v1/signal/arn%3Aopenstack%3Aheat%3A%3Aaa18eefdc57e47a8ad2035671f41792d%3Astacks%2Fov
Sep 21 10:16:26 overcloud-compute-40.bigswitch.com os-collect-config[3341]: Signalling os-apply-config deployment 3f1ed3d0-0119-4529-ac56-bcc7a45ff411 http://192.0.2.1:8000/v1/signal/arn%3Aopenstack%3Aheat%3A%3Aaa18eefdc57e47a8ad2035671f41792d%3Astacks%2Fovercloud-
Sep 21 10:16:26 overcloud-compute-40.bigswitch.com os-collect-config[3341]: + call_curl_deployment POST 'http://192.0.2.1:8000/v1/signal/arn%3Aopenstack%3Aheat%3A%3Aaa18eefdc57e47a8ad2035671f41792d%3Astacks%2Fovercloud-ComputeHostsDeployment-d4cqfnvk5ney%2Fb0b78ffc
Sep 21 10:16:26 overcloud-compute-40.bigswitch.com os-collect-config[3341]: + local method=POST
Sep 21 10:16:26 overcloud-compute-40.bigswitch.com os-collect-config[3341]: + local 'url=http://192.0.2.1:8000/v1/signal/arn%3Aopenstack%3Aheat%3A%3Aaa18eefdc57e47a8ad2035671f41792d%3Astacks%2Fovercloud-ComputeHostsDeployment-d4cqfnvk5ney%2Fb0b78ffc-7770-440b-aa4d-
Sep 21 10:16:26 overcloud-compute-40.bigswitch.com os-collect-config[3341]: + local 'stdout=os-apply-config deployment 3f1ed3d0-0119-4529-ac56-bcc7a45ff411 completed'
Sep 21 10:16:26 overcloud-compute-40.bigswitch.com os-collect-config[3341]: ++ mktemp
Sep 21 10:16:26 overcloud-compute-40.bigswitch.com os-collect-config[3341]: + local output=/tmp/tmp.J0aBuS5SQQ
Sep 21 10:16:26 overcloud-compute-40.bigswitch.com os-collect-config[3341]: ++ curl -s -w '%{http_code}' -X POST -H 'Content-Type: application/json' -o /tmp/tmp.J0aBuS5SQQ --data-binary '{"deploy_stdout": "os-apply-config deployment 3f1ed3d0-0119-4529-ac56-bcc7a45f
Sep 21 10:16:26 overcloud-compute-40.bigswitch.com os-collect-config[3341]: + status=000
Sep 21 10:16:26 overcloud-compute-40.bigswitch.com os-collect-config[3341]: [2017-09-21 10:16:26,617] (os-refresh-config) [ERROR] during post-configure phase. [Command '['dib-run-parts', '/usr/libexec/os-refresh-config/post-configure.d']' returned non-zero exit sta
Sep 21 10:16:26 overcloud-compute-40.bigswitch.com os-collect-config[3341]: [2017-09-21 10:16:26,617] (os-refresh-config) [ERROR] Aborting...
Sep 21 10:16:26 overcloud-compute-40.bigswitch.com os-collect-config[3341]: Command failed, will not cache new data. Command 'os-refresh-config --timeout 14400' returned non-zero exit status 1
Sep 21 10:16:26 overcloud-compute-40.bigswitch.com os-collect-config[3341]: Sleeping 1.00 seconds before re-exec.
Sep 21 10:16:28 overcloud-compute-40.bigswitch.com os-collect-config[3341]: /var/lib/os-collect-config/local-data not found. Skipping
Sep 21 10:16:28 overcloud-compute-40.bigswitch.com os-collect-config[3341]: No local metadata found (['/var/lib/os-collect-config/local-data'])
Sep 21 10:16:28 overcloud-compute-40.bigswitch.com os-collect-config[3341]: [2017-09-21 10:16:28,471] (os-refresh-config) [INFO] Starting phase pre-configure
Sep 21 10:16:28 overcloud-compute-40.bigswitch.com os-collect-config[3341]: dib-run-parts Thu Sep 21 10:16:28 PDT 2017 Running /usr/libexec/os-refresh-config/pre-configure.d/06-rhel-registration

Comment 41 Lukas Bezdicka 2018-06-28 14:58:42 UTC
Hi, I seem to have hit a similar issue.

I got through the stuck nodes by these steps:
openstack stack resource list overcloud -n | grep -v COMPLETE
heat deployment-show <deployment-id>
heat config-show <config-id>
curl -X POST <url in config>

This looks like heat-api-cfn failing to notify heat. After I updated the heat workers to 16 and the heat-api-cfn workers to 24, I haven't hit this issue... so far.
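[Editor's note] The first step of that recovery can be sketched like this; the table lines below are made-up stand-ins for real `openstack stack resource list` output, with truncated IDs:

```shell
# Filter the resources that are not yet COMPLETE; each surviving line
# carries the deployment to inspect with "heat deployment-show".
sample='| ComputeHostsDeployment    | b0b78ffc | OS::Heat::StructuredDeployments | CREATE_COMPLETE    |
| ComputeAllNodesDeployment | 84cdd635 | OS::Heat::StructuredDeployments | CREATE_IN_PROGRESS |'
stuck=$(printf '%s\n' "$sample" | grep -v COMPLETE)
echo "$stuck"
```

Each stuck deployment would then be inspected with `heat deployment-show` / `heat config-show` and re-signalled with `curl -X POST` against the URL from the config, as listed above.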

Comment 43 Lukas Bezdicka 2018-06-28 22:04:19 UTC
The hieradata_override file that allowed me to deploy ~150 nodes:

swift::proxy::workers: "24"
glance::api::workers: "24"
glance::registry::workers: "24"
heat::engine::num_engine_workers: "24"
heat::api::workers: "24"
heat::api_cfn::workers: "24"
keystone::admin_workers: "24"
keystone::public_workers: "24"
neutron::server::api_workers: "24"
neutron::server::rpc_workers: "24"
neutron::agents::metadata::metadata_workers: "24"
nova::api::osapi_compute_workers: "24"
nova::api::metadata_workers: "24"
nova::conductor::workers: "24"
ironic::api::workers: "24"
mistral::api::api_workers: "24"
keystone::wsgi::apache::threads: "1"
keystone::wsgi::apache::workers: "64"
ceilometer::wsgi::apache::threads: "1"
ceilometer::wsgi::apache::workers: "64"
aodh::wsgi::apache::threads: "1"
aodh::wsgi::apache::workers: "64"
nova::wsgi::apache::threads: "1"
nova::wsgi::apache::workers: "64"
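[Editor's note] On the undercloud, overrides like these are typically applied through the hieradata_override option in undercloud.conf, followed by re-running `openstack undercloud install`; the YAML path here is only an example:

```ini
# undercloud.conf
[DEFAULT]
hieradata_override = /home/stack/workers-hieradata.yaml
```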

Comment 53 Alex McLeod 2018-09-03 08:01:07 UTC
Hi there,

If this bug requires doc text for errata release, please set the 'Doc Type' and provide draft text according to the template in the 'Doc Text' field.

The documentation team will review, edit, and approve the text.

If this bug does not require doc text, please set the 'requires_doc_text' flag to -.

Thanks,
Alex

Comment 55 errata-xmlrpc 2018-09-17 16:54:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2670

