Created attachment 1300111 [details]
stdout from openstack overcloud deploy -e ~/environment.yaml --templates

Description of problem:
With the Ocata release, scaling up compute nodes or deleting compute nodes ends with UPDATE_FAILED as the Heat status. Although UPDATE_FAILED is reported, the new node is actually brought up on scale-up, and the node is deleted successfully on delete. This problem does not happen in Newton.

Version-Release number of selected component (if applicable):
openstack-tripleo-common-6.1.0-2.el7ost.noarch
openstack-heat-api-8.0.2-2.el7ost.noarch
openstack-heat-api-cfn-8.0.2-2.el7ost.noarch
openstack-heat-common-8.0.2-2.el7ost.noarch
openstack-heat-engine-8.0.2-2.el7ost.noarch
openstack-tripleo-heat-templates-6.1.0-1.el7ost.noarch
openstack-mistral-api-4.0.2-1.el7ost.noarch
openstack-mistral-common-4.0.2-1.el7ost.noarch
openstack-mistral-engine-4.0.2-1.el7ost.noarch
openstack-mistral-executor-4.0.2-1.el7ost.noarch
openstack-nova-api-15.0.6-3.el7ost.noarch
openstack-nova-cert-15.0.6-3.el7ost.noarch
openstack-nova-common-15.0.6-3.el7ost.noarch
openstack-nova-compute-15.0.6-3.el7ost.noarch
openstack-nova-conductor-15.0.6-3.el7ost.noarch
openstack-nova-placement-api-15.0.6-3.el7ost.noarch
openstack-nova-scheduler-15.0.6-3.el7ost.noarch

How reproducible:
Always

Steps to Reproduce:
1. Deploy the undercloud and overcloud using an instack-virt-setup environment. One controller and one compute node are brought up in the overcloud.
2. Scale up the overcloud:

[stack@instack ~]$ cat environment.yml
parameter_defaults:
  ComputeCount: 2

openstack overcloud deploy -e ~/environment.yml --templates

Actual results:
The new compute node is brought up, but the CLI command and the Heat stack end in the UPDATE_FAILED status.

Expected results:
The new compute node is brought up and a success status (UPDATE_COMPLETE) is reported.

Additional info:
See attachments for stdout and OpenStack service logs.
Created attachment 1300112 [details]
heat, nova, ironic, and mistral service logs
Hey folks, is this a scaling issue or a problem that occurs during upgrades/updates? It looks like a stack update problem when scaling up.
It looks like a Puppet error, but we would need the logs from the host that the error occurred on. Please provide a sosreport from the controller that it failed on. Alternatively, the information may show up in the output of 'openstack stack failures list overcloud'.
@Carlos, the issue occurs during scale-up.
@Alex, here is the stack failures list:

[stack@instack ~]$ openstack stack failures list overcloud
overcloud.AllNodesDeploySteps.ControllerDeployment_Step4.0:
  resource_type: OS::Heat::StructuredDeployment
  physical_resource_id: 6ffcbd26-d156-4698-9670-1e47daec0717
  status: UPDATE_FAILED
  status_reason: |
    Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 6
  deploy_stdout: |
    ...
    Notice: /Stage[main]/Tripleo::Firewall::Post/Tripleo::Firewall::Rule[999 drop all]/Firewall[999 drop all ipv6]: Dependency Package[swift-account] has failures: true
    Notice: /Stage[main]/Tripleo::Firewall::Post/Tripleo::Firewall::Rule[999 drop all]/Firewall[999 drop all ipv6]: Dependency Package[swift-container] has failures: true
    Notice: /Stage[main]/Tripleo::Firewall::Post/Tripleo::Firewall::Rule[999 drop all]/Firewall[999 drop all ipv6]: Dependency Package[swift-object] has failures: true
    Notice: /Stage[main]/Firewall::Linux::Redhat/File[/etc/sysconfig/iptables]: Dependency Package[swift-account] has failures: true
    Notice: /Stage[main]/Firewall::Linux::Redhat/File[/etc/sysconfig/iptables]: Dependency Package[swift-container] has failures: true
    Notice: /Stage[main]/Firewall::Linux::Redhat/File[/etc/sysconfig/iptables]: Dependency Package[swift-object] has failures: true
    Notice: /Stage[main]/Firewall::Linux::Redhat/File[/etc/sysconfig/ip6tables]: Dependency Package[swift-account] has failures: true
    Notice: /Stage[main]/Firewall::Linux::Redhat/File[/etc/sysconfig/ip6tables]: Dependency Package[swift-container] has failures: true
    Notice: /Stage[main]/Firewall::Linux::Redhat/File[/etc/sysconfig/ip6tables]: Dependency Package[swift-object] has failures: true
    Notice: Applied catalog in 75.06 seconds
    (truncated, view all with --long)
  deploy_stderr: |
    ...
    Warning: /Stage[main]/Gnocchi::Deps/Anchor[gnocchi::service::begin]: Skipping because of failed dependencies
    Warning: /Stage[main]/Gnocchi::Api/Service[gnocchi-api]: Skipping because of failed dependencies
    Warning: /Stage[main]/Apache::Service/Service[httpd]: Skipping because of failed dependencies
    Warning: /Stage[main]/Keystone::Deps/Anchor[keystone::service::end]: Skipping because of failed dependencies
    Warning: /Stage[main]/Gnocchi::Deps/Anchor[gnocchi::service::end]: Skipping because of failed dependencies
    Warning: /Stage[main]/Tripleo::Firewall::Post/Firewall[998 log all]: Skipping because of failed dependencies
    Warning: /Stage[main]/Tripleo::Firewall::Post/Tripleo::Firewall::Rule[999 drop all]/Firewall[999 drop all ipv4]: Skipping because of failed dependencies
    Warning: /Stage[main]/Tripleo::Firewall::Post/Tripleo::Firewall::Rule[999 drop all]/Firewall[999 drop all ipv6]: Skipping because of failed dependencies
    Warning: /Stage[main]/Firewall::Linux::Redhat/File[/etc/sysconfig/iptables]: Skipping because of failed dependencies
    Warning: /Stage[main]/Firewall::Linux::Redhat/File[/etc/sysconfig/ip6tables]: Skipping because of failed dependencies
    (truncated, view all with --long)
Created attachment 1306024 [details]
sosreport from undercloud node
Created attachment 1306025 [details]
full stack failures list

This looks like an out-of-memory issue. I have these memory settings in my dev environment:

export UNDERCLOUD_NODE_MEM=12288
export NODE_MEM=8192
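As a quick sanity check (my suggestion, not from the attached logs), MemTotal from /proc/meminfo can be converted to MiB to confirm the overcloud node really sees the memory it was given. The awk expression below is demonstrated against a canned 8 GiB sample line; on the node itself you would point it at /proc/meminfo instead.

```shell
# Hypothetical helper: convert MemTotal (reported in kB) to MiB.
# Demonstrated on a canned line; on the node, read /proc/meminfo directly.
printf 'MemTotal:        8388608 kB\n' |
  awk '/MemTotal/ {printf "%d MiB\n", $2 / 1024}'
# prints "8192 MiB"
```

If the reported value is well below 8192 MiB, the "Cannot allocate memory" failures above would be expected.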
Correct, the error is "not enough memory": there are not enough resources on the node being deployed. Is there actually 8 GB of memory available on your overcloud nodes?

Error: /Stage[main]/Tripleo::Profile::Pacemaker::Haproxy/Pacemaker::Resource::Service[haproxy]/Pacemaker::Resource::Systemd[haproxy]/Pcmk_resource[haproxy]: Could not evaluate: Cannot allocate memory - /usr/sbin/pcs
Error: /Stage[main]/Swift::Storage::Account/Swift::Storage::Generic[account]/Package[swift-account]: Could not evaluate: Cannot allocate memory - fork(2)
Error: /Stage[main]/Swift::Storage::Container/Swift::Storage::Generic[container]/Package[swift-container]: Could not evaluate: Cannot allocate memory - fork(2)
Error: /Stage[main]/Swift::Storage::Object/Swift::Storage::Generic[object]/Package[swift-object]: Could not evaluate: Cannot allocate memory - fork(2)

Additionally, which services are you deploying? That can also affect the memory footprint.

If this is a development environment, we have two basic deployment options available to you right now:

You may try enabling swap via:
  -e /usr/share/openstack-tripleo-heat-templates/environments/enable-swap.yaml

You may try using:
  -e /usr/share/openstack-tripleo-heat-templates/environments/low-memory-usage.yaml
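For reference, a sketch of what the re-run scale-up command would look like with those environments added. This is a command-line fragment, not output from the reporter's environment; it assumes the stock openstack-tripleo-heat-templates package paths quoted above and the reporter's ~/environment.yml from the steps to reproduce.

```shell
# Sketch: re-run the same scale-up deploy with the swap and/or
# low-memory environments layered on top (later -e files win).
openstack overcloud deploy --templates \
  -e ~/environment.yml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/enable-swap.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/low-memory-usage.yaml
```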
Closing due to lack of updates. Feel free to reopen if this continues to be a problem after enabling swap or using low-memory-usage.yaml.