We need to deploy the availability monitoring solution that Ggillies has put together.
http://file.bne.redhat.com/~ggillies/optools_doc/

Use case: As an operator, I need to be able to validate that OpenStack services are functioning correctly.

Satisfaction criteria:
* solution is automatically deployed when the appropriate option is activated
* solution is documented
*** Bug 1290250 has been marked as a duplicate of this bug. ***
Under the new HA architecture that is planned for OSP 10, most/all of the A/A OpenStack services will be managed by systemd and will be able to start, stop, restart independently when needed. We will need to monitor and alert on services that are stopped, that will not start or that are in a constant state of restarting. Should this be added to this BZ or should there be another to track this work?
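The three conditions named in this comment (stopped, will not start, constantly restarting) could be told apart from the properties that `systemctl show` prints. The following is only a minimal sketch of that idea, not the planned implementation: the function names and the restart threshold are hypothetical, and `NRestarts` is treated as optional because it is only reported by newer systemd releases.

```python
def parse_show_output(text):
    """Parse KEY=VALUE lines, as printed by `systemctl show`, into a dict."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def classify_service(props, restart_threshold=3):
    """Return 'ok', 'stopped', 'failed', 'flapping', or 'other' for one unit.

    restart_threshold is an illustrative value, not a recommended setting.
    """
    active = props.get("ActiveState", "")
    sub = props.get("SubState", "")
    # NRestarts may be absent on older systemd; fall back to 0 in that case.
    try:
        restarts = int(props.get("NRestarts", "0"))
    except ValueError:
        restarts = 0

    if active == "failed":
        return "failed"      # the unit will not start
    if sub == "auto-restart" or restarts >= restart_threshold:
        return "flapping"    # the unit keeps being restarted
    if active == "inactive":
        return "stopped"
    if active == "active":
        return "ok"
    return "other"           # e.g. activating, deactivating, reloading
```

The input could come from something like `systemctl show <unit> -p ActiveState -p SubState`; an alerting check would then raise on any result other than `ok`.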
These changes have been merged upstream (https://review.openstack.org/#/c/254788/)
Tested with openstack-tripleo-heat-templates-5.0.0-0.20160907212643.90c852e.1.el7ost.noarch

Stack deployment failed.

Here is the deploy command I used:

openstack overcloud deploy --templates -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e network-environment.yaml -e monitoring-environment.yaml --control-scale 3 --compute-scale 1 --ntp-server 10.11.160.238

Here is the error I got:
------------------------------------------------------------------------------
[stack@puma42 ~]$ openstack stack failures list overcloud
WARNING: openstackclient.common.utils is deprecated and will be removed after Jun 2017. Please use osc_lib.utils
overcloud.ControllerAllNodesValidationDeployment:
  resource_type: OS::Heat::StructuredDeployments
  physical_resource_id: 540c6ff6-ee1a-4303-b9da-114edf813654
  status: CREATE_FAILED
  status_reason: |
    CREATE aborted
overcloud.ControllerNodesPostDeployment.ControllerPrePuppet.ControllerPrePuppetMaintenanceModeDeployment:
  resource_type: OS::Heat::SoftwareDeployments
  physical_resource_id: 664bab56-185f-46f8-b62b-190fb897258a
  status: CREATE_FAILED
  status_reason: |
    CREATE aborted
overcloud.ControllerNodesPostDeployment.ControllerArtifactsDeploy:
  resource_type: OS::Heat::StructuredDeployments
  physical_resource_id: 53f77add-767c-46c0-b433-43d88783af5d
  status: CREATE_FAILED
  status_reason: |
    CREATE aborted
overcloud.ComputeNodesPostDeployment.ComputeOvercloudServicesDeployment_Step3.0:
  resource_type: OS::Heat::StructuredDeployment
  physical_resource_id: 563feda4-77b5-46fd-a866-307c444adcfa
  status: CREATE_FAILED
  status_reason: |
    Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 6
  deploy_stdout: |
    ...
    Notice: /Stage[main]/Sensu::Enterprise::Dashboard/Anchor[sensu::enterprise::dashboard::end]: Dependency File[/etc/sensu/handlers] has failures: true
    Notice: /Stage[main]/Sensu::Enterprise::Dashboard/Anchor[sensu::enterprise::dashboard::end]: Dependency File[/etc/sensu/extensions] has failures: true
    Notice: /Stage[main]/Sensu::Enterprise::Dashboard/Anchor[sensu::enterprise::dashboard::end]: Dependency File[/etc/sensu/mutators] has failures: true
    Notice: /Stage[main]/Sensu::Enterprise::Dashboard/Anchor[sensu::enterprise::dashboard::end]: Dependency File[/etc/sensu/plugins] has failures: true
    Notice: /Stage[main]/Sensu/Anchor[sensu::end]: Dependency File[/etc/sensu/conf.d] has failures: true
    Notice: /Stage[main]/Sensu/Anchor[sensu::end]: Dependency File[/etc/sensu/handlers] has failures: true
    Notice: /Stage[main]/Sensu/Anchor[sensu::end]: Dependency File[/etc/sensu/extensions] has failures: true
    Notice: /Stage[main]/Sensu/Anchor[sensu::end]: Dependency File[/etc/sensu/mutators] has failures: true
    Notice: /Stage[main]/Sensu/Anchor[sensu::end]: Dependency File[/etc/sensu/plugins] has failures: true
    Notice: Finished catalog run in 3.64 seconds
    (truncated, view all with --long)
  deploy_stderr: |
    ...
    Warning: /Stage[main]/Sensu::Redis::Config/Sensu_redis_config[overcloud-novacompute-0.localdomain]: Skipping because of failed dependencies
    Warning: /Stage[main]/Sensu::Client::Config/Sensu_client_config[overcloud-novacompute-0.localdomain]: Skipping because of failed dependencies
    Warning: /Stage[main]/Sensu::Client::Config/File[/etc/sensu/conf.d/client.json]: Skipping because of failed dependencies
    Warning: /Stage[main]/Sensu::Client::Service/Service[sensu-client]: Skipping because of failed dependencies
    Warning: /Stage[main]/Sensu::Api::Service/Service[sensu-api]: Skipping because of failed dependencies
    Warning: /Stage[main]/Sensu::Server::Service/Service[sensu-server]: Skipping because of failed dependencies
    Warning: /Stage[main]/Sensu::Enterprise::Dashboard/Anchor[sensu::enterprise::dashboard::begin]: Skipping because of failed dependencies
    Warning: /Package[sensu-enterprise-dashboard]: Skipping because of failed dependencies
    Warning: /Stage[main]/Sensu::Enterprise::Dashboard/Anchor[sensu::enterprise::dashboard::end]: Skipping because of failed dependencies
    Warning: /Stage[main]/Sensu/Anchor[sensu::end]: Skipping because of failed dependencies
    (truncated, view all with --long)
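When triaging output like the above, it can help to reduce the repeated Puppet "Dependency ... has failures" notices to the distinct resources they name. A minimal sketch, assuming the notice wording shown in this log; the function name is hypothetical and the regular expression is only matched against the format seen here:

```python
import re
from collections import Counter

# Matches e.g. "Dependency File[/etc/sensu/plugins] has failures: true"
_DEP_RE = re.compile(r"Dependency ([\w:]+\[[^\]]+\]) has failures: true")


def failed_dependencies(log_text):
    """Count how often each resource is reported as a failed dependency."""
    return Counter(_DEP_RE.findall(log_text))
```

Run against the truncated output above, this only lists the `File[/etc/sensu/...]` resources; the underlying error for those resources is not in the excerpt and would have to come from the full output (`--long`).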
Mike, can someone from the ReleaseDelivery DFG take this one and look into the downstream image builds?
openstack-tripleo-heat-templates-5.0.0-0.6.0rc3.el7ost.noarch

Availability Monitoring was successfully deployed by TripleO using the monitoring-environment.yaml template. sensu-client started on all overcloud nodes and the configuration files were properly configured.
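A check like "configuration files were properly configured" could be partly scripted. The sketch below is an illustration only, not the procedure used for this verification: it assumes the client definition lives under a top-level `client` object with `name`, `address`, and `subscriptions` keys, and the function name is hypothetical.

```python
import json

# Assumed minimum set of keys for a Sensu client definition.
REQUIRED_KEYS = ("name", "address", "subscriptions")


def check_client_config(text):
    """Return a list of problems found in the text of a client.json file."""
    try:
        data = json.loads(text)
    except ValueError as exc:
        return ["invalid JSON: %s" % exc]

    client = data.get("client") if isinstance(data, dict) else None
    if not isinstance(client, dict):
        return ["missing top-level 'client' object"]

    problems = ["missing client.%s" % key
                for key in REQUIRED_KEYS if key not in client]
    if "subscriptions" in client and not isinstance(client["subscriptions"], list):
        problems.append("client.subscriptions is not a list")
    return problems
```

An empty list means no problems were found; the text would typically be read from /etc/sensu/conf.d/client.json on each overcloud node.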
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHEA-2016-2948.html