Bug 1290251 - [RFE][OpsTools] We need an availability monitoring solution deployed by director.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 10.0 (Newton)
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: 10.0 (Newton)
Assignee: Martin Magr
QA Contact: Leonid Natapov
URL:
Whiteboard:
Duplicates: 1290250
Depends On: 1379538
Blocks: 1398468
Reported: 2015-12-09 22:57 UTC by Nick Barcet
Modified: 2017-07-10 09:37 UTC (History)
CC List: 17 users

Fixed In Version: openstack-tripleo-heat-templates-5.0.0-0.20160907212643.90c852e.1.el7ost
Doc Type: Enhancement
Doc Text:
With this update, availability monitoring agents (sensu-client) can be deployed on the overcloud nodes so that the overcloud can be connected to a monitoring infrastructure. To enable the monitoring agents deployment, use the environment file '/usr/share/openstack-tripleo-heat-templates/environments/monitoring-environment.yaml' and fill in the following parameters in the configuration YAML file:
* MonitoringRabbitHost: host where the RabbitMQ instance for monitoring purposes is running
* MonitoringRabbitPort: port on which the RabbitMQ instance for monitoring purposes is running
* MonitoringRabbitUserName: username to connect to the RabbitMQ instance
* MonitoringRabbitPassword: password to connect to the RabbitMQ instance
* MonitoringRabbitVhost: RabbitMQ vhost used for monitoring purposes
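
As an illustration of the parameters listed above, here is a minimal Python sketch that renders them as a Heat environment fragment. It assumes the parameters go under the usual `parameter_defaults` section. Every value is a placeholder and the helper name `render_parameter_defaults` is hypothetical. Only the five parameter names come from this bug.

```python
import json

# Parameter names come from the doc text above; every value is a placeholder.
EXAMPLE_PARAMS = {
    "MonitoringRabbitHost": "192.0.2.10",
    "MonitoringRabbitPort": 5672,
    "MonitoringRabbitUserName": "sensu",
    "MonitoringRabbitPassword": "change-me",
    "MonitoringRabbitVhost": "/sensu",
}

REQUIRED = tuple(EXAMPLE_PARAMS)


def render_parameter_defaults(params):
    """Return YAML text with the monitoring parameters under parameter_defaults."""
    missing = [name for name in REQUIRED if name not in params]
    if missing:
        raise ValueError("missing parameters: " + ", ".join(missing))
    lines = ["parameter_defaults:"]
    for name in REQUIRED:
        # json.dumps yields a double-quoted string or a bare integer,
        # both of which are valid YAML scalars.
        lines.append("  {}: {}".format(name, json.dumps(params[name])))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_parameter_defaults(EXAMPLE_PARAMS), end="")
```

The rendered text could be saved to a local file and passed to the deploy command with an additional -e option, next to the monitoring environment file itself.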
Clone Of:
Environment:
Last Closed: 2016-12-14 15:19:40 UTC


Links
System                  ID              Priority  Status        Summary                                           Last Updated
Red Hat Product Errata  RHEA-2016:2948  normal    SHIPPED_LIVE  Red Hat OpenStack Platform 10 enhancement update  2016-12-14 19:55:27 UTC
OpenStack gerrit        254788          None      None          None                                              2016-08-17 09:19:29 UTC
OpenStack gerrit        349690          None      None          None                                              2016-08-17 09:20:07 UTC

Description Nick Barcet 2015-12-09 22:57:34 UTC
We need to deploy the availability monitoring solution that Ggillies has put together.
http://file.bne.redhat.com/~ggillies/optools_doc/

Use case:
As an operator, I need to be able to validate that OpenStack services are functioning correctly.

Satisfaction criteria:
* solution is automatically deployed when the appropriate option is activated
* solution is documented

Comment 2 Nick Barcet 2015-12-09 23:22:51 UTC
*** Bug 1290250 has been marked as a duplicate of this bug. ***

Comment 3 Rob Young 2016-06-14 15:04:15 UTC
Under the new HA architecture that is planned for OSP 10, most or all of the active/active (A/A) OpenStack services will be managed by systemd and will be able to start, stop, and restart independently when needed. We will need to monitor and alert on services that are stopped, that will not start, or that are constantly restarting. Should this be added to this BZ, or should there be another one to track this work?
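
The three conditions named above (stopped, failing to start, constantly restarting) could be sketched as a check along the following lines. This is only an illustration, not the check that was shipped. It assumes the input is the Key=Value text printed by `systemctl show` for a unit and that the Nagios-style exit codes used by Sensu checks apply. The `max_restarts` threshold and both function names are hypothetical. `NRestarts` is only reported by newer systemd releases, so the sketch treats it as optional.

```python
# Exit codes follow the Nagios-style convention used by Sensu checks.
OK, WARNING, CRITICAL = 0, 1, 2


def parse_show_output(text):
    """Parse the Key=Value lines printed by `systemctl show` into a dict."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def classify(props, max_restarts=3):
    """Map unit properties to (exit_code, message).

    max_restarts is a hypothetical threshold, not a value from this bug.
    """
    active = props.get("ActiveState", "unknown")
    sub = props.get("SubState", "unknown")
    # NRestarts may be absent on older systemd; treat absence as 0.
    try:
        restarts = int(props.get("NRestarts", "0"))
    except ValueError:
        restarts = 0

    if active == "failed":
        return CRITICAL, "unit failed to start or crashed"
    if sub == "auto-restart" or restarts > max_restarts:
        return WARNING, "unit is restarting repeatedly"
    if active == "active":
        return OK, "unit is running"
    return CRITICAL, "unit is not running ({}/{})".format(active, sub)
```

A wrapper script would feed the output of `systemctl show` for each monitored unit into `parse_show_output`, print the message, and exit with the returned code.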

Comment 4 Lars Kellogg-Stedman 2016-09-19 18:49:48 UTC
These changes have been merged upstream (https://review.openstack.org/#/c/254788/)

Comment 6 Leonid Natapov 2016-09-26 08:22:24 UTC
Tested with openstack-tripleo-heat-templates-5.0.0-0.20160907212643.90c852e.1.el7ost.noarch
Stack deployment failed.

Here is the deploy command I used:
openstack overcloud deploy --templates -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e network-environment.yaml -e monitoring-environment.yaml --control-scale 3 --compute-scale 1 --ntp-server 10.11.160.238

Here is the error I got:
------------------------------------------------------------------------------
[stack@puma42 ~]$ openstack stack failures list overcloud
WARNING: openstackclient.common.utils is deprecated and will be removed after Jun 2017. Please use osc_lib.utils
overcloud.ControllerAllNodesValidationDeployment:
  resource_type: OS::Heat::StructuredDeployments
  physical_resource_id: 540c6ff6-ee1a-4303-b9da-114edf813654
  status: CREATE_FAILED
  status_reason: |
    CREATE aborted
overcloud.ControllerNodesPostDeployment.ControllerPrePuppet.ControllerPrePuppetMaintenanceModeDeployment:
  resource_type: OS::Heat::SoftwareDeployments
  physical_resource_id: 664bab56-185f-46f8-b62b-190fb897258a
  status: CREATE_FAILED
  status_reason: |
    CREATE aborted
overcloud.ControllerNodesPostDeployment.ControllerArtifactsDeploy:
  resource_type: OS::Heat::StructuredDeployments
  physical_resource_id: 53f77add-767c-46c0-b433-43d88783af5d
  status: CREATE_FAILED
  status_reason: |
    CREATE aborted
overcloud.ComputeNodesPostDeployment.ComputeOvercloudServicesDeployment_Step3.0:
  resource_type: OS::Heat::StructuredDeployment
  physical_resource_id: 563feda4-77b5-46fd-a866-307c444adcfa
  status: CREATE_FAILED
  status_reason: |
    Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 6
  deploy_stdout: |
    ...
    Notice: /Stage[main]/Sensu::Enterprise::Dashboard/Anchor[sensu::enterprise::dashboard::end]: Dependency File[/etc/sensu/handlers] has failures: true
    Notice: /Stage[main]/Sensu::Enterprise::Dashboard/Anchor[sensu::enterprise::dashboard::end]: Dependency File[/etc/sensu/extensions] has failures: true
    Notice: /Stage[main]/Sensu::Enterprise::Dashboard/Anchor[sensu::enterprise::dashboard::end]: Dependency File[/etc/sensu/mutators] has failures: true
    Notice: /Stage[main]/Sensu::Enterprise::Dashboard/Anchor[sensu::enterprise::dashboard::end]: Dependency File[/etc/sensu/plugins] has failures: true
    Notice: /Stage[main]/Sensu/Anchor[sensu::end]: Dependency File[/etc/sensu/conf.d] has failures: true
    Notice: /Stage[main]/Sensu/Anchor[sensu::end]: Dependency File[/etc/sensu/handlers] has failures: true
    Notice: /Stage[main]/Sensu/Anchor[sensu::end]: Dependency File[/etc/sensu/extensions] has failures: true
    Notice: /Stage[main]/Sensu/Anchor[sensu::end]: Dependency File[/etc/sensu/mutators] has failures: true
    Notice: /Stage[main]/Sensu/Anchor[sensu::end]: Dependency File[/etc/sensu/plugins] has failures: true
    Notice: Finished catalog run in 3.64 seconds
    (truncated, view all with --long)
  deploy_stderr: |
    ...
    Warning: /Stage[main]/Sensu::Redis::Config/Sensu_redis_config[overcloud-novacompute-0.localdomain]: Skipping because of failed dependencies
    Warning: /Stage[main]/Sensu::Client::Config/Sensu_client_config[overcloud-novacompute-0.localdomain]: Skipping because of failed dependencies
    Warning: /Stage[main]/Sensu::Client::Config/File[/etc/sensu/conf.d/client.json]: Skipping because of failed dependencies
    Warning: /Stage[main]/Sensu::Client::Service/Service[sensu-client]: Skipping because of failed dependencies
    Warning: /Stage[main]/Sensu::Api::Service/Service[sensu-api]: Skipping because of failed dependencies
    Warning: /Stage[main]/Sensu::Server::Service/Service[sensu-server]: Skipping because of failed dependencies
    Warning: /Stage[main]/Sensu::Enterprise::Dashboard/Anchor[sensu::enterprise::dashboard::begin]: Skipping because of failed dependencies
    Warning: /Package[sensu-enterprise-dashboard]: Skipping because of failed dependencies
    Warning: /Stage[main]/Sensu::Enterprise::Dashboard/Anchor[sensu::enterprise::dashboard::end]: Skipping because of failed dependencies
    Warning: /Stage[main]/Sensu/Anchor[sensu::end]: Skipping because of failed dependencies
    (truncated, view all with --long)
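
In the listing above, three of the four resources report only "CREATE aborted", which typically means they were cancelled after another resource failed. The Compute Step3 deployment is the one carrying a real error. A rough Python sketch for separating the two kinds when the listing is long follows. It assumes the indentation layout shown above (resource name at column 0, fields indented by two spaces, block text by four). The function names are hypothetical.

```python
import re

HEADER = re.compile(r"^(\S+):\s*$")         # e.g. "overcloud.Foo.Bar:"
FIELD = re.compile(r"^  (\w+):\s*(.*)$")    # e.g. "  status: CREATE_FAILED"


def parse_failures(text):
    """Parse `openstack stack failures list` output into {resource: {field: value}}."""
    failures = {}
    current = None   # fields of the resource being read
    block = None     # name of the multi-line field being read, if any
    for line in text.splitlines():
        header = HEADER.match(line)
        field = FIELD.match(line)
        if header:
            current = failures.setdefault(header.group(1), {})
            block = None
        elif field and current is not None:
            key, value = field.groups()
            if value == "|":
                # A "|" value starts a block whose text follows on indented lines.
                current[key] = ""
                block = key
            else:
                current[key] = value
                block = None
        elif line.startswith("    ") and current is not None and block:
            current[block] += line.strip() + "\n"
    return failures


def root_causes(failures):
    """Return resources whose status_reason is something other than 'CREATE aborted'."""
    return [
        name for name, fields in failures.items()
        if fields.get("status_reason", "").strip() != "CREATE aborted"
    ]
```

Lines that match neither pattern, such as the shell prompt and the deprecation warning, are ignored.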

Comment 10 James Slagle 2016-10-06 17:06:53 UTC
Mike, can someone from the ReleaseDelivery DFG take this one and look into the downstream image builds?

Comment 23 Leonid Natapov 2016-10-31 09:56:10 UTC
openstack-tripleo-heat-templates-5.0.0-0.6.0rc3.el7ost.noarch

Availability monitoring was successfully deployed by TripleO using the monitoring-environment.yaml template. sensu-client started on all overcloud nodes and the configuration files were properly configured.
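
A verification like the one above could be partly scripted. The following Python sketch checks the text of /etc/sensu/conf.d/client.json (the file named in the earlier Puppet output). It assumes the usual Sensu layout of a top-level "client" object with "name" and "subscriptions" attributes. The function name and the exact set of checks are hypothetical, not the QA procedure used here.

```python
import json


def check_client_config(text):
    """Return a list of problems found in a Sensu client.json document."""
    try:
        config = json.loads(text)
    except ValueError as exc:
        return ["not valid JSON: {}".format(exc)]
    client = config.get("client") if isinstance(config, dict) else None
    if not isinstance(client, dict):
        return ["missing top-level 'client' object"]
    problems = []
    if not client.get("name"):
        problems.append("client.name is empty or missing")
    subs = client.get("subscriptions")
    if not isinstance(subs, list) or not subs:
        problems.append("client.subscriptions is empty or missing")
    return problems
```

An empty list means the file passed these basic checks. Whether the sensu-client service is running still has to be checked separately on each node.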

Comment 26 errata-xmlrpc 2016-12-14 15:19:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-2948.html

