Bug 1130372
| Summary: | [RFE][ceilometer]: Rebase partitioned alarm evaluation on tooz | | |
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | RHOS Integration <rhos-integ> |
| Component: | openstack-ceilometer | Assignee: | Eoghan Glynn <eglynn> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Amit Ugol <augol> |
| Severity: | medium | Docs Contact: | |
| Priority: | low | | |
| Version: | unspecified | CC: | chdent, eglynn, jruzicka, markmc, pbrady, sgordon, slong, yeylon |
| Target Milestone: | ga | Keywords: | FutureFeature |
| Target Release: | 6.0 (Juno) | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| URL: | https://blueprints.launchpad.net/ceilometer/+spec/hash-based-alarm-partitioning | | |
| Whiteboard: | upstream_milestone_juno-3 upstream_definition_approved upstream_status_implemented | | |
| Fixed In Version: | openstack-ceilometer-2014.2-2.el7 | Doc Type: | Enhancement |
| Doc Text: | Tooz-driven group membership coordination is now used, which allows multiple ceilometer-alarm-evaluator services to share the workload. The group membership-based solution provides a simple but robust technique for managing workload sharing that is less problematic than the previous RPC-fanout-based solution. Alarm evaluators can now be set up on multiple nodes using configuration for individual instances; if an evaluator fails, its workload is transferred to the other evaluators. | | |
| Story Points: | --- | | |
| Clone Of: | | Environment: | |
| Last Closed: | 2015-02-09 20:05:06 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
RHOS Integration
2014-08-15 04:03:22 UTC
Here's an approach to test the alarm partitioning using tooz.
Ensure memcached is running on the controller host (it should be in any case for keystone):
$ sudo rpm -qa memcached
$ sudo service memcached status
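If memcached turns out not to be installed or running, the following is a minimal sketch of how to get it in place (assuming the stock RHEL 7 package and service names):
$ sudo yum install -y memcached
$ sudo service memcached start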
Configure the tooz backend as memcache with the following setting in /etc/ceilometer/ceilometer.conf:
[coordination]
backend_url = memcached://CONTROLLER_HOSTNAME
Restart the alarm-evaluator service and ensure that it's evaluating zero alarms initially:
$ sudo service openstack-ceilometer-alarm-evaluator restart
$ tail -f /var/log/ceilometer/alarm-evaluator.log | grep 'initiating evaluation cycle'
... INFO ceilometer.alarm.service [-] initiating evaluation cycle on 0 alarms
Spin up a nova instance in the usual way if not already running, and record the instance ID:
$ INSTANCE_ID=`nova list | awk '/ACTIVE/ {print $2}' | head -1`
For speed of testing, modify the interval on the cpu_source in /etc/ceilometer/pipeline.yaml from 600s to 60s, then restart the compute agent:
$ sudo service openstack-ceilometer-compute restart
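The pipeline.yaml edit mentioned above is not shown explicitly; as a rough sketch, the cpu_source entry would look something like the following after the change (the meter list and sink name reflect the stock Juno-era pipeline layout, so details may differ in a given deployment):
sources:
    - name: cpu_source
      interval: 60
      meters:
          - "cpu"
      sinks:
          - cpu_sink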
Create a sequence of alarms based on the CPU utilization of that instance:
$ for i in {0..20}
do
ceilometer alarm-threshold-create --name cpu_high_$i --description 'instance running hot' \
--meter-name cpu_util --threshold 70.0 --comparison-operator gt \
--statistic avg --period 60 --evaluation-periods 1 \
--alarm-action 'log://' --query resource_id=$INSTANCE_ID
done
Check that all 21 alarms are being evaluated by the single evaluator:
$ tail -f /var/log/ceilometer/alarm-evaluator.log | grep 'initiating evaluation cycle'
... INFO ceilometer.alarm.service [-] initiating evaluation cycle on 21 alarms
Ensure that the alarm state flips to ok (or alarm if the instance is indeed running hot) within 60s:
$ sleep 60 ; ceilometer alarm-list | grep cpu_high
Either install and run openstack-ceilometer-alarm-evaluator on a second node, or, as a shortcut, simply launch a second process on the controller:
$ sudo /usr/bin/python /usr/bin/ceilometer-alarm-evaluator --logfile /var/log/ceilometer/alarm-evaluator-2.log &
Check that an approximately even split of alarms is allocated to each evaluator:
$ tail -f /var/log/ceilometer/alarm-evaluator.log | grep 'initiating evaluation cycle'
$ tail -f /var/log/ceilometer/alarm-evaluator-2.log | grep 'initiating evaluation cycle'
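Purely as an illustration of what to look for (the exact split is decided by the hash-based partitioning, so the counts on each side may differ), the two logs would show something along these lines:
... INFO ceilometer.alarm.service [-] initiating evaluation cycle on 10 alarms
... INFO ceilometer.alarm.service [-] initiating evaluation cycle on 11 alarms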
To ensure every alarm is being actively evaluated, flip all alarms back to 'insufficient data' and then expect them all to revert to the appropriate state (ok or alarm) on the next evaluation cycle:
$ for a in `ceilometer alarm-list | awk '/cpu_high_/ {print $2}'`
do
ceilometer alarm-state-set -a $a --state "insufficient data"
done
$ sleep 60 ; ceilometer alarm-list | grep cpu_high
Kill one of the alarm evaluators and ensure that all alarms are then allocated to the surviving evaluator. Repeat the step above to flip all alarms to 'insufficient data' and check that they are all being actively evaluated, i.e. that they revert to the appropriate state (ok or alarm) on the next evaluation cycle.
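A minimal sketch of this step, assuming the second evaluator was launched in the background as above (matching on its distinct logfile argument is just one convenient way to pick it out):
$ sudo pkill -f 'alarm-evaluator-2.log'
$ tail -f /var/log/ceilometer/alarm-evaluator.log | grep 'initiating evaluation cycle'
... INFO ceilometer.alarm.service [-] initiating evaluation cycle on 21 alarms
Note that the reallocation may take a cycle or two to show up, once the dead member drops out of the tooz group.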
One additional setup step for the above, before restarting the alarm-evaluator service: ensure that the alarm.evaluation_service config option is set to the upstream default (and not overridden to "ceilometer.alarm.service.SingletonAlarmService"), i.e. in /etc/ceilometer/ceilometer.conf:
[alarm]
# ...
evaluation_service=default
In addition to the doc notes above, please note that, of the configuration settings described above, this one could do with some adjustment:
[coordination]
backend_url = memcached://CONTROLLER_HOSTNAME
The preferred backend is now redis and, where possible, redis+sentinel.
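As a hedged example of what that adjustment might look like (the hostnames, ports, and sentinel master name below are placeholders, and the exact tooz URL options should be checked against the installed tooz version):
[coordination]
backend_url = redis://CONTROLLER_HOSTNAME:6379
or, fronted by sentinel:
[coordination]
backend_url = redis://SENTINEL_HOSTNAME:26379?sentinel=mymaster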
This bug has been closed as a part of the RHEL-OSP 6 general availability release. For details, see https://rhn.redhat.com/errata/rhel7-rhos-6-errata.html