Bug 986381
Summary: | [RFE] Alarm partitioning over multiple threshold evaluators | ||
---|---|---|---|
Product: | Red Hat OpenStack | Reporter: | Eoghan Glynn <eglynn> |
Component: | openstack-ceilometer | Assignee: | Eoghan Glynn <eglynn> |
Status: | CLOSED ERRATA | QA Contact: | Kevin Whitney <kwhitney> |
Severity: | medium | Docs Contact: | |
Priority: | high | ||
Version: | 4.0 | CC: | ajeain, eglynn, jruzicka, mlopes, pbrady, sgordon, sradvan, srevivo |
Target Milestone: | rc | Keywords: | FutureFeature, OtherQA |
Target Release: | 4.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
URL: | https://blueprints.launchpad.net/ceilometer/+spec/alarm-service-partitioner | ||
Whiteboard: | |||
Fixed In Version: | openstack-ceilometer-2013.2-0.12.rc2.el6ost | Doc Type: | Enhancement |
Doc Text: |
Feature: Partitioning of alarm evaluation over a horizontally scaled out dynamic pool of workers.
Reason: This enhancement allows the evaluation workload to scale up to encompass many alarms, and also avoids a singleton evaluator becoming a single point of failure.
Result: The alarm.evaluation_service configuration option may be set to ceilometer.alarm.service.PartitionedAlarmService, in which case multiple ceilometer-alarm-evaluator service instances can be started up on different hosts. These replicas will self-organize and divide the evaluation workload among themselves via a group co-ordination protocol based on fanout RPC.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2013-12-20 00:14:17 UTC | Type: | Bug |
Bug Blocks: | 973191, 975499, 986378, 1055813 |
Description
Eoghan Glynn
2013-07-19 15:58:06 UTC
Merged upstream as an FFE for RC1, so it was not in the packages based on havana-3 but will be in the packages rebuilt for Havana RC1:

  https://github.com/openstack/ceilometer/commit/ede2329e

(Note that the above logic also enables the widely scaled threshold evaluation required in BZ 986378.)

I've been waiting for RC1 to be cut upstream, and then for the follow-on changes to the re-built openstack-ceilometer-* packages, before writing up a test approach in the BZ.

The reason upstream RC1 is a blocker is that an integral part of the mechanism to be tested here landed as an FFE post Havana-3, so it is not present in our RPMs as things stand.

The upstream RC1 was due to be cut late last week, but was delayed to today (Oct 2nd) due to a couple of laggard bug fixes and problems in the tempest gate adding hugely to the gerrit turnaround time.

However, everything has now landed as of late yesterday, and the release candidate will be cut shortly. I have the packaging folks on notice as to the changes that'll be required, so we should have rebuilt RPMs by end of week, at which point testing can commence.

Further information to follow once the new packages are available.

New puddle contains the required version of ceilometer:

  http://download.lab.bos.redhat.com/rel-eng/OpenStack/4.0/2013-10-03.3

How To Test
===========

0. Install via packstack allinone, and also install an additional compute node.

Ensure the compute agent is gathering metrics at a reasonable cadence (every 60s for example, instead of every 10mins as per the default):

  sudo sed -i '/^ *name: cpu_pipeline$/ { n ; s/interval: 600$/interval: 60/ }' /etc/ceilometer/pipeline.yaml
  sudo service openstack-ceilometer-compute restart

1. Ensure the ceilometer-alarm-evaluator and ceilometer-alarm-notifier services are running on the controller node:

  sudo yum install -y openstack-ceilometer-alarm
  export CEILO_ALARM_SVCS='evaluator notifier'
  for svc in $CEILO_ALARM_SVCS; do sudo service openstack-ceilometer-alarm-$svc start; done

2. Ensure a second ceilometer-alarm-evaluator service is running on the compute node:

  sudo yum install -y openstack-ceilometer-alarm
  export CEILO_ALARM_SVCS='evaluator'
  for svc in $CEILO_ALARM_SVCS; do sudo service openstack-ceilometer-alarm-$svc start; done

3. Spin up an instance in the usual way:

  nova boot --image $IMAGE_ID --flavor 1 test_instance

4. Create multiple alarms with thresholds sufficiently low that they are guaranteed to go into alarm:

  for i in $(seq 10)
  do
    ceilometer alarm-threshold-create --name high_cpu_alarm_${i} --description 'instance running hot' \
      --meter-name cpu_util --threshold 0.01 --comparison-operator gt --statistic avg \
      --period 60 --evaluation-periods 1 \
      --alarm-action 'log://' \
      --query resource_id=$INSTANCE_ID
  done

5. Ensure that the alarms are partitioned over the multiple evaluators:

  tail -f /var/log/alarm-evaluator.log | grep 'initiating evaluation cycle'

On each host, expect approximately half the alarms to be evaluated, i.e. '... initiating evaluation cycle on 5 alarms'

6. Ensure all alarms have transitioned to the 'alarm' state:

  ceilometer alarm-list

7. Create some more alarms:

  for i in $(seq 10)
  do
    ceilometer alarm-threshold-create --name low_cpu_alarm_${i} --description 'instance running cold' \
      --meter-name cpu_util --threshold 99.9 --comparison-operator le --statistic avg \
      --period 60 --evaluation-periods 1 \
      --alarm-action 'log://' \
      --query resource_id=$INSTANCE_ID
  done

and also delete a few alarms:

  ceilometer delete-alarm -a $ALARM_ID

and ensure that the alarm allocation is still roughly even between the evaluation services:

  tail -f /var/log/alarm-evaluator.log | grep 'initiating evaluation cycle'

Addition to steps #1 & #2 above:

*Before* restarting the ceilometer-alarm-evaluator service, ensure that the partitioned evaluation service is configured:

  sudo openstack-config --set /etc/ceilometer/ceilometer.conf alarm evaluation_service ceilometer.alarm.service.PartitionedAlarmService

Pending the fix for:

  https://bugzilla.redhat.com/1040404

testing this requires that a less constrained firewall rule is added for the ceilometer-api service:

  $ INDEX=$(sudo iptables -L | grep -A 20 'INPUT.*policy ACCEPT' | grep -- -- | grep -n ceilometer-api | cut -f1 -d:)
  $ sudo iptables -I INPUT $INDEX -p tcp --dport 8777 -j ACCEPT

This bug can now transition to VERIFIED, as the iptables rule workaround is no longer required since openstack-packstack-2013.2.1-0.18.dev934.el6ost was built.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

  http://rhn.redhat.com/errata/RHEA-2013-1859.html
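
For anyone repeating the partition check in steps 5 and 7, the per-host alarm count can be pulled out of the log instead of being read off the tail output. The following is only a minimal sketch: it assumes the log line looks like the '... initiating evaluation cycle on 5 alarms' example quoted in step 5, and the function name latest_alarm_count is made up for illustration (it is not part of ceilometer).

```shell
#!/bin/sh
# Print the alarm count from the most recent 'initiating evaluation cycle'
# line in the log file given as $1.
# ASSUMPTION: the line format matches the example quoted in step 5;
# adjust the pattern if the real log output differs.
latest_alarm_count() {
    # sed -n ... p prints only the captured number from matching lines;
    # tail -n 1 keeps the last (most recent) match.
    sed -n 's/.*initiating evaluation cycle on \([0-9][0-9]*\) alarms.*/\1/p' "$1" | tail -n 1
}

# Hypothetical usage, run on each evaluator host:
#   latest_alarm_count /var/log/alarm-evaluator.log
```

Running this on both hosts and comparing the two numbers should show a roughly even split, e.g. close to 5 and 5 after step 4. If the function prints nothing, no evaluation cycle has been logged in that file yet.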