Bug 1038704 (ceil-bp-central-agent-improve)

Summary: [RFE][ceilometer]: improve the ceilometer central agent
Product: Red Hat OpenStack
Reporter: Stephen Gordon <sgordon>
Component: openstack-ceilometer
Assignee: Eoghan Glynn <eglynn>
Status: CLOSED NOTABUG
QA Contact: Shai Revivo <srevivo>
Severity: unspecified
Docs Contact:
Priority: high
Version: unspecified
CC: jruzicka, marius.borze, markmc, nbarcet, pbrady, yeylon
Target Milestone: ---
Keywords: FutureFeature
Target Release: ---
Hardware: Unspecified
OS: Unspecified
URL: https://blueprints.launchpad.net/ceilometer/+spec/central-agent-improvement
Whiteboard: upstream_milestone_none upstream_status_not-started upstream_definition_obsolete
Fixed In Version:
Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-08-10 14:00:41 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Bug Depends On:    
Bug Blocks: 799011, 1038706    

Description Stephen Gordon 2013-12-05 16:21:38 UTC
Cloned from launchpad blueprint https://blueprints.launchpad.net/ceilometer/+spec/central-agent-improvement.

Description:

This is the umbrella blueprint resulting from the OpenStack Hong Kong summit design session: https://etherpad.openstack.org/p/icehouse-summit-ceilometer-central-agent

Specification URL (additional information):

None

Comment 2 Eoghan Glynn 2014-01-28 13:37:44 UTC
To clarify, the main focus of this RFE is the ability to horizontally scale the ceilometer-central agent.

Currently, such scale-out is not possible without duplicating the polling of the public REST APIs from which the central agent generates sample data. Such duplication would add unnecessary traffic on the message bus, result in duplicate samples in the metering store, and could ultimately lead to double-charging the user.

Horizontal scale-out is not possible currently because there is no co-ordination mechanism to divide the workload among multiple central agents (unlike, say, the partitioned alarm evaluator).

So the focus of the implementation will be rebasing the central agent on some co-ordination protocol, with the intention that this would be sufficiently general to be implemented in Oslo[1] and shared among several services.

Tooz[2] was one potential concrete protocol considered as the underpinnings for the generic synchronization service.
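
As a rough illustration of what "dividing the workload" could mean, here is a minimal sketch in which every agent hashes each resource id against the current membership list and polls only the resources it owns. This is not the ceilometer or tooz implementation: the names owner and my_share are hypothetical, and the membership list is assumed to be supplied by whatever co-ordination service is eventually chosen.

```python
import hashlib


def owner(resource_id, members):
    """Return the member that should poll resource_id (hypothetical helper).

    Assumes every agent sees the same membership list, which is the part
    a co-ordination service would have to provide.
    """
    ordered = sorted(members)
    digest = hashlib.sha256(resource_id.encode("utf-8")).hexdigest()
    return ordered[int(digest, 16) % len(ordered)]


def my_share(agent_id, members, resources):
    """Return the subset of resources this agent is responsible for."""
    return [r for r in resources if owner(r, members) == agent_id]
```

With a consistent view of the members, the shares are disjoint and together cover every resource. A plain modulo scheme like this reshuffles most resources whenever membership changes, so a real implementation might prefer a consistent hash ring.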

In terms of testing these improvements to the central agent, the key would be to spin up multiple instances of the agent and then check for duplication in the samples gathered for the meters that the central agent is responsible for gathering.

For example, the image meter should be gathered once for every existing glance image in every polling period (default 600s). Samples observed more frequently than the interval defined in pipeline.yaml would suggest duplication, and hence a lack of correct support for horizontal scaling.
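
The check described above could be sketched roughly as follows. It assumes the samples have already been fetched from the metering store and reduced to (resource_id, timestamp-in-seconds) pairs; the function name find_duplicates and the 10% tolerance for polling jitter are illustrative assumptions, not part of ceilometer.

```python
def find_duplicates(samples, interval=600, tolerance=0.1):
    """Flag resources sampled more often than the polling interval.

    samples: iterable of (resource_id, timestamp_seconds) pairs.
    Returns {resource_id: [gaps shorter than the allowed minimum]}.
    """
    by_resource = {}
    for resource_id, timestamp in samples:
        by_resource.setdefault(resource_id, []).append(timestamp)

    min_gap = interval * (1 - tolerance)  # allow for some polling jitter
    suspicious = {}
    for resource_id, stamps in by_resource.items():
        stamps.sort()
        gaps = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
        short = [gap for gap in gaps if gap < min_gap]
        if short:
            suspicious[resource_id] = short
    return suspicious
```

An empty result over several polling periods would be consistent with each image being polled by exactly one agent; any entry points at a resource that more than one agent is likely polling.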

Depending on the detailed mechanism used for co-ordination, testing should also assert that the pool of central agents can be grown and shrunk dynamically without causing duplication or starvation of any meters.
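
Whatever mechanism is used, a test harness could audit each polling cycle independently of it. The sketch below assumes the harness can record which resources each agent polled during one cycle; audit_cycle is a hypothetical helper, not an existing ceilometer function.

```python
from collections import Counter


def audit_cycle(polled_by_agent, expected_resources):
    """Report duplicated and starved resources for one polling cycle.

    polled_by_agent: {agent_id: [resource ids polled in this cycle]}.
    expected_resources: every resource that should have been polled once.
    Returns (duplicated, starved) as sorted lists.
    """
    counts = Counter(
        resource
        for polled in polled_by_agent.values()
        for resource in polled
    )
    duplicated = sorted(r for r, n in counts.items() if n > 1)
    starved = sorted(set(expected_resources) - set(counts))
    return duplicated, starved
```

Running this after each agent is added to or removed from the pool, and asserting that both lists are empty once the pool has settled, would cover the grow and shrink cases.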

[1] https://blueprints.launchpad.net/oslo/+spec/service-sync 
[2] https://github.com/stackforge/tooz

Comment 4 Nick Barcet 2015-08-10 14:00:41 UTC
This was implemented via another RFE.