Bug 1572341 - ceilometer arithmetic transformer consistently dropping samples
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-ceilometer
Version: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 10.0 (Newton)
Assignee: Julien Danjou
QA Contact: Sasha Smolyak
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-04-26 18:14 UTC by Andrew Ludwar
Modified: 2022-08-09 10:52 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-09-03 14:59:54 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker OSP-8991 0 None None None 2022-08-09 10:52:04 UTC

Description Andrew Ludwar 2018-04-26 18:14:55 UTC
Description of problem:

The customer has configured the overcloud ceilometer services to poll the overcloud compute nodes for hardware data via SNMP, and requires that this SNMP hardware data be queryable through the gnocchi API. They configured an arithmetic transformer to gather this data, but it is consistently dropping samples. We noticed that the handle_sample() method in arithmetic.py does not seem to return a sample. After we added a return statement, the samples are no longer dropped and they appear in gnocchi measures show.


Version-Release number of selected component (if applicable):

OSP10

openstack-ceilometer-api-7.1.1-6.el7ost.noarch
openstack-ceilometer-central-7.1.1-6.el7ost.noarch
openstack-ceilometer-collector-7.1.1-6.el7ost.noarch
openstack-ceilometer-common-7.1.1-6.el7ost.noarch
openstack-ceilometer-compute-7.1.1-6.el7ost.noarch
openstack-ceilometer-notification-7.1.1-6.el7ost.noarch
openstack-ceilometer-polling-7.1.1-6.el7ost.noarch
puppet-ceilometer-9.5.0-4.el7ost.noarch
python-ceilometer-7.1.1-6.el7ost.noarch
python-ceilometerclient-2.6.2-1.el7ost.noarch
python-ceilometermiddleware-0.5.2-1.el7ost.noarch

How reproducible:

Every time.

Steps to Reproduce:
1. Deploy OSP10 and set up SNMP hardware polling with new sinks and transforms (details in the comment below).
2. Observe in the ceilometer agent-notification.log that samples handled by this transformer are always dropped.

Actual results:

Samples handled by the arithmetic transformer are consistently dropped.

Expected results:

Samples handled by the arithmetic transformer should not be dropped; they should be passed on for storage in gnocchi.


Additional info:

We seem to have alleviated the sample dropping with the following code change:

~~~
  - /usr/lib/python2.7/site-packages/ceilometer/pipeline.py

    class SampleSink(Sink):

        NAMESPACE = 'ceilometer.publisher'

        def _transform_sample(self, start, sample):
            try:
                for transformer in self.transformers[start:]:
                    sample = transformer.handle_sample(sample)              <=== Expects return of sample.
                    if not sample:
                        LOG.debug(
                            "Pipeline %(pipeline)s: Sample dropped by "
                            "transformer %(trans)s", {'pipeline': self,
                                                      'trans': transformer})
                        return
                return sample

  - /usr/lib/python2.7/site-packages/ceilometer/transformer/arithmetic.py

    def handle_sample(self, _sample):                                    <==== Not returning any value.
        self._update_cache(_sample)
        self.latest_timestamp = _sample.timestamp
~~~

With the code change (no drops):

~~~
    def handle_sample(self, _sample):
        self._update_cache(_sample)
        self.latest_timestamp = _sample.timestamp
        return _sample                                                           <====  change
~~~

Comment 2 Julien Danjou 2018-05-04 14:04:15 UTC
I don't think the sample is supposed to be returned here: it's cached and returned on flush(), when the result is actually computed.
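
To illustrate the cache-then-flush behaviour described above, here is a minimal sketch. It is not the ceilometer implementation: the `Sample` tuple, the `BufferingSumTransformer` class, and the meter names are all hypothetical, and the "arithmetic" is reduced to a plain sum. It only shows why a `None` return from handle_sample() can be intentional when the result is emitted later by flush().

```python
import collections

# Hypothetical stand-in for a ceilometer sample; the real class carries more fields.
Sample = collections.namedtuple(
    "Sample", ["name", "volume", "resource_id", "timestamp"])


class BufferingSumTransformer(object):
    """Illustrative only: caches inputs in handle_sample(), emits in flush()."""

    def __init__(self, inputs, target_name):
        self.inputs = frozenset(inputs)
        self.target_name = target_name
        # resource_id -> {meter name: latest sample seen for that meter}
        self.cache = collections.defaultdict(dict)
        self.latest_timestamp = None

    def handle_sample(self, sample):
        if sample.name in self.inputs:
            self.cache[sample.resource_id][sample.name] = sample
            self.latest_timestamp = sample.timestamp
        # Nothing is passed down the pipeline yet; the caller sees None.
        return None

    def flush(self):
        results = []
        for resource_id in sorted(self.cache):
            by_name = self.cache[resource_id]
            # Only compute when every required input meter has been seen.
            if frozenset(by_name) == self.inputs:
                total = sum(s.volume for s in by_name.values())
                results.append(Sample(self.target_name, total,
                                      resource_id, self.latest_timestamp))
        self.cache.clear()
        return results


transformer = BufferingSumTransformer(["meter_a", "meter_b"], "meter_sum")
first = transformer.handle_sample(Sample("meter_a", 1.0, "node-1", "t0"))
second = transformer.handle_sample(Sample("meter_b", 2.0, "node-1", "t1"))
emitted = transformer.flush()
```

In this sketch both handle_sample() calls return `None`, which a loop like the one in `_transform_sample` would log as "Sample dropped", while the computed sample only appears in the list returned by flush().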

The problem with transformers is that if the samples are handled by different agents, the agents can't compute the transformation correctly. For that reason, the mechanism is being deprecated in OSP14.
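
A toy sketch of that failure mode, under stated assumptions: each agent is modelled as a plain dict cache, the meter names `meter_a` and `meter_b` are hypothetical, and `compute_sum` is an illustrative helper rather than anything in ceilometer. When the two inputs land on different agents, neither cache is complete, so neither can produce a result.

```python
def compute_sum(cache, required):
    """Return the sum of the cached volumes, or None if an input is missing."""
    if set(cache) != set(required):
        return None
    return sum(cache.values())


required = ["meter_a", "meter_b"]

# Inputs split across two agents: each cache holds only one of the meters.
agent_1_cache = {"meter_a": 10.0}
agent_2_cache = {"meter_b": 5.0}
split_results = [compute_sum(agent_1_cache, required),
                 compute_sum(agent_2_cache, required)]

# The same inputs handled by a single agent.
single_agent_cache = {"meter_a": 10.0, "meter_b": 5.0}
single_result = compute_sum(single_agent_cache, required)
```

Here `split_results` holds no computed value for either agent, while `single_result` is the expected sum.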

You could try workload_partitioning=true (see https://docs.openstack.org/ceilometer/pike/admin/telemetry-best-practices.html), but as far as I know this is also quite a fragile feature.

In summary, I'd advise building something that doesn't rely on transformers, if possible.

