Bug 1572341

Summary: ceilometer arithmetic transformer consistently dropping samples
Product: Red Hat OpenStack
Reporter: Andrew Ludwar <aludwar>
Component: openstack-ceilometer
Assignee: Julien Danjou <jdanjou>
Status: CLOSED INSUFFICIENT_DATA
QA Contact: Sasha Smolyak <ssmolyak>
Severity: high
Priority: high
Version: 10.0 (Newton)
CC: aludwar, jdanjou, jruzicka, srevivo
Target Milestone: ---
Keywords: Triaged, ZStream
Target Release: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Last Closed: 2018-09-03 14:59:54 UTC
Type: Bug

Description Andrew Ludwar 2018-04-26 18:14:55 UTC
Description of problem:

The customer has configured the overcloud ceilometer services to poll the overcloud compute nodes for hardware data via SNMP, and requires that this SNMP hardware data be queryable through the gnocchi API. They configured an arithmetic transformer to gather this data, but it is consistently dropping samples. We noticed that handle_sample() in arithmetic.py does not return a sample. After adding a return statement, the samples are no longer dropped and appear in the output of gnocchi measures show.


Version-Release number of selected component (if applicable):

OSP10

openstack-ceilometer-api-7.1.1-6.el7ost.noarch
openstack-ceilometer-central-7.1.1-6.el7ost.noarch
openstack-ceilometer-collector-7.1.1-6.el7ost.noarch
openstack-ceilometer-common-7.1.1-6.el7ost.noarch
openstack-ceilometer-compute-7.1.1-6.el7ost.noarch
openstack-ceilometer-notification-7.1.1-6.el7ost.noarch
openstack-ceilometer-polling-7.1.1-6.el7ost.noarch
puppet-ceilometer-9.5.0-4.el7ost.noarch
python-ceilometer-7.1.1-6.el7ost.noarch
python-ceilometerclient-2.6.2-1.el7ost.noarch
python-ceilometermiddleware-0.5.2-1.el7ost.noarch

How reproducible:

Every time.

Steps to Reproduce:
1. Deploy OSP10 and set up SNMP hardware polling with new sinks and transforms (details in the comment below).
2. Observe in the ceilometer agent-notification.log that samples passing through this transformer are always dropped.

Actual results:

Samples passing through the arithmetic transformer are consistently dropped.

Expected results:

Samples passing through the arithmetic transformer should not be dropped; they should be passed on to gnocchi for storage.


Additional info:

We seem to have alleviated the sample dropping issue with the following code change:

~~~
  - /usr/lib/python2.7/site-packages/ceilometer/pipeline.py

    class SampleSink(Sink):

        NAMESPACE = 'ceilometer.publisher'

        def _transform_sample(self, start, sample):
            try:
                for transformer in self.transformers[start:]:
                    # The sink expects handle_sample() to return a sample.
                    sample = transformer.handle_sample(sample)
                    if not sample:
                        LOG.debug(
                            "Pipeline %(pipeline)s: Sample dropped by "
                            "transformer %(trans)s", {'pipeline': self,
                                                      'trans': transformer})
                        return
                return sample

  - /usr/lib/python2.7/site-packages/ceilometer/transformer/arithmetic.py

    def handle_sample(self, _sample):
        # Does not return any value.
        self._update_cache(_sample)
        self.latest_timestamp = _sample.timestamp
~~~

With the following code change, no samples are dropped:

~~~
    def handle_sample(self, _sample):
        self._update_cache(_sample)
        self.latest_timestamp = _sample.timestamp
        return _sample  # <== added line
~~~

Comment 2 Julien Danjou 2018-05-04 14:04:15 UTC
I don't think handle_sample() is supposed to return the sample here: the sample is cached, and the result is returned on flush(), when the transformation is actually computed.
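To illustrate the cache-then-flush behaviour, here is a minimal sketch. It is not ceilometer's code: the Sample tuple, CachingSumTransformer, run_sink and the meter names "meter.a", "meter.b" and "meter.sum" are all hypothetical stand-ins. It only shows how a transformer can return nothing from handle_sample() for every input and still publish a derived sample at flush time.

```python
from collections import namedtuple

# Hypothetical stand-in for a ceilometer sample; only the fields used here.
Sample = namedtuple("Sample", ["name", "resource_id", "volume"])


class CachingSumTransformer(object):
    """Toy aggregating transformer: buffers samples, emits results on flush()."""

    def __init__(self, sources, target):
        self.sources = sources  # meter names the expression needs
        self.target = target    # name of the derived meter
        self.cache = {}         # resource_id -> {meter name: sample}

    def handle_sample(self, sample):
        # Buffer only. Returning None tells the caller nothing is ready yet.
        if sample.name in self.sources:
            self.cache.setdefault(sample.resource_id, {})[sample.name] = sample
        return None

    def flush(self):
        # Emit one derived sample per resource that has every source meter.
        results = []
        for resource_id, by_name in sorted(self.cache.items()):
            if all(name in by_name for name in self.sources):
                total = sum(by_name[name].volume for name in self.sources)
                results.append(Sample(self.target, resource_id, total))
        self.cache.clear()
        return results


def run_sink(transformer, samples):
    """Mimic a sink: per-sample results first, then whatever flush() yields."""
    published = []
    for sample in samples:
        out = transformer.handle_sample(sample)
        if out is not None:
            published.append(out)
    published.extend(transformer.flush())
    return published


incoming = [Sample("meter.a", "node-1", 6), Sample("meter.b", "node-1", 2)]
transformer = CachingSumTransformer(["meter.a", "meter.b"], "meter.sum")
published = run_sink(transformer, incoming)
print(published)
```

In this sketch every input looks "dropped" from the caller's point of view, and only the derived sample is published. A handle_sample() that returned its input instead would pass the raw samples downstream rather than the computed value.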

The problem with transformers is that if the samples are handled by different agents, the agents can't compute the transformation correctly. That mechanism is, by the way, being deprecated in OSP14 for that reason.
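A rough sketch of why this matters, under the assumption that each agent only sees the samples delivered to it. The compute_ratio function and the "used" and "total" meter names are hypothetical and are not taken from ceilometer.

```python
def compute_ratio(samples_seen):
    """Return used/total per resource, only where both meters were seen."""
    by_resource = {}
    for name, resource_id, volume in samples_seen:
        by_resource.setdefault(resource_id, {})[name] = volume
    return {
        resource_id: meters["used"] / float(meters["total"])
        for resource_id, meters in by_resource.items()
        if "used" in meters and "total" in meters
    }


all_samples = [("used", "node-1", 2), ("total", "node-1", 8)]

# A single agent that receives both meters can evaluate the expression.
single_agent = compute_ratio(all_samples)

# Two agents that each receive one meter cannot evaluate it at all.
agent_a = compute_ratio(all_samples[:1])
agent_b = compute_ratio(all_samples[1:])

print(single_agent, agent_a, agent_b)
```

Here the single agent produces a ratio for node-1, while neither of the two split agents produces anything, because neither holds both operands.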

You could try using workload_partitioning=true (see https://docs.openstack.org/ceilometer/pike/admin/telemetry-best-practices.html), but this is also quite a fragile feature AFAIK.

In summary, I'd advise building something that does not rely on transformers, if possible.