Bug 1516285

Summary: collectd ceph plugin crashes
Product: Red Hat OpenStack
Reporter: Matthias Runge <mrunge>
Component: collectd
Assignee: Matthias Runge <mrunge>
Status: CLOSED ERRATA
QA Contact: Leonid Natapov <lnatapov>
Severity: urgent
Priority: high
Version: 13.0 (Queens)
CC: apannu, jbadiapa, jschluet, mmagr, mrunge, rmccabe
Target Milestone: Upstream M1
Keywords: Triaged
Target Release: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Fixed In Version: collectd-5.8.0-1.1.el7ost
Last Closed: 2018-06-27 13:08:58 UTC
Type: Bug
Bug Depends On: 1558015

Description Matthias Runge 2017-11-22 12:17:45 UTC
Description of problem:
Nov 22 12:53:34 euler systemd: Starting Collectd statistics daemon...
Nov 22 12:53:34 euler collectd[5437]: plugin_load: plugin "syslog" successfully loaded.
Nov 22 12:53:34 euler collectd: [2017-11-22 12:53:35] plugin_load: plugin "logfile" successfully loaded.
Nov 22 12:53:34 euler collectd[5437]: plugin_load: plugin "logfile" successfully loaded.
Nov 22 12:53:34 euler collectd[5437]: syslog: invalid loglevel [debug] defaulting to 'info'
Nov 22 12:53:34 euler collectd[5437]: plugin_load: plugin "cpu" successfully loaded.
Nov 22 12:53:34 euler collectd[5437]: plugin_load: plugin "interface" successfully loaded.
Nov 22 12:53:34 euler collectd[5437]: plugin_load: plugin "load" successfully loaded.
Nov 22 12:53:34 euler collectd[5437]: plugin_load: plugin "memory" successfully loaded.
Nov 22 12:53:34 euler collectd[5437]: plugin_load: plugin "write_graphite" successfully loaded.
Nov 22 12:53:34 euler collectd[5437]: plugin_load: plugin "ceph" successfully loaded.
Nov 22 12:53:34 euler collectd[5437]: plugin_load: plugin "df" successfully loaded.
Nov 22 12:53:34 euler collectd[5437]: plugin_load: plugin "disk" successfully loaded.
Nov 22 12:53:34 euler collectd[5437]: plugin_load: plugin "virt" successfully loaded.
Nov 22 12:53:34 euler collectd[5437]: Systemd detected, trying to signal readyness.
Nov 22 12:53:34 euler systemd: Started Collectd statistics daemon.
Nov 22 12:53:34 euler collectd[5437]: virt plugin: reader virt-0 initialized
Nov 22 12:53:34 euler collectd[5437]: Initialization complete, entering read-loop.
Nov 22 12:53:34 euler kernel: reader#3[5446]: segfault at 0 ip 00007f1bf12698dd sp 00007f1be5272e00 error 4 in ceph.so[7f1bf1268000+5000]
Nov 22 12:53:34 euler abrt-hook-ccpp: Process 5437 (collectd) of user 0 killed by SIGSEGV - ignoring (repeated crash)
Nov 22 12:53:34 euler libvirtd: 2017-11-22 11:53:34.570+0000: 1461: error : virNetSocketReadWire:1793 : Cannot recv data: Connection reset by peer
Nov 22 12:53:34 euler systemd: collectd.service: main process exited, code=killed, status=11/SEGV
Nov 22 12:53:34 euler systemd: Unit collectd.service entered failed state.
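
The kernel segfault line in the log above carries enough information to locate the faulting instruction inside ceph.so: the instruction pointer minus the mapping base gives the offset into the shared object, and a fault address of 0 suggests a NULL pointer dereference. A minimal Python sketch of that arithmetic follows. The function name `parse_segfault` and the regular expression are illustrative assumptions, not part of collectd or any kernel tooling.

```python
import re

# Kernel segfault message from the log above (the wrapped line joined).
LINE = ('reader#3[5446]: segfault at 0 ip 00007f1bf12698dd '
        'sp 00007f1be5272e00 error 4 in ceph.so[7f1bf1268000+5000]')

# Hypothetical pattern matching the message format seen in this log.
PATTERN = re.compile(
    r'segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) '
    r'sp (?P<sp>[0-9a-f]+) error (?P<err>\d+) '
    r'in (?P<obj>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]'
)

def parse_segfault(line):
    """Return the object name, fault address and offset into the object."""
    m = PATTERN.search(line)
    if m is None:
        return None
    ip = int(m.group('ip'), 16)
    base = int(m.group('base'), 16)
    return {
        'object': m.group('obj'),
        'fault_addr': int(m.group('addr'), 16),
        'offset': ip - base,  # offset of the faulting instruction in the object
    }

info = parse_segfault(LINE)
print(info['object'], hex(info['offset']), info['fault_addr'])
```

With matching debuginfo installed, the resulting offset can be handed to a tool such as addr2line to find the corresponding source line in the ceph plugin.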


Version-Release number of selected component (if applicable):
collectd-5.8

Comment 3 Matthias Runge 2017-11-27 13:36:49 UTC
A commit message in collectd-5.8 states: https://github.com/collectd/collectd/commit/647ac31bf9db60b1685d6d8d25be65375ba85891#diff-20b37368527caaa7f0318870e8cefd51

"""
This patch is not backward compatible with previous ceph versions.
"""

Comment 5 Matthias Runge 2017-11-29 09:12:29 UTC
It seems the crash only happens when the option ConvertSpecialMetricTypes is set to true. Explicitly setting it to false makes the plugin work even with older ceph releases.

Comment 6 Matthias Runge 2017-12-05 07:27:28 UTC
Proposed fix upstream: https://github.com/collectd/collectd/commit/de05fb53fad6bc998f585b704ca0caeadc14a035

Comment 14 Leonid Natapov 2018-02-19 09:01:49 UTC
Please provide instructions on how to test and configure this.

Thank you,

Comment 15 Matthias Runge 2018-02-19 10:54:36 UTC
https://collectd.org/documentation/manpages/collectd.conf.5.shtml#plugin_ceph

In my case, the ceph config file looks like:

<LoadPlugin ceph>
  Globals false
</LoadPlugin>  
<Plugin "ceph">
    LongRunAvgLatency false
    ConvertSpecialMetricTypes false
    <Daemon "osd.0">
      SocketPath "/var/run/ceph/ceph-osd.0.asok"
    </Daemon>
</Plugin>

You'd probably need to figure out where your ceph*.asok files are stored.

With that in place, tons of ceph-related metrics will show up in Grafana, all beginning with "ceph_".
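
To find the admin sockets mentioned above and turn them into Daemon blocks, something like the following Python sketch may help. It assumes the sockets live under /var/run/ceph (as in the example config), that the cluster is named "ceph", and that sockets are named `<cluster>-<daemon>.asok`; the helper names `daemon_name` and `daemon_blocks` are made up for illustration.

```python
import glob
import os

def daemon_name(sock_path, cluster='ceph'):
    """Map '/var/run/ceph/ceph-osd.0.asok' to 'osd.0' (assumed naming scheme)."""
    base = os.path.basename(sock_path)
    prefix, suffix = cluster + '-', '.asok'
    if base.startswith(prefix) and base.endswith(suffix):
        return base[len(prefix):-len(suffix)]
    return None  # not an admin socket following the assumed scheme

def daemon_blocks(sock_paths):
    """Render one <Daemon> block per recognised socket path."""
    lines = []
    for path in sorted(sock_paths):
        name = daemon_name(path)
        if name is None:
            continue
        lines.append('    <Daemon "%s">' % name)
        lines.append('      SocketPath "%s"' % path)
        lines.append('    </Daemon>')
    return '\n'.join(lines)

if __name__ == '__main__':
    # Assumed default socket directory; adjust if your deployment differs.
    print(daemon_blocks(glob.glob('/var/run/ceph/*.asok')))
```

The printed blocks can be pasted inside the <Plugin "ceph"> section next to the LongRunAvgLatency and ConvertSpecialMetricTypes settings.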

Comment 17 Leonid Natapov 2018-05-30 07:31:33 UTC
[2018-05-30 07:00:39] plugin_load: plugin "ceph" successfully loaded.

Comment 19 errata-xmlrpc 2018-06-27 13:08:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2084