Bug 1581469

Summary: [RFE] collect-procevent plugin
Product: Red Hat OpenStack Reporter: Martin Magr <mmagr>
Component: collectd Assignee: Matthias Runge <mrunge>
Status: CLOSED ERRATA QA Contact: Leonid Natapov <lnatapov>
Severity: medium Docs Contact:
Priority: medium    
Version: 14.0 (Rocky) CC: abays, achernet, apannu, jbadiapa, lars, marjones, mmagr, mrunge, pkilambi, rmccabe
Target Milestone: Upstream M3 Keywords: FutureFeature, Triaged
Target Release: 15.0 (Stein)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: collectd-5.8.0-11.el7ost Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-09-21 11:16:01 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1637778    
Bug Blocks: 1566081    

Description Martin Magr 2018-05-22 20:44:02 UTC
We need the collectd procevent plugin to be available out of the box.

Comment 2 Matthias Runge 2018-06-14 10:08:22 UTC
=head2 Plugin C<procevent>
+ 
+The I<procevent> plugin monitors when processes start (EXEC) and stop (EXIT).
+ 
+B<Synopsis:>
+ 
+  <Plugin procevent>
+    BufferLength 10
+    Process "name"
+    RegexProcess "regex"
+  </Plugin>
+ 
+B<Options:>
+ 
+=over 4
+ 
+=item B<BufferLength> I<length>
+ 
+Maximum number of process events that can be stored in plugin's ring buffer.
+By default, this is set to 10.  Once an event has been read, its location
+becomes available for storing a new event.
+ 
+=item B<Process> I<name>
+ 
+Enumerate a process name to monitor.  All processes that match this exact
+name will be monitored for EXECs and EXITs.
+
+=item B<RegexProcess> I<regex>
+ 
+Enumerate a process pattern to monitor.  All processes that match this 
+regular expression will be monitored for EXECs and EXITs.
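
The BufferLength semantics described above (a fixed number of slots, each reusable once its event has been read) can be illustrated with a small Python sketch. This is not the plugin's implementation: the class name `EventRing` is hypothetical, and the choice to reject new events when the buffer is full is an assumption, since the documentation above does not say what happens in that case.

```python
class EventRing:
    """Fixed-capacity ring buffer; a slot becomes reusable once its event is read."""

    def __init__(self, length=10):
        self._slots = [None] * length
        self._head = 0   # index of the oldest unread event
        self._count = 0  # number of unread events

    def push(self, event):
        # Assumption: when every slot holds an unread event, the new one is rejected.
        if self._count == len(self._slots):
            return False
        tail = (self._head + self._count) % len(self._slots)
        self._slots[tail] = event
        self._count += 1
        return True

    def pop(self):
        # Reading an event frees its slot for a later push().
        if self._count == 0:
            return None
        event = self._slots[self._head]
        self._slots[self._head] = None
        self._head = (self._head + 1) % len(self._slots)
        self._count -= 1
        return event
```

With `EventRing(length=2)`, a third `push()` fails until one `pop()` has freed a slot.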

Comment 6 Leonid Natapov 2018-06-19 14:27:22 UTC
Hey, Andrew!

Could you provide a conf file for this plugin, along with test instructions, as a comment on this RFE?

Thanks,
Leonid.

Comment 14 Matthias Runge 2019-01-07 07:24:28 UTC
I don't know why it was moved from ON_QA to POST, but the fixed-in version (or any later version) is still correct.

Comment 18 Leonid Natapov 2019-07-22 11:10:45 UTC
It seems that the plugin reports events that it should not.
The conf file I used is:
----------------------------------------
LoadPlugin procevent
<Plugin "procevent">
  BufferLength 100
  RegexProcess "/^.*nova.*$/"
  RegexProcess "/^.*neutron.*$/"
</Plugin>
----------------------------------------

I have the following logs in the collectd.log file:

[2019-07-22 09:21:21] Notification: severity = OKAY, host = controller-0.localdomain, plugin = procevent, plugin_instance = iptables, type = gauge, type_instance = process_status
[2019-07-22 09:21:21] Notification: severity = FAILURE, host = controller-0.localdomain, plugin = procevent, plugin_instance = iptables, type = gauge, type_instance = process_status
[2019-07-22 09:21:21] Notification: severity = OKAY, host = controller-0.localdomain, plugin = procevent, plugin_instance = ip6tables, type = gauge, type_instance = process_status
[2019-07-22 09:21:21] Notification: severity = FAILURE, host = controller-0.localdomain, plugin = procevent, plugin_instance = ip6tables, type = gauge, type_instance = process_status
[2019-07-22 09:21:21] Notification: severity = FAILURE, host = controller-0.localdomain, plugin = procevent, plugin_instance = podman, type = gauge, type_instance = process_status
[2019-07-22 09:21:21] Notification: severity = OKAY, host = controller-0.localdomain, plugin = procevent, plugin_instance = podman, type = gauge, type_instance = process_status
[2019-07-22 09:21:21] Notification: severity = FAILURE, host = controller-0.localdomain, plugin = procevent, plugin_instance = podman, type = gauge, type_instance = process_status
[2019-07-22 09:21:21] Notification: severity = OKAY, host = controller-0.localdomain, plugin = procevent, plugin_instance = podman, type = gauge, type_instance = process_status
[2019-07-22 09:21:21] Notification: severity = OKAY, host = controller-0.localdomain, plugin = procevent, plugin_instance = iptables, type = gauge, type_instance = process_status
[2019-07-22 09:21:21] Notification: severity = FAILURE, host = controller-0.localdomain, plugin = procevent, plugin_instance = iptables, type = gauge, type_instance = process_status
[2019-07-22 09:21:21] Notification: severity = OKAY, host = controller-0.localdomain, plugin = procevent, plugin_instance = ip6tables, type = gauge, type_instance = process_status
[2019-07-22 09:21:21] Notification: severity = FAILURE, host = controller-0.localdomain, plugin = procevent, plugin_instance = ip6tables, type = gauge, type_instance = process_status
[2019-07-22 09:21:21] Notification: severity = OKAY, host = controller-0.localdomain, plugin = procevent, plugin_instance = runc, type = gauge, type_instance = process_status
[2019-07-22 09:21:21] Notification: severity = OKAY, host = controller-0.localdomain, plugin = procevent, plugin_instance = exe, type = gauge, type_instance = process_status
[2019-07-22 09:21:21] Notification: severity = OKAY, host = controller-0.localdomain, plugin = procevent, plugin_instance = 4, type = gauge, type_instance = process_status
[2019-07-22 09:21:21] Notification: severity = FAILURE, host = controller-0.localdomain, plugin = procevent, plugin_instance = 4, type = gauge, type_instance = process_status
[2019-07-22 09:21:21] Notification: severity = OKAY, host = controller-0.localdomain, plugin = procevent, plugin_instance = true, type = gauge, type_instance = process_status
[2019-07-22 09:21:21] Notification: severity = FAILURE, host = controller-0.localdomain, plugin = procevent, plugin_instance = true, type = gauge, type_instance = process_status
[2019-07-22 09:21:21] Notification: severity = FAILURE, host = controller-0.localdomain, plugin = procevent, plugin_instance = runc, type = gauge, type_instance = process_status
[2019-07-22 09:21:21] Notification: severity = FAILURE, host = controller-0.localdomain, plugin = procevent, plugin_instance = podman, type = gauge, type_instance = process_status
[2019-07-22 09:21:21] Notification: severity = FAILURE, host = controller-0.localdomain, plugin = procevent, plugin_instance = podman, type = gauge, type_instance = process_status
[2019-07-22 09:21:21] Notification: severity = OKAY, host = controller-0.localdomain, plugin = procevent, plugin_instance = clustercheck, type = gauge, type_instance = process_status
[2019-07-22 09:21:21] Notification: severity = OKAY, host = controller-0.localdomain, plugin = procevent, plugin_instance = mysql, type = gauge, type_instance = process_status
[2019-07-22 09:21:21] Notification: severity = OKAY, host = controller-0.localdomain, plugin = procevent, plugin_instance = tail, type = gauge, type_instance = process_status
[2019-07-22 09:21:21] Notification: severity = FAILURE, host = controller-0.localdomain, plugin = procevent, plugin_instance = mysql, type = gauge, type_instance = process_status
[2019-07-22 09:21:21] Notification: severity = FAILURE, host = controller-0.localdomain, plugin = procevent, plugin_instance = tail, type = gauge, type_instance = process_status
[2019-07-22 09:21:21] Notification: severity = FAILURE, host = controller-0.localdomain, plugin = procevent, plugin_instance = clustercheck, type = gauge, type_instance = process_status
[2019-07-22 09:21:21] Notification: severity = OKAY, host = controller-0.localdomain, plugin = procevent, plugin_instance = ovndb-servers, type = gauge, type_instance = process_status
[2019-07-22 09:21:21] Notification: severity = OKAY, host = controller-0.localdomain, plugin = procevent, plugin_instance = basename, type = gauge, type_instance = process_status
[2019-07-22 09:21:21] Notification: severity = FAILURE, host = controller-0.localdomain, plugin = procevent, plugin_instance = basename, type = gauge, type_instance = process_status
[2019-07-22 09:21:21] Notification: severity = OKAY, host = controller-0.localdomain, plugin = procevent, plugin_instance = ovn-ctl, type = gauge, type_instance = process_status
[2019-07-22 09:21:21] Notification: severity = OKAY, host = controller-0.localdomain, plugin = procevent, plugin_instance = tty, type = gauge, type_instance = process_status
[2019-07-22 09:21:21] Notification: severity = OKAY, host = controller-0.localdomain, plugin = procevent, plugin_instance = grep, type = gauge, type_instance = process_status
[2019-07-22 09:21:21] Notification: severity = FAILURE, host = controller-0.localdomain, plugin = procevent, plugin_instance = tty, type = gauge, type_instance = process_status
[2019-07-22 09:21:21] Notification: severity = FAILURE, host = controller-0.localdomain, plugin = procevent, plugin_instance = grep, type = gauge, type_instance = process_status
[2019-07-22 09:21:21] Notification: severity = OKAY, host = controller-0.localdomain, plugin = procevent, plugin_instance = cat, type = gauge, type_instance = process_status
[2019-07-22 09:21:21] Notification: severity = FAILURE, host = controller-0.localdomain, plugin = procevent, plugin_instance = cat, type = gauge, type_instance = process_status
[2019-07-22 09:21:21] Notification: severity = OKAY, host = controller-0.localdomain, plugin = procevent, plugin_instance = cat, type = gauge, type_instance = process_status
[2019-07-22 09:21:21] Notification: severity = FAILURE, host = controller-0.localdomain, plugin = procevent, plugin_instance = cat, type = gauge, type_instance = process_status
[2019-07-22 09:21:22] Notification: severity = OKAY, host = controller-0.localdomain, plugin = procevent, plugin_instance = ovs-appctl, type = gauge, type_instance = process_status
[2019-07-22 09:21:22] Notification: severity = OKAY, host = controller-0.localdomain, plugin = procevent, plugin_instance = awk, type = gauge, type_instance = process_status
[2019-07-22 09:21:22] Notification: severity = FAILURE, host = controller-0.localdomain, plugin = procevent, plugin_instance = ovs-appctl, type = gauge, type_instance = process_status
[2019-07-22 09:21:22] Notification: severity = FAILURE, host = controller-0.localdomain, plugin = procevent, plugin_instance = awk, type = gauge, type_instance = process_status
[2019-07-22 09:21:22] Notification: severity = FAILURE, host = controller-0.localdomain, plugin = procevent, plugin_instance = ovn-ctl, type = gauge, type_instance = process_status
[2019-07-22 09:21:22] Notification: severity = OKAY, host = controller-0.localdomain, plugin = procevent, plugin_instance = ovn-ctl, type = gauge, type_instance = process_status
[2019-07-22 09:21:22] Notification: severity = OKAY, host = controller-0.localdomain, plugin = procevent, plugin_instance = sed, type = gauge, type_instance = process_status
[2019-07-22 09:21:22] Notification: severity = FAILURE, host = controller-0.localdomain, plugin = procevent, plugin_instance = sed, type = gauge, type_instance = process_status
[2019-07-22 09:21:22] Notification: severity = OKAY, host = controller-0.localdomain, plugin = procevent, plugin_instance = tty, type = gauge, type_instance = process_status
[2019-07-22 09:21:22] Notification: severity = OKAY, host = controller-0.localdomain, plugin = procevent, plugin_instance = grep, type = gauge, type_instance = process_status
[2019-07-22 09:21:22] Notification: severity = FAILURE, host = controller-0.localdomain, plugin = procevent, plugin_instance = grep, type = gauge, type_instance = process_status
[2019-07-22 09:21:22] Notification: severity = FAILURE, host = controller-0.localdomain, plugin = procevent, plugin_instance = tty, type = gauge, type_instance = process_status
[2019-07-22 09:21:22] Notification: severity = OKAY, host = controller-0.localdomain, plugin = procevent, plugin_instance = cat, type = gauge, type_instance = process_status
[2019-07-22 09:21:22] Notification: severity = FAILURE, host = controller-0.localdomain, plugin = procevent, plugin_instance = cat, type = gauge, type_instance = process_status
[2019-07-22 09:21:22] Notification: severity = OKAY, host = controller-0.localdomain, plugin = procevent, plugin_instance = cat, type = gauge, type_instance = process_status
[2019-07-22 09:21:22] Notification: severity = FAILURE, host = controller-0.localdomain, plugin = procevent, plugin_instance = cat, type = gauge, type_instance = process_status
[2019-07-22 09:21:22] Notification: severity = OKAY, host = controller-0.localdomain, plugin = procevent, plugin_instance = ovs-appctl, type = gauge, type_instance = process_status
[2019-07-22 09:21:22] Notification: severity = OKAY, host = controller-0.localdomain, plugin = procevent, plugin_instance = awk, type = gauge, type_instance = process_status
[2019-07-22 09:21:22] Notification: severity = FAILURE, host = controller-0.localdomain, plugin = procevent, plugin_instance = ovs-appctl, type = gauge, type_instance = process_status
[2019-07-22 09:21:22] Notification: severity = FAILURE, host = controller-0.localdomain, plugin = procevent, plugin_instance = awk, type = gauge, type_instance = process_status
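
To check a log like the one above against the intended filters, a short Python sketch can list the process names that procevent reported but that match none of the expected patterns. The function name `unexpected_processes` is hypothetical, the line format is taken from the log lines above, and the patterns are evaluated with Python's `re` module, which is only an approximation of whatever regex engine collectd uses.

```python
import re

# Matches the procevent notification lines shown in collectd.log above.
NOTIFICATION = re.compile(
    r"severity = (?P<severity>\w+), .*?plugin = procevent, "
    r"plugin_instance = (?P<process>[^,]+),"
)


def unexpected_processes(log_lines, patterns):
    """Return sorted process names that match none of the given patterns."""
    compiled = [re.compile(p) for p in patterns]
    unexpected = set()
    for line in log_lines:
        match = NOTIFICATION.search(line)
        if match is None:
            continue  # not a procevent notification
        name = match.group("process")
        if not any(c.search(name) for c in compiled):
            unexpected.add(name)
    return sorted(unexpected)
```

Calling `unexpected_processes(open("collectd.log"), ["nova", "neutron"])` on the log above would list names such as iptables and podman, which is the symptom described in this comment.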

Comment 19 Andrew Bays 2019-07-22 11:23:22 UTC
The filter option has been renamed to ProcessRegex, in line with Collectd community standards.  It looks like you're using the old name, "RegexProcess".  I don't know where the source link is for the code you're using, but it should be documented just like the following:

https://github.com/abays/collectd/blob/procevent/src/collectd.conf.pod#plugin-procevent

Comment 21 Andrew Bays 2019-07-22 13:04:42 UTC
My apologies for the confusion.  This plugin and the other two (connectivity and sysevent) have undergone significant refactoring since that document was written.  I will circle back and update it.

I'm not sure what the Collectd standard is for bogus configuration parameter names.  Technically this could happen with any plugin.  It seems like the default behavior is just to ignore invalid names.
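
Since an unrecognized option name appears to be ignored silently, one way to catch it before deployment is a small pre-flight check of the conf file. The Python sketch below is a hypothetical helper, not part of collectd: the set of known options is taken only from this report (BufferLength, Process, ProcessRegex), and the case-insensitive comparison is an assumption about how collectd matches option keys.

```python
import re

# Option names mentioned in this report; ProcessRegex is the current filter name.
KNOWN_OPTIONS = {"BufferLength", "Process", "ProcessRegex"}


def unknown_options(conf_text, plugin="procevent", known=KNOWN_OPTIONS):
    """Return option names inside <Plugin plugin> that are not in `known`."""
    known_lower = {name.lower() for name in known}  # assumed case-insensitive
    opening = re.compile(r'<Plugin\s+"?%s"?\s*>' % re.escape(plugin), re.IGNORECASE)
    inside = False
    unknown = []
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if opening.fullmatch(line):
            inside = True
        elif line.lower() == "</plugin>":
            inside = False
        elif inside:
            name = line.split(None, 1)[0]
            if name.lower() not in known_lower and name not in unknown:
                unknown.append(name)
    return unknown
```

Run against the conf file from comment 18, this check reports `RegexProcess` as unknown.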

Comment 23 Andrew Bays 2019-07-23 10:35:06 UTC
Those are just warnings that those PIDs, which were served to procevent via netlink, did not have an associated comm file.  Procevent needs to read the comm file to know what the name of the associated process is.  So those aren't a problem in and of themselves.  If, however, those were the PIDs for the "nova" and "neutron" processes you were trying to monitor, then it's an issue.  Even still, it's not something over which procevent has any control.  If the netlink socket says there was a PID event, yet there's no comm file for that PID on the file system, there is no action the plugin can take other than to report the discrepancy.
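
The race described here can be sketched in Python: by the time a PID event is handled, the process may already have exited and its comm file may be gone. This only illustrates the lookup, not the plugin's actual C code; the function name `read_comm` and the `proc_root` parameter (added so the lookup can be exercised against a test directory) are hypothetical.

```python
from pathlib import Path


def read_comm(pid, proc_root="/proc"):
    """Return the process name from <proc_root>/<pid>/comm, or None if it is gone."""
    try:
        return (Path(proc_root) / str(pid) / "comm").read_text().strip()
    except (FileNotFoundError, ProcessLookupError):
        # The process exited before its comm file could be read; the caller
        # can only report the discrepancy, as the warnings above do.
        return None
```

A short-lived process such as `true` or `cat` can easily exit between the netlink event and the read, in which case `read_comm()` returns `None`.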

Comment 24 Leonid Natapov 2019-08-07 12:34:18 UTC
collectd.log:[2019-08-07 00:03:11] Notification: severity = OKAY, host = controller-0.localdomain, plugin = procevent, plugin_instance = nova-manage, type = gauge, type_instance = process_status
collectd.log:[2019-08-07 00:03:15] Notification: severity = FAILURE, host = controller-0.localdomain, plugin = procevent, plugin_instance = nova-manage, type = gauge, type_instance = process_status
collectd.log.1:[2019-08-06 00:30:39] Notification: severity = OKAY, host = controller-0.localdomain, plugin = procevent, plugin_instance = nova-manage, type = gauge, type_instance = process_status
collectd.log.1:[2019-08-06 00:30:43] Notification: severity = FAILURE, host = controller-0.localdomain, plugin = procevent, plugin_instance = nova-manage, type = gauge, type_instance = process_status

Comment 26 errata-xmlrpc 2019-09-21 11:16:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:2811