Bug 1579876

Summary: MMV stats disappear rendering pmlogger unable to restart
Product: Red Hat Satellite
Reporter: Sanket Jagtap <sjagtap>
Component: Logging
Assignee: satellite6-bugs <satellite6-bugs>
Status: CLOSED ERRATA
QA Contact: Sanket Jagtap <sjagtap>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 6.4
CC: lzap, sghai
Target Milestone: 6.5.0
Keywords: Triaged
Target Release: Unused
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-05-14 12:37:19 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1586051
Bug Blocks: 1537078

Description Sanket Jagtap 2018-05-18 14:24:46 UTC
Description of problem:
The problem is related to the new telemetry support introduced in Satellite 6.4.0.

Version-Release number of selected component (if applicable):
Build: Satellite 6.4.0 snap3

How reproducible:


Steps to Reproduce:
1. Configure telemetry on the Satellite
2. Watch mmv.* metrics
3.

Actual results:
MMV stats disappear

Expected results:
MMV stats should not disappear

Additional info:

Comment 2 Lukas Zapletal 2018-05-22 08:41:12 UTC
Yes. As a workaround, delete all logs in /var/log/pcp/HOSTNAME/*. I need to fix this.

Comment 4 Lukas Zapletal 2018-06-05 12:23:00 UTC
Nathan from PCP identified the issue in the MMV/PCP codebase, and we have a patch that was merged upstream. I asked the PCP team to backport it into RHEL 7.5:

https://bugzilla.redhat.com/show_bug.cgi?id=1586051

In the meantime, you can continue testing with this PCP version:

https://copr.fedorainfracloud.org/coprs/lzap/pcp/

Just drop the repo file into place, upgrade all pcp packages, then remove the old logs and restart all services:

rm -rf /var/log/pcp/pmlogger/*/*
systemctl restart pmcd pmlogger pmie pmwebd

Then start over with testing. Make sure this command does not print any error after one hour or one week of uptime:

echo "log mandatory on 30seconds mmv" | /usr/bin/pmlc -P
Connected to primary pmlogger at local:

You should see lots of mmv metrics:

pminfo | grep mmv

Grafana should also work fine. If you don't see a metric that you expect, just run the "pmlc" command from above and it will show up.
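
For longer soak tests, the "pminfo | grep mmv" check above could be wrapped in a small helper that is easy to call from cron or a loop. This is only a sketch: the function names count_mmv and check_mmv are made up for illustration and are not part of PCP, and it assumes that pminfo prints one metric name per line.

```shell
#!/bin/sh
# Sketch only: count_mmv and check_mmv are hypothetical helpers, not PCP tools.

# Read metric names on stdin and print how many start with "mmv.".
# grep -c exits non-zero when nothing matches, so "|| true" keeps the
# function usable under "set -e" while still printing 0.
count_mmv() {
    grep -c '^mmv\.' || true
}

# Read metric names on stdin; succeed only if at least one mmv metric exists.
check_mmv() {
    count=$(count_mmv)
    if [ "$count" -gt 0 ]; then
        echo "OK: $count mmv metrics visible"
        return 0
    fi
    echo "FAIL: no mmv metrics visible"
    return 1
}

# Usage on a host with PCP installed (assumption: pminfo is on PATH):
#   pminfo | check_mmv
```

A non-zero exit status from check_mmv means the mmv metrics have disappeared again, which is the symptom reported in this bug.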

Comment 8 Sanket Jagtap 2018-09-11 05:55:53 UTC
Build: Satellite 6.4.0 snap21

[root@smqa-x3550m3-03 ~]# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 7.6 Beta (Maipo)
[root@smqa-x3550m3-03 ~]# rpm -qa | grep pcp
pcp-conf-4.1.0-4.el7.x86_64
pcp-mmvstatsd-0.4-1.el7sat.x86_64
pcp-webapp-grafana-3.12.2-5.el7.noarch
pcp-pmda-apache-4.1.0-4.el7.x86_64
pcp-webapp-vector-3.12.2-5.el7.noarch
pcp-selinux-4.1.0-4.el7.x86_64
pcp-4.1.0-4.el7.x86_64
pcp-webapi-4.1.0-4.el7.x86_64
pcp-libs-4.1.0-4.el7.x86_64


I tested telemetry with the 7.6 Beta Vault build. Pmlogger was still functioning as expected after 3 days, and no errors were observed in the log.

I feel this bug should still be kept ON_QA until it is tested with the 7.6 GA build, to be sure we don't miss anything when we announce the feature after/for 7.6 GA.

Comment 11 Sanket Jagtap 2018-12-18 06:58:36 UTC
Build: Satellite 6.5.0 snap 8 on RHEL7.6

 rpm -qa | grep pcp
pcp-4.1.0-5.el7_6.x86_64
pcp-conf-4.1.0-5.el7_6.x86_64
pcp-mmvstatsd-0.4-2.el7sat.x86_64
pcp-libs-4.1.0-5.el7_6.x86_64
pcp-webapp-grafana-4.1.0-5.el7_6.noarch
pcp-selinux-4.1.0-5.el7_6.x86_64
pcp-webapp-vector-4.1.0-5.el7_6.noarch
pcp-pmda-apache-4.1.0-5.el7_6.x86_64
pcp-webapi-4.1.0-5.el7_6.x86_64


The logger did not report any errors and was still functioning after 48 hours. No errors were recorded in the logs.

 pmval mmv.fm_rails_http_request_total_duration.hosts_controller.new

metric:    mmv.fm_rails_http_request_total_duration.hosts_controller.new
host:      ---
semantics: instantaneous value
units:     millisec
samples:   all

      mean        min        max    variance  standard_deviation
     921.0      606.0      1236.   9.922E+04               315.0
     921.0      606.0      1236.   9.922E+04               315.0
     921.0      606.0      1236.   9.922E+04               315.0
     921.0      606.0      1236.   9.922E+04               315.0
     921.0      606.0      1236.   9.922E+04               315.0
     921.0      606.0      1236.   9.922E+04               315.0
     921.0      606.0      1236.   9.922E+04               315.0

Comment 14 errata-xmlrpc 2019-05-14 12:37:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:1222