Bug 2026726
| Summary: | pmproxy: PCP archives are not discovered when they are copied into a subdirectory of PCP_ARCHIVE_DIR | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Rajesh Dulhani <rdulhani> |
| Component: | pcp | Assignee: | pcp-maint <pcp-maint> |
| Status: | CLOSED ERRATA | QA Contact: | Jan Kurik <jkurik> |
| Severity: | unspecified | Docs Contact: | Jacob Taylor Valdez <jvaldez> |
| Priority: | unspecified | | |
| Version: | 8.4 | CC: | agerstmayr, jkurik, nathans, peter.vreman |
| Target Milestone: | rc | Keywords: | Bugfix, Triaged |
| Target Release: | --- | Flags: | pm-rhel: mirror+ |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | pcp-5.3.7-1.el8 | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 2027430 | Environment: | |
| Last Closed: | 2022-11-08 09:21:23 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 2027430 | | |
Description (Rajesh Dulhani, 2021-11-25 16:00:18 UTC)
Hi Rajesh,

(In reply to Rajesh Dulhani from comment #0)
> Description of problem:
>
> I have PCP - Redis - Grafana working. There are clients connected with
> remote pmcd and there it works, but I can only show newly received data.
>
> Now I got the question to graph PCP data I downloaded from another server
> that is not connected with pmcd. I already copied the data to
> /var/log/pcp/pmlogger/remote/<fqdn> directories, just like the other
> directly connected pmcd hosts.

You're copying PCP archives from another host manually? pmlogger stores archives in PCP_ARCHIVE_DIR/<host>/YYYYmmdd.*, for example /var/log/pcp/pmlogger/agerstmayr-thinkpad/20211126.*

pmproxy is looking for archives in this directory. It does not look in /var/log/pcp/pmlogger/remote/<fqdn>/YYYYmmdd.*

You can manually import PCP archives by using the pmseries command, see the man page of pmseries(1) for instructions: https://man7.org/linux/man-pages/man1/pmseries.1.html#TIMESERIES_LOADING

That said, having greater flexibility for discovering metrics, i.e. looking recursively in PCP_ARCHIVE_DIR for new metrics, sounds like a good feature request. Can you open an RFE for this new feature (recursively discovering metrics in PCP_ARCHIVE_DIR, and possibly letting users configure the path where pmproxy looks for PCP archives)?

> How can I graph in grafana that set of PCP data?
>
> Also note that when I delete the Redis database (yes, I have to do that once
> in a while because of the unstoppable memory growth of Redis) and restart
> Redis + pmproxy, I also do not have data.

That's expected. Grafana connects to the pmproxy daemon, which reads the data from the Redis database. If you delete the Redis database, there won't be any metrics for Grafana to show.

The pmproxy discovery watches for file system events in the PCP archive directory and imports new PCP archives when it sees that a PCP archive file has changed. For performance reasons, it does not check all old PCP archives and does not synchronize them; it only loads newly changed archives into Redis. You can manually import PCP archives though, see the pmseries(1) command mentioned above.

Deleting the Redis database should not be necessary; there are options to configure the retention of metrics: the 'stream.expire' and 'stream.maxlen' options in the pmproxy config file at /etc/pcp/pmproxy/pmproxy.conf

I also recommend the PCP section in the RHEL docs: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/monitoring_and_managing_system_status_and_performance/setting-up-pcp_monitoring-and-managing-system-status-and-performance#pcp-deployment-architectures_setting-up-pcp, in particular Section 6.8 "Sizing factors" and Section 6.9 "Configuration options for PCP scaling", which explain the sizing factors and configuration settings for the PCP deployment.

Cheers,
Andreas

pmlogger writes into the directory that you configure yourself in the control.d/ files. This splits the local archives from remote archives on purpose, so that I can apply different rules (e.g. I still have a cron job, required in the past, doing 'find -mtime +X -delete' housekeeping as a safety net in case PCP fails to do it nicely).

For remote loggers I use the following line and it writes the archives nicely into PCP_LOG_DIR/pmlogger/remote/, and there was never any issue with it.
~~~
remotehost.example.com n n PCP_LOG_DIR/pmlogger/remote/remotehost.example.com -r -T24h10m -c /var/lib/pcp/config/pmlogger/remote/config.remotehost.example.com -v 100Mb
~~~

This setup using PCP_LOG_DIR/pmlogger/remote/ for remote hosts is working with pmproxy for real-time data. I was under the impression that pmproxy was reading the data from the pmlogger-written files. The old pmwebd had such an option:
~~~
-A archdir
Limit remote access to archives to only those beneath the given directory. For performance, symbolic links to other directories may not be followed. By default, only files beneath the initial pmwebd working directory may be accessed.
~~~

It looks like the pmproxy replacement of pmwebd is not a like-for-like replacement and serves a different purpose than the old pmwebd did.

The pmseries command is a bit vague, e.g. it contains many complex query commands, but the basic functionality to load data manually is hidden. Examples of what is unclear:
- does pmseries work without pmproxy?
- a configuration file is an option, but its format is not documented. Searching with mlocate reveals that pmseries.conf is a symlink to pmproxy.conf ?!?

Deleting Redis is the only option for me to get things back to work. The Redis is constantly OOM-killed, leaving a half-corrupted database. I now see in section 6.9 that the PCP memory requirements for Redis are documented at 500MB/host/day for 10-second intervals. And I have a mixture of 10s and 60s pmcd collections, and this must stay on the clients for local historical analysis with 'pcp atop' and cannot be changed just for Redis. The Redis is the nice-to-have on top. The core functionality is the archives for the sosreport and pcp-atop.

How shall stream.maxlen and stream.duration be set to load 2 weeks of historical data that is 1 week old? The stream.maxlen and stream.duration look to be global only and also applicable to real-time data loaded by pmproxy.

And also, how can I unload the timeseries of a host when I am done with the analysis, to free up the memory?

I have the impression that my use case of ad-hoc analysis of random hosts and real-time monitoring are conflicting. Maybe I should disable pmproxy and switch to only doing ad-hoc analysis with loading using pmseries.

Loading 2 weeks of data with pmseries --load does not work, it segfaults:
~~~
[cb/Azure] root@li-lc-2635:~# pmseries --load /var/log/pcp/pmlogger/remote/li-ld-2029
Segmentation fault (core dumped)
[cb/Azure] root@li-lc-2635:~#
~~~

The memory usage of pmseries is also huge:
~~~
[cb/Azure] root@li-lc-2635:~# ps auxf
USER         PID %CPU %MEM      VSZ      RSS TTY   STAT START TIME COMMAND
...
root     1586582 99.1 32.9 10864596 10763968 pts/0 R+   16:10 7:56 | \_ /usr/bin/pmseries --load /var/log/pcp/pmlogger/remote/li-ld-2029
~~~

Is pmseries --load really tested on large archive sets?
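As an illustration of the 'stream.expire' and 'stream.maxlen' options mentioned above, a minimal sketch of what such settings could look like in /etc/pcp/pmproxy/pmproxy.conf; the [pmseries] section placement, the seconds unit for stream.expire, and the example values are assumptions, not recommendations from this report:

~~~
# /etc/pcp/pmproxy/pmproxy.conf (section placement assumed)
[pmseries]
# keep roughly two weeks of values per metric per host at a 60 second
# logging interval: 14 days x 24 h x 60 samples/h = 20160 values
stream.maxlen = 20160
# expire stream keys after two weeks (value assumed to be in seconds)
stream.expire = 1209600
~~~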
I ran it in gdb, aborted it after running for 10 minutes, and the backtrace shows there is a recursion issue:
~~~
(gdb) bt
#0  0x00007ffff6b0c2c6 in __pmTimevalCmp () from /lib64/libpcp.so.3
#1  0x00007ffff6b10967 in searchindom () from /lib64/libpcp.so.3
#2  0x00007ffff6b11538 in __pmLogGetInDom () from /lib64/libpcp.so.3
#3  0x00007ffff6af8041 in pmGetInDom () from /lib64/libpcp.so.3
#4  0x00007ffff7b875cd in pmwebapi_add_indom_instances () from /lib64/libpcp_web.so.1
#5  0x00007ffff7b82308 in series_cache_update () from /lib64/libpcp_web.so.1
#6  0x00007ffff7b8261d in server_cache_window () from /lib64/libpcp_web.so.1
#7  0x00007ffff7b81d76 in doneSeriesGetContext () from /lib64/libpcp_web.so.1
#8  0x00007ffff7b8202b in series_cache_update () from /lib64/libpcp_web.so.1
#9  0x00007ffff7b8261d in server_cache_window () from /lib64/libpcp_web.so.1
#10 0x00007ffff7b81d76 in doneSeriesGetContext () from /lib64/libpcp_web.so.1
#11 0x00007ffff7b8202b in series_cache_update () from /lib64/libpcp_web.so.1
#12 0x00007ffff7b8261d in server_cache_window () from /lib64/libpcp_web.so.1
#13 0x00007ffff7b81d76 in doneSeriesGetContext () from /lib64/libpcp_web.so.1
#14 0x00007ffff7b8202b in series_cache_update () from /lib64/libpcp_web.so.1
#15 0x00007ffff7b8261d in server_cache_window () from /lib64/libpcp_web.so.1
#16 0x00007ffff7b81d76 in doneSeriesGetContext () from /lib64/libpcp_web.so.1
#17 0x00007ffff7b8202b in series_cache_update () from /lib64/libpcp_web.so.1
#18 0x00007ffff7b8261d in server_cache_window () from /lib64/libpcp_web.so.1
#19 0x00007ffff7b81d76 in doneSeriesGetContext () from /lib64/libpcp_web.so.1
#20 0x00007ffff7b8202b in series_cache_update () from /lib64/libpcp_web.so.1
#21 0x00007ffff7b8261d in server_cache_window () from /lib64/libpcp_web.so.1
#22 0x00007ffff7b81d76 in doneSeriesGetContext () from /lib64/libpcp_web.so.1
#23 0x00007ffff7b8202b in series_cache_update () from /lib64/libpcp_web.so.1
#24 0x00007ffff7b8261d in server_cache_window () from /lib64/libpcp_web.so.1
#25 0x00007ffff7b81d76 in doneSeriesGetContext () from /lib64/libpcp_web.so.1
#26 0x00007ffff7b8202b in series_cache_update () from /lib64/libpcp_web.so.1
#27 0x00007ffff7b8261d in server_cache_window () from /lib64/libpcp_web.so.1
...
#29498 0x00007ffff7b8202b in series_cache_update () from /lib64/libpcp_web.so.1
#29499 0x00007ffff7b8261d in server_cache_window () from /lib64/libpcp_web.so.1
#29500 0x00007ffff7b81d76 in doneSeriesGetContext () from /lib64/libpcp_web.so.1
#29501 0x00007ffff7b8202b in series_cache_update () from /lib64/libpcp_web.so.1
#29502 0x00007ffff7b8261d in server_cache_window () from /lib64/libpcp_web.so.1
#29503 0x00007ffff7b81d76 in doneSeriesGetContext () from /lib64/libpcp_web.so.1
#29504 0x00007ffff7b8202b in series_cache_update () from /lib64/libpcp_web.so.1
#29505 0x00007ffff7b8261d in server_cache_window () from /lib64/libpcp_web.so.1
#29506 0x00007ffff7b81d76 in doneSeriesGetContext () from /lib64/libpcp_web.so.1
#29507 0x00007ffff7b8202b in series_cache_update () from /lib64/libpcp_web.so.1
#29508 0x00007ffff7b8261d in server_cache_window () from /lib64/libpcp_web.so.1
#29509 0x00007ffff7b81d76 in doneSeriesGetContext () from /lib64/libpcp_web.so.1
#29510 0x00007ffff7b8202b in series_cache_update () from /lib64/libpcp_web.so.1
#29511 0x00007ffff7b8261d in server_cache_window () from /lib64/libpcp_web.so.1
#29512 0x00007ffff7b81d76 in doneSeriesGetContext () from /lib64/libpcp_web.so.1
#29513 0x00007ffff7b8202b in series_cache_update () from /lib64/libpcp_web.so.1
#29514 0x00007ffff7b8261d in server_cache_window () from /lib64/libpcp_web.so.1
#29515 0x00007ffff7b82738 in series_cache_metrics () from /lib64/libpcp_web.so.1
#29516 0x00007ffff7b88f5c in redisSlotsReplyCallback () from /lib64/libpcp_web.so.1
#29517 0x00007ffff7b9d4d0 in redisClusterAsyncCallback () from /lib64/libpcp_web.so.1
#29518 0x00007ffff7b92c95 in redisProcessCallbacks () from /lib64/libpcp_web.so.1
#29519 0x00007ffff7b88539 in redisLibuvPoll () from /lib64/libpcp_web.so.1
#29520 0x00007ffff71cbd15 in uv__io_poll () from /lib64/libuv.so.1
#29521 0x00007ffff71baa74 in uv_run () from /lib64/libuv.so.1
#29522 0x00005555555569e5 in main ()
(gdb)
~~~

Hi Peter,

(In reply to Peter Vreman from comment #2)
> pmlogger writes into the directory that you configure yourself in the
> control.d/ files. This splits the local archives from remote archives on
> purpose, so that I can apply different rules (e.g. I still have a cron job,
> required in the past, doing 'find -mtime +X -delete' housekeeping as a safety
> net in case PCP fails to do it nicely).
>
> For remote loggers I use the following line and it writes the archives
> nicely into PCP_LOG_DIR/pmlogger/remote/, and there was never any issue with it.
>
> remotehost.example.com n n
> PCP_LOG_DIR/pmlogger/remote/remotehost.example.com -r -T24h10m -c
> /var/lib/pcp/config/pmlogger/remote/config.remotehost.example.com -v 100Mb

You're right, users can configure the pmlogger output directory for each pmlogger process individually. It's just a convention to use PCP_ARCHIVE_DIR/<fqdn>

> This setup using PCP_LOG_DIR/pmlogger/remote/ for remote hosts is working with
> pmproxy for real-time data.

I tested it again and came to the following conclusion. While pmproxy is running:
- manually copied PCP archives in /var/log/pcp/pmlogger/<host> are discovered (loaded into Redis)
- manually copied PCP archives in /var/log/pcp/pmlogger/remote/<host> are not discovered (not loaded into Redis)

However, when you run "touch /var/log/pcp/pmlogger/remote/<host>/YYYYmmdd.*", pmproxy receives the file system event of a changed file and correctly loads the archive into Redis.
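A minimal sketch of that workaround, with an illustrative host name, archive name and source path (not taken from this report):

~~~
# copy a previously collected archive into the remote/ subdirectory
cp /tmp/incoming/20211126.* /var/log/pcp/pmlogger/remote/remotehost.example.com/

# trigger a file-change event so the pmproxy discovery module notices the archive
touch /var/log/pcp/pmlogger/remote/remotehost.example.com/20211126.*
~~~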
That explains why this setup works when pmlogger is configured to log into this directory (pmlogger keeps updating the file, triggering the discovery in pmproxy), but an archive is not discovered when it is copied manually into that directory. That's a bug, we'll track progress in this BZ to fix it.

> I was under the impression that pmproxy was reading the data from the
> pmlogger-written files. The old pmwebd had such an option:
> ~~~
> -A archdir
> Limit remote access to archives to only those beneath the given directory.
> For performance, symbolic links to other directories may not be followed. By
> default, only files beneath the initial pmwebd working directory may be
> accessed.
> ~~~
>
> It looks like the pmproxy replacement of pmwebd is not a like-for-like
> replacement and serves a different purpose than the old pmwebd did.

pmproxy is still reading PCP archives which are written by pmlogger. pmproxy loads them into a Redis database (which allows us to make queries spanning multiple PCP archive files, i.e. multiple hosts), while pmwebd was reading each file individually on-the-fly afaics.

> The pmseries command is a bit vague, e.g. it contains many complex query
> commands, but the basic functionality to load data manually is hidden.
> Examples of what is unclear:
> - does pmseries work without pmproxy?

Some technical background:
- we have a libpcp_web library, which handles the interaction with Redis
- both pmseries and pmproxy use the same library to interact with Redis
- pmseries is a CLI to query the Redis database, and also to (manually) load PCP archives into Redis
- pmproxy reads the metrics stored in Redis using libpcp_web and exposes them over a REST API, to be consumed by other client applications (for example Grafana through the grafana-pcp datasource)

To answer the question: Yes, pmseries works without pmproxy. It connects directly to Redis.

> - a configuration file is an option, but its format is not documented.
> Searching with mlocate reveals that pmseries.conf is a symlink to
> pmproxy.conf ?!?

Yep, they're using the same configuration file. The configuration file stores, for example, the location of the Redis server, which should be the same for pmseries and pmproxy.

> Deleting Redis is the only option for me to get things back to work. The
> Redis is constantly OOM-killed, leaving a half-corrupted database. I now see in
> section 6.9 that the PCP memory requirements for Redis are documented at
> 500MB/host/day for 10-second intervals. And I have a mixture of 10s and 60s pmcd
> collections, and this must stay on the clients for local historical analysis
> with 'pcp atop' and cannot be changed just for Redis. The Redis is the
> nice-to-have on top. The core functionality is the archives for the sosreport
> and pcp-atop.
>
> How shall stream.maxlen and stream.duration be set to load 2 weeks of
> historical data that is 1 week old? The stream.maxlen and
> stream.duration look to be global only and also applicable to real-time
> data loaded by pmproxy.

Yes, they're global and apply to all metric values in the Redis database. For example, the 'stream.maxlen' setting limits the number of metric values per metric per host, irrespective of the timestamp when the metric value was recorded. In the regular case of continuously logging metrics and storing them in the Redis database, the correct number would be the desired retention time divided by the logging interval.

> And also, how can I unload the timeseries of a host when I am done
> with the analysis, to free up the memory?

That's not possible yet.
We plan to support fine-grained metric retention settings and downsampling of historical metric values (to reduce database size/memory usage) in the future.

> I have the impression that my use case of ad-hoc analysis of random hosts
> and real-time monitoring are conflicting. Maybe I should disable pmproxy and
> switch to only doing ad-hoc analysis with loading using pmseries.

You're right, the current architecture is designed for real-time monitoring of multiple hosts and not for ad-hoc analysis of single PCP archives. We already had a conversation and plans about ad-hoc analysis of PCP archives with Grafana internally; I've created BZ 2027428 to track this RFE.

> Loading 2 weeks of data with pmseries --load does not work, it segfaults:
> Is pmseries --load really tested on large archive sets?

That's an unfortunate bug. I've opened BZ 2027430 to track it. I can't reproduce it locally; would it be possible to attach the PCP archive which crashes the pmseries --load process to BZ 2027430? Or, if it's too big to attach, either mail it to us or send us a link?

Thanks for your detailed reply, that's very much appreciated and helps us prioritize our work accordingly - from the initial description the use case of ad-hoc analysis wasn't clear.

Cheers,
Andreas

Let's use BZ 2027428 for the "[RFE] graphing historical metrics from any PCP archive using Grafana" use case, and use this BZ only to track a bug where discovery doesn't pick up PCP archives when users manually copy them into a subdirectory of /var/log/pcp/pmlogger (see #c5). I've updated the title to resolve any future confusion about these two independent bugs/RFEs.

The pmseries --load option is the supported way to manually load archive(s) into a redis instance, subject to the performance and segfault issues addressed in BZ#2027430. Scripts for ad-hoc ingest into a redis instance (see feature request in BZ#2027428) should use this to load up a dedicated redis instance and configure a grafana-pcp datasource to use it. This can all be scripted.

The libpcp_web discovery module used by pmproxy is intended for "live" log-tailing of growing archives, e.g. as collected by a pmlogger farm. Compressed log volumes are ignored because they can't grow - we do not support transparent compression for archive writes. If all the logvols for an archive are compressed, the archive is ignored and the host/source will not show up in the redis instance being used by pmproxy. Ad-hoc ingest is a separate use case and should use pmseries --load with a dedicated redis instance.

Also note, just to be clear: the discovery module will dynamically find archives at any directory depth, not just directories immediately below $PCP_ARCHIVE_DIR.

So IMO this BZ should be closed NOTABUG - it's a misunderstanding of the intended use case for pmproxy archive discovery. The desired feature should be implemented as a series of scripts over in BZ#2027428.

Most of this BZ is superseded by the more specific BZs:
- BZ#2027428 better describes the use case of how to graph an ad-hoc system
- BZ#2027430 was created for the segfault of pmseries --load

I still think this BZ is valid, but its content has changed to improving the documentation of what pmproxy is doing. Based on the current (RHEL 8.5) manpage, my understanding was that pmproxy provides the REST API and allows querying all archives it found. Now, based on the previous comment that 'The libpcp_web discovery module used by pmproxy is intended for "live" log-tailing of growing archives, e.g. as collected by a pmlogger farm.
Compressed log volumes are ignored because they can't grow', I understand that it works only for the live use case. It would be useful to have the comment on the 'live' tailing of growing archives added to the pmproxy manpage, to explain the process it uses to discover and read the PCP data.
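Building on the scripted ad-hoc ingest approach described above, one possible sketch; the Redis port, archive path, and the 'servers' option name are assumptions rather than details from this report:

~~~
# start a dedicated Redis instance for ad-hoc analysis (port is illustrative)
redis-server --port 6380 --daemonize yes

# point pmseries (and, if desired, a dedicated pmproxy) at that instance,
# e.g. in pmseries.conf / pmproxy.conf (option name assumed):
#   [pmseries]
#   servers = localhost:6380

# manually load the previously collected archives, then verify with a query
pmseries --load /var/log/pcp/pmlogger/remote/remotehost.example.com
pmseries disk.dev.read
~~~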
Resolved upstream in pcp-6.0.0 (current devel):
commit c40983b44587027f8069df15690abe3e78cf7a34
Author: Mark Goodwin <mgoodwin>
Date: Fri Feb 11 12:09:50 2022 +1100
docs: improve pmproxy --timeseries and pmseries --load documentation
Clarify documentation for pmproxy archive discovery and "log-tailing",
and that pmseries --load is the supported way to manually load previously
collected archive data into a redis-server, whether compressed or not.
Updated --timeseries description in pmproxy(1), --load in pmseries(1)
and pmdiscoversetup(3).
Resolves: RHBZ#2026726
Related: RHBZ#2027430, RHBZ#2027428.
c40983b44587027f8069df15690abe3e78cf7a34 (from comment #10) cherry-picked and pushed to the upstream stable branch.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (pcp bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:7474