Created attachment 1502935 [details] /etc/pcp/pmlogger/control.d/local Description of problem: PCP is consuming 9.3 GB of logs space for < 2 days of logging. Version-Release number of selected component (if applicable): $ rpm -qa |grep pcp pcp-gui-4.1.3-1.fc28.x86_64 pcp-zeroconf-4.1.3-1.fc28.x86_64 pcp-pmda-nfsclient-4.1.3-1.fc28.x86_64 pcp-doc-4.1.3-1.fc28.noarch pcp-4.1.3-1.fc28.x86_64 python3-pcp-4.1.3-1.fc28.x86_64 pcp-pmda-dm-4.1.3-1.fc28.x86_64 pcp-conf-4.1.3-1.fc28.x86_64 python2-pcp-4.1.3-1.fc28.x86_64 cockpit-pcp-180-1.fc28.x86_64 pcp-system-tools-4.1.3-1.fc28.x86_64 pcp-selinux-4.1.3-1.fc28.x86_64 pcp-libs-4.1.3-1.fc28.x86_64 How reproducible: I have set $PCP_COMPRESSAFTER=1 in both of these config files (because I can never remember which config file to use). /etc/pcp/pmlogger/control /etc/pcp/pmlogger/control.d/local # attached to BZ This same problem happened to me 2 days ago while I had $PCP_COMPRESSAFTER=0. It was not consolidating logs, nor cleaning up old logs, some of which were older than 30 days. So I manually deleted all old logs for Nov, Oct, and Sept. Steps to Reproduce: 1. 2. 3. Actual results: $ du -h 9.3G . $ du -h * | sort -h 4.0K Latest 36K pmlogger.log 36K pmlogger.log.prior 48K 20181107.00.10.index 176K 20181101.index 176K 20181106.index 180K 20181103.index 180K 20181105.index 184K 20181104.index 220K 20181102.index 336K 20181101.meta.xz 468K 20181102.meta.xz 472K 20181103.meta.xz 4.9M 20181101.22.xz 9.3M 20181102.25.xz 11M 20181101.0.xz 11M 20181101.10.xz 11M 20181101.11.xz 11M 20181101.12.xz 11M 20181101.13.xz 11M 20181101.14.xz 11M 20181101.15.xz 11M 20181101.16.xz 11M 20181101.17.xz 11M 20181101.18.xz 11M 20181101.19.xz 11M 20181101.1.xz 11M 20181101.21.xz 11M 20181101.2.xz 11M 20181101.3.xz 11M 20181101.4.xz 11M 20181101.5.xz 11M 20181101.6.xz 11M 20181101.7.xz 11M 20181101.8.xz 11M 20181101.9.xz 11M 20181102.0.xz 11M 20181102.10.xz 11M 20181102.11.xz 11M 20181102.12.xz 11M 20181102.1.xz 11M 20181102.2.xz 11M 20181102.3.xz 11M 20181102.4.xz 11M 20181102.5.xz 11M 20181102.6.xz 11M 20181102.7.xz 11M 20181102.8.xz 11M 20181102.9.xz 11M 20181103.24.xz 11M 20181103.7.xz 12M 20181101.20.xz 12M 20181102.13.xz 12M 20181102.14.xz 12M 20181102.15.xz 12M 20181102.16.xz 12M 20181102.17.xz 12M 20181102.18.xz 12M 20181102.19.xz 12M 20181102.20.xz 12M 20181102.21.xz 12M 20181102.22.xz 12M 20181102.23.xz 12M 20181102.24.xz 12M 20181103.0.xz 12M 20181103.10.xz 12M 20181103.11.xz 12M 20181103.12.xz 12M 20181103.13.xz 12M 20181103.14.xz 12M 20181103.15.xz 12M 20181103.16.xz 12M 20181103.17.xz 12M 20181103.18.xz 12M 20181103.19.xz 12M 20181103.1.xz 12M 20181103.20.xz 12M 20181103.21.xz 12M 20181103.22.xz 12M 20181103.23.xz 12M 20181103.2.xz 12M 20181103.3.xz 12M 20181103.4.xz 12M 20181103.5.xz 12M 20181103.6.xz 12M 20181103.8.xz 12M 20181103.9.xz 15M 20181107.00.10.meta 33M 20181106.26 54M 20181106.meta 59M 20181104.meta 63M 20181105.meta 100M 20181107.00.10.6 101M 20181106.0 101M 20181106.1 101M 20181106.10 101M 20181106.11 101M 20181106.12 101M 20181106.13 101M 20181106.14 101M 20181106.15 101M 20181106.16 101M 20181106.17 101M 20181106.18 101M 20181106.19 101M 20181106.2 101M 20181106.20 101M 20181106.21 101M 20181106.22 101M 20181106.23 101M 20181106.24 101M 20181106.25 101M 20181106.3 101M 20181106.4 101M 20181106.5 101M 20181106.6 101M 20181106.7 101M 20181106.8 101M 20181106.9 101M 20181107.00.10.0 101M 20181107.00.10.1 101M 20181107.00.10.2 101M 20181107.00.10.3 101M 20181107.00.10.4 101M 20181107.00.10.5 484M 20181105.1 521M 20181104.1 2.0G 20181104.0 2.0G 20181105.0 Expected results: All of the logs prior to today (Nov 7th) should be consolidated and compressed. Additional info:
grep -ie pcp -e pmlogg -e pmie /var/log/messages Nov 7 06:58:01 tbowling systemd[1]: Created slice User Slice of pcp. Nov 7 06:58:01 tbowling systemd[1]: Started Session 1022 of user pcp. Nov 7 06:58:01 tbowling audit[32250]: USER_ACCT pid=32250 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='op=PAM:accounting grantors=pam_unix,pam_localuser acct="pcp" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' Nov 7 06:58:01 tbowling audit[32250]: USER_START pid=32250 uid=0 auid=985 ses=1023 subj=system_u:system_r:init_t:s0 msg='op=PAM:session_open grantors=pam_selinux,pam_selinux,pam_loginuid,pam_keyinit,pam_limits,pam_systemd,pam_unix acct="pcp" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' Nov 7 06:58:01 tbowling audit[32247]: USER_START pid=32247 uid=0 auid=985 ses=1022 subj=system_u:system_r:crond_t:s0-s0:c0.c1023 msg='op=PAM:session_open grantors=pam_loginuid,pam_keyinit,pam_limits,pam_systemd acct="pcp" exe="/usr/sbin/crond" hostname=? addr=? terminal=cron res=success' Nov 7 06:58:01 tbowling audit[32247]: CRED_REFR pid=32247 uid=0 auid=985 ses=1022 subj=system_u:system_r:crond_t:s0-s0:c0.c1023 msg='op=PAM:setcred grantors=pam_env,pam_unix acct="pcp" exe="/usr/sbin/crond" hostname=? addr=? terminal=cron res=success' Nov 7 06:58:01 tbowling audit[32247]: CRED_DISP pid=32247 uid=0 auid=985 ses=1022 subj=system_u:system_r:crond_t:s0-s0:c0.c1023 msg='op=PAM:setcred grantors=pam_env,pam_unix acct="pcp" exe="/usr/sbin/crond" hostname=? addr=? terminal=cron res=success' Nov 7 06:58:01 tbowling audit[32247]: USER_END pid=32247 uid=0 auid=985 ses=1022 subj=system_u:system_r:crond_t:s0-s0:c0.c1023 msg='op=PAM:session_close grantors=pam_loginuid,pam_keyinit,pam_limits,pam_systemd acct="pcp" exe="/usr/sbin/crond" hostname=? addr=? terminal=cron res=success' Nov 7 06:58:01 tbowling systemd[1]: Removed slice User Slice of pcp. # systemctl -l status pm* ● pmcd.service - Performance Metrics Collector Daemon Loaded: loaded (/usr/lib/systemd/system/pmcd.service; enabled; vendor preset: enabled) Active: active (running) since Wed 2018-11-07 06:49:56 EST; 21min ago Docs: man:pmcd(8) Process: 24051 ExecStop=/usr/share/pcp/lib/pmcd stop (code=exited, status=0/SUCCESS) Process: 24687 ExecStart=/usr/share/pcp/lib/pmcd start (code=exited, status=0/SUCCESS) Main PID: 24789 (pmcd) Tasks: 9 (limit: 4915) Memory: 24.3M CGroup: /system.slice/pmcd.service ├─24789 /usr/libexec/pcp/bin/pmcd ├─24792 /var/lib/pcp/pmdas/root/pmdaroot ├─24793 /var/lib/pcp/pmdas/proc/pmdaproc -d 3 ├─24794 /var/lib/pcp/pmdas/xfs/pmdaxfs -d 11 ├─24795 /var/lib/pcp/pmdas/linux/pmdalinux ├─24796 python3 /var/lib/pcp/pmdas/nfsclient/pmdanfsclient.python ├─24800 /var/lib/pcp/pmdas/kvm/pmdakvm -d 95 └─24801 /var/lib/pcp/pmdas/dm/pmdadm -d 129 Nov 07 06:49:56 tbowling.localdomain systemd[1]: Starting Performance Metrics Collector Daemon... Nov 07 06:49:56 tbowling.localdomain pmcd[24687]: Starting pmcd ... Nov 07 06:49:56 tbowling.localdomain systemd[1]: Started Performance Metrics Collector Daemon. ● pmproxy.service - Proxy for Performance Metrics Collector Daemon Loaded: loaded (/usr/lib/systemd/system/pmproxy.service; disabled; vendor preset: disabled) Active: active (running) since Wed 2018-11-07 06:49:56 EST; 21min ago Docs: man:pmproxy(8) Process: 24052 ExecStop=/usr/share/pcp/lib/pmproxy stop (code=exited, status=0/SUCCESS) Process: 24814 ExecStart=/usr/share/pcp/lib/pmproxy start (code=exited, status=0/SUCCESS) Main PID: 24963 (pmproxy) Tasks: 2 (limit: 4915) Memory: 2.7M CGroup: /system.slice/pmproxy.service └─24963 /usr/libexec/pcp/bin/pmproxy Nov 07 06:49:56 tbowling.localdomain systemd[1]: Starting Proxy for Performance Metrics Collector Daemon... Nov 07 06:49:56 tbowling.localdomain pmproxy[24814]: Starting pmproxy ... Nov 07 06:49:56 tbowling.localdomain systemd[1]: Started Proxy for Performance Metrics Collector Daemon. ● pmie.service - Performance Metrics Inference Engine Loaded: loaded (/usr/lib/systemd/system/pmie.service; enabled; vendor preset: enabled) Active: active (exited) since Wed 2018-11-07 06:49:57 EST; 21min ago Docs: man:pmie(1) Process: 24053 ExecStop=/usr/share/pcp/lib/pmie stop (code=exited, status=0/SUCCESS) Process: 24815 ExecStart=/usr/share/pcp/lib/pmie start (code=exited, status=0/SUCCESS) Main PID: 24815 (code=exited, status=0/SUCCESS) Tasks: 1 (limit: 4915) Memory: 3.2M CGroup: /system.slice/pmie.service └─25282 /usr/bin/pmie -b -h local: -l /var/log/pcp/pmie/tbowling.localdomain/pmie.log -c config.default Nov 07 06:49:56 tbowling.localdomain systemd[1]: Starting Performance Metrics Inference Engine... Nov 07 06:49:57 tbowling.localdomain pmie[24815]: Starting pmie ... Nov 07 06:49:57 tbowling.localdomain systemd[1]: Started Performance Metrics Inference Engine. ● pmlogger.service - Performance Metrics Archive Logger Loaded: loaded (/usr/lib/systemd/system/pmlogger.service; enabled; vendor preset: enabled) Active: active (exited) since Wed 2018-11-07 06:49:56 EST; 21min ago Docs: man:pmlogger(1) Process: 24054 ExecStop=/usr/share/pcp/lib/pmlogger stop (code=exited, status=0/SUCCESS) Process: 24813 ExecStart=/usr/share/pcp/lib/pmlogger start (code=exited, status=0/SUCCESS) Main PID: 24813 (code=exited, status=0/SUCCESS) Tasks: 1 (limit: 4915) Memory: 52.8M CGroup: /system.slice/pmlogger.service └─30382 /usr/libexec/pcp/bin/pmlogger -P -r -T24h10m -c config.default -v 100mb -m pmlogger_check 20181107.06.49 Nov 07 06:49:56 tbowling.localdomain systemd[1]: Starting Performance Metrics Archive Logger... Nov 07 06:49:56 tbowling.localdomain pmlogger[24813]: Starting pmlogger ... Nov 07 06:49:56 tbowling.localdomain systemd[1]: Started Performance Metrics Archive Logger.
manually executing this command is not helping either: /usr/libexec/pcp/bin/pmie_daily -V -c /etc/pcp/pmie/control.d/local -x 1
Terry, can you please attach /var/log/pcp/pmlogger/*.{log,prev} That's where the daily cron driven pmlogger_daily logs are written. Looks like something prevented log rotation and compression on (and after) Nov 4th 2018, but we'll need to see the messages. It's possible you've hit the following corner case identified and fixed by Kenj for pcp-4.2.0 : commit ccafcd170d31e0ee56d7c9a3841ee9714457607a Author: Ken McDonell <kenj.au> Date: Sun Oct 14 15:51:46 2018 +1100 pmlogger_daily: change workflow to cull early Change the workflow to move culling of old files earlier in the pipeline. So if there is a problem with log rewriting or merging that is unattended for a log time, we'll eventually cull the offending archives rather than having the rewriting or mering failure block the culling and (in extreme cases) lead to full flesystems Also add qa/686 (new) to check all of this out. Cheers
Created attachment 1503308 [details] pmlogger log files attached a tarball of pmlogger log files. I'm not sure how helpful it will be. I had already deleted everything before Nov 3, and then I had moved Nov 4 & 5 logs to my /home dir to save space. So when it ran this morning they were not there. I moved the Nov 4 & 5 logs back to the proper location and ran the daily cron job. Omitting the -N flag did not seem to have an effect and outputs no verbose messaging. It just immediately returns. [root@tbowling tbowling.localdomain]# /usr/libexec/pcp/bin/pmlogger_daily -VN -c /etc/pcp/pmlogger/control.d/local + date-and-timestamp 2018-11-08 07:20:29 1541679629 # $version=1.1 + export version; version=1.1 # $PCP_COMPRESSAFTER=1 + export PCP_COMPRESSAFTER; PCP_COMPRESSAFTER=1 + cd /var/log/pcp/pmlogger/tbowling.localdomain === daily maintenance of PCP archives for host local: === + get mutex lock pmlogger_daily: [control.d/local:26] Error: no pmlogger instance running for host "local:" ... logging for host "local:" unchanged pmlogger_daily: [control.d/local:26] Warning: skipping pmnewlog because we don't know which pmlogger to stop pmlogger_daily: Warning: no archives found to merge pmlogger_daily: [control.d/local:26] Warning: current volume of current pmlogger not known, compression skipped
Created attachment 1503310 [details] updated pmlogger logs after nov 4 & 5 logs restored update - after restoring the Nov 4 & 5 logs, it looks like the regular cron jobs ran and compressed them. However Nov 6th logs are still not combined and compressed. Here's the updated cron logs. # du -h * |sort -h; du -h 4.0K Latest 8.0K 20181108.07.25.index 32K pmlogger.log 36K pmlogger.log.prior 56K 20181108.00.10.index 128K 20181107.index 176K 20181106.index 180K 20181105.index 184K 20181104.index 464K 20181104.meta.xz 572K 20181105.meta.xz 2.0M 20181108.07.25.meta 11M 20181106.0.xz 11M 20181106.1.xz 11M 20181106.2.xz 11M 20181106.3.xz 11M 20181106.4.xz 11M 20181106.5.xz 11M 20181106.6.xz 12M 20181106.7.xz 14M 20181108.00.10.meta 23M 20181108.00.10.7 33M 20181106.26 37M 20181107.meta 53M 20181105.1.xz 54M 20181106.meta 59M 20181104.1.xz 64M 20181108.07.25.0 101M 20181106.10 101M 20181106.11 101M 20181106.12 101M 20181106.13 101M 20181106.14 101M 20181106.15 101M 20181106.16 101M 20181106.17 101M 20181106.18 101M 20181106.19 101M 20181106.20 101M 20181106.21 101M 20181106.22 101M 20181106.23 101M 20181106.24 101M 20181106.25 101M 20181106.8 101M 20181106.9 101M 20181108.00.10.0 101M 20181108.00.10.1 101M 20181108.00.10.2 101M 20181108.00.10.3 101M 20181108.00.10.4 101M 20181108.00.10.5 101M 20181108.00.10.6 228M 20181105.0.xz 232M 20181104.0.xz 1.7G 20181107.0 5.0G .
pmlogger_daily.log says : === daily maintenance of PCP archives for host local: === pmlogger_daily: [control.d/local:26] Error: no pmlogger instance running for host "local:" ... logging for host "local:" unchanged pmlogger_daily: [control.d/local:26] Warning: skipping pmnewlog because we don't know which pmlogger to stop pmlogger_daily: Warning: no archives found to merge pmlogger_daily: [control.d/local:26] Warning: current volume of current pmlogger not known, compression skipped So compression and culling is being skipped. The fix mentioned in Comment #3 should avoid that problem. But I'm not sure why it can't find the pid for the pmlogger for host "local:" - when that happens, it skips compression. That would also be avoided by the patch in Comment #3. /var/lib/pcp/config/pmlogger is mode 775 pcp/pcp, but strangely, /var/lib/pcp/config/pmlogger/config.default is mode 644 root/root. My system is the same ... this seems wrong and means any subsequent logging config change can't be changed by pmlogconf when driven from a crontab script (they all run as user pcp). THis is backed up by Terry's pmlogger_check.log.prev, which says : cp: cannot create regular file '/var/lib/pcp/config/pmlogger/config.default.tmp': Permission denied pmlogger_check [/etc/pcp/pmlogger/control.d/local:26] Warning: pmlogconf failed to reconfigure "/var/lib/pcp/config/pmlogger/config.default" pmlogconf: Error: existing config file "/var/lib/pcp/config/pmlogger/config.default" is not writeable So there are a few unexplained anomalies that only seem to be affecting Terry! Can you paste your /etc/pcp/pmlogger/control.d/local here please? thanks
Terry, please update your F28 system to pcp-4.2.0-1. It's currently in F28 updates-testing repo, so use something like: # dnf --enablerepo=updates-testing update pcp We are very interested if this resolves the original issue as reported - as previously noted, there are new patches in v4.2 related to this issue. Thanks -- Mark
Upgraded to F29 and latest pcp-4.2 from fedora testing. Still has not compressed yesterday's logs. Is this only me? Is there something wrong with my system or do you see the same on yours? Or is this just the selinux errors? I see some selinux errors during the upgrading of the packages, as well as tons of errors related to the pcp cron tasks after Nov 21 13:41:20 tbowling pmcd[9406]: Starting pmcd ... Nov 21 13:41:20 tbowling pmcd[9406]: Installing dm PMDA ... Nov 21 13:41:20 tbowling audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=pmcd comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' Nov 21 13:41:24 tbowling audit[9892]: AVC avc: denied { search } for pid=9892 comm="ps" name="lightdm" dev="dm-1" ino=51041272 scontext=system_u:system_r:pcp_pmlogger_t:s0 tcontext=system_u:object_r:xserver_log_t:s0 tclass=dir permissive=0 Nov 21 13:41:24 tbowling pmlogger[9862]: Starting pmlogger ... Nov 21 13:41:24 tbowling audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=pmlogger comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' Nov 21 13:41:25 tbowling audit[9527]: AVC avc: denied { write } for pid=9527 comm="Install" name="pmcd.conf" dev="dm-1" ino=17860410 scontext=system_u:system_r:init_t:s0 tcontext=system_u:object_r:etc_t:s0 tclass=file permissive=0 Nov 21 13:41:25 tbowling pmcd[9406]: Installing nfsclient PMDA ... Nov 21 13:41:27 tbowling audit[11618]: AVC avc: denied { write } for pid=11618 comm="Install" name="pmcd.conf" dev="dm-1" ino=17860410 scontext=system_u:system_r:init_t:s0 tcontext=system_u:object_r:etc_t:s0 tclass=file permissive=0 Nov 21 13:41:29 tbowling audit[15624]: AVC avc: denied { search } for pid=15624 comm="ps" name="lightdm" dev="dm-1" ino=51041272 scontext=system_u:system_r:pcp_pmlogger_t:s0 tcontext=system_u:object_r:xserver_log_t:s0 tclass=dir permissive=0 Nov 21 13:41:31 tbowling audit[15877]: AVC avc: denied { search } for pid=15877 comm="ps" name="lightdm" dev="dm-1" ino=51041272 scontext=system_u:system_r:pcp_pmlogger_t:s0 tcontext=system_u:object_r:xserver_log_t:s0 tclass=dir permissive=0 Nov 21 13:41:38 tbowling setroubleshoot[13343]: SELinux is preventing Install from write access on the file /etc/pcp/pmcd/pmcd.conf. For complete SELinux messages run: sealert -l 7319dc75-1176-4d1e-b50c-2e51d8b863f9 Nov 21 13:41:38 tbowling python3[13343]: SELinux is preventing Install from write access on the file /etc/pcp/pmcd/pmcd.conf.#012#012***** Plugin catchall (100. confidence) suggests **************************#012#012If you believe that Install should be allowed write access on the pmcd.conf file by default.#012Then you should report this as a bug.#012You can generate a local policy module to allow this access.#012Do#012allow this access for now by executing:#012# ausearch -c 'Install' --raw | audit2allow -M my-Install#012# semodule -X 300 -i my-Install.pp#012 Nov 21 13:41:38 tbowling setroubleshoot[13343]: SELinux is preventing Install from write access on the file /etc/pcp/pmcd/pmcd.conf. For complete SELinux messages run: sealert -l 7319dc75-1176-4d1e-b50c-2e51d8b863f9 Nov 21 13:41:38 tbowling python3[13343]: SELinux is preventing Install from write access on the file /etc/pcp/pmcd/pmcd.conf.#012#012***** Plugin catchall (100. confidence) suggests **************************#012#012If you believe that Install should be allowed write access on the pmcd.conf file by default.#012Then you should report this as a bug.#012You can generate a local policy module to allow this access.#012Do#012allow this access for now by executing:#012# ausearch -c 'Install' --raw | audit2allow -M my-Install#012# semodule -X 300 -i my-Install.pp#012 Nov 21 13:42:02 tbowling audit[15977]: AVC avc: denied { search } for pid=15977 comm="ps" name="lightdm" dev="dm-1" ino=51041272 scontext=system_u:system_r:pcp_pmie_t:s0 tcontext=system_u:object_r:xserver_log_t:s0 tclass=dir permissive=0 Nov 21 13:42:02 tbowling audit[15997]: AVC avc: denied { search } for pid=15997 comm="ps" name="lightdm" dev="dm-1" ino=51041272 scontext=system_u:system_r:pcp_pmie_t:s0 tcontext=system_u:object_r:xserver_log_t:s0 tclass=dir permissive=0 Nov 21 13:42:02 tbowling pmie[15947]: /usr/share/pcp/lib/pmie: Warning: Performance Co-Pilot Inference Engine (pmie) not permanently enabled. Nov 21 13:42:02 tbowling pmie[15947]: To enable pmie, run the following as root: Nov 21 13:42:02 tbowling pmie[15947]: # /usr/bin/systemctl enable pmie.service Nov 21 13:42:02 tbowling pmie[15947]: Starting pmie ... Nov 21 13:42:02 tbowling audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=pmie comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' Nov 21 13:55:01 tbowling audit[16924]: USER_ACCT pid=16924 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:crond_t:s0-s0:c0.c1023 msg='op=PAM:accounting grantors=pam_access,pam_unix,pam_localuser acct="pcp" exe="/usr/sbin/crond" hostname=? addr=? terminal=cron res=success' Nov 21 13:55:01 tbowling audit[16924]: CRED_ACQ pid=16924 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:crond_t:s0-s0:c0.c1023 msg='op=PAM:setcred grantors=pam_env,pam_unix acct="pcp" exe="/usr/sbin/crond" hostname=? addr=? terminal=cron res=success' Nov 21 13:55:01 tbowling audit[16930]: USER_ACCT pid=16930 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='op=PAM:accounting grantors=pam_unix,pam_localuser acct="pcp" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' Nov 21 13:55:01 tbowling audit[16930]: USER_START pid=16930 uid=0 auid=985 ses=540 subj=system_u:system_r:init_t:s0 msg='op=PAM:session_open grantors=pam_selinux,pam_selinux,pam_loginuid,pam_keyinit,pam_limits,pam_systemd,pam_unix acct="pcp" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' Nov 21 13:55:01 tbowling systemd[1]: Started Session 539 of user pcp. Nov 21 13:55:01 tbowling audit[16924]: USER_START pid=16924 uid=0 auid=985 ses=539 subj=system_u:system_r:crond_t:s0-s0:c0.c1023 msg='op=PAM:session_open grantors=pam_loginuid,pam_keyinit,pam_limits,pam_systemd acct="pcp" exe="/usr/sbin/crond" hostname=? addr=? terminal=cron res=success' Nov 21 13:55:01 tbowling audit[16924]: CRED_REFR pid=16924 uid=0 auid=985 ses=539 subj=system_u:system_r:crond_t:s0-s0:c0.c1023 msg='op=PAM:setcred grantors=pam_env,pam_unix acct="pcp" exe="/usr/sbin/crond" hostname=? addr=? terminal=cron res=success' Nov 21 13:55:02 tbowling audit[16924]: CRED_DISP pid=16924 uid=0 auid=985 ses=539 subj=system_u:system_r:crond_t:s0-s0:c0.c1023 msg='op=PAM:setcred grantors=pam_env,pam_unix acct="pcp" exe="/usr/sbin/crond" hostname=? addr=? terminal=cron res=success' Nov 21 13:55:02 tbowling audit[16924]: USER_END pid=16924 uid=0 auid=985 ses=539 subj=system_u:system_r:crond_t:s0-s0:c0.c1023 msg='op=PAM:session_close grantors=pam_loginuid,pam_keyinit,pam_limits,pam_systemd acct="pcp" exe="/usr/sbin/crond" hostname=? addr=? terminal=cron res=success' Nov 21 13:58:01 tbowling audit[17668]: USER_ACCT pid=17668 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:crond_t:s0-s0:c0.c1023 msg='op=PAM:accounting grantors=pam_access,pam_unix,pam_localuser acct="pcp" exe="/usr/sbin/crond" hostname=? addr=? terminal=cron res=success' Nov 21 13:58:01 tbowling audit[17668]: CRED_ACQ pid=17668 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:crond_t:s0-s0:c0.c1023 msg='op=PAM:setcred grantors=pam_env,pam_unix acct="pcp" exe="/usr/sbin/crond" hostname=? addr=? terminal=cron res=success' Nov 21 13:58:02 tbowling audit[17673]: USER_ACCT pid=17673 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='op=PAM:accounting grantors=pam_unix,pam_localuser acct="pcp" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' Nov 21 13:58:02 tbowling audit[17673]: USER_START pid=17673 uid=0 auid=985 ses=542 subj=system_u:system_r:init_t:s0 msg='op=PAM:session_open grantors=pam_selinux,pam_selinux,pam_loginuid,pam_keyinit,pam_limits,pam_systemd,pam_unix acct="pcp" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' Nov 21 13:58:02 tbowling systemd[1]: Started Session 541 of user pcp. Nov 21 13:58:02 tbowling audit[17668]: USER_START pid=17668 uid=0 auid=985 ses=541 subj=system_u:system_r:crond_t:s0-s0:c0.c1023 msg='op=PAM:session_open grantors=pam_loginuid,pam_keyinit,pam_limits,pam_systemd acct="pcp" exe="/usr/sbin/crond" hostname=? addr=? terminal=cron res=success' Nov 21 13:58:02 tbowling audit[17668]: CRED_REFR pid=17668 uid=0 auid=985 ses=541 subj=system_u:system_r:crond_t:s0-s0:c0.c1023 msg='op=PAM:setcred grantors=pam_env,pam_unix acct="pcp" exe="/usr/sbin/crond" hostname=? addr=? terminal=cron res=success' Nov 21 13:58:02 tbowling audit[17668]: CRED_DISP pid=17668 uid=0 auid=985 ses=541 subj=system_u:system_r:crond_t:s0-s0:c0.c1023 msg='op=PAM:setcred grantors=pam_env,pam_unix acct="pcp" exe="/usr/sbin/crond" hostname=? addr=? terminal=cron res=success' Nov 21 13:58:02 tbowling audit[17668]: USER_END pid=17668 uid=0 auid=985 ses=541 subj=system_u:system_r:crond_t:s0-s0:c0.c1023 msg='op=PAM:session_close grantors=pam_loginuid,pam_keyinit,pam_limits,pam_systemd acct="pcp" exe="/usr/sbin/crond" hostname=? addr=? terminal=cron res=success' Nov 21 14:25:01 tbowling audit[19925]: USER_ACCT pid=19925 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:crond_t:s0-s0:c0.c1023 msg='op=PAM:accounting grantors=pam_access,pam_unix,pam_localuser acct="pcp" exe="/usr/sbin/crond" hostname=? addr=? terminal=cron res=success' Nov 21 14:25:01 tbowling audit[19925]: CRED_ACQ pid=19925 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:crond_t:s0-s0:c0.c1023 msg='op=PAM:setcred grantors=pam_env,pam_unix acct="pcp" exe="/usr/sbin/crond" hostname=? addr=? terminal=cron res=success' Nov 21 14:25:01 tbowling audit[19930]: USER_ACCT pid=19930 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='op=PAM:accounting grantors=pam_unix,pam_localuser acct="pcp" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' Nov 21 14:25:01 tbowling audit[19930]: USER_START pid=19930 uid=0 auid=985 ses=544 subj=system_u:system_r:init_t:s0 msg='op=PAM:session_open grantors=pam_selinux,pam_selinux,pam_loginuid,pam_keyinit,pam_limits,pam_systemd,pam_unix acct="pcp" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' Nov 21 14:25:01 tbowling systemd[1]: Started Session 543 of user pcp. Nov 21 14:25:01 tbowling audit[19925]: USER_START pid=19925 uid=0 auid=985 ses=543 subj=system_u:system_r:crond_t:s0-s0:c0.c1023 msg='op=PAM:session_open grantors=pam_loginuid,pam_keyinit,pam_limits,pam_systemd acct="pcp" exe="/usr/sbin/crond" hostname=? addr=? terminal=cron res=success' Nov 21 14:25:01 tbowling audit[19925]: CRED_REFR pid=19925 uid=0 auid=985 ses=543 subj=system_u:system_r:crond_t:s0-s0:c0.c1023 msg='op=PAM:setcred grantors=pam_env,pam_unix acct="pcp" exe="/usr/sbin/crond" hostname=? addr=? terminal=cron res=success' pcp-4.2.0-1.fc29.x86_64 [root@tbowling tbowling.localdomain]# pwd /var/log/pcp/pmlogger/tbowling.localdomain [root@tbowling tbowling.localdomain]# du -h *|sort -h; du -h 4.0K Latest 36K pmlogger.log 36K pmlogger.log.prior 84K 20181121.index 128K 20181123.00.10.index 216K 20181122.index 22M 20181123.00.10.meta 29M 20181121.meta 56M 20181122.meta 60M 20181122.26 68M 20181123.00.10.9 101M 20181122.0 101M 20181122.1 101M 20181122.10 101M 20181122.11 101M 20181122.12 101M 20181122.13 101M 20181122.14 101M 20181122.15 101M 20181122.16 101M 20181122.17 101M 20181122.18 101M 20181122.19 101M 20181122.2 101M 20181122.20 101M 20181122.21 101M 20181122.22 101M 20181122.23 101M 20181122.24 101M 20181122.25 101M 20181122.3 101M 20181122.4 101M 20181122.5 101M 20181122.6 101M 20181122.7 101M 20181122.8 101M 20181122.9 101M 20181123.00.10.0 101M 20181123.00.10.1 101M 20181123.00.10.2 101M 20181123.00.10.3 101M 20181123.00.10.4 101M 20181123.00.10.5 101M 20181123.00.10.6 101M 20181123.00.10.7 101M 20181123.00.10.8 1.2G 20181121.0 4.8G .
I set selinux to permissive, reinstalled all of the pcp packages, and restarted the services. This results in far fewer errors and looks cleaner. I have pmlogger set to keep 1 day of logs. I still have a 1.2 GB file from the 21st laying around. Nov 22 has 2.6 GB of logs not compressed. I understand it should not compress until tomorrow, but still concerning that 1 day's logs are so large. Many customers might find this concerning as I assume at the end of today before they get compressed, Nov 22 + 23 will consume >5G of space for normal operations. This could be a problem in the cloud and large virt environments where space is a premium.
Terry, if things are working properly, the logs should be compressed within the same day. Well before the 1Gb+ sizes you're seeing. % date Fri Nov 23 10:16:38 EST 2018 % ls -lh /var/log/pcp/pmlogger/`hostname`/20181123* -rw-r--r--. 1 pcp pcp 70K Nov 23 09:31 /var/log/pcp/pmlogger/toium/20181123.09.30.0.xz -rw-r--r--. 1 pcp pcp 412 Nov 23 09:31 /var/log/pcp/pmlogger/toium/20181123.09.30.index -rw-r--r--. 1 pcp pcp 51K Nov 23 09:31 /var/log/pcp/pmlogger/toium/20181123.09.30.meta.xz -rw-r--r--. 1 pcp pcp 57M Nov 23 10:16 /var/log/pcp/pmlogger/toium/20181123.09.31.0 -rw-r--r--. 1 pcp pcp 5.7K Nov 23 10:16 /var/log/pcp/pmlogger/toium/20181123.09.31.index -rw-r--r--. 1 pcp pcp 1.7M Nov 23 10:16 /var/log/pcp/pmlogger/toium/20181123.09.31.meta
Hi Terry, I think the only way we'll figure this one out is to grab an extensive tarball of all the relevant directories on your system and then unpack it on an f28 dev machine to repro it. Can you give me a temp login to grab that tarball? Please give it whatever prio you can - we obviously need the log management scripts and compression/culling to be bullet-proof! Cheers -- Mark
Mark, As we discussed, I have since upgraded to F29. As you requested, I executed the following command: cd /; sudo tar czf /tmp/pcp-tarball.tgz etc/{cron.d,pcp,hosts,hostname,sysconfig} usr/lib/systemd/system/pm* var/lib/pcp usr/libexec/pcp usr/share/pcp var/lib/pcp var/log The resulting file pcp-tarball.tgz has been uploaded to guest.redhat.com and I have verified that md5sums match.
Thanks Terry - I've examined the tarball, with the following notes: 1. errors in some of the pmlogger daily/check cronjob logs: [mgoodwin@pcp pmlogger]$ cat pmlogger_daily.log.prev pmlogextract: Error: __pmLogRead[log 20181108.00.10]: Corrupted record in a PCP archive log pmlogextract: Error occurred at byte offset 0 into a file of 23552000 bytes. The last record, and the remainder of this file will not be extracted. [mgoodwin@pcp pmlogger]$ cat pmlogger_daily.log pmlogextract: New log volume 1, at 22:07:06.924 [mgoodwin@pcp pmlogger]$ cat pmlogger_check.log.prev Restarting primary pmlogger for host "local:" ... [process 8584] done Latest folio created for 20181126.07.01 cat: write error: Broken pipe The first one, "Corrupted record" can cause issues with the script completing it's log merging. This is probably consequential to running out of filesystem space on /var on or before Nov 8th. The second one, "cat: write error: Broken pipe" has been seen recently on other systems (including mine and nathan's!). Not sure if it's benign, but needs further investigation. 2. Other than the above, the only non-standard config changes in your PCP set up are : $PCP_COMPRESSAFTER=1 in /etc/pcp/pmlogger/control.d/local (this overrides the default) and /etc/sysconfig/pmlogger has PMLOGGER_LOCAL=1 enabled (i.e. comment removed) I thought this was supposed to be enabled anyway, but a virgin install on an f29 VM with pcp-4.2 still had it commented out. 3. I have made both changes listed in (2) on my f29 VM / pcp-4.2 and will let it soak for a few days Regards
Created attachment 1511187 [details] patch for whole day semantics for $PCP_COMPRESSAFTER The issue has been reproduced on the test VM after several days soak. The root cause appears to be the semantics of the find -mtime option, where the default semantics are N * 24hours before the current time. The desirable semantics are 24 hours before the start of the current day, i.e. "number of days") before archive files are compressed. The fix under test is to use find -daystart -mtime N where N is the desired number of days to hold off compressing PCP archive logs managed by the PCP log rotation scripts (pmlogger_daily et al). Since not all platforms support find -daystart, we test for it early in the script and only use it if it's available. For platforms that do not support this option (known to be BSD derivatives), we are already handling the find options specially anyway, so there will be no change on those platforms. For QA, all tests in groups 'sanity', 'pmlogger' and 'logutil' are passing with the change applied. Applicable new QA involves at least a week of soak testing on assorted platforms - starting with the test f29 VM.
Created attachment 1511470 [details] patch (v2) for whole day semantics for $PCP_COMPRESSAFTER Updated patch, quoting wasn't quite right.
Created attachment 1522563 [details] reduce per-process logging filesystem space requirements New patch, as committed upstream for pcp-4.3.0. This obsoletes the "whole day patch", which was faulty.
pcp-4.3.2-1.fc30 has been submitted as an update to Fedora 30. https://bodhi.fedoraproject.org/updates/FEDORA-2019-89ed2e2383
pcp-4.3.2-1.fc29 has been submitted as an update to Fedora 29. https://bodhi.fedoraproject.org/updates/FEDORA-2019-30c683f0ae
pcp-4.3.2-1.fc28 has been submitted as an update to Fedora 28. https://bodhi.fedoraproject.org/updates/FEDORA-2019-2f04e77d53
pcp-4.3.2-1.fc30 has been pushed to the Fedora 30 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-89ed2e2383
pcp-4.3.2-1.fc28 has been pushed to the Fedora 28 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-2f04e77d53
pcp-4.3.2-1.fc29 has been pushed to the Fedora 29 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-30c683f0ae
This message is a reminder that Fedora 28 is nearing its end of life. On 2019-May-28 Fedora will stop maintaining and issuing updates for Fedora 28. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '28'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version. Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 28 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged change the 'version' to a later Fedora version prior this bug is closed as described in the policy above. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
pcp-4.3.2-1.fc30 has been pushed to the Fedora 30 stable repository. If problems still persist, please make note of it in this bug report.
pcp-4.3.2-1.fc28 has been pushed to the Fedora 28 stable repository. If problems still persist, please make note of it in this bug report.
pcp-4.3.2-1.fc29 has been pushed to the Fedora 29 stable repository. If problems still persist, please make note of it in this bug report.