Bug 1832262
Summary: | pmie_daily.service runs as pcp but tries to do root things | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 8 | Reporter: | Uwe Menges <uwe.menges> |
Component: | pcp | Assignee: | Nathan Scott <nathans> |
Status: | CLOSED ERRATA | QA Contact: | Jan Kurik <jkurik> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | ||
Version: | 8.1 | CC: | agerstmayr, an.andreas+fedora, ariaanbruinsma, frank.ramsay, jkurik, kenj, mgoodwin, nathans, patrickm, rja |
Target Milestone: | rc | Keywords: | Bugfix, Triaged |
Target Release: | 8.1 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | pcp-5.1.1 | Doc Type: | No Doc Update |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2020-11-04 03:00:37 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Uwe Menges
2020-05-06 12:20:31 UTC
Thanks Uwe - I can see the problem, but don't understand why noone else (myself included) has ever observed this issue before! (makes me wonder if something has changed in systemd in this area, for this specific release). Anyway... The pmie_daily script changes to `pwd` in its inner loop and is indeed expecting to have been started with that set to somewhere readable for the user its run as (pcp). That would typically be: $ grep pcp: /etc/passwd pcp:x:386:386:Performance Co-Pilot:/var/lib/pcp:/sbin/nologin ... but clearly it has /root as `pwd`, so its in a half-baked state wrt running as pcp user. I see there is a way to direct systemd unit files to force the working directory, perhaps that's our best option here. I've pushed the fix below to upstream PCP and we'll ensure its in the next RHEL-8 release Uwe. In the meantime you should be able to workaround the issue locally by adding WorkingDirectory= lines to your local pmie_daily and pmie_check systemd unit files (/usr/lib/systemd/system/pmie_check.service and /usr/lib/systemd/system/pmie_daily.service) and then running 'systemctl daemon-reload'. Since its not failing for me, it would be excellent if you could confirm this resolves the problem as we expect. commit 9cc8e9ba741249d1d53813669ded2c247063d762 Author: Nathan Scott <nathans> Date: Thu May 7 12:54:32 2020 +1000 pmie,pmlogger_check,daily: set systemd working directory We've had a report of failure in pmie_daily related to attempting to set cwd to an unreadable directory (/root). Although not able to reproduce it locally, I believe the problem can be resolved by forcing systemd to set the working directory to pcp user accounts homedir (/var/lib/pcp) which is always readable by pcp. Resolves Red Hat BZ #1832262. Thanks! I edited both services with `systemctl edit`. Note that pmie_check.service wasn't listed in `systemctl --failed`. I no longer have the `cd` error message in /var/log/pcp/pmie/pmie_daily.log. However, the pmie_daily.service still fails and the log contains this: # cat /var/log/pcp/pmie/pmie_daily.log pmie_daily: Error: no pmie instance running for host "local:" [/etc/pcp/pmie/control.d/local:5] ... inference engine for host "local:" unchanged # systemctl status pmie_daily.service ● pmie_daily.service - Process PMIE logs Loaded: loaded (/usr/lib/systemd/system/pmie_daily.service; disabled; vendor preset: disabled) Drop-In: /etc/systemd/system/pmie_daily.service.d └─override.conf Active: failed (Result: exit-code) since Thu 2020-05-07 11:28:35 CEST; 57s ago Docs: man:pmie_daily(1) Process: 926361 ExecStart=/usr/libexec/pcp/bin/pmie_daily $PMIE_DAILY_PARAMS (code=exited, status=1/FAILURE) Main PID: 926361 (code=exited, status=1/FAILURE) May 07 11:28:35 (edited) systemd[1]: Starting Process PMIE logs... May 07 11:28:35 (edited) systemd[1]: pmie_daily.service: Main process exited, code=exited, status=1/FAILURE May 07 11:28:35 (edited) systemd[1]: pmie_daily.service: Failed with result 'exit-code'. May 07 11:28:35 (edited) systemd[1]: Failed to start Process PMIE logs. # systemctl cat pmie_daily.service # /usr/lib/systemd/system/pmie_daily.service [Unit] Description=Process PMIE logs Documentation=man:pmie_daily(1) ConditionPathExists=!/etc/cron.d/pcp-pmie [Service] Type=oneshot Environment="PMIE_DAILY_PARAMS=-X xz -x 3" EnvironmentFile=-/etc/sysconfig/pmie_timers ExecStart=/usr/libexec/pcp/bin/pmie_daily $PMIE_DAILY_PARAMS User=pcp [Install] RequiredBy=pmie.service # /etc/systemd/system/pmie_daily.service.d/override.conf [Service] WorkingDirectory=/var/lib/pcp This is saying that pmie service is enabled but the local pmie was noticed to be not running at daily log processing time. What's the output from 'systemctl status pmie pmie_check'? - these are the services that should be ensuring this pmie is running at the time daily log processing starts. I have good news. First I'll answer your question: # systemctl status pmie pmie_check ● pmie.service - Performance Metrics Inference Engine Loaded: loaded (/usr/lib/systemd/system/pmie.service; enabled; vendor preset: disabled) Active: active (running) since Thu 2020-05-07 12:19:15 CEST; 24h ago Docs: man:pmie(1) Process: 26304 ExecStart=/usr/share/pcp/lib/pmie start (code=exited, status=0/SUCCESS) Main PID: 27216 (pmie) Tasks: 1 (limit: 3355442) CGroup: /system.slice/pmie.service └─27216 /usr/bin/pmie -b -P -l /var/log/pcp/pmie/(edited)/pmie.log -c config.default May 07 12:19:14 (edited) systemd[1]: Starting Performance Metrics Inference Engine... May 07 12:19:15 (edited) pmie[26304]: Starting pmie ... May 07 12:19:15 (edited) systemd[1]: pmie.service: Can't open PID file /run/pcp/pmie.pid (yet?) after start: No such file or directory May 07 12:19:15 (edited) systemd[1]: pmie.service: Supervising process 27216 which is not our child. We'll most likely not notice when it exits. May 07 12:19:15 (edited) systemd[1]: Started Performance Metrics Inference Engine. ● pmie_check.service - Check PMIE instances are running Loaded: loaded (/usr/lib/systemd/system/pmie_check.service; static; vendor preset: disabled) Drop-In: /etc/systemd/system/pmie_check.service.d └─override.conf Active: inactive (dead) since Fri 2020-05-08 12:58:00 CEST; 1min 57s ago Docs: man:pmie_check(1) Process: 472455 ExecStart=/usr/libexec/pcp/bin/pmie_check $PMIE_CHECK_PARAMS (code=exited, status=0/SUCCESS) Main PID: 472455 (code=exited, status=0/SUCCESS) May 08 12:58:00 (edited) systemd[1]: Starting Check PMIE instances are running... May 08 12:58:00 (edited) systemd[1]: Started Check PMIE instances are running. Also the pmie_daily.service didn't fail when called by timer tonight: # systemctl status pmie_daily.timer pmie_daily.service ● pmie_daily.timer - Daily processing of PMIE logs Loaded: loaded (/usr/lib/systemd/system/pmie_daily.timer; disabled; vendor preset: disabled) Active: active (waiting) since Thu 2020-05-07 12:19:05 CEST; 24h ago Trigger: Sat 2020-05-09 00:08:00 CEST; 11h left May 07 12:19:05 (edited) systemd[1]: Started Daily processing of PMIE logs. ● pmie_daily.service - Process PMIE logs Loaded: loaded (/usr/lib/systemd/system/pmie_daily.service; disabled; vendor preset: disabled) Drop-In: /etc/systemd/system/pmie_daily.service.d └─override.conf Active: inactive (dead) since Fri 2020-05-08 00:08:00 CEST; 12h ago Docs: man:pmie_daily(1) Process: 263279 ExecStart=/usr/libexec/pcp/bin/pmie_daily $PMIE_DAILY_PARAMS (code=exited, status=0/SUCCESS) Main PID: 263279 (code=exited, status=0/SUCCESS) May 08 00:08:00 (edited) systemd[1]: Starting Process PMIE logs... May 08 00:08:00 (edited) systemd[1]: Started Process PMIE logs. The log `/var/log/pcp/pmie/pmie_daily.log` doesn't exist anymore, and yesterday's log has been rotated to `.prev`. So I think the change works! I don't know why it didn't work when (re)starting it manually yesterday. Yours, Uwe Verifid on pcp-5.1.1-3.el8 build. Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (pcp bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4684 I seem to have the same issue in Fedora 38 Server. After I enabled it in Cockpit in: Overview -> Usage ("View metrics and history") -> Enable Metrics Haven't found a way to disable it though. (remove service through one click) (In reply to Ariaan from comment #12) > I seem to have the same issue in Fedora 38 Server. > > After I enabled it in Cockpit in: > > Overview -> Usage ("View metrics and history") -> Enable Metrics > > Haven't found a way to disable it though. (remove service through one click) I encountered the same problem. It seems to be related to the log rotation and one file in /var/log/pcp/pmie/YOURPCNAME/ ending up as being owned by root instead of pcp. After changing the owner accordingly, the service runs fine until at some point a new file is generated incorrectly again (after a server restart?). Hmm, thanks - can you say which specifically file it was in that directory, owned as root? (In reply to Nathan Scott from comment #14) > Hmm, thanks - can you say which specifically file it was in that directory, > owned as root? Sure, today it was: pmie.log.20230607 Previously, on May 31 to be exact, I even had the last two rotated log files owned by root: -rw-r--r--. 1 pcp pcp 100 31. Mai 11:13 pmie.log -rw-r--r--. 1 pcp pcp 158 30. Mai 00:08 pmie.log.20230529 -rw-r--r--. 1 root root 380 30. Mai 14:48 pmie.log.20230530 -rw-r--r--. 1 root root 194 31. Mai 11:13 pmie.log.20230531 Thanks Andreas, I'm auditing the code and discussing with upstream developers to see if we can find where/how this is happening. This is really a new issue, but I'll continue here as the discussion was started here. The pmie.log.YYYYMMDD renaming of pmie's log files was introduced via commit https://github.com/performancecopilot/pcp/commit/d85efb855b94615eebed7accf4c68d696ba1d25e on 21 Mar 2023 and this introduced a new bug as described above in https://bugzilla.redhat.com/show_bug.cgi?id=1832262#c12 What's being observed here is a corner-case when pmie_check is run (usually from systemd) and pmie.log exists, but there is not pmie.log.YYYYMMDD file for TODAY. In other circumstances pmie_check does the "right" thing, and the related pmie_daily script is also doing the "right" thing. The problem has been fixed upstream in commit https://github.com/performancecopilot/pcp/commit/5cab8aa53e204f3c2b67491b97851b0499816463 this morning. |