.PCP now reports all process details on large systems
Previously, the Performance Co-Pilot (PCP) toolkit failed to report certain process details in some cases on very large systems. The code reading the process details files was changed so that it can read data of arbitrary length, instead of only the first 1024 bytes. As a result, the described PCP error can no longer happen.
Thanks for the report Kyle, it is indeed a bug - the buffer is only 1K. The simplistic patch would just increase that to 8K (or so), but a looping read would be better. Related code should also be audited, which I'll undertake too.
Regards
(In reply to Mark Goodwin from comment #2)
> Thanks for the report Kyle, it is indeed a bug - the buffer is only 1K. The
> simplistic patch would just increase that to 8K (or so), but a looping read
> would be better.
I agree. Based on the current implementation, the looping read doesn't look like the simplest change, but it is much more effective than increasing the buffer.
Thanks for picking it up so quickly!
- Kyle Walker
upstream fix posted (still need a QA test). See:
https://github.com/goodwinos/pcp/commit/73ce19502de10cad6c8a10b2cc7d787bb4717575
commit 73ce19502de10cad6c8a10b2cc7d787bb4717575
Author: Mark Goodwin <mgoodwin>
Date: Fri Jul 20 16:24:45 2018 +1000
pmdaproc: rework fetch_proc_pid_stat to use a looping read
RHBZ #1600262 https://bugzilla.redhat.com/show_bug.cgi?id=1600262
fetch_proc_pid_stat() was reading /proc/PID/status with a fixed
size 1024 byte buffer. This is no longer big enough on most
machines, causing some metrics to have incorrect values; notably
the Mems_allowed, Mems_allowed_list, voluntary_ctxt_switches and
nonvoluntary_ctxt_switches fields were not being read or parsed.
Rework this to use a looping read with a suitably sized malloc'd
buffer. QA test forthcoming.
modified: src/pmdas/linux_proc/proc_pid.c
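For readers unfamiliar with the technique named in the commit message above, here is a minimal sketch of a looping read into a growing malloc'd buffer. It is not the code from proc_pid.c; the function name read_whole_file, the doubling strategy, and the 1024-byte starting size are all assumptions made for illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper (NOT the pmdaproc implementation): read an entire
 * file into a malloc'd, NUL-terminated buffer, growing it as needed.
 * Returns the number of bytes read, or a negative errno value on failure.
 * On success the caller owns *bufp and must free() it.
 */
ssize_t
read_whole_file(const char *path, char **bufp)
{
    size_t  size = 1024;    /* initial guess; doubled whenever it fills */
    size_t  len = 0;
    ssize_t n;
    char    *buf, *tmp;
    int     fd, sts;

    *bufp = NULL;
    if ((fd = open(path, O_RDONLY)) < 0)
        return -errno;
    if ((buf = malloc(size)) == NULL) {
        close(fd);
        return -ENOMEM;
    }
    for (;;) {
        /* always keep one byte spare for the terminating NUL */
        n = read(fd, buf + len, size - len - 1);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted: retry the read */
            sts = -errno;
            free(buf);
            close(fd);
            return sts;
        }
        if (n == 0)
            break;              /* end of file */
        len += (size_t)n;
        if (len == size - 1) {
            /* buffer full: double it and keep reading */
            size *= 2;
            if ((tmp = realloc(buf, size)) == NULL) {
                free(buf);
                close(fd);
                return -ENOMEM;
            }
            buf = tmp;
        }
    }
    close(fd);
    buf[len] = '\0';
    *bufp = buf;
    return (ssize_t)len;
}
```

A caller would pass a path such as /proc/PID/status and parse the returned buffer, so fields near the end of the file are no longer cut off at a fixed size.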
Additional fixes and QA tests for this BZ that will be in pcp-4.1.1
commit a3f27c07ea7f480c16070ada22835375d7c311a0
Author: Mark Goodwin <mgoodwin>
Date: Sun Jul 29 16:22:52 2018 +1000
pmdaproc: tweak proc.memory.maps to return empty string for kernel workers
Recent changes broke qa/860 by returning -ENODATA when a process
has zero length /proc/PID/maps. Whilst this is technically correct,
it's not historically correct and we need to clear the error and
instead return a zero length string (as pmdaproc has always done).
commit a5a9195e2c16dd042e547001d67ab47c5b50b3ad
Author: Mark Goodwin <mgoodwin>
Date: Tue Jul 24 13:41:21 2018 +1000
pmdaproc: additional proc_pid rework to fix short buffer issues
RHBZ #1600262
Fix short buffer/read issues with some additional proc metrics.
proc.psinfo.environ (/proc/PID/environ), proc.memory.maps
(/proc/PID/maps), proc.psinfo.nvctxsw and proc.psinfo.vctxsw
(/proc/PID/status) and a few others.
modified: src/pmdas/linux_proc/proc_pid.c
commit 12304c9d71711951b763bbdfd03a4834e0d03114
Author: Mark Goodwin <mgoodwin>
Date: Sun Jul 29 16:01:03 2018 +1000
qa: remake qa/1350 to check voluntary_ctxt_switches
nonvoluntary_ctxt_switches in /proc/PID/status can sometimes be zero
commit 0b1fb9c7d8e04173e1036a2186b96393727223e3
Author: Mark Goodwin <mgoodwin>
Date: Sun Jul 29 12:11:04 2018 +1000
qa: add test 1350 to test pmdaproc reading /proc/PID/status
RHBZ #1600262 https://bugzilla.redhat.com/show_bug.cgi?id=1600262
new file: qa/1350
new file: qa/1350.out
modified: qa/group
Comment 6 Fedora Update System
2018-08-04 00:21:52 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory, and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2019:2111
Description of problem:
The pmdaproc collector agent only reads the first 1024 bytes of the /proc/*/status files. On exceptionally large systems, this results in metrics at the end of the file being intermittently unavailable. It can also result in metrics being unavailable on smaller systems. For example:

    # diff -yW70 <(cat /proc/1/status) <(head -c 1024 /proc/1/status)
    Name: systemd                    Name: systemd
    Umask: 0000                      Umask: 0000
    State: S (sleeping)              State: S (sleeping)
    Tgid: 1                          Tgid: 1
    Ngid: 0                          Ngid: 0
    Pid: 1                           Pid: 1
    PPid: 0                          PPid: 0
    TracerPid: 0                     TracerPid: 0
    Uid: 0 0 0                       Uid: 0 0 0
    Gid: 0 0 0                       Gid: 0 0 0
    FDSize: 128                      FDSize: 128
    Groups:                          Groups:
    VmPeak: 193568 kB                VmPeak: 193568 kB
    VmSize: 128032 kB                VmSize: 128032 kB
    VmLck: 0 kB                      VmLck: 0 kB
    VmPin: 0 kB                      VmPin: 0 kB
    VmHWM: 6624 kB                   VmHWM: 6624 kB
    VmRSS: 6308 kB                   VmRSS: 6308 kB
    RssAnon: 2548 kB                 RssAnon: 2548 kB
    RssFile: 3760 kB                 RssFile: 3760 kB
    RssShmem: 0 kB                   RssShmem: 0 kB
    VmData: 84220 kB                 VmData: 84220 kB
    VmStk: 132 kB                    VmStk: 132 kB
    VmExe: 1408 kB                   VmExe: 1408 kB
    VmLib: 3716 kB                   VmLib: 3716 kB
    VmPTE: 124 kB                    VmPTE: 124 kB
    VmSwap: 0 kB                     VmSwap: 0 kB
    Threads: 1                       Threads: 1
    SigQ: 0/3860                   | SigQ: 1/3860
    SigPnd: 0000000000000000         SigPnd: 0000000000000000
    ShdPnd: 0000000000000000         ShdPnd: 0000000000000000
    SigBlk: 7be3c0fe28014a03         SigBlk: 7be3c0fe28014a03
    SigIgn: 0000000000001000         SigIgn: 0000000000001000
    SigCgt: 00000001800004ec         SigCgt: 00000001800004ec
    CapInh: 0000000000000000         CapInh: 0000000000000000
    CapPrm: 0000001fffffffff         CapPrm: 0000001fffffffff
    CapEff: 0000001fffffffff         CapEff: 0000001fffffffff
    CapBnd: 0000001fffffffff         CapBnd: 0000001fffffffff
    CapAmb: 0000000000000000         CapAmb: 0000000000000000
    Seccomp: 0                       Seccomp: 0
    Speculation_Store_Bypass:        Speculation_Store_Bypass:
    Cpus_allowed: 3                  Cpus_allowed: 3
    Cpus_allowed_list: 0-1           Cpus_allowed_list: 0-1
    Mems_allowed: 00000000,00000   / Mems_allowed: 00000000,00000
    Mems_allowed_list: 0           <
    voluntary_ctxt_switches:       <
    nonvoluntary_ctxt_switches:    <

In the above, the proc.psinfo.{nvctxsw,vctxsw} entries will be 0.

Version-Release number of selected component (if applicable):
pcp-3.12.2-5.el7.x86_64

How reproducible:
Easily

Steps to Reproduce:
1. Install the pcp-zeroconf package
    # yum install pcp-zeroconf -y
2. Instantiate a hotproc configuration to match "systemd"
    # pmstore hotproc.control.config 'fname == "systemd"'
3. Verify the contents of the proc.psinfo.nvctxsw entries against the underlying file contents:
    # pminfo -f hotproc.psinfo.nvctxsw

Actual results:
    # tail -1 /proc/1/status
    nonvoluntary_ctxt_switches: 2348
    # pminfo -f hotproc.psinfo.nvctxsw
    hotproc.psinfo.nvctxsw
        inst [1 or "000001 /usr/lib/systemd/systemd"] value 0

Expected results:
    # tail -1 /proc/1/status
    nonvoluntary_ctxt_switches: 2348
    # pminfo -f hotproc.psinfo.nvctxsw
    hotproc.psinfo.nvctxsw
        inst [1 or "000001 /usr/lib/systemd/systemd"] value 2348

Additional info:
Though the above example is the simplest to reproduce, the issue can also present itself as PM_ERR_APPVERSION for string metrics such as {,hot}proc.psinfo.cpusallowed if the system is large enough that the Cpus_allowed_list field lies beyond the 1024-byte limit. The issue seems to be present upstream.
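To illustrate why a truncated read can lead to missing values, here is a minimal sketch of a field lookup over /proc/PID/status text. It is not the pmdaproc parser; the function name status_field_ull and its return convention are assumptions for illustration. When the buffer ends before the requested field, the lookup simply fails, and a caller that leaves its value at the initial 0 would report 0, which is consistent with the actual results shown above.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical parser (NOT the pmdaproc implementation): locate a
 * "name:<whitespace>value" line in /proc/PID/status text and convert
 * the value to an unsigned integer.
 * Returns 0 and stores the value in *valp on success, or -1 if the
 * field is absent or has no numeric value (e.g. the buffer was
 * truncated before or within that line).
 */
int
status_field_ull(const char *text, const char *name, unsigned long long *valp)
{
    size_t      namelen = strlen(name);
    const char  *line = text;

    while (line != NULL && *line != '\0') {
        /* match only at the start of a line, followed by a colon */
        if (strncmp(line, name, namelen) == 0 && line[namelen] == ':') {
            const char          *p = line + namelen + 1;
            char                *end;
            unsigned long long  v = strtoull(p, &end, 10);

            if (end == p)
                return -1;      /* no digits after the colon */
            *valp = v;
            return 0;
        }
        /* advance to the character after the next newline, if any */
        line = strchr(line, '\n');
        if (line != NULL)
            line++;
    }
    return -1;
}
```

Combined with a read that returns the whole file, such a lookup would find nonvoluntary_ctxt_switches even when it sits beyond the first 1024 bytes.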