Version-Release number of selected component:
cockpit-pcp-0.60-1.fc22

Additional info:
reporter:         libreport-2.6.0
backtrace_rating: 4
cmdline:          /usr/libexec/cockpit-pcp
crash_function:   __pmFindProfile
executable:       /usr/libexec/cockpit-pcp
global_pid:       12215
kernel:           4.0.5-300.fc22.x86_64
runlevel:         N 5
type:             CCpp
uid:              0

Truncated backtrace:
Thread no. 1 (10 frames)
 #0 __pmFindProfile at profile.c:144
 #1 __pmInProfile at profile.c:163
 #2 __pmdaNextInst at callback.c:148
 #3 pmdaFetch at callback.c:514
 #4 linux_fetch at pmda.c:5714
 #5 __pmFetchLocal at fetchlocal.c:131
 #6 pmFetch at fetch.c:147
 #7 cockpit_pcp_metrics_tick at src/bridge/cockpitpcpmetrics.c:349
 #8 on_timeout_tick at src/bridge/cockpitmetrics.c:178
 #13 g_main_context_iteration at gmain.c:3869
Created attachment 1043431 [details] File: backtrace
Created attachment 1043432 [details] File: cgroup
Created attachment 1043433 [details] File: core_backtrace
Created attachment 1043434 [details] File: dso_list
Created attachment 1043435 [details] File: environ
Created attachment 1043436 [details] File: limits
Created attachment 1043437 [details] File: maps
Created attachment 1043438 [details] File: mountinfo
Created attachment 1043439 [details] File: namespaces
Created attachment 1043440 [details] File: open_fds
Created attachment 1043441 [details] File: proc_pid_status
Created attachment 1043442 [details] File: var_log_messages
Fedora 22 changed to end-of-life (EOL) status on 2016-07-19. Fedora 22 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug. Thank you for reporting this bug and we are sorry it could not be fixed.
Created attachment 1263283 [details]
Fedora 25 backtrace

We still see this crash during cockpit tests on Fedora 25. I attach a current backtrace, but it looks almost the same as the original one.

Relevant package versions:

pcp-libs-3.11.8-2.fc25.x86_64
pcp-selinux-3.11.8-2.fc25.x86_64
pcp-3.11.8-2.fc25.x86_64
pcp-conf-3.11.8-2.fc25.x86_64
cockpit-pcp-134-1.fc25.x86_64
pcp-debuginfo-3.11.8-2.fc25.x86_64
It crashes in this loop:

    __pmInDomProfile *
    __pmFindProfile(pmInDom indom, const __pmProfile *prof)
    {
        __pmInDomProfile *p, *p_end;

        if (prof != NULL && prof->profile_len > 0)
            /* search for the profile entry for this instance domain */
            for (p=prof->profile, p_end=p+prof->profile_len; p < p_end; p++) {
                if (p->indom == indom)
                    /* found : an entry for this instance domain already exists */
                    return p;
            }

        /* not found */
        return NULL;
    }

"prof" looks valid, but it appears that `profile_len` has some bogus value, so the iteration runs far beyond the real array:

    (gdb) p prof
    $16 = (const __pmProfile *) 0x5604723756e0
    (gdb) p *prof
    $17 = {state = 1916235728, profile_len = 22020, profile = 0x560472332a60}
    (gdb) p p
    $18 = (__pmInDomProfile *) 0x560472398010
    (gdb) p *p
    Cannot access memory at address 0x560472398010
    (gdb) p sizeof(__pmProfile)
    $19 = 16
    (gdb) p (p - prof->profile)
    $20 = 17298

I don't know the pcp code and thus I'm not sure how to interpret `profile_len`, but usually it's either the number of entries (and then 22020 is implausibly high) or it's the total array size. Each entry has 16 bytes, but 22020/16 == 1376.25. So it looks like some housekeeping error on the length?

Can you please reopen this bug? I don't immediately see how this could be influenced from the outside by cockpit-pcp, but we don't currently have a standalone reproducer at hand.
This message is a reminder that Fedora 25 is nearing its end of life. Approximately 4 (four) weeks from now Fedora will stop maintaining and issuing updates for Fedora 25. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '25'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version. Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 25 reached end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed as described in the policy above. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
Still happening on Fedora 26, moving version.
Reassigning to PCP. Version 4.x fixed a bunch of crashes, maybe it got fixed by now.
| We still see this crash during cockpit tests on Fedora 25.

Martin, can you describe what the test is doing when this happens? From what I can tell, it looks like Cockpit is extracting live stats using the "local context" mode of operation in libpcp. Do you know which metrics would be fetched? Does it happen immediately, or only after some time? Do you know if the Cockpit code has loaded multiple DSO (local context) PMDAs here, or does it only use pmda_linux.so in this mode?

And does anyone have a core dump I can examine in more detail? Thanks.

I don't think this issue has been reported by others or observed by PCP maintainers, so at this stage I suspect pcp v4 will not resolve it unfortunately.
> Martin, can you describe what the test is doing when this happens?

Nothing special, actually. This happens randomly, spread out across tests that cover e.g. IPA (https://fedorapeople.org/groups/cockpit/logs/pull-8769-20180307-130707-dbbabf0f-verify-ubuntu-stable/log.html#35) or SSL connection settings (https://fedorapeople.org/groups/cockpit/logs/pull-8726-20180307-115757-31aa0ed6-verify-rhel-7/log.html#103). In all of those it doesn't actually fail the tests, as none of them cover PCP; the crashes only get spotted because after each test we check the journal for unexpected messages.

As Cockpit's front page shows the host's performance metrics for CPU, memory, IO, and network, PCP is more or less involved in the cockpit startup of every test. But I'm afraid there's no particular action that triggers it; it just happens randomly.

I'm following the development of bug 1550995, which most probably is closely related.
Reassigning, as discussed in BZ 1550995
This message is a reminder that Fedora 26 is nearing its end of life. Approximately 4 (four) weeks from now Fedora will stop maintaining and issuing updates for Fedora 26. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '26'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version. Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 26 reached end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed as described in the policy above. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
Moving to Fedora 27 for now, as we definitely still see the crashes there. We don't have these naughty overrides for Fedora 28, so it remains to be seen whether it's actually still an issue there.
This message is a reminder that Fedora 27 is nearing its end of life. On 2018-Nov-30 Fedora will stop maintaining and issuing updates for Fedora 27. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as EOL if it remains open with a Fedora 'version' of '27'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version. Thank you for reporting this issue and we are sorry that we were not able to fix it before Fedora 27 reached end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora, you are encouraged to change the 'version' to a later Fedora version before this bug is closed as described in the policy above. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.
Fedora 27 changed to end-of-life (EOL) status on 2018-11-30. Fedora 27 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. If you are unable to reopen this bug, please file a new report against the current release. If you experience problems, please add a comment to this bug. Thank you for reporting this bug and we are sorry it could not be fixed.