| Summary: | gatherd: segfault in libmetricUnixProcess.so | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Milos Malik <mmalik> |
| Component: | sblim-gather | Assignee: | Vitezslav Crhonek <vcrhonek> |
| Status: | CLOSED ERRATA | QA Contact: | qe-baseos-daemons |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 6.2 | CC: | azelinka, kvolny, rvokal |
| Target Milestone: | rc | Keywords: | TestOnly |
| Target Release: | --- | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | sblim-gather-2.2.3-2.el6 | Doc Type: | Bug Fix |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2011-12-06 11:57:00 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
Description
Milos Malik
2011-09-01 14:09:37 UTC
I need to know the exact condition that triggers the segfault. At first it seemed I could not reproduce it:

    .live.[root@s390x-6s-v1 tps]# service gatherer status
    gatherd is stopped
    reposd is stopped
    .live.[root@s390x-6s-v1 tps]# service gatherer start
    Starting gatherd: [ OK ]
    Starting reposd: [ OK ]
    .live.[root@s390x-6s-v1 tps]# service gatherer status
    gatherd (pid 51111) is running...
    reposd (pid 51121) is running...
    .live.[root@s390x-6s-v1 tps]# gatherctl
    s
    Status initialized and sampling, 12 plugins and 24 metrics.
    s
    Status initialized and sampling, 12 plugins and 24 metrics.
    s
    Status initialized and sampling, 12 plugins and 24 metrics.
    s
    Status initialized and sampling, 12 plugins and 24 metrics.
    s
    Status initialized and sampling, 12 plugins and 24 metrics.
    s
    Status initialized and sampling, 12 plugins and 24 metrics.
    s
    Status initialized and sampling, 12 plugins and 24 metrics.
    s
    Status initialized and sampling, 12 plugins and 24 metrics.
    s
    Status initialized and sampling, 12 plugins and 24 metrics.
    s
    Status initialized and sampling, 12 plugins and 24 metrics.
    s
    Status initialized and sampling, 12 plugins and 24 metrics.
    q
    .live.[root@s390x-6s-v1 tps]# service gatherer status
    gatherd (pid 51111) is running...
    reposd (pid 51121) is running...
    .live.[root@s390x-6s-v1 tps]# rpm -q sblim-gather
    sblim-gather-2.2.3-1.el6.s390x

But then, after a while, it suddenly died:

    .live.[root@s390x-6s-v1 tps]# service gatherer status
    gatherd dead but subsys locked
    reposd (pid 51121) is running...

Now, how long exactly do I have to wait, or what action should I take, to be sure that the new version doesn't crash?
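
The manual status checks above could be automated with a small polling loop. The following is only a sketch: the function names `gatherd_state` and `watch_gatherd` and the `STATUS_CMD` variable are hypothetical, and the matched strings are simply the ones that appear in the transcript above, so other init-script versions may word them differently.

```shell
#!/bin/sh
# Sketch only: function names and STATUS_CMD are hypothetical helpers,
# not part of sblim-gather.

# Classify the gatherd part of `service gatherer status` output.
# Prints "running", "dead", "stopped", or "unknown".
gatherd_state() {
    case "$1" in
        *"gatherd (pid "*") is running"*) echo running ;;
        *"gatherd dead but subsys locked"*) echo dead ;;
        *"gatherd is stopped"*) echo stopped ;;
        *) echo unknown ;;
    esac
}

# Poll the status command every $1 seconds, at most $2 times.
# Returns 1 as soon as gatherd is not running, 0 if it survived every poll.
watch_gatherd() {
    wg_interval=$1
    wg_polls=$2
    wg_i=1
    while [ "$wg_i" -le "$wg_polls" ]; do
        # STATUS_CMD is left unquoted on purpose so it splits into command + args.
        wg_state=$(gatherd_state "$(${STATUS_CMD:-service gatherer status} 2>&1)")
        # The timestamp lets a failure be matched against the kernel log later.
        echo "$(date '+%H:%M:%S') poll $wg_i: gatherd $wg_state"
        [ "$wg_state" = running ] || return 1
        sleep "$wg_interval"
        wg_i=$((wg_i + 1))
    done
    return 0
}
```

For example, `watch_gatherd 5 24` would check the daemon every 5 seconds for about two minutes and stop at the first poll where gatherd is no longer reported as running.
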

I ran the checks periodically, and it seems the daemon dies when the system clock hits a new minute (0 seconds). For example:

    Sep 8 12:10:00 x86-64-6s-m1 kernel: gatherd[13777]: segfault at 0 ip 00007f385b0622ba sp 00007f385a453cc0 error 6 in libmetricUnixProcess.so[7f385b060000+3000]

Is that the reason? If so, is it enough to wait 61 seconds (in the worst case)?

(In reply to comment #4)
> now how long exactly I have to wait, or what action should I take, to be sure
> that the new version doesn't crash?
>
> I tried to run the checks periodically and it seems the daemon dies when the
> system clock hits a new minute (0 seconds) - for example:
>
> Sep 8 12:10:00 x86-64-6s-m1 kernel: gatherd[13777]: segfault at 0 ip
> 00007f385b0622ba sp 00007f385a453cc0 error 6 in
> libmetricUnixProcess.so[7f385b060000+3000]
>
> - is that the reason, is it enough to wait 61 seconds (in the worst case) then?

The sampling function (metricRetrCPUTime) that caused the segfault is called by the daemon every 60 seconds. So it should be okay to wait 61 seconds (or longer, if you want to see it survive more iterations).

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1593.html
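
The kernel log line quoted in this report can be decoded a little further. The sketch below uses two hypothetical helper functions, `segv_offset` and `segv_error`. It assumes the usual x86 layout of the line (`ip` is the faulting instruction address, the bracketed pair is the mapping base and size) and the usual meaning of the low three page-fault error bits. With matching debug symbols, an offset like this could typically be passed to a tool such as `addr2line`, though that step is not shown in the report.

```shell
#!/bin/sh
# Sketch only: segv_offset and segv_error are hypothetical helper names.

# Print the faulting instruction's offset inside the mapped object,
# i.e. ip minus the mapping base taken from a kernel "segfault at ..." line.
segv_offset() {
    so_ip=$(printf '%s\n' "$1" | sed -n 's/.* ip \([0-9a-f]*\) .*/\1/p')
    so_base=$(printf '%s\n' "$1" | sed -n 's/.*\[\([0-9a-f]*\)+[0-9a-f]*\].*/\1/p')
    [ -n "$so_ip" ] && [ -n "$so_base" ] || return 1
    printf '0x%x\n' $((0x$so_ip - 0x$so_base))
}

# Decode the low three bits of an x86 page-fault error code.
segv_error() {
    se_code=$1
    if [ $((se_code & 1)) -ne 0 ]; then se_present="protection fault"; else se_present="page not present"; fi
    if [ $((se_code & 2)) -ne 0 ]; then se_access=write; else se_access=read; fi
    if [ $((se_code & 4)) -ne 0 ]; then se_mode=user; else se_mode=kernel; fi
    echo "$se_access, $se_mode mode, $se_present"
}

line='Sep 8 12:10:00 x86-64-6s-m1 kernel: gatherd[13777]: segfault at 0 ip 00007f385b0622ba sp 00007f385a453cc0 error 6 in libmetricUnixProcess.so[7f385b060000+3000]'
segv_offset "$line"   # prints 0x22ba
segv_error 6          # prints "write, user mode, page not present"
```

Read this way, the line describes a user-mode write to address 0 at offset 0x22ba inside libmetricUnixProcess.so, which is consistent with a NULL pointer being written through in the plugin.
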