Summary: | Nagios regularly crashes with SIGSEGV after couple of weeks of starting. | ||||||
---|---|---|---|---|---|---|---|
Product: | [Fedora] Fedora EPEL | Reporter: | Baybars <baybars> | ||||
Component: | nagios | Assignee: | Stephen John Smoogen <smooge> | ||||
Status: | CLOSED ERRATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa> | ||||
Severity: | unspecified | Docs Contact: | |||||
Priority: | unspecified | ||||||
Version: | el6 | CC: | affix, ajz, athmanem, b.heden, herrold, jose.p.oliveira.oss, lemenkov, linux, shawn.starr, smooge, smooge, s, swilkerson, zeltax3 | ||||
Target Milestone: | --- | ||||||
Target Release: | --- | ||||||
Hardware: | Unspecified | ||||||
OS: | Unspecified | ||||||
Whiteboard: | |||||||
Fixed In Version: | nagios-4.4.3-1.fc28 nagios-4.4.3-1.fc29 nagios-4.4.3-1.el6 nagios-4.4.3-1.el7 | Doc Type: | If docs needed, set a value | ||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2019-01-30 01:32:02 UTC | Type: | Bug | ||||
Attachments: |
|
Description
Baybars
2017-10-26 01:40:41 UTC
Thanks. I will see if this is enough for upstream to find a fix. Could you try the nagios in epel-test to see if they fixed it in the meantime, and if they did, could you give it a +1 in karma so I know it works?

I have not seen anything from upstream on this, and I have not been able to replicate it on my EL6 nagios system yet. Did the updates fix it for you?

Hi Stephen, we had to restart nagios (on the 15th of Nov.) after the updates from the test repo were applied, so we haven't got past the ~two week mark. Will update once we know more. Thanks for looking into it!

This morning we noticed we weren't getting nagios notifications anymore, and checked the nagios log file; basically it was unable to run any checks, with:

[1511269200] Unable to run check for service 'sssd' on host 'letter2'
[1511269200] Unable to run check for service 'crond-procs' on host 'silk1'
[1511269200] Unable to run check for service 'syslog-ng-procs' on host 'syslog1'
[1511269200] Unable to run check for service 'memory' on host 'marathon1'
[1511269200] Unable to run check for service 'munin-asyncd' on host 'mars'
[1511269200] Unable to run check for service 'munin-asyncd' on host 'thm-tsta-vm2'
[1511269200] Unable to run check for service 'disk-space-free' on host 'milton2'
...etc.

There weren't any OOM messages in the kernel log, but looking at the munin graphs for the nagios host, we can see that after the epel-test version of nagios was installed, memory and swap usage ramps up considerably. Unfortunately I was unable to get a pstack to help the case.

Hmmm, I am not sure what could be causing that. How many checks and hosts are being looked at? Our instance, monitoring a couple of hundred hosts inside Fedora, is able to run in 40 MB of process space. If you can get more info I would appreciate it.

Hi Stephen, sorry, just seen your response now. We had another crash but the backtrace looks the same.
The numbers are:
# Active Host / Service Checks: 544 / 12304
# Passive Host / Service Checks: 0 / 1948
We have a livestatus broker in nagios (which was working without issues in version 3.x), which I disabled in our test system to see if that would have an effect.

Disabling livestatus did not seem to help; we are still seeing a memory leak.

I can confirm various other reports of a memory leak upstream. We haven't found the cause or fix yet, but we have been able to reproduce it internally. Any other data that you can supply may be helpful in producing a fix. Are you able to set debug_verbosity to 2 and debug_level to -1 by chance? I would only suggest this if you have the disk space to spare.

We're also seeing both of these behaviors (segfault and memory leak) with 451 hosts and 8286 active service checks on this instance. The only additional item is pnp4nagios processing perfdata. After the update to Nagios 4, we began experiencing segfaults after about 2 weeks of Nagios running. (This bug report didn't exist yet when I implemented a script, as a temporary measure, to restart Nagios whenever it logged a segfault.) Since the update from 4.3.2-5.el6 to 4.3.4-4.el6 (installed on 2017-12-7), we've also begun to see the check failures and now the memory leak (evidenced by exhaustion of virtual memory, with Nagios using over 3GB of memory after about 1 week of runtime). We'll be rebooting the system with the latest kernel today and monitoring from there. Is there any particular information that would be helpful in diagnosing this?

Are you using the neb module for pnp? If so, which version are you using? I think just the debug log from after a segfault, with debugging turned on to log everything, would be rather useful.

nagios-4.4.2-3.el7 has been submitted as an update to Fedora EPEL 7. https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2018-0346a55d0f

nagios-4.4.2-3.fc29 has been submitted as an update to Fedora 29.
https://bodhi.fedoraproject.org/updates/FEDORA-2018-42555731d2

nagios-4.4.2-3.fc28 has been submitted as an update to Fedora 28. https://bodhi.fedoraproject.org/updates/FEDORA-2018-70fe6a4d75

nagios-4.4.2-3.el6 has been submitted as an update to Fedora EPEL 6. https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2018-61fe7c6e70

nagios-4.4.2-3.fc28 has been pushed to the Fedora 28 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-70fe6a4d75

nagios-4.4.2-3.el7 has been pushed to the Fedora EPEL 7 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2018-0346a55d0f

nagios-4.4.2-3.el6 has been pushed to the Fedora EPEL 6 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2018-61fe7c6e70

nagios-4.4.2-3.fc29 has been pushed to the Fedora 29 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-42555731d2

nagios-4.4.3-1.el7 has been submitted as an update to Fedora EPEL 7. https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2019-d661b588d2

nagios-4.4.3-1.el6 has been submitted as an update to Fedora EPEL 6. https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2019-17b388679b

nagios-4.4.3-1.fc29 has been submitted as an update to Fedora 29. https://bodhi.fedoraproject.org/updates/FEDORA-2019-376ecc221c

nagios-4.4.3-1.fc28 has been submitted as an update to Fedora 28. https://bodhi.fedoraproject.org/updates/FEDORA-2019-0b44528ff1

nagios-4.4.3-1.el7 has been pushed to the Fedora EPEL 7 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2019-d661b588d2

nagios-4.4.3-1.el6 has been pushed to the Fedora EPEL 6 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2019-17b388679b

nagios-4.4.3-1.fc28 has been pushed to the Fedora 28 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-0b44528ff1

nagios-4.4.3-1.fc29 has been pushed to the Fedora 29 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-376ecc221c

nagios-4.4.3-1.fc28 has been pushed to the Fedora 28 stable repository. If problems still persist, please make note of it in this bug report.

nagios-4.4.3-1.fc29 has been pushed to the Fedora 29 stable repository. If problems still persist, please make note of it in this bug report.

nagios-4.4.3-1.el6 has been pushed to the Fedora EPEL 6 stable repository. If problems still persist, please make note of it in this bug report.

nagios-4.4.3-1.el7 has been pushed to the Fedora EPEL 7 stable repository. If problems still persist, please make note of it in this bug report.

Created attachment 1865715 [details]
Log
|
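For anyone triaging the same symptom, the "Unable to run check for service ..." lines quoted earlier in this report can be tallied to see when the failures started and which services were affected. The following is only a minimal sketch, assuming the log lines have exactly the form quoted in the report; the names `parse_failed_checks`, `summarize`, and `SAMPLE` are hypothetical helpers for illustration and not part of Nagios.

```python
import re
from collections import Counter
from datetime import datetime, timezone

# Matches lines of the form quoted in this report, e.g.:
# [1511269200] Unable to run check for service 'sssd' on host 'letter2'
FAILED_CHECK = re.compile(
    r"^\[(?P<epoch>\d+)\] Unable to run check for service "
    r"'(?P<service>[^']*)' on host '(?P<host>[^']*)'"
)


def parse_failed_checks(lines):
    """Yield (utc_datetime, service, host) for every failed-check line."""
    for line in lines:
        m = FAILED_CHECK.match(line.strip())
        if m:
            when = datetime.fromtimestamp(int(m["epoch"]), tz=timezone.utc)
            yield when, m["service"], m["host"]


def summarize(lines):
    """Return (failures per service, time of the earliest failure or None)."""
    per_service = Counter()
    first = None
    for when, service, _host in parse_failed_checks(lines):
        per_service[service] += 1
        if first is None or when < first:
            first = when
    return per_service, first


# Sample input copied from the log excerpt in this report.
SAMPLE = [
    "[1511269200] Unable to run check for service 'sssd' on host 'letter2'",
    "[1511269200] Unable to run check for service 'crond-procs' on host 'silk1'",
    "[1511269200] Unable to run check for service 'syslog-ng-procs' on host 'syslog1'",
    "[1511269200] Unable to run check for service 'memory' on host 'marathon1'",
    "[1511269200] Unable to run check for service 'munin-asyncd' on host 'mars'",
    "[1511269200] Unable to run check for service 'munin-asyncd' on host 'thm-tsta-vm2'",
    "[1511269200] Unable to run check for service 'disk-space-free' on host 'milton2'",
]

if __name__ == "__main__":
    counts, first_seen = summarize(SAMPLE)
    print("first failure (UTC):", first_seen)
    for service, n in counts.most_common():
        print(f"{n:4d}  {service}")
```

To run it against a real log, pass an open file object to `summarize` instead of `SAMPLE`; the location of the Nagios log file is site-specific and is not stated in this report.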