Description of problem:
Nagios regularly crashes with SIGSEGV a couple of weeks after being started. This started to happen after EPEL released nagios-4.3.2-5.el6.

Version-Release number of selected component (if applicable):
nagios-4.3.2-5.el6.i686

How reproducible:
We haven't been able to trigger the bug on demand, but we have a core dump and a backtrace.

Steps to Reproduce:
1.
2.
3.

Actual results:
Backtrace is:

(gdb) bt
#0  0x001e1a9f in __strlen_ia32 () from /lib/libc.so.6
#1  0x001ac14f in vfprintf () from /lib/libc.so.6
#2  0x00264c32 in __vasprintf_chk () from /lib/libc.so.6
#3  0x00264b66 in __asprintf_chk () from /lib/libc.so.6
#4  0x0808112c in asprintf (mac=0xbfaccd3c) at /usr/include/bits/stdio2.h:158
#5  add_macrox_environment_vars_r (mac=0xbfaccd3c) at ../common/macros.c:3305
#6  macros_to_kvv (mac=0xbfaccd3c) at ../common/macros.c:3251
#7  0x0805e35a in wproc_run_job (job=0xc6b57e0, mac=<value optimized out>) at workers.c:1036
#8  0x080634d7 in run_async_service_check (svc=0xc4f0ea8, check_options=0, latency=0, scheduled_check=1, reschedule_check=1, time_is_valid=0xbfacd0d0, preferred_time=0xbfacd0d8) at checks.c:306
#9  0x08063891 in run_scheduled_service_check (svc=0xc4f0ea8, check_options=0, latency=0) at checks.c:90
#10 0x080746c1 in handle_timed_event (event=0x9a55718) at events.c:1171
#11 0x08078023 in event_execution_loop () at events.c:1110
#12 0x08058c88 in main (argc=3, argv=0xbfacd514) at nagios.c:814
(gdb)

Expected results:

Additional info:
Thanks. I will see if this is enough for upstream to find a fix. Could you try the nagios build in epel-test to see whether they fixed it in the meantime? If they did, please give it +1 karma so I know it works.
I have not seen anything from upstream on this, and I have not been able to replicate on my EL6 nagios system yet. Did the updates fix it for you?
Hi Stephen, we had to restart nagios (on the 15th of Nov.) after the updates from the test repo were applied, so we haven't got past the ~two week mark. Will update once we know more. Thanks for looking into it!
This morning we noticed we weren't getting nagios notifications anymore, so we checked the nagios log file. Basically, it was unable to run any checks:

[1511269200] Unable to run check for service 'sssd' on host 'letter2'
[1511269200] Unable to run check for service 'crond-procs' on host 'silk1'
[1511269200] Unable to run check for service 'syslog-ng-procs' on host 'syslog1'
[1511269200] Unable to run check for service 'memory' on host 'marathon1'
[1511269200] Unable to run check for service 'munin-asyncd' on host 'mars'
[1511269200] Unable to run check for service 'munin-asyncd' on host 'thm-tsta-vm2'
[1511269200] Unable to run check for service 'disk-space-free' on host 'milton2'
...etc.

There weren't any OOM messages in the kernel log, but the munin graphs for the nagios host show that memory and swap usage ramped up considerably after the epel-test version of nagios was installed. Unfortunately I was unable to get a pstack to help the case.
Hmm, I am not sure what could be causing that. How many hosts and checks are being monitored? Our instance inside Fedora monitors a couple of hundred hosts and runs in about 40 MB of process space. If you can get more info I would appreciate it.
Hi Stephen,

Sorry, I have only just seen your response. We had another crash, but the backtrace looks the same. The numbers are:

# Active Host / Service Checks: 544 / 12304
# Passive Host / Service Checks: 0 / 1948

We have a livestatus broker in nagios (which was working without issues in version 3.x). I have disabled it on our test system to see if that has an effect.
Disabling livestatus did not seem to help; we are still seeing a memory leak.
I can confirm various other reports of a memory leak upstream. We haven't found the cause or fix yet, but we have been able to reproduce it internally. Any other data that you can supply may be helpful in producing a fix. Are you able to set debug_verbosity to 2 and debug_level to -1 by chance? I would only suggest this if you have the disk space to spare.
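For anyone else collecting this data, the request above maps onto the following nagios.cfg settings. This is only a sketch: debug_level, debug_verbosity, debug_file and max_debug_file_size are standard Nagios Core main-configuration directives, but the debug_file path and the size cap shown here are assumptions, so check them against your own nagios.cfg and available disk space.

```
# Log every debug category (-1 = everything)
debug_level=-1

# Most detailed verbosity (0 = brief, 1 = more detailed, 2 = very detailed)
debug_verbosity=2

# Assumed location; keep whatever path your nagios.cfg already uses
debug_file=/var/log/nagios/nagios.debug

# Illustrative cap in bytes; size it to the disk space you can spare
max_debug_file_size=100000000
```

Nagios needs to be restarted for the changes to take effect.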
We're also seeing both of these behaviors (segfault and memory leak) on an instance with 451 hosts and 8286 active service checks. The only additional item is pnp4nagios processing perfdata.

After the update to Nagios 4, we began experiencing segfaults after about 2 weeks of Nagios running. (This bug report did not exist yet when I implemented, as a temporary measure, a script that restarts Nagios if it logs a segfault.) Since the update from 4.3.2-5.el6 to 4.3.4-4.el6 (installed on 2017-12-7), we've also begun to see the check failures and the memory leak, evidenced by exhaustion of virtual memory, with Nagios using over 3GB of memory after about 1 week of runtime.

We'll be rebooting the system with the latest kernel today and will monitor from there. Is there any particular information that would be helpful in diagnosing this?
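One piece of data that is cheap to collect for a leak like this is the resident and swap size of the nagios process over time. The following is a minimal sketch, assuming a Linux host with /proc mounted and Python 3 available; the function names and the sampling interval are hypothetical and not part of Nagios or of any script mentioned in this bug.

```python
import time


def parse_status_kb(status_text, field):
    """Return the kB value of a /proc/<pid>/status field such as VmRSS, or None."""
    for line in status_text.splitlines():
        name, _, rest = line.partition(":")
        if name == field:
            parts = rest.split()
            if parts and parts[0].isdigit():
                return int(parts[0])
    return None


def sample(pid):
    """Read resident and swapped-out memory (in kB) for one process."""
    with open("/proc/%d/status" % pid) as fh:
        text = fh.read()
    return parse_status_kb(text, "VmRSS"), parse_status_kb(text, "VmSwap")


def log_samples(pid, interval_seconds, count):
    """Print `count` timestamped samples, sleeping between them."""
    for i in range(count):
        rss_kb, swap_kb = sample(pid)
        print("%d rss_kb=%s swap_kb=%s" % (time.time(), rss_kb, swap_kb), flush=True)
        if i < count - 1:
            time.sleep(interval_seconds)
```

Calling log_samples(<nagios pid>, 300, 288) would record one sample every five minutes for a day; redirecting the output to a file gives a growth curve that can be attached here alongside the debug log.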
Are you using the NEB module for pnp? If so, which version are you using? I think the most useful thing would be the debug log captured after a segfault, with debugging turned on to log everything.
nagios-4.4.2-3.el7 has been submitted as an update to Fedora EPEL 7. https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2018-0346a55d0f
nagios-4.4.2-3.fc29 has been submitted as an update to Fedora 29. https://bodhi.fedoraproject.org/updates/FEDORA-2018-42555731d2
nagios-4.4.2-3.fc28 has been submitted as an update to Fedora 28. https://bodhi.fedoraproject.org/updates/FEDORA-2018-70fe6a4d75
nagios-4.4.2-3.el6 has been submitted as an update to Fedora EPEL 6. https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2018-61fe7c6e70
nagios-4.4.2-3.fc28 has been pushed to the Fedora 28 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-70fe6a4d75
nagios-4.4.2-3.el7 has been pushed to the Fedora EPEL 7 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2018-0346a55d0f
nagios-4.4.2-3.el6 has been pushed to the Fedora EPEL 6 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2018-61fe7c6e70
nagios-4.4.2-3.fc29 has been pushed to the Fedora 29 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-42555731d2
nagios-4.4.3-1.el7 has been submitted as an update to Fedora EPEL 7. https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2019-d661b588d2
nagios-4.4.3-1.el6 has been submitted as an update to Fedora EPEL 6. https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2019-17b388679b
nagios-4.4.3-1.fc29 has been submitted as an update to Fedora 29. https://bodhi.fedoraproject.org/updates/FEDORA-2019-376ecc221c
nagios-4.4.3-1.fc28 has been submitted as an update to Fedora 28. https://bodhi.fedoraproject.org/updates/FEDORA-2019-0b44528ff1
nagios-4.4.3-1.el7 has been pushed to the Fedora EPEL 7 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2019-d661b588d2
nagios-4.4.3-1.el6 has been pushed to the Fedora EPEL 6 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2019-17b388679b
nagios-4.4.3-1.fc28 has been pushed to the Fedora 28 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-0b44528ff1
nagios-4.4.3-1.fc29 has been pushed to the Fedora 29 testing repository. If problems still persist, please make note of it in this bug report. See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates. You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-376ecc221c
nagios-4.4.3-1.fc28 has been pushed to the Fedora 28 stable repository. If problems still persist, please make note of it in this bug report.
nagios-4.4.3-1.fc29 has been pushed to the Fedora 29 stable repository. If problems still persist, please make note of it in this bug report.
nagios-4.4.3-1.el6 has been pushed to the Fedora EPEL 6 stable repository. If problems still persist, please make note of it in this bug report.
nagios-4.4.3-1.el7 has been pushed to the Fedora EPEL 7 stable repository. If problems still persist, please make note of it in this bug report.
Created attachment 1865715: Log