Description of problem:
The following command crashes the nagios-4.4.3 server:

echo 'GET hosts' | unixcat /var/spool/nagios/cmd/livestatus

Version-Release number of selected component (if applicable):

# nagios
[nagios@nagios03 ~]$ cat /etc/redhat-release
CentOS Linux release 7.7.1908 (Core)
[nagios@nagios03 ~]$

# check-mk installed
[root@nagios03 ~]# rpm -qa |grep check-mk
check-mk-1.4.0p31-2.el7.x86_64
check-mk-livestatus-1.4.0p31-2.el7.x86_64
[root@nagios03 ~]#

# Nagios installed.
[root@nagios03 ~]# rpm -qa |grep nagios-4
nagios-4.4.3-1.el7.x86_64
[root@nagios03 ~]#

How reproducible:

Steps to Reproduce:
1. Stop the running nagios server (systemctl stop nagios).

2. Make sure check-mk-livestatus is configured.

[root@nagios03 ~]# grep livestatus /etc/nagios/nagios.cfg
# /var/log/nagios/livestatus.log /var/spool/nagios/cmd/livestatus idle_timeout=12000 num_client_threads=20 debug=1 query_timeout=0
[root@nagios03 ~]#

3. Run "nagios /etc/nagios/nagios.cfg" in one vt100 window to start nagios in the foreground (not in daemon mode).

wproc: Registry request: name=Core Worker 13657;pid=13657
wproc: Registry request: name=Core Worker 13654;pid=13654
wproc: Registry request: name=Core Worker 13656;pid=13656
Event broker module '/usr/lib64/check_mk/livestatus.o' initialized successfully.
2019-10-06 18:28:05 [6] updating log file index
2019-10-06 18:28:05 [6] updating log file index

4. In another vt100 window, run the following command to send the query.

echo 'GET hosts' | unixcat /var/spool/nagios/cmd/livestatus

5. The result below appears in the first window.

Actual results:

<snipped>
wproc: Registry request: name=Core Worker 13654;pid=13654
wproc: Registry request: name=Core Worker 13656;pid=13656
Event broker module '/usr/lib64/check_mk/livestatus.o' initialized successfully.
2019-10-06 18:28:05 [6] updating log file index
2019-10-06 18:28:05 [6] updating log file index
Successfully launched command file worker with pid 13668
terminate called after throwing an instance of 'std::logic_error'
  what(): basic_string::_S_construct null not valid
Aborted
[nagios@nagios03 ~]$

Expected results:
The nagios server process is not crashed by the check-mk-livestatus module.

Additional info:
A VM with exactly the same OS and check-mk versions (nagios02t) does not crash, but nagios02t only has a few test hosts.
This issue looks exactly like https://bugzilla.redhat.com/show_bug.cgi?id=1585168 but I don't have a "check_mk_objects.cfg" file. I have "/etc/nagios/conf.d/check_mk_templates.cfg" instead.
livestatus.log did not show why the livestatus module crashed after it got the GET hosts request.

[nagios@nagios03 ~]$ tail /var/log/nagios/livestatus.log
2019-10-06 12:01:04 [main] flushing log file index
2019-10-06 14:24:12 [main] socket thread has terminated
2019-10-06 14:24:12 [main] flushing log file index
2019-10-06 14:30:23 [client 2] request: GET hosts
2019-10-06 14:32:05 [main] socket thread has terminated
2019-10-06 14:32:05 [main] flushing log file index
2019-10-06 14:33:07 [client 1] request: GET hosts
2019-10-06 14:48:17 [main] socket thread has terminated
2019-10-06 14:48:17 [main] flushing log file index
2019-10-06 18:28:13 [client 1] request: GET hosts
[nagios@nagios03 ~]$
* I manually compiled different versions of livestatus.o, from 1.2.8 up to the latest 1.6.

[nagios@nagios03 check_mk]$ ls -lrt
total 175616
-rwxr-xr-x 1 root root  2806352 Jun 1 2018 livestatus.o.old.p31
-rwxrwxr-x 1 root root 31962768 Oct 6 20:12 livestatus-p31.o
-rwxrwxr-x 1 root root 31981976 Oct 6 20:20 livestatus-p37.o
-rwxrwxr-x 1 root root 41076080 Oct 6 20:42 livestatus-1.6-p2.o
-rwxrwxr-x 1 root root 39231344 Oct 6 21:04 livestatusp-1.5-p21.o
-rwxrwxr-x 1 root root 31953192 Oct 6 21:10 livestatus-1.4-p30.o
-rwxrwxr-x 1 root root   802544 Oct 6 21:34 livestatus-1.2.8.o
lrwxrwxrwx 1 root root       18 Oct 6 21:38 livestatus.o.1.2.8 -> livestatus-1.2.8.o
lrwxrwxrwx 1 root root       20 Oct 6 21:41 livestatus.o -> livestatus-1.4-p30.o
[nagios@nagios03 check_mk]$

* I changed debug=1 to debug=9 and got the additional debug info below.

[nagios@nagios03 ~]$ cat /var/log/nagios/livestatus.log
2019-10-06 21:55:01 [client 2] accepted client connection on fd 37
2019-10-06 21:55:01 [client 2] request: GET hosts
2019-10-06 21:55:01 [client 2] column hosts.groups is unrestricted
2019-10-06 21:55:01 [client 2] using full table scan
[nagios@nagios03 ~]$

* None of the versions I tried can survive "GET hosts" against a production nagios which has 3k hosts. 'GET statehist' was ok. It looks like I have a host defined with an empty value, which triggers the std::logic_error reported by what(). But I don't know how to debug further, since this command reports no warnings or errors: "nagios -v /etc/nagios/nagios.cfg"
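One way to test the "host defined with an empty value" guess is to scan the object configuration for host directives that have a name but no value. The following is only a minimal sketch under that assumption: the function name find_empty_host_directives and the EmptyDirective struct are hypothetical, it understands only the basic "define host { ... }" syntax with '#' and ';' comments, and it cannot detect optional fields that are simply absent, which could equally be the source of the NULL.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result record: a directive inside "define host" with no value.
struct EmptyDirective {
    int line;         // 1-based line number in the scanned input
    std::string key;  // directive name that has no value
};

// Strip leading and trailing whitespace.
static std::string trim(const std::string &s) {
    const char *ws = " \t\r\n";
    const auto b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    const auto e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Scan Nagios object definitions and report valueless host directives.
std::vector<EmptyDirective> find_empty_host_directives(std::istream &in) {
    std::vector<EmptyDirective> found;
    std::string raw;
    bool in_host = false;
    int lineno = 0;
    while (std::getline(in, raw)) {
        ++lineno;
        // Drop a trailing ';' comment, then surrounding whitespace.
        const std::string line = trim(raw.substr(0, raw.find(';')));
        if (line.empty() || line[0] == '#') continue;
        if (!in_host) {
            // Accept both "define host {" and "define host{".
            std::istringstream words(line);
            std::string kw, type;
            words >> kw >> type;
            in_host = (kw == "define" && (type == "host" || type == "host{"));
            continue;
        }
        if (line[0] == '}') {
            in_host = false;
            continue;
        }
        // After trimming, a directive with a value contains whitespace
        // between key and value; a bare key does not.
        if (line.find_first_of(" \t") == std::string::npos) {
            found.push_back({lineno, line});
        }
    }
    return found;
}
```

Feeding each file under the configuration directory through this function with a std::ifstream would print candidate lines to check by hand; an empty result does not rule out a NULL field coming from somewhere else.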
correction: only livestatus-1.2.8.o was able to withstand the 'GET hosts' LQL query without aborting.
Also asking upstream for help: https://lists.mathias-kettner.de/pipermail/checkmk-en/2019-October/028889.html
I downgraded livestatus further down to 1.2.6 so that adagios's web GUI can display comment and downtime records. See details at https://github.com/opinkerfi/adagios/issues/643
Another comment: The real solution is to locate the line of C++ code that aborts on the NULL value and fix that code, as suggested in https://bugzilla.redhat.com/show_bug.cgi?id=1758935
Correction to the URL mentioned above.
Wrong: https://bugzilla.redhat.com/show_bug.cgi?id=1758935
Correct: https://stackoverflow.com/questions/21068758/basic-string-s-construct-null-not-valid
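For context, the "basic_string::_S_construct null not valid" message is what libstdc++ reports when a std::string is constructed from a NULL char pointer. The sketch below shows the usual kind of guard. It is not the livestatus source: HostLike, safe_string and alias_of are hypothetical names used only for illustration, and which field is actually NULL in this crash is not known from the report.

```cpp
#include <string>

// Hypothetical stand-in for a host record whose optional char* members
// may be NULL when the corresponding value was never set.
struct HostLike {
    const char *name;
    const char *alias;
};

// Convert a possibly-NULL C string to std::string, mapping NULL to "".
inline std::string safe_string(const char *s) {
    return s != nullptr ? std::string(s) : std::string();
}

// Writing std::string(h.alias) directly is undefined when alias is NULL;
// with the libstdc++ in this report it throws std::logic_error, and the
// uncaught exception aborts the whole process. Going through safe_string
// avoids that.
std::string alias_of(const HostLike &h) {
    return safe_string(h.alias);
}
```

With a guard like this, a host whose field is NULL would show up as an empty string in the query output instead of terminating nagios.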
EPEL 7 entered end-of-life (EOL) status on 2024-06-30.

EPEL 7 is no longer maintained, which means that it
will not receive any further security or bug fix updates.
As a result we are closing this bug.