Bug 1758935
Summary: | check-mk-livestatus-1.4.0p31-2 crashed after "Get hosts" query | ||
---|---|---|---|
Product: | [Fedora] Fedora EPEL | Reporter: | TJ Yang <tjyang2001> |
Component: | check-mk | Assignee: | Orphan Owner <extras-orphan> |
Status: | CLOSED EOL | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | ||
Version: | epel7 | CC: | andrea.veri, extras-orphan, tjyang2001 |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2024-07-09 02:56:31 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Description
TJ Yang
2019-10-06 22:40:19 UTC
livestatus.log does not show why the livestatus module crashed after it received a GET hosts request.

[nagios@nagios03 ~]$ tail /var/log/nagios/livestatus.log
2019-10-06 12:01:04 [main] flushing log file index
2019-10-06 14:24:12 [main] socket thread has terminated
2019-10-06 14:24:12 [main] flushing log file index
2019-10-06 14:30:23 [client 2] request: GET hosts
2019-10-06 14:32:05 [main] socket thread has terminated
2019-10-06 14:32:05 [main] flushing log file index
2019-10-06 14:33:07 [client 1] request: GET hosts
2019-10-06 14:48:17 [main] socket thread has terminated
2019-10-06 14:48:17 [main] flushing log file index
2019-10-06 18:28:13 [client 1] request: GET hosts
[nagios@nagios03 ~]$

* I manually compiled different versions of livestatus.o, from 1.2.8 up to the latest 1.6.

[nagios@nagios03 check_mk]$ ls -lrt
total 175616
-rwxr-xr-x 1 root root 2806352 Jun 1 2018 livestatus.o.old.p31
-rwxrwxr-x 1 root root 31962768 Oct 6 20:12 livestatus-p31.o
-rwxrwxr-x 1 root root 31981976 Oct 6 20:20 livestatus-p37.o
-rwxrwxr-x 1 root root 41076080 Oct 6 20:42 livestatus-1.6-p2.o
-rwxrwxr-x 1 root root 39231344 Oct 6 21:04 livestatusp-1.5-p21.o
-rwxrwxr-x 1 root root 31953192 Oct 6 21:10 livestatus-1.4-p30.o
-rwxrwxr-x 1 root root 802544 Oct 6 21:34 livestatus-1.2.8.o
lrwxrwxrwx 1 root root 18 Oct 6 21:38 livestatus.o.1.2.8 -> livestatus-1.2.8.o
lrwxrwxrwx 1 root root 20 Oct 6 21:41 livestatus.o -> livestatus-1.4-p30.o
[nagios@nagios03 check_mk]$

* I changed debug=1 to debug=9 and got the additional debug output below.

[nagios@nagios03 ~]$ cat /var/log/nagios/livestatus.log
2019-10-06 21:55:01 [client 2] accepted client connection on fd 37
2019-10-06 21:55:01 [client 2] request: GET hosts
2019-10-06 21:55:01 [client 2] column hosts.groups is unrestricted
2019-10-06 21:55:01 [client 2] using full table scan
[nagios@nagios03 ~]$

* None of the versions I tried survives a "GET hosts" query from a production Nagios instance with 3k hosts. 'GET statehist' was OK.
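One way to narrow this down might be to ask for only a few columns at a time instead of the full table, on the assumption (not confirmed in this report) that the abort is tied to a specific column. The sketch below only builds the LQL text; `build_lql_query` is a hypothetical helper name, and how the query is delivered to the Livestatus socket is left out because the socket path is not given here.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: build a Livestatus (LQL) query for one table.
// An optional "Columns:" header restricts the output to the listed columns,
// and the query is terminated by an empty line.
std::string build_lql_query(const std::string& table,
                            const std::vector<std::string>& columns) {
    std::string query = "GET " + table + "\n";
    if (!columns.empty()) {
        query += "Columns:";
        for (const auto& column : columns) {
            query += " " + column;  // column names are space-separated
        }
        query += "\n";
    }
    query += "\n";  // blank line ends the query
    return query;
}
```

Calling `build_lql_query("hosts", {"name", "groups"})` yields a query that reads only those two columns. Repeating the request with different column subsets could show whether one particular column is involved in the abort.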
It looks like I have a host defined with an empty value, which triggers the failure reported by what(). But I don't know how to debug further, since this command reports no warnings or errors: "nagios -v /etc/nagios/nagios.cfg"

Correction: only livestatus-1.2.8.o was able to withstand the 'GET hosts' LQL query without aborting.

I am also asking upstream for help: https://lists.mathias-kettner.de/pipermail/checkmk-en/2019-October/028889.html

I downgraded livestatus further, to 1.2.6, so that the adagios web GUI can display comment and downtime records. See details at https://github.com/opinkerfi/adagios/issues/643

Another comment: the real solution is to locate which line of C++ code is aborting due to a NULL value and fix the C++ code, as suggested in https://bugzilla.redhat.com/show_bug.cgi?id=1758935

Correction of the URL mentioned above.
Wrong: https://bugzilla.redhat.com/show_bug.cgi?id=1758935
Correct: https://stackoverflow.com/questions/21068758/basic-string-s-construct-null-not-valid

EPEL 7 entered end-of-life (EOL) status on 2024-06-30.

EPEL 7 is no longer maintained, which means that it will not receive any further security or bug fix updates.
As a result we are closing this bug.
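For readers following the Stack Overflow link above: the failure mode it describes is building a std::string from a NULL `const char*`. That is undefined behaviour in C++, and libstdc++ commonly reports it by throwing std::logic_error. The sketch below shows the general kind of guard that avoids it. It is not the upstream fix, and `string_or_empty` is a hypothetical name; the actual aborting line in livestatus was never identified in this report.

```cpp
#include <string>

// Hypothetical helper, not taken from the livestatus sources:
// convert a possibly-NULL C string into a std::string without ever
// passing a null pointer to the std::string constructor.
std::string string_or_empty(const char* value) {
    // A NULL field (for example an unset value in an object definition)
    // is mapped to the empty string instead of triggering the error.
    return value != nullptr ? std::string(value) : std::string();
}
```

With a guard like this, `string_or_empty(nullptr)` returns an empty string and `string_or_empty("nagios03")` returns the text unchanged. Whether an empty string is the right substitute for a missing value depends on the column in question.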