Description of problem:
With the latest cacti update, the pollers never terminate and end up eating all system resources:

cacti   9802  0.0  0.0 450128 36672 ?  S  10:45   0:00 /bin/php -q /usr/share/cacti.git/cmd.php --poller=1 --first=0 --last=7 --mibs
cacti   9806 96.9  0.0 266760 25360 ?  R  10:45  13:54  \_ /bin/php -q /usr/share/cacti.git/script_server.php cmd
cacti   9804  0.0  0.0 450128 36544 ?  S  10:45   0:00 /bin/php -q /usr/share/cacti.git/cmd.php --poller=1 --first=8 --last=8 --mibs
cacti   9808 97.7  0.0 266760 25632 ?  R  10:45  14:00  \_ /bin/php -q /usr/share/cacti.git/script_server.php cmd
cacti  12193  0.0  0.0 450128 36748 ?  S  10:50   0:00 /bin/php -q /usr/share/cacti.git/cmd.php --poller=1 --first=0 --last=7 --mibs
cacti  12197 96.7  0.0 266760 25324 ?  R  10:50   8:57  \_ /bin/php -q /usr/share/cacti.git/script_server.php cmd
cacti  12195  0.0  0.0 450128 36492 ?  S  10:50   0:00 /bin/php -q /usr/share/cacti.git/cmd.php --poller=1 --first=8 --last=8 --mibs
cacti  12198 97.6  0.0 266760 25704 ?  R  10:50   9:02  \_ /bin/php -q /usr/share/cacti.git/script_server.php cmd
cacti  14757  0.1  0.0 450128 36432 ?  S  10:55   0:00 /bin/php -q /usr/share/cacti.git/cmd.php --poller=1 --first=0 --last=7 --mibs
cacti  14768 95.6  0.0 266760 25700 ?  R  10:55   4:08  \_ /bin/php -q /usr/share/cacti.git/script_server.php cmd
cacti  14760  0.0  0.0 450128 36452 ?  S  10:55   0:00 /bin/php -q /usr/share/cacti.git/cmd.php --poller=1 --first=8 --last=8 --mibs
cacti  14769 97.5  0.0 266760 25492 ?  R  10:55   4:12  \_ /bin/php -q /usr/share/cacti.git/script_server.php cmd

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
Hello, I'm not able to reproduce this. I think it's a specific problem with your installation. /usr/share/cacti.git doesn't exist in the official Fedora cacti RPM.
2018-01-28T14:32:38Z INFO Upgraded: php-json-7.2.2~RC1-3.fc28.x86_64
2018-01-28T14:32:39Z INFO Upgraded: php-common-7.2.2~RC1-3.fc28.x86_64
2018-01-28T14:32:42Z INFO Upgraded: php-cli-7.2.2~RC1-3.fc28.x86_64
2018-01-28T14:32:43Z INFO Upgraded: php-pdo-7.2.2~RC1-3.fc28.x86_64
2018-01-28T14:32:46Z INFO Upgraded: php-mysqlnd-7.2.2~RC1-3.fc28.x86_64
2018-01-28T14:32:52Z INFO Upgraded: php-xml-7.2.2~RC1-3.fc28.x86_64
2018-01-28T14:32:52Z INFO Upgraded: php-snmp-7.2.2~RC1-3.fc28.x86_64
2018-01-28T14:32:53Z INFO Upgraded: php-process-7.2.2~RC1-3.fc28.x86_64
2018-01-28T14:32:53Z INFO Upgraded: php-mbstring-7.2.2~RC1-3.fc28.x86_64
2018-01-28T14:32:54Z INFO Upgraded: php-ldap-7.2.2~RC1-3.fc28.x86_64
2018-01-28T14:32:54Z INFO Upgraded: php-intl-7.2.2~RC1-3.fc28.x86_64
2018-01-28T14:32:55Z INFO Upgraded: php-imap-7.2.2~RC1-3.fc28.x86_64
2018-01-28T14:32:56Z INFO Upgraded: php-gd-7.2.2~RC1-3.fc28.x86_64
2018-01-28T14:34:44Z INFO Upgraded: php-7.2.2~RC1-3.fc28.x86_64
2018-01-28T14:34:49Z INFO Upgraded: php-fpm-7.2.2~RC1-3.fc28.x86_64
2018-01-28T14:34:33Z INFO Upgraded: cacti-1.1.33-1.fc28.noarch
That's weird... it started doing this when I updated. If I attach strace to the running poller process, it looks like it's looping on:

--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x7f2e092bbeef} ---
rt_sigreturn({mask=[]}) = 0
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x7f2e092bbeef} ---
rt_sigreturn({mask=[]}) = 0
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x7f2e092bbeef} ---
rt_sigreturn({mask=[]}) = 0
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x7f2e092bbeef} ---
rt_sigreturn({mask=[]}) = 0
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x7f2e092bbeef} ---
rt_sigreturn({mask=[]}) = 0
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x7f2e092bbeef} ---
rt_sigreturn({mask=[]}) = 0
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x7f2e092bbeef} ---
rt_sigreturn({mask=[]}) = 0
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x7f2e092bbeef} ---
rt_sigreturn({mask=[]}) = 0
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x7f2e092bbeef} ---
rt_sigreturn({mask=[]}) = 0
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x7f2e092bbeef} ---
rt_sigreturn({mask=[]}) = 0
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x7f2e092bbeef} ---
rt_sigreturn({mask=[]}) = 0
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x7f2e092bbeef} ---
rt_sigreturn({mask=[]}) = 0
^C--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0x7f2e092bbeef} ---
Are you polling SNMP devices?
Also:

[root@zappa cacti]# rpm -V cacti
S.5....T.  c /etc/cacti/db.php
S.5....T.  c /etc/httpd/conf.d/cacti.conf
SM5....T.  c /var/log/cacti/cacti.log
It looks like the PHP Script Server shutdown is carried out, but the process fails to exit properly here:

[pid 7411] write(1, "PHP Script Server Shutdown request received, exiting\\n", 54 <unfinished ...>
[pid 7411] <... write resumed> ) = 54
[pid 7411] setitimer(ITIMER_PROF, {it_interval={tv_sec=0, tv_usec=0}, it_value={tv_sec=0, tv_usec=0}}, <unfinished ...>
[pid 7411] <... setitimer resumed> NULL) = 0
[pid 7411] close(2 <unfinished ...>
[pid 7411] <... close resumed> ) = 0
[pid 7411] close(1 <unfinished ...>
[pid 7411] <... close resumed> ) = 0
[pid 7411] close(0 <unfinished ...>
[pid 7411] <... close resumed> ) = 0
[pid 7411] close(7 <unfinished ...>
[pid 7411] <... close resumed> ) = 0
[pid 7408] wait4(7411, <unfinished ...>
[pid 7411] setitimer(ITIMER_PROF, {it_interval={tv_sec=0, tv_usec=0}, it_value={tv_sec=0, tv_usec=0}}, <unfinished ...>
[pid 7411] <... setitimer resumed> NULL) = 0
[pid 7411] munmap(0x7f0eb7a00000, 2097152 <unfinished ...>
[pid 7411] <... munmap resumed> ) = 0
[pid 7411] sendto(5, "\1\0\0\0\1", 5, MSG_DONTWAIT, NULL, 0 <unfinished ...>
[pid 7411] <... sendto resumed> ) = 5
[pid 7411] close(5 <unfinished ...>
[pid 7411] <... close resumed> ) = 0
[pid 7411] munmap(0x7f0eb8670000, 2126920 <unfinished ...>
[pid 7411] <... munmap resumed> ) = 0
[pid 7411] munmap(0x7f0eb8878000, 2126608 <unfinished ...>
[pid 7411] <... munmap resumed> ) = 0
[pid 7411] munmap(0x7f0eb8a80000, 2122912 <unfinished ...>
[pid 7411] <... munmap resumed> ) = 0
[pid 7411] munmap(0x7f0eb8c87000, 2122632 <unfinished ...>
[pid 7411] <... munmap resumed> ) = 0

Since the update, it loops in this state, taking 100% of the CPU.
It's as if this chunk of code didn't exit properly:

if (substr($input_string,0,4) == 'quit') {
    fputs(STDOUT, 'PHP Script Server Shutdown request received, exiting\n');
    fflush(STDOUT);
    cacti_log('DEBUG: PHP Script Server Shutdown request received, exiting', false, 'PHPSVR', POLLER_VERBOSITY_DEBUG);
    db_close();
    exit(0);
}
I'm not quite sure what's happening here, but the cmd script sends "quit" to the script_server, which receives it and does what it should, then gets stuck after munmap and never exits:

[pid 13266] recvfrom(4, <unfinished ...>
[pid 13266] <... recvfrom resumed> "\1\0\0\1\0015\0\0\2\3def\5cacti\10settings\10settings\5value\5value\f!\0\0\30\0\0\375\1\0\0\0\0\5\0\0\3\376\0\0\2\0\5\0\0\4\376\0\0\2\0", 506, MSG_DONTWAIT, NULL, NULL) = 80
[pid 13266] write(7, "quit\r\n", 6 <unfinished ...>
[pid 13270] <... read resumed> "quit\r\n", 8192) = 6
[pid 13266] <... write resumed> ) = 6
[pid 13270] setitimer(ITIMER_PROF, {it_interval={tv_sec=0, tv_usec=0}, it_value={tv_sec=0, tv_usec=0}}, <unfinished ...>
[pid 13270] <... setitimer resumed> NULL) = 0
[pid 13266] close(7 <unfinished ...>
[pid 13270] close(7 <unfinished ...>
[pid 13270] <... close resumed> ) = 0
[pid 13270] close(2 <unfinished ...>
[pid 13270] <... close resumed> ) = 0
[pid 13270] close(1 <unfinished ...>
[pid 13270] <... close resumed> ) = 0
[pid 13270] close(0 <unfinished ...>
[pid 13270] <... close resumed> ) = 0
[pid 13266] <... close resumed> ) = 0
[pid 13270] setitimer(ITIMER_PROF, {it_interval={tv_sec=0, tv_usec=0}, it_value={tv_sec=0, tv_usec=0}}, <unfinished ...>
[pid 13270] <... setitimer resumed> NULL) = 0
[pid 13266] close(8 <unfinished ...>
[pid 13266] <... close resumed> ) = 0
[pid 13266] close(10 <unfinished ...>
[pid 13270] munmap(0x7f3337000000, 2097152 <unfinished ...>
[pid 13266] <... close resumed> ) = 0
[pid 13266] wait4(13270, <unfinished ...>
[pid 13270] <... munmap resumed> ) = 0
[pid 13270] sendto(5, "\1\0\0\0\1", 5, MSG_DONTWAIT, NULL, 0 <unfinished ...>
[pid 13270] <... sendto resumed> ) = 5
[pid 13270] close(5 <unfinished ...>
[pid 13270] <... close resumed> ) = 0
[pid 13270] munmap(0x7f3337d09000, 2126952 <unfinished ...>
[pid 13270] <... munmap resumed> ) = 0
Maybe the problem is with php instead of cacti?

[cacti@zappa ~]$ php
<? echo allo; ?>
<? echo allo; ?>
Segmentation fault (core dumped)
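The same check can be made without typing into an interactive php: a shell reports a command killed by a signal as exit status 128 plus the signal number, and SIGSEGV is signal 11 on Linux, so a segfaulting php shows up as 139. The sketch below is only an illustration of that convention; describe_status is a hypothetical helper name, and a status above 128 could in principle also come from a program that simply exits with that value.

```shell
# Classify an exit status using the shell convention that a command killed
# by signal N is reported as status 128 + N (hypothetical helper).
describe_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        echo "killed by signal $((status - 128))"
    elif [ "$status" -eq 0 ]; then
        echo "exited cleanly"
    else
        echo "exited with status $status"
    fi
}

# Run a trivial one-liner and classify how the interpreter ended.
# Guarded so the sketch does nothing on a machine without php installed.
if command -v php >/dev/null 2>&1; then
    rc=0
    php -r 'echo "allo\n";' || rc=$?
    describe_status "$rc"
fi
```

On an affected machine this would be expected to print "allo" followed by "killed by signal 11", since the script itself runs and the crash happens while php shuts down.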
(gdb) run
Starting program: /usr/bin/php php
Missing separate debuginfos, use: dnf debuginfo-install php-cli-7.1.10-1.fc27.x86_64
warning: Loadable section ".note.gnu.property" outside of ELF segments
warning: Loadable section ".note.gnu.property" outside of ELF segments
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
warning: Loadable section ".note.gnu.property" outside of ELF segments
warning: Loadable section ".note.gnu.property" outside of ELF segments
warning: Loadable section ".note.gnu.property" outside of ELF segments
Could not open input file: php

Program received signal SIGSEGV, Segmentation fault.
0x00007fffe9f6fb27 in lh_strhash () from /lib64/libcrypto.so.10
I moved /lib64/libcrypto.so.10 to /lib64/libcrypto.so.10.old and retried running the script server, and this solves the problem. Now, why is php crashing in libcrypto.so.10?
The whole stack trace looks strange too:

#0  0x00007fffe9f6ff87 in lh_strhash () from /lib64/libcrypto.so.10
#1  0x00007fffe9ebca24 in obj_name_LHASH_HASH () from /lib64/libcrypto.so.10
#2  0x00007fffe9f6fefd in getrn () from /lib64/libcrypto.so.10
#3  0x00007fffe9f703fd in lh_delete () from /lib64/libcrypto.so.10
#4  0x00007fffe9ebce8a in OBJ_NAME_remove () from /lib64/libcrypto.so.10
#5  0x00007fffe9f705f1 in lh_doall () from /lib64/libcrypto.so.10
#6  0x00007fffe9ebd0a5 in OBJ_NAME_cleanup () from /lib64/libcrypto.so.10
#7  0x00007fffe9f7b3c8 in EVP_cleanup () from /lib64/libcrypto.so.10
#8  0x00007fffeacb09ae in ssh_crypto_finalize () from /lib64/libssh.so.4
#9  0x00007fffeacb2839 in ssh_finalize () from /lib64/libssh.so.4
#10 0x00007fffeb5d8b68 in zm_shutdown_curl () from /usr/lib64/php/modules/curl.so
#11 0x00005555557b9647 in module_destructor ()
#12 0x00005555557b1dbc in module_destructor_zval ()
#13 0x00005555557c4ee9 in zend_hash_graceful_reverse_destroy ()
#14 0x00005555557b28c1 in zend_shutdown ()
#15 0x000055555574f9cb in php_module_shutdown ()
#16 0x0000555555622bd7 in main ()
OK, I've commented out the curl extension:

20-curl.ini:; Enable curl extension module
20-curl.ini:#extension=curl.so

And this solves the problem for everything. I guess this is a php issue rather than a cacti one. Sorry for the noise here.
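Instead of commenting extensions out by hand, each module can be loaded alone in an interpreter that ignores the ini files, watching for a run that ends with a signal. This is a rough sketch, not something from this report: find_crashing_modules is a hypothetical helper, and it assumes that php's -n (no ini files) and -d (set one ini entry) options behave the same on the installed build. Extensions that depend on another extension will only print a load warning when tried alone, so only signal exits are reported.

```shell
# Load each extension by itself and report the ones whose run ends with a
# signal (exit status above 128). Hypothetical helper.
# $1: php binary to run, $2: directory holding the extension .so files.
find_crashing_modules() {
    php_bin=$1
    module_dir=$2
    for module in "$module_dir"/*.so; do
        # Skip the unexpanded pattern when the directory has no .so files.
        [ -e "$module" ] || continue
        rc=0
        # -n: use no ini files; -d: load only this one extension; -r '': run nothing.
        "$php_bin" -n -d "extension=$module" -r '' >/dev/null 2>&1 || rc=$?
        if [ "$rc" -gt 128 ]; then
            echo "$module: killed by signal $((rc - 128))"
        fi
    done
}

# Usage, with the paths seen in this report:
#   find_crashing_modules /usr/bin/php /usr/lib64/php/modules
```

Given the stack trace above, curl.so would be the expected hit here, since the crash happens in its shutdown handler.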
[root@zappa php.d]# rpm -qf /lib64/libssh.so.4
libssh-0.7.5-4.fc28.x86_64
[root@zappa log]# rpm -qf /usr/lib64/php/modules/curl.so
php-common-7.2.2~RC1-3.fc28.x86_64
[root@zappa log]# rpm -qf /lib64/libcrypto.so.10
compat-openssl10-1.0.2n-1.fc28.x86_64
> compat-openssl10-1.0.2n-1.fc28.x86_64

This package should not be used (there is an obvious conflict when both openssl 1.0 and 1.1 are loaded in the php stack). I need to dig to find which package pulls it in.
Can you please try to remove compat-openssl10, to detect which package requires it?
Please also run:

for i in /usr/lib64/php/modules/*.so
do ldd "$i" | grep libcrypto.so.10 && echo "in $i";
done
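ldd lists the whole resolved dependency tree, so a match only says that libcrypto.so.10 is loaded somewhere beneath a module, not which library asks for it. A possible follow-up, sketched here with hypothetical helper names (direct_deps, who_needs) and assuming readelf from binutils is installed, is to look at the direct NEEDED entries of each library that ldd resolves.

```shell
# Print the direct (NEEDED) dependencies found in `readelf -d` output on stdin.
direct_deps() {
    sed -n 's/.*(NEEDED).*\[\(.*\)].*/\1/p'
}

# For one module, check the module itself and every library ldd resolves,
# and print those that list $2 as a direct dependency.
who_needs() {
    module=$1
    wanted=$2
    {
        echo "$module"
        # Keep only the resolved absolute paths ("name => /path (address)").
        ldd "$module" | awk '$2 == "=>" && $3 ~ /^\// { print $3 }'
    } | while read -r lib; do
        if readelf -d "$lib" 2>/dev/null | direct_deps | grep -qxF "$wanted"; then
            echo "$lib directly needs $wanted"
        fi
    done
}

# Usage, with the names seen in this report:
#   who_needs /usr/lib64/php/modules/curl.so libcrypto.so.10
```

The owning package of whatever this prints can then be looked up with rpm -qf, as was done above for libssh.so.4.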
Can you please test using libc-client from this scratch build?
https://koji.fedoraproject.org/koji/taskinfo?taskID=24568673
(This should fix bug #1540020 and thus drop compat-openssl10 from the stack.)
The various packages failing in Koschei because of this segfault are now back to normal.
Removing:
 compat-openssl10                x86_64  1:1.0.2n-1.fc28         @rawhide-devel  3.0 M
Removing dependent packages:
 compat-openssl10-pkcs11-helper  x86_64  1.22-3.fc27             @System         145 k
 freerdp1.2                      x86_64  1.2.0-9.fc27            @System         3.1 M
 gnome-python2-bonobo            x86_64  2.28.1-21.fc27          @System         375 k
 gnome-vfs2                      x86_64  2.24.4-25.fc27          @System         1.1 M
 gnome-vfs2-common               noarch  2.24.4-25.fc27          @System         3.3 M
 libbonoboui                     x86_64  2.24.5-13.fc27          @System         1.3 M
 libgnome                        x86_64  2.32.1-15.fc27          @System         4.5 M
 libwvstreams                    x86_64  4.6.1-22.fc28           @rawhide        2.1 M
 nodejs                          x86_64  1:8.9.4-2.fc28          @rawhide         19 M
 nodejs-emojione-json            noarch  2.2.7-4.fc27            @System         661 k
 npm                             x86_64  1:5.6.0-1.8.9.4.2.fc28  @rawhide         18 M
 telepathy-salut                 x86_64  0.8.1-12.fc27           @System         1.1 M
 wvdial                          x86_64  1.61-17.fc27            @System         246 k
[root@zappa ~]# for i in /usr/lib64/php/modules/*.so
> do ldd $i | grep libcrypto.so.10 && echo in $i;
> done
[root@zappa ~]#

Seems like nothing is ...