pmcd in the new 3.8.10-1.fc19 build crashes not too long after startup on one of my x86-64 servers with a SEGV. It is monitored remotely with a pmmgr instance (thus the default pmlogconf/pmieconf configs).

# gdb -args /usr/libexec/pcp/bin/pmcd -f
GNU gdb (GDB) Fedora 7.6.1-46.fc19
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/libexec/pcp/bin/pmcd...Reading symbols from /usr/lib/debug/usr/libexec/pcp/bin/pmcd.debug...done.
done.
(gdb) run
Starting program: /usr/libexec/pcp/bin/pmcd -f
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x2aaaadd75700 (LWP 3931)]
Detaching after fork from child process 3932.
Detaching after fork from child process 3933.
Detaching after fork from child process 3934.
Detaching after fork from child process 3935.
[New Thread 0x2aaab497a700 (LWP 3936)]

Program received signal SIGSEGV, Segmentation fault.
0x00002aaaae5b53f8 in get_ordinal_fields (
    fields=0x2aaaae7c2440 <icmpmsg_fields>,
    buffer=0x7fffffffd320 "IcmpMsg:", header=0x7fffffffcf20 "IcmpMsg:")
    at proc_net_snmp.c:273
273                 *(fields[i].offset + inst) = strtoull(p, NULL, 10);
(gdb) bt
#0  0x00002aaaae5b53f8 in get_ordinal_fields (
    fields=0x2aaaae7c2440 <icmpmsg_fields>,
    buffer=0x7fffffffd320 "IcmpMsg:", header=0x7fffffffcf20 "IcmpMsg:")
    at proc_net_snmp.c:273
#1  refresh_proc_net_snmp (snmp=snmp@entry=0x2aaaae7c4bc0 <_pm_proc_net_snmp>)
    at proc_net_snmp.c:350
#2  0x00002aaaae5aacd7 in linux_refresh (pmda=pmda@entry=0x5555557aaae0,
    need_refresh=0x7fffffffd7b0) at pmda.c:3156
#3  0x00002aaaae5aafe8 in linux_fetch (numpmid=1, pmidlist=0x55555582c6d0,
    resp=0x7fffffffd960, pmda=0x5555557aaae0) at pmda.c:4844
#4  0x0000555555560e05 in SendFetch (ctxnum=0, cPtr=0x55555582c3f0,
    aPtr=0x5555557a82c8, dpList=0x55555582c6b0) at dofetch.c:263
#5  DoFetch (cip=cip@entry=0x55555582c3f0, pb=0x5555557ae000) at dofetch.c:407
#6  0x000055555555aacb in HandleClientInput (
    fdsPtr=fdsPtr@entry=0x7fffffffdc70) at pmcd.c:343
#7  0x000055555555af45 in ClientLoop () at pmcd.c:723
#8  0x0000555555559a6b in main (argc=2, argv=<optimized out>) at pmcd.c:974

# (some time after crash:) cat /proc/net/snmp | grep IcmpMsg
IcmpMsg: InType0 InType3 OutType3 OutType8
IcmpMsg: 4 90 88 4

(gdb) l
268             if ((p = strtok(NULL, " \n")) == NULL)
269                 break;
270             for (i = 0; fields[i].field; i++) {
271                 if (sscanf(indices[j], fields[i].field, &inst) != 1)
272                     continue;
273                 *(fields[i].offset + inst) = strtoull(p, NULL, 10);
274                 break;
275             }
276         }
277     }
(gdb) p indices
{0x7fffffffcf29 "InType0", 0x7fffffffcf31 "InType3",
 0x7fffffffcf39 "InType5", 0x7fffffffcf41 "InType8",
 0x7fffffffcf49 "InType11", 0x7fffffffcf52 "InType141",
 0x7fffffffcf5c "OutType0", 0x7fffffffcf65 "OutType3",
 0x7fffffffcf6e "OutType8", 0x7fffffffcf77 "OutType11",
(gdb) p j
5

... in other words, it was parsing "InType141".
(gdb) p fields[0]
$22 = {field = 0x2aaaae5b989f "InType%u", offset = 0x2aaaae7c4d30 <_pm_proc_net_snmp+368>}
(gdb) p fields[1]
$23 = {field = 0x2aaaae5b98a8 "OutType%u", offset = 0x2aaaae7c4db0 <_pm_proc_net_snmp+496>}
(gdb) x/i $pc
=> 0x2aaaae5b53f8 <refresh_proc_net_snmp+1080>: mov %rax,0x0(%r13)
(gdb) p $rax
$25 = 2
(gdb) p/x $r13
$27 = 0x2aaaae7c5198

Note that the pointer is indeed 141 8-byte words (1128 bytes) past fields[0].offset, which is considerably larger than the whole array:

(gdb) p sizeof(_pm_proc_net_snmp.icmpmsg)
$42 = 256

We need some better range checking and probably larger default limits.
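For the range checking, something along these lines might do. This is only a sketch of the idea, not the actual PCP patch: struct ordinal_field, its count member, MAX_COLUMNS and parse_ordinal_fields() are names made up for illustration. The loop mirrors the one at proc_net_snmp.c:268-277, but each table entry carries the size of its counter array and an out-of-range ordinal is skipped instead of written.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_COLUMNS 64  /* assumed cap on columns per line (hypothetical) */

/* Hypothetical stand-in for the PMDA's field table entries. */
struct ordinal_field {
    const char *format;  /* sscanf format, e.g. "InType%u" */
    uint64_t   *values;  /* base of the counter array */
    unsigned    count;   /* number of slots in that array */
};

/*
 * Parse one header/values line pair such as
 *   "IcmpMsg: InType0 InType141" / "IcmpMsg: 201 2"
 * Both buffers are modified by strtok(). Returns the number of columns
 * skipped because their ordinal did not fit in the target array.
 */
static int
parse_ordinal_fields(struct ordinal_field *fields, char *header, char *values)
{
    char *names[MAX_COLUMNS];
    char *p;
    int ncols = 0, skipped = 0;

    assert(fields != NULL && header != NULL && values != NULL);

    /* First token is the "IcmpMsg:" prefix; the rest are column names. */
    if (strtok(header, " \n") == NULL)
        return 0;
    while (ncols < MAX_COLUMNS && (p = strtok(NULL, " \n")) != NULL)
        names[ncols++] = p;

    /* Same again for the values line, matching columns by position. */
    if (strtok(values, " \n") == NULL)
        return 0;
    for (int j = 0; j < ncols; j++) {
        if ((p = strtok(NULL, " \n")) == NULL)
            break;
        for (int i = 0; fields[i].format != NULL; i++) {
            unsigned inst;

            if (sscanf(names[j], fields[i].format, &inst) != 1)
                continue;
            if (inst < fields[i].count)     /* the missing range check */
                fields[i].values[inst] = strtoull(p, NULL, 10);
            else
                skipped++;
            break;
        }
    }
    return skipped;
}
```

Going by the gdb output, the two offsets (+368 and +496) are 128 bytes apart, i.e. 16 8-byte counters per direction, which agrees with the sizeof of 256. With count set to 16, the InType141 column would be counted as skipped rather than stored 1128 bytes past the base. As for larger limits, the ICMP type is an 8-bit field on the wire, so 256 slots per direction would be enough to record every possible type instead of dropping it.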
The only remaining mystery is where did the string "InType141" come from? It appears to be larger than all valid ICMP types (include/uapi/linux/icmp.h goes up to NR_ICMP_TYPES, which is just 18), and it wasn't found by the grep. I was looking further at how to reproduce the crash and test the fix, following on from our discussion yesterday. I can't seem to find any way to get ping(1) to set the type explicitly, though, nor am I sure that would even work; still, something has somehow managed to set that bogus type.
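If ping(1) really has no way to choose the type, a raw socket might be one way to try it. The following is a minimal sketch under several assumptions: inet_checksum(), build_icmp_header() and send_icmp_type() are made-up helper names, the packet is a bare 8-byte ICMP header with no payload, and the sender needs root (or CAP_NET_RAW). Whether the receiving kernel then bumps an InType141 counter in /proc/net/snmp is not something I have verified.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* RFC 1071 Internet checksum over len bytes, read as big-endian words. */
static uint16_t
inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)data[i] << 8 | data[i + 1];
    if (len & 1)
        sum += (uint32_t)data[len - 1] << 8;
    while (sum >> 16)                   /* fold carries back in */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Fill an 8-byte ICMP header: type, code, checksum, 4 zero bytes. */
static void
build_icmp_header(uint8_t pkt[8], uint8_t type, uint8_t code)
{
    uint16_t csum;

    assert(pkt != NULL);
    memset(pkt, 0, 8);
    pkt[0] = type;
    pkt[1] = code;
    csum = inet_checksum(pkt, 8);       /* checksum field is zero here */
    pkt[2] = (uint8_t)(csum >> 8);
    pkt[3] = (uint8_t)(csum & 0xff);
}

/* Send one such header to an IPv4 address; 0 on success, -1 on error. */
static int
send_icmp_type(const char *ipv4, uint8_t type)
{
    uint8_t pkt[8];
    struct sockaddr_in dst;
    ssize_t n;
    int fd;

    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;
    if (inet_pton(AF_INET, ipv4, &dst.sin_addr) != 1)
        return -1;
    if ((fd = socket(AF_INET, SOCK_RAW, IPPROTO_ICMP)) < 0)
        return -1;                      /* typically EPERM without root */
    build_icmp_header(pkt, type, 0);
    n = sendto(fd, pkt, sizeof(pkt), 0,
               (struct sockaddr *)&dst, sizeof(dst));
    close(fd);
    return n == (ssize_t)sizeof(pkt) ? 0 : -1;
}
```

Calling send_icmp_type() with the address of a test box and type 141 would be the experiment; only use it against a machine you control.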
"The only remaining mystery is where did the string "InType141" come from?"

I would assume some unusual packet once arrived from the public network: that's an occupational hazard for an internet-connected box. The server's /proc/net/snmp IcmpMsg currently says:

IcmpMsg: InType0 InType3 InType5 InType8 InType11 InType141 OutType0 OutType3 OutType8 OutType11
IcmpMsg: 201 624002 6 206948 1266 2 206948 578121 2338 80

so it wasn't the pmda imagining it.
Just updating BZ state - fix and qa test merged in dev branch, we're expecting this fix to release in pcp-3.8.11 for Fedora.
pcp-3.8.12-1.fc20 has been submitted as an update for Fedora 20. https://admin.fedoraproject.org/updates/pcp-3.8.12-1.fc20
pcp-3.8.12-1.fc19 has been submitted as an update for Fedora 19. https://admin.fedoraproject.org/updates/pcp-3.8.12-1.fc19
pcp-3.8.12-1.el6 has been submitted as an update for Fedora EPEL 6. https://admin.fedoraproject.org/updates/pcp-3.8.12-1.el6
pcp-3.8.12-1.el5 has been submitted as an update for Fedora EPEL 5. https://admin.fedoraproject.org/updates/pcp-3.8.12-1.el5
Package pcp-3.8.12-1.el6:
* should fix your issue,
* was pushed to the Fedora EPEL 6 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=epel-testing pcp-3.8.12-1.el6'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-EPEL-2014-0396/pcp-3.8.12-1.el6
then log in and leave karma (feedback).
pcp-3.8.12-1.fc19 has been pushed to the Fedora 19 stable repository. If problems still persist, please make note of it in this bug report.
pcp-3.8.12-1.fc20 has been pushed to the Fedora 20 stable repository. If problems still persist, please make note of it in this bug report.
pcp-3.8.12-1.el5 has been pushed to the Fedora EPEL 5 stable repository. If problems still persist, please make note of it in this bug report.
pcp-3.8.12-1.el6 has been pushed to the Fedora EPEL 6 stable repository. If problems still persist, please make note of it in this bug report.