Bug 1055818 - pmcd SEGV in linux pmda
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: pcp
Version: 19
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Nathan Scott
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Keywords:
Depends On:
Blocks: 1055826
Reported: 2014-01-21 01:55 UTC by Frank Ch. Eigler
Modified: 2014-02-16 11:20 UTC (History)
Clone Of:
: 1055826 (view as bug list)
Last Closed: 2014-02-07 03:06:06 UTC



Description Frank Ch. Eigler 2014-01-21 01:55:21 UTC
pmcd in the new 3.8.10-1.fc19 build crashes not too long after startup
on one of my x86-64 servers with a SEGV.  It is monitored remotely with
a pmmgr instance (thus the default pmlogconf/pmieconf configs).

# gdb -args /usr/libexec/pcp/bin/pmcd -f
GNU gdb (GDB) Fedora 7.6.1-46.fc19
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/libexec/pcp/bin/pmcd...Reading symbols from /usr/lib/debug/usr/libexec/pcp/bin/pmcd.debug...done.
done.
(gdb) run
Starting program: /usr/libexec/pcp/bin/pmcd -f
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x2aaaadd75700 (LWP 3931)]
Detaching after fork from child process 3932.
Detaching after fork from child process 3933.
Detaching after fork from child process 3934.
Detaching after fork from child process 3935.
[New Thread 0x2aaab497a700 (LWP 3936)]

Program received signal SIGSEGV, Segmentation fault.
0x00002aaaae5b53f8 in get_ordinal_fields (
    fields=0x2aaaae7c2440 <icmpmsg_fields>, buffer=0x7fffffffd320 "IcmpMsg:", 
    header=0x7fffffffcf20 "IcmpMsg:") at proc_net_snmp.c:273
273	            *(fields[i].offset + inst) = strtoull(p, NULL, 10);

(gdb) bt
#0  0x00002aaaae5b53f8 in get_ordinal_fields (
    fields=0x2aaaae7c2440 <icmpmsg_fields>, buffer=0x7fffffffd320 "IcmpMsg:", 
    header=0x7fffffffcf20 "IcmpMsg:") at proc_net_snmp.c:273
#1  refresh_proc_net_snmp (snmp=snmp@entry=0x2aaaae7c4bc0 <_pm_proc_net_snmp>)
    at proc_net_snmp.c:350
#2  0x00002aaaae5aacd7 in linux_refresh (pmda=pmda@entry=0x5555557aaae0, 
    need_refresh=0x7fffffffd7b0) at pmda.c:3156
#3  0x00002aaaae5aafe8 in linux_fetch (numpmid=1, pmidlist=0x55555582c6d0, 
    resp=0x7fffffffd960, pmda=0x5555557aaae0) at pmda.c:4844
#4  0x0000555555560e05 in SendFetch (ctxnum=0, cPtr=0x55555582c3f0, 
    aPtr=0x5555557a82c8, dpList=0x55555582c6b0) at dofetch.c:263
#5  DoFetch (cip=cip@entry=0x55555582c3f0, pb=0x5555557ae000) at dofetch.c:407
#6  0x000055555555aacb in HandleClientInput (
    fdsPtr=fdsPtr@entry=0x7fffffffdc70) at pmcd.c:343
#7  0x000055555555af45 in ClientLoop () at pmcd.c:723
#8  0x0000555555559a6b in main (argc=2, argv=<optimized out>) at pmcd.c:974

# (some time after crash:)  cat /proc/net/snmp | grep IcmpMsg

IcmpMsg: InType0 InType3 OutType3 OutType8
IcmpMsg: 4 90 88 4

(gdb) l
268	        if ((p = strtok(NULL, " \n")) == NULL)
269	            break;
270	        for (i = 0; fields[i].field; i++) {
271	            if (sscanf(indices[j], fields[i].field, &inst) != 1)
272	                continue;
273	            *(fields[i].offset + inst) = strtoull(p, NULL, 10);
274	            break;
275		}
276	    }
277	}

(gdb) p indices
{0x7fffffffcf29 "InType0", 0x7fffffffcf31 "InType3", 
  0x7fffffffcf39 "InType5", 0x7fffffffcf41 "InType8", 
  0x7fffffffcf49 "InType11", 0x7fffffffcf52 "InType141", 
  0x7fffffffcf5c "OutType0", 0x7fffffffcf65 "OutType3", 
  0x7fffffffcf6e "OutType8", 0x7fffffffcf77 "OutType11", 

(gdb) p j
5

... in other words, it was parsing "InType141".

(gdb) p fields[0]
$22 = {field = 0x2aaaae5b989f "InType%u", 
  offset = 0x2aaaae7c4d30 <_pm_proc_net_snmp+368>}
(gdb) p fields[1] 
$23 = {field = 0x2aaaae5b98a8 "OutType%u", 
  offset = 0x2aaaae7c4db0 <_pm_proc_net_snmp+496>}

(gdb) x/i $pc
=> 0x2aaaae5b53f8 <refresh_proc_net_snmp+1080>:	mov    %rax,0x0(%r13)
(gdb) p $rax
$25 = 2
(gdb) p/x $r13
$27 = 0x2aaaae7c5198

Note that the pointer is indeed 141 8-byte words past fields[0].offset
(0x2aaaae7c5198 - 0x2aaaae7c4d30 = 0x468 = 1128 = 141 * 8), well beyond the
end of the whole icmpmsg array:

(gdb) p sizeof(_pm_proc_net_snmp.icmpmsg)
$42 = 256

We need some better range checking and probably larger default limits.
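
A minimal sketch of the kind of range check meant here, not the actual pcp fix. The names store_icmpmsg, struct icmpmsg_counters and ICMPMSG_SLOTS are hypothetical and do not come from proc_net_snmp.c; the slot count of 256 is an assumption chosen to cover every value of the 8-bit ICMP type field.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical limit: 256 slots cover every value of the 8-bit ICMP type. */
#define ICMPMSG_SLOTS 256

/* Illustrative stand-in for the PMDA's counter storage, not its real layout. */
struct icmpmsg_counters {
    unsigned long long in_type[ICMPMSG_SLOTS];
    unsigned long long out_type[ICMPMSG_SLOTS];
};

/*
 * Store one "InTypeN" or "OutTypeN" counter.
 * Returns 0 on success, -1 if the name is not recognised or N is out of range.
 */
int store_icmpmsg(struct icmpmsg_counters *c, const char *name, const char *value)
{
    unsigned int inst;
    unsigned long long *base;

    if (sscanf(name, "InType%u", &inst) == 1)
        base = c->in_type;
    else if (sscanf(name, "OutType%u", &inst) == 1)
        base = c->out_type;
    else
        return -1;

    /* The check that is absent at proc_net_snmp.c:273. */
    if (inst >= ICMPMSG_SLOTS)
        return -1;

    base[inst] = strtoull(value, NULL, 10);
    return 0;
}
```

With this shape, store_icmpmsg(&c, "InType141", "2") lands in c.in_type[141], while an ordinal at or above the limit is rejected instead of being written past the array.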

Comment 1 Nathan Scott 2014-01-21 22:41:03 UTC
The only remaining mystery is where did the string "InType141" come from?  It appears to be larger than all valid ICMP types (from include/uapi/linux/icmp.h which goes up to NR_ICMP_TYPES - just 18) - and it wasn't found by the grep.

Following on from our discussion yesterday, I was looking further at how to reproduce this and test the fix. I can't find any way to get ping(1) to set the type explicitly, though, nor am I sure that would even work. Still, something has somehow managed to set that bogus type, it seems!

Comment 2 Frank Ch. Eigler 2014-01-22 03:03:01 UTC
"The only remaining mystery is where did the string "InType141" come from?"

I would assume some unusual packet once arrived from the public network:
that's an occupational hazard for an internet-connected box.  The server's
/proc/net/snmp IcmpMsg currently says:

IcmpMsg: InType0 InType3 InType5 InType8 InType11 InType141 OutType0 OutType3 OutType8 OutType11
IcmpMsg: 201 624002 6 206948 1266 2 206948 578121 2338 80

so it wasn't the pmda imagining it.
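
For anyone wanting to check their own box, here is a rough sketch of how the header line and the value line pair up token by token, flagging any ordinal above the NR_ICMP_TYPES value of 18 quoted in Comment 1. The function name report_unusual_types and the macro KNOWN_ICMP_TYPES are made up for illustration; this is not code from the pmda.

```c
#define _POSIX_C_SOURCE 200809L  /* for strtok_r */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define KNOWN_ICMP_TYPES 18  /* NR_ICMP_TYPES as quoted in Comment 1 */

/*
 * Walk a header line and its matching value line in step and print any
 * InTypeN/OutTypeN counter whose N is above KNOWN_ICMP_TYPES.
 * Both buffers are modified by strtok_r.
 * Returns the number of such counters found.
 */
int report_unusual_types(char *header, char *values)
{
    char *hsave, *vsave;
    /* The first token on each line is the "IcmpMsg:" tag; skip it. */
    char *h = strtok_r(header, " \n", &hsave);
    char *v = strtok_r(values, " \n", &vsave);
    int unusual = 0;

    while ((h = strtok_r(NULL, " \n", &hsave)) != NULL &&
           (v = strtok_r(NULL, " \n", &vsave)) != NULL) {
        unsigned int type;

        if (sscanf(h, "InType%u", &type) != 1 &&
            sscanf(h, "OutType%u", &type) != 1)
            continue;   /* not an ordinal field */

        if (type > KNOWN_ICMP_TYPES) {
            printf("%s = %llu\n", h, strtoull(v, NULL, 10));
            unusual++;
        }
    }
    return unusual;
}
```

Fed the two IcmpMsg lines above, this reports a single counter, InType141 with the value 2, which matches the $rax value seen in the crash.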

Comment 3 Nathan Scott 2014-01-22 21:53:42 UTC
Just updating BZ state - fix and qa test merged in dev branch, we're expecting this fix to release in pcp-3.8.11 for Fedora.

Comment 4 Fedora Update System 2014-01-29 06:31:47 UTC
pcp-3.8.12-1.fc20 has been submitted as an update for Fedora 20.
https://admin.fedoraproject.org/updates/pcp-3.8.12-1.fc20

Comment 5 Fedora Update System 2014-01-29 06:32:39 UTC
pcp-3.8.12-1.fc19 has been submitted as an update for Fedora 19.
https://admin.fedoraproject.org/updates/pcp-3.8.12-1.fc19

Comment 6 Fedora Update System 2014-01-29 06:34:02 UTC
pcp-3.8.12-1.el6 has been submitted as an update for Fedora EPEL 6.
https://admin.fedoraproject.org/updates/pcp-3.8.12-1.el6

Comment 7 Fedora Update System 2014-01-29 06:34:38 UTC
pcp-3.8.12-1.el5 has been submitted as an update for Fedora EPEL 5.
https://admin.fedoraproject.org/updates/pcp-3.8.12-1.el5

Comment 8 Fedora Update System 2014-01-29 21:23:53 UTC
Package pcp-3.8.12-1.el6:
* should fix your issue,
* was pushed to the Fedora EPEL 6 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=epel-testing pcp-3.8.12-1.el6'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-EPEL-2014-0396/pcp-3.8.12-1.el6
then log in and leave karma (feedback).

Comment 9 Fedora Update System 2014-02-07 03:06:06 UTC
pcp-3.8.12-1.fc19 has been pushed to the Fedora 19 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 10 Fedora Update System 2014-02-07 03:14:16 UTC
pcp-3.8.12-1.fc20 has been pushed to the Fedora 20 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 11 Fedora Update System 2014-02-16 11:17:42 UTC
pcp-3.8.12-1.el5 has been pushed to the Fedora EPEL 5 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 12 Fedora Update System 2014-02-16 11:20:47 UTC
pcp-3.8.12-1.el6 has been pushed to the Fedora EPEL 6 stable repository.  If problems still persist, please make note of it in this bug report.

