Bug 672595

Summary: Segfault when starting snmpd
Product: Red Hat Enterprise Linux 6
Component: net-snmp
Version: 6.0
Hardware: x86_64
OS: Linux
Status: CLOSED ERRATA
Severity: high
Priority: high
Reporter: Erinn Looney-Triggs <erinn.looneytriggs>
Assignee: Jan Safranek <jsafrane>
QA Contact: BaseOS QE Security Team <qe-baseos-security>
CC: jwest, ovasik, pingale, spoyarek
Target Milestone: rc
Fixed In Version: net-snmp-5.5-30.el6
Doc Type: Bug Fix
Last Closed: 2011-05-19 14:13:36 UTC
Bug Blocks: 1140236
Attachments (description; flags):
snmpd.conf, standard file all custom changes are at the very end.; flags: none
coredump; flags: none
Java stack trace; flags: none
full stack trace; flags: none
tcpdump of smux; flags: none
Debug output; flags: none
Test program that takes smux-send.raw as input; flags: none
smux requests from the first tcp stream; flags: none
NULL the tail of the smux registration list; flags: none

Description Erinn Looney-Triggs 2011-01-25 17:09:15 UTC
Created attachment 475211 [details]
snmpd.conf, standard file all custom changes are at the very end.

Description of problem:
I am regularly getting segfaults when trying to start snmpd on two separate systems. 

Version-Release number of selected component (if applicable):

net-snmp-5.5-27.el6_0.1.x86_64

How reproducible:
As near as I can tell: install snmpd and try to start it. Starting it will sometimes segfault; after multiple retries it will start. About the only special thing that I know of in my config is that I am actually using SNMPv3. I will attach the config file.

Actual results:
snmpd[2754]: segfault at 2380 ip 00007f063591e8d8 sp 00007fff1fae2e40 error 4 in libnetsnmpagent.so.20.0.0[7f06358fb000+47000]
snmpd[4295]: segfault at 40c ip 00007f2e94daf8d8 sp 00007fff1fe79a70 error 4 in libnetsnmpagent.so.20.0.0[7f2e94d8c000+47000]
snmpd[2764]: segfault at 401 ip 00007ffa5f2448d8 sp 00007fff2c0e1030 error 4 in libnetsnmpagent.so.20.0.0[7ffa5f221000+47000]
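
A note on reading the kernel lines above: each one gives the faulting instruction pointer (ip) and the library mapping as [base+size], so subtracting base from ip gives the offset of the crashing instruction inside libnetsnmpagent.so.20.0.0. The following is a minimal sketch of that arithmetic; the function name fault_offset is made up for illustration and is not part of net-snmp.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: offset of a faulting instruction pointer inside a
 * mapped library. base and size are taken from the "[base+size]" part of
 * the kernel log line (both hexadecimal). */
uint64_t fault_offset(uint64_t ip, uint64_t base, uint64_t size)
{
    /* The ip must lie inside the mapping, otherwise the line was misread. */
    assert(ip >= base && ip - base < size);
    return ip - base;
}
```

For the three lines above, fault_offset(0x7f063591e8d8, 0x7f06358fb000, 0x47000), fault_offset(0x7f2e94daf8d8, 0x7f2e94d8c000, 0x47000) and fault_offset(0x7ffa5f2448d8, 0x7ffa5f221000, 0x47000) all give 0x238d8, which suggests that every crash happens at the same instruction even though the load address differs per run.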

Expected results:

Start up and run. 

I will file a support request as well.

Comment 2 Jan Safranek 2011-01-26 08:39:12 UTC
I can't reproduce the crash; snmpd works well with your config file. It's not possible to get any useful data from the stack trace either. Please ask the support guys how to install a debuginfo package, and please provide a full stack trace with full function names and line numbers. Thanks in advance.

Comment 4 Erinn Looney-Triggs 2011-02-08 19:31:24 UTC
OK, trying, but frankly the support folks aren't being great: they pointed me to outdated documentation for setting up a core dump.

I installed the debuginfo packages, and I am trying to muddle my way through the Fedora page for getting stack traces, so I will get back to you. After installing the debuginfo packages, the segfault output changed slightly:

snmpd[4269] general protection ip:7f20fc10c8d8 sp:7fff1132a980 error:0 in libnetsnmpagent.so.20.0.0[7f20fc0e9000+47000]

Will try to get you more information. 

-Erinn

Comment 5 Erinn Looney-Triggs 2011-02-08 21:36:41 UTC
Looks like abrt was grabbing a coredump all along. Let me know if this is useful to you. As is probably clear, I am not too familiar with coredumps, neither how to gather them nor how to use them.

-Erinn

Comment 6 Erinn Looney-Triggs 2011-02-08 21:45:09 UTC
Created attachment 477695 [details]
coredump

Comment 7 Erinn Looney-Triggs 2011-02-08 21:54:57 UTC
It looks to me like this may be an interaction between net-snmp and the Dell OpenManage tools. The smux peer part looks a bit cagey (if I understand anything from this coredump). Are you able to test on a RHEL 6 x86_64 server with OpenManage installed and configured for SNMP queries?

It also looks like I am getting a coredump on the OpenManage end for the smux part. I am not sure which side of the line this will fall on, Red Hat's problem or Dell's, but there seems to be something there. I will attach the Dell coredump; yes, I know you don't support it, etc.

-Erinn

Comment 8 Erinn Looney-Triggs 2011-02-08 21:58:49 UTC
Created attachment 477697 [details]
Java stack trace

Comment 9 Jan Safranek 2011-02-09 15:56:30 UTC
Created attachment 477847 [details]
full stack trace

Stack trace:

#0  smux_rreq_process (sd=10, ptr=0x7fff2c0e154a "", len=0x7fff2c0e14c0)
    at mibgroup/smux/smux.c:1106
#1  0x00007ffa5f247a1d in smux_pdu_process (fd=10, data=0x7fff2c0e1530 "b\202", length=26)
    at mibgroup/smux/smux.c:768
#2  0x00007ffa5f2480dd in smux_process (fd=10) at mibgroup/smux/smux.c:733
#3  0x00007ffa5f68ff7a in receive (argc=<value optimized out>, argv=<value optimized out>)
    at snmpd.c:1205
#4  main (argc=<value optimized out>, argv=<value optimized out>) at snmpd.c:1060
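
A note on frame #1: the buffer starts with "b\202", i.e. the octets 0x62 0x82. Read as a BER identifier octet followed by a length octet, 0x62 is an application-class, constructed tag number 2, which in SMUX (RFC 1227) is the register request PDU, consistent with the crash being in smux_rreq_process. The sketch below only decodes those two octets; the struct and function names are made up for illustration and are not net-snmp API.

```c
#include <stdint.h>

/* Hypothetical decoded form of a single BER identifier octet. */
struct ber_id {
    unsigned cls;          /* 0 universal, 1 application, 2 context, 3 private */
    unsigned constructed;  /* 1 if the value is constructed, 0 if primitive */
    unsigned tag;          /* low tag number; 31 would mean high-tag-number form */
};

/* Split an identifier octet into class (bits 8-7), constructed flag (bit 6)
 * and tag number (bits 5-1). */
struct ber_id ber_decode_id(uint8_t octet)
{
    struct ber_id id;
    id.cls = (octet >> 6) & 0x3;
    id.constructed = (octet >> 5) & 0x1;
    id.tag = octet & 0x1f;
    return id;
}

/* For a length octet: number of length octets that follow when the long form
 * is used (bit 8 set), or 0 when the octet itself is the short-form length. */
unsigned ber_long_length_octets(uint8_t octet)
{
    return (octet & 0x80) ? (unsigned)(octet & 0x7f) : 0u;
}
```

Here ber_decode_id(0x62) yields class 1 (application), constructed, tag 2, and ber_long_length_octets(0x82) yields 2, meaning the PDU length is carried in the next two octets.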

Comment 10 Jan Safranek 2011-02-09 16:23:13 UTC
From the stack trace, I can see it's smux related and that the local list of active registrations gets corrupted, but I don't see why.

So, some additional debugging is needed. I hope the crash can be reproduced with these steps:

1) stop your snmpd service

2) start capturing SMUX and/or SNMP traffic, e.g. by tcpdump:
   $ tcpdump -i any -s 0 -w smux.pcap "port 199 or port 161"

3) start snmpd in a special way to print verbose SMUX logging:
   $ snmpd -f -Lo -Dsmux >snmpd.log
   (you should see messages like "smux_init: [smux_init] done; smux listen sd is 8, smux port is 199" in the log)

4) now snmpd should crash; maybe you need to restart OpenManage or so... If it does not crash, try again from 1). I just need smux.pcap and snmpd.log from a run where it crashes.

Thanks in advance!

Comment 11 Erinn Looney-Triggs 2011-02-09 17:36:59 UTC
Created attachment 477869 [details]
tcpdump of smux

Ask and ye shall receive :). There are two start ups in the pcap because the first one worked, and the second one crashed.

Comment 12 Erinn Looney-Triggs 2011-02-09 17:37:29 UTC
Created attachment 477870 [details]
Debug output

Comment 14 Erinn Looney-Triggs 2011-02-11 18:18:35 UTC
Can you give me a general idea of what the bug is? Just out of personal interest.

Thanks,
-Erinn

Comment 15 Siddhesh Poyarekar 2011-02-11 21:11:39 UTC
Created attachment 478311 [details]
Test program that takes smux-send.raw as input

Comment 16 Siddhesh Poyarekar 2011-02-11 21:14:37 UTC
Created attachment 478313 [details]
smux requests from the first tcp stream

Attached test program and input requests that can reproduce this issue.

$ gcc -o smux-hack smux-hack.c
$ ./smux-hack smux-send.raw

when snmpd is running. A couple of runs should bring snmpd down.

Comment 18 Siddhesh Poyarekar 2011-02-11 21:43:31 UTC
Created attachment 478319 [details]
NULL the tail of the smux registration list

This is upstream r17904.
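
For readers unfamiliar with this class of bug, here is a minimal sketch of why the tail of a singly linked list has to be NULLed on insert. It is an illustration only, assuming a list sorted by priority; struct reg_node, reg_insert and reg_count are hypothetical names and this is neither the net-snmp code nor the attached patch.

```c
#include <stddef.h>

/* Hypothetical registration record; not the real net-snmp structure. */
struct reg_node {
    int priority;
    struct reg_node *next;
};

/* Insert node into a list sorted by ascending priority and return the
 * (possibly new) head. node->next may hold a stale value on entry. */
struct reg_node *reg_insert(struct reg_node *head, struct reg_node *node)
{
    struct reg_node *prev = NULL;
    struct reg_node *cur = head;

    while (cur != NULL && cur->priority <= node->priority) {
        prev = cur;
        cur = cur->next;
    }

    /* Always overwrite node->next. When node becomes the tail, cur is NULL,
     * so this is the step that terminates the list. Skipping it in the tail
     * case leaves a stale pointer that later traversals will follow. */
    node->next = cur;

    if (prev == NULL)
        return node;        /* node is the new head */
    prev->next = node;
    return head;
}

/* Walk the list until the NULL terminator and count the nodes. */
size_t reg_count(const struct reg_node *head)
{
    size_t n = 0;
    for (; head != NULL; head = head->next)
        n++;
    return n;
}
```

If the tail assignment is missing and the node's next field still holds an old value, a walk such as reg_count runs past the end of the list into unrelated memory, which would match the varying fault addresses and the intermittent nature of the crash reported here.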

Comment 19 Jan Safranek 2011-02-14 13:32:30 UTC
Siddhesh, thanks a lot for your effort! It saves me a lot of work. I wish all bugs from support got this attention!

I confirm it's snmpd's fault; the list of registrations is really broken.

Comment 24 Jan Safranek 2011-05-11 10:56:41 UTC
*** Bug 676537 has been marked as a duplicate of this bug. ***

Comment 25 errata-xmlrpc 2011-05-19 14:13:36 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-0729.html