Bug 729738
Summary: | net-snmp dumps core in netsnmp_oid_find_prefix
---|---
Product: | Red Hat Enterprise Linux 6
Reporter: | Martin Wilck <martin.wilck>
Component: | net-snmp
Assignee: | Jan Safranek <jsafrane>
Status: | CLOSED ERRATA
QA Contact: | BaseOS QE Security Team <qe-baseos-security>
Severity: | medium
Docs Contact: |
Priority: | medium
Version: | 6.0
CC: | gasmith, josef.moellers, ksrot, rvokal
Target Milestone: | rc
Target Release: | 6.2
Hardware: | x86_64
OS: | Linux
URL: | http://sourceforge.net/tracker/index.php?func=detail&aid=1633670&group_id=12694&atid=112694
Whiteboard: |
Fixed In Version: |
Doc Type: | Bug Fix
Doc Text: |
When an AgentX subagent was being disconnected from the snmpd daemon, the daemon did not properly detach all outstanding SNMP requests from the internal session object representing the AgentX subagent. Therefore, the snmpd daemon could crash when processing these requests. With this update, the snmpd daemon ensures that no outstanding SNMP request points to an AgentX session that is being closed.
|
Story Points: | ---
Clone Of: |
Environment: |
Last Closed: | 2011-12-06 17:12:16 UTC
Type: | ---
Regression: | ---
Mount Type: | ---
Documentation: | ---
CRM: |
Verified Versions: |
Category: | ---
oVirt Team: | ---
RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | ---
Target Upstream Version: |
Embargoed: |
Bug Depends On: |
Bug Blocks: | 696653
Attachments: |
Description
Martin Wilck
2011-08-10 17:37:26 UTC
Created attachment 517664 [details]
core dump + infos
From the coredump I can only see that the tree cache becomes corrupted, without any indication why. snmpd crashed when processing a GETNEXT request for OID 1.3.6.1.4.1.231.2.10.2.2.1, while some AgentX subagent was being disconnected. I cannot tell whether the disconnected subagent was involved in handling the 1.3.6.1.4.1.231.2.10.2.2.1 OID or not...
I suppose you cannot share details about how do you use AgentX? How many subagents do you have, and how often do they disconnect/reconnect? With the information above (GETNEXT && a subagent being disconnected), can you reproduce the bug in a more reliable way? I'll try to investigate the crash further in parallel, but without your subagent(s), I am mostly blind.
Created attachment 517981 [details]
SRVMAGT-BIOS MIB
iso.org.dod.internet.private = 1.3.6.1.4
enterprises(1).sni(231).sniProductMibs(2).sniExtensions(10).sniServerMgmt(2).sniCommon(2).sniBios(1)
sniBiosVersionMajor OBJECT-TYPE
SYNTAX INTEGER
ACCESS read-only
STATUS mandatory
DESCRIPTION
"Major Version of the BIOS"
::= { sniBios 1 }
sniBiosVersionMinor OBJECT-TYPE
SYNTAX INTEGER
ACCESS read-only
STATUS mandatory
DESCRIPTION
"Minor Version of the BIOS"
::= { sniBios 2 }
sniBiosDiagnosticStatus OBJECT-TYPE
SYNTAX INTEGER
ACCESS read-only
STATUS mandatory
DESCRIPTION
"A bit field:
BIT MEANING
0 Timeout reading an adapter ID (eisa)
1 Adapter do not match configuration(eisa)
2 CMOS RAM time found invalid
3 Fixed disk/adapter fails initialization
4 Memory size compare error at POST
5 Invalid configuration information found at POST
6 CMOS RAM checksum is bad
7 Real-time clock lost power"
::= { sniBios 3 }
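To make the sniBiosDiagnosticStatus bit field easier to read when it shows up in a query result, here is a minimal decoding sketch. The bit meanings are copied from the DESCRIPTION above; the assumption that bit 0 is the least significant bit is mine and is not stated in the MIB, and the names `DIAGNOSTIC_BITS` and `decode_diagnostic_status` are hypothetical.

```python
from typing import List

# Bit meanings copied from the sniBiosDiagnosticStatus DESCRIPTION.
# ASSUMPTION: bit 0 is the least significant bit of the INTEGER value.
DIAGNOSTIC_BITS = {
    0: "Timeout reading an adapter ID (eisa)",
    1: "Adapter do not match configuration(eisa)",
    2: "CMOS RAM time found invalid",
    3: "Fixed disk/adapter fails initialization",
    4: "Memory size compare error at POST",
    5: "Invalid configuration information found at POST",
    6: "CMOS RAM checksum is bad",
    7: "Real-time clock lost power",
}


def decode_diagnostic_status(value: int) -> List[str]:
    """Return the meaning of every bit that is set in value, lowest bit first."""
    return [
        meaning
        for bit, meaning in sorted(DIAGNOSTIC_BITS.items())
        if value & (1 << bit)
    ]


if __name__ == "__main__":
    # 0x44 has bits 2 and 6 set.
    for meaning in decode_diagnostic_status(0x44):
        print(meaning)
```

If the agent actually numbers bits from the most significant end, the mapping would need to be reversed.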
Created attachment 517983 [details]
serverview 5.10.22
Here are the binaries of our agents, including all MIBS, docs, snmp configuration etc. They will probably only work on a PRIMERGY. Gary should be able to provide access to one (and install the agents) if needed.
Here is a statement from our agent developer:
"These OIDs are served by our BIOS agent. I can't imagine why a problem should occur with these OIDs, this is more likely to be related to the internal processing of net-snmpd. Of all our agents, the BIOS agent is the one which has least to do."
"There are the following ServerView subagents: sc sc2 bus hd unix ether bios secur status inv thr vv hpsim vme. The process name is the agent name + "agt", e.g. scagt, busagt, etc."
"These subagents will register with snmpd when they start and unregister when they are stopped. However it happens sometimes that the AgentX communication is interrupted and must be reestablished. We see that once in a while in our traces."
"The question 'I suppose you cannot share details about how do you use AgentX?' can't be answered easily because this code is very ancient and it's not exactly clear what the question is targeted at."
Some more information from my side: most of our agents don't procure the information for net-snmp directly. Rather, they communicate with a separate daemon (eecd) which collects the data.
With dummy AgentX subagent which disconnects during first GETNEXT query and reconnects (+ lot of GETNEXT requests), I was able to get sigsegv once.
Valgrind tells me:
==4052==    at 0x4E40749: netsnmp_remove_delegated_requests_for_session (in /usr/lib64/libnetsnmpagent.so.20.0.0)
==4052==    by 0x4E60FB7: close_agentx_session (in /usr/lib64/libnetsnmpagent.so.20.0.0)
==4052==    by 0x4E6156B: handle_master_agentx_packet (in /usr/lib64/libnetsnmpagent.so.20.0.0)
==4052==    by 0x6CF466E: _sess_read (in /usr/lib64/libnetsnmp.so.20.0.0)
==4052==    by 0x6CF5048: snmp_sess_read2 (in /usr/lib64/libnetsnmp.so.20.0.0)
==4052==    by 0x6CF510A: snmp_read2 (in /usr/lib64/libnetsnmp.so.20.0.0)
==4052==    by 0x10CFDD: main (in /usr/sbin/snmpd)
==4052==  Address 0xbca8218 is 72 bytes inside a block of size 152 free'd
==4052==    at 0x4C2695D: free (vg_replace_malloc.c:366)
==4052==    by 0x4E43FCE: unregister_mibs_by_session (in /usr/lib64/libnetsnmpagent.so.20.0.0)
==4052==    by 0x4E60EA7: close_agentx_session (in /usr/lib64/libnetsnmpagent.so.20.0.0)
==4052==    by 0x4E617EB: handle_master_agentx_packet (in /usr/lib64/libnetsnmpagent.so.20.0.0)
==4052==    by 0x6CF3867: ??? (in /usr/lib64/libnetsnmp.so.20.0.0)
==4052==    by 0x6CF47A1: _sess_read (in /usr/lib64/libnetsnmp.so.20.0.0)
==4052==    by 0x6CF5048: snmp_sess_read2 (in /usr/lib64/libnetsnmp.so.20.0.0)
==4052==    by 0x6CF510A: snmp_read2 (in /usr/lib64/libnetsnmp.so.20.0.0)
==4052==    by 0x10CFDD: main (in /usr/sbin/snmpd)
But I still cannot reproduce it reliably.
If you can give me instructions how to run the instrumented SNMP daemon we can try to reproduce the problem here again. I have asked QA to reproduce it with 6.1 first, because this one was originally reported for 6.0. There is no indication in the change logs though that it's fixed in 6.1.
I've uploaded reproducer to upstream bug tracker, https://sourceforge.net/tracker/index.php?func=detail&aid=1633670&group_id=12694&atid=112694
I have successfully crashed net-snmp-5.7, upstream trunk and also RHEL 6.2 build I made today for RHEL 6.2 errata, so the bug is reproducible everywhere, probably incl. RHEL 6.1.
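For readers following the Valgrind output: it reports an access from netsnmp_remove_delegated_requests_for_session to a block that had already been freed via unregister_mibs_by_session. The toy model below is not net-snmp code and does not reproduce the crash; it only sketches the general idea of detaching every outstanding request from a session before that session is released. All class and function names (`Session`, `Request`, `close_session`, `dangling_requests`) and the session names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Session:
    """Hypothetical stand-in for a master-side AgentX session object."""
    name: str
    closed: bool = False


@dataclass
class Request:
    """Hypothetical stand-in for an SNMP request delegated to a subagent."""
    oid: str
    session: Optional[Session]
    status: str = "pending"


def close_session(session: Session, outstanding: List[Request]) -> int:
    """Detach all outstanding requests from session, then mark it closed.

    Returns the number of requests that were detached.
    """
    detached = 0
    for req in outstanding:
        if req.session is session:
            req.session = None      # no request may keep pointing at the session
            req.status = "failed"   # the subagent can no longer answer it
            detached += 1
    session.closed = True           # only now is the session considered released
    return detached


def dangling_requests(outstanding: List[Request]) -> List[Request]:
    """Requests that still reference a closed session (the hazardous state)."""
    return [r for r in outstanding if r.session is not None and r.session.closed]


if __name__ == "__main__":
    bios = Session("biosagt")
    other = Session("scagt")
    queue = [
        Request("1.3.6.1.4.1.231.2.10.2.2.1", bios),
        Request("1.3.6.1.4.1.231.2.10.2.2.1", other),
    ]
    print(close_session(bios, queue), len(dangling_requests(queue)))
```

In C the equivalent of a dangling request is a pointer into freed memory, which is why the order of detaching and freeing matters there.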
That looks promising, thanks a lot for digging into this problem.
I sent a fix to the upstream bug tracker. It's not perfect, but at least snmpd does not crash.
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. New Contents: When an AgentX subagent was being disconnected from the snmpd daemon, the daemon did not properly detach all outstanding SNMP requests from the internal session object representing the AgentX subagent. Therefore, the snmpd daemon could crash when processing these requests. With this update, the snmpd daemon ensures that no outstanding SNMP request points to an AgentX session that is being closed.
(In reply to comment #9)
> I've uploaded reproducer to upstream bug tracker,
> https://sourceforge.net/tracker/index.php?func=detail&aid=1633670&group_id=12694&atid=112694
Link has moved to https://sourceforge.net/tracker/?func=detail&aid=1633670&group_id=12694&atid=312694
QA found out that if the AgentX subagent disconnects while processing a request, the request then leaks a bit of memory in the master snmpd (approx. 44 bytes per such request).
Valgrind report:
==8326==    at 0x4A04A28: calloc (vg_replace_malloc.c:467)
==8326==    by 0x4C33E6A: netsnmp_create_delegated_cache (agent_handler.c:713)
==8326==    by 0x4C36BC9: agentx_master_handler (master.c:591)
==8326==    by 0x4C3642E: netsnmp_call_handlers (agent_handler.c:440)
==8326==    by 0x4C26710: handle_var_requests (snmp_agent.c:2611)
==8326==    by 0x4C28395: handle_pdu (snmp_agent.c:3407)
==8326==    by 0x4C2A7EF: netsnmp_handle_request (snmp_agent.c:3203)
==8326==    by 0x4C2B2A9: handle_snmp_packet (snmp_agent.c:1929)
==8326==    by 0x6AD6867: _sess_process_packet (snmp_api.c:5604)
==8326==    by 0x6AD71FF: _sess_read (snmp_api.c:6043)
==8326==    by 0x6AD8048: snmp_sess_read2 (snmp_api.c:6075)
==8326==    by 0x6AD810A: snmp_read2 (snmp_api.c:5667)
I assume the AgentX subagents disconnect very rarely and this memory leak happens only in very exceptional cases, so I left the leak there for now, while working on it upstream. Please reopen the bug if your AgentX subagents disconnect often enough that the leak might matter.
I have filed a new bug 736580 for the memory leak.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2011-1524.html