RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
Bug 1809077 - snmpd crashes due to double-free when freeing security context
Summary: snmpd crashes due to double-free when freeing security context
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: net-snmp
Version: 8.1
Hardware: All
OS: Linux
medium
medium
Target Milestone: rc
: 8.0
Assignee: Josef Ridky
QA Contact: David Jež
URL:
Whiteboard:
Depends On: 1809076
Blocks:
TreeView+ depends on / blocked
 
Reported: 2020-03-02 11:47 UTC by Renaud Métrich
Modified: 2022-05-01 23:39 UTC (History)
5 users (show)

Fixed In Version: net-snmp-5.8-14.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1809076
Environment:
Last Closed: 2020-04-28 16:47:25 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHELPLAN-39014 0 None None None 2022-05-01 23:39:18 UTC
Red Hat Product Errata RHBA-2020:1817 0 None None None 2020-04-28 16:47:43 UTC

Description Renaud Métrich 2020-03-02 11:47:15 UTC
+++ This bug was initially created as a clone of Bug #1809076 +++

Description of problem:

A customer suffers from many crashes in snmpd since he updated the package to net-snmp-5.7.2-43.el7_7.3.
The core analysis (see next comment) shows the customer is hitting https://sourceforge.net/p/net-snmp/bugs/2956/

Please backport corresponding commit 5f881d3bf24599b90d67a45cae7a3eb099cd71c9

This is a request for RHEL 8 because, from plain looking at the code, the issue should also be hit (code is exactly the same as the one for 5.7.2).


Version-Release number of selected component (if applicable):

net-snmp-5.7.2-43.el7_7.3


How reproducible:

Don't know. Often on customer site.

--- Additional comment from RHEL Program Management on 2020-03-02 11:42:13 UTC ---

Since this bug report was entered in Red Hat Bugzilla, the release flag has been set to ? to ensure that it is properly evaluated for this release.

--- Additional comment from Renaud Métrich on 2020-03-02 11:44:06 UTC ---

The coredump analysis shows the following:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
# gdb /usr/sbin/snmpd ./coredump
[...]
Program terminated with signal 6, Aborted.
#0  0x00007f67cc5d0377 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:55
55	  return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb) bt
#0  0x00007f67cc5d0377 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:55
#1  0x00007f67cc5d1a68 in __GI_abort () at abort.c:90
#2  0x00007f67cc612ec7 in __libc_message (do_abort=do_abort@entry=2, 
    fmt=fmt@entry=0x7f67cc7253f8 "*** Error in `%s': %s: 0x%s ***\n")
    at ../sysdeps/unix/sysv/linux/libc_fatal.c:196
#3  0x00007f67cc61b6b9 in malloc_printerr (ar_ptr=0x7f67cc961760 <main_arena>, ptr=<optimized out>, 
    str=0x7f67cc7254b8 "double free or corruption (fasttop)", action=3) at malloc.c:4967
#4  _int_free (av=0x7f67cc961760 <main_arena>, p=<optimized out>, have_lock=0) at malloc.c:3843
#5  0x00007f67ce05f9ec in free_securityStateRef (pdu=<optimized out>, pdu=<optimized out>)
    at snmp_api.c:3851
#6  0x00007f67ce066b00 in snmp_free_securityStateRef (pdu=<optimized out>) at snmp_api.c:3873
#7  0x00007f67cea22ae2 in free_agent_snmp_session (asp=0x559321baf420) at snmp_agent.c:1436
#8  0x00007f67cea26084 in handle_snmp_packet (op=<optimized out>, session=<optimized out>, 
    reqid=<optimized out>, pdu=0x559321baf300, magic=<optimized out>) at snmp_agent.c:1968
#9  0x00007f67ce06e2fc in _sess_process_packet (sessp=sessp@entry=0x559321b84980, 
    sp=sp@entry=0x559321baf4b0, isp=isp@entry=0x559321baf1e0, transport=transport@entry=0x559321baf250, 
    opaque=<optimized out>, olength=<optimized out>, 
    packetptr=packetptr@entry=0x559321bb8f70 "0\201\227\002\001\003\060\021\002\004\062\376_K\002\003", 
    length=154) at snmp_api.c:5436
#10 0x00007f67ce06eb5e in _sess_read (sessp=sessp@entry=0x559321b84980, 
    fdset=fdset@entry=0x7fff05449a10) at snmp_api.c:5851
#11 0x00007f67ce06f9a9 in snmp_sess_read2 (sessp=sessp@entry=0x559321b84980, 
    fdset=fdset@entry=0x7fff05449a10) at snmp_api.c:5883
#12 0x00007f67ce06f9fb in snmp_read2 (fdset=fdset@entry=0x7fff05449a10) at snmp_api.c:5485
#13 0x00005593206518f8 in receive () at snmpd.c:1341
#14 main (argc=<optimized out>, argv=<optimized out>) at snmpd.c:1130
(gdb) 
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

We had snmpd die while freeing some internal structure. It died because of a Double Free.
From the backtrace, we had the following code executed ("HERE" tag is where the crash happened):
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
1419 void
1420 free_agent_snmp_session(netsnmp_agent_session *asp)
1421 {
 :
1432     /* Clean up securityStateRef here to prevent a double free */
1433     if (asp->orig_pdu && asp->orig_pdu->securityStateRef)
1434         snmp_free_securityStateRef(asp->orig_pdu);
1435     if (asp->pdu && asp->pdu->securityStateRef)
1436         snmp_free_securityStateRef(asp->pdu); <--- HERE
 :
1458 }
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

The code being called, snmp_free_securityStateRef(), resets the pointer pdu->securityStateRef to NULL after freeing:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
3840 static void
3841 free_securityStateRef(netsnmp_pdu* pdu)
3842 {
 :
3848     sptr = find_sec_mod(pdu->securityModel);
3849     if (sptr) {
3850         if (sptr->pdu_free_state_ref) {
3851             (*sptr->pdu_free_state_ref) (pdu->securityStateRef); <--- HERE
3852         } else {
 :
3862     pdu->securityStateRef = NULL;
3863 } 
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

So in theory, this should work fine: this code can be called multiple times.

Unfortunately this assumes that the pointer pdu->securityStateRef is not referenced in other structures, which alas isn't the case:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
(gdb) p asp->orig_pdu 
$20 = (netsnmp_pdu *) 0x559321be11c0
(gdb) p asp->pdu 
$21 = (netsnmp_pdu *) 0x559321be1060
(gdb) p pdu
$26 = (netsnmp_pdu *) 0x559321baf300

(gdb) p *asp->orig_pdu 
$24 = {version = 3, command = 161, reqid = 942092819, msgid = 855531339, transid = 20949, sessid = 0, 
  errstat = 0, errindex = 0, time = 0, flags = 4, securityModel = 3, securityLevel = 3, 
  msgParseModel = 0, transport_data = 0x559321bca040, transport_data_length = 60, 
  tDomain = 0x7f67ce2f3720 <netsnmpUDPDomain>, tDomainLen = 7, variables = 0x559321bcbb60, 
  community = 0x0, community_len = 0, enterprise = 0x0, enterprise_length = 0, trap_type = 0, 
  specific_type = 0, agent_addr = "\000\000\000", contextEngineID = 0x559321be12e0 "\200", 
  contextEngineIDLen = 17, contextName = 0x559321b838c0 "cts2", contextNameLen = 4, 
  securityEngineID = 0x559321b82b80 "\200", securityEngineIDLen = 17, 
  securityName = 0x559321be1300 "XXX", securityNameLen = 9, priority = 0, range_subid = 0, 
  securityStateRef = 0x0}
(gdb) p *asp->pdu 
$25 = {version = 3, command = 161, reqid = 942092819, msgid = 855531339, transid = 20949, sessid = 0, 
  errstat = 0, errindex = 0, time = 0, flags = 4, securityModel = 3, securityLevel = 3, 
  msgParseModel = 0, transport_data = 0x559321ae4000, transport_data_length = 60, 
  tDomain = 0x7f67ce2f3720 <netsnmpUDPDomain>, tDomainLen = 7, variables = 0x559321bcb6e0, 
  community = 0x0, community_len = 0, enterprise = 0x0, enterprise_length = 0, trap_type = 0, 
  specific_type = 0, agent_addr = "\000\000\000", contextEngineID = 0x559321bafdc0 "\200", 
  contextEngineIDLen = 17, contextName = 0x559321be1180 "cts2", contextNameLen = 4, 
  securityEngineID = 0x5593219cdc90 "\200", securityEngineIDLen = 17, 
  securityName = 0x559321be11a0 "XXX", securityNameLen = 9, priority = 0, range_subid = 0, 
  securityStateRef = 0x559321bb8be0}
(gdb) p *pdu
$28 = {version = 3, command = 161, reqid = 942092819, msgid = 855531339, transid = 20949, sessid = 0, 
  errstat = 0, errindex = 0, time = 0, flags = 4, securityModel = 3, securityLevel = 3, 
  msgParseModel = 0, transport_data = 0x559321b8e3a0, transport_data_length = 60, 
  tDomain = 0x7f67ce2f3720 <netsnmpUDPDomain>, tDomainLen = 7, variables = 0x559321be0be0, 
  community = 0x0, community_len = 0, enterprise = 0x0, enterprise_length = 0, trap_type = 0, 
  specific_type = 0, agent_addr = "\000\000\000", contextEngineID = 0x559321bd52d0 "\200", 
  contextEngineIDLen = 17, contextName = 0x559321beabb0 "cts2g\177", contextNameLen = 4, 
  securityEngineID = 0x559321bafd30 "\200", securityEngineIDLen = 17, 
  securityName = 0x559321bb8ad0 "XXX", securityNameLen = 9, priority = 0, range_subid = 0, 
  securityStateRef = 0x559321bb8be0}
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

From the variables above, taken on frame 8, we can see that "pdu", "asp->orig_pdu" and "asp->pdu" are 3 different structures,
but all refer to the same memory at location 0x559321bb8be0 as "*->securityStateRef" pointer.

This explains the crash: free_agent_snmp_session() freed the memory on line 1434 for asp->orig_pdu, and again on line 1436:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
1419 void
1420 free_agent_snmp_session(netsnmp_agent_session *asp)
1421 {
 :
1432     /* Clean up securityStateRef here to prevent a double free */
1433     if (asp->orig_pdu && asp->orig_pdu->securityStateRef)
1434         snmp_free_securityStateRef(asp->orig_pdu);
1435     if (asp->pdu && asp->pdu->securityStateRef)
1436         snmp_free_securityStateRef(asp->pdu); <--- HERE
 :
1458 }
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

Going deeper, we have this code which handles cloning of PDUs (which happens here, "asp->orig_pdu" is a clone of "pdu" and "asp->pdu" is a clone of "asp->orig_pdu"):
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
1388 netsnmp_agent_session *
1389 init_agent_snmp_session(netsnmp_session * session, netsnmp_pdu *pdu)
1390 {
 :
1400     asp->pdu = snmp_clone_pdu(pdu);
1401     asp->orig_pdu = snmp_clone_pdu(pdu);
 :
1417 }



 581 netsnmp_pdu    *
 582 snmp_clone_pdu(netsnmp_pdu *pdu)
 583 {
 584     return _clone_pdu(pdu, 0);  /* copies all variables */
 585 }



 551 static
 552 netsnmp_pdu    *
 553 _clone_pdu(netsnmp_pdu *pdu, int drop_err)
 554 {
 555     netsnmp_pdu    *newpdu;
 556     newpdu = _clone_pdu_header(pdu);
 557     newpdu = _copy_pdu_vars(pdu, newpdu, drop_err, 0, 10000);   /* skip none, copy all */
 558 
 559     return newpdu;
 560 }



 347 static
 348 netsnmp_pdu    *
 349 _clone_pdu_header(netsnmp_pdu *pdu)
 350 {
 :
 394     if (pdu != NULL && pdu->securityStateRef &&
 395         pdu->command == SNMP_MSG_TRAP2) {
 396 
 397         ret = usm_clone_usmStateReference((struct usmStateReference *) pdu->securityStateRef,
 398                 (struct usmStateReference **) &newpdu->securityStateRef );
 :
 405     }
 406 
 407     if ((sptr = find_sec_mod(newpdu->securityModel)) != NULL &&
 408         sptr->pdu_clone != NULL) {
 409         /*
 410          * call security model if it needs to know about this 
 411          */
 412         (*sptr->pdu_clone) (pdu, newpdu);
 413     }
 :
 416 }


#define SNMP_MSG_GETNEXT    (ASN_CONTEXT | ASN_CONSTRUCTOR | 0x1) /* a1=161 */
#define SNMP_MSG_TRAP2      (ASN_CONTEXT | ASN_CONSTRUCTOR | 0x7) /* a7=167 */

(gdb) p asp->pdu->command
$31 = 161
(gdb) p 0x80|0x20|0x7
$32 = 167
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

On line 395 of _clone_pdu_header(), we have a clone happening of the "securityStateRef" field, but here we don't fall into this since "pdu->command" is not 167, but 161.
Later on line 412 we have some clone code again, which is probably what makes "securityStateRef" be reference 3 times in various "pdu", "asp->pdu" and "asp->orig_pdu" structures.

Browsing Upstream's code, I can see a very similar issue being reported:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
commit 5f881d3bf24599b90d67a45cae7a3eb099cd71c9
Author: Bart Van Assche <bvanassche>
Date:   Sat Jul 27 19:34:09 2019 -0700

    libsnmp, USM: Introduce a reference count in struct usmStateReference
    
    This patch fixes https://sourceforge.net/p/net-snmp/bugs/2956/.
---
 snmplib/snmp_client.c | 22 +++++-----------------
 1 file changed, 5 insertions(+), 17 deletions(-)
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

The issue https://sourceforge.net/p/net-snmp/bugs/2956 has a similar backtrace:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
Program terminated with signal SIGABRT, Aborted.
#0  0x00007f3a8a220067 in raise () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0  0x00007f3a8a220067 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f3a8a221448 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007f3a8a25e1b4 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#3  0x00007f3a8a26398e in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#4  0x00007f3a8a264696 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#5  0x00007f3a8a8ced96 in free_securityStateRef (pdu=<optimized out>, pdu=<optimized out>) at snmp_api.c:4018
#6  0x00007f3a8a8d2700 in snmp_free_securityStateRef (pdu=<optimized out>) at snmp_api.c:4040
#7  0x00007f3a8b0ed0ba in free_agent_snmp_session (asp=0x1fc7480) at snmp_agent.c:1612
#8  0x00007f3a8b0f1be4 in handle_snmp_packet (op=<optimized out>, session=<optimized out>, reqid=<optimized out>, pdu=0x1fc2680, magic=<optimized out>) at snmp_agent.c:2251
#9  0x00007f3a8a8deddf in _sess_process_packet_handle_pdu (transport=0x1fb4570, pdu=0x1fc2680, isp=0x1fb4850, sp=0x1fb48e0, sessp=0x1fb4820) at snmp_api.c:5827
#10 _sess_process_packet (sessp=0x1fb4820, sp=0x1fb48e0, isp=0x1fb4850, transport=0x1fb4570, opaque=<optimized out>, olength=<optimized out>, packetptr=0x1fc7b80 "0\201\225\002\001\003\060\021\002\004b\364\240\250\002\003", length=152) at snmp_api.c:5883
#11 0x00007f3a8a8e0342 in _sess_read (sessp=0x1fb4820, fdset=0x7ffd69c89d50) at snmp_api.c:6144
#12 0x00007f3a8a8e0919 in snmp_sess_read2 (sessp=sessp@entry=0x1fb4820, fdset=fdset@entry=0x7ffd69c89d50) at snmp_api.c:6417
#13 0x00007f3a8a8e096b in snmp_read2 (fdset=0x7ffd69c89d50) at snmp_api.c:5932
#14 0x0000000000404bd0 in ?? ()
#15 0x0000000000403e75 in main ()
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

Comment 14 errata-xmlrpc 2020-04-28 16:47:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:1817


Note You need to log in before you can comment on or make changes to this bug.