Bug 1809077

Summary: snmpd crashes due to double-free when freeing security context
Product: Red Hat Enterprise Linux 8 Reporter: Renaud Métrich <rmetrich>
Component: net-snmpAssignee: Josef Ridky <jridky>
Status: CLOSED ERRATA QA Contact: David Jež <djez>
Severity: medium Docs Contact:
Priority: medium    
Version: 8.1CC: djez, jridky, lmiksik, ovasik, qe-baseos-apps
Target Milestone: rc   
Target Release: 8.0   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: net-snmp-5.8-14.el8 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1809076 Environment:
Last Closed: 2020-04-28 16:47:25 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1809076    
Bug Blocks:    

Description Renaud Métrich 2020-03-02 11:47:15 UTC
+++ This bug was initially created as a clone of Bug #1809076 +++

Description of problem:

A customer suffers from many crashes in snmpd since he updated the package to net-snmp-5.7.2-43.el7_7.3.
The core analysis (see next comment) shows the customer is hitting https://sourceforge.net/p/net-snmp/bugs/2956/

Please backport corresponding commit 5f881d3bf24599b90d67a45cae7a3eb099cd71c9

This is a request for RHEL 8 because, from plain looking at the code, the issue should also be hit (code is exactly the same as the one for 5.7.2).


Version-Release number of selected component (if applicable):

net-snmp-5.7.2-43.el7_7.3


How reproducible:

Don't know. Often on customer site.

--- Additional comment from RHEL Program Management on 2020-03-02 11:42:13 UTC ---

Since this bug report was entered in Red Hat Bugzilla, the release flag has been set to ? to ensure that it is properly evaluated for this release.

--- Additional comment from Renaud Métrich on 2020-03-02 11:44:06 UTC ---

The coredump analysis shows the following:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
# gdb /usr/sbin/snmpd ./coredump
[...]
Program terminated with signal 6, Aborted.
#0  0x00007f67cc5d0377 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:55
55	  return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb) bt
#0  0x00007f67cc5d0377 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:55
#1  0x00007f67cc5d1a68 in __GI_abort () at abort.c:90
#2  0x00007f67cc612ec7 in __libc_message (do_abort=do_abort@entry=2, 
    fmt=fmt@entry=0x7f67cc7253f8 "*** Error in `%s': %s: 0x%s ***\n")
    at ../sysdeps/unix/sysv/linux/libc_fatal.c:196
#3  0x00007f67cc61b6b9 in malloc_printerr (ar_ptr=0x7f67cc961760 <main_arena>, ptr=<optimized out>, 
    str=0x7f67cc7254b8 "double free or corruption (fasttop)", action=3) at malloc.c:4967
#4  _int_free (av=0x7f67cc961760 <main_arena>, p=<optimized out>, have_lock=0) at malloc.c:3843
#5  0x00007f67ce05f9ec in free_securityStateRef (pdu=<optimized out>, pdu=<optimized out>)
    at snmp_api.c:3851
#6  0x00007f67ce066b00 in snmp_free_securityStateRef (pdu=<optimized out>) at snmp_api.c:3873
#7  0x00007f67cea22ae2 in free_agent_snmp_session (asp=0x559321baf420) at snmp_agent.c:1436
#8  0x00007f67cea26084 in handle_snmp_packet (op=<optimized out>, session=<optimized out>, 
    reqid=<optimized out>, pdu=0x559321baf300, magic=<optimized out>) at snmp_agent.c:1968
#9  0x00007f67ce06e2fc in _sess_process_packet (sessp=sessp@entry=0x559321b84980, 
    sp=sp@entry=0x559321baf4b0, isp=isp@entry=0x559321baf1e0, transport=transport@entry=0x559321baf250, 
    opaque=<optimized out>, olength=<optimized out>, 
    packetptr=packetptr@entry=0x559321bb8f70 "0\201\227\002\001\003\060\021\002\004\062\376_K\002\003", 
    length=154) at snmp_api.c:5436
#10 0x00007f67ce06eb5e in _sess_read (sessp=sessp@entry=0x559321b84980, 
    fdset=fdset@entry=0x7fff05449a10) at snmp_api.c:5851
#11 0x00007f67ce06f9a9 in snmp_sess_read2 (sessp=sessp@entry=0x559321b84980, 
    fdset=fdset@entry=0x7fff05449a10) at snmp_api.c:5883
#12 0x00007f67ce06f9fb in snmp_read2 (fdset=fdset@entry=0x7fff05449a10) at snmp_api.c:5485
#13 0x00005593206518f8 in receive () at snmpd.c:1341
#14 main (argc=<optimized out>, argv=<optimized out>) at snmpd.c:1130
(gdb) 
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

We had snmpd die while freeing some internal structure. It died because of a Double Free.
From the backtrace, we had the following code executed ("HERE" tag is where the crash happened):
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
1419 void
1420 free_agent_snmp_session(netsnmp_agent_session *asp)
1421 {
 :
1432     /* Clean up securityStateRef here to prevent a double free */
1433     if (asp->orig_pdu && asp->orig_pdu->securityStateRef)
1434         snmp_free_securityStateRef(asp->orig_pdu);
1435     if (asp->pdu && asp->pdu->securityStateRef)
1436         snmp_free_securityStateRef(asp->pdu); <--- HERE
 :
1458 }
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

The code being called, snmp_free_securityStateRef(), resets the pointer pdu->securityStateRef to NULL after freeing:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
3840 static void
3841 free_securityStateRef(netsnmp_pdu* pdu)
3842 {
 :
3848     sptr = find_sec_mod(pdu->securityModel);
3849     if (sptr) {
3850         if (sptr->pdu_free_state_ref) {
3851             (*sptr->pdu_free_state_ref) (pdu->securityStateRef); <--- HERE
3852         } else {
 :
3862     pdu->securityStateRef = NULL;
3863 } 
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

So in theory, this should work fine: this code can be called multiple times.

Unfortunately this assumes that the pointer pdu->securityStateRef is not referenced in other structures, which alas isn't the case:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
(gdb) p asp->orig_pdu 
$20 = (netsnmp_pdu *) 0x559321be11c0
(gdb) p asp->pdu 
$21 = (netsnmp_pdu *) 0x559321be1060
(gdb) p pdu
$26 = (netsnmp_pdu *) 0x559321baf300

(gdb) p *asp->orig_pdu 
$24 = {version = 3, command = 161, reqid = 942092819, msgid = 855531339, transid = 20949, sessid = 0, 
  errstat = 0, errindex = 0, time = 0, flags = 4, securityModel = 3, securityLevel = 3, 
  msgParseModel = 0, transport_data = 0x559321bca040, transport_data_length = 60, 
  tDomain = 0x7f67ce2f3720 <netsnmpUDPDomain>, tDomainLen = 7, variables = 0x559321bcbb60, 
  community = 0x0, community_len = 0, enterprise = 0x0, enterprise_length = 0, trap_type = 0, 
  specific_type = 0, agent_addr = "\000\000\000", contextEngineID = 0x559321be12e0 "\200", 
  contextEngineIDLen = 17, contextName = 0x559321b838c0 "cts2", contextNameLen = 4, 
  securityEngineID = 0x559321b82b80 "\200", securityEngineIDLen = 17, 
  securityName = 0x559321be1300 "XXX", securityNameLen = 9, priority = 0, range_subid = 0, 
  securityStateRef = 0x0}
(gdb) p *asp->pdu 
$25 = {version = 3, command = 161, reqid = 942092819, msgid = 855531339, transid = 20949, sessid = 0, 
  errstat = 0, errindex = 0, time = 0, flags = 4, securityModel = 3, securityLevel = 3, 
  msgParseModel = 0, transport_data = 0x559321ae4000, transport_data_length = 60, 
  tDomain = 0x7f67ce2f3720 <netsnmpUDPDomain>, tDomainLen = 7, variables = 0x559321bcb6e0, 
  community = 0x0, community_len = 0, enterprise = 0x0, enterprise_length = 0, trap_type = 0, 
  specific_type = 0, agent_addr = "\000\000\000", contextEngineID = 0x559321bafdc0 "\200", 
  contextEngineIDLen = 17, contextName = 0x559321be1180 "cts2", contextNameLen = 4, 
  securityEngineID = 0x5593219cdc90 "\200", securityEngineIDLen = 17, 
  securityName = 0x559321be11a0 "XXX", securityNameLen = 9, priority = 0, range_subid = 0, 
  securityStateRef = 0x559321bb8be0}
(gdb) p *pdu
$28 = {version = 3, command = 161, reqid = 942092819, msgid = 855531339, transid = 20949, sessid = 0, 
  errstat = 0, errindex = 0, time = 0, flags = 4, securityModel = 3, securityLevel = 3, 
  msgParseModel = 0, transport_data = 0x559321b8e3a0, transport_data_length = 60, 
  tDomain = 0x7f67ce2f3720 <netsnmpUDPDomain>, tDomainLen = 7, variables = 0x559321be0be0, 
  community = 0x0, community_len = 0, enterprise = 0x0, enterprise_length = 0, trap_type = 0, 
  specific_type = 0, agent_addr = "\000\000\000", contextEngineID = 0x559321bd52d0 "\200", 
  contextEngineIDLen = 17, contextName = 0x559321beabb0 "cts2g\177", contextNameLen = 4, 
  securityEngineID = 0x559321bafd30 "\200", securityEngineIDLen = 17, 
  securityName = 0x559321bb8ad0 "XXX", securityNameLen = 9, priority = 0, range_subid = 0, 
  securityStateRef = 0x559321bb8be0}
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

From the variables above, taken on frame 8, we can see that "pdu", "asp->orig_pdu" and "asp->pdu" are 3 different structures,
but all refer to the same memory at location 0x559321bb8be0 as "*->securityStateRef" pointer.

This explains the crash: free_agent_snmp_session() freed the memory on line 1434 for asp->orig_pdu, and again on line 1436:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
1419 void
1420 free_agent_snmp_session(netsnmp_agent_session *asp)
1421 {
 :
1432     /* Clean up securityStateRef here to prevent a double free */
1433     if (asp->orig_pdu && asp->orig_pdu->securityStateRef)
1434         snmp_free_securityStateRef(asp->orig_pdu);
1435     if (asp->pdu && asp->pdu->securityStateRef)
1436         snmp_free_securityStateRef(asp->pdu); <--- HERE
 :
1458 }
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

Going deeper, we have this code which handles cloning of PDUs (which happens here, "asp->orig_pdu" is a clone of "pdu" and "asp->pdu" is a clone of "asp->orig_pdu"):
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
1388 netsnmp_agent_session *
1389 init_agent_snmp_session(netsnmp_session * session, netsnmp_pdu *pdu)
1390 {
 :
1400     asp->pdu = snmp_clone_pdu(pdu);
1401     asp->orig_pdu = snmp_clone_pdu(pdu);
 :
1417 }



 581 netsnmp_pdu    *
 582 snmp_clone_pdu(netsnmp_pdu *pdu)
 583 {
 584     return _clone_pdu(pdu, 0);  /* copies all variables */
 585 }



 551 static
 552 netsnmp_pdu    *
 553 _clone_pdu(netsnmp_pdu *pdu, int drop_err)
 554 {
 555     netsnmp_pdu    *newpdu;
 556     newpdu = _clone_pdu_header(pdu);
 557     newpdu = _copy_pdu_vars(pdu, newpdu, drop_err, 0, 10000);   /* skip none, copy all */
 558 
 559     return newpdu;
 560 }



 347 static
 348 netsnmp_pdu    *
 349 _clone_pdu_header(netsnmp_pdu *pdu)
 350 {
 :
 394     if (pdu != NULL && pdu->securityStateRef &&
 395         pdu->command == SNMP_MSG_TRAP2) {
 396 
 397         ret = usm_clone_usmStateReference((struct usmStateReference *) pdu->securityStateRef,
 398                 (struct usmStateReference **) &newpdu->securityStateRef );
 :
 405     }
 406 
 407     if ((sptr = find_sec_mod(newpdu->securityModel)) != NULL &&
 408         sptr->pdu_clone != NULL) {
 409         /*
 410          * call security model if it needs to know about this 
 411          */
 412         (*sptr->pdu_clone) (pdu, newpdu);
 413     }
 :
 416 }


#define SNMP_MSG_GETNEXT    (ASN_CONTEXT | ASN_CONSTRUCTOR | 0x1) /* a1=161 */
#define SNMP_MSG_TRAP2      (ASN_CONTEXT | ASN_CONSTRUCTOR | 0x7) /* a7=167 */

(gdb) p asp->pdu->command
$31 = 161
(gdb) p 0x80|0x20|0x7
$32 = 167
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

On line 395 of _clone_pdu_header(), we have a clone happening of the "securityStateRef" field, but here we don't fall into this since "pdu->command" is not 167, but 161.
Later on line 412 we have some clone code again, which is probably what makes "securityStateRef" be reference 3 times in various "pdu", "asp->pdu" and "asp->orig_pdu" structures.

Browsing Upstream's code, I can see a very similar issue being reported:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
commit 5f881d3bf24599b90d67a45cae7a3eb099cd71c9
Author: Bart Van Assche <bvanassche>
Date:   Sat Jul 27 19:34:09 2019 -0700

    libsnmp, USM: Introduce a reference count in struct usmStateReference
    
    This patch fixes https://sourceforge.net/p/net-snmp/bugs/2956/.
---
 snmplib/snmp_client.c | 22 +++++-----------------
 1 file changed, 5 insertions(+), 17 deletions(-)
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

The issue https://sourceforge.net/p/net-snmp/bugs/2956 has a similar backtrace:
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
Program terminated with signal SIGABRT, Aborted.
#0  0x00007f3a8a220067 in raise () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0  0x00007f3a8a220067 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f3a8a221448 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007f3a8a25e1b4 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#3  0x00007f3a8a26398e in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#4  0x00007f3a8a264696 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#5  0x00007f3a8a8ced96 in free_securityStateRef (pdu=<optimized out>, pdu=<optimized out>) at snmp_api.c:4018
#6  0x00007f3a8a8d2700 in snmp_free_securityStateRef (pdu=<optimized out>) at snmp_api.c:4040
#7  0x00007f3a8b0ed0ba in free_agent_snmp_session (asp=0x1fc7480) at snmp_agent.c:1612
#8  0x00007f3a8b0f1be4 in handle_snmp_packet (op=<optimized out>, session=<optimized out>, reqid=<optimized out>, pdu=0x1fc2680, magic=<optimized out>) at snmp_agent.c:2251
#9  0x00007f3a8a8deddf in _sess_process_packet_handle_pdu (transport=0x1fb4570, pdu=0x1fc2680, isp=0x1fb4850, sp=0x1fb48e0, sessp=0x1fb4820) at snmp_api.c:5827
#10 _sess_process_packet (sessp=0x1fb4820, sp=0x1fb48e0, isp=0x1fb4850, transport=0x1fb4570, opaque=<optimized out>, olength=<optimized out>, packetptr=0x1fc7b80 "0\201\225\002\001\003\060\021\002\004b\364\240\250\002\003", length=152) at snmp_api.c:5883
#11 0x00007f3a8a8e0342 in _sess_read (sessp=0x1fb4820, fdset=0x7ffd69c89d50) at snmp_api.c:6144
#12 0x00007f3a8a8e0919 in snmp_sess_read2 (sessp=sessp@entry=0x1fb4820, fdset=fdset@entry=0x7ffd69c89d50) at snmp_api.c:6417
#13 0x00007f3a8a8e096b in snmp_read2 (fdset=0x7ffd69c89d50) at snmp_api.c:5932
#14 0x0000000000404bd0 in ?? ()
#15 0x0000000000403e75 in main ()
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

Comment 14 errata-xmlrpc 2020-04-28 16:47:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:1817