Bug 1875698

Summary: Double free in netsnmp_handler_free when snmpd exits
Product: [Fedora] Fedora Reporter: Josef Ridky <jridky>
Component: net-snmp Assignee: Josef Ridky <jridky>
Status: CLOSED EOL QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium Docs Contact:
Priority: medium    
Version: 34 CC: fkrska, jridky, jsafrane, mhjacks, myamazak, qe-baseos-apps, rmetrich, sbroz, zdohnal
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1823841 Environment:
Last Closed: 2022-06-07 22:23:03 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1823841    
Bug Blocks:    

Description Josef Ridky 2020-09-04 06:19:34 UTC
+++ This bug was initially created as a clone of Bug #1823841 +++

Description of problem:

When an extend directive is configured, a double free occurs when the snmpd service shuts down.


Version-Release number of selected component (if applicable):

net-snmp-5.7.2-48.el7_8.x86_64


How reproducible:

ALWAYS

Steps to Reproduce:

1. Patch /etc/snmp/snmpd.conf as shown below

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
-view    systemview    included   .1.3.6.1.2.1.1
-view    systemview    included   .1.3.6.1.2.1.25.1.1
+#view    systemview    included   .1.3.6.1.2.1.1
+#view    systemview    included   .1.3.6.1.2.1.25.1.1
+view    all    included    .1 80

...

-access  notConfigGroup ""      any       noauth    exact  systemview none none
+#access  notConfigGroup ""      any       noauth    exact  systemview none none
+access  notConfigGroup ""      any       noauth    exact  all all none

...

+extend .1.3.6.1.4.1.2021.8 mpstat /usr/bin/mpstat -P ALL
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

2. Start snmpd

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
# /usr/sbin/snmpd  -f -LS0-6d
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

3. Execute an snmpwalk command that triggers mpstat (the result is the same whether or not mpstat is installed)

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
# snmpwalk -v 1 -c public localhost .1.3.6.1.4.1.2021.8
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

4. Stop snmpd

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
# pkill -TERM snmpd
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------


Actual results:

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
*** Error in `/usr/sbin/snmpd': free(): invalid pointer: 0xXXX ***
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------

Additional info:

The backtrace shows that it crashes while freeing a pointer. There is certainly a double free, but so far I couldn't spot which code path freed the pointer first:

-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
 761 void
 762 netsnmp_handler_registration_free(netsnmp_handler_registration *reginfo)
 763 {
 764     if (reginfo != NULL) {
 765         netsnmp_handler_free(reginfo->handler);	<---- HERE
 766         SNMP_FREE(reginfo->handlerName);
 767         SNMP_FREE(reginfo->contextName);
 768         SNMP_FREE(reginfo->rootoid);
 769         reginfo->rootoid_len = 0;
 770         SNMP_FREE(reginfo);
 771     }
 772 }
-------- 8< ---------------- 8< ---------------- 8< ---------------- 8< --------
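
One way to find out which path freed first is to route the suspect releases through a small tracking wrapper. The following is only a minimal sketch of that idea, not net-snmp code: `tracked_free`, `quarantine_flush`, and `QUARANTINE_MAX` are hypothetical names introduced for illustration. Instead of freeing immediately, it parks each pointer together with a call-site label, so a second release of the same address can report who released it first.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical debugging aid; none of these names exist in net-snmp. */
#define QUARANTINE_MAX 64

struct release_record {
    void *ptr;          /* allocation handed to tracked_free() */
    const char *site;   /* label of the call site that released it first */
};

static struct release_record quarantine[QUARANTINE_MAX];
static size_t quarantine_len;

/*
 * Record the release of ptr instead of freeing it right away.
 * Returns NULL on the first release of an address. On a repeated release,
 * returns the label of the site that released it first.
 * Memory stays parked until quarantine_flush(), so an address cannot be
 * reused (and matched by mistake) in the meantime.
 */
const char *tracked_free(void *ptr, const char *site)
{
    size_t i;

    if (ptr == NULL)
        return NULL;
    for (i = 0; i < quarantine_len; i++) {
        if (quarantine[i].ptr == ptr)
            return quarantine[i].site;  /* second release: report the first */
    }
    if (quarantine_len < QUARANTINE_MAX) {
        quarantine[quarantine_len].ptr = ptr;
        quarantine[quarantine_len].site = site;
        quarantine_len++;
    }
    /* If the table is full, the pointer is deliberately leaked, not freed. */
    return NULL;
}

/* Really free every parked allocation exactly once and empty the table. */
void quarantine_flush(void)
{
    size_t i;

    for (i = 0; i < quarantine_len; i++)
        free(quarantine[i].ptr);
    quarantine_len = 0;
}
```

With this in place, releasing the same pointer first with the label "_unregister_extend" and then with the label "netsnmp_subtree_free" makes the second call return "_unregister_extend" instead of aborting in free().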

--- Additional comment from RHEL Program Management on 2020-04-14 17:28:36 CEST ---

Since this bug report was entered in Red Hat Bugzilla, the release flag has been set to ? to ensure that it is properly evaluated for this release.

--- Additional comment from Renaud Métrich on 2020-04-15 10:52:48 CEST ---

It's unclear to me why this doesn't show up under Valgrind, which would be very helpful in finding the exact place where the first free() happened.

Anyway, I now have a better view of what happens.
During shutdown, we have the following:

1. Extend gets unregistered, which frees the "ereg_head" global list

(gdb) bt
#0  _unregister_extend (eptr=0x555555883a50) at agent/extend.c:260
#1  shutdown_extend () at agent/extend.c:316
#2  0x00007ffff77d0915 in _shutdown_mib_modules (majorID=<optimized out>, minorID=<optimized out>, 
    serve=<optimized out>, client=<optimized out>) at ../agent/mibgroup/mib_module_shutdown.h:34
#3  0x00007ffff721354f in snmp_call_callbacks (major=major@entry=0, minor=minor@entry=2, 
    caller_arg=caller_arg@entry=0x0) at callback.c:363
#4  0x00007ffff71e62f7 in snmp_shutdown (type=<optimized out>) at snmp_api.c:910
#5  0x00005555555579d4 in main (argc=<optimized out>, argv=<optimized out>) at snmpd.c:1135

2. Cache tree gets freed, which frees the already-freed extend handler a second time

(gdb) bt
#0  0x00007ffff574d387 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:55
#1  0x00007ffff574ea78 in __GI_abort () at abort.c:90
#2  0x00007ffff578fed7 in __libc_message (do_abort=do_abort@entry=2, 
    fmt=fmt@entry=0x7ffff58a2350 "*** Error in `%s': %s: 0x%s ***\n")
    at ../sysdeps/unix/sysv/linux/libc_fatal.c:196
#3  0x00007ffff5798299 in malloc_printerr (ar_ptr=0x7ffff5ade760 <main_arena>, ptr=<optimized out>, 
    str=0x7ffff589fb50 "free(): invalid pointer", action=3) at malloc.c:4967
#4  _int_free (av=0x7ffff5ade760 <main_arena>, p=<optimized out>, have_lock=0) at malloc.c:3843
#5  0x00007ffff7b95b02 in netsnmp_handler_registration_free (reginfo=0x5555559ba7b0)
    at agent_handler.c:765
#6  0x00007ffff7b9886e in netsnmp_subtree_free (a=0x5555559ba6c0) at agent_registry.c:474
#7  0x00007ffff7b9915c in clear_subtree (sub=<optimized out>) at agent_registry.c:961
#8  0x00007ffff7b991ac in clear_context () at agent_registry.c:427
#9  0x00007ffff7ba4f1e in shutdown_agent () at snmp_vars.c:370
#10 0x00005555555579de in main (argc=<optimized out>, argv=<optimized out>) at snmpd.c:1137

In frame 6, the subtree references the extend handler (reginfo), which was already freed in step 1, causing the crash:

 21 typedef struct netsnmp_subtree_s {
 :
 41     netsnmp_handler_registration *reginfo;      /* new API */
 :
 45 } netsnmp_subtree;


This is quite expected, since after unregistering the extend, the "reginfo" in the subtree is now a dangling pointer.
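
The two shutdown steps and the shared "reginfo" can be modelled in a few lines of C. This is a simplified sketch under stated assumptions, not the net-snmp implementation and not necessarily how upstream addressed the issue: the structures and the functions `extend_unregister`, `subtree_clear`, and `simulate_shutdown` are hypothetical stand-ins. It shows one possible way to avoid the dangling pointer, namely detaching every subtree that aliases the registration before the extend frees it.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the real net-snmp structures. */
struct registration {
    int placeholder;
};

struct subtree {
    struct registration *reginfo;   /* may alias an extend's registration */
};

struct extend {
    struct registration *reginfo;
};

static int registrations_freed;     /* counts releases that really happened */

static void registration_free(struct registration *reg)
{
    if (reg != NULL) {
        free(reg);
        registrations_freed++;
    }
}

/*
 * Analogue of step 1: unregister the extend. Any subtree that still points
 * at the same registration is detached first, so no alias is left dangling.
 */
void extend_unregister(struct extend *e, struct subtree *trees, size_t ntrees)
{
    size_t i;

    for (i = 0; i < ntrees; i++) {
        if (trees[i].reginfo == e->reginfo)
            trees[i].reginfo = NULL;
    }
    registration_free(e->reginfo);
    e->reginfo = NULL;
}

/* Analogue of step 2: free whatever registrations the subtrees still own. */
void subtree_clear(struct subtree *trees, size_t ntrees)
{
    size_t i;

    for (i = 0; i < ntrees; i++) {
        registration_free(trees[i].reginfo);
        trees[i].reginfo = NULL;
    }
}

/* Runs both steps in shutdown order; returns the number of real frees. */
int simulate_shutdown(void)
{
    struct registration *reg = malloc(sizeof *reg);
    struct extend ext;
    struct subtree trees[2];

    if (reg == NULL)
        return -1;
    ext.reginfo = reg;
    trees[0].reginfo = reg;         /* shared with the extend */
    trees[1].reginfo = NULL;
    registrations_freed = 0;
    extend_unregister(&ext, trees, 2);
    subtree_clear(trees, 2);
    return registrations_freed;
}
```

In this model the shared registration is released exactly once. Without the detach loop in `extend_unregister`, `subtree_clear` would pass the stale pointer to free() a second time, which corresponds to the crash seen in frame 5 of the second backtrace.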

--- Additional comment from Josef Ridky on 2020-04-15 15:09:05 CEST ---

Issue has been reported to upstream. https://github.com/net-snmp/net-snmp/issues/97

--- Additional comment from Renaud Métrich on 2020-04-16 11:49:21 CEST ---

Another scenario is while performing a reload:

1. Have the extend in /etc/snmp/snmpd.conf

    extend .1.3.6.1.4.1.2021.8 mpstat /usr/bin/mpstat -P ALL

2. Start snmpd

    # /usr/sbin/snmpd -f -Le0-6d

3. Remove the extend

    #extend .1.3.6.1.4.1.2021.8 mpstat /usr/bin/mpstat -P ALL

4. Reload snmpd

    # pkill -HUP snmpd


Oddly, running under valgrind prevents the crash from happening at all.

--- Additional comment from Filip Krska on 2020-08-28 15:55:43 CEST ---

//PX backlog review

The customer closed the case and is OK with safely ignoring the error, i.e. there is no severe business impact.

I'd suggest closing this for el7, checking whether it is reproducible in el8, Fedora, and upstream, and eventually continuing the investigation there.

--- Additional comment from Josef Ridky on 2020-09-04 07:50:35 CEST ---

Agree.

Comment 1 Ben Cotton 2021-02-09 16:24:02 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 34 development cycle.
Changing version to 34.

Comment 2 Ben Cotton 2022-05-12 15:34:19 UTC
This message is a reminder that Fedora Linux 34 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora Linux 34 on 2022-06-07.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
'version' of '34'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, change the 'version' 
to a later Fedora Linux version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora Linux 34 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora Linux, you are encouraged to change the 'version' to a later version
prior to this bug being closed.

Comment 3 Ben Cotton 2022-06-07 22:23:03 UTC
Fedora Linux 34 entered end-of-life (EOL) status on 2022-06-07.

Fedora Linux 34 is no longer maintained, which means that it
will not receive any further security or bug fix updates. As a result we
are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release.

Thank you for reporting this bug and we are sorry it could not be fixed.