Bug 1663027 - net-snmpd double free or corruption error
Summary: net-snmpd double free or corruption error
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: net-snmp
Version: 30
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Josef Ridky
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks: 1753506
Reported: 2019-01-02 18:21 UTC by Michael Watters
Modified: 2019-10-31 15:26 UTC

Fixed In Version: net-snmp-5.8-10.fc30 net-snmp-5.8-10.fc29
Doc Text:
Clone Of:
: 1753506
Environment:
Last Closed: 2019-07-11 00:57:38 UTC
Type: Bug
Embargoed:


Attachments
Valgrind log file (244.46 KB, text/plain)
2019-06-18 13:35 UTC, Michael Watters

Description Michael Watters 2019-01-02 18:21:03 UTC
Description of problem:

After upgrading to the latest release of the net-snmp package, the snmpd service fails after some time with the following error in the journal:

Jan 02 12:57:24 host.example.com snmpd[3421]: double free or corruption (fasttop)
Jan 02 12:57:24 host.example.com systemd[1]: snmpd.service: Main process exited, code=dumped, status=6/ABRT

Version-Release number of selected component (if applicable):

Name        : net-snmp
Epoch       : 1
Version     : 5.8
Release     : 3.fc29
Architecture: x86_64
Install Date: Wed 02 Jan 2019 10:11:53 AM EST

How reproducible:

Always.

Steps to Reproduce:
1. Start the snmpd service.

Actual results:

The service fails with the error message shown above.

Expected results:

The service runs with no errors.

Comment 1 Michael Watters 2019-01-03 15:03:48 UTC
After the service is started, the process dumps core after running for 10-15 minutes.

Jan 03 09:57:26 host.example.com systemd[1]: Started Process Core Dump (PID 3116/UID 0).
Jan 03 09:57:26 host.example.com systemd[1]: snmpd.service: Main process exited, code=dumped, status=6/ABRT
Jan 03 09:57:26 host.example.com systemd[1]: snmpd.service: Failed with result 'core-dump'.
Jan 03 09:57:26 host.example.com audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=snmpd comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
Jan 03 09:57:27 host.example.com systemd-coredump[3117]: Process 1848 (snmpd) of user 0 dumped core.
                                                                      Stack trace of thread 1848:
                                                                      #0  0x00007fc8679de53f raise (libc.so.6)
                                                                      #1  0x00007fc8679c8895 abort (libc.so.6)
                                                                      #2  0x00007fc867a21927 __libc_message (libc.so.6)
                                                                      #3  0x00007fc867a2825c malloc_printerr (libc.so.6)
                                                                      #4  0x00007fc867a29c17 _int_free (libc.so.6)
                                                                      #5  0x00007fc868072923 usm_free_usmStateReference (libnetsnmp.so.35)
                                                                      #6  0x00007fc868077c45 usm_generate_out_msg (libnetsnmp.so.35)
                                                                      #7  0x00007fc8680784b9 usm_secmod_generate_out_msg (libnetsnmp.so.35)
                                                                      #8  0x00007fc86801b8e2 snmpv3_packet_build (libnetsnmp.so.35)
                                                                      #9  0x00007fc86801d917 snmp_build (libnetsnmp.so.35)
                                                                      #10 0x00007fc86801de3b netsnmp_build_packet (libnetsnmp.so.35)
                                                                      #11 0x00007fc86801e0e6 _build_initial_pdu_packet (libnetsnmp.so.35)
                                                                      #12 0x00007fc868488949 netsnmp_wrap_up_request (libnetsnmpagent.so.35)
                                                                      #13 0x00007fc86848bafb netsnmp_handle_request (libnetsnmpagent.so.35)
                                                                      #14 0x00007fc86848be1a handle_snmp_packet (libnetsnmpagent.so.35)
                                                                      #15 0x00007fc8680260f2 n/a (libnetsnmp.so.35)
                                                                      #16 0x00007fc868027156 _sess_read (libnetsnmp.so.35)
                                                                      #17 0x00007fc868027cbd snmp_sess_read2 (libnetsnmp.so.35)
                                                                      #18 0x00007fc868027d0b snmp_read2 (libnetsnmp.so.35)
                                                                      #19 0x000055cb26168e77 n/a (snmpd)
                                                                      #20 0x000055cb261681b8 n/a (snmpd)
                                                                      #21 0x00007fc8679ca413 __libc_start_main (libc.so.6)
                                                                      #22 0x000055cb261685ce n/a (snmpd)
Jan 03 09:57:27 host.example.com audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-coredump@0-3116-0 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Jan 03 09:57:30 host.example.com abrt-server[3126]: Deleting problem directory ccpp-2019-01-03-09:57:27.337778-1848 (dup of ccpp-2018-12-27-16:57:10.112564-1770)
Jan 03 09:57:30 host.example.com dbus-daemon[1032]: [system] Activating service name='org.freedesktop.problems' requested by ':1.48' (uid=0 pid=3170 comm="/usr/bin/python3 /usr/bin/abrt-action-notify -d /v" label="system_u:system_r:abrt_t:s0-s0:c0.c1023") (using servicehelper)
Jan 03 09:57:30 host.example.com dbus-daemon[1032]: [system] Successfully activated service 'org.freedesktop.problems'
Jan 03 09:57:31 host.example.com abrt-notification[3176]: Process 1770 (snmpd) crashed in usm_free_usmStateReference()

Please let me know if you would like a core file attached to this ticket.

Comment 2 Josef Ridky 2019-02-15 06:47:15 UTC
Hi,

Could you please provide the core file and the output of the following command:

# valgrind --leak-check=full snmpd -Lo0-7d -f

Before running this command, make sure that the snmpd service is stopped and that you are in a separate terminal window as root.

Comment 3 Michael Watters 2019-02-15 16:17:05 UTC
Hello,

It appears that this may have been fixed by recent updates.  I just installed net-snmp 5.8.6 and the service is no longer crashing.

Comment 4 Michael Watters 2019-06-18 13:34:49 UTC
This issue appears to have come back; however, the crashes only happen when snmpd is started by systemd.  The following errors are shown in the logs when the service fails.

Jun 17 12:27:21 mdct-bacula.dartcontainer.com snmpd[3619]: double free or corruption (fasttop)
Jun 17 12:27:21 mdct-bacula.dartcontainer.com systemd[1]: snmpd.service: Main process exited, code=dumped, status=6/ABRT
Jun 17 12:27:21 mdct-bacula.dartcontainer.com systemd[1]: snmpd.service: Failed with result 'core-dump'.

When running snmpd under valgrind, I did not see any crashes, and no core dumps were created.  I've attached the log file in case it contains any useful data.

Comment 5 Michael Watters 2019-06-18 13:35:39 UTC
Created attachment 1581711
Valgrind log file

This contains valgrind data from 24 hours of operation.

Comment 6 Jonathan Liedy 2019-06-27 16:38:16 UTC
I am having the same issue on 2 fresh installs of RHEL8 after updating to the latest SNMP packages:

net-snmp-libs-5.8-7.el8_0.1.x86_64
net-snmp-utils-5.8-7.el8_0.1.x86_64
net-snmp-5.8-7.el8_0.1.x86_64
net-snmp-agent-libs-5.8-7.el8_0.1.x86_64

Would any additional core files/valgrind output help?  I can provide a support contract # for the systems as well.
I went ahead and opened case # 02415022 and referenced this bug in order to prevent any duplicates being created.

Comment 7 Jonathan Liedy 2019-06-27 16:51:26 UTC
Core dumps and SOS report added to the case.

Comment 8 Michael Watters 2019-06-27 17:14:41 UTC
I'd also add that this does not happen on our servers running net-snmp 5.8.7.  I was previously able to resolve this by running "dnf downgrade net-snmp"; however, Fedora 30 doesn't appear to have any older packages available.

Comment 9 Jonathan Liedy 2019-06-27 20:01:17 UTC
I did a downgrade (still 5.8-7 on RHEL8) and it's still happening.
My open RHEL support ticket is now referencing this bug so hopefully we'll see some movement on it.

Comment 13 Fedora Update System 2019-07-01 12:33:24 UTC
FEDORA-2019-053574258e has been submitted as an update to Fedora 30. https://bodhi.fedoraproject.org/updates/FEDORA-2019-053574258e

Comment 14 Fedora Update System 2019-07-01 12:33:25 UTC
FEDORA-2019-6579cf8565 has been submitted as an update to Fedora 29. https://bodhi.fedoraproject.org/updates/FEDORA-2019-6579cf8565

Comment 15 Fedora Update System 2019-07-03 02:26:26 UTC
net-snmp-5.8-10.fc30 has been pushed to the Fedora 30 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-053574258e

Comment 16 Fedora Update System 2019-07-03 19:48:47 UTC
net-snmp-5.8-10.fc29 has been pushed to the Fedora 29 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2019-6579cf8565

Comment 17 Jonathan Liedy 2019-07-09 18:49:33 UTC
Michael,

Do the fixes in the testing repositories work for you for Fedora?  I got the 5.8.8 updated release for RHEL8 which is still failing within 20 minutes of starting the service.

Comment 18 Josef Ridky 2019-07-10 08:46:39 UTC
I have tested the scenario from #02415022 on f30 (net-snmp-5.8-10) and RHEL8 (net-snmp-5.8-10) and all is working for 90+ minutes as expected.

I have used the /etc/snmp/snmpd.conf file from attached SOS report and I am not hitting any issue with net-snmp.

Can you share more information about what queries are sent to snmpd (snmpget, snmpwalk and so on) via #02415022?

Comment 19 Josef Ridky 2019-07-10 08:49:44 UTC
Moving the RHEL-8 related conversation to a separate bug report.

@Jonathan, please respond in Bugzilla #1726373.

Comment 20 Michael Watters 2019-07-10 15:01:14 UTC
@Jonathan,

I've installed the net-snmp packages from the updates testing repo and the service appears to be stable now.

Comment 21 Fedora Update System 2019-07-11 00:57:38 UTC
net-snmp-5.8-10.fc30 has been pushed to the Fedora 30 stable repository. If problems still persist, please make note of it in this bug report.

Comment 24 Fedora Update System 2019-08-01 03:50:59 UTC
net-snmp-5.8-10.fc29 has been pushed to the Fedora 29 stable repository. If problems still persist, please make note of it in this bug report.

Comment 27 pavel 2019-09-17 20:14:43 UTC
Fedora 30, net-snmp 5.8-10

==11498== Invalid free() / delete / delete[] / realloc()
==11498==    at 0x4839A0C: free (vg_replace_malloc.c:540)
==11498==    by 0x4C548CC: usm_rgenerate_out_msg (in /usr/lib64/libnetsnmp.so.35.0.0)
==11498==    by 0x4C55458: usm_secmod_rgenerate_out_msg (in /usr/lib64/libnetsnmp.so.35.0.0)
==11498==    by 0x4BF46B8: snmpv3_packet_realloc_rbuild (in /usr/lib64/libnetsnmp.so.35.0.0)
==11498==    by 0x4BF86B4: snmp_build (in /usr/lib64/libnetsnmp.so.35.0.0)
==11498==    by 0x4BF8DC9: netsnmp_build_packet (in /usr/lib64/libnetsnmp.so.35.0.0)
==11498==    by 0x4BF9026: _build_initial_pdu_packet (in /usr/lib64/libnetsnmp.so.35.0.0)
==11498==    by 0x4BFAE8A: snmp_sess_async_send (in /usr/lib64/libnetsnmp.so.35.0.0)
==11498==    by 0x48A5817: netsnmp_wrap_up_request (in /usr/lib64/libnetsnmpagent.so.35.0.0)
==11498==    by 0x48A82AD: netsnmp_check_delegated_requests (in /usr/lib64/libnetsnmpagent.so.35.0.0)
==11498==    by 0x48A8DF0: netsnmp_check_outstanding_agent_requests (in /usr/lib64/libnetsnmpagent.so.35.0.0)
==11498==    by 0x10D948: ??? (in /usr/sbin/snmpd)
==11498==  Address 0x14491000 is 0 bytes inside a block of size 104 free'd
==11498==    at 0x4839A0C: free (vg_replace_malloc.c:540)
==11498==    by 0x4C53A57: usm_rgenerate_out_msg (in /usr/lib64/libnetsnmp.so.35.0.0)
==11498==    by 0x4C55458: usm_secmod_rgenerate_out_msg (in /usr/lib64/libnetsnmp.so.35.0.0)
==11498==    by 0x4BF46B8: snmpv3_packet_realloc_rbuild (in /usr/lib64/libnetsnmp.so.35.0.0)
==11498==    by 0x4BF86B4: snmp_build (in /usr/lib64/libnetsnmp.so.35.0.0)
==11498==    by 0x4BF8DC9: netsnmp_build_packet (in /usr/lib64/libnetsnmp.so.35.0.0)
==11498==    by 0x4BF9026: _build_initial_pdu_packet (in /usr/lib64/libnetsnmp.so.35.0.0)
==11498==    by 0x48A5747: netsnmp_wrap_up_request (in /usr/lib64/libnetsnmpagent.so.35.0.0)
==11498==    by 0x48A82AD: netsnmp_check_delegated_requests (in /usr/lib64/libnetsnmpagent.so.35.0.0)
==11498==    by 0x48A8DF0: netsnmp_check_outstanding_agent_requests (in /usr/lib64/libnetsnmpagent.so.35.0.0)
==11498==    by 0x10D948: ??? (in /usr/sbin/snmpd)
==11498==    by 0x10D1D4: ??? (in /usr/sbin/snmpd)
==11498==  Block was alloc'd at
==11498==    at 0x483AB1A: calloc (vg_replace_malloc.c:762)
==11498==    by 0x4C50B00: usm_process_in_msg (in /usr/lib64/libnetsnmp.so.35.0.0)
==11498==    by 0x4C51E8C: usm_secmod_process_in_msg (in /usr/lib64/libnetsnmp.so.35.0.0)
==11498==    by 0x4BFEC70: snmpv3_parse (in /usr/lib64/libnetsnmp.so.35.0.0)
==11498==    by 0x4BFFF72: ??? (in /usr/lib64/libnetsnmp.so.35.0.0)
==11498==    by 0x4C01846: ??? (in /usr/lib64/libnetsnmp.so.35.0.0)
==11498==    by 0x4C02115: _sess_read (in /usr/lib64/libnetsnmp.so.35.0.0)
==11498==    by 0x4C02C3C: snmp_sess_read2 (in /usr/lib64/libnetsnmp.so.35.0.0)
==11498==    by 0x4C02C8A: snmp_read2 (in /usr/lib64/libnetsnmp.so.35.0.0)
==11498==    by 0x10DD81: ??? (in /usr/sbin/snmpd)
==11498==    by 0x10D1D4: ??? (in /usr/sbin/snmpd)
==11498==    by 0x4D9FF42: (below main) (in /usr/lib64/libc-2.29.so)
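
In the report above, both the invalid free and the earlier free of the same 104-byte block come from usm_rgenerate_out_msg, and the block was allocated in usm_process_in_msg. When scanning a long Valgrind log for reports like this one, it can help to pull out only the first non-allocator frame of each stack. The following is a minimal sketch of that idea, assuming the frame format shown above; the function name and the allocator list are illustrative and not part of Valgrind or net-snmp.

```python
import re

# Matches Valgrind stack frames such as:
#   ==11498==    by 0x4C548CC: usm_rgenerate_out_msg (in /usr/lib64/libnetsnmp.so.35.0.0)
FRAME = re.compile(r"^==\d+==\s+(?:at|by) 0x[0-9A-Fa-f]+: (\S+)")

# Allocator entry points that sit on top of each stack (assumed list).
ALLOCATOR_FRAMES = {"free", "malloc", "calloc", "realloc"}


def first_caller_per_stack(log_text):
    """Return the first non-allocator frame of each stack in a Valgrind report."""
    callers = []
    in_stack = False
    for line in log_text.splitlines():
        match = FRAME.match(line)
        if match is None:
            in_stack = False  # a header line ends the current stack
            continue
        if in_stack:
            continue  # this stack has already been summarised
        name = match.group(1)
        if name in ALLOCATOR_FRAMES:
            continue  # skip the allocator frame at the top of the stack
        callers.append(name)
        in_stack = True
    return callers


# Shortened copy of the report above: one or two frames per stack.
sample = """\
==11498== Invalid free() / delete / delete[] / realloc()
==11498==    at 0x4839A0C: free (vg_replace_malloc.c:540)
==11498==    by 0x4C548CC: usm_rgenerate_out_msg (in /usr/lib64/libnetsnmp.so.35.0.0)
==11498==  Address 0x14491000 is 0 bytes inside a block of size 104 free'd
==11498==    at 0x4839A0C: free (vg_replace_malloc.c:540)
==11498==    by 0x4C53A57: usm_rgenerate_out_msg (in /usr/lib64/libnetsnmp.so.35.0.0)
==11498==  Block was alloc'd at
==11498==    at 0x483AB1A: calloc (vg_replace_malloc.c:762)
==11498==    by 0x4C50B00: usm_process_in_msg (in /usr/lib64/libnetsnmp.so.35.0.0)
"""

print(first_caller_per_stack(sample))
# ['usm_rgenerate_out_msg', 'usm_rgenerate_out_msg', 'usm_process_in_msg']
```

The three names correspond, in order, to the invalid free, the earlier free of the same block, and the allocation site.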

Comment 28 Josef Ridky 2019-09-18 06:43:08 UTC
@Pavel do you have trap forwarding enabled in configuration?

Comment 29 pavel 2019-09-18 09:47:06 UTC
No, it is disabled. In my config, snmpd is the master (AgentX), the AgentX transport is Unix sockets, and the subagent is an AgentX client based on the agent++ library. The security model is USM.

Interestingly, it fails only on some tables (and there is no problem with the same tables on fc28 and earlier).

Under valgrind it does not crash (I think because of valgrind's virtual machine), and it shows some interesting messages like

send response: USM encryption error (build string: bad header, length too short: 2 < 11)

or

send response: USM encryption error

or 

send response: USM encryption error (Can't build OID for variable)

I tried to review the subagent source code but didn't manage to find any difference between "good" and "bad" tables.

Maybe a PDU size limitation? The "bad" table contains 7 columns, and one of the columns has a length of ~200 characters. snmpd does not crash when the subagent returns 1 table entry but crashes immediately when I add a second record. In any case, I don't see this crash in previous versions of snmpd.

Comment 30 Michael Watters 2019-10-31 15:26:34 UTC
This appears to still be broken in Fedora 30.  After updating to net-snmp 5.8-10.fc30 the snmpd service now fails on a regular basis.

Oct 31 09:25:58 mdct-bacula.dartcontainer.com systemd[1]: snmpd.service: Main process exited, code=dumped, status=6/ABRT
Oct 31 09:25:58 mdct-bacula.dartcontainer.com systemd[1]: snmpd.service: Failed with result 'core-dump'.
Oct 31 09:39:09 mdct-bacula.dartcontainer.com systemd[1]: Starting Simple Network Management Protocol (SNMP) Daemon....
Oct 31 09:39:09 mdct-bacula.dartcontainer.com snmpd[31374]: NET-SNMP version 5.8
Oct 31 09:39:09 mdct-bacula.dartcontainer.com systemd[1]: Started Simple Network Management Protocol (SNMP) Daemon..
Oct 31 09:55:58 mdct-bacula.dartcontainer.com snmpd[31374]: double free or corruption (fasttop)
Oct 31 09:55:58 mdct-bacula.dartcontainer.com systemd[1]: snmpd.service: Main process exited, code=dumped, status=6/ABRT
Oct 31 09:55:58 mdct-bacula.dartcontainer.com systemd[1]: snmpd.service: Failed with result 'core-dump'.
Oct 31 10:09:09 mdct-bacula.dartcontainer.com systemd[1]: Starting Simple Network Management Protocol (SNMP) Daemon....
Oct 31 10:09:09 mdct-bacula.dartcontainer.com snmpd[31933]: NET-SNMP version 5.8
Oct 31 10:09:09 mdct-bacula.dartcontainer.com systemd[1]: Started Simple Network Management Protocol (SNMP) Daemon..
Oct 31 10:25:58 mdct-bacula.dartcontainer.com snmpd[31933]: double free or corruption (fasttop)
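
The journal excerpt above can be used to measure how long snmpd stays up before each abort. The following is a minimal sketch, assuming the short journal timestamp format shown above and a year supplied by the caller (the journal lines do not carry one); the function name and the matched substrings are illustrative and not part of any systemd or net-snmp tooling.

```python
from datetime import datetime

STAMP_LENGTH = len("Oct 31 09:39:09")  # width of the short journal timestamp


def uptimes_before_crash(journal_lines, year=2019):
    """Return seconds between each 'Started' line and the next double-free abort."""
    uptimes = []
    started_at = None
    for line in journal_lines:
        # Prepend the year so the timestamp parses unambiguously.
        stamp = datetime.strptime(
            f"{year} {line[:STAMP_LENGTH]}", "%Y %b %d %H:%M:%S"
        )
        if "Started Simple Network Management Protocol" in line:
            started_at = stamp
        elif "double free or corruption" in line and started_at is not None:
            uptimes.append(int((stamp - started_at).total_seconds()))
            started_at = None
    return uptimes


# Lines copied from the excerpt above.
lines = [
    "Oct 31 09:39:09 mdct-bacula.dartcontainer.com systemd[1]: Started Simple Network Management Protocol (SNMP) Daemon..",
    "Oct 31 09:55:58 mdct-bacula.dartcontainer.com snmpd[31374]: double free or corruption (fasttop)",
    "Oct 31 10:09:09 mdct-bacula.dartcontainer.com systemd[1]: Started Simple Network Management Protocol (SNMP) Daemon..",
    "Oct 31 10:25:58 mdct-bacula.dartcontainer.com snmpd[31933]: double free or corruption (fasttop)",
]

print(uptimes_before_crash(lines))
# [1009, 1009]
```

For the two restarts in this excerpt the daemon ran for 1009 seconds (16 minutes 49 seconds) each time before aborting.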

