Bug 501210 - net-snmp SIGFPE 0x00002aaaab37744a in var_hrproc (vp=0x7fffffffbf50
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: net-snmp
Version: 10
Hardware: x86_64
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Jan Safranek
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2009-05-18 02:35 UTC by Colin Lai
Modified: 2009-06-16 02:40 UTC (History)

Fixed In Version: 5.4.2.1-4.fc10
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-06-16 02:11:16 UTC
Type: ---
Embargoed:



Description Colin Lai 2009-05-18 02:35:07 UTC
Description of problem:
A Dell PE1950 server converts an IPv6 TS stream to an IPv4 TS stream using VLC, carrying about 450 Mbps of multicast traffic in and out.
The net-snmp snmpd service crashes within several days.

gdb output at the crash:
Program received signal SIGFPE, Arithmetic exception.
0x00002aaaab37744a in var_hrproc (vp=0x7fffffffbf50,
    name=<value optimized out>, length=<value optimized out>,
    exact=<value optimized out>, var_len=0x7fffffffcb80,
    write_method=<value optimized out>) at host/hr_proc.c:183
183             long_return  = 100 - long_return;


(gdb) disassemble
Dump of assembler code for function var_hrproc:
0x00002aaaab3773a0 <var_hrproc+0>:      push   %rbp
......
0x00002aaaab377446 <var_hrproc+166>:    sub    0x20(%rcx),%rdi
0x00002aaaab37744a <var_hrproc+170>:    div    %rdi
0x00002aaaab37744d <var_hrproc+173>:    mov    $0x64,%edx
......

(gdb) info registers
rax            0x0      0
rbx            0x7fffffffcb80   140737488341888
rcx            0x2aaab4f0e180   46912668492160
rdx            0x0      0
rsi            0x2aaaab119170   46912502862192
rdi            0x0      0
rbp            0x7fffffffbf50   0x7fffffffbf50
rsp            0x7fffffffbee0   0x7fffffffbee0
......

It is certain that the SIGFPE was caused by a division by zero: the divisor register rdi is 0 at the div instruction.

Trace the code [net-snmp-5.4.2.1]
agent/mibgroup/host/hr_proc.c, Line 181 - 185
        long_return  = (cpu->idle_ticks  - cpu->history[0].idle_hist)*100;
        long_return /= (cpu->total_ticks - cpu->history[0].total_hist);
        long_return  = 100 - long_return;
        if (long_return < 0)
            long_return = 0;

Summary:
BUG1. I think the missing zero check before the division at hr_proc.c line 182 causes the SIGFPE crash.
BUG2. hr_proc.c lines 184-185 no longer work as before, because long_return is now an unsigned variable.
Since version 5.4.2, the definition of long_return changed from [signed long] to [fsblkcnt_t], which is always an unsigned integer type, so the long_return < 0 check on lines 184-185 can never be true.
Definition of long_return:
agent/snmp_vars.c:fsblkcnt_t            long_return;
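
To illustrate BUG2, here is a minimal sketch, not taken from net-snmp. The helper names load_unsigned and load_signed are hypothetical, and unsigned long stands in for fsblkcnt_t on the assumption that both are unsigned integer types. It shows what happens when the idle percentage exceeds 100, for example because of counter jitter:

```c
#include <limits.h>

/* Hypothetical helpers for illustration only; not part of net-snmp. */

/* Unsigned accumulator, standing in for fsblkcnt_t (assumption). */
unsigned long load_unsigned(unsigned long idle_pct)
{
    /* Wraps around modulo 2^N when idle_pct > 100. */
    unsigned long r = 100 - idle_pct;
    /* A clamp such as "if (r < 0) r = 0;" could never fire here,
     * because an unsigned value is never negative. */
    return r;
}

/* Signed accumulator, as long_return was before the type change. */
long load_signed(long idle_pct)
{
    long r = 100 - idle_pct;
    if (r < 0)      /* the clamp works as intended for a signed type */
        r = 0;
    return r;
}
```

With an idle percentage of 101, the signed version clamps to 0, while the unsigned version returns ULONG_MAX instead of a load between 0 and 100.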


Version-Release number of selected component (if applicable):
Name        : net-snmp                     Relocations: (not relocatable)
Version     : 5.4.2.1                           Vendor: Fedora Project
Release     : 3.fc10                        Build Date: Mon 16 Feb 2009 07:26:07 PM CST


How reproducible:
Background network traffic: 450 Mbps IPv6 (in) / IPv4 (out) multicast IPTV TS flow (converted by VLC).
Start the /etc/init.d/snmpd service; it crashes within several days.

Steps to Reproduce:
1. Run background network traffic: 450 Mbps IPv6 (in) / IPv4 (out) multicast IPTV TS flow (converted by VLC).
2. Start the /etc/init.d/snmpd service and wait for it to crash within several days.
  
Actual results:
snmpd crashes within several days

Expected results:
snmpd continues to work without a SIGFPE crash

Additional info:

Comment 1 Jan Safranek 2009-05-18 10:07:09 UTC
Strange, how did you make cpu->total_ticks == cpu->history[0].total_hist to get the division by zero? Is the affected CPU stuck somehow?

Anyway, checking for zero is a good idea. And I am going to finally remove the patch that changes long_return to fsblkcnt_t; it's obviously wrong.
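
A zero check of the kind discussed here could look roughly like the sketch below. This is not the actual upstream change: the function hrproc_load is hypothetical, the parameters are named after the fields quoted in the description, their long type is an assumption, and returning 0 when no ticks have elapsed is one possible policy among several.

```c
/* Hypothetical helper for illustration only; not the upstream fix. */
long hrproc_load(long idle_now, long idle_hist,
                 long total_now, long total_hist)
{
    long total_delta = total_now - total_hist;
    long load;

    /* Guard the division: if the tick counter did not advance,
     * report 0% load (assumed policy) instead of dividing by zero. */
    if (total_delta <= 0)
        return 0;

    load = 100 - ((idle_now - idle_hist) * 100) / total_delta;
    if (load < 0)       /* meaningful only because load is signed */
        load = 0;
    return load;
}
```

For example, hrproc_load(150, 100, 1100, 1000) yields 50, and hrproc_load(50, 50, 1000, 1000) yields 0 rather than raising SIGFPE.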

Comment 2 Jan Safranek 2009-05-18 10:37:26 UTC
I fixed the bug upstream (SVN revision 17616), built a new package in Rawhide, and I am going to push updates to F10 and F11.

Comment 3 Fedora Update System 2009-05-18 11:02:43 UTC
net-snmp-5.4.2.1-4.fc10 has been submitted as an update for Fedora 10.
http://admin.fedoraproject.org/updates/net-snmp-5.4.2.1-4.fc10

Comment 4 Fedora Update System 2009-05-18 11:44:41 UTC
net-snmp-5.4.2.1-12.fc11 has been submitted as an update for Fedora 11.
http://admin.fedoraproject.org/updates/net-snmp-5.4.2.1-12.fc11

Comment 5 Fedora Update System 2009-05-19 01:59:27 UTC
net-snmp-5.4.2.1-4.fc10 has been pushed to the Fedora 10 testing repository.  If problems still persist, please make note of it in this bug report.
 If you want to test the update, you can install it with 
 su -c 'yum --enablerepo=updates-testing update net-snmp'.  You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F10/FEDORA-2009-5062

Comment 6 Colin Lai 2009-06-03 14:28:17 UTC
I previously used kernel-2.6.24.7-92.fc8.x86_64, and snmpd crashed frequently; eventually the system went down.
I have now updated to kernel-2.6.29.3-155.fc11.x86_64, and the snmpd crashes have disappeared over the past few days.
Maybe the old kernel-2.6.24.7-92.fc8 made cpu->total_ticks == cpu->history[0].total_hist, leading to the division by zero.

Comment 7 Fedora Update System 2009-06-16 02:11:11 UTC
net-snmp-5.4.2.1-4.fc10 has been pushed to the Fedora 10 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 8 Fedora Update System 2009-06-16 02:40:06 UTC
net-snmp-5.4.2.1-12.fc11 has been pushed to the Fedora 11 stable repository.  If problems still persist, please make note of it in this bug report.

