Bug 785169 - SAP 5.10 RFE: vmstat trap divide error
Summary: SAP 5.10 RFE: vmstat trap divide error
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: procps
Version: 5.7
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: 5.10
Assignee: Jaromír Cápík
QA Contact: Branislav Náter
URL:
Whiteboard:
Depends On: 819073 820507
Blocks: 978304
Reported: 2012-01-27 14:10 UTC by Alexander Hass
Modified: 2018-11-28 19:46 UTC (History)

Fixed In Version: procps-3.2.7-23.el5
Doc Type: Enhancement
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-09-30 23:20:23 UTC
Target Upstream Version:
Embargoed:


Attachments
vmstat trap divide error core file (260.00 KB, application/octet-stream)
2012-01-27 14:10 UTC, Alexander Hass
vmstat trap divide error core file 2 (260.00 KB, application/octet-stream)
2013-02-18 16:20 UTC, Alexander Hass


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Novell | 529981 | 0 | None | None | None | Never
Red Hat Bugzilla | 817136 | 0 | unspecified | CLOSED | vmstat fails intermittently with "floating point exception" | 2021-02-22 00:41:40 UTC
Red Hat Product Errata | RHBA-2013:1338 | 0 | normal | SHIPPED_LIVE | procps bug fix update | 2013-09-30 21:12:59 UTC

Internal Links: 817136

Description Alexander Hass 2012-01-27 14:10:50 UTC
Created attachment 557865 [details]
vmstat trap divide error core file

When running "vmstat 1", the program crashes after a few minutes on this RHEL 5 guest running on VMware ESX:

# vmstat 1:
...
 5  1 724656  52132  26428 917228  224 2592  716 14376 1056 14410 29 16 49  6  0
 0  2 724568  38368  26728 930896   84    0 5924 10050 1673 12767  9 11 50 29  0
32  1 724500  35888  26924 933092   68    0  308  4733  335 5380 25 18 44 13  0
17  1 724424  24436  27120 935556  188    0 2663 14244  898 7644 28 15 37 20  0
28  5 724392  19488  27220 937400   68    0 5720  3933  295 2812 44 30  9 17  0
Floating point exception (core dumped)

# dmesg:
vmstat[26952] trap divide error rip:402457 rsp:7fff6774a910 error:0

Core was generated by `vmstat 1'.
Program terminated with signal 8, Arithmetic exception.
#0  0x0000000000402457 in ?? ()

# rpm -qf `which vmstat`
procps-3.2.7-17.el5

# uname -a
Linux ls3215v11 2.6.18-274.17.1.el5 #1 SMP Wed Jan 4 22:45:44 EST 2012 x86_64 x86_64 x86_64 GNU/Linux

Comment 1 Alexander Hass 2012-02-08 23:26:06 UTC
I saw the same error on SuSE quite a while ago; the BZ there was 529981
(https://bugzilla.novell.com/show_bug.cgi?id=529981),
and it seems to be fixed in SLES 10 SP3 with procps-3.2.6-18.17.1.

Comment 2 RHEL Program Management 2012-04-19 11:49:23 UTC
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux release.  Product Management has
requested further review of this request by Red Hat Engineering, for
potential inclusion in a Red Hat Enterprise Linux release for currently
deployed products.  This request is not yet committed for inclusion in
a release.

Comment 3 Peter Schiffer 2012-05-16 11:47:55 UTC
vmstat is part of the procps package; reassigning to the correct component.

Comment 4 RHEL Program Management 2012-05-16 11:57:20 UTC
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux release.  Product Management has
requested further review of this request by Red Hat Engineering, for
potential inclusion in a Red Hat Enterprise Linux release for currently
deployed products.  This request is not yet committed for inclusion in
a release.

Comment 5 Jaromír Cápík 2012-06-21 15:30:32 UTC
Hello Alexander.

This bug seems to have symptoms similar to those of several previously reported bugs where the root cause appeared to be in the kernel. I'm going to change the component to kernel and we'll see.

Regards,
Jaromir.

Comment 6 RHEL Program Management 2012-10-30 05:55:19 UTC
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.

Comment 7 Jes Sorensen 2013-02-15 12:45:40 UTC
This isn't really my space, but I'm trying to catch old bugs that have slipped
through the cracks.

Is this reproducible on real hardware as well, or only while running under
ESX? It could be a bug in the floating point handling in ESX.

Is it still a problem with recent RHEL5?

Thanks,
Jes

Comment 8 Alexander Hass 2013-02-18 16:20:36 UTC
Created attachment 698966 [details]
vmstat trap divide error core file 2

core file with recent sysstat

Comment 9 Alexander Hass 2013-02-18 16:27:14 UTC
Here you go: it also happens with a recent RHEL 5.9 environment and a newer VMware ESX version. I have not observed this on a physical installation or a XEN/KVM VM yet.

# vmstat 1:
...
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
11  2 1002020   8660   2196 1605456   0 12596    0 49456  206   87  0 41 48 10  0
 3  0 1008548   9344   2248 1618632   0 15752    4 63020  279  165  0 48 45  7  0
 4  1 1008548  10788   2284 1642244   0 21272    0 65872  294  150  0 47 44 10  0
 1  0 1008548  10532   2376 1661436   0 22144   16 22260  378  205  0 21 49 30  0
 9  0 1010456   8784   2548 1667740   0 16692    4 16692  604  289  0 32 27 41  0
 2  2 1013452  10412   2564 1686276   0 45756    4 46604  653  319  0 35 23 42  0
 6  2 1013452   8992   2604 1710072   0 19816    0 84068  595  344  1 36 44 19  0
 4  2 1018596  10092   2640 1724120   0 11828    4 68252  230  132  0 51 37 11  0
 6  3 1018596  11360   2668 1737756   0 28500    0 116980  201  110  0 69 14 17  0
 8  3 1018596   9996   2680 1746272   0    0     0 23040   81   67  0 65 18 18  0
Floating point exception (core dumped)

# dmesg:
vmstat[3125] trap divide error rip:402466 rsp:7fff2b06aef0 error:0

Core was generated by `vmstat 1'.
Program terminated with signal 8, Arithmetic exception.
#0  0x0000000000402466 in ?? ()

# rpm -qf `which vmstat`
procps-3.2.7-22.el5

# uname -a
Linux ls3215v12 2.6.18-348.1.1.el5 #1 SMP Fri Dec 14 05:25:59 EST 2012 x86_64 x86_64 x86_64 GNU/Linux

Comment 10 Frank Danapfel 2013-04-11 14:14:58 UTC
Alexander, could you provide the upstream commit ID or some other link to the patch that fixed this problem on SLES? My colleagues most likely don't have access to the SUSE bug you mentioned in comment #1.

Comment 11 Alexander Hass 2013-04-11 14:23:27 UTC
This is an extract from the changelog of SuSE's procps package.

* Wed Aug 19 2009 werner
- Be aware that on XEN and VMware systems Div can become zero (bnc#529981)

Therefore I would recommend that your developers get in contact with Werner Fink from SuSE for further details as I cannot provide any upstream patch or similar myself.
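
The changelog entry above says that "Div can become zero" on XEN and VMware guests. The following is only a minimal sketch of how such a divisor could be guarded, assuming that the divisor is the sum of the per-interval CPU tick deltas and that the percentages are computed with integer arithmetic. The struct and function names (cpu_delta, pct, cpu_percentages) are hypothetical and are not taken from the procps sources or from the patch that was eventually shipped.

```c
/* Hypothetical container for the CPU tick deltas of one sampling interval. */
struct cpu_delta {
    unsigned long long user, sys, idle, iowait, steal;
};

/* Rounded integer percentage of part in total; the caller ensures total != 0. */
static unsigned pct(unsigned long long part, unsigned long long total)
{
    return (unsigned)((100ULL * part + total / 2ULL) / total);
}

/* Fill out[0..4] with the us, sy, id, wa, st columns without dividing by zero. */
static void cpu_percentages(struct cpu_delta d, unsigned out[5])
{
    unsigned long long total = d.user + d.sys + d.idle + d.iowait + d.steal;

    if (total == 0) {
        /* No ticks were accounted in this interval (assumed to be what
         * happens on some virtualized guests): report the CPU as idle
         * instead of dividing by zero. */
        d.idle = 1;
        total = 1;
    }

    out[0] = pct(d.user, total);
    out[1] = pct(d.sys, total);
    out[2] = pct(d.idle, total);
    out[3] = pct(d.iowait, total);
    out[4] = pct(d.steal, total);
}
```

With all deltas at zero, this sketch reports 100% idle rather than raising SIGFPE. Treating an empty interval as idle is a design choice made for the illustration; skipping the sample would be another option.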

Comment 21 Jaromír Cápík 2013-05-06 19:19:55 UTC
Hello.

I've applied some modifications from the procps-ng successor project, where the same issue seems to be fixed in a slightly different way.
Could you please test the following package and let me know the result?

http://jcapik.fedorapeople.org/files/procps/procps-3.2.7-23_testing.x86_64.rpm

Thanks in advance.

Regards,
Jaromir.

Comment 22 Alexander Hass 2013-05-06 20:38:26 UTC
Hello Jaromir,

thank you for the update. After updating the procps package, I have not been able to reproduce the error so far.

Best Regards,
 Alexander.

Comment 23 Jaromír Cápík 2013-05-07 11:07:52 UTC
Hello Alexander.

Thanks a lot for the confirmation. Going to ask for devel ack.

Regards,
Jaromir.

Comment 30 errata-xmlrpc 2013-09-30 23:20:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1338.html

