Created attachment 557865 [details]
vmstat trap divide error core file

When calling "vmstat 1" the program crashes after some minutes on this RHEL 5 guest running on VMware ESX:

# vmstat 1:
...
5 1 724656 52132 26428 917228 224 2592 716 14376 1056 14410 29 16 49 6 0
0 2 724568 38368 26728 930896 84 0 5924 10050 1673 12767 9 11 50 29 0
32 1 724500 35888 26924 933092 68 0 308 4733 335 5380 25 18 44 13 0
17 1 724424 24436 27120 935556 188 0 2663 14244 898 7644 28 15 37 20 0
28 5 724392 19488 27220 937400 68 0 5720 3933 295 2812 44 30 9 17 0
Floating point exception (core dumped)

# dmesg:
vmstat[26952] trap divide error rip:402457 rsp:7fff6774a910 error:0

Core was generated by `vmstat 1'.
Program terminated with signal 8, Arithmetic exception.
#0 0x0000000000402457 in ?? ()

# rpm -qf `which vmstat`
procps-3.2.7-17.el5

# uname -a
Linux ls3215v11 2.6.18-274.17.1.el5 #1 SMP Wed Jan 4 22:45:44 EST 2012 x86_64 x86_64 x86_64 GNU/Linux
I saw the same error on SuSE quite a while ago. The BZ there was 529981 (https://bugzilla.novell.com/show_bug.cgi?id=529981), and it seems to be fixed in SLES 10 SP3 with procps-3.2.6-18.17.1.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux release for currently deployed products. This request is not yet committed for inclusion in a release.
vmstat is part of the procps package, reassigning to the correct component
Hello Alexander. This bug seems to have symptoms similar to several previously reported bugs where the root cause appeared to be in the kernel. I'm going to change the component to kernel and we'll see. Regards, Jaromir.
This request was not resolved in time for the current release. Red Hat invites you to ask your support representative to propose this request, if still desired, for consideration in the next release of Red Hat Enterprise Linux.
This isn't really my area, but I'm trying to catch old bugs that have slipped through the cracks. Is this reproducible on real hardware as well, or only while running under ESX? It could be a bug in the floating point handling in ESX. Is it still a problem with recent RHEL5? Thanks, Jes
Created attachment 698966 [details]
vmstat trap divide error core file 2

core file with recent sysstat
Here you go. It also happens with a recent RHEL 5.9 environment and a newer VMware ESX version. I have not observed this on a physical installation or a XEN/KVM VM yet.

# vmstat 1:
...
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
11 2 1002020 8660 2196 1605456 0 12596 0 49456 206 87 0 41 48 10 0
3 0 1008548 9344 2248 1618632 0 15752 4 63020 279 165 0 48 45 7 0
4 1 1008548 10788 2284 1642244 0 21272 0 65872 294 150 0 47 44 10 0
1 0 1008548 10532 2376 1661436 0 22144 16 22260 378 205 0 21 49 30 0
9 0 1010456 8784 2548 1667740 0 16692 4 16692 604 289 0 32 27 41 0
2 2 1013452 10412 2564 1686276 0 45756 4 46604 653 319 0 35 23 42 0
6 2 1013452 8992 2604 1710072 0 19816 0 84068 595 344 1 36 44 19 0
4 2 1018596 10092 2640 1724120 0 11828 4 68252 230 132 0 51 37 11 0
6 3 1018596 11360 2668 1737756 0 28500 0 116980 201 110 0 69 14 17 0
8 3 1018596 9996 2680 1746272 0 0 0 23040 81 67 0 65 18 18 0
Floating point exception (core dumped)

# dmesg:
vmstat[3125] trap divide error rip:402466 rsp:7fff2b06aef0 error:0

Core was generated by `vmstat 1'.
Program terminated with signal 8, Arithmetic exception.
#0 0x0000000000402466 in ?? ()

# rpm -qf `which vmstat`
procps-3.2.7-22.el5

# uname -a
Linux ls3215v12 2.6.18-348.1.1.el5 #1 SMP Fri Dec 14 05:25:59 EST 2012 x86_64 x86_64 x86_64 GNU/Linux
Alexander, could you provide the upstream commit ID or some other link to the patch that fixed this problem on SLES? My colleagues most likely don't have access to the SUSE bug you mentioned in comment #1.
This is an extract from the changelog of SuSE's procps package:

* Wed Aug 19 2009 werner
- Be aware that on XEN and VMware systems Div can become zero (bnc#529981)

I would therefore recommend that your developers get in contact with Werner Fink from SuSE for further details, as I cannot provide an upstream patch or similar myself.
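For illustration only: the changelog entry suggests that the divisor used for the CPU percentage columns (the sum of the per-interval tick deltas) can be zero on virtualized guests, and an integer division by zero is what produces the "trap divide error" and signal 8. A minimal sketch in C of guarding such a calculation might look like the following. This is not the SuSE or procps-ng patch; the names cpu_delta, pct and cpu_percentages are hypothetical and do not come from procps, and treating an empty interval as fully idle is an assumption made for the example.

```c
/* Hypothetical container for per-interval CPU tick deltas (not a procps type). */
struct cpu_delta {
    unsigned long long user, sys, idle, iowait, steal;
};

/* Rounded integer percentage of part in total. The caller must pass total != 0. */
unsigned pct(unsigned long long part, unsigned long long total)
{
    return (unsigned)((100ULL * part + total / 2ULL) / total);
}

/* Fill out[0..4] with the us, sy, id, wa, st percentages.
 * If no ticks were accounted in the interval, the sum would be zero,
 * so the interval is reported as 100% idle instead of dividing by zero
 * (an assumption chosen for this sketch). */
void cpu_percentages(struct cpu_delta d, unsigned out[5])
{
    unsigned long long total = d.user + d.sys + d.idle + d.iowait + d.steal;

    if (total == 0) {
        d.idle = 1;
        total = 1;
    }

    out[0] = pct(d.user, total);
    out[1] = pct(d.sys, total);
    out[2] = pct(d.idle, total);
    out[3] = pct(d.iowait, total);
    out[4] = pct(d.steal, total);
}
```

With all deltas at zero, this sketch yields 0 0 100 0 0 rather than crashing; with deltas of 44, 30, 9, 17 and 0 ticks it yields the same numbers as percentages.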
Hello. I've applied some modifications from the procps-ng successor project, where the same issue seems to be fixed in a slightly different way. Could you please test the following package and let me know the result? http://jcapik.fedorapeople.org/files/procps/procps-3.2.7-23_testing.x86_64.rpm Thanks in advance. Regards, Jaromir.
Hello Jaromir, thank you for the update. After updating the procps package I have not been able to reproduce the error so far. Best Regards, Alexander.
Hello Alexander. Thanks a lot for the confirmation. Going to ask for devel ack. Regards, Jaromir.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-1338.html