Bug 1373590 - pcp-atop killed by SIGFPE
Summary: pcp-atop killed by SIGFPE
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: pcp
Version: 6.8
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Nathan Scott
QA Contact: Miloš Prchlík
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-09-06 16:29 UTC by Deepu K S
Modified: 2020-05-14 15:17 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-03-21 11:20:54 UTC
Target Upstream Version:
Embargoed:


Attachments
ABRT captured problem directory (coredump included) (1.83 MB, application/x-gzip)
2016-09-06 16:33 UTC, Deepu K S
pcp atop log (17.97 KB, text/plain)
2016-09-12 14:38 UTC, Deepu K S


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2017:0735 0 normal SHIPPED_LIVE pcp bug fix update 2017-03-21 12:43:59 UTC

Description Deepu K S 2016-09-06 16:29:21 UTC
Description of problem:
Process /usr/libexec/pcp/bin/pcp-atop was killed by signal 8 (SIGFPE)

It looks like pcp-atop crashed due to a divide by zero condition.

Core was generated by `/usr/libexec/pcp/bin/pcp-atop'.
Program terminated with signal 8, Arithmetic exception.
#0  0x0000000000417ef0 in prisyst (sstat=0x83d5e0, curline=2, nsecs=1, avgval=0, fixedhead=0, selp=0x63bd20, highorderp=0x7fff3f7e590e "C", maxcpulines=999, maxdsklines=999, 
    maxmddlines=999, maxlvmlines=999, maxintlines=999, maxnfslines=999, maxcontlines=999) at showlinux.c:1241
1241	        extra.percputot = extra.cputot / sstat->cpu.nrcpu;
(gdb) bt
#0  0x0000000000417ef0 in prisyst (sstat=0x83d5e0, curline=2, nsecs=1, avgval=0, fixedhead=0, selp=0x63bd20, highorderp=0x7fff3f7e590e "C", maxcpulines=999, maxdsklines=999, 
    maxmddlines=999, maxlvmlines=999, maxintlines=999, maxnfslines=999, maxcontlines=999) at showlinux.c:1241
#1  0x00000000004141ef in generic_samp (curtime=1472711047.4024861, delta=1.0000000000000291, sstat=0x83d5e0, tstat=0x83c210, proclist=0x83c350, ndeviat=0, ntask=0, nactproc=0, 
    totproc=0, totrun=0, totslpi=0, totslpu=0, totzomb=0, nexit=0, noverflow=0, flags=1) at showgeneric.c:294
#2  0x000000000040535d in engine () at atop.c:671
#3  0x00000000004056f9 in main (argc=1, argv=<value optimized out>) at atop.c:449
(gdb) f 0
#0  0x0000000000417ef0 in prisyst (sstat=0x83d5e0, curline=2, nsecs=1, avgval=0, fixedhead=0, selp=0x63bd20, highorderp=0x7fff3f7e590e "C", maxcpulines=999, maxdsklines=999, 
    maxmddlines=999, maxlvmlines=999, maxintlines=999, maxnfslines=999, maxcontlines=999) at showlinux.c:1241
1241	        extra.percputot = extra.cputot / sstat->cpu.nrcpu;
(gdb) l
1236	        }
1237	
1238	        if (extra.cputot == 0)
1239	                extra.cputot = 1;             /* avoid divide-by-zero */
1240	
1241	        extra.percputot = extra.cputot / sstat->cpu.nrcpu;
1242	
1243	        if (extra.percputot == 0)
1244	                extra.percputot = 1;          /* avoid divide-by-zero */
1245	
(gdb) p *sstat
$1 = {stamp = {tv_sec = 0, tv_usec = 0}, cpu = {nrcpu = 0, devint = 0, csw = 0, nprocs = 0, lavg1 = -1, lavg5 = -1, lavg15 = -1, all = {cpunr = 0, stime = 0, utime = 0, ntime = 0, 
      itime = 0, wtime = 0, Itime = 0, Stime = 0, steal = 0, guest = 0, freqcnt = {maxfreq = 0, cnt = 0, ticks = 0}}, cpu = 0x83dc10}, mem = {physmem = 0, freemem = 0, buffermem = 0, 
    slabmem = 0, cachemem = 0, cachedrt = 0, totswap = 0, freeswap = 0, pgscans = 0, pgsteal = 0, allocstall = 0, swouts = 0, swins = 0, commitlim = 0, committed = 0, shmem = 0, 
    shmrss = 0, shmswp = 0, slabreclaim = 0, tothugepage = 0, freehugepage = 0, hugepagesz = 0, vmwballoon = 0}, net = {ipv4 = {Forwarding = 0, DefaultTTL = 0, InReceives = 0, 
      InHdrErrors = 0, InAddrErrors = 0, ForwDatagrams = 0, InUnknownProtos = 0, InDiscards = 0, InDelivers = 0, OutRequests = 0, OutDiscards = 0, OutNoRoutes = 0, ReasmTimeout = 0, 
      ReasmReqds = 0, ReasmOKs = 0, ReasmFails = 0, FragOKs = 0, FragFails = 0, FragCreates = 0}, icmpv4 = {InMsgs = 0, InErrors = 0, InDestUnreachs = 0, InTimeExcds = 0, 
      InParmProbs = 0, InSrcQuenchs = 0, InRedirects = 0, InEchos = 0, InEchoReps = 0, InTimestamps = 0, InTimestampReps = 0, InAddrMasks = 0, InAddrMaskReps = 0, OutMsgs = 0, 
      OutErrors = 0, OutDestUnreachs = 0, OutTimeExcds = 0, OutParmProbs = 0, OutSrcQuenchs = 0, OutRedirects = 0, OutEchos = 0, OutEchoReps = 0, OutTimestamps = 0, 
      OutTimestampReps = 0, OutAddrMasks = 0, OutAddrMaskReps = 0}, udpv4 = {InDatagrams = 0, NoPorts = 0, InErrors = 0, OutDatagrams = 0}, ipv6 = {Ip6InReceives = 0, 
      Ip6InHdrErrors = 0, Ip6InTooBigErrors = 0, Ip6InNoRoutes = 0, Ip6InAddrErrors = 0, Ip6InUnknownProtos = 0, Ip6InTruncatedPkts = 0, Ip6InDiscards = 0, Ip6InDelivers = 0, 
      Ip6OutForwDatagrams = 0, Ip6OutRequests = 0, Ip6OutDiscards = 0, Ip6OutNoRoutes = 0, Ip6ReasmTimeout = 0, Ip6ReasmReqds = 0, Ip6ReasmOKs = 0, Ip6ReasmFails = 0, Ip6FragOKs = 0, 
      Ip6FragFails = 0, Ip6FragCreates = 0, Ip6InMcastPkts = 0, Ip6OutMcastPkts = 0}, icmpv6 = {Icmp6InMsgs = 0, Icmp6InErrors = 0, Icmp6InDestUnreachs = 0, Icmp6InPktTooBigs = 0, 
      Icmp6InTimeExcds = 0, Icmp6InParmProblems = 0, Icmp6InEchos = 0, Icmp6InEchoReplies = 0, Icmp6InGroupMembQueries = 0, Icmp6InGroupMembResponses = 0, 
      Icmp6InGroupMembReductions = 0, Icmp6InRouterSolicits = 0, Icmp6InRouterAdvertisements = 0, Icmp6InNeighborSolicits = 0, Icmp6InNeighborAdvertisements = 0, Icmp6InRedirects = 0, 
      Icmp6OutMsgs = 0, Icmp6OutDestUnreachs = 0, Icmp6OutPktTooBigs = 0, Icmp6OutTimeExcds = 0, Icmp6OutParmProblems = 0, Icmp6OutEchoReplies = 0, Icmp6OutRouterSolicits = 0, 
      Icmp6OutNeighborSolicits = 0, Icmp6OutNeighborAdvertisements = 0, Icmp6OutRedirects = 0, Icmp6OutGroupMembResponses = 0, Icmp6OutGroupMembReductions = 0}, udpv6 = {
      Udp6InDatagrams = 0, Udp6NoPorts = 0, Udp6InErrors = 0, Udp6OutDatagrams = 0}, tcp = {RtoAlgorithm = 0, RtoMin = 0, RtoMax = 0, MaxConn = 0, ActiveOpens = 0, PassiveOpens = 0, 
      AttemptFails = 0, EstabResets = 0, CurrEstab = 0, InSegs = 0, OutSegs = 0, RetransSegs = 0, InErrs = 0, OutRsts = 0}}, intf = {nrintf = 0, intf = 0x83dc80}, dsk = {ndsk = 0, 
    nmdd = 0, nlvm = 0, dsk = 0x83dd40, mdd = 0x83de00, lvm = 0x83dda0}, nfs = {server = {netcnt = 0, netudpcnt = 0, nettcpcnt = 0, nettcpcon = 0, rpccnt = 0, rpcbadfmt = 0, 
      rpcbadaut = 0, rpcbadcln = 0, rpcread = 0, rpcwrite = 0, rchits = 0, rcmiss = 0, rcnoca = 0, nrbytes = 0, nwbytes = 0}, client = {rpccnt = 0, rpcretrans = 0, rpcautrefresh = 0, 
      rpcread = 0, rpcwrite = 0}, nrmounts = 0, nfsmnt = 0x83de60}, cfs = {nrcontainer = 0, cont = 0x0}, www = {accesses = 0, totkbytes = 0, uptime = 0, bworkers = 0, iworkers = 0}}
(gdb) p sstat->cpu.nrcpu
$2 = 0
(gdb)
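
The listing above shows a guard for `extra.cputot == 0` but none for `sstat->cpu.nrcpu == 0`, which is the value gdb reports. The following is a minimal sketch of a guarded division, not the upstream fix. The struct names, the `long long` field types, the function name `compute_percputot`, and the error message are all assumptions made for illustration; only the field names `cputot`, `percputot`, and `nrcpu` come from the backtrace.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, trimmed-down stand-ins for the structures seen in the
 * backtrace; only the fields needed for the illustration are kept. */
struct cpu_sample {
    long long nrcpu;        /* CPU count reported by the sampler */
};

struct extra_totals {
    long long cputot;
    long long percputot;
};

/* Returns 0 on success, -1 if no usable CPU count is available. */
static int compute_percputot(struct extra_totals *extra,
                             const struct cpu_sample *cpu)
{
    assert(extra != NULL && cpu != NULL);

    if (extra->cputot == 0)
        extra->cputot = 1;          /* avoid divide-by-zero in later uses */

    /* The check missing at showlinux.c:1241: refuse to divide by zero. */
    if (cpu->nrcpu <= 0) {
        fprintf(stderr, "no CPU count available; is the metrics agent running?\n");
        return -1;
    }

    extra->percputot = extra->cputot / cpu->nrcpu;

    if (extra->percputot == 0)
        extra->percputot = 1;       /* avoid divide-by-zero in later uses */

    return 0;
}
```

A caller would check the return value and skip drawing the system lines (or exit with a message) instead of continuing with an empty sample.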

Version-Release number of selected component (if applicable):
Red Hat Enterprise Linux 6.8
pcp-3.10.9-6.el6.x86_64
pcp-system-tools-3.10.9-6.el6.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Run # pcp atop

2. It gives a "Floating point exception" error.


Actual results:
# cat var_log_messages 
Aug 30 10:31:10 lbnss22 kernel: pcp-atop[11688] trap divide error ip:417ef0 sp:7fff003a84a0 error:0 in pcp-atop[400000+34000]
Aug 30 10:33:44 lbnss22 kernel: pcp-atop[19345] trap divide error ip:417ef0 sp:7fffe01ac290 error:0 in pcp-atop[400000+34000]
Aug 30 10:36:21 lbnss22 kernel: pcp-atop[22592] trap divide error ip:417ef0 sp:7fffc225a930 error:0 in pcp-atop[400000+34000]
Aug 30 10:39:18 lbnss22 kernel: pcp-atop[25979] trap divide error ip:417ef0 sp:7ffe930e49d0 error:0 in pcp-atop[400000+34000]
Aug 30 11:20:37 lbnss22 kernel: pcp-atop[16194] trap divide error ip:417ef0 sp:7ffd64137240 error:0 in pcp-atop[400000+34000]
Sep  1 08:23:21 lbnss22 kernel: pcp-atop[13381] trap divide error ip:417ef0 sp:7ffc1f575630 error:0 in pcp-atop[400000+34000]
Sep  1 08:24:07 lbnss22 kernel: pcp-atop[14493] trap divide error ip:417ef0 sp:7fff3f7e5550 error:0 in pcp-atop[400000+34000]
Sep  1 08:24:07 lbnss22 abrt[14506]: Saved core dump of pid 14493 (/usr/libexec/pcp/bin/pcp-atop) to /var/spool/abrt/ccpp-2016-09-01-08:24:07-14493 (1458176 bytes)


Expected results:
No crashes.

Additional info:

Comment 1 Deepu K S 2016-09-06 16:33:21 UTC
Created attachment 1198357
ABRT captured problem directory (coredump included)

Comment 3 Nathan Scott 2016-09-08 01:43:12 UTC
Hi Deepu,

What does

$ pminfo -f hinv.ncpu

report on this system?  (I'm expecting some kind of error, just curious as to which one)

So far, I've been unable to reproduce the problem locally (with/without pmcd running, with/without pmdalinux running).

Thanks!

Comment 4 Frank Ch. Eigler 2016-09-12 13:49:23 UTC
(In reply to Deepu K S from comment #0)
> Description of problem:
> Process /usr/libexec/pcp/bin/pcp-atop was killed by signal 8 (SIGFPE)
> It looks like pcp-atop crashed due to a divide by zero condition 

Were you able to collect $PCP_DEBUG level traces?
% env PCP_DEBUG=2 pcp atop  2>/tmp/LOGFILE

Comment 5 Deepu K S 2016-09-12 14:37:05 UTC
(In reply to Nathan Scott from comment #3)
> Hi Deepu,
> 
> What does
> 
> $ pminfo -f hinv.ncpu
> 
> report on this system?  (I'm expecting some kind of error, just curious as
> to which one)
> 
> So far, I've been unable to reproduce the problem locally (with/without pmcd
> running, with/without pmdalinux running).
> 
> Thanks!

Sorry for the delay. I now have the output collected.

# pminfo -f hinv.ncpu
hinv.ncpu: pmLookupDesc: No PMCD agent for domain of request

# service pmcd status
Checking for pmcd: running


Output of # env PCP_DEBUG=10  pcp atop 2>pcp-atop.log
is attached.

The crash happens immediately, every time the command is run.

Most lines in the log file show:
  PM_ID_NULL (<noname>): No PMCD agent for domain of request

Comment 6 Deepu K S 2016-09-12 14:38:28 UTC
Created attachment 1200232
pcp atop log

Comment 7 Frank Ch. Eigler 2016-09-12 14:52:00 UTC
pmFetch returns ...
pmResult dump from 0x83c2e0 timestamp: 1473338279.564024 14:37:59.564 numpmid: 11
  PM_ID_NULL (<noname>): No PMCD agent for domain of request
  PM_ID_NULL (<noname>): No PMCD agent for domain of request
  PM_ID_NULL (<noname>): No PMCD agent for domain of request


Oh, dear.  That suggests that pmdalinux and/or pmdaproc crashed or were taken out of service, and that automatic restarting (if any) was not successful.  (What version of PCP was this?)  A

 # service pmcd restart

should bring them back to life.  It is a bug in pcp-atop that it fails to report the problem and advise the user.
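
As a rough illustration of "report the problem and advise the user", the sketch below checks a fetched sample before any values are used. It is self-contained and does not call libpcp: `struct metric_result`, `count_failed_metrics`, `sample_is_usable`, the error values, and the messages are hypothetical. The one convention borrowed from PCP is that a negative `numval` in a value set carries an error code rather than a value count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for one fetched metric. A negative numval is an
 * error code, mirroring the convention used by PCP value sets. */
struct metric_result {
    const char *name;
    int numval;             /* >= 0: number of values, < 0: error code */
};

/* Count metrics whose fetch failed, reporting each one on stderr. */
static size_t count_failed_metrics(const struct metric_result *results, size_t n)
{
    size_t failed = 0;

    assert(results != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (results[i].numval < 0) {
            fprintf(stderr, "%s: fetch failed (error %d)\n",
                    results[i].name, results[i].numval);
            failed++;
        }
    }
    return failed;
}

/* Returns 1 if sampling may proceed, 0 if the caller should stop and
 * advise the user instead of computing on an all-zero sample. */
static int sample_is_usable(const struct metric_result *results, size_t n)
{
    if (n == 0 || count_failed_metrics(results, n) == n) {
        fprintf(stderr, "no metrics available; try restarting pmcd\n");
        return 0;
    }
    return 1;
}
```

Treating only the "every metric failed" case as fatal is a design choice for this sketch; a real tool might also refuse to continue when specific required metrics such as `hinv.ncpu` are missing.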

Comment 8 Nathan Scott 2016-09-12 22:44:41 UTC
Thanks Deepu, I understand what's happening now and know how to reproduce it; a fix will follow shortly.

Comment 9 Nathan Scott 2016-09-26 03:38:17 UTC
This is fixed in upstream PCP via git commit 7157edb93 and will make its way into the next available RHEL6 PCP update from there.

Comment 12 Miloš Prchlík 2017-01-18 08:26:51 UTC
Verified with build pcp-3.10.9-8.el6.

Comment 14 errata-xmlrpc 2017-03-21 11:20:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2017-0735.html

