| Summary: | pcp-atop killed by SIGFPE | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Deepu K S <dkochuka> | ||||||
| Component: | pcp | Assignee: | Nathan Scott <nathans> | ||||||
| Status: | CLOSED ERRATA | QA Contact: | Miloš Prchlík <mprchlik> | ||||||
| Severity: | medium | Docs Contact: | |||||||
| Priority: | unspecified | ||||||||
| Version: | 6.8 | CC: | brolley, dkochuka, fche, lberk, mbenitez, mcermak, mgoodwin, mprchlik | ||||||
| Target Milestone: | rc | ||||||||
| Target Release: | --- | ||||||||
| Hardware: | x86_64 | ||||||||
| OS: | Linux | ||||||||
| Whiteboard: | |||||||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |||||||
| Doc Text: | Story Points: | --- | |||||||
| Clone Of: | Environment: | ||||||||
| Last Closed: | 2017-03-21 11:20:54 UTC | Type: | Bug | ||||||
| Regression: | --- | Mount Type: | --- | ||||||
| Documentation: | --- | CRM: | |||||||
| Verified Versions: | Category: | --- | |||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||
| Attachments: |
|
||||||||
Created attachment 1198357 [details]
ABRT captured problem directory (coredump included)
Hi Deepu,

What does

$ pminfo -f hinv.ncpu

report on this system? (I'm expecting some kind of error, just curious as to which one)

So far, I've been unable to reproduce the problem locally (with/without pmcd running, with/without pmdalinux running).

Thanks!

(In reply to Deepu K S from comment #0)
> Description of problem:
> Process /usr/libexec/pcp/bin/pcp-atop was killed by signal 8 (SIGFPE)
> It looks like pcp-atop crashed due to a divide by zero condition

Were you able to collect $PCP_DEBUG level traces?

% env PCP_DEBUG=2 pcp atop 2>/tmp/LOGFILE

(In reply to Nathan Scott from comment #3)
> Hi Deepu,
>
> What does
>
> $ pminfo -f hinv.ncpu
>
> report on this system? (I'm expecting some kind of error, just curious as
> to which one)
>
> So far, I've been unable to reproduce the problem locally (with/without pmcd
> running, with/without pmdalinux running).
>
> Thanks!

Sorry for the delay. I now have the output collected.

# pminfo -f hinv.ncpu
hinv.ncpu: pmLookupDesc: No PMCD agent for domain of request

# service pmcd status
Checking for pmcd: running

Output of
# env PCP_DEBUG=10 pcp atop 2>pcp-atop.log
is attached.

The crash happens whenever the command is run. It also happens right away.
Most lines from logfile show
PM_ID_NULL (<noname>): No PMCD agent for domain of request

Created attachment 1200232 [details]
pcp atop log
pmFetch returns ...
pmResult dump from 0x83c2e0 timestamp: 1473338279.564024 14:37:59.564 numpmid: 11
PM_ID_NULL (<noname>): No PMCD agent for domain of request
PM_ID_NULL (<noname>): No PMCD agent for domain of request
PM_ID_NULL (<noname>): No PMCD agent for domain of request

Oh, dear. That suggests that pmdalinux and/or pmdaproc crashed or were taken out of service, and that automatic restarting (if any) was not successful. (What version of PCP was this?) A

# service pmcd restart

should bring them back to life. It is a bug in pcp-atop that it fails to report the problem and advise the user.

Thanks Deepu, I understand what's happening now and know how to reproduce it; a fix will follow shortly.

This is fixed in upstream PCP via git commit 7157edb93 and will make its way into the next available RHEL6 PCP update from there.

Verified with build pcp-3.10.9-8.el6.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2017-0735.html |
Description of problem:
Process /usr/libexec/pcp/bin/pcp-atop was killed by signal 8 (SIGFPE)
It looks like pcp-atop crashed due to a divide by zero condition.

Core was generated by `/usr/libexec/pcp/bin/pcp-atop'.
Program terminated with signal 8, Arithmetic exception.
#0  0x0000000000417ef0 in prisyst (sstat=0x83d5e0, curline=2, nsecs=1, avgval=0, fixedhead=0, selp=0x63bd20, highorderp=0x7fff3f7e590e "C", maxcpulines=999, maxdsklines=999, maxmddlines=999, maxlvmlines=999, maxintlines=999, maxnfslines=999, maxcontlines=999) at showlinux.c:1241
1241            extra.percputot = extra.cputot / sstat->cpu.nrcpu;
(gdb) bt
#0  0x0000000000417ef0 in prisyst (sstat=0x83d5e0, curline=2, nsecs=1, avgval=0, fixedhead=0, selp=0x63bd20, highorderp=0x7fff3f7e590e "C", maxcpulines=999, maxdsklines=999, maxmddlines=999, maxlvmlines=999, maxintlines=999, maxnfslines=999, maxcontlines=999) at showlinux.c:1241
#1  0x00000000004141ef in generic_samp (curtime=1472711047.4024861, delta=1.0000000000000291, sstat=0x83d5e0, tstat=0x83c210, proclist=0x83c350, ndeviat=0, ntask=0, nactproc=0, totproc=0, totrun=0, totslpi=0, totslpu=0, totzomb=0, nexit=0, noverflow=0, flags=1) at showgeneric.c:294
#2  0x000000000040535d in engine () at atop.c:671
#3  0x00000000004056f9 in main (argc=1, argv=<value optimized out>) at atop.c:449
(gdb) f 0
#0  0x0000000000417ef0 in prisyst (sstat=0x83d5e0, curline=2, nsecs=1, avgval=0, fixedhead=0, selp=0x63bd20, highorderp=0x7fff3f7e590e "C", maxcpulines=999, maxdsklines=999, maxmddlines=999, maxlvmlines=999, maxintlines=999, maxnfslines=999, maxcontlines=999) at showlinux.c:1241
1241            extra.percputot = extra.cputot / sstat->cpu.nrcpu;
(gdb) l
1236            }
1237
1238            if (extra.cputot == 0)
1239                    extra.cputot = 1; /* avoid divide-by-zero */
1240
1241            extra.percputot = extra.cputot / sstat->cpu.nrcpu;
1242
1243            if (extra.percputot == 0)
1244                    extra.percputot = 1; /* avoid divide-by-zero */
1245
(gdb) p *sstat
$1 = {stamp = {tv_sec = 0, tv_usec = 0}, cpu = {nrcpu = 0, devint = 0, csw = 0, 
nprocs = 0, lavg1 = -1, lavg5 = -1, lavg15 = -1, all = {cpunr = 0, stime = 0, utime = 0, ntime = 0, itime = 0, wtime = 0, Itime = 0, Stime = 0, steal = 0, guest = 0, freqcnt = {maxfreq = 0, cnt = 0, ticks = 0}}, cpu = 0x83dc10}, mem = {physmem = 0, freemem = 0, buffermem = 0, slabmem = 0, cachemem = 0, cachedrt = 0, totswap = 0, freeswap = 0, pgscans = 0, pgsteal = 0, allocstall = 0, swouts = 0, swins = 0, commitlim = 0, committed = 0, shmem = 0, shmrss = 0, shmswp = 0, slabreclaim = 0, tothugepage = 0, freehugepage = 0, hugepagesz = 0, vmwballoon = 0}, net = {ipv4 = {Forwarding = 0, DefaultTTL = 0, InReceives = 0, InHdrErrors = 0, InAddrErrors = 0, ForwDatagrams = 0, InUnknownProtos = 0, InDiscards = 0, InDelivers = 0, OutRequests = 0, OutDiscards = 0, OutNoRoutes = 0, ReasmTimeout = 0, ReasmReqds = 0, ReasmOKs = 0, ReasmFails = 0, FragOKs = 0, FragFails = 0, FragCreates = 0}, icmpv4 = {InMsgs = 0, InErrors = 0, InDestUnreachs = 0, InTimeExcds = 0, InParmProbs = 0, InSrcQuenchs = 0, InRedirects = 0, InEchos = 0, InEchoReps = 0, InTimestamps = 0, InTimestampReps = 0, InAddrMasks = 0, InAddrMaskReps = 0, OutMsgs = 0, OutErrors = 0, OutDestUnreachs = 0, OutTimeExcds = 0, OutParmProbs = 0, OutSrcQuenchs = 0, OutRedirects = 0, OutEchos = 0, OutEchoReps = 0, OutTimestamps = 0, OutTimestampReps = 0, OutAddrMasks = 0, OutAddrMaskReps = 0}, udpv4 = {InDatagrams = 0, NoPorts = 0, InErrors = 0, OutDatagrams = 0}, ipv6 = {Ip6InReceives = 0, Ip6InHdrErrors = 0, Ip6InTooBigErrors = 0, Ip6InNoRoutes = 0, Ip6InAddrErrors = 0, Ip6InUnknownProtos = 0, Ip6InTruncatedPkts = 0, Ip6InDiscards = 0, Ip6InDelivers = 0, Ip6OutForwDatagrams = 0, Ip6OutRequests = 0, Ip6OutDiscards = 0, Ip6OutNoRoutes = 0, Ip6ReasmTimeout = 0, Ip6ReasmReqds = 0, Ip6ReasmOKs = 0, Ip6ReasmFails = 0, Ip6FragOKs = 0, Ip6FragFails = 0, Ip6FragCreates = 0, Ip6InMcastPkts = 0, Ip6OutMcastPkts = 0}, icmpv6 = {Icmp6InMsgs = 0, Icmp6InErrors = 0, Icmp6InDestUnreachs = 0, Icmp6InPktTooBigs = 0, Icmp6InTimeExcds = 0, 
Icmp6InParmProblems = 0, Icmp6InEchos = 0, Icmp6InEchoReplies = 0, Icmp6InGroupMembQueries = 0, Icmp6InGroupMembResponses = 0, Icmp6InGroupMembReductions = 0, Icmp6InRouterSolicits = 0, Icmp6InRouterAdvertisements = 0, Icmp6InNeighborSolicits = 0, Icmp6InNeighborAdvertisements = 0, Icmp6InRedirects = 0, Icmp6OutMsgs = 0, Icmp6OutDestUnreachs = 0, Icmp6OutPktTooBigs = 0, Icmp6OutTimeExcds = 0, Icmp6OutParmProblems = 0, Icmp6OutEchoReplies = 0, Icmp6OutRouterSolicits = 0, Icmp6OutNeighborSolicits = 0, Icmp6OutNeighborAdvertisements = 0, Icmp6OutRedirects = 0, Icmp6OutGroupMembResponses = 0, Icmp6OutGroupMembReductions = 0}, udpv6 = { Udp6InDatagrams = 0, Udp6NoPorts = 0, Udp6InErrors = 0, Udp6OutDatagrams = 0}, tcp = {RtoAlgorithm = 0, RtoMin = 0, RtoMax = 0, MaxConn = 0, ActiveOpens = 0, PassiveOpens = 0, AttemptFails = 0, EstabResets = 0, CurrEstab = 0, InSegs = 0, OutSegs = 0, RetransSegs = 0, InErrs = 0, OutRsts = 0}}, intf = {nrintf = 0, intf = 0x83dc80}, dsk = {ndsk = 0, nmdd = 0, nlvm = 0, dsk = 0x83dd40, mdd = 0x83de00, lvm = 0x83dda0}, nfs = {server = {netcnt = 0, netudpcnt = 0, nettcpcnt = 0, nettcpcon = 0, rpccnt = 0, rpcbadfmt = 0, rpcbadaut = 0, rpcbadcln = 0, rpcread = 0, rpcwrite = 0, rchits = 0, rcmiss = 0, rcnoca = 0, nrbytes = 0, nwbytes = 0}, client = {rpccnt = 0, rpcretrans = 0, rpcautrefresh = 0, rpcread = 0, rpcwrite = 0}, nrmounts = 0, nfsmnt = 0x83de60}, cfs = {nrcontainer = 0, cont = 0x0}, www = {accesses = 0, totkbytes = 0, uptime = 0, bworkers = 0, iworkers = 0}}
(gdb) p sstat->cpu.nrcpu
$2 = 0
(gdb)

Version-Release number of selected component (if applicable):
Red Hat Enterprise Linux 6.8
pcp-3.10.9-6.el6.x86_64
pcp-system-tools-3.10.9-6.el6.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Run # pcp atop
2. It gives a "Floating point exception" error.

Actual results:
# cat var_log_messages
Aug 30 10:31:10 lbnss22 kernel: pcp-atop[11688] trap divide error ip:417ef0 sp:7fff003a84a0 error:0 in pcp-atop[400000+34000]
Aug 30 10:33:44 lbnss22 kernel: pcp-atop[19345] trap divide error ip:417ef0 sp:7fffe01ac290 error:0 in pcp-atop[400000+34000]
Aug 30 10:36:21 lbnss22 kernel: pcp-atop[22592] trap divide error ip:417ef0 sp:7fffc225a930 error:0 in pcp-atop[400000+34000]
Aug 30 10:39:18 lbnss22 kernel: pcp-atop[25979] trap divide error ip:417ef0 sp:7ffe930e49d0 error:0 in pcp-atop[400000+34000]
Aug 30 11:20:37 lbnss22 kernel: pcp-atop[16194] trap divide error ip:417ef0 sp:7ffd64137240 error:0 in pcp-atop[400000+34000]
Sep  1 08:23:21 lbnss22 kernel: pcp-atop[13381] trap divide error ip:417ef0 sp:7ffc1f575630 error:0 in pcp-atop[400000+34000]
Sep  1 08:24:07 lbnss22 kernel: pcp-atop[14493] trap divide error ip:417ef0 sp:7fff3f7e5550 error:0 in pcp-atop[400000+34000]
Sep  1 08:24:07 lbnss22 abrt[14506]: Saved core dump of pid 14493 (/usr/libexec/pcp/bin/pcp-atop) to /var/spool/abrt/ccpp-2016-09-01-08:24:07-14493 (1458176 bytes)

Expected results:
No crashes.

Additional info:
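
For illustration only: the gdb listing shows that the numerator (extra.cputot) and the result (extra.percputot) are both guarded against zero, but the divisor sstat->cpu.nrcpu is not, and it is 0 when no CPU metrics could be fetched. The following is a minimal sketch of guarding the divisor as well. It is not the upstream fix (commit 7157edb93 was not inspected here), and the `count_t` typedef and the helper name `safe_percputot` are assumptions introduced for this example.

```c
#include <assert.h>

/* Assumed stand-in for the counter type used in the listing. */
typedef long long count_t;

/*
 * Hypothetical helper: compute the per-CPU share of cputot so that
 * the result is never zero and the division can never trap.
 * nrcpu may be 0 when the CPU count metric was not available.
 */
count_t safe_percputot(count_t cputot, count_t nrcpu)
{
    count_t percputot;

    if (cputot == 0)
        cputot = 1;             /* avoid divide-by-zero later on */

    if (nrcpu <= 0)
        nrcpu = 1;              /* the guard missing at showlinux.c:1241 */

    percputot = cputot / nrcpu;

    if (percputot == 0)
        percputot = 1;          /* avoid divide-by-zero later on */

    assert(percputot != 0);
    return percputot;
}
```

With this guard, a zeroed sample such as the one dumped above (nrcpu = 0, cputot = 0) yields 1 rather than raising SIGFPE. A guard like this only hides the symptom; as noted in the comments, the tool should also report that the metrics are unavailable.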