Bug 1202934 - pcp 3.10.3 pmmgr/pmlogconf crashes older remote pmcd servers
Keywords:
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: pcp
Version: 21
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Nathan Scott
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2015-03-17 17:56 UTC by Frank Ch. Eigler
Modified: 2015-12-02 17:41 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2015-12-02 10:12:00 UTC
Type: Bug
Embargoed:



Description Frank Ch. Eigler 2015-03-17 17:56:53 UTC
A RH machine running f20 pcp 3.10.3 from fedora/koji monitors nearby pcp servers via pmmgr (and thus pmlogconf/pmlogger).  Something in the connection protocol appears to have changed recently: older pmcds (such as rhel7's 3.9.10 and rhel6's 3.9.4) die.

(It may matter that these pcp installations may also have been running pcpqa, so in some cases their pmcd.conf included -T3.)

The main symptom is that a number of these remote pmcds die within minutes of the pcp upgrade on the pmmgr-hosting machine.  Another is that the affected pmcd.log contains many lines of this form (one per minute, matching the pmmgr poll interval):

Error: ClientLoop: error sending Conn ACK PDU to new client IPC protocol failure

... but no explicit "exiting" type message.  The log just stops with pmcd dead.  Many of these messages predate the 3.10.3 pcp update, so the 3.10.2 client code was apparently triggering it also.


On one affected machine (rhel7 3.9.10), there is a more interesting item in the kernel logs:
    traps: pmcd[...] trap divide error [...] in libpcp_pmda.so.3
which in a debugger shows:

Program received signal SIGFPE, Arithmetic exception.
0x00002aaaae114603 in pmdaTreeName (pmns=0x555555782190, pmid=251709451, 
    nameset=0x7fffffffe338) at tree.c:147
147         hashchain = pmns->htab[pmid % pmns->htabsize];
(gdb) bt
#0  0x00002aaaae114603 in pmdaTreeName (pmns=0x555555782190, pmid=251709451, 
    nameset=0x7fffffffe338) at tree.c:147
#1  0x00005555555630d2 in DoPMNSIDs (cp=cp@entry=0x5555557a2790, 
    pb=0x5555557b4000) at dopdus.c:392
#2  0x000055555555b4a9 in HandleClientInput (
    fdsPtr=fdsPtr@entry=0x7fffffffe420) at pmcd.c:350
#3  0x000055555555a2e5 in ClientLoop () at pmcd.c:697
#4  main (argc=<optimized out>, argv=<optimized out>) at pmcd.c:887

where pmns->htabsize == 0.
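
The fault above is an integer modulo by zero: with htabsize == 0, the bucket index expression traps before any memory is touched. A minimal sketch of a guarded lookup follows. The struct and function names (struct node, struct tree, tree_lookup) are hypothetical stand-ins, not the libpcp_pmda definitions, and this is not claimed to be the upstream fix.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the PMDA name tree; not the libpcp_pmda types. */
struct node {
    unsigned int pmid;
    const char *name;
    struct node *hash_next;     /* next entry in the same bucket */
};

struct tree {
    struct node **htab;         /* hash buckets, NULL until built */
    unsigned int htabsize;      /* bucket count, 0 until built */
};

/*
 * Look up a node by PMID.  Returns NULL when the hash table has not
 * been built, instead of evaluating pmid % 0, which raises SIGFPE
 * on x86.
 */
const struct node *tree_lookup(const struct tree *t, unsigned int pmid)
{
    const struct node *np;

    assert(t != NULL);
    if (t->htab == NULL || t->htabsize == 0)
        return NULL;

    for (np = t->htab[pmid % t->htabsize]; np != NULL; np = np->hash_next) {
        if (np->pmid == pmid)
            return np;
    }
    return NULL;
}
```

A guard like this would only turn the crash into a failed lookup; it would not explain why the table was empty when the PMNS_IDS request arrived.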


"valgrind pmcd -f -T3" on an affected rhel6 box (running pcp 3.9.4) indicates an eventual crash, probably in the same spot.  A "pmcd -Dall" trace suggests this was the last packet received before death:

secureconnect.c:__pmRecv[secure](1026, ..., 12, 0) -> 12
pduread(1026, ...): have 12, last read 12, still need 0
[29777]pmGetPDU: PMNS_IDS fd=1026 len=24 from=0
000:       18     700d        0        0  1000000  ac8000f 
     fd  client connection from                    ipc ver  operations denied
     ==  ========================================  =======  =================
    1026  tofan.yyz.redhat.com                            2  store 

__pmDecodeIDList
IDlist dump: numids = 1
  PMID[0]: 0x0f00c80a 60.50.10

(This PMID happens to be kernel.percpu.interrupts.MCE, which one can normally read correctly, even on the affected machines, so I assume some prior memory corruption happened.  I have a complete pmcd.log of roughly 70MB for this one.)
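
For reference, the dotted form 60.50.10 comes from the PCP PMID bit layout (9-bit domain, 12-bit cluster, 10-bit item below a flag bit). The sketch below decodes it with plain shifts and masks rather than the libpcp helpers. The names pmid_fields, pmid_decode and swap32 are illustrative, and reading the last word of the raw dump (ac8000f) as the same PMID in the opposite byte order is an interpretation of the trace, not something the log states.

```c
#include <stdint.h>

/* Illustrative container for the three PMID fields. */
struct pmid_fields {
    unsigned int domain;    /* 9 bits:  PMDA domain number */
    unsigned int cluster;   /* 12 bits: metric cluster */
    unsigned int item;      /* 10 bits: item within the cluster */
};

/* Reverse the byte order of a 32-bit word. */
uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Split a host-order PMID into domain.cluster.item. */
struct pmid_fields pmid_decode(uint32_t pmid)
{
    struct pmid_fields f;

    f.domain  = (pmid >> 22) & 0x1ffu;
    f.cluster = (pmid >> 10) & 0xfffu;
    f.item    = pmid & 0x3ffu;
    return f;
}
```

Under this layout, pmid_decode(0x0f00c80a) gives 60.50.10, and the pmid=251709451 seen in the rhel7 gdb session decodes to the neighbouring 60.50.11.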

Comment 2 Fedora End Of Life 2015-11-04 13:06:45 UTC
This message is a reminder that Fedora 21 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 21. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora 'version'
of '21'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 21 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 3 Fedora End Of Life 2015-12-02 10:12:05 UTC
Fedora 21 changed to end-of-life (EOL) status on 2015-12-01. Fedora 21 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

