Bug 958745 - pmchart double free when stopping playback
Summary: pmchart double free when stopping playback
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Fedora EPEL
Classification: Fedora
Component: pcp
Version: el6
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Nathan Scott
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
: 956716 (view as bug list)
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2013-05-02 10:42 UTC by Marko Myllynen
Modified: 2014-02-04 21:42 UTC (History)
4 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-02-04 21:42:49 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
example pcp archive (13.87 KB, application/x-gzip)
2013-05-03 05:53 UTC, Marko Myllynen
no flags Details
stap trace (5.50 MB, application/x-bzip2)
2013-05-19 15:22 UTC, Marko Myllynen
no flags Details
archive script + CSVs (8.11 KB, application/x-bzip2)
2013-05-19 15:31 UTC, Marko Myllynen
no flags Details

Description Marko Myllynen 2013-05-02 10:42:37 UTC
Description of problem:
When playing an archive and stopping playback with pmchart a double free occurred:

localhost> pmchart -t 1m -a archive
*** glibc detected *** pmchart: double free or corruption (out): 0x0000000001383020 ***
======= Backtrace: =========
/lib64/libc.so.6[0x36eae75916]
/lib64/libc.so.6[0x36eae78443]
/usr/lib64/libpcp.so.3(__pmFreeResultValues+0x151)[0x324b00ec51]
/usr/lib64/libpcp.so.3(pmFreeResult+0x4c)[0x324b00ed0c]
/usr/lib64/libpcp.so.3[0x324b02a02a]
/usr/lib64/libpcp.so.3(__pmLogFetchInterp+0x1d85)[0x324b02cf55]
/usr/lib64/libpcp.so.3(__pmLogFetch+0x6ad)[0x324b0280dd]
/usr/lib64/libpcp.so.3(pmFetch+0xc9)[0x324b00e589]
pmchart[0x498751]
pmchart[0x488712]
pmchart[0x476c7d]
pmchart[0x47458e]
pmchart[0x474e24]
pmchart[0x487b7a]
/usr/lib64/libQtCore.so.4(_ZN11QMetaObject8activateEP7QObjectPKS_iPPv+0x32f)[0x36f1f6a50f]
/usr/lib64/libQtNetwork.so.4[0x36f3a9ec4f]
/usr/lib64/libQtNetwork.so.4[0x36f3a8f321]
/usr/lib64/libQtGui.so.4(_ZN19QApplicationPrivate13notify_helperEP7QObjectP6QEvent+0xac)[0x36f25aaa0c]
/usr/lib64/libQtGui.so.4(_ZN12QApplication6notifyEP7QObjectP6QEvent+0x14b)[0x36f25b09db]
/usr/lib64/libQtCore.so.4(_ZN16QCoreApplication14notifyInternalEP7QObjectP6QEvent+0x8c)[0x36f1f579dc]
/usr/lib64/libQtCore.so.4[0x36f1f7d66a]
/lib64/libglib-2.0.so.0(g_main_context_dispatch+0x22e)[0x36ec638f0e]
/lib64/libglib-2.0.so.0[0x36ec63c938]
/lib64/libglib-2.0.so.0(g_main_context_iteration+0x7a)[0x36ec63ca3a]
/usr/lib64/libQtCore.so.4(_ZN20QEventDispatcherGlib13processEventsE6QFlagsIN10QEventLoop17ProcessEventsFlagEE+0x73)[0x36f1f7d2b3]
/usr/lib64/libQtGui.so.4[0x36f264990e]
/usr/lib64/libQtCore.so.4(_ZN10QEventLoop13processEventsE6QFlagsINS_17ProcessEventsFlagEE+0x32)[0x36f1f56422]
/usr/lib64/libQtCore.so.4(_ZN10QEventLoop4execE6QFlagsINS_17ProcessEventsFlagEE+0xdc)[0x36f1f566ec]
/usr/lib64/libQtCore.so.4(_ZN16QCoreApplication4execEv+0xb9)[0x36f1f589d9]
pmchart[0x430b3e]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x36eae1ecdd]
pmchart[0x421f09]
======= Memory map: ========
00400000-00535000 r-xp 00000000 fd:01 169134                             /usr/bin/pmchart
00734000-0073b000 rw-p 00134000 fd:01 169134                             /usr/bin/pmchart
0073b000-0073e000 rw-p 00000000 00:00 0 
0093a000-00959000 rw-p 0013a000 fd:01 169134                             /usr/bin/pmchart
00e19000-013e9000 rw-p 00000000 00:00 0                                  [heap]
324b000000-324b04d000 r-xp 00000000 fd:01 169065                         /usr/lib64/libpcp.so.3
324b04d000-324b24d000 ---p 0004d000 fd:01 169065                         /usr/lib64/libpcp.so.3
324b24d000-324b24f000 rw-p 0004d000 fd:01 169065                         /usr/lib64/libpcp.so.3
324b24f000-324b252000 rw-p 00000000 00:00 0 
3306000000-3306459000 r-xp 00000000 fd:01 144245                         /usr/lib64/libgtk-x11-2.0.so.0.1800.9
3306459000-3306658000 ---p 00459000 fd:01 144245                         /usr/lib64/libgtk-x11-2.0.so.0.1800.9
3306658000-3306663000 rw-p 00458000 fd:01 144245                         /usr/lib64/libgtk-x11-2.0.so.0.1800.9
3306663000-3306665000 rw-p 00000000 00:00 0 
3306800000-33068b6000 r-xp 00000000 fd:01 135288                         /usr/lib64/libgdk-x11-2.0.so.0.1800.9
33068b6000-3306ab6000 ---p 000b6000 fd:01 135288                         /usr/lib64/libgdk-x11-2.0.so.0.1800.9
3306ab6000-3306abb000 rw-p 000b6000 fd:01 135288                         /usr/lib64/libgdk-x11-2.0.so.0.1800.9
3306c00000-3306c48000 r-xp 00000000 fd:01 133638                         /usr/lib64/libpango-1.0.so.0.2800.1
3306c48000-3306e48000 ---p 00048000 fd:01 133638                         /usr/lib64/libpango-1.0.so.0.2800.1
3306e48000-3306e4b000 rw-p 00048000 fd:01 133638                         /usr/lib64/libpango-1.0.so.0.2800.1
3307000000-3307029000 r-xp 00000000 fd:01 135286                         /usr/lib64/libpangoft2-1.0.so.0.2800.1
3307029000-3307228000 ---p 00029000 fd:01 135286                         /usr/lib64/libpangoft2-1.0.so.0.2800.1
3307228000-330722a000 rw-p 00028000 fd:01 135286                         /usr/lib64/libpangoft2-1.0.so.0.2800.1
3307400000-330740b000 r-xp 00000000 fd:01 135290                         /usr/lib64/libpangocairo-1.0.so.0.2800.1
330740b000-330760b000 ---p 0000b000 fd:01 135290                         /usr/lib64/libpangocairo-1.0.so.0.2800.1
330760b000-330760c000 rw-p 0000b000 fd:01 135290                         /usr/lib64/libpangocairo-1.0.so.0.2800.1
3307800000-3307804000 r-xp 00000000 fd:01 163044                         /usr/lib64/libcanberra-gtk.so.0.1.5
3307804000-3307a03000 ---p 00004000 fd:01 163044                         /usr/lib64/libcanberra-gtk.so.0.1.5
3307a03000-3307a04000 rw-p 00003000 fd:01 163044                         /usr/lib64/libcanberra-gtk.so.0.1.5
3308000000-3308007000 r-xp 00000000 fd:01 137308                         /usr/lib64/libgailutil.so.18.0.1
3308007000-3308206000 ---p 00007000 fd:01 137308                         /usr/lib64/libgailutil.so.18.0.1
3308206000-3308207000 rw-p 00006000 fd:01 137308                         /usr/lib64/libgailutil.so.18.0.1
3308400000-3308432000 r-xp 00000000 fd:01 163193                         /usr/lib64/libgnomecanvas-2.so.0.2600.0
3308432000-3308632000 ---p 00032000 fd:01 163193                         /usr/lib64/libgnomecanvas-2.so.0.2600.0
3308632000-3308634000 rw-p 00032000 fd:01 163193                         /usr/lib64/libgnomecanvas-2.so.0.2600.0
3653a00000-3653b48000 r-xp 00000000 fd:01 163109                         /usr/lib64/libxml2.so.2.7.6
3653b48000-3653d47000 ---p 00148000 fd:01 163109                         /usr/lib64/libxml2.so.2.7.6
3653d47000-3653d51000 rw-p 00147000 fd:01 163109                         /usr/lib64/libxml2.so.2.7.6
3653d51000-3653d52000 rw-p 00000000 00:00 0 
3653e00000-3653e67000 r-xp 00000000 fd:01 164814                         /usr/lib64/libgnomevfs-2.so.0.2400.2
3653e67000-3654066000 ---p 00067000 fd:01 164814                         /usr/lib64/libgnomevfs-2.so.0.2400.2
3654066000-365406b000 rw-p 00066000 fd:01 164814                         /usr/lib64/libgnomevfs-2.so.0.2400.2
3654200000-365426d000 r-xp 00000000 fd:01 164858                         /usr/lib64/libbonoboui-2.so.0.0.0
365426d000-365446c000 ---p 0006d000 fd:01 164858                         /usr/lib64/libbonoboui-2.so.0.0.0
365446c000-3654471000 rw-p 0006c000 fd:01 164858                         /usr/lib64/libbonoboui-2.so.0.0.0
3654600000-3654615000 r-xp 00000000 fd:01 164856                         /usr/lib64/libgnome-2.so.0.2800.0
3654615000-3654815000 ---p 00015000 fd:01 164856                         /usr/lib64/libgnome-2.so.0.2800.0
3654815000-3654817000 rw-p 00015000 fd:01 164856                         /usr/lib64/libgnome-2.so.0.2800.0
3655600000-3655699000 r-xp 00000000 fd:01 164874                         /usr/lib64/libgnomeui-2.so.0.2400.1
3655699000-3655898000 ---p 00099000 fd:01 164874                         /usr/lib64/libgnomeui-2.so.0.2400.1
3655898000-365589e000 rw-p 00098000 fd:01 164874                         /usr/lib64/libgnomeui-2.so.0.2400.1
36eaa00000-36eaa20000 r-xp 00000000 fd:01 7000                           /lib64/ld-2.12.so
36eac1f000-36eac20000 r--p 0001f000 fd:01 7000                           /lib64/ld-2.12.so
36eac20000-36eac21000 rw-p 00020000 fd:01 7000                           /lib64/ld-2.12.so
36eac21000-36eac22000 rw-p 00000000 00:00 0 
36eae00000-36eaf89000 r-xp 00000000 fd:01 7001                           /lib64/libc-2.12.so
36eaf89000-36eb189000 ---p 00189000 fd:01 7001                           /lib64/libc-2.12.so
36eb189000-36eb18d000 r--p 00189000 fd:01 7001                           /lib64/libc-2.12.so
36eb18d000-36eb18e000 rw-p 0018d000 fd:01 7001                           /lib64/libc-2.12.so
zsh: abort      pmchart -t 1m -a archive

Version-Release number of selected component (if applicable):
pcp-3.6.10-2.el6.x86_64

Comment 1 Nathan Scott 2013-05-02 23:09:27 UTC
Thanks Marko.  You mentioned this was 100% reproducible on IRC?  Can you attach the archive and add a reproducer recipe (which metrics, archive offsets, etc).

Comment 2 Marko Myllynen 2013-05-03 05:53:52 UTC
Created attachment 743045 [details]
example pcp archive

With the attached archive this can be reproduced as follows:

1. pmchart -t 1m -a gfs
2. Add both the available metrics as their own charts
3. Start playback from Time Controls, let it run for few seconds
4. Press Stop

Comment 3 Ken McDonell 2013-05-16 07:19:40 UTC
We should not rule out the possibility that libpcp may be a victim here, rather than the cuplrit

1. valgrind reports lots of bad looking errors from Qt and pmchart before the failure

2. $ valgrind pmdumptext -t 1m -a gfs gfs.inodes gfs.fmb

runs to completion with no errors reported by valgrind.

3. And in another test,

$ pmchart -Dall -t 1m -a gfs

dies with a segv as soon as the first metric is selected then Apply ... although this is also deep inside __pmLogFetchInterp() like the original case.

Comment 4 Nathan Scott 2013-05-17 09:05:54 UTC
Have traced this back to fairly close to the root cause now, discussing with the author of the code in question regarding how best to fix it.

Comment 5 Ken McDonell 2013-05-17 20:54:15 UTC
Ignore my case 3. above ... the segv is due to a botch in the interp.c diagnostic code that is exposed with -Dall at the start of the archive.

I have a fix for this.

No closer yet to a diagnosis for the original problem.

Comment 6 Frank Ch. Eigler 2013-05-18 15:17:57 UTC
(I can't reproduce this on fedora18, pcp3.8, pcp-gui 1.5.8.)

Comment 7 Frank Ch. Eigler 2013-05-18 15:29:04 UTC
(To get a sense of what's going on in pcpland, consider using systemtap:

# yum install systemtap pcp-debuginfo
# stap /usr/share/doc/systemtap*/examples/*/para-callgraph.stp 'process("/usr/lib/libpcp.so.3").function("*")'

Comment 8 Frank Ch. Eigler 2013-05-18 19:04:10 UTC
(I can't reproduce this on rhel6, epel pcp3.8, epel pcp-gui 1.5.6.  Valgrind only complains about what seems like an off-by-one error in libqwt:

==31172== Invalid read of size 8
==31172==    at 0x8114BE6: QwtCPointerData::sample(unsigned int) const (qpoint.h:247)
==31172==    by 0x80BBA87: SamplingCurve::drawSeries(QPainter*, QwtScaleMap const&, QwtScaleMap const&, QRectF const&, int, int) const (qwt_plot_seriesitem.h:135)
==31172==    by 0x810B04D: QwtPlotAbstractSeriesItem::draw(QPainter*, QwtScaleMap const&, QwtScaleMap const&, QRectF const&) const (qwt_plot_seriesitem.cpp:88)
==31172==    by 0x80FCC22: QwtPlot::drawItems(QPainter*, QRectF const&, QwtScaleMap const*) const (qwt_plot.cpp:517)
==31172==    by 0x80FCDB3: QwtPlot::drawCanvas(QPainter*) (qwt_plot.cpp:490)
==31172==    by 0x811106D: QwtPlotCanvas::drawCanvas(QPainter*, bool) (qwt_plot_canvas.cpp:893)
==31172==    by 0x81117FE: QwtPlotCanvas::paintEvent(QPaintEvent*) (qwt_plot_canvas.cpp:754)
==31172==    by 0x4B46CEA: QWidget::event(QEvent*) (qwidget.cpp:8144)
==31172==    by 0x4F140D3: QFrame::event(QEvent*) (qframe.cpp:557)
==31172==    by 0x81104CA: QwtPlotCanvas::event(QEvent*) (qwt_plot_canvas.cpp:712)
==31172==    by 0x4AF3DDB: QApplicationPrivate::notify_helper(QObject*, QEvent*) (qapplication.cpp:4300)
==31172==    by 0x4AFA57C: QApplication::notify(QObject*, QEvent*) (qapplication.cpp:4265)
==31172==  Address 0x8bfcb90 is 0 bytes after a block of size 2,344 alloc'd
==31172==    at 0x4007282: malloc (vg_replace_malloc.c:270)
==31172==    by 0x80B8E08: GroupControl::init(int, int, timeval*, timeval*) (groupcontrol.cpp:55)
==31172==    by 0x80707CE: main (main.cpp:752)

Comment 9 Ken McDonell 2013-05-18 21:10:53 UTC
A bit more about this archive (which is unusual in one important respect):

The archive span is almost 4 days:
$ pmdumplog -L gfs
Log Label (Log Format Version 2)
Performance metrics from host mmyllyne.csb
  commencing Thu Apr 18 20:40:00.000 2013
  ending     Mon Apr 22 20:00:00.000 2013

and there are samples every 5 minutes
$ pmdumplog gfs | head -5

20:40:00.000  245.0.1 (gfs.fmb): value 3712689

20:45:00.000  245.0.1 (gfs.fmb): value 3712699

$ pmdumplog gfs | grep '^[0-2]' | wc -l
1146

1146 = 4 * 24 * 12 - 8 (for the 40 min diff between the minutes at the start and end)

and there are no mark records so this is from one continuous pmlogger execution
$ pmdumplog gfs | grep mark

and gfs.fmb appears in every sample but one, while gfs.inodes appears in only one sample
$ pmdumplog gfs gfs.fmb | grep '^[0-2]' | wc -l
1145
$ pmdumplog gfs gfs.inodes | grep '^[0-2]' | wc -l
1

and gfs.inodes is only at the very end
$ pmdumplog gfs | tail -5
19:55:00.000  245.0.1 (gfs.fmb): value 3757337

20:00:00.000  245.0.1 (gfs.fmb): value 3757320

20:00:00.000  245.0.2 (gfs.inodes): value 103860

This does not explain the bug.  But gfs.inodes will _always_ be interpolated to have the value 103860 with a t_prior of -1 and no v_prior and t_next will be the last 20:00:00 timestamp from the archive and v_next will be 103860.

This is sufficiently unusual that it does help explain why this has not been reported before.

So Marko could you please add to the bug explaining how the gfs archive was created?

I'll keep digging ...

Comment 10 Nathan Scott 2013-05-19 03:57:09 UTC
Ken - it will be an archive created through pmlogimport (Perl API, which wraps the C API).  Interesting that you found that too, in another now-possibly-related bug I got to the point of wondering if the archive (which is probably the same archive) was corrupt or created not-quite-right by log import (somehow).

That other bug is 956716.  Marko, can you provide us the script and csv data for just this archive (appropriately anonymised if its customer data - ISTR it was from your own or a lab system though).

Ken - note also the very wierd behaviour with this archive with pmdumplog.  Try both forwards, and then backwards (-r) and note how the names/pmidsseem to spontaneously change!?!  Still trying to get to the bottom of that behaviour, but maybe we need to be looking closely at exactly what pmimport is doing when its writing the metadata? - its a common theme.


nathans@smash:~/gfs$ pmdumplog -a -r gfs | tail
20:50:00.000  245.0.2 (gfs.inodes): value 102799

[52 bytes]
20:45:00.000  245.0.2 (gfs.inodes): value 102798

[52 bytes]
20:40:00.000  245.0.2 (gfs.inodes): value 102808

[52 bytes]
20:40:00.000  245.0.1 (gfs.fmb): value 3712689

nathans@smash:~/gfs$ pmdumplog -a gfs | tail
19:50:00.000  245.0.1 (gfs.fmb): value 3757334

[52 bytes]
19:55:00.000  245.0.1 (gfs.fmb): value 3757337

[52 bytes]
20:00:00.000  245.0.1 (gfs.fmb): value 3757320

[52 bytes]
20:00:00.000  245.0.2 (gfs.inodes): value 103860

Comment 11 Marko Myllynen 2013-05-19 15:22:43 UTC
Created attachment 750139 [details]
stap trace

Frank, our package versions are slightly different (perhaps you're also on 32-bit given the path in your stap example), I'm using:

pcp-3.8.0-1.el6.x86_64
pcp-gui-1.5.8-2.el6.x86_64

I'm attaching the stap output (+ backtrace dumped on terminal at the end of the file). The stap output weights no less than 1.62M lines.

Comment 12 Marko Myllynen 2013-05-19 15:31:19 UTC
Created attachment 750140 [details]
archive script + CSVs

Ken, Nathan,

thanks for your insights - I had actually already deleted the script since it was a test script used with test data from a test lab :) But I rewrote it from memory and by using it with the CSV files included in the attachment I can still reproduce the crash.

The script is obviously not doing what I was expecting but based on your comment 9 and comment 10 it's bit unclear to me is this merely a bug in the script or might there be perhaps something not-quite-right with pmlogimport?

Comment 13 Ken McDonell 2013-05-19 21:14:17 UTC
OK ... problem found!

The script appears to _append_ pmResults for gfs.fmb to the output archive after adding pmResults for gfs.inodes and the time ranges for the two spreadsheets are the same.  This means the time goes backwards in the archive and then repeats all over again.

Checking this size of the archive ... gfs.0 contains 119212 bytes and the pmResults records are 52 bytes long (correct size for one one metric with one double value), so (119212-132)/52 = 2290, which is exactly TWICE the number of samples we expect.

Because of the way pmFetchArchive and the lower level log read routines work, we do not detect the archive is botched and this _exactly_ explains the schizo behavior Nathan reported for pmdumplog forwards vs backwards.

The perl script needs to process the two .csv files in parallel, matching up the timestamps, then adding _both_ metrics to the one pmResult.

We need a pmlogcheck utility to detect this sort of botch.

Can we please have permission to include the gfs.* archive in the PCP QA suite?

Comment 14 Nathan Scott 2013-05-19 22:30:24 UTC
> OK ... problem found!

Well done!  Yes, it all makes sense.

> We need a pmlogcheck utility to detect this sort of botch.

Let me know what you plan to fix and if you want to throw anything over the fence for me to do, lemme know.

> Can we please have permission to include the gfs.* archive in the PCP QA suite?

Its publicly available on bugzilla, and not customer data as Marko mentioned - so I'm hopeful he wont have any issues with it being used for automated testing.

Comment 15 Marko Myllynen 2013-05-20 07:34:20 UTC
> Can we please have permission to include the gfs.* archive in the PCP QA
> suite?

Yes, please go ahead.

Comment 16 Marko Myllynen 2013-05-27 05:42:53 UTC
*** Bug 956716 has been marked as a duplicate of this bug. ***

Comment 17 Fedora Update System 2013-06-19 07:40:31 UTC
pcp-3.8.1-1.el6 has been submitted as an update for Fedora EPEL 6.
https://admin.fedoraproject.org/updates/pcp-3.8.1-1.el6

Comment 18 Fedora Update System 2013-07-04 20:02:12 UTC
pcp-3.8.1-1.el6 has been pushed to the Fedora EPEL 6 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 19 Nathan Scott 2014-02-04 21:42:49 UTC
Closing out manually - issue has been long since fixed.


Note You need to log in before you can comment on or make changes to this bug.