201241 – rrdtool consumes swap until death

Keywords:
Status:	CLOSED NEXTRELEASE
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	rrdtool
Sub Component:
Version:	6
Hardware:	i386
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Jarod Wilson
QA Contact:	Fedora Extras Quality Assurance
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2006-08-03 19:02 UTC by Andrej Todosic
Modified:	2007-11-30 22:11 UTC
CC List:	4 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2007-01-30 22:36:26 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
striking difference between rrdtool graphs (121.54 KB, image/jpeg) 2006-08-03 19:02 UTC, Andrej Todosic
RRD that can be used to reproduce the bug (906 bytes, application/octet-stream) 2006-08-11 16:53 UTC, Jeremias Reith
STRACE (50.66 KB, application/x-gzip) 2006-11-24 02:31 UTC, Jonathan Breault
Patch based on upstream changeset r881 and r887 (4.29 KB, patch) 2006-12-14 20:52 UTC, Jarod Wilson

Description Andrej Todosic 2006-08-03 19:02:55 UTC

Description of problem:
When making many graphs, rrdtool eventually stops writing graphs and starts
consuming memory until all RAM and swap are exhausted and the entire machine
gets wedged.

Version-Release number of selected component (if applicable):
rrdtool-1.2.15-3.fc4

How reproducible:
Either using the 'larrd' add-on for BigBrother, or the munin-graph part of munin.

Steps to Reproduce:
1. 'munin-graph --cron'
  
Actual results:
Machine runs out of RAM and swap.  Kernel fault, requires reboot.
Furthermore, the graphs that are drawn are visually very different
(anti-aliased?) from previous versions of rrdtool.

Expected results:
larrd and munin graph .jpg files created, as they were earlier today.

Additional info:
BigBrother was made from a tarball.  It calls rrdtool from bash scripts.
munin is munin-1.2.4-9.fc4, which may call librrd.so from Perl modules.

From the strace:
open("/var/lib/munin/touchtunes.com/kampfgruppe.touchtunes.com-tt_toparse-gen3-g.rrd", O_RDONLY) = 5
fstat64(5, {st_mode=S_IFREG|0644, st_size=50604, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7f76000
read(5, "RRD\0000001\0\0\0\0/%\300\307C+\37[\1\0\0\0\f\0\0\0,\1"..., 4096) = 4096
_llseek(5, 0, [4096], SEEK_CUR)         = 0
_llseek(5, 36864, [36864], SEEK_SET)    = 0
read(5, "\0\0\370\377\0\0\0\0\0\0\370\377\0\0\0\0\0\0\370\377\0"..., 4096) = 4096
_llseek(5, 32768, [32768], SEEK_SET)    = 0
read(5, "\0\0\370\377\0\0\0\0\0\0\370\377\0\0\0\0\0\0\370\377\0"..., 4096) = 4096
close(5)                                = 0
munmap(0xb7f76000, 4096)                = 0
time(NULL)                              = 1154630928
open("/usr/share/rrdtool/fonts/DejaVuSansMono-Roman.ttf", O_RDONLY) = 5
fcntl64(5, F_SETFD, FD_CLOEXEC)         = 0
fstat64(5, {st_mode=S_IFREG|0644, st_size=60444, ...}) = 0
mmap2(NULL, 60444, PROT_READ, MAP_PRIVATE, 5, 0) = 0xb7d36000
close(5)                                = 0
munmap(0xb7d36000, 60444)               = 0
[above six lines repeated 57 times.]
stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=1252, ...}) = 0
[above line repeated 60 times]
brk(0xa17c000)                          = 0xa17c000
brk(0xa19d000)                          = 0xa19d000
brk(0xa1be000)                          = 0xa1be000
brk(0xa1df000)                          = 0xa1df000
brk(0xa200000)                          = 0xa200000
brk(0xa221000)                          = 0xa221000
brk(0xa242000)                          = 0xa242000
brk(0xa263000)                          = 0xa263000
brk(0xa284000)                          = 0xa284000
brk(0xa2a5000)                          = 0xa2a5000
[... etc, repeat until entire machine dies]
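For anyone triaging a similar trace, the run of brk() calls above can be summarised with a few lines of Python. This is only a sketch and not part of the original report: the function name `heap_growth` and the `SAMPLE` constant are made up for illustration, and the sample holds just the first four brk lines quoted above.

```python
import re

# Matches strace lines such as "brk(0xa17c000)   = 0xa17c000".
BRK_RE = re.compile(r"brk\((0x[0-9a-fA-F]+)\)\s*=\s*(0x[0-9a-fA-F]+)")

# First four brk lines from the strace excerpt in this report.
SAMPLE = """\
brk(0xa17c000)                          = 0xa17c000
brk(0xa19d000)                          = 0xa19d000
brk(0xa1be000)                          = 0xa1be000
brk(0xa1df000)                          = 0xa1df000
"""


def heap_growth(strace_lines):
    """Return (granted brk calls, bytes between the first and last break)."""
    breaks = []
    for line in strace_lines:
        m = BRK_RE.match(line.strip())
        # Count only calls where the kernel returned the requested address.
        if m and m.group(1).lower() == m.group(2).lower():
            breaks.append(int(m.group(2), 16))
    if len(breaks) < 2:
        return len(breaks), 0
    return len(breaks), breaks[-1] - breaks[0]


calls, grown = heap_growth(SAMPLE.splitlines())
print(f"{calls} brk calls, heap grew by {grown} bytes")
```

Each call in the excerpt moves the break by 0x21000 bytes, so a long, unbroken run of such lines points at a loop that keeps allocating rather than at a single large allocation.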

Comment 1 Andrej Todosic 2006-08-03 19:02:55 UTC

Created attachment 133578
striking difference between rrdtool graphs

Comment 2 Jarod Wilson 2006-08-03 19:33:47 UTC

I'm kicking this upstream, not a clue what could be going on there with the swap
death.

As for the visual difference between the graphs, that's expected. The art
library changed between 1.0.x and 1.2.x. Not sure what's up with the wonky text
though... Hopefully, upstream has an idea here. I've added you to the cc on the
upstream ticket, which can be viewed here:

http://people.ee.ethz.ch/~oetiker/webtools/rrdtool-trac/ticket/54

Comment 3 Jarod Wilson 2006-08-03 20:50:52 UTC

From Tobi Oetiker, the rrdtool author:

Hi,

can you please try reproducing this problem with rrdtool alone ... using a
short perl script ...

as you describe it, it would indicate a memory leak, but why rrdtool
should suddenly stop working and start eating up all memory is not
explicable to me ...

tobi

Comment 4 Jeremias Reith 2006-08-11 16:53:00 UTC

Created attachment 134039
RRD that can be used to reproduce the bug

I was able to reproduce the bug on FC6 test2 (x86_64).


Steps to reproduce:

1. put the extracted rrd into the current path

2. Execute the following command

rrdtool graph - \
--logarithmic \
DEF:b="leak.rrd":ReadExecLogPosDiff:AVERAGE \
LINE1:b#837C04:"" > /dev/null


It looks like the --logarithmic option has something to do with it.
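Since the reproducer can take the whole machine down, it may be safer to run it with an address-space cap and a timeout. The following is a sketch under assumptions, not something from the thread: `run_capped` and `REPRO_ARGV` are hypothetical names, the 1 GiB cap and 60 s timeout are arbitrary, and it relies on POSIX-only features of Python's standard library (`resource`, `preexec_fn`).

```python
import resource
import subprocess

# The command from the steps above, with shell quoting removed.
REPRO_ARGV = [
    "rrdtool", "graph", "-",
    "--logarithmic",
    "DEF:b=leak.rrd:ReadExecLogPosDiff:AVERAGE",
    "LINE1:b#837C04:",
]


def run_capped(argv, max_bytes, timeout_s):
    """Run argv under an address-space limit.

    Returns the exit code (negative if killed by a signal),
    or None if the timeout expired and the child was killed.
    """
    def limit_address_space():
        # Runs in the child just before exec; lower only the soft limit.
        _soft, hard = resource.getrlimit(resource.RLIMIT_AS)
        cap = max_bytes
        if hard != resource.RLIM_INFINITY:
            cap = min(max_bytes, hard)
        resource.setrlimit(resource.RLIMIT_AS, (cap, hard))

    try:
        proc = subprocess.run(
            argv,
            preexec_fn=limit_address_space,
            timeout=timeout_s,
            stdout=subprocess.DEVNULL,  # same effect as "> /dev/null"
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        return None
    return proc.returncode


# Example call (requires rrdtool and leak.rrd in the current directory):
# print(run_capped(REPRO_ARGV, 1 << 30, 60))
```

With the cap in place, a runaway allocation should make malloc fail inside the child instead of pushing the host into swap; how rrdtool reacts to that failure is not something this sketch can promise.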


Here is a backtrace (got with a SIGKILL):

#0  0x00002aaaaafbc10d in _int_malloc () from /lib64/libc.so.6
#1  0x00002aaaaafbd73d in malloc () from /lib64/libc.so.6
#2  0x0000003366625202 in gfx_new_dashed_line () from /usr/lib64/librrd.so.2
#3  0x0000003366612f22 in horizontal_log_grid () from /usr/lib64/librrd.so.2
#4  0x000000336661561d in grid_paint () from /usr/lib64/librrd.so.2
#5  0x000000336661788e in graph_paint () from /usr/lib64/librrd.so.2
#6  0x0000003366618b9f in rrd_graph () from /usr/lib64/librrd.so.2
#7  0x000000336662c5ef in HandleInputLine () from /usr/lib64/librrd.so.2
#8  0x000000336662c776 in main () from /usr/lib64/librrd.so.2
#9  0x00002aaaaaf6aaa4 in __libc_start_main () from /lib64/libc.so.6
#10 0x0000000000400529 in ?? ()
#11 0x00007fff455f7018 in ?? ()
#12 0x0000000000000000 in ?? ()

Comment 5 Jarod Wilson 2006-08-14 16:56:07 UTC

Latest from Tobi:

-----
Does this help?

--- rrd_graph.c (revision 874)
+++ rrd_graph.c (working copy)
@@ -1721,7 +1721,8 @@
        {1.0, 5.0, 10., 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0},
        {1.0, 2.0, 5.0, 7.0, 10., 0.0, 0.0, 0.0, 0.0, 0.0},
        {1.0, 2.0, 4.0, 6.0, 8.0, 10., 0.0, 0.0, 0.0, 0.0},
-       {1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.}};
+       {1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.},
+       {0,0,0,0,0, 0,0,0,0,0} /* last line */ };
 
     int i, j, val_exp, min_exp;
     double nex;                /* number of decades in data */
@@ -1730,7 +1731,7 @@
     int mid = -1;      /* row in yloglab for major grid */
     double mspac;      /* smallest major grid spacing (pixels) */
     int flab;          /* first value in yloglab to use */
-    double value, tmp;
+    double value, tmp, pre_value;
     double X0,X1,Y0;   
     char graph_label[100];
 
@@ -1749,11 +1750,11 @@
        mid++;
        for(i = 0; yloglab[mid][i + 1] < 10.0; i++);
        mspac = logscale * log10(10.0 / yloglab[mid][i]);
-    } while(mspac > 2 * im->text_prop[TEXT_PROP_LEGEND].size && mid < 5);
+    } while(mspac > 2 * im->text_prop[TEXT_PROP_LEGEND].size && yloglab[mid][0] > 0);
     if(mid) mid--;
 
     /* find first value in yloglab */
-    for(flab = 0; frexp10(im->minval, &tmp) > yloglab[mid][flab]; flab++);
+    for(flab = 0; yloglab[mid][flab] < 10 && frexp10(im->minval, &tmp) > yloglab[mid][flab] ; flab++);
     if(yloglab[mid][flab] == 10.0) {
        tmp += 1.0;
        flab = 0;
@@ -1765,9 +1766,14 @@
     X1=im->xorigin+im->xsize;
 
     /* draw grid */
-    while(1) {
+    pre_value = DNAN;
+    while(1) {       
+
        value = yloglab[mid][flab] * pow(10.0, val_exp);
+        if (  AlmostEqual2sComplement(value,pre_value,4) ) break; /* it seems we are not converging */
 
+        pre_value = value;
+
        Y0 = ytr(im, value);
        if(Y0 <= im->yorigin - im->ysize) break;
-----

If someone would be kind enough to try to reproduce with this patch applied, it
would be appreciated. If need be, I can make an rpm w/the patch available, but
I'm a bit swamped at the moment.
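To make the intent of the "not converging" guard in the patch easier to follow, here is a simplified Python model. It is an illustration under assumptions, not rrdtool's code: `almost_equal_ulps` is a double-precision stand-in for the helper the patch calls `AlmostEqual2sComplement` (whose exact behaviour in rrdtool is not shown in this thread), and `grid_values` is a hypothetical, stripped-down version of the grid loop.

```python
import math
import struct


def _ordered_bits(x):
    """Map a double to an int whose ordering follows the float ordering."""
    (bits,) = struct.unpack("<q", struct.pack("<d", x))
    # Negative floats sort in reverse by raw bits; flip them around zero.
    return bits if bits >= 0 else -(1 << 63) - bits


def almost_equal_ulps(a, b, max_ulps):
    """True if a and b are at most max_ulps representable doubles apart."""
    return abs(_ordered_bits(a) - _ordered_bits(b)) <= max_ulps


def grid_values(mantissas, start_exp, upper, max_ulps=4):
    """Yield mantissa * 10**exp until `upper` is passed or progress stalls."""
    prev = math.nan  # mirrors "pre_value = DNAN" in the patch
    exp, i = start_exp, 0
    while True:
        value = mantissas[i] * 10.0 ** exp
        if almost_equal_ulps(value, prev, max_ulps):
            return  # same value twice in a row: the loop is not advancing
        if value > upper:
            return
        yield value
        prev = value
        i += 1
        if i == len(mantissas):  # row exhausted: move to the next decade
            i, exp = 0, exp + 1


print(list(grid_values([1.0, 5.0], 0, 100.0)))
print(list(grid_values([0.0, 0.0], 0, 100.0)))
```

In this model an all-zero row would otherwise produce 0.0 forever without ever passing the upper bound; the guard stops after the first repeated value. Without such a check, every iteration of the real loop would allocate another grid line, which is consistent with the gfx_new_dashed_line frame in the backtrace from comment 4.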

Comment 6 Jeremias Reith 2006-08-15 16:07:23 UTC

I cannot reproduce the bug anymore. 

This might be related to the latest freetype update on FC6 (2.2.1-4.fc6) as this
is the only change that has something to do with rrdtool.

The provided patch breaks the build of rrdtool-1.2.15-3.fc6:

rrd_graph.c: In function 'horizontal_log_grid':
rrd_graph.c:1771: warning: implicit declaration of function 'AlmostEqual2sComplement'

I will try the current trunk of the svn repo if I get some time for it.

Comment 7 Jeremias Reith 2006-09-05 09:36:30 UTC

I was able to reproduce the bug again. This time on a current RHEL 4 (x86). 

The given fix does not solve the problem (tried the svn snapshot of September 4th).

Is there anything else I can do in order to get this bug fixed?

Comment 8 Jarod Wilson 2006-09-27 17:19:17 UTC

I suspect the actual problem may be with freetype. Jeremias, is there any way
you can try rebuilding the FC6 freetype package on RHEL4, installing the
resulting binaries, and retesting?

Comment 9 Jonathan Breault 2006-11-23 14:48:38 UTC

Hi, I am using a dual-CPU machine with 5 GB of RAM and lots of swap on FC6, and this is killing me. It seems
related to the number of graphs generated at the same time. With 8 graphs on a page, something gets
triggered and all RAM and swap are consumed until the computer dies.

With 4 graphs on a page it doesn't seem to happen... but even then, if 2 people open the page at the same
time (with different graphs or the same ones), it kills the computer.


I use FC6, fully updated.

Any ideas? Is there anything I can provide?

Comment 10 Jonathan Breault 2006-11-24 02:31:16 UTC

Created attachment 142025
STRACE 

strace -o test_log /usr/bin/rrdtool graph - --imgformat=PNG --start=1164248318
--end=1164334718 --title="NATS MASTER - teMySQL - Replication" --base=1000
--height=150 --width=600 --alt-autoscale-max --lower-limit=1 --logarithmic
COMMENT:"From 2006/11/22 21\:18\:38 To 2006/11/23 21\:18\:38\c" COMMENT:"  \n"
--vertical-label="" --slope-mode
DEF:a="/usr/share/cacti/rra/nats_master_readexeclogposdiff_179.rrd":ReadExecLogPosDiff:AVERAGE
DEF:b="/usr/share/cacti/rra/nats_master_readexeclogposdiff_179.rrd":SecondsBehindMaster:AVERAGE
DEF:c="/usr/share/cacti/rra/nats_master_readexeclogposdiff_179.rrd":BinlogCacheDiskUse:AVERAGE
DEF:d="/usr/share/cacti/rra/nats_master_readexeclogposdiff_179.rrd":BinlogCacheUse:AVERAGE
CDEF:cdefa=a,1024,/ CDEF:cdefd=c,1024,/ CDEF:cdefe=d,1024,/
AREA:cdefa#EAAF00:"LogPosLag(KB)"  LINE1:cdefa#837C04:"" 
LINE2:b#F51D30:"Seconds Lag"  LINE2:cdefd#4444FF:"Binlog Cache Disk(K)" 
LINE1:cdefe#35962B:"Binlog Cache Use(K)"

Comment 11 Jonathan Breault 2006-11-24 02:33:55 UTC

Here is an strace of one of the processes killing the computer. One minute after this process starts, 5 GB of
RAM is consumed, iowait goes crazy, and swap is almost full.


Any ideas?

Comment 12 Jarod Wilson 2006-11-27 14:33:28 UTC

Jonathan, I would suggest adding your information to the trac ticket on the
rrdtool website.

http://people.ee.ethz.ch/~oetiker/webtools/rrdtool-trac/ticket/54

Comment 13 Jarod Wilson 2006-12-12 16:07:40 UTC

Okay, looks like the problem has been tracked down upstream. Tobi says a 1.2.16
release is due Real Soon Now, and will incorporate said fix. I could try to
track down the specific changeset that fixes it, but would rather wait for the
new release if it's going to be out imminently...

Comment 14 Jarod Wilson 2006-12-14 20:52:05 UTC

Created attachment 143697
Patch based on upstream changeset r881 and r887

The attached patch should fix things. I'll get a new rawhide build submitted
momentarily. If folks could test that build (or rebuild it for their distro),
please do. Gimme the thumbs up and I'll spin new packages for fc5 and fc6 (can
probably do fc4 also if there's demand).

Comment 15 Jarod Wilson 2006-12-14 21:21:58 UTC

Not sync'd out to mirrors just yet, but a rawhide build is done. You can grab
the srpm here:

http://buildsys.fedoraproject.org/logs/fedora-development-extras/23763-rrdtool-1.2.15-8.fc7/

Comment 16 Christian Iseli 2007-01-19 23:45:51 UTC

FC3 and FC4 have now been EOL'd.

Please check the ticket against a current Fedora release, and either adjust the
release number, or close it if appropriate.

Thanks.

Your friendly BZ janitor :-)

Comment 17 Jarod Wilson 2007-01-30 22:36:26 UTC

As best as I can tell, this is resolved in rrdtool 1.2.18, which is now in
rawhide and making its way to FC6 shortly.
