Description of problem:
When making many graphs, rrdtool eventually stops writing graphs and starts sucking up memory, until all RAM and swap is consumed and the entire machine gets wedged.

Version-Release number of selected component (if applicable):
rrdtool-1.2.15-3.fc4

How reproducible:
Either using the 'larrd' add-on for BigBrother, or the munin-graph part of munin.

Steps to Reproduce:
1. 'munin-graph --cron'

Actual results:
Machine runs out of RAM and swap. Kernel fault, requires reboot. Furthermore, the graphs that are drawn are visually very different (anti-aliased?) from previous versions of rrdtool.

Expected results:
larrd and munin graph .jpg files created, as they were earlier today.

Additional info:
BigBrother was made from a tarball. It calls rrdtool from bash scripts. munin is munin-1.2.4-9.fc4, which may call librrd.so from Perl modules.

From the strace:

open("/var/lib/munin/touchtunes.com/kampfgruppe.touchtunes.com-tt_toparse-gen3-g.rrd", O_RDONLY) = 5
fstat64(5, {st_mode=S_IFREG|0644, st_size=50604, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7f76000
read(5, "RRD\0000001\0\0\0\0/%\300\307C+\37[\1\0\0\0\f\0\0\0,\1"..., 4096) = 4096
_llseek(5, 0, [4096], SEEK_CUR) = 0
_llseek(5, 36864, [36864], SEEK_SET) = 0
read(5, "\0\0\370\377\0\0\0\0\0\0\370\377\0\0\0\0\0\0\370\377\0"..., 4096) = 4096
_llseek(5, 32768, [32768], SEEK_SET) = 0
read(5, "\0\0\370\377\0\0\0\0\0\0\370\377\0\0\0\0\0\0\370\377\0"..., 4096) = 4096
close(5) = 0
munmap(0xb7f76000, 4096) = 0
time(NULL) = 1154630928
open("/usr/share/rrdtool/fonts/DejaVuSansMono-Roman.ttf", O_RDONLY) = 5
fcntl64(5, F_SETFD, FD_CLOEXEC) = 0
fstat64(5, {st_mode=S_IFREG|0644, st_size=60444, ...}) = 0
mmap2(NULL, 60444, PROT_READ, MAP_PRIVATE, 5, 0) = 0xb7d36000
close(5) = 0
munmap(0xb7d36000, 60444) = 0
[above six lines repeated 57 times.]
stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=1252, ...}) = 0
[above line repeated 60 times]
brk(0xa17c000) = 0xa17c000
brk(0xa19d000) = 0xa19d000
brk(0xa1be000) = 0xa1be000
brk(0xa1df000) = 0xa1df000
brk(0xa200000) = 0xa200000
brk(0xa221000) = 0xa221000
brk(0xa242000) = 0xa242000
brk(0xa263000) = 0xa263000
brk(0xa284000) = 0xa284000
brk(0xa2a5000) = 0xa2a5000
[... etc, repeat until entire machine dies]
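The brk() calls at the end of the strace grow the heap in equal steps, which is what a loop that keeps allocating small objects tends to look like. A minimal sketch of that arithmetic follows; the values are copied from the strace, while the helper name uniform_brk_step is hypothetical and only serves the illustration.

```c
#include <stddef.h>

/* Program-break values copied from the strace excerpt in this report. */
static const unsigned long brk_trace[] = {
    0xa17c000UL, 0xa19d000UL, 0xa1be000UL, 0xa1df000UL, 0xa200000UL,
    0xa221000UL, 0xa242000UL, 0xa263000UL, 0xa284000UL, 0xa2a5000UL,
};
static const size_t brk_trace_len = sizeof brk_trace / sizeof brk_trace[0];

/* Hypothetical helper: return the common step (in bytes) between
 * successive brk values, or 0 if the steps are not all the same. */
static unsigned long uniform_brk_step(const unsigned long *v, size_t n)
{
    if (n < 2)
        return 0;

    unsigned long step = v[1] - v[0];
    for (size_t i = 2; i < n; i++) {
        if (v[i] - v[i - 1] != step)
            return 0;
    }
    return step;
}
```

For the ten values above the step is 0x21000 bytes (132 KiB) per call, so the heap only ever grows and nothing in the excerpt suggests memory being handed back.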
Created attachment 133578 [details] striking difference between rrdtool graphs
I'm kicking this upstream, not a clue what could be going on there with the swap death. As for the visual difference between the graphs, that's expected. The art library changed between 1.0.x and 1.2.x. Not sure what's up with the wonky text though... Hopefully, upstream has an idea here. I've added you to the cc on the upstream ticket, which can be viewed here: http://people.ee.ethz.ch/~oetiker/webtools/rrdtool-trac/ticket/54
From Tobi Oetiker, the rrdtool author: Hi, can you please try reproducing this problem with rrdtool alone ... using a short perl script ... as you describe it, it would indicate a memory leak, but why rrdtool should suddenly stop working and start eating up all memory is not explicable to me ... tobi
Created attachment 134039 [details]
RRD that can be used to reproduce the bug

I was able to reproduce the bug on FC6 test2 (x86_64).

Steps to reproduce:
1. Put the extracted rrd into the current path.
2. Execute the following command:

rrdtool graph - \
  --logarithmic \
  DEF:b="leak.rrd":ReadExecLogPosDiff:AVERAGE \
  LINE1:b#837C04:"" > /dev/null

It looks like the --logarithmic option has something to do with it.

Here is a backtrace (got with a SIGKILL):

#0 0x00002aaaaafbc10d in _int_malloc () from /lib64/libc.so.6
#1 0x00002aaaaafbd73d in malloc () from /lib64/libc.so.6
#2 0x0000003366625202 in gfx_new_dashed_line () from /usr/lib64/librrd.so.2
#3 0x0000003366612f22 in horizontal_log_grid () from /usr/lib64/librrd.so.2
#4 0x000000336661561d in grid_paint () from /usr/lib64/librrd.so.2
#5 0x000000336661788e in graph_paint () from /usr/lib64/librrd.so.2
#6 0x0000003366618b9f in rrd_graph () from /usr/lib64/librrd.so.2
#7 0x000000336662c5ef in HandleInputLine () from /usr/lib64/librrd.so.2
#8 0x000000336662c776 in main () from /usr/lib64/librrd.so.2
#9 0x00002aaaaaf6aaa4 in __libc_start_main () from /lib64/libc.so.6
#10 0x0000000000400529 in ?? ()
#11 0x00007fff455f7018 in ?? ()
#12 0x0000000000000000 in ?? ()
Latest from Tobi:

-----
Does this help?

--- rrd_graph.c (revision 874)
+++ rrd_graph.c (working copy)
@@ -1721,7 +1721,8 @@
 {1.0, 5.0, 10., 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0},
 {1.0, 2.0, 5.0, 7.0, 10., 0.0, 0.0, 0.0, 0.0, 0.0},
 {1.0, 2.0, 4.0, 6.0, 8.0, 10., 0.0, 0.0, 0.0, 0.0},
- {1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.}};
+ {1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.},
+ {0,0,0,0,0, 0,0,0,0,0} /* last line */ };
 int i, j, val_exp, min_exp;
 double nex; /* number of decades in data */
@@ -1730,7 +1731,7 @@
 int mid = -1; /* row in yloglab for major grid */
 double mspac; /* smallest major grid spacing (pixels) */
 int flab; /* first value in yloglab to use */
- double value, tmp;
+ double value, tmp, pre_value;
 double X0,X1,Y0;
 char graph_label[100];
@@ -1749,11 +1750,11 @@
 mid++;
 for(i = 0; yloglab[mid][i + 1] < 10.0; i++);
 mspac = logscale * log10(10.0 / yloglab[mid][i]);
- } while(mspac > 2 * im->text_prop[TEXT_PROP_LEGEND].size && mid < 5);
+ } while(mspac > 2 * im->text_prop[TEXT_PROP_LEGEND].size && yloglab[mid][0] > 0);
 if(mid) mid--;
 /* find first value in yloglab */
- for(flab = 0; frexp10(im->minval, &tmp) > yloglab[mid][flab]; flab++);
+ for(flab = 0; yloglab[mid][flab] < 10 && frexp10(im->minval, &tmp) > yloglab[mid][flab] ; flab++);
 if(yloglab[mid][flab] == 10.0) {
 tmp += 1.0;
 flab = 0;
@@ -1765,9 +1766,14 @@
 X1=im->xorigin+im->xsize;
 /* draw grid */
- while(1) {
+ pre_value = DNAN;
+ while(1) {
+ value = yloglab[mid][flab] * pow(10.0, val_exp);
+ if ( AlmostEqual2sComplement(value,pre_value,4) ) break; /* it seems we are not converging */
+ pre_value = value;
+ Y0 = ytr(im, value);
 if(Y0 <= im->yorigin - im->ysize) break;
-----

If someone would be kind enough to try to reproduce with this patch applied, it would be appreciated. If need be, I can make an rpm w/the patch available, but I'm a bit swamped at the moment.
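The core idea of the patch is the pre_value guard: if the grid value stops changing from one pass to the next, the loop can never reach its exit condition, so it bails out instead of allocating grid lines forever. The following is a simplified sketch of that guard, not the rrd_graph.c code. The function walk_log_grid, the rows good_row and zero_row, and the use of exact equality in place of AlmostEqual2sComplement are all assumptions made for the illustration.

```c
#include <math.h>   /* only for the NAN macro */

#define ROW_LEN 10

/* Hypothetical stand-in for one row of the yloglab table in the patch:
 * mantissas within a decade, terminated by 10.0. */
static const double good_row[ROW_LEN] = {1.0, 2.0, 5.0, 7.0, 10., 0, 0, 0, 0, 0};

/* Degenerate row, like the all-zero sentinel the patch appends:
 * every grid value computed from it is 0, so the walk never advances. */
static const double zero_row[ROW_LEN] = {0};

/* Walk grid values upward from 1.0 until they exceed maxval.
 * Returns the number of grid lines that would be drawn, or -1 when the
 * guard notices that the value did not change since the previous pass. */
static int walk_log_grid(const double row[ROW_LEN], double maxval)
{
    int flab = 0;            /* index into the row */
    int lines = 0;
    double decade = 1.0;     /* current power of ten */
    double pre_value = NAN;  /* NaN never compares equal, so pass 1 proceeds */

    for (;;) {
        double value = row[flab] * decade;

        if (value == pre_value)
            return -1;       /* not converging: stop instead of spinning */
        pre_value = value;

        if (value > maxval)
            break;           /* normal exit: past the top of the graph */

        lines++;             /* the real code allocates a grid line here */

        /* Next mantissa; wrap to the next decade at the 10.0 terminator. */
        flab++;
        if (flab >= ROW_LEN || row[flab] == 10.0) {
            flab = 0;
            decade *= 10.0;
        }
    }
    return lines;
}
```

With good_row the walk ends normally once the value passes maxval. With zero_row every value is 0, so without the guard the loop would run, and in the real code allocate, without end.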
I cannot reproduce the bug anymore. This might be related to the latest freetype update on FC6 (2.2.1-4.fc6), as this is the only change that has something to do with rrdtool.

The provided patch breaks the build of rrdtool-1.2.15-3.fc6:

rrd_graph.c: In function 'horizontal_log_grid':
rrd_graph.c:1771: warning: implicit declaration of function 'AlmostEqual2sComplement'

I will try the current trunk of the svn repo if I get some time for it.
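The warning means the patch calls AlmostEqual2sComplement, which the 1.2.15-3 sources do not declare. A helper of that name usually compares two floats by how many representable values (ULPs) lie between them. The sketch below shows that technique under stated assumptions: the names ordered_bits and almost_equal_ulps are hypothetical, the float arguments and the NaN handling are choices made here, and this is not claimed to be the upstream implementation.

```c
#include <stdint.h>
#include <string.h>

/* Map the bit pattern of an IEEE 754 float onto a monotonic integer line,
 * so that adjacent floats differ by exactly 1 and -0.0 maps to the same
 * point as +0.0. Assumes float is a 32-bit IEEE 754 value. */
static int64_t ordered_bits(float f)
{
    int32_t i;
    memcpy(&i, &f, sizeof i);   /* well-defined way to read the bits */
    return (i < 0) ? (int64_t)INT32_MIN - i : (int64_t)i;
}

/* Hypothetical helper: nonzero if a and b are at most max_ulps
 * representable floats apart. NaN is never equal to anything, which
 * matters when the previous value starts out as NaN. */
static int almost_equal_ulps(float a, float b, int max_ulps)
{
    if (a != a || b != b)
        return 0;

    int64_t diff = ordered_bits(a) - ordered_bits(b);
    if (diff < 0)
        diff = -diff;
    return diff <= max_ulps;
}
```

Called as almost_equal_ulps(value, pre_value, 4), this treats two values as equal when they are within four representable floats of each other, which is the role the fourth argument plays in the patch.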
I was able to reproduce the bug again, this time on a current RHEL 4 (x86). The given fix does not solve the problem (tried the svn snapshot of September 4th). Is there anything else I can do in order to get this bug fixed?
I suspect the actual problem may be with freetype. Jeremias, is there any way you can try rebuilding the FC6 freetype package on RHEL4, installing the resulting binaries, and retesting?
Hi, I am using a dual-CPU machine with 5 GB of RAM and lots of swap on FC6, and this is killing me. It seems related to the number of graphs generated at the same time. With 8 graphs on a page, something gets triggered that takes all RAM and swap until the computer dies. With 4 graphs on a page it doesn't seem to happen, but even then, if two people open the page at the same time (with different graphs or the same ones), it kills the computer. I use FC6, fully updated. Any ideas? Is there anything I can provide?
Created attachment 142025 [details]
STRACE

strace -o test_log /usr/bin/rrdtool graph - \
  --imgformat=PNG \
  --start=1164248318 \
  --end=1164334718 \
  --title="NATS MASTER - teMySQL - Replication" \
  --base=1000 \
  --height=150 \
  --width=600 \
  --alt-autoscale-max \
  --lower-limit=1 \
  --logarithmic \
  COMMENT:"From 2006/11/22 21\:18\:38 To 2006/11/23 21\:18\:38\c" \
  COMMENT:" \n" \
  --vertical-label="" \
  --slope-mode \
  DEF:a="/usr/share/cacti/rra/nats_master_readexeclogposdiff_179.rrd":ReadExecLogPosDiff:AVERAGE \
  DEF:b="/usr/share/cacti/rra/nats_master_readexeclogposdiff_179.rrd":SecondsBehindMaster:AVERAGE \
  DEF:c="/usr/share/cacti/rra/nats_master_readexeclogposdiff_179.rrd":BinlogCacheDiskUse:AVERAGE \
  DEF:d="/usr/share/cacti/rra/nats_master_readexeclogposdiff_179.rrd":BinlogCacheUse:AVERAGE \
  CDEF:cdefa=a,1024,/ \
  CDEF:cdefd=c,1024,/ \
  CDEF:cdefe=d,1024,/ \
  AREA:cdefa#EAAF00:"LogPosLag(KB)" \
  LINE1:cdefa#837C04:"" \
  LINE2:b#F51D30:"Seconds Lag" \
  LINE2:cdefd#4444FF:"Binlog Cache Disk(K)" \
  LINE1:cdefe#35962B:"Binlog Cache Use(K)"
Here is an strace of one of the processes that kill the computer. About 1 minute after this process starts, 5 GB of RAM is consumed, iowait goes crazy, and swap is almost full. Any ideas?
Jonathan, I would suggest adding your information to the trac ticket on the rrdtool website. http://people.ee.ethz.ch/~oetiker/webtools/rrdtool-trac/ticket/54
Okay, looks like the problem has been tracked down upstream. Tobi says a 1.2.16 release is due Real Soon Now, and will incorporate said fix. I could try to track down the specific changeset that fixes it, but would rather wait for the new release if it's going to be out imminently...
Created attachment 143697 [details] Patch based on upstream changeset r881 and r887 The attached patch should fix things. I'll get a new rawhide build submitted momentarily. If folks could test that build (or rebuild it for their distro), please do. Gimme the thumbs up and I'll spin new packages for fc5 and fc6 (can probably do fc4 also if there's demand).
Not sync'd out to mirrors just yet, but a rawhide build is done. You can grab the srpm here: http://buildsys.fedoraproject.org/logs/fedora-development-extras/23763-rrdtool-1.2.15-8.fc7/
FC3 and FC4 have now been EOL'd. Please check the ticket against a current Fedora release, and either adjust the release number, or close it if appropriate. Thanks. Your friendly BZ janitor :-)
As best as I can tell, this is resolved in rrdtool 1.2.18, which is now in rawhide and making its way to FC6 shortly.