Created attachment 339886
xulrunner spec file patch to enable sdt

As part of the ongoing Fedora SystemTap static probes effort (https://fedoraproject.org/wiki/Features/SystemtapStaticProbes), please consider enabling static markers in xulrunner. Attached is a xulrunner spec file patch.

Additional info: the systemtap-sdt-devel package carries the necessary support for building xulrunner with static markers enabled.
Rajan, could you paste in some systemtap script code & output therefrom to help folks see what this patch enables?
I've used dtrace and seen it in action specifically in the context of Firefox so I know what it provides. I consider this a debugging feature that users interested in this ought to be able to figure out how to compile it in on their own. Much more interesting to me would be to know what effect this has on Firefox performance benchmarks as well as code size. Tests such as https://wiki.mozilla.org/Performance:Tinderbox_Tests would be very useful to run. I'm rather loath to add more debugging which would be useful to a handful of people at the cost of real world performance for virtually every Fedora user.
Here are a few test results with and without markers on the same setup:

====WITHOUT MARKERS====

=> Ts: Startup time
$ ./startup-unix.pl /usr/bin/firefox
__startuptime,1248

=> Txul: XUL window open time
$ /usr/bin/firefox -chrome xpfe/test/winopen.xul
openingTimes=179,189,187,184,177,182,179,184,189
avgOpenTime:183
minOpenTime:177
maxOpenTime:189
medOpenTime:184
__xulWinOpenTime:184

=> Tdhtml: DHTML performance
Test            Average  Data
============================================================
colorfade:      1329     1339,1279,1342,1352,1333
diagball:       1590     1552,1570,1619,1608,1603
fadespacing:    2148     2123,2134,2167,2162,2156
imageslide:     388      379,390,393,387,392
layers1:        553      470,487,633,603,571
layers2:        19       30,15,17,15,17
layers4:        14       20,12,14,13,13
layers5:        422      450,360,442,440,418
layers6:        31       29,29,32,32,32
meter:          1036     1013,1046,1094,1006,1022
movingtext:     839      977,906,763,787,763
mozilla:        2697     2671,2658,2711,2707,2736
replaceimages:  522      485,601,500,499,525
scrolling:      2781     2769,2819,2787,2759,2769
slidein:        2456     2678,2403,2415,2389,2395
slidingballs:   356      327,377,365,367,345
zoom:           586      711,546,550,578,543
_x_x_mozilla_dhtml,493

====WITH MARKERS ENABLED====

=> Ts: Startup time
$ ./startup-unix.pl /usr/bin/firefox
__startuptime,1232

=> Txul: XUL window open time
$ /usr/bin/firefox -chrome xpfe/test/winopen.xul
openingTimes=193,189,192,193,197,184,190,199,197
avgOpenTime:193
minOpenTime:184
maxOpenTime:199
medOpenTime:193
__xulWinOpenTime:193

=> Tdhtml: DHTML performance
Test            Average  Data
============================================================
colorfade:      1342     1265,1354,1352,1389,1349
diagball:       1603     1575,1611,1624,1596,1611
fadespacing:    2150     2097,2161,2162,2148,2181
imageslide:     388      385,389,392,391,381
layers1:        472      455,476,471,475,481
layers2:        15       15,14,15,14,16
layers4:        12       16,12,10,12,12
layers5:        377      484,350,353,348,351
layers6:        29       35,27,27,29,28
meter:          1057     1109,1043,1044,1045,1044
movingtext:     761      760,758,766,759,760
mozilla:        2720     2677,2718,2732,2749,2723
replaceimages:  421      397,430,423,427,426
scrolling:      2584     2547,2557,2589,2581,2644
slidein:        2441     2641,2384,2390,2397,2395
slidingballs:   378      418,383,388,353,347
zoom:           545      563,519,545,550,549
_x_x_mozilla_dhtml,462
(In reply to comment #2)
> I've used dtrace and seen it in action specifically in the context of Firefox
> so I know what it provides. I consider this a debugging feature that users
> interested in this ought to be able to figure out how to compile it in on their
> own.

Is it your impression that this facility is of interest *solely* to firefox developers, as opposed to knowledgeable users/sysadmins?

> Much more interesting to me would be to know what effect this has on Firefox
> performance benchmarks as well as code size.

Rajan's posting appears to show both positive and negative differences between the with- and without-marker cases, where the size of the differences seems to be within the noise of the individual repetitions. Rajan, please post the difference between the object code sizes.

How low a difference would convince you that the (small?) benefit of having these markers be enabled by default has small enough costs?
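One crude way to check the "within the noise" reading against the Txul openingTimes posted earlier in this bug is to compare the shift between the two means with the min-to-max spread of each run. The sketch below does only that; the helper names are made up for illustration, and this is a rough sanity check rather than a proper significance test.

```c
#include <assert.h>
#include <stddef.h>

/* Txul openingTimes (ms) copied from the results posted in this bug. */
static const int txul_without[] = {179, 189, 187, 184, 177, 182, 179, 184, 189};
static const int txul_with[]    = {193, 189, 192, 193, 197, 184, 190, 199, 197};

/* Arithmetic mean of n samples. */
static double mean_ms(const int *v, size_t n) {
    assert(v != NULL && n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += v[i];
    }
    return sum / (double)n;
}

/* Min-to-max spread of n samples, used here as a crude noise estimate. */
static int spread_ms(const int *v, size_t n) {
    assert(v != NULL && n > 0);
    int lo = v[0];
    int hi = v[0];
    for (size_t i = 1; i < n; i++) {
        if (v[i] < lo) lo = v[i];
        if (v[i] > hi) hi = v[i];
    }
    return hi - lo;
}

/* Returns 1 if the shift between the two means is no larger than the
 * spread of either run, 0 otherwise. */
static int shift_within_spread(const int *a, size_t na,
                               const int *b, size_t nb) {
    double shift = mean_ms(b, nb) - mean_ms(a, na);
    if (shift < 0.0) {
        shift = -shift;
    }
    return shift <= spread_ms(a, na) && shift <= spread_ms(b, nb);
}
```

Calling shift_within_spread(txul_without, 9, txul_with, 9) on the samples above compares a shift of roughly 9 ms against spreads of 12 ms and 15 ms. With only nine repetitions per build, a criterion like this can support "not obviously different" but cannot rule out a small real regression.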
(In reply to comment #4)
> Is it your impression that this facility is of interest *solely* to firefox
> developers, as opposed to knowledgeable users/sysadmins?

I realize there will be other people who want this on. I'm just saying that the number of people this will be useful to is extremely tiny compared to the number of people who will never use it. If this creates any performance loss at no gain for 99.9% of people, then enabling a feature for *maybe* 100 people is a non-starter. That is especially true given that the people who would actually use this feature are technical enough to recompile easily.

> How low a difference would convince you that the (small?) benefit of having
> these markers be enabled by default has small enough costs?

Nil. I'm okay with code size going up a little bit. Performance cannot be negatively affected, though. I would rather get 100 bugs asking me to enable this and tell them all one by one to recompile it themselves if they want it, than cause a hit to the hundreds of thousands of users we have.
The object code size increase is rather small. On x86_64, /usr/lib64/xulrunner-1.9.1/libmozjs.so is 3440 bytes larger with the markers enabled.

Here are a few more test results:

====WITHOUT MARKERS====

=> Page load time
$ /usr/bin/firefox xulrunner-1.9.1/mozilla-1.9.1/tools/performance/pageload/start.html
(tinderbox dropping follows)
_x_x_mozilla_page_load,130.5,413,29
_x_x_mozilla_page_load_details,avgmedian|130.5|average|125.63|minimum|29|maximum|413|stddev|115.46:|0;bugzilla.mozilla.org/;203.5;197.25;189;274;274;204;193;189;203|1;lxr.mozilla.org/;373;341.75;302;413;413;382;319;364;302|2;vanilla-page/;57.5;54;29;100;29;100;71;44;72

====WITH MARKERS====

=> Page load time
$ /usr/bin/firefox xulrunner-1.9.1/mozilla-1.9.1/tools/performance/pageload/start.html
(tinderbox dropping follows)
_x_x_mozilla_page_load,122.75,334,32
_x_x_mozilla_page_load_details,avgmedian|122.75|average|116.50|minimum|32|maximum|334|stddev|99.77:|0;bugzilla.mozilla.org/;192.5;190;186;215;215;192;186;189;193|1;lxr.mozilla.org/;301.5;293.25;283;334;334;299;304;283;287|2;vanilla-page/;53;43;32;72;32;72;45;34;61
This bug appears to have been reported against 'rawhide' during the Fedora 11 development cycle. Changing version to '11'. More information and reason for this action is here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
There have been some performance improvements and I would like to get new numbers. So when I run this:

    /usr/bin/firefox -chrome xpfe/test/winopen.xul

it looks in the wrong place, so I add some funny links:

    ln -s /usr/lib64/mozilla /usr/lib64/firefox-2.0.0.5papillon:/work/scox/systemtap/bld/testsuite/xul/bld/js/src
    ln -s /usr/lib64/nspluginwrapper /usr/lib/nspluginwrapper/i386/linux

and then rerunning it yields:

- it pops up a "Windows Opening Test" window
- it brings up firefox with: file:///work/scox/systemtap/bld/testsuite/xul/src/xpfe/test/child-window.html
- and then it "gets stuck"

Any advice on how to duplicate Rajan's experiment above?
Hmm, bug 490529
(In reply to comment #9) > Hmm, bug 490529 OK, this bug is actually blocked by bug 490529, so no need to emphasize it at all.
With the latest stap, using version 3 sdt.h static markers, I built js with and without markers and ran the js testsuite with both using:

    for i in $(find xul/src/js/src/tests -mindepth 2 -name 'regress*' -prune -o -name 'shell.js' -prune -o \( -name '*js' -print \)) ; do
        taskset 1 ./js -f xul/src/js/src/tests/shell.js -f $i
    done

and the results seem quite close:

js built with --enable-dtrace
    11.99user 7.19system 0:27.11elapsed 70%CPU (0avgtext+0avgdata 62368maxresident)k
    16792inputs+0outputs (24major+1099709minor)pagefaults 0swaps

js built without --enable-dtrace
    12.00user 7.08system 0:26.16elapsed 72%CPU (0avgtext+0avgdata 62384maxresident)k
    14096inputs+0outputs (8major+1099701minor)pagefaults 0swaps
scox, could you post performance numbers with plain systemtap-1.3 (ie. sdt_v2)? (I ask because sdt_v3 won't be seen in fedora for some months yet.)
Here is the same test as comment #11, but run with the 1.3 version of stap:

js built with --enable-dtrace, run with stap with probes that accumulate statistics
    472.11user 262.85system 13:48.73elapsed 88%CPU (0avgtext+0avgdata 108352maxresident)k
    53352inputs+1441816outputs (258major+13220415minor)pagefaults 0swaps

js built with --enable-dtrace, not run with stap
    12.36user 7.80system 0:35.32elapsed 57%CPU (0avgtext+0avgdata 62368maxresident)k
    16856inputs+0outputs (5major+1097342minor)pagefaults 0swaps

js built without --enable-dtrace, not run with stap
    12.47user 7.84system 0:39.45elapsed 51%CPU (0avgtext+0avgdata 62368maxresident)k
    20880inputs+0outputs (24major+1096001minor)pagefaults 0swaps
OK, the key part is the last two numbers: essentially indistinguishable runtimes for the compiled-in vs. compiled-out cases. Chris, what else do you need to justify enabling this instrumentation?
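To put a number on "essentially indistinguishable", one option is to compare CPU time (user + system) rather than elapsed time, since the runs show different %CPU figures. The following is a minimal sketch of that arithmetic; the function names are invented here, and the inputs are the time(1) figures posted in this bug.

```c
#include <assert.h>

/* Total CPU seconds consumed by a run, from the user and system
 * figures printed by time(1). */
static double cpu_seconds(double user, double sys) {
    assert(user >= 0.0 && sys >= 0.0);
    return user + sys;
}

/* Change of candidate relative to baseline, in percent.
 * Positive means the candidate used more CPU time. */
static double relative_change_percent(double baseline, double candidate) {
    assert(baseline > 0.0);
    return (candidate - baseline) / baseline * 100.0;
}
```

Taking the build without --enable-dtrace as the baseline, the sdt_v3 run (12.00 + 7.08 vs. 11.99 + 7.19) comes out at roughly +0.5% CPU time, and the systemtap-1.3 run (12.47 + 7.84 vs. 12.36 + 7.80) at roughly -0.7%. Each figure rests on a single run per build, so differences of this size say little in either direction.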
--enable-dtrace is now enabled for debug builds in the spec. If you want to use the probes, just rebuild the package with "debug_build" set to 1. We will need more performance tests before enabling it in a production environment.
Thanks, good news! For what it's worth, the cost of plain sys/sdt.h instrumentation is so close to zero (a single NOP instruction) that many other packages (including glibc) are adding them. The only concern is if the *_PROBE() markers are unprotected by an outer *_PROBE_ENABLED() test, AND are passed parameters that are unusually expensive to compute (like some elaborate tracing text string). Unless both of these conditions hold, you should not be able to measure a difference in performance with vs. without the systemtap sys/sdt.h macros.
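The guard pattern described above can be illustrated roughly as follows. This is a self-contained model, not the real thing: DEMO_PROBE(), DEMO_PROBE_ENABLED(), demo_probe_semaphore, and describe_request() are hypothetical stand-ins invented for this sketch, and the actual macros would come from sys/sdt.h and a generated provider header. The point is only that arguments to an unguarded marker are still evaluated even when nobody is tracing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a probe semaphore: nonzero only while a
 * tracer is attached to this marker. */
static unsigned short demo_probe_semaphore = 0;

/* Hypothetical stand-ins for the *_PROBE_ENABLED() and *_PROBE() macros.
 * The marker itself does nothing here, but its argument is evaluated. */
#define DEMO_PROBE_ENABLED() (demo_probe_semaphore != 0)
#define DEMO_PROBE(msg)      ((void)(msg))

/* Counts how often the expensive argument was computed. */
static unsigned long describe_calls = 0;

/* Models an "elaborate tracing text string" that is costly to build. */
static const char *describe_request(int id, char *buf, size_t len) {
    assert(buf != NULL && len > 0);
    describe_calls++;
    snprintf(buf, len, "request id=%d", id);
    return buf;
}

/* Unguarded: the string is built on every call, traced or not. */
static void handle_request_unguarded(int id) {
    char buf[64];
    DEMO_PROBE(describe_request(id, buf, sizeof buf));
}

/* Guarded: the string is built only while a tracer is attached. */
static void handle_request_guarded(int id) {
    if (DEMO_PROBE_ENABLED()) {
        char buf[64];
        DEMO_PROBE(describe_request(id, buf, sizeof buf));
    }
}
```

With demo_probe_semaphore left at 0, handle_request_guarded() never calls describe_request(), while handle_request_unguarded() calls it every time. Markers that only pass values already at hand (an integer, a pointer) do not need the guard.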
xulrunner is EOL now.