I find that firefox, galeon, and epiphany all hang when visiting the query.cgi expert mode page of redhat bugzilla. The 'components' select box does not populate itself; it stays grey. The cpu spins and the application does not respond or redraw itself. It can be closed (gnome recognizes it as not responding and shows the force quit dialog). It appears to be javascript related, since I haven't yet seen this hang on a page without ajax type content. xulrunner-1.9-0.beta2.15.nightly20080130.fc9 is affected, as well as firefox-3.0-0.beta2.12.nightly20080121.fc9.i386. I am currently seeing the issue both on x86_64 running as a vmware guest and on i686 on physical hardware.
+++ This bug was initially created as a clone of Bug #430981 +++
I removed comments from the cloned bug which are fixed (the original issue there).
-- Additional comment from lordmorgul on 2008-01-30 18:58 EST --
Created an attachment (id=293524) firefox_crash_cpulock.txt
I have a crash on the same version of firefox, resulting in the CPU spinning wildly until gdb attached. The backtrace is attached. This occurred visiting bugzilla.redhat.com with two tabs open. The second tab was loaded at a blank page, I believe.
-- Additional comment from lordmorgul on 2008-01-30 19:34 EST --
Created an attachment (id=293525) firefox_crash_cpulock_safemode.txt
Also happens in -safe-mode, still only browsing bugzilla.
-- Additional comment from lordmorgul on 2008-01-30 19:42 EST --
Another note (bug spam, I know). The crash I see happens predictably when loading https://bugzilla.redhat.com/query.cgi?format=simple but it does not happen when loading https://bugzilla.redhat.com/enter_bug.cgi?format=guided Both pages use ajax content, but the query page crashes it on cue. The query page displays but the components select box does not populate (stays grey).
-- Additional comment from lordmorgul on 2008-01-31 04:24 EST --
galeon-2.0.4-2.fc9.i386 does not freeze loading the bugzilla query page.
If it's a new issue, then xulrunner/firefox are not likely to be the true culprit, because I've stuck to epiphany-2.21.5-0.1.svn7856.fc9 and xulrunner-1.9-0.beta2.11.nightly20080115.fc9.x86_64 which are already 2 weeks old. Nevertheless, I do encounter the same problem when trying to choose a component against which I want to file a new bug in Red Hat bugzilla.
I have only noticed this in the last few days, however I do the vast majority of my access to bugzilla from my macbook not running rawhide. I can't say how new it is. But I have verified I can make epiphany, firefox, and galeon all crash on the same query.cgi page with very similar backtraces (though I lack most of the debugging symbols in them as you see above). I haven't had time to dig further, but also given the lack of interest upstream in unofficial build debugging... I'm just not making it a priority yet.
I suspect it could have to do with the full rewrite of javascript handling they've been planning for ff3, though I don't know if that has been done yet. From what I can tell this only happens when the component list attempts to load, but the rest of the page has been fetched correctly beforehand.
Thank you for taking the time to report this bug. Unfortunately, the stack traces in the original bug report are not very useful in determining the cause of the crash. Please install firefox-debuginfo and xulrunner-debuginfo; in order to do this you have to enable the -debuginfo repository:
    yum install --enablerepo=\*debuginfo {firefox,xulrunner}-debuginfo
Then run firefox with the parameter -g. That will start firefox running inside the gdb debugger. Then use the command run and do whatever you did to make firefox crash. When it happens, go back to gdb and run
    (gdb) thread apply all backtrace
This usually produces many screens of text. Copy all of them into a text editor and attach the file to the bug as an uncompressed attachment. We will review this issue again once you've had a chance to attach this information. Thanks in advance.
Created attachment 293951 [details] gdb 'thread apply all bt' from hung firefox-3.0-0.beta2.15.nightly20080130.fc9.i386
Unfortunately there are a lot of unresolved references inside libxul.so in that backtrace. I've tried installing scads of debuginfo packages but it's not real helpful so far.
I'll see what I can come up with late tonight.
Created attachment 293985 [details] firefox backtrace unresponsive script Ok here is something new. The backtrace attached has 3 backtraces with a brief continued period in between them. I was viewing this bug, and clicked 'update components' by the component select dropdown. Firefox froze, as the javascript was executed. I stopped it in gdb for bt, continued, bt'd, continued.. and waited. Firefox eventually displayed the 'stop unresponsive script' dialog. The third bt is immediately after the script stopped. Firefox is still running as I post this after those backtraces, so the script being unresponsive did not crash it fully this time.
Like Will said, there are a lot of missing symbols in libxul. I've got glibc, nspr, firefox, and xulrunner debuginfo installed (those seemed to be involved in the earlier backtraces). I just tried the query.cgi page and it also now shows the unresponsive script dialog, and the components list is populated... but it took 3 minutes to display the dialog, and meanwhile firefox is completely frozen.
OK, two things:
a) I really cannot reproduce this with firefox-3.0-0.beta2.15.nightly20080130.fc9.x86_64 and xulrunner-1.9-0.beta2.15.nightly20080130.fc9.x86_64 -- all these pages stop downloading stuff (on a pretty busy line) in around ten to fifteen seconds.
b) To be sure you have all debugging symbols, install yum-utils and then run (as root)
    debuginfo-install firefox
The result is quite a large install, but it should really contain everything you need.
debuginfo-install is what I used to get debuginfo for my backtrace. It should be obvious that it doesn't fetch all the needed debug files, in this case.
Created attachment 294191 [details] backtraces of firefox after page load (normal) and again after scripting crash (processor spinning)
Attached is a log of firefox -g after using debuginfo-install. Unfortunately, as the gdb output shows, there is still a lot of missing info. I did run (as gdb suggests) one of the yum install commands at the top of the file, and it did install a debuginfo package I did not have yet... I have only done one of those so far. What I did in the above attachment is:
1- start firefox -g, then load (by bookmark) my bugzilla front page
2- click on this bug, load the page fully
3- open wireshark to gather tcpdump (will attach next), start capturing packets
4- reload this page via ctrl-r
5- interrupt firefox after the page finishes loading and backtrace (the first t a a bt in the attachment)
6- wait 20 seconds or so
7- click 'update components' on the bug page
8- firefox locks up, I wait about 5-7 seconds and notice all traffic has stopped moving (firefox is spinning the cpu 100%)
9- interrupt firefox and t a a bt
Created attachment 294192 [details] wireshark output tcpdump showing traffic from bugzilla Here is all traffic during load of this page (first 3 seconds, 250 packets), and then when the update components script is run (packets 251-299 starting at 30s). Firefox appears to complete the transfer of the component list, then no traffic occurs for more than 5 minutes before the unresponsive script dialog appears. At this time there is NO additional traffic, but after closing the dialog the component list is populated.
Created attachment 294193 [details] An error in gdb that occurred while debugging firefox, t a a bt produced only this first thread's info This is probably not of much use, but here to look at anyway. There are only two references found in this thread; js_Invoke () and JS_CallFunctionValue ()
Meant to remove needinfo and return this to assigned, not modified.
I can still reproduce this on every bugzilla query load with: firefox-3.0-0.beta2.18.nightly20080210.fc9.i386 xulrunner-1.9-0.beta2.18.nightly20080210.fc9.i386 kernel-2.6.24.1-28.fc9.i686
FWIW, the query page does *eventually* come up, but it takes a while. On my machine, after about 3 minutes I get the "Stop unresponsive script" dialog, and after selecting "Continue" and waiting another ~30sec the page is usable.
(In reply to comment #13)
> Unfortunately, as the gdb output shows there is a lot of missing info still.
I am trying to investigate this problem, and for that we would need the output of this command attached:
    rpm -qa --qf '%{name}-%{version}-%{release}.%{arch}\n'
Thank you.
Created attachment 295021 [details] Popup from firefox.... After allowing firefox to stew for quite a while on the bugzilla query page, I got the attached popup warning: A script on this page may be busy, or it may have stopped responding. You can stop the script now, or you can continue to see if the script will complete. Script: https://bugzilla.redhat.com/js/product_query.js:53 .... Clicking Stop didn't appear to help much.
Created attachment 295039 [details] rpm-qa-sorted-i686.txt
The output of rpm -qa --qf '%{name}-%{version}-%{release}.%{arch}\n' | sort is attached for two different machines (i686 on a physical machine, and x86_64 on a vmware guest), both experiencing this issue. Both have had debuginfo-install firefox xulrunner galeon epiphany run on them. All three browsers on both machines show similar behavior on javascript content loads.
Created attachment 295040 [details] rpm-qa-sorted-x86_64.txt
By the way it is interesting that gmail and many other ajax and client side only javascript pages are behaving just fine. The demo sites at http://echo.nextapp.com/site/ for instance all work and they are serverside applications.
Created attachment 295053 [details] Error console output produced on bugzilla query page
Opening an error console while browsing to the query page produces the attachment. Apparently, something is not agreeable about the JavaScript function "updateSelect". The error console is complaining about "sel" being null; the concurrent "timeout" popup complains about the for loop about 13 lines later. Here is the function. Line 39 is " sel.disabled = true"
    function updateSelect(func, field, add_all) {
        var products = ''
        if ( field != 'product' ) {
            products = findProducts()
        }
        saved[field] = saved[field].length == 0 ? get_selection(field) : saved[field]
        var sel = document.getElementById(field);
        sel.disabled = true
        var callback = {
            success:function(o) {
                var result = eval(o.responseText);
                if (typeof(sel.options) != 'undefined') {
                    sel.options.length = 0;
                    var i = 0
                    var j = 0
                    if (add_all) {
                        sel.options[i] = new Option('All','')
                        i = 1
                    }
                    for (; j < result.length; i++) {
                        sel.options[i] = new Option(result[j],result[j])
                        j++
                    }
                }
                sel.disabled = false
                restore_selection(field, saved[field])
                saved[field] = new Array()
            }
        }
        var url = urlbase + "ajax.cgi?ajax_func=" + encodeURIComponent(func) + "&ctype=json" + products
        var cObj = YAHOO.util.Connect.asyncRequest('GET', url, callback, null)
    }
Is any of this helpful?
That's interesting, Tom. I do see that error in the console about sel being null when loading query.cgi with firefox 2.0.0.12 on OSX, and on rawhide with ff3. The updateSelectAll() javascript function (in the onload="" for the page body) must be calling it incorrectly. However, it appears that the page is calling updateSelect correctly for the 'Update Components' link on the bug page (top of this page) anyway. That script also causes the hang on rawhide, but I see no javascript console error on this page in OSX or rawhide. There is virtually no delay in updating the components on FF2 in OSX, but there is the 3min hang on rawhide.
    <td>
      <select id="component" name="component">
        <option value="xulrunner" selected="selected">xulrunner</option>
      </select>
      <span id="change_component">
        <a title="Click here to select a different component" href="javascript:updateSelect('getComponents','component'); emptyDiv('change_component');"> Update Components</a>
      </span>
    </td>
Hmm, updateSelectAll looks ok too, at least from my meager javascript background. The component select has the right id, the function is called with that id, the getElementById must just be returning null for some reason. In the generated source of query.cgi:
    <div id="selcomponent">
      <select name="component" id="component" multiple="multiple" size="7"></select>
    </div>
The script error where sel is null:
    function updateSelectAll(add_all) {
        updateSelect('getComponents', 'component', add_all)
        updateSelect('getVersions', 'version', add_all)
        updateSelect('getMilestones', 'target_milestone', add_all)
    }
    function updateSelect(func, field, add_all) {
        var products = ''
        if ( field != 'product' ) {
            products = findProducts()
        }
        saved[field] = saved[field].length == 0 ? get_selection(field) : saved[field]
        var sel = document.getElementById(field);
        sel.disabled = true
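The null lookup described above could be caught before sel.disabled is touched. What follows is only a minimal sketch, not a change to Bugzilla's product_query.js: findSelect, fakeDoc and the doc parameter are hypothetical names introduced for illustration, and the stand-in document exists only so the snippet runs outside a browser.

```javascript
// Hypothetical helper, not part of Bugzilla's product_query.js.
// `doc` is any object with a getElementById method (normally `document`).
function findSelect(doc, field) {
  var sel = doc.getElementById(field);
  if (sel === null || typeof sel === 'undefined') {
    // Report which id was missing instead of failing later on sel.disabled.
    return { ok: false, error: "no element with id '" + field + "'" };
  }
  return { ok: true, sel: sel };
}

// Stand-in document so the sketch can run outside a browser (an assumption,
// not the real page): it only knows about the 'component' select.
var fakeDoc = {
  elements: { component: { disabled: false } },
  getElementById: function (id) {
    return Object.prototype.hasOwnProperty.call(this.elements, id)
      ? this.elements[id]
      : null;
  }
};

var found = findSelect(fakeDoc, 'component');
var missing = findSelect(fakeDoc, 'target_milestone');

if (found.ok) {
  found.sel.disabled = true;   // safe: the element exists
}
console.log(missing.error);
```

A guard like this would not explain why getElementById returns null on query.cgi; it would only make the console message name the field that was missing.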
(In reply to comment #21)
> Created an attachment (id=295039) [edit]
> rpm-qa-sorted-i686.txt
The versions of the debuginfo rpms do not match the versions of the binary rpms, for example:
    firefox-3.0-0.beta2.18.nightly20080210.fc9.i386
    firefox-debuginfo-3.0-0.beta2.15.nightly20080130.fc9.i386
Possible workarounds:
    yum --enablerepo='*-debuginfo' update
Follow all the lines printed by GDB:
    yum --enablerepo='*-debuginfo' install /usr/lib/debug/.build-id/../...
It is in fact Bug 151598, but there would be YUM failures if it got fixed.
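A mismatch like the one above can be spotted mechanically in the sorted rpm list. The following is a rough sketch, assuming input lines in the name-version-release.arch form produced by the rpm query earlier in this bug; parsePackage and findDebuginfoMismatches are hypothetical helper names, and the check only pairs a package X with X-debuginfo of the same arch, so debuginfo packages named after a source package with differently named binaries would be missed.

```javascript
// Split 'name-version-release.arch' from the right, since names may contain dashes.
function parsePackage(line) {
  var archDot = line.lastIndexOf('.');
  var nvr = line.slice(0, archDot);
  var relDash = nvr.lastIndexOf('-');
  var verDash = nvr.lastIndexOf('-', relDash - 1);
  return {
    name: nvr.slice(0, verDash),
    version: nvr.slice(verDash + 1, relDash),
    release: nvr.slice(relDash + 1),
    arch: line.slice(archDot + 1)
  };
}

// Report every X-debuginfo whose version-release differs from package X.
function findDebuginfoMismatches(lines) {
  var installed = {};
  var packages = lines.map(parsePackage);
  packages.forEach(function (p) {
    installed[p.name + '.' + p.arch] = p.version + '-' + p.release;
  });
  var suffix = '-debuginfo';
  var mismatches = [];
  packages.forEach(function (p) {
    if (p.name.slice(-suffix.length) !== suffix) {
      return;
    }
    var base = p.name.slice(0, -suffix.length);
    var baseVr = installed[base + '.' + p.arch];
    var debugVr = p.version + '-' + p.release;
    if (typeof baseVr !== 'undefined' && baseVr !== debugVr) {
      mismatches.push({ base: base, binary: baseVr, debuginfo: debugVr });
    }
  });
  return mismatches;
}

// The two lines quoted in this comment.
var report = findDebuginfoMismatches([
  'firefox-3.0-0.beta2.18.nightly20080210.fc9.i386',
  'firefox-debuginfo-3.0-0.beta2.15.nightly20080130.fc9.i386'
]);
console.log(JSON.stringify(report));
```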
Just to make it clear that this bug is not about a crash before I make all the similar bugs duplicates of this one. Jan, I think it would be better to keep these problems with debuginfo in bug 432806 -- this bug was reproduced with the upstream binary version of firefox, so it might be CLOSED/UPSTREAM pretty soon.
*** Bug 426685 has been marked as a duplicate of this bug. ***
*** Bug 432291 has been marked as a duplicate of this bug. ***
The debuginfo may have been in sync when I obtained those backtraces, I had deliberately not updated it since and the rpm lists were afterthought. I'll try to get them in sync and see if the bt of thread 1 when it hangs is more useful.
Created attachment 295085 [details] firefox-bt-matched-debuginfo-3.0-0.beta3.21.nightly20080214.txt x86_64 backtrace while firefox is hanging Also it is interesting to note, the 'update products', 'update versions' scripts on this page do not cause firefox to hang, only the update components script does.
Dave, does this seem like something which may actually be Bugzilla's problem?
(In reply to comment #0)
> The cpu spins and the application does not respond or redraw itself.
> It can be closed (gnome recognizes not responding, force quit dialog).
(In reply to comment #33)
> Dave, does this seem like something which may actually be Bugzilla's problem?
Matej, a foreign web page (Bugzilla) must have no right to block the browser's responsiveness, even if the (possibly malicious) web page would really like to. (->gecko/xulrunner Bug)
(In reply to comment #34)
> Matej, a foreign web page (Bugzilla) must have no right to block the browser
> responsiveness. Even if the (possibly malicious) web page would really like to.
> (->gecko/xulrunner Bug)
I am not trying to blame Bugzilla for this bug (I totally agree with you). However, given the importance of this bug for keeping the whole bug filing and bug management process working, I am eager to find out as soon as possible whether there is something that can be done in Bugzilla to make it work again for everybody affected. Then we could happily return to the interesting question of why bugzilla is able to bring firefox to its knees. Still waiting on a reply from the Bugzilla folks.
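As one illustration of the kind of page-side mitigation being asked about, a long list could be filled in small batches that yield back to the browser between batches. This is purely a sketch under assumptions: nothing in this bug shows Bugzilla doing this, fillInBatches and its parameters are hypothetical, and whether batching would actually help with this hang is untested. In a browser the scheduler would typically be something like function (fn) { setTimeout(fn, 0); }; a synchronous stand-in is used here so the snippet runs anywhere.

```javascript
// Hypothetical batching helper; not Bugzilla code.
// append(item) adds one entry; schedule(fn) defers fn until later.
function fillInBatches(items, batchSize, append, schedule, done) {
  var next = 0;
  function step() {
    var end = Math.min(next + batchSize, items.length);
    for (; next < end; next++) {
      append(items[next]);
    }
    if (next < items.length) {
      schedule(step);   // give control back between batches
    } else if (done) {
      done();
    }
  }
  step();
}

// Demonstration with a synchronous stand-in scheduler that counts its calls.
var appended = [];
var deferrals = 0;
var finished = false;
fillInBatches(
  ['a', 'b', 'c', 'd', 'e'],
  2,
  function (item) { appended.push(item); },
  function (fn) { deferrals++; fn(); },
  function () { finished = true; }
);
console.log(appended.join(',') + ' after ' + deferrals + ' deferrals');
```

The design choice here is to pass the scheduler in as a parameter, so the same loop can be exercised without a browser and swapped to a timer-based scheduler on a real page.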
For the BZ guys info, I'm using FF3b3 on windows xp fine with the same network in front of the machine (same router, hardware firewall configured, nat, etc) on two machines. FF3 on rawhide does not work from either of the same machines. BZ is behaving fine for FF2 and safari (windows and osx) and IE7 on the same network. If the BZ javascript is to blame in this case I would think it is something very quirkcentric. ;)
Andrew, does FF2 on Linux work for you?
Well I don't have an F8 install in place now, but it does work on Ubuntu, and it used to work fine on F8. I'll see what I can do about checking on an F8 live cd.
Matej, the official build of ff2.0.0.12 works fine (on rawhide), using the tarball directly from upstream. Both the query.cgi page and the update components link on the bug work within 5-6 seconds with no noticeable cpu load.
And.. it's reproduced using the latest upstream FF3 nightly. Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.9b4pre) Gecko/2008022004 Minefield/3.0b4pre The javascript interpreter is just broken, it seems. I also just tried the latest upstream nightly on OSX and it works fine with bz.
Is it possible that the way js is allocating memory severely degrades the performance? Milestones & versions are typically only a few array items; however, the component list can be on the order of thousands of items.
    var result = eval(o.responseText);
    if (typeof(sel.options) != 'undefined') {
        sel.options.length = 0;
        var i = 0
        var j = 0
        if (add_all) {
            sel.options[i] = new Option('All','')
            i = 1
        }
        for (; j < result.length; i++) {
            sel.options[i] = new Option(result[j],result[j])
            j++
        }
    }
To my mind js is going to have to allocate memory for a new 'Option' on each loop iteration, as well as allocating space for the array as it grows. Though I'd bet it pre-allocates arrays to handle the general cases. For instance, here are the component counts for some products:
    Fedora                       6155
    Fedora EPEL                  1583
    Red Hat Raw Hide             1568
    Red Hat Linux Beta           1349
    Red Hat Enterprise Linux 5   1298
That is the top 5 products by component count. So it's probably no surprise that you Fedora guys are hitting this. FWIW it works ok for me with FF2 on F8: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.12) Gecko/20080208 Fedora/2.0.0.12-1.fc8 Firefox/2.0.0.12
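One way to probe the allocation hypothesis is to run a loop of the same shape against plain objects, with no DOM involved, and time it. This is only a sketch under assumptions: fillOptions and plainOption are hypothetical stand-ins (plainOption replaces the browser's Option constructor), the component names are made up, and 6155 is simply the Fedora count quoted above. It measures plain array and object allocation only, so it says nothing about what the browser does when real options are inserted.

```javascript
// Same loop shape as the success callback, but targeting a plain array
// so that only allocation and array growth are measured.
function fillOptions(target, result, addAll, makeOption) {
  target.length = 0;
  var i = 0;
  var j = 0;
  if (addAll) {
    target[i] = makeOption('All', '');
    i = 1;
  }
  for (; j < result.length; i++, j++) {
    target[i] = makeOption(result[j], result[j]);
  }
  return target;
}

// Hypothetical stand-in for the browser's Option constructor.
function plainOption(text, value) {
  return { text: text, value: value };
}

// Made-up component names; only the count comes from this bug.
var names = [];
for (var n = 0; n < 6155; n++) {
  names.push('component-' + n);
}

var started = Date.now();
var filled = fillOptions([], names, true, plainOption);
var elapsedMs = Date.now() - started;
console.log(filled.length + ' entries in ' + elapsedMs + ' ms');
```

If a run like this completes quickly, the per-iteration allocation by itself is unlikely to account for a hang of several minutes, and the cost would have to come from somewhere else in the insertion path.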
Created attachment 295524 [details] callgrind annotated output i tried running callgrind on ff, here's the shortened output firefox-3.0-0.beta3.1.el5 xulrunner-1.9-0.beta3.1.el5
Bug filed upstream as https://bugzilla.mozilla.org/show_bug.cgi?id=418845
Created attachment 295764 [details] oprofile output for only the Option test link load
I've created a test script for this bug at (c=numberofoptions): http://www.lordmorgul.net/pub/fedora/testing/ff3-bug431162.php?c=1000 This attached oprofile is for that load only (the page is already loaded, so the static options are not included).
*** Bug 435188 has been marked as a duplicate of this bug. ***
For anyone who goes searching for this bug, here is the result from the upstream bug: setting GNOME_ACCESSIBILITY=0 in your environment when starting firefox seems to drastically reduce the problem. A real solution to the issue will hopefully be found soon.
> GNOME_ACCESSIBILITY=0 firefox
A more thorough fix for this (temporary workaround for Fedora) is to turn off the gnome assistive technologies option. I do not know how/why this got turned on (maybe it's been set as default on in Rawhide?) but it was turned on for both my machines. Setting the environment variable tells firefox (and all apps that respect the var) not to use AT, but turning it off also works:
System->Preferences->Personal->Assistive Technologies
Toggle the 'enable assistive technologies' checkbox, then restart firefox. It is not necessary to log out. Matej: maybe we can find out if this is supposed to be on by default for installs? Keeping this from biting the F9 release is probably a good idea. Maybe until it is fixed upstream the Fedora firefox start script could set the environment var? All xulrunner apps actually need it.
accessibility support is turned on by default since gnome 2.17, at least
Seems fixed for me with the 20080310 snapshot (Current rawhide). Huzzah! Andrew, can you retest and confirm?
(In reply to comment #50) > Seems fixed for me with the 20080310 snapshot (Current rawhide). Huzzah! Most likely it is a false positive -- the upstream patch is still being reviewed (and yes the bug generated something over 70 comments -- yay!).
(In reply to comment #51) > > Most likely it is a false positive D'oh! You're right. Mixed up my test machines - this one turned out to have accessibility turned off, which is a known workaround. Please disregard comment #50.
Greetings: I just entered 43780, which appears to be related to this thread. Thanks
Upstream fixed in their CVS.
Looks like this made it to rawhide today. firefox-3.0-0.51.beta5rc2.fc9.i386 xulrunner-1.9-0.51.beta5rc2.fc9.i386 My test script [1] and 'update components' on the bug page both run in about 10 seconds max with accessibility turned on. Confirm and we'll rejoice. [1] http://www.lordmorgul.net/pub/fedora/testing/ff3-bug431162.php?c=6000
Wonderful news!
This may have a regression in firefox-3.0-0.52.beta5 and xulrunner-1.9-0.52.beta5, or maybe the package before. I noticed the bugzilla query page running slower the last day or two, but after update today it seems like a big change. I have assistive technologies toggled on to keep an eye on this. My test page script is taking over a minute now for c=6000. Does anyone but me see this slowness again?
I don't see it.
Created attachment 303957 [details] testing page (In reply to comment #55) > My test script [1] and 'update components' on the bug page both run in about 10 > seconds max with accessibility turned on. Confirm and we'll rejoice. > [1] http://www.lordmorgul.net/pub/fedora/testing/ff3-bug431162.php?c=6000 Just for the sake of beauty of this test page and its preservation, I am attaching it here.