Bug 441643
Summary: | mugshot extension crashes Firefox | ||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Matěj Cepl <mcepl> | ||||||||||||
Component: | mugshot | Assignee: | Owen Taylor <otaylor> | ||||||||||||
Status: | CLOSED RAWHIDE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> | ||||||||||||
Severity: | low | Docs Contact: | |||||||||||||
Priority: | low | ||||||||||||||
Version: | rawhide | CC: | axel.thimm, gecko-bugs-nobody, jan.kratochvil, mcepl, mpg, otaylor, stransky, tagoh | ||||||||||||
Target Milestone: | --- | ||||||||||||||
Target Release: | --- | ||||||||||||||
Hardware: | All | ||||||||||||||
OS: | Linux | ||||||||||||||
Whiteboard: | |||||||||||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||||||||||
Doc Text: | Story Points: | --- | |||||||||||||
Clone Of: | Environment: | ||||||||||||||
Last Closed: | 2008-04-28 20:44:00 UTC | Type: | --- | ||||||||||||
Regression: | --- | Mount Type: | --- | ||||||||||||
Documentation: | --- | CRM: | |||||||||||||
Verified Versions: | Category: | --- | |||||||||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||||||||
Embargoed: | |||||||||||||||
Bug Depends On: | |||||||||||||||
Bug Blocks: | 441977 | ||||||||||||||
Attachments: |
|
Description
Matěj Cepl
2008-04-09 10:11:21 UTC
Created attachment 301771 [details]
session in gdb
*** Bug 441892 has been marked as a duplicate of this bug. *** Colin: can you look into this? > Steps to Reproduce:
> not a good pattern
Until I disabled the mugshot extension the following URL would *always* crash
firefox.
(In reply to comment #4) > Until I disabled the mugshot extension the following URL would *always* crash > firefox. What URL? (In reply to comment #5) > (In reply to comment #4) > > Until I disabled the mugshot extension the following URL would *always* crash > > firefox. > > What URL? http://mugshot.org/visit?post=R43l8bh5QClq8M Created attachment 302438 [details] my backtrace on gdb I can reproduce this bug as well, with http://online.gnome.org/application?id=gnu-emacs say. Reproduced on x86_64 ... investigating. Having a lot of trouble getting past the XPCOM and figuring out what's really going on - the section of our code that is crashing looks like: /* void setListener (in hippoIControlListener listener); */ NS_IMETHODIMP hippoControl::SetWindow(nsIDOMWindow *window) { [...] nsCOMPtr<nsIWidget> widget = GetMainWidget(window); [...] } static nsIWidget* GetMainWidget(nsIDOMWindow* aWindow) { // get the native window for this instance nsCOMPtr<nsPIDOMWindow> window(do_QueryInterface(aWindow)); NS_ENSURE_TRUE(window, nsnull); nsCOMPtr<nsIDocument> doc(do_QueryInterface(window->GetExtantDocument())); NS_ENSURE_TRUE(doc, nsnull); >>>> nsCOMPtr<nsISupports> container = doc->GetContainer(); nsCOMPtr<nsIBaseWindow> baseWindow(do_QueryInterface(container)); The crash occurs in the line marked with >>>> If I trace inside there, the crash occurs in the attempt to dereference the mDocumentContainer weak reference filed of nsIDocument. /** * Get the container (docshell) for this document. */ already_AddRefed<nsISupports> GetContainer() const { nsISupports* container = nsnull; if (mDocumentContainer) >>>> CallQueryReferent(mDocumentContainer.get(), &container); return container; } None of the objects are obviously garbage when inspected in a debugger, but I'm not really expert enough in XPCOM to get past the surface types and look at the details of the implementation classes. As far as we can tell this crash occurs on x86_64 and not on i386. On x86_64 it is 100% reproducible. valgrind shows: ==18620== Invalid read of size 8 ==18620== at 0x1D0E0947: hippoControl::SetWindow(nsIDOMWindow*) (in /usr/lib64/mugshot/firefox/components/libhippofirefox.so) ==18620== by 0x765EAC3: NS_InvokeByIndex_P (xptcinvoke_x86_64_linux.cpp:208) ==18620== by 0x6E959FE: XPCWrappedNative::CallMethod(XPCCallContext&, XPCWrappedNative::CallMode) (xpcwrappednative.cpp:2369) ==18620== by 0x6E9E11A: XPC_WN_CallMethod(JSContext*, JSObject*, unsigned, long*, long*) (xpcwrappednativejsops.cpp:1470) ==18620== by 0x61A2F16: js_Invoke (jsinterp.c:1287) ==18620== by 0x6196257: js_Interpret (jsinterp.c:4841) ==18620== by 0x61A2F7E: js_Invoke (jsinterp.c:1303) ==18620== by 0x61A3383: js_InvokeConstructor (jsinterp.c:1861) ==18620== by 0x619A325: js_Interpret (jsinterp.c:3811) ==18620== by 0x61A2F7E: js_Invoke (jsinterp.c:1303) ==18620== by 0x6190BD2: fun_apply (jsfun.c:1650) ==18620== by 0x619F788: js_Interpret (jsinterp.c:4824) ==18620== Address 0x393500382d46546d is not stack'd, malloc'd or (recently) free'd Building without optimization makes this crash dissapear... So, eventually I was able to get a xulrunner built with --enable-debug that exhibits the crash. I see: WARNING: recurring into frame construction: 'mPresContext->mLayoutPhaseCount[eLayoutPhase_FrameC] == 0', file ../../dist/include/layout/nsPresContext.h, line 958 WARNING: NS_ENSURE_TRUE(sgo || !hasHadScriptObject) failed: file nsContentUtils.cpp, line 2594 --DOMWINDOW == 10 (0x1dd1240) [serial = 10] [Outer = 0x1ad7eb0] ++DOMWINDOW == 11 (0x1dd1240) [serial = 12] [Outer = 0x1ad7eb0] CSS Error (http://online.gnome.org/css3/1207784006/site-positions.css :21.17): Error in parsing value for property 'width'. Declaration dropped. WARNING: recurring into frame construction: 'mPresContext->mLayoutPhaseCount[eLayoutPhase_FrameC] == 0', file ../../dist/include/layout/nsPresContext.h, line 958 ++DOMWINDOW == 12 (0x281d420) [serial = 13] [Outer = 0x1ad7eb0] CSS Error (http://online.gnome.org/css3/1207784006/site-positions.css :21.17): Error in parsing value for property 'width'. Declaration dropped. WARNING: recurring into frame construction: 'mPresContext->mLayoutPhaseCount[eLayoutPhase_FrameC] == 0', file ../../dist/include/layout/nsPresContext.h, line 958 ** (firefox:21711): DEBUG: Connected to session bus ** (firefox:21711): DEBUG: unique name of client: :1.19 ** (firefox:21711): DEBUG: adding rule type='signal',sender=':1.19',path='/com/dumbhippo/listener',interface='com.dumbhippo.Listener',member='Connected' ** (firefox:21711): DEBUG: adding rule type='signal',sender=':1.19',path='/com/dumbhippo/listener',interface='com.dumbhippo.Listener',member='Disconnected' ** (firefox:21711): DEBUG: Error from registerEndpoint(): XMPP connection not active Program /usr/lib64/firefox-3.0b5/firefox (pid = 21711) received signal 11. It's here: 676 already_AddRefed<nsISupports> GetContainer() const 677 { 678 nsISupports* container = nsnull; 679 if (mDocumentContainer) 680 CallQueryReferent(mDocumentContainer.get(), &container); 681 682 return container; 683 } (gdb) p mDocumentContainer $119 = {<nsCOMPtr_base> = {mRawPtr = 0x1ad1888}, <No data fields>} (gdb) p mDocumentContainer.mRawPtr $120 = (nsISupports *) 0x1ad1888 (gdb) p* mDocumentContainer.mRawPtr $121 = {_vptr.nsISupports = 0x393500382d465455} 0x393500382d465455 doesn't seem to be the right value here ;-) But why is the mDocumentContainer broken? #0 nsIDocument::GetContainer (this=0x18c9580) at /usr/include/xulrunner-sdk-1.9pre/content/nsIDocument.h:679 #1 0x00002aaac26861e1 in GetMainWidget (aWindow=0x1598690) at common-dist/firefox/src/hippoControl.cpp:167 #2 0x00002aaac26863d7 in hippoControl::SetWindow (this=0x9bbc70, window=0x1598690) at common-dist/firefox/src/hippoControl.cpp:201 #3 0x00002aaaac89807d in NS_InvokeByIndex_P (that=0x9bbc70, methodIndex=9, paramCount=1, params=0x7fffe07f1720) at xptcinvoke_x86_64_linux.cpp:208 #4 0x00002aaaab95f6b6 in XPCWrappedNative::CallMethod (ccx=@0x7fffe07f1b80, mode=XPCWrappedNative::CALL_METHOD) at xpcwrappednative.cpp:2369 #5 0x00002aaaab96e45c in XPC_WN_CallMethod (cx=0x1598a60, obj=0x76ed80, argc=1, argv=0x17e9b58, vp=0x7fffe07f1d88) at xpcwrappednativejsops.cpp:1470 #6 0x00002aaaaab7866f in js_Invoke (cx=0x1598a60, argc=1, vp=0x17e9b48, flags=2048) at jsinterp.c:1287 #7 0x00002aaaaab6a2c0 in js_Interpret (cx=0x1598a60) at jsinterp.c:4841 #8 0x00002aaaaab78781 in js_Invoke (cx=0x1598a60, argc=1, vp=0x17e9a28, flags=1) at jsinterp.c:1303 #9 0x00002aaaaab79f10 in js_InvokeConstructor (cx=0x1598a60, vp=0x17e9a28, argc=1) at jsinterp.c:1861 #10 0x00002aaaaab62e4e in js_Interpret (cx=0x1598a60) at jsinterp.c:3811 #11 0x00002aaaaab78781 in js_Invoke (cx=0x1598a60, argc=1, vp=0x17e97e0, flags=2) at jsinterp.c:1303 #12 0x00002aaaaab78a9d in js_InternalInvoke (cx=0x1598a60, obj=0x1ae03c0, fval=26271568, flags=0, argc=1, argv=0x17e97d8, rval=0x7fffe07f3ad8) at jsinterp.c:1359 Created attachment 303226 [details]
Minimal testcase
The trace I'm getting here stops at SetWindow. I'm using a debug build and there are no messages when clicking the button in the testcase I attached. Using --disable-optimize doesn't seem to make any difference. I can't dig down in the nsIDOMWindow passed to SetWindow. I don't know the xpcom internals and what _vptr.nsISupports is about. (gdb) f 5 #5 0x00007fb26ed56c56 in hippoControl::SetWindow (this=0x22dbf30, window=0x1b64c50) at /usr/include/xulrunner-sdk-1.9pre/stable/nsIWeakReferenceUtils.h:63 63 reinterpret_cast<void**>(aDestination)); (gdb) p window $3 = (nsIDOMWindow *) 0x1b64c50 (gdb) p* window $4 = {<nsISupports> = {_vptr.nsISupports = 0x7fb281f92350}, <No data fields>} Just before calling javascript mDocumentContainer seem to be fine: (gdb) f 17 #17 0x00007fb280ddc384 in nsEventListenerManager::HandleEvent (this=0x2ca0660, aPresContext=0x2c74e10, aEvent=0x7fff8a0a7250, aDOMEvent=0x7fff8a0a7030, aCurrentTarget=0x2ca05a0, aFlags=6, aEventStatus=0x7fff8a0a7038) at nsEventListenerManager.cpp:1184 (gdb) p aPresContext.mDocument.mRawPtr.mDocumentContainer.mRawPtr $35 = (class nsIWeakReference *) 0x1f03b30 (gdb) p* aPresContext.mDocument.mRawPtr.mDocumentContainer.mRawPtr $36 = {<nsISupports> = {_vptr.nsISupports = 0x7fb28204acb0}, <No data fields>} Full trace: #0 0x00000030acea6251 in nanosleep () from /lib64/libc.so.6 #1 0x00000030acea6077 in __sleep (seconds=<value optimized out>) at ../sysdeps/unix/sysv/linux/sleep.c:138 #2 0x00007fb28089fd3b in ah_crap_handler (signum=11) at nsSigHandlers.cpp:149 #3 0x00007fb2808b42c2 in nsProfileLock::FatalSignalHandler (signo=11) at nsProfileLock.cpp:216 #4 <signal handler called> #5 0x00007fb26ed56c56 in hippoControl::SetWindow (this=0x22dbf30, window=0x1b64c50) at /usr/include/xulrunner-sdk-1.9pre/stable/nsIWeakReferenceUtils.h:63 #6 0x00007fb28161232f in NS_InvokeByIndex_P (that=0x22dbf30, methodIndex=9, paramCount=1, params=0x7fff8a0a5e60) at xptcinvoke_x86_64_linux.cpp:208 #7 0x00007fb2808fcf9c in XPCWrappedNative::CallMethod (ccx=@0x7fff8a0a6110, mode=<value optimized out>) at xpcwrappednative.cpp:2369 #8 0x00007fb28090a0ee in XPC_WN_CallMethod (cx=0x1b650b0, obj=0x168c500, argc=1, argv=0x2687cc0, vp=0x7fff8a0a62f8) at xpcwrappednativejsops.cpp:1470 #9 0x000000000067de86 in js_Invoke (cx=0x1b650b0, argc=1, vp=0x2687cb0, flags=2048) at jsinterp.c:1287 #10 0x000000000066acd6 in js_Interpret (cx=0x1b650b0) at jsinterp.c:4841 #11 0x000000000067e2f8 in js_Invoke (cx=0x1b650b0, argc=1, vp=0x2687ba0, flags=2) at jsinterp.c:1303 #12 0x000000000067e524 in js_InternalInvoke (cx=0x1b650b0, obj=0x1682480, fval=23241024, flags=2, argc=1, argv=0x2687b98, rval=0x7fff8a0a6918) at jsinterp.c:1359 #13 0x0000000000625cc4 in JS_CallFunctionValue (cx=0x1b650b0, obj=0x1682480, fval=23241024, argc=1, argv=0x2687b98, rval=0x7fff8a0a6918) at jsapi.c:5036 #14 0x00007fb280f6fb37 in nsJSContext::CallEventHandler (this=0x1b64f70, aTarget=<value optimized out>, aScope=<value optimized out>, aHandler=0x162a140, aargv=0x2c3fc70, arv=0x7fff8a0a6b80) at nsJSEnvironment.cpp:1962 #15 0x00007fb280fd4357 in nsJSEventListener::HandleEvent (this=0x2ca0710, aEvent=0x15dc9a0) at nsJSEventListener.cpp:248 #16 0x00007fb280ddbd73 in nsEventListenerManager::HandleEventSubType (this=0x2ca0660, aListenerStruct=<value optimized out>, aListener=0x2ca0710, aDOMEvent=0x15dc9a0, aCurrentTarget=0x2ca05a0, aPhaseFlags=<value optimized out>) at nsEventListenerManager.cpp:1080 #17 0x00007fb280ddc384 in nsEventListenerManager::HandleEvent (this=0x2ca0660, aPresContext=0x2c74e10, aEvent=0x7fff8a0a7250, aDOMEvent=0x7fff8a0a7030, aCurrentTarget=0x2ca05a0, aFlags=6, aEventStatus=0x7fff8a0a7038) at nsEventListenerManager.cpp:1184 #18 0x00007fb280e04782 in nsEventTargetChainItem::HandleEvent (this=<value optimized out>, aVisitor=@0x7fff8a0a7020, aFlags=6) at nsEventDispatcher.cpp:206 #19 0x00007fb280e04c63 in nsEventTargetChainItem::HandleEventTargetChain (this=0x23136d0, aVisitor=@0x7fff8a0a7020, aFlags=6, aCallback=0x7fff8a0a70e0) at nsEventDispatcher.cpp:264 #20 0x00007fb280e051b1 in nsEventDispatcher::Dispatch (aTarget=<value optimized out>, aPresContext=0x2c74e10, aEvent=0x7fff8a0a7250, aDOMEvent=0x0, aEventStatus=0x7fff8a0a77ac, aCallback=0x7fff8a0a70e0) at nsEventDispatcher.cpp:479 #21 0x00007fb280b4a6ea in PresShell::HandleEventInternal (this=0x2c7a660, aEvent=0x7fff8a0a7250, aView=<value optimized out>, aStatus=0x7fff8a0a77ac) at nsPresShell.cpp:5945 #22 0x00007fb280b4b227 in PresShell::HandleEventWithTarget (this=0x2c7a660, aEvent=0x7fff8a0a7250, aFrame=<value optimized out>, aContent=<value optimized out>, aStatus=0x7fff8a0a77ac) at nsPresShell.cpp:5850 #23 0x00007fb280de23eb in nsEventStateManager::CheckForAndDispatchClick (this=<value optimized out>, aPresContext=<value optimized out>, aEvent=0x7fff8a0a79d0, aStatus=0x7fff8a0a77ac) at nsEventStateManager.cpp:3320 I figured out how to dig through _vptr.nsISupports. I can't find anything wrong there. (gdb) f 5 #5 0x00007fe501b7cc50 in hippoControl::GetMainWidget (this=<value optimized out>, aWindow=<value optimized out>) at /usr/include/xulrunner-sdk-1.9pre/stable/nsIWeakReferenceUtils.h:63 63 reinterpret_cast<void**>(aDestination)); (gdb) p* (nsStandardURL*)((nsWebShell*)((nsWeakReference*)((nsHTMLDocument *)doc).mDocumentContainer).mReferent).mCurrentURI $29 = {<nsIFileURL> = {<nsIURL> = {<nsIURI> = {<nsISupports> = { _vptr.nsISupports = 0x7fe5139bfd10}, <No data fields>}, <No data fields>}, <No data fields>}, <nsIStandardURL> = {<nsIMutable> = {<nsISupports> = {_vptr.nsISupports = 0x7fe5139bff30}, <No data fields>}, <No data fields>}, <nsISerializable> = {<nsISupports> = { _vptr.nsISupports = 0x7fe5139bff70}, <No data fields>}, <nsIClassInfo> = {<nsISupports> = { _vptr.nsISupports = 0x7fe5139bffa8}, <No data fields>}, mRefCnt = {mValue = 11}, _mOwningThread = {mThread = 0x2103070}, mSpec = {<nsACString_internal> = {<nsCSubstring_base> = {<No data fields>}, mData = 0x311e6f8 "http://mugshot.org/visit?post=R43l8bh5QClq8M", mLength = 44, mFlags = 5}, <No data fields>}, mDefaultPort = 80, mPort = -1, mScheme = {mPos = 0, mLen = 4}, mAuthority = {mPos = 7, mLen = 11}, mUsername = {mPos = 7, mLen = -1}, mPassword = {mPos = 7, mLen = -1}, mHost = {mPos = 7, mLen = 11}, mPath = {mPos = 18, mLen = 26}, mFilepath = { mPos = 18, mLen = 6}, mDirectory = {mPos = 18, mLen = 1}, mBasename = {mPos = 19, mLen = 5}, mExtension = {mPos = 19, mLen = -1}, mParam = { mPos = 18, mLen = -1}, mQuery = {mPos = 25, mLen = 19}, mRef = {mPos = 18, mLen = -1}, mOriginCharset = {<nsACString_internal> = {<nsCSubstring_base> = {<No data fields>}, mData = 0x7fe5043bbd18 "", mLength = 0, mFlags = 5}, <No data fields>}, mParser = {mRawPtr = 0x2263fa0}, mFile = {mRawPtr = 0x0}, mHostA = 0x0, mHostEncoding = 1, mSpecEncoding = 1, mURLType = 2, mMutable = 0, mSupportsFileURL = 0, static gIDN = 0x223d940, static gCharsetMgr = 0x0, static gInitialized = 1, static gEscapeUTF8 = 1, static gAlwaysEncodeInUTF8 = 1, static gEncodeQueryInUTF8 = 0} (I had to make GetMainWidget a method, to be able to look at doc) Verified that this is x86_64 specific. I got a better trace by tweaking the plugin compilation flags: #5 0x00007f30ce7aa091 in CallQueryReferent<nsIWeakReference, nsISupports> (aSource=0x7f30d035a3c8, aDestination=0x7fffe955d658) at /usr/include/xulrunner-sdk-1.9pre/stable/nsIWeakReferenceUtils.h:63 #6 0x00007f30ce7aa0ec in nsIDocument::GetContainer (this=0x7f30d0398b20) at /usr/include/xulrunner-sdk-1.9pre/content/nsIDocument.h:680 #7 0x00007f30ce7a82f2 in hippoControl::GetMainWidget (this=0x7f30d024a370, aWindow=0x7f30d002d100) at ./../common/firefox/src/hippoControl.cpp:169 #8 0x00007f30ce7a85c7 in hippoControl::SetWindow (this=0x7f30d024a370, window=0x7f30d002d100) at ./../common/firefox/src/hippoControl.cpp:203 And here is something interesting: (gdb) p ((nsIDocument*)doc).mDocumentContainer.mRawPtr $18 = (class nsIWeakReference *) 0x7f30d0155e70 (gdb) p doc.mRawPtr.mDocumentContainer.mRawPtr $19 = (class nsIWeakReference *) 0x7f30d035a3c8 (gdb) p * $18 $20 = {<nsISupports> = {_vptr.nsISupports = 0x7f30e1501cb0}, <No data fields>} (gdb) p * $19 $21 = {<nsISupports> = {_vptr.nsISupports = 0x393538382d4f5349}, <No data fields>} $18 works, $19 is broken. $18 is what get passed to GetContainer() -> crash Would it be possible to: - Put a conditional breakpoint on nsEventListenerManager::HandleEvent (conditional so it doesn't trigger until you click the button in your test case) - Find the window and document objects from there, make sure they are OK - Put hardware watchpoints on mDocumentContainer->mRawPtr and mDocumentContainer->mRawPtr->mRefCnt [Using the typical (gdb) p &foo->bar $43 = (unsigned int *)0x432341234123123 (gdb) watch *$43 trick, if you aren't familar with that] And see where things go south? I tried to do something like that. * I added a breakpoint to nsEventStateManager::CheckForAndDispatchClick, mDocumentContainer was fine there (I also verified it's fine down until the JS methods). * I set a watch on mDocumentContainer.mRawPtr. Firefox crashed and the mRawPtr changed but gdb didn't break... Log below. Is the watch setup correct? Breakpoint 1, nsEventStateManager::CheckForAndDispatchClick (this=0x7f9ef213b370, aPresContext=0x7f9ef213af00, aEvent=0x7fff05ea54e0, aStatus=0x7fff05ea52bc) at nsEventStateManager.cpp:3278 3278 nsEventStatus* aStatus) Current language: auto; currently c++ (gdb) p &(aPresContext.mDocument.mRawPtr.mDocumentContainer.mRawPtr) $1 = (class nsIWeakReference **) 0x7f9ef050b3e8 (gdb) watch *$1 Hardware watchpoint 2: *$1 (gdb) continue Continuing. Breakpoint 1, nsEventStateManager::CheckForAndDispatchClick (this=0x7f9ef213b370, aPresContext=<value optimized out>, aEvent=0x7fff05ea54e0, aStatus=0x7fff05ea52bc) at nsEventStateManager.cpp:3278 3278 nsEventStatus* aStatus) (gdb) continue Continuing. Program received signal SIGSEGV, Segmentation fault. 0x00007f9eeb6c6091 in CallQueryReferent<nsIWeakReference, nsISupports> (aSource=0x7f9ef06dcd48, aDestination=0x7fff05ea35a8) at /usr/include/xulrunner-sdk-1.9pre/stable/nsIWeakReferenceUtils.h:63 63 reinterpret_cast<void**>(aDestination)); That looks right to me. - Have you checked that the document pointer is the same in the two locations? A sanity check: do p *$1 when you set the watch point, and then *$1 after you crash... if they are different, then the watch point didn't work, if they are the same, then that tells you something else. - I haven't tried hardware watchpoints on x86_64 before. Might be worth testing with a small test program. Before the crash: (gdb) p aPresContext.mDocument.mRawPtr $1 = (nsIDocument *) 0x7f8a3e0c36d0 (gdb) p &($1.mDocumentContainer.mRawPtr) $2 = (class nsIWeakReference **) 0x7f8a3e0c3718 (gdb) p *$2 $3 = (class nsIWeakReference *) 0x7f8a3ec972c0 After the crash: (gdb) p *$2 $4 = (class nsIWeakReference *) 0x7f8a3ec972c0 So nothing is changed, apparently. Though: (gdb) p this $5 = (const nsIDocument * const) 0x7f8a3e0c36d0 gdb) p $5.mDocumentContainer.mRawPtr $6 = (class nsIWeakReference *) 0x7f8a3e83d888 Document is the same but document.mDocumentContainer is changed. /me confused Same thing as above if I watch the nsWeakRef instead of the raw pointer. Transcribing from IRC in case it inspires anybody on cc: <owen> is ((char *)&this.mDocumentContainer - (char *)this) the same in the inside-gecko code, and in the inside-plugin code <marcopg> with "this" you mean the document? <owen> So, the theory I'm going on, is that something in the two compilation environments was different, resulting in a different binary layout for nsIDocument on x86_64, causing the inlined method to go kerfluey <owen> marcopg: Yeah <marcopg> yeah that's a theory that occurred to me too, I tried to make the compilation envs more similar, but I may be very well still missing something The problem is that there is ABI difference between internal xulrunner/firefox and external plugin mugshot: -xulrunner/firefox +mugshot plugin --- /tmp/state-ok 2008-04-22 22:04:48.000000000 +0200 +++ /tmp/state-crashing 2008-04-22 22:04:16.000000000 +0200 @@ -1,12 +1,12 @@ (gdb) ptype this type = const class nsIDocument : public nsINode { protected: - nsString mDocumentTitle; + nsString_external mDocumentTitle; nsCOMPtr<nsIURI> mDocumentURI; nsCOMPtr<nsIURI> mDocumentBaseURI; nsWeakPtr mDocumentLoadGroup; nsWeakPtr mDocumentContainer; due to #define nsString nsString_external in nsStringAPI.h the nsIDocument layout gets presentet differently outside but the same binary structure is passed which breaks the external access: (gdb) p sizeof ( nsString ) $7 = 16 (gdb) p sizeof ( nsString_external ) $8 = 24 It was crashing while accessing `mDocumentContainer' - its offset is dependent on `sizeof nsString'. The problem background was given by stransky. On i386 they are the same size, which explains why the bug is x86_64 specific: (gdb) p sizeof(nsString) $1 = 12 (gdb) p sizeof(nsString_external) $2 = 12 (gdb) p sizeof(nsCString) $3 = 12 (gdb) p sizeof(nsCString_external) $4 = 12 Looks like it has been fixed upstream a few months ago for i386 (it was affecting other extensions): https://bugzilla.mozilla.org/show_bug.cgi?id=390849 Perhaps we should just make the sizes match on x86_64 as well... Created attachment 303440 [details]
A patch to mugshot, uses internal strings with nsIDocument
I'm not sure if it's bug in mozilla. Mugshot mixes internal and external API's
together and I wonder how it even can work on any platform...
I think there is a bug in firefox since you can't use nsIDocument with external strings on x86_64 because of the nsString size difference. It could be argued that nsIDocument should not be used by plugins certainly, I'm defining it a bug on the base of #390849. For Fedora 9 using internal strings sounds like a reasonable solution. Because of http://developer.mozilla.org/en/docs/Migrating_from_Internal_Linkage_to_Frozen_Linkage I was under the impression that the possibility was going away with 1.9, but apparently that's not the case (or not yet). Just for the record, I don't think mugshot is currently mixing external and internal. It's using external only. To recap my understanding of the issue, after having talked with stransky on irc: * external strings + nsIDocument does not work on x86_64 because of the internal/external size difference. This has been fixed by https://bugzilla.mozilla.org/show_bug.cgi?id=390849 but it's still broken on x86_64. * nsIDocument is theoretically an internal header, but it has been made to work with internal linkage to ease the transition. See https://bugzilla.mozilla.org/show_bug.cgi?id=315562 * Currently it's possible to use internal strings/linkage (stransky patch) for the mugshot plugin because all the string symbols we are using are exported. Obviously this might change in the future and we might have to switch back to frozen strings (stransky run into this issue when porting epiphany/galeon). * The correct long term solution is to get the functionality of nsIDocument we need exposed in a public interface. Marco is the size issue (comment 26) filed upstream yet? We should definitely do so if it isn't... (In reply to comment #31) > Marco is the size issue (comment 26) filed upstream yet? We should definitely > do so if it isn't... https://bugzilla.mozilla.org/show_bug.cgi?id=430581 Martin's patch seems like the best bet for now. I'll test it tomorrow morning to make sure it works for me and apply it (with a comment.) Thanks everybody. For me the patch seems to prevent the extension from loading at all (presumably because the use of internal symbols is causing dlopen() to fail, though I don't know how to get firefox to print out the dlerror() visibly.) Since the page appearance with and without the extension is fairly small, my guess is that when Jan's patch was tested that was also the case. (The other possibility is that the visibility tricks that the Mozilla folks play aren't there on x86_64.) With the extension properly loaded, if you are logged into online.gnome.org, have the mugshot client running, and go to: http://online.gnome.org/application?id=mozilla-firefox It should show you information about your installed version, not "No Package Information" My best thought right now is #undef nsString #define nsString ProperlySizedDummyObject #include "nsIDocument.h" #undef nsString #define nsString nsString_external I'll see if I can make that work. Created attachment 303694 [details]
Patch I'm applying
Here's the patch I'm committing. Tested to work on i386. I'm hopeful that it
will fix the problem on x86_64.
mugshot-1.1.95-1.fc9 built in koji and confirmed to work properly on x86_64. I'll close this once it is pushed to Rawhide. |