Description of problem:
x86_64 gnomad2 won't run (the i386 version runs fine).

Version-Release number of selected component (if applicable):
2.8.6-3.fc5

How reproducible:
Every time.

Steps to Reproduce:
1. Type gnomad2 in a shell.
2. Get an X error as the program exits.

Actual results:
Error exit.

Expected results:
The GUI comes up.

Additional info:
I constructed this little shell script to access the i386 version on my i386 boot partition:

#!/bin/sh
if [ -d /x86root ]
then
    PATH=/x86root/usr/lib/qt-3.3/bin:/x86root/usr/local/bin:/x86root/usr/bin:/x86root/bin:/x86root/usr/X11R6/bin:/x86root/usr/NX/bin:/x86root/usr/games:/home/tom/bin:/home/tom/scripts
    export PATH
    LD_LIBRARY_PATH=/x86root/usr/lib:/x86root/lib
    export LD_LIBRARY_PATH
    FULLNAME=/x86root/usr/bin/gnomad2
else
    FULLNAME=/usr/bin/gnomad2
fi
exec $FULLNAME "$@"

That works perfectly on x86_64 Linux, but the native x86_64 version of gnomad2 gives randomly different X errors every time I start it. Sometimes it reports protocol errors about a bad request size, sometimes it claims it lost the connection to the X server. It is never the same error twice, but it is always some error from the X libraries. Something usually flashes on the screen for a fraction of a second as it tries to start, but then it disappears.
Could you invoke gnomad2 in gdb so I can get an idea of where it bombs? (Unfortunately I have no x86_64 box here.)

gdb gnomad2
gdb> run
I'll see if I can gather more details tonight (maybe install some debuginfo rpms to get better info). One general bug I've found very common when porting X programs to 64 bit involves varargs functions that expect their argument list to be terminated with a null pointer. I've seen lots of source code where folks pass the literal 0 as the last argument, and on x86_64 g++ that winds up passing 4 bytes of zeroes and 4 bytes of uninitialized random trash instead of 8 bytes of zeroes (I consider this a gcc bug, but naturally the gcc folks disagree :-).
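To make the sentinel issue concrete, here is a minimal sketch in plain C. The functions total_length() and demo_total() are hypothetical names made up for illustration and are not taken from gnomad2 or Xlib. The variadic function reads arguments as pointers until it sees a null pointer, so the caller has to pass a pointer-sized sentinel.

```c
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from gnomad2): sums the lengths of a
 * list of strings that is terminated by a null pointer. */
static size_t total_length(const char *first, ...)
{
    va_list ap;
    size_t total = 0;

    va_start(ap, first);
    /* Each variadic argument is fetched as a pointer, so the
     * terminator must really have been passed as a pointer. */
    for (const char *s = first; s != NULL; s = va_arg(ap, const char *))
        total += strlen(s);
    va_end(ap);
    return total;
}

/* Hypothetical caller showing the safe form of the sentinel. */
static size_t demo_total(void)
{
    /* Safe: the cast makes the terminator pointer-sized everywhere. */
    return total_length("gnomad", "2", (const char *)NULL);

    /* Risky where int is 32 bits and pointers are 64 bits:
     *     total_length("gnomad", "2", 0);
     * The 0 is passed as an int, but va_arg() reads a pointer. */
}
```

With the cast in place the call behaves the same on i386 and x86_64. Compiling with gcc's -Wformat warnings enabled and marking such functions with the sentinel attribute is one way to get the compiler to flag the unsafe form.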
Created attachment 134759: some walkbacks from X error calls

Here are a couple of different walkbacks from error calls. I notice a very frequent error is the "Xlib: unexpected async reply" message, which the X FAQ says is usually due to a multi-threaded program not using a multi-threaded X library. Could the x86_64 libs be built with the wrong flags (or the program linked with the wrong libs)?
Yeah, hm. Upstream (and that's me!) needs x86_64 hardware to test and debug this, unless someone else does it. Upstream doesn't have such stuff, sadly. I will investigate this myself the day I have it; until then, there is not much to do. :-(
I just tried an interesting experiment. On x86_64, this command functions perfectly:

taskset -c 0 /usr/bin/gnomad2

So it looks like you not only need 64 bit hardware, but dual processor 64 bit hardware to debug :-). It is definitely some kind of threading bug, but where it is located and how to find it, I wouldn't want to guess. Anyway, I have two workarounds now: run the 32 bit version, or run bound to a single CPU.
Hmm, interesting. Linus, does gnomad2 use threads, and if so, could it be making X calls from multiple threads? Or does it only use gtk? In case it only uses gtk you should go and read: http://www.gtk.org/faq/#AEN482 Apply the suggestions there and hopefully this will be fixed :)
Yeah Hans, you're onto something. Gnomad2 does use the proper thread locking (semaphore-like) syntax of GDK, but it seems to be a dangerous thing to do. Further down in the GTK FAQ they say that it's better to just let one thread make GTK calls, so you should typically leave that to the main loop, and the simple solution is to create drawing functions and pass these to g_idle_add(). This is not so very simple, so the fix will take some time to get right.
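For what it's worth, the "only one thread touches the GUI" pattern can be sketched roughly as below. This is not gnomad2 code and it deliberately does not use GLib: post_to_main() and drain_on_main() are hypothetical stand-ins for g_idle_add() and the GTK main loop, built on pthreads so the sketch compiles on its own. In the real program the worker would call g_idle_add() with a callback that returns FALSE so that it runs once.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a GLib idle callback. */
typedef void (*ui_callback)(void *data);

#define QUEUE_CAPACITY 16

static ui_callback queued_fn[QUEUE_CAPACITY];
static void *queued_data[QUEUE_CAPACITY];
static size_t queued_count;
static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;

/* Callable from any thread: ask the main thread to run fn(data).
 * Returns 0 on success, -1 if the queue is full. */
static int post_to_main(ui_callback fn, void *data)
{
    int rc = -1;

    pthread_mutex_lock(&queue_lock);
    if (queued_count < QUEUE_CAPACITY) {
        queued_fn[queued_count] = fn;
        queued_data[queued_count] = data;
        queued_count++;
        rc = 0;
    }
    pthread_mutex_unlock(&queue_lock);
    return rc;
}

/* Called only by the main thread: run everything queued so far
 * and return how many callbacks were run. */
static size_t drain_on_main(void)
{
    ui_callback fn[QUEUE_CAPACITY];
    void *data[QUEUE_CAPACITY];
    size_t n;

    /* Copy the pending work out under the lock, then run it unlocked. */
    pthread_mutex_lock(&queue_lock);
    n = queued_count;
    for (size_t i = 0; i < n; i++) {
        fn[i] = queued_fn[i];
        data[i] = queued_data[i];
    }
    queued_count = 0;
    pthread_mutex_unlock(&queue_lock);

    for (size_t i = 0; i < n; i++)
        fn[i](data[i]);
    return n;
}

/* Stand-in for GUI state: only the main thread ever writes it. */
static int progress_shown;

static void show_progress(void *data)
{
    progress_shown = *(int *)data;
}

/* Worker thread: does its job, then posts the GUI update
 * instead of touching the GUI state itself. */
static void *worker(void *arg)
{
    static int percent = 42;

    (void)arg;
    post_to_main(show_progress, &percent);
    return NULL;
}

/* Starts a worker, waits for it, then lets the "main loop" apply
 * the update. Returns the displayed value, or -1 on thread error. */
static int demo_worker_update(void)
{
    pthread_t t;

    if (pthread_create(&t, NULL, worker, NULL) != 0)
        return -1;
    pthread_join(t, NULL);
    drain_on_main();
    return progress_shown;
}
```

The point of the design is that the worker never calls a drawing function directly. It only hands a function pointer and its data to the main thread, so all toolkit calls are serialized without any GDK locking in the workers.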
Much has changed in this area in later gnomad2 releases. Can you see if this bug still appears in F7?
I have used gnomad2 recently, and I just verified that I'm not running the silly wrapper I used to have to run the 32 bit version off my 32 bit boot partition, so I'd say this bug has indeed disappeared somewhere between the report and the x86_64 f7 I'm running now.
OK let's say it's working now...