Red Hat Bugzilla – Bug 203677
X errors at startup in x86_64 gnomad2
Last modified: 2007-11-30 17:11:41 EST
Description of problem:
x86_64 gnomad2 won't run (i386 version runs fine)
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. type gnomad2 in shell
2. get X error as program exits
Expected results:
gui comes up
I constructed this little shell script to access the i386 version on my
i386 boot partition:
if [ -d /x86root ]
then
    exec "$FULLNAME" "$@"
fi
That works perfectly on x86_64 linux, but the native x86_64 version of
gnomad2 gives randomly different X errors every time I start it. Sometimes
protocol errors about bad request size, sometimes it claims it lost the
connection to the X server - never the same error twice, but always some
error from the X libraries. Something usually flashes on the screen for
a fraction of a second as it tries to start, but then it disappears.
Could you run gnomad2 under gdb so I can get an idea of where it
bombs? (Unfortunately I have no x86_64 box here.)
I'll see if I can gather more details tonight (maybe install some
debuginfo rpms to get better info). One general bug I've found very
common in porting X programs to 64 bit is all the varargs functions
that expect to have their argument list terminated with a null pointer.
I've seen lots of source code where folks were passing the literal 0
as the last argument, and on x86_64 g++, that winds up passing 4 bytes
of zeroes and 4 bytes of uninitialized random trash instead of 8 bytes
of zeroes (I consider this a gcc bug, but naturally the gcc folks
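The null-terminated varargs pattern described above can be sketched in C. This is only an illustration: count_strings is a hypothetical helper, not code from gnomad2 or any X library. The callee reads each argument as a full pointer, so the terminator has to be passed as a pointer too. A bare literal 0 is an int, and nothing guarantees that the other half of the 8 byte slot is zero on x86_64.

```c
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical helper, not taken from gnomad2 or Xlib: counts its
 * string arguments up to a terminating null pointer.
 * The sentinel attribute (GCC and Clang) asks the compiler to warn
 * about calls that do not end in a null pointer. */
int count_strings(const char *first, ...)
#ifdef __GNUC__
    __attribute__((sentinel))
#endif
    ;

int count_strings(const char *first, ...)
{
    va_list ap;
    const char *s = first;
    int n = 0;

    va_start(ap, first);
    while (s != NULL) {
        n++;
        /* Reads a whole pointer (8 bytes on x86_64) from the argument list. */
        s = va_arg(ap, char *);
    }
    va_end(ap);
    return n;
}
```

A call such as count_strings("a", "b", (char *)NULL) is correct on both 32 and 64 bit builds. Writing count_strings("a", "b", 0) passes an int where a pointer is read, which happens to work on i386 but is not guaranteed to terminate the list on x86_64.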
Created attachment 134759
some walkbacks from X error calls
Here are a couple of different walkbacks from error calls. I notice a very
frequent error is the "Xlib: unexpected async reply" message which the X
FAQ says is usually due to a multi-threaded program not using a
multi-threaded X library. Could the x86_64 libs be built with the wrong
flags (or the program linked with the wrong libs)?
Yeah, hm. Upstream (and that's me!) needs x86_64 hardware to test and
debug this, unless someone else does it. Upstream doesn't have such stuff,
sadly. I will investigate this myself the day I have it; until then,
there is not much to do. :-(
I just tried an interesting experiment. On x86_64, this command works:
taskset -c 0 /usr/bin/gnomad2
So it looks like you not only need 64 bit hardware, but dual processor
64 bit hardware to debug :-). Definitely some kind of threading bug,
but where located and how to find, I wouldn't want to guess.
Anyway, I have two workarounds now: run the 32 bit version or run bound to a single CPU.
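The single-CPU workaround can also be applied from inside a program instead of through taskset. The following is a minimal sketch, assuming Linux with glibc; first_allowed_cpu and pin_to_cpu are hypothetical helper names. It relies on sched_setaffinity(), the system call that taskset itself is built on. It only hides a threading race by serializing execution; it does not fix one.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* glibc: exposes CPU_* macros and sched_setaffinity() */
#endif
#include <assert.h>
#include <sched.h>

/* Hypothetical helper: lowest-numbered CPU the caller may run on,
 * or -1 if the affinity mask cannot be read. */
int first_allowed_cpu(void)
{
    cpu_set_t set;
    int cpu;

    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    for (cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &set))
            return cpu;
    return -1;
}

/* Hypothetical helper: restrict the calling thread to one CPU.
 * Threads created afterwards inherit the mask.
 * Returns 0 on success, -1 on error (errno is set). */
int pin_to_cpu(int cpu)
{
    cpu_set_t set;

    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set);   /* pid 0 = caller */
}
```

Calling pin_to_cpu(0) early in main(), before any threads are started, corresponds roughly to launching the program with taskset -c 0. The sketch offers first_allowed_cpu() because CPU 0 may not be available in a restricted environment.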
Interesting. Linus, does gnomad2 use threads, and if so, could it be possible that
it makes X calls from multiple threads?
Or does it only use gtk? In case it only uses gtk you should go and read:
Apply the suggestions there and hopefully this will be fixed :)
Yeah Hans you're onto something. Gnomad2 does use the proper thread
locking (semaphore-like) syntax of GDK, but it seems to be a dangerous
thing to do.
Further down in the GTK FAQ they say that it's better to just let one thread
make GTK calls, so you should typically leave the main loop to do that
and the simple solution is to create drawing functions and pass these
This is not so very simple, so the fix will take some time to get right....
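The "let one thread make the GTK calls" advice can be sketched as a small hand-off queue. This is a minimal illustration in plain C with a pthread mutex, not gnomad2's actual fix, and every name in it (ui_queue, ui_queue_post, ui_queue_drain, bump_counter) is made up. Worker threads only queue a function and its argument; the thread that owns the GUI later runs everything that was queued. In a real GTK program, g_idle_add() fills the role of ui_queue_post(), with the callback then running from the main loop.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define UI_QUEUE_CAP 16

typedef void (*ui_func)(void *data);

/* Hypothetical fixed-size queue of pending GUI work. */
struct ui_queue {
    pthread_mutex_t lock;
    ui_func funcs[UI_QUEUE_CAP];
    void *args[UI_QUEUE_CAP];
    size_t count;
};

void ui_queue_init(struct ui_queue *q)
{
    pthread_mutex_init(&q->lock, NULL);
    q->count = 0;
}

/* May be called from any thread: queue f(data) for the GUI thread.
 * Returns 0 on success, -1 if the queue is full. */
int ui_queue_post(struct ui_queue *q, ui_func f, void *data)
{
    int rc = -1;

    assert(f != NULL);
    pthread_mutex_lock(&q->lock);
    if (q->count < UI_QUEUE_CAP) {
        q->funcs[q->count] = f;
        q->args[q->count] = data;
        q->count++;
        rc = 0;
    }
    pthread_mutex_unlock(&q->lock);
    return rc;
}

/* Must be called only from the GUI thread: run everything queued so
 * far, in posting order.  Returns the number of functions run. */
size_t ui_queue_drain(struct ui_queue *q)
{
    ui_func funcs[UI_QUEUE_CAP];
    void *args[UI_QUEUE_CAP];
    size_t n, i;

    /* Copy the pending work out so the callbacks run without the lock. */
    pthread_mutex_lock(&q->lock);
    n = q->count;
    for (i = 0; i < n; i++) {
        funcs[i] = q->funcs[i];
        args[i] = q->args[i];
    }
    q->count = 0;
    pthread_mutex_unlock(&q->lock);

    for (i = 0; i < n; i++)
        funcs[i](args[i]);
    return n;
}

/* Stand-in for a drawing function; real code would touch widgets here. */
void bump_counter(void *data)
{
    ++*(int *)data;
}
```

A worker thread would call ui_queue_post(&q, bump_counter, &counter) and never touch the GUI directly, while the GUI thread calls ui_queue_drain(&q) from its main loop. The mutex protects only the queue, so no X or GTK call is ever made from more than one thread.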
Much has changed in this area in later gnomad2 releases. Can you
see if this bug still appears in F7?
I have used gnomad2 recently, and I just verified that I'm not running
the silly wrapper I used to have to run the 32 bit version off my
32 bit boot partition, so I'd say this bug has indeed disappeared somewhere
between the report and the x86_64 f7 I'm running now.
OK let's say it's working now...