Bug 203677

Summary: X errors at startup in x86_64 gnomad2
Product: [Fedora] Fedora
Reporter: Tom Horsley <horsley1953>
Component: gnomad2
Assignee: Linus Walleij <triad>
Status: CLOSED NEXTRELEASE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: medium
Priority: medium
Version: 5
CC: extras-qa, hdegoede
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2007-08-18 14:40:36 UTC
Attachments:
some walkbacks from X error calls (flags: none)

Description Tom Horsley 2006-08-23 01:20:02 UTC
Description of problem:

x86_64 gnomad2 won't run (i386 version runs fine)

Version-Release number of selected component (if applicable):

2.8.6-3.fc5

How reproducible:

every time

Steps to Reproduce:
1. Type gnomad2 in a shell.
2. The program exits with an X error.
Actual results:

error exit

Expected results:

gui comes up

Additional info:

I constructed this little shell script to access the i386 version on my
i386 boot partition:

#!/bin/sh
# Run the i386 gnomad2 from the 32-bit root if it is mounted,
# otherwise fall back to the native binary.
if [ -d /x86root ]
then
   PATH=/x86root/usr/lib/qt-3.3/bin:/x86root/usr/local/bin:/x86root/usr/bin:/x86root/bin:/x86root/usr/X11R6/bin:/x86root/usr/NX/bin:/x86root/usr/games:/home/tom/bin:/home/tom/scripts
   export PATH
   LD_LIBRARY_PATH=/x86root/usr/lib:/x86root/lib
   export LD_LIBRARY_PATH
   FULLNAME=/x86root/usr/bin/gnomad2
else
   FULLNAME=/usr/bin/gnomad2
fi
exec "$FULLNAME" "$@"

That works perfectly on x86_64 linux, but the native x86_64 version of
gnomad2 gives randomly different X errors every time I start it. Sometimes
protocol errors about bad request size, sometimes it claims it lost the
connection to the X server - never the same error twice, but always some
error from the X libraries. Something usually flashes on the screen for
a fraction of a second as it tries to start, but then it disappears.

Comment 1 Linus Walleij 2006-08-23 07:41:17 UTC
Could you invoke gnomad2 in gdb so I can get an idea of where it
bombs? (Unfortunately I have no x86_64 box here.)

gdb gnomad2
(gdb) run

Comment 2 Tom Horsley 2006-08-23 14:25:00 UTC
I'll see if I can gather more details tonight (maybe install some
debuginfo rpms to get better info). One general bug I've found very
common when porting X programs to 64 bit involves the varargs functions
that expect their argument list to be terminated with a null pointer.
I've seen lots of source code where folks pass the literal 0 as the
last argument. That 0 is an int, so on x86_64 g++ it winds up passing
4 bytes of zeroes and 4 bytes of uninitialized random trash instead of
the 8 bytes of zeroes a null pointer needs (I consider this a gcc bug,
but naturally the gcc folks disagree :-).
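
The null-terminated varargs pattern described in this comment can be sketched as follows. This is only an illustration: count_strings is a hypothetical helper, not a function from gnomad2, GTK or Xlib, and nothing in this report confirms that a missing sentinel cast is the cause of the failure.

```c
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical helper (not from gnomad2 or any X library): counts the
 * strings in an argument list that is terminated by a null pointer. */
size_t count_strings(const char *first, ...)
{
    size_t n = 0;
    va_list ap;

    va_start(ap, first);
    /* Each va_arg() reads a full pointer-sized value, so the terminator
     * must have been passed as a pointer, not as the int literal 0. */
    for (const char *s = first; s != NULL; s = va_arg(ap, const char *))
        n++;
    va_end(ap);
    return n;
}
```

A portable call site spells the terminator as a pointer, for example count_strings("a", "b", (const char *)NULL). Passing a bare 0 there is undefined behaviour wherever int and pointers differ in size. GCC also offers a sentinel function attribute that can warn about calls missing a null terminator; whether it would have flagged anything in gnomad2 is not known from this report.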


Comment 3 Tom Horsley 2006-08-23 22:59:31 UTC
Created attachment 134759 [details]
some walkbacks from X error calls

Here are a couple of different walkbacks from error calls. I notice a very
frequent error is the "Xlib: unexpected async reply" message which the X
FAQ says is usually due to a multi-threaded program not using a
multi-threaded X library. Could the x86_64 libs be built with the wrong
flags (or the program linked with the wrong libs)?

Comment 4 Linus Walleij 2006-08-24 21:50:53 UTC
Yeah, hm. Upstream (and that's me!) needs x86_64 hardware to test and
debug this, unless someone else does it. Upstream doesn't have such stuff,
sadly. I will investigate this myself the day I have it; until then
there is not much to do. :-(

Comment 5 Tom Horsley 2006-08-24 22:15:21 UTC
I just tried an interesting experiment. On x86_64, this command functions
perfectly:

   taskset -c 0 /usr/bin/gnomad2

So it looks like you need not just 64-bit hardware but dual-processor
64-bit hardware to debug this :-). It is definitely some kind of threading
bug, but I wouldn't want to guess where it is or how to find it.

Anyway, I have two workarounds now: run the 32-bit version, or run bound
to a single CPU.


Comment 6 Hans de Goede 2006-11-13 13:16:52 UTC
Hmm,

Interesting. Linus, does gnomad2 use threads, and if so, could it be
making X calls from multiple threads?

Or does it only use gtk? In case it only uses gtk you should go and read:
http://www.gtk.org/faq/#AEN482

Apply the suggestions there and hopefully this will be fixed :)


Comment 7 Linus Walleij 2006-11-13 15:40:43 UTC
Yeah Hans, you're onto something. Gnomad2 does use GDK's proper
(semaphore-like) thread-locking calls, but that seems to be a dangerous
thing to do.

Further down in the GTK FAQ they say that it's better to let just one thread
make GTK calls, so you should typically leave that to the main loop.
The simple solution is to create drawing functions and pass these
to g_idle_add().

This is not so very simple, so the fix will take some time to get right....
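
The "only the main thread touches the GUI" idea from the FAQ can be sketched in plain C. This is a rough analogue under stated assumptions, not the GLib API and not gnomad2's actual fix: post_to_main, run_pending, bump_counter and the ui_task type are all hypothetical names, and the real g_idle_add() hands the callback to the GLib main loop instead of a hand-rolled queue.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical names throughout: a plain-C analogue of queueing work
 * for the main loop, not the GLib/GTK API. */
typedef void (*ui_task_fn)(void *data);

struct ui_task {
    ui_task_fn fn;
    void *data;
    struct ui_task *next;
};

static pthread_mutex_t queue_lock = PTHREAD_MUTEX_INITIALIZER;
static struct ui_task *queue_head = NULL;
static struct ui_task *queue_tail = NULL;

/* Safe to call from any thread: it only touches the queue, never the GUI.
 * Returns 0 on success, -1 if memory could not be allocated. */
int post_to_main(ui_task_fn fn, void *data)
{
    struct ui_task *t = malloc(sizeof *t);
    if (t == NULL)
        return -1;
    t->fn = fn;
    t->data = data;
    t->next = NULL;

    pthread_mutex_lock(&queue_lock);
    if (queue_tail != NULL)
        queue_tail->next = t;
    else
        queue_head = t;
    queue_tail = t;
    pthread_mutex_unlock(&queue_lock);
    return 0;
}

/* Call from the main (GUI) thread only. Detaches the pending list under
 * the lock, then runs the tasks in posting order with the lock released.
 * Returns the number of tasks that were run. */
size_t run_pending(void)
{
    pthread_mutex_lock(&queue_lock);
    struct ui_task *t = queue_head;
    queue_head = queue_tail = NULL;
    pthread_mutex_unlock(&queue_lock);

    size_t n = 0;
    while (t != NULL) {
        struct ui_task *next = t->next;
        t->fn(t->data);
        free(t);
        t = next;
        n++;
    }
    return n;
}

/* Example task standing in for a drawing function: bumps a counter. */
void bump_counter(void *data)
{
    int *counter = data;
    (*counter)++;
}
```

Worker threads would call post_to_main() where they previously made GTK calls directly, and the main loop would call run_pending() on each iteration. With GLib itself, g_idle_add() takes the place of both pieces, and its callback returns a gboolean that tells the main loop whether to call it again.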

Comment 8 Linus Walleij 2007-08-17 22:24:38 UTC
Much has changed in this area in later gnomad2 releases. Can you
see if this bug still appears in F7?

Comment 9 Tom Horsley 2007-08-18 02:27:18 UTC
I have used gnomad2 recently, and I just verified that I'm not running
the silly wrapper I used to need to run the 32-bit version off my
32-bit boot partition, so I'd say this bug has indeed disappeared somewhere
between the report and the x86_64 F7 I'm running now.

Comment 10 Linus Walleij 2007-08-18 14:40:36 UTC
OK let's say it's working now...