Bug 156647 - (gcc -O2) elinks segfault on ppc & ia64
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: elinks
Version: rawhide
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Karel Zak
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: FC4Blocker
Reported: 2005-05-02 20:58 UTC by Jeremy Katz
Modified: 2007-11-30 22:11 UTC (History)

Fixed In Version: 0.10.3-2
Clone Of:
Environment:
Last Closed: 2005-05-10 12:37:08 UTC
Type: ---
Embargoed:


Attachments
Add debugging printouts to find_in_cache() (1.16 KB, patch)
2005-05-07 22:11 UTC, Miloslav Trmač
Log from -O0 (2.88 KB, text/plain)
2005-05-07 22:12 UTC, Miloslav Trmač
Log from -O2 (2.72 KB, text/plain)
2005-05-07 22:13 UTC, Miloslav Trmač

Description Jeremy Katz 2005-05-02 20:58:52 UTC
elinks seems to segfault on ppc when going to http://gate.crashing.org/~benh/xorg

#0  0x10058934 in doc_loading_callback ()
#1  0x100507c0 in connect_info ()
#2  0x100507c0 in connect_info ()
(gdb)

Comment 1 Karel Zak 2005-05-05 19:01:05 UTC
... and on ia64 too.

Comment 2 Miloslav Trmač 2005-05-07 22:11:00 UTC
Created attachment 114132
Add debugging printouts to find_in_cache()

The failure on ppc goes away after replacing -O2 with -O0 in
CFLAGS="-O2 -g -W -Wall $(getconf LFS_CFLAGS)"
(i.e. s/-O2/-O0/).

Maybe the attached data will help somebody figure it out;
I don't know the code at all. Applying the attached patch
shows that find_in_cache() starts returning NULL with -O2.

Comment 3 Miloslav Trmač 2005-05-07 22:12:03 UTC
Created attachment 114133
Log from -O0

Comment 4 Miloslav Trmač 2005-05-07 22:13:46 UTC
Created attachment 114134
Log from -O2

Both logs are from running 'elinks http://gate.crashing.org/~benh/xorg 2>$log'.

I wasn't able to test elinks on ia64.

Comment 5 Karel Zak 2005-05-08 00:10:09 UTC
Note, I think it's possible to test it on an arbitrary HTML page. I had the problem with
elinks from the current FC4 and with upstream version 0.10.5 on pages like
<html><body>foo</body></html>.

Comment 6 Warren Togami 2005-05-08 04:15:57 UTC
If this happens with -O2 and not -O0, shouldn't this be assigned to gcc?


Comment 7 Jakub Jelinek 2005-05-09 09:56:02 UTC
Generally, if something works with -O0 and does not with -O2, it is more often
an application bug than a GCC bug.  Only when you debug it and prove it is indeed
a GCC bug should it be reassigned to GCC.
In this particular case, the bug goes away with -O2 -fno-strict-aliasing,
and there are 94 places where GCC warns about aliasing problems:
grep warning.*type-punned elinks.log | sort -u | wc -l
94
Plus there are several places where the code violates the aliasing rules but GCC
does not warn.
For example, in find_in_cache, all the lists.h macros used there are buggy.
And error.h even shows that the authors see the problems but, for some unknown
reason, can't admit it is their bug and not a compiler bug:
/* This function does nothing, except making compiler not to optimize certains
 * spots of code --- this is useful when that particular optimization is buggy.
 * So we are just workarounding buggy compilers. */
/* This function should be always used only in context of compiler version
 * specific macros. */
void do_not_optimize_here(void *x);

#if defined(__GNUC__) && __GNUC__ == 2 && __GNUC_MINOR__ <= 7
#define do_not_optimize_here_gcc_2_7(x) do_not_optimize_here(x)
#else
#define do_not_optimize_here_gcc_2_7(x)
#endif

#if defined(__GNUC__) && __GNUC__ == 3
#define do_not_optimize_here_gcc_3_x(x) do_not_optimize_here(x)
#else
#define do_not_optimize_here_gcc_3_x(x)
#endif

#if defined(__GNUC__) && __GNUC__ == 3 && __GNUC_MINOR__ == 3
#define do_not_optimize_here_gcc_3_3(x) do_not_optimize_here(x)
#else
#define do_not_optimize_here_gcc_3_3(x)
#endif

The lists implementation is broken by design; it just can't work that way.
You can't access the same object through aliasing-incompatible types.
But lists.h does that a lot: it sometimes accesses next/prev as void *,
sometimes as struct cache_entry *, etc.
The cleanest fix IMHO would be to use a void *next; void *prev; structure and
put that structure as the first field of the various structures that are chained
into lists, say:
struct cache_entry
{
  struct list_head_elinks head;
  ...
};
and then have the macros use cached->head.prev, etc.  What will also work
is to just make the prev/next pointers void *, but directly in the structure, say
struct cache_entry
{
  void *next; void *prev;
  ...
};
and have
struct list_head_elinks
{
  void *next; void *prev;
};

But writing/reading through a void ** pointer and then writing/reading through
a struct cache_entry ** pointer is a violation of ISO C99 6.5 (paragraphs 6 and 7).
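
A minimal sketch of the first layout suggested above, to make the idea concrete.
It is not taken from the elinks sources: the id field and the helper names
list_init, list_add, find_entry and demo are hypothetical, and the real
lists.h macros would need their own rework. The point is only that next/prev
are always read and written as void * members of struct list_head_elinks, and
that the entry is recovered by converting a pointer to the first member back to
a pointer to the enclosing structure, which C99 6.7.2.1 permits.

```c
#include <assert.h>
#include <stddef.h>

/* Link fields are always void *, so every access uses the same lvalue type. */
struct list_head_elinks {
	void *next;
	void *prev;
};

/* Hypothetical stand-in for the real struct cache_entry. */
struct cache_entry {
	struct list_head_elinks head;	/* must stay the first member */
	int id;				/* assumed payload, for illustration only */
};

/* Make an empty circular list: the head points at itself. */
static void list_init(struct list_head_elinks *list)
{
	list->next = list;
	list->prev = list;
}

/* Insert item directly after the list head. */
static void list_add(struct list_head_elinks *list, struct list_head_elinks *item)
{
	struct list_head_elinks *first = list->next;

	item->next = first;
	item->prev = list;
	first->prev = item;
	list->next = item;
}

/* Walk the list and return the entry with the given id, or NULL. */
static struct cache_entry *find_entry(struct list_head_elinks *list, int id)
{
	struct list_head_elinks *pos;

	for (pos = list->next; pos != list; pos = pos->next) {
		/* A pointer to the first member converts back to the
		 * enclosing structure; the list head itself is never
		 * converted because the loop stops when it is reached. */
		struct cache_entry *ce = (struct cache_entry *) pos;

		if (ce->id == id)
			return ce;
	}
	return NULL;
}

/* Small self-check: build a two-entry list and look one entry up. */
static int demo(void)
{
	struct list_head_elinks cache;
	struct cache_entry a = { { NULL, NULL }, 1 };
	struct cache_entry b = { { NULL, NULL }, 2 };
	struct cache_entry *hit;

	list_init(&cache);
	list_add(&cache, &a.head);
	list_add(&cache, &b.head);

	hit = find_entry(&cache, 1);
	assert(hit == &a);
	assert(find_entry(&cache, 3) == NULL);
	return hit->id;
}
```

Because no object is ever accessed through both a void ** and a
struct cache_entry ** lvalue, a layout like this should not need
-fno-strict-aliasing to behave the same at -O0 and -O2.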

Comment 8 Miloslav Trmač 2005-05-10 12:37:08 UTC
Jakub, thanks again.

