Bug 182742 - ExcludeArch: ppc from tla package
ExcludeArch: ppc from tla package
Status: CLOSED NEXTRELEASE
Product: Fedora
Classification: Fedora
Component: tla (Show other bugs)
rawhide
powerpc Linux
medium Severity medium
: ---
: ---
Assigned To: Josh Boyer
Fedora Extras Quality Assurance
:
Depends On:
Blocks: F-ExcludeArch-ppc
  Show dependency treegraph
 
Reported: 2006-02-24 08:39 EST by Josh Boyer
Modified: 2007-11-30 17:11 EST (History)
6 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2006-03-17 21:58:16 EST
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
Patch to ignore long double for alignment purposes (746 bytes, patch)
2006-02-26 18:22 EST, Josh Boyer
no flags Details | Diff

  None (edit)
Description Josh Boyer 2006-02-24 08:39:53 EST
Description of problem:

tla currently fails building on PPC due to what appears to be the recent ABI
change in rawhide.

Version-Release number of selected component (if applicable):

tla-1.3.3-5

How reproducible:

Always

Steps to Reproduce:
1. checkout tla package from Extras CVS
2. rebuild it on PPC
3.
  
Actual results:

================ unit-ar tests ================
/home/jwboyer/rpm/BUILD/tla-1.3.3/src/hackerlab/tests/arrays-tests/unit-ar.c:53:botched
invariant
    0 == ((unsigned long)ar & (16 - 1))
PANIC: exiting on botched invariant
make[3]: *** [tests-timestamp] Error 2
make[3]: Leaving directory
`/home/jwboyer/rpm/BUILD/tla-1.3.3/src/=build/hackerlab/tests/arrays-tests'
make[2]: *** [test] Error 2
make[2]: Leaving directory
`/home/jwboyer/rpm/BUILD/tla-1.3.3/src/=build/hackerlab/tests'
make[1]: *** [test] Error 2
make[1]: Leaving directory `/home/jwboyer/rpm/BUILD/tla-1.3.3/src/=build/hackerlab'
make: *** [test] Error 2
make: Leaving directory `/home/jwboyer/rpm/BUILD/tla-1.3.3/src/=build'
error: Bad exit status from /var/tmp/rpm-tmp.74562 (%check)


RPM build errors:
    Bad exit status from /var/tmp/rpm-tmp.74562 (%check)


Expected results:

It builds correctly.

Additional info:

This appears to be some kind of issue between the computed MACHINE_ALIGNMENT
macro and the actual values returned from the testcase.  I've pinged some people
on the fedora-ppc list about this and will look into it further when I have time.
Comment 1 Ville Skyttä 2006-02-24 11:42:49 EST
I have a local update to 1.3.4 available, verified to build on Rawhide x86_64
and FC4 ppc.  I suppose you don't mind if I go ahead and commit it?  I don't
have access to a Rawhide ppc box though.  Do you?

(By the way, for some reason I wasn't auto-added to Cc on this bug, so I'd
appreciate if you could keep an eye on that and add manually if needed in the
future.)
Comment 2 Josh Boyer 2006-02-24 11:56:51 EST
(In reply to comment #1)
> I have a local update to 1.3.4 available, verified to build on Rawhide x86_64
> and FC4 ppc.  I suppose you don't mind if I go ahead and commit it?  I don't
> have access to a Rawhide ppc box though.  Do you?

No I don't mind at all.  That was going to be my next step anyway :).  I do have
access to a rawhide PPC box so I can do a test build there.  I don't think an
upgrade with fix this, but I might be pleasantly surprised.

> (By the way, for some reason I wasn't auto-added to Cc on this bug, so I'd
> appreciate if you could keep an eye on that and add manually if needed in the
> future.)

I just noticed that.  I'll ping Elliot as to why owners.list didn't auto-add you
and remember to manually add you until it gets fixed.
Comment 3 Ville Skyttä 2006-02-24 12:17:34 EST
Ok, committed but not tagged yet nor any builds requested.  I don't have
anything to add to the package at the moment, so feel free...
Comment 4 Josh Boyer 2006-02-24 14:16:27 EST
(In reply to comment #3)
> Ok, committed but not tagged yet nor any builds requested.  I don't have
> anything to add to the package at the moment, so feel free...

Same fail:

================ unit-ar tests ================
/home/jwboyer/rpm/BUILD/tla-1.3.4/src/hackerlab/tests/arrays-tests/unit-ar.c:53:botched
invariant
    0 == ((unsigned long)ar & (16 - 1))
PANIC: exiting on botched invariant
make[3]: *** [tests-timestamp] Error 2
make[3]: Leaving directory
`/home/jwboyer/rpm/BUILD/tla-1.3.4/src/=build/hackerlab/tests/arrays-tests'
make[2]: *** [test] Error 2
make[2]: Leaving directory
`/home/jwboyer/rpm/BUILD/tla-1.3.4/src/=build/hackerlab/tests'
make[1]: *** [test] Error 2
make[1]: Leaving directory `/home/jwboyer/rpm/BUILD/tla-1.3.4/src/=build/hackerlab'
make: *** [test] Error 2
make: Leaving directory `/home/jwboyer/rpm/BUILD/tla-1.3.4/src/=build'
error: Bad exit status from /var/tmp/rpm-tmp.95932 (%check)


RPM build errors:
    Bad exit status from /var/tmp/rpm-tmp.95932 (%check)


Crap.  Have to go digging to figure this out.
Comment 5 Josh Boyer 2006-02-24 16:50:57 EST
Ok, did some more digging.

The testcase calls ar_copy, which will copy an array to another array.  ar_copy
looks like this (after adding some printfs of my own):

void *
ar_copy (void * base,
       alloc_limits limits,
       size_t szof)
{
  void * answer;

  answer = 0;
  printf("ar_size: %ld\n", ar_size(base, limits, szof));
  printf("answer: %ld\n", (unsigned long) answer);
  ar_setsize (&answer, limits, ar_size (base, limits, szof), szof);
  printf("answer: %ld\n", (unsigned long) answer);
  mem_move (answer, base, szof * ar_size (base, limits, szof));
  return answer;
}

ar_setsize calls ar_ref, which in turn calls lim_malloc.  lim_malloc calls
must_malloc, which in turn just calls malloc.  So in the output below:

================ unit-ar tests ================
sizes: 360, arrays: 268689456
ar_size: 360
answer: 0
answer: 268690920
sizes: 360, arrays: 268690920
/home/jwboyer/rpm/BUILD/tla-1.3.4/src/hackerlab/tests/arrays-tests/unit-ar.c:53:botched
invariant
    0 == ((unsigned long)ar & (16 - 1))
PANIC: exiting on botched invariant

we see that answer was NULL before calling ar_setsize.  After it goes through
ar_setsize -> ar_ref -> lim_malloc -> malloc, the address of 268690920 (or
0x1003E5E8) is returned.  This isn't 16 byte aligned.

So it looks like malloc really is returning a non-16 byte aligned address...
Comment 6 David Woodhouse 2006-02-26 12:34:04 EST
That would probably be a glibc bug then. Reassigning.
Comment 7 Josh Boyer 2006-02-26 12:47:22 EST
As I said in an email to the fedora-ppc list, I can recreate this will a simple
program too.  Here it is below:

[jwboyer@net2-102 ~]$ gcc -o malloc-check malloc-check.c
[jwboyer@net2-102 ~]$ ./malloc-check
foo is not 16-byte aligned: 10011008
[jwboyer@net2-102 ~]$

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
        void *foo = NULL;
        void *bar = NULL;
        int i;

        for (i = 0; i < 1000; i++) {
                foo = malloc(360);
                bar = malloc(360);

                if ((unsigned long) foo & 0xf) {
                        printf("foo is not 16-byte aligned: %lx\n",
                                (unsigned long) foo);
                        return -1;
                }
                if ((unsigned long) bar & 0xf) {
                        printf("bar is not 16-byte aligned: %lx\n",
                                (unsigned long) bar);
                        return -1;
                }

                if (i % 150) {
                        free(foo);
                        free(bar);
                        foo = NULL;
                        bar = NULL;
                }
        }

        return 0;
}
Comment 8 Jakub Jelinek 2006-02-26 14:07:51 EST
If this is ppc 32-bit, then relying on malloc returning 16 byte aligned
addresses is wrong.  malloc only guarantees 2 * sizeof (void *) alignment,
on ppc32 that's 8 byte.  If you need more, you should use
posix_memalign or memalign.
Comment 9 Josh Boyer 2006-02-26 17:48:14 EST
(In reply to comment #8)
> If this is ppc 32-bit, then relying on malloc returning 16 byte aligned
> addresses is wrong.  malloc only guarantees 2 * sizeof (void *) alignment,
> on ppc32 that's 8 byte.  If you need more, you should use
> posix_memalign or memalign.

Hm..  that's good and bad news at the same time.  Good because glibc is ok, bad
because now I have to go hack tla to generate the correct MACHINE_ALIGNMENT
macro on PPC32.

Currently, it "discovers" machine alignment by walking through the standard
types and getting the sizeof().  sizeof(long double) returns 16, which is where
it's getting it's value from.  I suppose ignoring long double would work.  I'll
hack up a patch shortly.
Comment 10 Josh Boyer 2006-02-26 18:22:43 EST
Created attachment 125287 [details]
Patch to ignore long double for alignment purposes

This patch allows tla to build successfully on ppc again.  It's not the
prettiest thing in that it completely ignores long double in the gen-alignment
program, but I personally consider that programs behavior to be semi-broken to
begin with.  I can't think of a better way to work around the problem at the
moment either.

Ville, if you have no objections I'll commit this to CVS and kick off a
rebuild.
Comment 11 Ville Skyttä 2006-02-27 01:50:00 EST
Sure, go ahead.  Out of curiosity, is __PPC__ defined on ppc64, and does this
patch do the right thing on it too?
Comment 12 Josh Boyer 2006-02-27 07:14:15 EST
(In reply to comment #11)
> Sure, go ahead.  Out of curiosity, is __PPC__ defined on ppc64, and does this
> patch do the right thing on it too?

Yes, __PPC__ is defined on ppc64.  I can't find a predefined macro that is only
defined on ppc32 that we can use.  So it'd have to be something like:

#ifndef __PPC__ and __PPC64__
...


However, the package doesn't build on PPC64 for other reasons as well.  And the
FE buildsys always uses --target ppc to rpmbuild.  I plan on opening a bug
against upstream tla for this issue, so I think this patch will get us by until
it's fixed upstream.
Comment 13 Josh Boyer 2006-02-27 09:08:37 EST
Upstream bug:

https://savannah.gnu.org/bugs/?func=detailitem&item_id=15915
Comment 14 Josh Boyer 2006-02-27 10:05:00 EST
tla built successfully on rawhide.
Comment 15 Thomas Lord 2006-02-27 13:13:50 EST
Wow.  I always knew that `gen-alignment' would eventually
come back and bite me somehow -- I just never knew exactly
how.  Now I do.

If I'm following the reports correctly, the real bug 
is in the invariant check.   The invariant check makes
a false assumption that the machine's strictest alignment
requirements are determined by the size of the largest
scalar type.

The macro MACHINE_ALIGNMENT is poorly named.  It should 
really be something like:
CONSERVATIVE_ESTIMATE_OF_MACHINE_ALIGNMENT

The primary use of MACHINE_ALIGNMENT is to safely wrap
`malloc' in an allocator that adds a header to allocated
memory.   `ar.c' is an example of such an allocator.
The user asks for N bytes.  We want a K byte header.
So we allocate:

   N + ALIGNMENT * ((K + ALIGNMENT - 1) / ALIGNMENT)

bytes, put the header at byte 0, and start the user data
at byte:

   ALIGNMENT * ((K + ALIGNMENT - 1) / ALIGNMENT)

The invariant that we are truly trying to maintain is that,
so long as `malloc' returns memory suitably aligned for anything,
so does our wrapper (such as the `ar' functions).

The actual invariant test in the code tests for a stricter 
condition that happens not to be true in this case.   It's
the invariant, not `gen-alignment', that is bogus.

-t
Comment 16 Thomas Lord 2006-02-27 13:16:56 EST
One more thing: the "fix" of ignoring sizeof (long double) 
introduces a bug and should be removed (if it's there).

-t
Comment 17 Josh Boyer 2006-02-27 14:02:29 EST
(In reply to comment #16)
> One more thing: the "fix" of ignoring sizeof (long double) 
> introduces a bug and should be removed (if it's there).

Which bug is that?  Perhaps I've misunderstood all the cases MACHINE_ALIGNMENT
is used for, but in my testing the patch at least allowed tla to build on Fedora
Extras.

I'll gladly revert the patch and add a patch for the real fix, as soon as I know
what that is.
Comment 18 Thomas Lord 2006-02-27 16:16:48 EST
> Which bug is that?

Allocation functions should return memory suitably aligned for
any type.   If gen-alignment ignores some scalar types, then some
hackerlab functions can *theoretically* return memory that is not
suitably aligned for those types.

A better work-around and arguably a real fix might be to simply
delete the call to `invariant' that caused the build to fail.

-t
Comment 19 Thomas Lord 2006-02-27 16:20:38 EST
At any rate, I don't mean to be harsh, but "at least allowed tla
to build" is kind of a random approach.   We all do that kind of
thing from time to time to solve some immediate nebishy little
problem but it isn't an appropriate approach to take for something
that is going to be distributed to others who are counting on
some level of quality control.

-t
Comment 20 Josh Boyer 2006-02-27 17:19:30 EST
(In reply to comment #18)
> > Which bug is that?
> 
> Allocation functions should return memory suitably aligned for
> any type.   If gen-alignment ignores some scalar types, then some
> hackerlab functions can *theoretically* return memory that is not
> suitably aligned for those types.

Yes, sure.  However, in this specific case the hackerlab functions are relying
on malloc to return addresses for such alignment and that isn't valid.

> A better work-around and arguably a real fix might be to simply
> delete the call to `invariant' that caused the build to fail.

I can't understand how this is any better.  The invariant was there to check for
the exact condition you described above, right?  So if we delete the invariant,
it's still not going to check for alignment on long double types.

If hackerlab functions want to guarantee alignment for all types, they need to
use memalign as Jakub suggested above.  That seems to be more of a real fix than
just deleting the invariant.

I'm not trying to argue here, I'm just trying to understand where you're coming
from given that I'm pretty new to the tla code.

Comment 21 Josh Boyer 2006-02-27 17:27:57 EST
(In reply to comment #19)
> At any rate, I don't mean to be harsh, but "at least allowed tla
> to build" is kind of a random approach.   We all do that kind of
> thing from time to time to solve some immediate nebishy little
> problem but it isn't an appropriate approach to take for something
> that is going to be distributed to others who are counting on
> some level of quality control.

That's all very fair and I appreciate the critcism.  However, in this specific
case it wasn't random.  I looked into the problem and based on what I saw,
disabling long double wouldn't cause any real-world problems.  It's a
workaround, not a fix and I understand that completely.

Now, it's always a possibility that I've misunderstood something in the code
here.  I've opened the upstream bug to make sure that this problem is tracked
and resolved in the correct manner.  I could have simply "called it good" and
let it go at that, but I didn't.  And we still have time before FE5 is released
so there is time to fix this properly.

I'm reopening the bug so people don't get the wrong impression that this is
really fixed.

Comment 22 Thomas Lord 2006-02-27 17:43:09 EST
> Yes, sure.  However, in this specific case the hackerlab functions are relying
> on malloc to return addresses [aligned for any type] and that isn't valid.

Is that really true?  This is a genuine question -- I'm unclear what 
various standards require on this point.

I'm currently content to have libhackerlab assume that `malloc' returns
an address suitably aligned for any type because, regardless of what
the standards say, to implement `malloc' any other way is dumb.

>> A better work-around and arguably a real fix might be to simply
>> delete the call to `invariant' that caused the build to fail.

> I can't understand how this is any better.  The invariant was there to check 
> for the exact condition you described above, right?  So if we delete the 
> invariant, it's still not going to check for alignment on long double types.

So, long double is 16 bytes on PPC-32 and glibc malloc returns memory aligned on
8 byte boundaries.   I'm *assuming* that PPC-32 can read or write long doubles
on any 8 byte boundary even though a long double is 16 bytes.

If glibc malloc is sometimes not returning memory suitably aligned for
long double values then, I don't care what the standards say -- that is a 
serious glibc bug.

Meanwhile, the libhackerlab invariant is testing for a condition that is
stronger than needed and that can not hold true on a reasonable system.
The whole MACHINE_ALIGNMENT hack is meant only to preserve the alignment
guarantee of (a sane) `malloc' in situations like `ar.c'.   It doesn't
have to literally figure out the machine's alignment requirements.  The 
invariant was probably handy when I wrote the code on an x86 box and 
ported it to solaris, but it looks from the discussion in this bug list that,
in general, it's a bogus invariant.

> If hackerlab functions want to guarantee alignment for all types, they need to
> use memalign as Jakub suggested above.  That seems to be more of a real fix 
> than just deleting the invariant.

AFAIK, memalign is for handling alignment requirements that go *beyond* the
machine architecture requirements.   For example, an I/O device or the OS might 
require that a chunk of memory be aligned on a *page* boundary, even though
no machine instruction has any such requirement.   That's what memalign is
for.

Meanwhile:

    my_type * x = malloc (sizeof (my_type));

should always yield NULL or a valid pointer for a `my_type' lvalue.  No?

> I'm not trying to argue here, I'm just trying to understand where you're 
> coming from given that I'm pretty new to the tla code.

And I'm trying not to be a jackass, to be helpful, and to explain what
is admittedly the pretty subtle reasoning behind MACHINE_ALIGNMENT.  
I'm open to the possibility I'm full of it on these points but I don't
think I am.   We're cool.

-t
Comment 23 Josh Boyer 2006-02-27 18:04:23 EST
(In reply to comment #22)
> >> A better work-around and arguably a real fix might be to simply
> >> delete the call to `invariant' that caused the build to fail.
> 
> > I can't understand how this is any better.  The invariant was there to check 
> > for the exact condition you described above, right?  So if we delete the 
> > invariant, it's still not going to check for alignment on long double types.
> 
> So, long double is 16 bytes on PPC-32 and glibc malloc returns memory aligned on
> 8 byte boundaries.   I'm *assuming* that PPC-32 can read or write long doubles
> on any 8 byte boundary even though a long double is 16 bytes.
> 
> If glibc malloc is sometimes not returning memory suitably aligned for
> long double values then, I don't care what the standards say -- that is a 
> serious glibc bug.

I've added Jakub on CC again.  Jakub, any change you could expand upon this
situation a bit?

> Meanwhile, the libhackerlab invariant is testing for a condition that is
> stronger than needed and that can not hold true on a reasonable system.
> The whole MACHINE_ALIGNMENT hack is meant only to preserve the alignment
> guarantee of (a sane) `malloc' in situations like `ar.c'.   It doesn't
> have to literally figure out the machine's alignment requirements.  The 
> invariant was probably handy when I wrote the code on an x86 box and 
> ported it to solaris, but it looks from the discussion in this bug list that,
> in general, it's a bogus invariant.
> 
> > If hackerlab functions want to guarantee alignment for all types, they need to
> > use memalign as Jakub suggested above.  That seems to be more of a real fix 
> > than just deleting the invariant.
> 
> AFAIK, memalign is for handling alignment requirements that go *beyond* the
> machine architecture requirements.   For example, an I/O device or the OS might 
> require that a chunk of memory be aligned on a *page* boundary, even though
> no machine instruction has any such requirement.   That's what memalign is
> for.
> 
> Meanwhile:
> 
>     my_type * x = malloc (sizeof (my_type));
> 
> should always yield NULL or a valid pointer for a `my_type' lvalue.  No?

I think the pointer should be valid, just not always aligned to sizeof(my_tpye).
 If that is the case, then I think I agree that the invariant can probably go.
Comment 24 Josh Boyer 2006-03-03 06:38:02 EST
I noticed this in today's rawhide report:

glibc-2.3.91-1
--------------
* Thu Mar 02 2006 Jakub Jelinek <jakub@redhat.com> 2.3.91-1
- update from CVS
  - fixes for various arches
- ensure malloc returns pointers aligned to at least
  MIN (2 * sizeof (size_t), __alignof__ (long double))
  (only on ppc32 this has not been the case lately with addition
   of 128-bit long double, #182742)

Jakub, can I assume that ppc32 malloc addresses not being aligned to sizeof(long
double) really was a bug then?
Comment 25 Josh Boyer 2006-03-03 08:14:22 EST
I just rebuilt tla without the hack patch and it succeeded.  In talking with
David this morning, it was indeed a glibc bug.

Patch reverted in CVS and a build has been queued up.  Once that completes, I'll
close both this and the upstream bug.
Comment 26 Jakub Jelinek 2006-03-03 08:22:04 EST
Yes, unfortunately this bugreport initially didn't mention the magic keyword
long double anywhere :(.
ISO C99 requires that malloc returns pointers suitably aligned for all ISO C99
standard types and long double is one of them.
BTW, sizeof (long double) is completely wrong thing to check when you check
alignment, you need to check __alignof__ (long double).  On many architectures,
some types have different alignment than their size, e.g. on sparc32 or s390
long double is 8 byte aligned, but 16 byte in size.
Comment 27 Jakub Jelinek 2006-03-06 02:53:09 EST
Can you please re-add the workaround, we decided to revert the glibc change
for FC5, it is too dangerous this late in the devel cycle.
While we fixed #183895, we have yet to discuss how to fix #183894, which
will be much harder.
Comment 28 Josh Boyer 2006-03-06 08:43:53 EST
(In reply to comment #27)
> Can you please re-add the workaround, we decided to revert the glibc change
> for FC5, it is too dangerous this late in the devel cycle.

Damn.

> While we fixed #183895, we have yet to discuss how to fix #183894, which
> will be much harder.

Yes, I think we can add it.  Tom, do you agree that removing the invariant for
this particular failure would be the workaround for now?
Comment 29 David Woodhouse 2006-03-06 08:51:14 EST
I'm slightly dubious about the plan to ship glibc with this known to be
misbehaving -- it would be nicer to make sure _new_ allocations are all aligned
to 16 bytes, while still allowing free() of objects which are less aligned than
that.

But since it can be fixed in an update to FC5 glibc, since 'long double' does
actually _work_ with 8-byte alignment anyway (despite the ABI requiring 16
bytes), and since 'long double' is fairly esoteric, I suppose it isn't the end
of the world.
Comment 30 Josh Boyer 2006-03-17 21:58:16 EST
Ok, I've removed the invariant check in Extras.  When glibc is fixed, I'll
revert this patch.  In the meantime, tla should build and function properly.
Comment 31 Thomas Lord 2006-03-17 22:04:50 EST
Not to be overly pedantic but I think it is more correct to *not* revert the
patch even after glibc is fixed.   The invariant should be fixed in the upstream
Arch, too.

The MACHINE_ALIGNMENT macro (poorly named) is a conservative but reasonable guess
about alignment requirements.   The invariant wrongly insists that the native 
`malloc' use the same value.   

Make sense?

-t
Comment 32 Josh Boyer 2006-03-17 22:12:26 EST
Sure, that makes sense.  And I like it when people are pedantic, it makes me
understand things better.

I'll rephrase:  I'll be sure to drop the patch when we pull in the next upstream
update that contains the same or a similar fix.

;)
Comment 33 Thomas Lord 2006-03-17 22:58:02 EST
It may be worth reviewing the code too (and I've noted this thread for upstream)
to double check that the bogus alignment assumption about malloc isn't presumed
-- especially in the `ar_' module.   I don't personally have time to do so, at
the moment.

-t
Comment 34 Andy Tai 2006-03-21 07:51:18 EST
As the upstream for now, I have noted of this problem and will review the ar_
code  and try to come up with a correct fix.
Comment 35 Andy Tai 2006-03-22 04:41:55 EST
I have a fix in the latest tla snapshot.  I would like to test the fix in a
PPC32 environment.  Any help for this is appreciated.

Note You need to log in before you can comment on or make changes to this bug.