Bug 667852 - Build from source RPM package fails
Summary: Build from source RPM package fails
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: dietlibc
Version: 13
Hardware: arm9
OS: Unspecified
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Enrico Scholz
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-01-07 02:14 UTC by Gordan Bobic
Modified: 2011-06-27 12:41 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-06-27 12:41:11 UTC
Type: ---


Attachments
Patch to remove hard-coding optimizer flags to -O2/-Os in multiple places (4.59 KB, patch)
2011-01-07 02:14 UTC, Gordan Bobic
Patch to remove hard-coded -Os -g3 CFLAGS flags in the spec file (1.69 KB, patch)
2011-01-07 02:18 UTC, Gordan Bobic
Rawhide build log with the two git patches applied. (286.99 KB, text/plain)
2011-01-07 20:46 UTC, Gordan Bobic
segfaulting bin-arm/diet (180.72 KB, application/octet-stream)
2011-01-07 20:48 UTC, Gordan Bobic
Build log with the optimizer flag patches applied (367.33 KB, text/plain)
2011-01-07 23:53 UTC, Gordan Bobic
segfaulting atexit (5.40 KB, application/octet-stream)
2011-01-07 23:54 UTC, Gordan Bobic
segfaulting tst-printf (47.19 KB, application/octet-stream)
2011-01-07 23:56 UTC, Gordan Bobic
build log with patch in comment 9 applied (375.37 KB, text/plain)
2011-01-08 18:46 UTC, Gordan Bobic
Segfaulting mmap_test (376.46 KB, application/octet-stream)
2011-01-08 18:47 UTC, Gordan Bobic
Build log with the 3 git patches and the 3 optimizer flag patches applied (367.12 KB, text/plain)
2011-01-08 19:53 UTC, Gordan Bobic
segfaulting tst_strtod (89.60 KB, application/octet-stream)
2011-01-08 19:55 UTC, Gordan Bobic
Build log with only the git patches so far (no -O fixup) (375.55 KB, text/plain)
2011-01-09 00:52 UTC, Gordan Bobic
segfaulting tst_strtod (440.66 KB, application/octet-stream)
2011-01-09 00:53 UTC, Gordan Bobic
backtrace of alignment-faulting tst-strtod (2.84 KB, application/octet-stream)
2011-02-20 03:06 UTC, Gordan Bobic

Description Gordan Bobic 2011-01-07 02:14:18 UTC
Created attachment 472171 [details]
Patch to remove hard-coding optimizer flags to -O2/-Os in multiple places

Description of problem:
Build from the source package fails with a segfault on the ARM platform (armv5tel). There are several problems, some of which I am attaching patches for.

1) CFLAG passing is a complete mess. CFLAGS are being passed from:
a) rpmrc (good)
b) spec (bad)
c) the following files in the source distribution (very bad):
- dietlibc-0.32/findcflags.sh
- dietlibc-0.32/contrib/Makefile.dyn
- dietlibc-0.32/libpthread/Makefile

Version-Release number of selected component (if applicable):
Tested on F12 and F13 ARM rootfs distros (gcc 4.4.2 and 4.4.4 respectively) with dietlibc packages from F12, F13 and F14; all behave in exactly the same way.

dietlibc-0.32-0.fc12.src.rpm (F12/F13)
dietlibc-0.32-1400.fc14.src.rpm (F14)

Patches provided are against the F14 package (dietlibc-0.32-1400.fc14.src.rpm) because it is the most recent one.

This matters because building the "diet" binary fails when -Ox, x={s,2,3}, is used. The source distribution provides some mechanisms for handling cases like this in the Makefile, but they are largely clobbered by the extra -O flags being passed in.

How reproducible:
Every time.

Steps to Reproduce:
rpmbuild --rebuild dietlibc-0.32-1400.fc14.src.rpm
  
Actual results:
Build fails because the binary "bin-arm/diet" built during the build segfaults. Here are the last few lines:

----snip----
bin-arm/diet gcc -D__dietlibc__ -O2 -g -march=armv5te -fomit-frame-pointer -fno-exceptions -fno-asynchronous-unwind-tables -fno-stack-protector -Os -g3 -Werror-implicit-function-declaration  -o bin-arm/elftrunc contrib/elftrunc.c
make: *** [bin-arm/elftrunc] Segmentation fault
error: Bad exit status from /var/tmp/rpm-tmp.Y1qRwn (%build)
----snap----

Even when the attached patches are applied, the regular build still fails with segfaults during the %check stage. I looked into the first failing test briefly, and gdb reveals the following:

----snip----
# gdb atexit
(gdb) run
Starting program: /usr/src/redhat/BUILD/dietlibc-0.32/test/atexit

Program received signal SIGSEGV, Segmentation fault.
0x00008230 in ?? ()
(gdb) backtrace
Cannot access memory at address 0x0
#0  0x00008230 in ?? ()
#1  0x000081dc in __libc_exit (code=0) at lib/atexit.c:25
#2  0x00008104 in _start () at arm/start.S:34
----snap----

So the problem seems to be in arm/start.S on line 34.

----snip----
_start:

        mov     fp, #0                  @ clear the frame pointer
        ldr     a1, [sp], #4            @ argc
        mov     a2, sp                  @ argv
        ldr     ip, .L3
        add     a3, a2, a1, lsl #2      @ &argv[argc]
        add     a3, a3, #4              @ envp
        str     a3, [ip, #0]            @ environ = envp
        bl      main

@
@ The exit status from main() is already in r0.
@ We need to branch to 'exit' in case we have linked with 'atexit'.
@
        bl      exit
----snap----

Line 34 is the one with "bl exit" on it.

Expected results:
Binary RPM package should be generated.

Additional info:
Attached patches (package patch and a spec file patch) make the package build (just about), but even so, debuginfo.list doesn't get generated, and %check stage fails in a number of places, so the only way to actually get this to build the binary RPMs is to use:

rpmbuild --define='%check exit 0' -bb dietlibc.spec --define='%debug_package %{nil}'

which skips the %check stage self-test and building of the debug packages.

This package even fails to build cleanly on x86. On ARM the problem is just much worse (many more self-tests fail with segfaults).

Comment 1 Gordan Bobic 2011-01-07 02:18:13 UTC
Created attachment 472172 [details]
Patch to remove hard-coded -Os -g3 CFLAGS flags in the spec file

-g3 doesn't appear to be documented on gcc
-O flags should come from the defaults in rpmrc or, in exceptional cases, be overridden in the build source where necessary. In this instance most compiler invocations ended up with both -O2 and -Os (sometimes twice), which was breaking some code.

Comment 2 Enrico Scholz 2011-01-07 15:01:11 UTC
Can you please try the rawhide version?  You might need to apply the last two patches from github:

https://github.com/ensc/dietlibc/commit/749ea37e7793f58be8f0131b82d1affd249de244.patch

https://github.com/ensc/dietlibc/commit/0fb8d66c33252c784d3e0a5d16d1b78095c92d92.patch


If this version segfaults too, please attach one of the crashing binaries and the complete build log.

Comment 3 Gordan Bobic 2011-01-07 20:46:51 UTC
Created attachment 472294 [details]
Rawhide build log with the two git patches applied.

Comment 4 Gordan Bobic 2011-01-07 20:48:02 UTC
Created attachment 472295 [details]
segfaulting bin-arm/diet

Comment 5 Gordan Bobic 2011-01-07 23:29:28 UTC
It would appear that at least a part of the problem that dietlibc has on the ARM architecture is coming from breaking alignment. Running a build + test suite (with the patches I have attached here, which is the only way to make it successfully build on ARM) results in over a million alignment violations being logged. Each one will result in corrupted data being retrieved. That's a lot of corrupted data.

Alignment is an ARM-specific issue. It can be partially worked around by enabling auto-fixing of alignment in the kernel, but this comes with a significant performance penalty, so it isn't really acceptable.

To try this, check /proc/cpu/alignment on an ARM machine before and after building the packages and building the test suite. Some ARM CPUs have automatic fixing for this in hardware, so /proc/cpu/alignment will never show any violations, but it still slows things down even with it done in hardware. A SheevaPlug is a good example of a machine with no hardware alignment fixer where these errors will clearly show up.
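
The before/after check described above could be scripted roughly as follows. This is only a sketch: it assumes the kernel exposes a "User:" counter line in /proc/cpu/alignment, and the function name alignment_user_count and its optional file argument are made up for illustration.

```shell
#!/bin/sh
# Print the "User:" counter from an alignment statistics file.
# Defaults to /proc/cpu/alignment; the optional argument is a
# hypothetical convenience for pointing at a saved copy.
alignment_user_count() {
    file="${1:-/proc/cpu/alignment}"
    # Assumed line format: "User:<whitespace>12345".
    # "User faults:" does not match because $1 must equal "User".
    awk -F: '$1 == "User" { gsub(/[ \t]/, "", $2); print $2; exit }' "$file"
}

# Usage: snapshot the counter, run the build, then compare.
#   before=$(alignment_user_count)
#   rpmbuild --rebuild dietlibc-0.32-1400.fc14.src.rpm
#   after=$(alignment_user_count)
#   echo "alignment violations during build: $((after - before))"
```

On hardware that fixes up unaligned accesses itself, the difference would be expected to stay at zero even when the code is misbehaving, so this is only informative on machines like the SheevaPlug.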

On a separate note, the rawhide version with the two git patches seems to pass a lot more of the self-tests than before, but still fails on a lot of them, and two of those are segfaults, even with the auto-fixing of alignment enabled in the kernel.

The alignment traps catch alignment violations in the following test files:

test-canon
tst-limits
tst-printf
tst-rand48
tst-sscanf
tst-strtod
tst-strtol
tst-strtoll

The test suite files that segfault are:
atexit
tst-printf

I will attach those segfaulting binaries and the build log with the two previously attached patches applied.

Comment 6 Gordan Bobic 2011-01-07 23:53:07 UTC
Created attachment 472323 [details]
Build log with the optimizer flag patches applied

Comment 7 Gordan Bobic 2011-01-07 23:54:42 UTC
Created attachment 472324 [details]
segfaulting atexit

Comment 8 Gordan Bobic 2011-01-07 23:56:11 UTC
Created attachment 472325 [details]
segfaulting tst-printf

Comment 10 Gordan Bobic 2011-01-08 18:34:34 UTC
That last patch fixes the diet executable segfault. It also seems to have fixed the tst_printf segfault.

However, now mmap_test segfaults and throws misaligned access errors on ARM. This doesn't happen if my optimizer flag patch is applied.

I will attach the build log and the mmap_test binary.

Comment 11 Gordan Bobic 2011-01-08 18:46:29 UTC
Created attachment 472371 [details]
build log with patch in comment 9 applied

Comment 12 Gordan Bobic 2011-01-08 18:47:54 UTC
Created attachment 472372 [details]
Segfaulting mmap_test

Comment 13 Gordan Bobic 2011-01-08 19:52:53 UTC
Also - with the optimizer flag patches applied, mmap_test segfault goes away. atexit segfault remains. mmap_test also stops generating a misaligned access.

However, interestingly, it does cause tst_strtod to start segfaulting.

Build log and tst_strtod are attached.

Comment 14 Gordan Bobic 2011-01-08 19:53:41 UTC
Created attachment 472375 [details]
Build log with the 3 git patches and the 3 optimizer flag patches applied

Comment 15 Gordan Bobic 2011-01-08 19:55:16 UTC
Created attachment 472376 [details]
segfaulting tst_strtod

Comment 16 Enrico Scholz 2011-01-08 23:32:30 UTC
I detected and fixed two problems in mmap_test: the mmap() function was completely broken, and exit() executes random code (which caused the segfault).

https://github.com/ensc/dietlibc/commit/542652118de1889d18c4608f1a31a0e4ee640f5d.diff
https://github.com/ensc/dietlibc/commit/6747a03d7683e970c35ac147a7dfc16217b024ac.diff

Comment 17 Gordan Bobic 2011-01-09 00:52:17 UTC
Created attachment 472395 [details]
Build log with only the git patches so far (no -O fixup)

Attached is the build log with only the provided git patches applied. My optimizer flag patches weren't applied in this build.

tst-strtod segfaults.

Comment 18 Gordan Bobic 2011-01-09 00:53:48 UTC
Created attachment 472396 [details]
segfaulting tst_strtod

Comment 19 Enrico Scholz 2011-01-09 14:18:26 UTC
tst_strtod segfaults on other architectures too; this is due to endless recursion in __dtostr().  It will have to be fixed another time...

Otherwise, can you test rawhide please?  I fixed time(2) + getrlimit(2) issues there and it would be nice if these changes (especially getrlimit()) could be tested on a real platform.

Comment 20 Gordan Bobic 2011-01-10 11:38:15 UTC
OK, thank you for clarifying the tst_strtod.

What about the other 6-7 unknown test failures as per the build logs above?
If these are also expected, then is it also expected that the package has to be built with --define='%check exit 0' ? If that is the case then perhaps it should be set in the spec file until the relevant fix-ups are in place.

Also, what about the optimizer flags? Having multiple -O and -g parameters being passed is at least confusing. In a number of cases gcc is invoked with -O2 -g -Os -g3.

Also in the findcflags.sh -march is being set purely according to the gcc version number - that doesn't seem right. Surely -march should only come from rpmrc, should it not?

Comment 21 Gordan Bobic 2011-01-10 11:42:29 UTC
Re: Rawhide, I have been testing with the rawhide package 0.33-1500 since you first mentioned it in comment 2. Is there a newer rawhide version now? I can't seem to see it on my local mirror at the moment.

Comment 22 Enrico Scholz 2011-01-10 18:58:08 UTC
http://koji.fedoraproject.org/koji/buildinfo?buildID=213379

See

http://pkgs.fedoraproject.org/gitweb/?p=dietlibc.git;a=blob;f=runtests-X.sh;h=a0dbfc46d17d32f320a410b9c7a00e298a67d5d1;hb=HEAD

for explanations of failed tests.  The expected failures will be ignored so that %check is expected to succeed.

Multiple -O or -g flags are ok (last one wins).  I will remove the -g3 in one of the next releases (which was added to debug something, afair).
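
To illustrate the "last one wins" rule (gcc documents that when several -O options are given, the last one is the effective one), here is a small sketch that picks out the -O flag gcc would honour from a given command line. The function name effective_opt_flag is hypothetical and not part of the package.

```shell
#!/bin/sh
# Print the last -O* option among the arguments, i.e. the one that
# takes effect when several are given. Prints an empty line if no
# -O option is present.
effective_opt_flag() {
    last=""
    for arg in "$@"; do
        case "$arg" in
            -O*) last="$arg" ;;
        esac
    done
    printf '%s\n' "$last"
}

# Flags as they appear in the compiler invocations in this report:
effective_opt_flag -O2 -g -march=armv5te -fomit-frame-pointer -Os -g3
# prints: -Os
```

So with the rpmrc defaults followed by the spec file flags, the build effectively runs at -Os, regardless of the earlier -O2.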

Results from findcflags.sh are ignored (they are overridden by the CFLAGS make option)

Comment 23 Gordan Bobic 2011-01-10 20:04:20 UTC
I understand that some tests are known to fail, but if you look at the build logs I attached, you'll see a few tests failed that WEREN'T expected to fail, and the package build still fails. Also, I notice you mention you fixed something to do with time(2). Interestingly, when I try to build util-vserver which builds against dietlibc by default, it fails to link with this error:


diet -Os gcc -O2 -g -march=armv5te -std=c99 -Wall -pedantic -W -funit-at-a-time -o src/filetime src/filetime.o  lib/libvserver.a  
src/filetime.o: In function `main':
/root/rpmbuild/BUILD/util-vserver-0.30.216-pre2926/src/filetime.c:74: undefined reference to `time'
collect2: ld returned 1 exit status
make[2]: *** [src/filetime] Error 1
make[2]: Leaving directory `/usr/src/redhat/BUILD/util-vserver-0.30.216-pre2926'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/usr/src/redhat/BUILD/util-vserver-0.30.216-pre2926'
make: *** [all] Error 2
error: Bad exit status from /var/tmp/rpm-tmp.qfEQdh (%build)

It builds fine against glibc. Can you hazard a guess as to what might be going wrong? I thought it might be related since it is complaining about an undefined reference to "time" when linking against dietlibc but not when linking against glibc.
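
One way to narrow down an undefined reference like this is to check whether the static library defines the symbol at all. The following is a sketch only: has_defined_symbol is a made-up helper that filters nm-style output, and the library path in the usage comment is an assumption about where the ARM dietlibc archive is installed.

```shell
#!/bin/sh
# Read nm-style output on stdin and succeed (exit 0) if the named
# symbol is defined as code: type T or t, or W/w for a weak definition.
has_defined_symbol() {
    awk -v sym="$1" '
        NF >= 3 && $NF == sym && $(NF-1) ~ /^[TtWw]$/ { found = 1 }
        END { exit found ? 0 : 1 }
    '
}

# Usage (path is an assumption; adjust to the local install):
#   nm /usr/lib/dietlibc/lib-arm/libc.a | has_defined_symbol time \
#       && echo "time is defined" || echo "time is missing"
```

Lines where nm reports the symbol as "U" (undefined) have no address column and are deliberately ignored.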

Comment 24 Enrico Scholz 2011-01-10 21:22:23 UTC
All the unexpectedly failing tests seem to be due to "No such file or directory". This should be fixed in the comment 22 build, which adds previously missing time(2) + getrlimit(2) implementations.

Comment 25 Gordan Bobic 2011-01-10 21:43:04 UTC
Oh, my bad. Are the other three patches from git also required against that package? Or are they already rolled in there?

Comment 26 Gordan Bobic 2011-01-11 00:21:18 UTC
I just tried the 0.33-1502 build. Builds cleanly on ARM, and doesn't trigger any alignment issues at all - awesome. :)

Still can't build util-vserver against it, though:


diet -Os gcc -O2 -g -march=armv5te -std=c99 -Wall -pedantic -W -funit-at-a-time -o src/lockfile src/lockfile.o   
src/lockfile.o: In function `main':
/root/rpmbuild/BUILD/util-vserver-0.30.216-pre2926/src/lockfile.c:124: undefined reference to `alarm'
collect2: ld returned 1 exit status
make[2]: *** [src/lockfile] Error 1
make[2]: Leaving directory `/usr/src/redhat/BUILD/util-vserver-0.30.216-pre2926'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/usr/src/redhat/BUILD/util-vserver-0.30.216-pre2926'
make: *** [all] Error 2
error: Bad exit status from /var/tmp/rpm-tmp.xSKMxN (%build)


I take it alarm() isn't implemented yet on ARM?

Comment 27 Enrico Scholz 2011-01-14 20:38:21 UTC
Please try recent rawhide git (http://koji.fedoraproject.org/koji/taskinfo?taskID=2722215)

Comment 28 Gordan Bobic 2011-01-24 16:29:20 UTC
I cannot find the 0.33-1503 source rpms in koji. Can you provide a direct link?

Comment 29 Gordan Bobic 2011-02-07 16:13:31 UTC
Any chance of a src.rpm for this?

Comment 30 Gordan Bobic 2011-02-20 01:51:45 UTC
OK, I've finally found 0.33-1504 in rawhide. It builds OK with alignment fix-up enabled, but something still causes alignment errors. Thousands of these flood the logs. From a single build run I see 32904 of these, all spewed in a 5 second window:

Alignment trap: lt-regression-t (30975) PC=0x4014a014 Instr=0xe0d310b2 Address=0x000e23d7 FSR 0x001
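
For reading trap lines like the one above, a rough sketch that pulls out the faulting address and reports its offset from a given access size. The helper name trap_address_offset is hypothetical, and the default size of 2 bytes is an assumption (the Instr value looks like a halfword load, but that has not been verified here).

```shell
#!/bin/sh
# Extract the Address= field from a kernel "Alignment trap:" line and
# print it modulo the access size in bytes (default 2, an assumption).
# A non-zero result means the address is not a multiple of that size.
trap_address_offset() {
    line="$1"
    size="${2:-2}"
    addr=$(printf '%s\n' "$line" | sed -n 's/.*Address=\(0x[0-9a-fA-F]*\).*/\1/p')
    [ -n "$addr" ] || return 1
    echo $(( $addr % $size ))
}

trap_address_offset 'Alignment trap: lt-regression-t (30975) PC=0x4014a014 Instr=0xe0d310b2 Address=0x000e23d7 FSR 0x001'
# prints: 1
```

The address 0x000e23d7 is odd, so any access wider than a single byte at that address is misaligned.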

Comment 31 Gordan Bobic 2011-02-20 02:10:56 UTC
The alignment fault appears to happen in test tst-strtod, which segfaults anyway, so the build actually still succeeds even with alignment fix-up disabled.

This may, however, indicate an additional fault somewhere, on top of the known cause of the segfault.

Comment 32 Gordan Bobic 2011-02-20 02:54:16 UTC
Here is what gdb says:

# gdb --quiet tst-strtod core.20299
Reading symbols from /usr/src/redhat/BUILD/dietlibc-0.33.20101223/test/stdlib/tst-strtod...done.
[New Thread 20299]
Core was generated by `./tst-strtod'.
Program terminated with signal 11, Segmentation fault.
#0  0x0000c448 in __aeabi_dcmple ()

Backtrace attached in a separate file.

Comment 33 Gordan Bobic 2011-02-20 03:06:23 UTC
Created attachment 479735 [details]
backtrace of alignment-faulting tst-strtod

Lines like this continue pretty much indefinitely, so the file is truncated. I killed GDB when the output got to 10MB.

Comment 34 Gordan Bobic 2011-02-20 03:20:43 UTC
Also, in the context of the util-vserver building mentioned earlier, which requires dietlibc, it now gets further, but still fails:


diet -Os gcc -O2 -g -march=armv5te -std=c99 -Wall -pedantic -W -funit-at-a-time -o src/vunify src/vunify.o  lib_internal/libinternal-diet.a lib/libvserver.a  
/usr/lib/dietlibc/lib-arm/libc.a(utime.o): In function `utime':
(.text+0x18): undefined reference to `__NR_utime'
collect2: ld returned 1 exit status

Does that mean that particular function isn't implemented in dietlibc on ARM yet?

Comment 35 Enrico Scholz 2011-02-20 16:10:25 UTC
Try again; I hope that all missing syscalls are now available...

Comment 36 Gordan Bobic 2011-02-20 17:15:15 UTC
0.33-1505 doesn't seem to build successfully on ARM:


gcc -D__dietlibc__ -I. -isystem include -O2 -g -march=armv5te -fomit-frame-pointer -fno-exceptions -fno-asynchronous-unwind-tables -fno-stack-protector -Os -g3 -Werror-implicit-function-declaration   -c lib/__utime.c -o bin-arm/__utime.o -D__dietlibc__
lib/__utime.c: In function 'utime':
lib/__utime.c:8: error: implicit declaration of function 'utimes'
make: *** [bin-arm/__utime.o] Error 1

Comment 37 Gordan Bobic 2011-02-21 20:24:28 UTC
0.33-1600 seems to resolve all of the build problems on ARM I mentioned so far. :)

The unaligned access is gone and the build errors I was seeing due to missing functions are gone, too.

There is one other problem that cropped up (a bus error) that I need to investigate further as I am not certain whether it is an issue in dietlibc.

Comment 38 Gordan Bobic 2011-02-22 11:59:48 UTC
Doing a little more digging into that bus error, it's possible that there may be something funny happening in this specific case on ARM in mmap or madvise.

See the thread I posted here:
http://archives.linux-vserver.org/201102/0058.html

It also seems reminiscent of this bug (symptoms are identical but there isn't much attached in the bug report that would indicate whether it is in fact a similar issue):
https://bugzilla.redhat.com/show_bug.cgi?id=442346

The package I am building against dietlibc is this one:
http://people.linux-vserver.org/~dhozac/t/uv-testing/util-vserver-0.30.216-pre2935.tar.bz2
(rpmbuild -tb)

Comment 39 Enrico Scholz 2011-03-12 21:41:53 UTC
I found a bug in dietlibc's sigjmp() code which might be responsible for the observed issue.  Please try recent rawhide.

Comment 40 Gordan Bobic 2011-03-24 10:55:16 UTC
The latest rawhide (dietlibc-0.33-0.1600.20110311.fc16.src.rpm) builds cleanly now on F13/armv5tel, and util-vserver builds cleanly against it.

Thank you for fixing this, please close the bug.

Comment 41 Gordan Bobic 2011-03-24 13:58:31 UTC
Any chance that this latest rawhide version could be pushed to F13, F14 and F15? Otherwise we won't have a working dietlibc in the ARM Fedora distro until F16.

Comment 42 Bug Zapper 2011-05-30 12:06:56 UTC
This message is a reminder that Fedora 13 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 13.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '13'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 13's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 13 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 43 Bug Zapper 2011-06-27 12:41:11 UTC
Fedora 13 changed to end-of-life (EOL) status on 2011-06-25. Fedora 13 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

