Bug 494026 - mono build is blocked by ppc-build.
Status: CLOSED RAWHIDE
Product: Fedora
Classification: Fedora
Component: mono
Version: rawhide
Hardware: powerpc Linux
Priority: high Severity: high
Assigned To: Paul F. Johnson
QA Contact: Fedora Extras Quality Assurance
Keywords: Reopened
Depends On:
Blocks: F-ExcludeArch-ppc FE-ExcludeArch-ppc64 F12Target
Reported: 2009-04-03 14:43 EDT by Fabian Deutsch
Modified: 2010-01-25 11:14 EST (History)

Last Closed: 2010-01-25 11:14:44 EST
Attachments
patch to GC_test_and_set (1.09 KB, patch)
2009-04-09 16:44 EDT, Steven Munroe

Description Fabian Deutsch 2009-04-03 14:43:48 EDT
As seen here http://koji.fedoraproject.org/koji/taskinfo?taskID=1260389
packaging mono is currently blocked by the ppc-build.
This has also been discussed here http://fcp.surfsite.org/modules/newbb/viewtopic.php?topic_id=68952&viewmode=flat&order=ASC&start=0

I opened this bug report to track the issue, because until it is resolved none of the new features of mono 2.4 can make it into Fedora (e.g. the ASP.NET MVC add-in for MonoDevelop).
Comment 1 Paul F. Johnson 2009-04-03 16:39:45 EDT
I've spoken to some of the devs at Novell who cannot reproduce the problem on any of their systems (including whatever they use for opensuse). They have asked if we can run the build through gdb and report back the problems.

I've asked spot, but not had any response.
Comment 2 Toshio Ernie Kuratomi 2009-04-09 11:01:29 EDT
Blocking ppc and F11 trackers
Comment 3 Bill Nottingham 2009-04-09 11:39:25 EDT
The last successful build was RC1. The first failed build was RC2. RC2 did change ppc specific bits - ppc64 TLS was added, plus support for NPTL on both ppc32 and ppc64.
Comment 4 Steven Munroe 2009-04-09 13:40:34 EDT
Pulled the mono-2.4 release and ran into a problem where gcc-4.4 is being more pedantic than gcc-4.2/3.

For example, very early in the build we see:

In file included from ./include/private/gc_priv.h:95,
                 from alloc.c:19:
./include/private/gc_locks.h: In function ‘GC_test_and_set’:
./include/private/gc_locks.h:162: error: ‘asm’ operand has impossible constraints
make[3]: *** [alloc.lo] Error 1


          __asm__ __volatile__(
               "1:\tlwarx %0,0,%3\n"   /* load and reserve               */
               "\tcmpwi %0, 0\n"       /* if load is                     */
               "\tbne 2f\n"            /*   non-zero, return already set */
               "\tstwcx. %2,0,%1\n"    /* else store conditional         */
               "\tbne- 1b\n"           /* retry if lost reservation      */
               "\tsync\n"              /* import barrier                 */
               "2:\t\n"                /* oldval is zero if we set       */
              : "=&r"(oldval), "=p"(addr)
              : "r"(temp), "1"(addr)
              : "cr0","memory");
Comment 5 Steven Munroe 2009-04-09 15:09:29 EDT
The latest from bdwgc is a little different:

  __asm__ __volatile__(
               "1:lwarx %0,0,%1\n"   /* load and reserve               */
               "cmpwi %0, 0\n"       /* if load is                     */
               "bne 2f\n"            /*   non-zero, return already set */
               "stwcx. %2,0,%1\n"    /* else store conditional         */
               "bne- 1b\n"           /* retry if lost reservation      */
               "2:\n"                /* oldval is zero if we set       */
              : "=&r"(oldval)
              : "r"(addr), "r"(temp)
              : "memory", "cr0");

Note that =p(addr) is gone and the back ref "1"(addr) is not needed.

I would guess gcc-4.4 is choking on the =p(addr)

Does mono need to update to a newer version of Boehm GC?
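
For readers unfamiliar with lwarx/stwcx.: the asm above stores a nonzero value into *addr only if the word is currently zero, and returns the old value either way. As a rough illustration only (this is not the attached patch and not code from mono or bdwgc), the same semantics can be sketched portably with the GCC/Clang __atomic builtins. The name test_and_set_sketch, the stored value 1, and the acquire ordering (standing in for the sync in mono's copy) are assumptions.

```c
/* Hypothetical portable stand-in for GC_test_and_set: atomically set *addr
 * to 1 only if it is currently 0, and return the value seen before the
 * operation. Uses compiler builtins instead of a lwarx/stwcx. loop. */
static int test_and_set_sketch(volatile unsigned int *addr)
{
    unsigned int expected = 0;

    /* Strong compare-and-swap: acquire ordering on success,
     * relaxed ordering when the word was already nonzero. */
    __atomic_compare_exchange_n(addr, &expected, 1u, 0,
                                __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);

    /* On success 'expected' is still 0; on failure the builtin has
     * overwritten it with the nonzero value that was found. */
    return (int)expected;
}
```

A caller would spin while test_and_set_sketch(&lock) returns nonzero; a zero return means the caller now holds the lock.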
Comment 6 Steven Munroe 2009-04-09 16:44:54 EDT
Created attachment 338995 [details]
patch to GC_test_and_set

This patch fixes the compile error with gcc-4.4
Comment 7 Steven Munroe 2009-04-09 16:49:57 EDT
The above patch allows the make for mono-2.4 to complete given the configure options:

./configure --prefix=/usr/local --with-moonlight=no

Now I see regression errors in the make check.

Test run: image=/home/ppcteam/mono-ppc/mono-131222/mono/mini/basic-long.exe, opts=
Test 'test_2_neg' failed result (got 3, expected 2).
Test 'test_0_neg_large' failed result (got 1, expected 0).
Test 'test_1_simple_neg' failed result (got 0, expected 1).
Results: total tests: 88, failed: 3, cfailed: 0 (pass: 96.59%)
Elapsed time: 0.010395 secs (0.001288, 0.009107), Code size: 26908
Comment 8 Steven Munroe 2009-04-09 17:53:10 EDT
We get somewhat different results with older versions of GCC (4.1/4.3). With those we see:

365 test(s) passed. 2 test(s) did not pass.

Failed tests:

finalizer-wait.exe
critical-finalizers.exe

So there could be a latent PPC GCC-4.4 bug, but it could also be due to other system differences.
Comment 9 Steven Munroe 2009-04-10 11:38:42 EDT
Realized that I was using trunk instead of the 2.4 branch, so I am trying again with

http://ftp.novell.com/pub/mono/sources/mono/mono-2.4.tar.bz2

Also updated the gcc packages from yum.
Comment 10 Paul F. Johnson 2009-04-10 16:33:46 EDT
I've alerted the Novell people (via the mono developers list) of this problem as well as posting this BZ url.

Hopefully, we'll be able to get a ppc build running soon.

/me wishes someone would donate a PPC box to him so he could help get this fixed....
Comment 11 Steven Munroe 2009-04-10 17:10:30 EDT
We can build mono-2.4 from the tar file on F11 with the attached patch. Working
with vargaz from monodev, this patch has been applied to trunk and the mono-2-4
branch.

But I still see the basic-long failure in the make check. This is not
the case with other distros (with older gcc).

So please verify that mono-2-4 builds in the fedora build environment; then
we can close this bugz and open new bugz for these make check failures.
Comment 12 Paul F. Johnson 2009-04-11 09:54:00 EDT
Nope, still not building, same problem.

http://koji.fedoraproject.org/koji/getfile?taskID=1290915&name=build.log

:-(
Comment 13 Steven Munroe 2009-04-13 10:15:26 EDT
Did you verify that the attached patch is applied, and that your gcc is up to date?
Comment 14 Steven Munroe 2009-04-13 10:20:24 EDT
Also this may be a clue

"make[6]: execvp: mcs: Permission denied"

Please check your security settings!
Comment 15 Toshio Ernie Kuratomi 2009-04-13 11:50:47 EDT
Link to koji build page:
http://koji.fedoraproject.org/koji/taskinfo?taskID=1290915

From the build.log:
Patch #8 (mono-24-ppc-glocks.patch):
+ /bin/cat /builddir/build/SOURCES/mono-24-ppc-glocks.patch
+ /usr/bin/patch -s -p1 -b --suffix .glocks-ppc --fuzz=0

Checked cvs to ensure this is the patch provided in this bug report.


From the root.log:
DEBUG util.py:256:    gcc.ppc 0:4.4.0-0.32                 gcc-c++.ppc 0:4.4.0-0.32

I note that the spec currently has moonlight enabled:

%configure --with-ikvm=yes --with-jit=yes --with-xen_opt=yes \
           --with-moonlight=yes --disable-static --with-preview=yes \
           --with-libgdiplus=installed
Comment 16 Toshio Ernie Kuratomi 2009-04-13 13:03:05 EDT
I'm not 100% sure but I think that "make[6]: execvp: mcs: Permission denied" is because Paul has changed the spec file to rebootstrap the package, so /usr/bin/mcs is not being found. After that, the code tries to use the mcslite bootstrapping binaries and fails. We shouldn't have to rebootstrap, though, because releng put the old package, which had ppc builds, back into the buildroot.
Comment 17 Toshio Ernie Kuratomi 2009-04-13 15:40:09 EDT
Okay, bootstrap code turned off, confirmed patch has been applied.  Latest build gets farther but still fails with a stack overflow on ppc:

http://koji.fedoraproject.org/koji/getfile?taskID=1295223&name=build.log

Build task is:
http://koji.fedoraproject.org/koji/taskinfo?taskID=1295223
Comment 18 Steven Munroe 2009-04-13 17:16:40 EDT
OK, this is still some goofy problem specific to your build environment, because I can build mono-2.4 from the svn branch (and from the tar file with the patch). I have verified that I can compile Mono.Xml.Xsl/PatternTokenizer.cs within my F11 mono-2.4 build.

One difference is that I build with --with-moonlight=no.
Comment 19 Paul F. Johnson 2009-04-13 17:25:38 EDT
Can you try with moonlight=yes please? At least we can factor that one out then. That said, when I've pushed 2.4 release and RC2 + RC3 through they had moonlight=no as well (actually they didn't have any of the moonlight options taken as the default is no).
Comment 20 Toshio Ernie Kuratomi 2009-04-13 18:06:36 EDT
I've tried with --with-moonlight=no and also with just::
  %configure --with-moonlight=no --disable-static

In each of those cases it stops at the same point in the build.

Steven, is this error coming from mcs or some other tool?

One thing to remember is that this is being built with the bootstrapping code disabled, so we're using mcs from mono-core-2.4-RC1 in this build.  (Although Paul's build.log with bootstrapping enabled showed that the bootstrapping mcs will error at a different point in the build).

Can you also confirm that the tarball before patch has md5sum:
  da2bf1c0aba2958d26c5e8a9a49fd9d1  mono-2.4.tar.bz2
(I noticed that the RC's and final all have the same name. :-(

Finally, if you think this is something to do with the buildsystem's environment, can you try rpm --rebuild of the source rpm on your F11 system?  It's available here:
  curl 'http://koji.fedoraproject.org/koji/getfile?taskID=1295214&name=mono-2.4-14.fc11.src.rpm' > mono-2.4-14.fc11.src.rpm
  rpm --rebuild mono-2.4-14.fc11.src.rpm
Comment 21 Steven Munroe 2009-04-13 18:57:08 EDT
When I add the --disable-static I see the following failure:

echo "#define XSLT_PATTERN" > Mono.Xml.Xsl/PatternParser.cs
./../../jay/jay -ct Mono.Xml.Xsl/PatternParser.jay < ./../../jay/skeleton.cs >>Mono.Xml.Xsl/PatternParser.cs
./../../jay/jay: 3 rules never reduced
./../../jay/jay: 1 shift/reduce conflict, 46 reduce/reduce conflicts.
echo "#define XSLT_PATTERN" > Mono.Xml.Xsl/PatternTokenizer.cs
cat System.Xml.XPath/Tokenizer.cs >>Mono.Xml.Xsl/PatternTokenizer.cs
MCS     [basic] System.Xml.dll
Stack overflow in unmanaged: IP: 0xf8b4b54, fault addr: 0xff18bda0
Stack overflow in unmanaged: IP: 0xf8b4b54, fault addr: 0xff18abd0
Stack overflow in unmanaged: IP: 0xf8b4b54, fault addr: 0xff189ff0
Stack overflow in unmanaged: IP: 0xf8b4b54, fault addr: 0xff188e20
Stack overflow in unmanaged: IP: 0xf8b4b54, fault addr: 0xff187c50
Stack overflow in unmanaged: IP: 0xfd98510, fault addr: 0xff186ef0
Stack overflow in unmanaged: IP: 0xf8b4b54, fault addr: 0xff185ea0
Stack overflow in unmanaged: IP: 0xf8b4b54, fault addr: 0xff184cd0
Stack overflow: IP: 0xfd98510, fault addr: 0xff183f70
At Unmanaged

So the --disable-static seems to be part of the issue. 

Also verified the checksum:

 mono-ppc]$ md5sum mono-2.4.tar.bz2
da2bf1c0aba2958d26c5e8a9a49fd9d1  mono-2.4.tar.bz2
Comment 22 Toshio Ernie Kuratomi 2009-04-13 20:38:12 EDT
Following onto the previous comment:

If I don't have --disable-static but do have --with-static_mono=no, I get the Stack overflow.  --disable-static implies --with-static_mono=no

http://koji.fedoraproject.org/koji/taskinfo?taskID=1295972
Comment 23 Toshio Ernie Kuratomi 2009-04-13 22:38:31 EDT
And if we link statically against libmono, we are able to build on ppc:

https://koji.fedoraproject.org/koji/taskinfo?taskID=1296119
Comment 24 Paul F. Johnson 2009-04-14 12:27:23 EDT
Apparently, --disable-static is not supported or tested upstream and future releases may rely on the static libs (makes for a quicker runtime as well).

Can we allow mono to have static libs in?
Comment 25 Fabian Deutsch 2009-04-14 12:41:01 EDT
The --disable-static option is also not used in the OpenSUSE build, which builds fine (according to some #mono-devel irc people).
It is also not present in their spec file.
Comment 26 Toshio Ernie Kuratomi 2009-04-14 14:15:43 EDT
We won't ship the static libs.  It would be okay to build mono itself against a static libmono.a *as a temporary workaround*.  We'd definitely want to fix this by F12.

The big issue I'd want an answer to is whether this is specific to mono dynamically linking to libmono.  If it's going to happen with other things that link against libmono to embed the runtime, then this is a much larger issue than if it just affects mono.
Comment 27 Paul F. Johnson 2009-04-14 14:26:14 EDT
By the looks, we build against libmono.a but then don't need to bundle it. It is also only an issue with mono and nothing else - yet.
Comment 28 Toshio Ernie Kuratomi 2009-04-14 14:51:58 EDT
(In reply to comment #27)
> By the looks, we build against libmono.a but then don't need to bundle it.

Yep.  We can rm -rf the static libraries.

> It is also only an issue with mono and nothing else - yet.  

This is an oversimplification.  Nothing we ship uses libmono ATM.  But that doesn't mean that people using Fedora aren't embedding libmono into their applications.  Since we don't know precisely what the trigger is other than linking to libmono dynamically, we don't know how far reaching this is.
Comment 29 Toshio Ernie Kuratomi 2009-04-14 20:01:11 EDT
Okay, new mono build is in the buildsystem:
  http://koji.fedoraproject.org/koji/taskinfo?taskID=1299249

This will be mono-2.4 final with the patch from Steven (Thanks!) and building against a static libmono.

Removing this from the F11Blocker bug and adding to the F12Blocker.  Things we need to do as soon as possible and definitely before the F12 release:

* Get mono linking dynamically against libmono again.
* Try to bootstrap mono onto ppc64.
Comment 30 Steven Munroe 2009-04-15 11:05:39 EDT
That will be a trick!

The cross build (compiling 64-bit on a 32-bit default system) requires lots of 64-bit packages that may or may not exist, and some hacking around brain-dead pkgconfig/libtool-isms. Starting with glib-2.0 and pcre, I have had to hack pkgconfig and libtool .la files for 64-bit packages that did not bother to provide them. In some cases I resorted to running configure and then hacking config.status to replace lib with lib64.

Also you will need a 64-bit clean version of glibconfig.h, as mono depends on it to get its int/pointer casts right. Without this fix any 64-bit mono build is doomed. It seems glib is incapable of providing a biarch clean devel package.

Finally, mkbundle is brain-dead as it blindly exec's as and ld without applying the appropriate -m64/-m32 switches. I have patches to hack around that as well.

It is obviously simpler to build 64-bit on a 64-bit primary system. I suspect it is just as hard to build a 32-bit mono on a 64-bit primary system, but I have not tried that yet.
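
To make the int/pointer cast point above concrete: if a 64-bit build picks up type sizes meant for a 32-bit target, pointer values pushed through an integer type get truncated. The following is a minimal sketch of that failure mode in plain C, assuming truncation to 32 bits is the kind of breakage meant. It does not use glib or glibconfig.h, and roundtrip_32 and roundtrip_ptr are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical: round-trip a pointer-sized value through a 32-bit integer,
 * as a cast sized for a 32-bit target would do. */
static uintptr_t roundtrip_32(uintptr_t p)
{
    return (uintptr_t)(uint32_t)p;   /* upper bits are lost if pointers are 64-bit */
}

/* Hypothetical: round-trip through intptr_t, which is pointer-sized. */
static uintptr_t roundtrip_ptr(uintptr_t p)
{
    return (uintptr_t)(intptr_t)p;   /* value survives on 32-bit and 64-bit targets */
}
```

On a 32-bit target both helpers return their argument unchanged, which is why this kind of problem only shows up once the 64-bit build is attempted.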
Comment 31 Toshio Ernie Kuratomi 2009-04-15 14:27:05 EDT
Luckily we build a ppc64 primary system, we just don't ship it by default :-)

http://koji.fedoraproject.org/koji/taskinfo?taskID=1300604

Assuming that my next build with bootstrapping off works[1]_, we'll just have to figure out what's going wrong with dynamic linking to libmono.

.. _[1]: http://koji.fedoraproject.org/koji/taskinfo?taskID=1300753
Comment 32 Toshio Ernie Kuratomi 2009-04-15 16:37:54 EDT
Steven, I *think* I have a simpler test case.

Once you have mono-2.4 final built and installed:
  cd mono-2.4/samples/embed
  gcc -o teste teste.c `pkg-config --cflags --libs mono` -lm
  mcs test.cs
  ./teste test.exe
  Segmentation fault

And:
  cd mono-2.4/samples/embed
  gcc -Wall -o test-invoke test-invoke.c `pkg-config --cflags --libs mono` -lm
  mcs invoke.cs
  ./test-invoke invoke.exe
  Segmentation fault

These work when run on an F10 i386 box with mono-2.4 final rpms so it seems like it's related... although it could be I'm looking at a different bug now.
Comment 33 Paul F. Johnson 2009-04-16 11:21:45 EDT
Do you get this if you use gmcs instead of mcs?
Comment 34 Toshio Ernie Kuratomi 2009-04-16 12:26:50 EDT
Yes.
Comment 35 Toshio Ernie Kuratomi 2009-07-01 17:24:04 EDT
Just checked and this has not been fixed:
http://koji.fedoraproject.org/koji/taskinfo?taskID=1447139
Comment 36 Adam Williamson 2009-07-17 13:52:01 EDT
As discussed at today's blocker bug review meeting, since we have a current 'workaround' (actually blessed by upstream) with no catastrophic consequences, can't consider this blocking F12 release. Dropping to F12Target.

-- 
Fedora Bugzappers volunteer triage team
https://fedoraproject.org/wiki/BugZappers
Comment 37 Toshio Ernie Kuratomi 2009-07-17 14:42:31 EDT
Steven, do you have the time to look at this at some point or are you terribly busy?  By F13, ppc is going to be a secondary arch and we'll probably go to dynamic linking for the F13 rawhide cycle whether or not this bug is fixed.
