Bug 1128499

Summary: Compile using xbuild with mono hangs at random places
Product: [Fedora] Fedora Reporter: Ken Hall <kjhall55>
Component: kernel Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED ERRATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 22 CC: dan, gansalmon, i, itamar, jonathan, kernel-maint, kevin, madhu.chinakonda, mavit, mchehab, moez.roy
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-10-22 11:22:21 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1089426, 1220138    
Bug Blocks: 1150993, 1222120, 1254357    

Description Ken Hall 2014-08-10 22:50:46 UTC
Description of problem:

Since updating to kernel-3.15.7-200.fc20.x86_64, the Opensim program will not compile under mono using xbuild.  The process hangs at random points in the compile sequence.  Attaching strace to the running process will usually cause the build to resume, but it hangs again at a later point with the following:

--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=2587, si_status=0, si_utime=47, si_stime=3} ---
futex(0x21748cc, FUTEX_WAIT_PRIVATE, 1, NULL

This problem is repeatable on two different machines using Fedora 20 at the same level.  Both machines are AMD Quad Core on MSI motherboard, but CPUs and mainboards are different models.

A version of Opensim that compiled successfully on kernel level 3.15.6-200 will not compile on either 3.15.7-200 or 3.15.8-200.  Mono components have not been updated since 2013.

xbuild is part of mono-devel-2.10.8-5.fc20.x86_64

Version-Release number of selected component (if applicable):
Fedora 20 under kernel 3.15.7-200 and above

How reproducible:
Every time, but at different locations in the compile stream.

Steps to Reproduce:
1. Install mono from Fedora repo
2. Download Opensim from www.osgrid.org
3. Compile using xbuild
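
Because the hang shows up at a different point on each run, it can help to wrap the compile step in a watchdog that reports a stall instead of waiting on it indefinitely. The following is only a minimal sketch in Python 3, not part of the original report: the function name run_with_timeout, the 600-second limit, and the bare "xbuild" invocation in the comment are illustrative assumptions.

```python
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return ("ok", rc), ("failed", rc) or ("hung", None)."""
    proc = subprocess.Popen(cmd)
    try:
        rc = proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # No exit within the limit: treat it as a hang and clean up.
        proc.kill()
        proc.wait()
        return ("hung", None)
    return ("ok" if rc == 0 else "failed", rc)


# Hypothetical usage, run from the top of the source tree:
#   status, rc = run_with_timeout(["xbuild"], timeout_s=600)
#   print(status, rc)
```

Running this in a loop and counting the "hung" results gives a rough hang rate that can be compared across kernel versions. The timeout has to be longer than a normal full build on the machine in question.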

Actual results:
Hangs at some point in the compile stream

Expected results:
Should complete normally.

Additional info:

Comment 1 Peter Oliver 2014-10-12 19:33:51 UTC
It appears that this also affects builds of keepass: https://koji.fedoraproject.org/koji/taskinfo?taskID=7840578

Comment 2 Dan Horák 2014-11-10 21:07:44 UTC
seeing the hang in futex, it might actually be bug 1155291, so an update to latest kernel should fix it

Comment 3 Kevin Fenzi 2014-11-10 21:19:57 UTC
So, all the builders are running 3.17.2-200.fc20 now... 

Do any of these problems persist?

Comment 4 Peter Oliver 2014-11-11 20:02:58 UTC
I still see this, I'm afraid.  http://koji.fedoraproject.org/koji/taskinfo?taskID=8092197

Comment 5 Kevin Fenzi 2014-11-11 21:22:41 UTC
kojibui+ 31624  0.0  0.1 372280 27468 ?        Sl   Nov10   0:01 /usr/bin/mono /usr/lib/mono/4.0/xbuild.exe /target:KeePass /property:Configuration=Release

is the process that is hanging there. 

And it seems that stracing it caused it to complete. ;(

Comment 6 Ken Hall 2014-11-12 15:18:38 UTC
I'm still seeing it too, and running strace will often cause it to resume, but sometimes it stalls again later and strace is ineffective.

3.16.7-200.fc20.x86_64

Comment 7 Ken Hall 2014-11-18 18:52:41 UTC
Still seeing it on 3.17.2-200.fc20.x86_64.  It does appear to be load-related, though.  I have two nearly identical machines.  On the lightly loaded one, I was able to get through the build normally for the first time in months.  But on the "production" machine, which runs 10-20% busy, the build hung multiple times.  Running strace got it going again, but eventually it just hung on the futex:

futex(0xaad664, FUTEX_WAIT_PRIVATE, 1, NULL
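
Since attaching strace tends to wake the stalled process, one alternative is to read the per-thread state from /proc, which involves no ptrace attach. The following is a rough sketch only, assuming Linux and Python 3; the function names parse_stat_line and thread_states are illustrative and not from this report.

```python
import os


def parse_stat_line(line):
    """Return (tid, comm, state) from one /proc/<pid>/task/<tid>/stat line."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    lparen = line.index("(")
    rparen = line.rindex(")")
    tid = int(line[:lparen])
    comm = line[lparen + 1:rparen]
    state = line[rparen + 1:].split()[0]
    return tid, comm, state


def thread_states(pid):
    """List (tid, comm, state) for every thread of pid (Linux only)."""
    task_dir = "/proc/%d/task" % pid
    result = []
    for tid in sorted(os.listdir(task_dir), key=int):
        try:
            with open(os.path.join(task_dir, tid, "stat")) as f:
                result.append(parse_stat_line(f.read()))
        except OSError:
            # The thread exited between listdir() and open().
            continue
    return result
```

A thread blocked in FUTEX_WAIT would normally show state "S" (interruptible sleep). If every thread of the xbuild process sits in "S" while the build makes no progress, that is consistent with the futex wait seen in the strace output, though it does not by itself identify the cause.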

Comment 8 Moez Roy 2015-01-07 21:47:27 UTC
I was getting the same issue building pinta for EPEL7.

It kept failing and failing.

Then I tried on December 25th during the night, and the build was successful.

Comment 9 Christopher Meng 2015-01-20 03:53:58 UTC
Can we first update the dated components in Fedora? I don't know if upstream still wants to dive into an EOL version.

Comment 10 Jaroslav Reznik 2015-03-03 16:11:45 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 22 development cycle.
Changing version to '22'.

More information and reason for this action is here:
https://fedoraproject.org/wiki/Fedora_Program_Management/HouseKeeping/Fedora22

Comment 11 Justin M. Forbes 2015-10-20 19:35:30 UTC
*********** MASS BUG UPDATE **************

We apologize for the inconvenience.  There is a large number of bugs to go through and several of them have gone stale.  Due to this, we are doing a mass bug update across all of the Fedora 22 kernel bugs.

Fedora 22 has now been rebased to 4.2.3-200.fc22.  Please test this kernel update (or newer) and let us know if your issue has been resolved or if it is still present with the newer kernel.

If you have moved on to Fedora 23, and are still experiencing this issue, please change the version to Fedora 23.

If you experience different issues, please open a new bug report for those.

Comment 12 Ken Hall 2015-10-21 21:11:10 UTC
Upgraded system to Fedora 22 with kernel 4.2.3-200.fc22.x86_64 and the problem does not seem to be occurring anymore; compiles are completing normally on both machines.

Comment 13 Josh Boyer 2015-10-22 11:22:21 UTC
Thanks for letting us know.