Bug 524511 - ORTE_ERROR_LOG: Not found while running freefem++-3.5 testsuite
Summary: ORTE_ERROR_LOG: Not found while running freefem++-3.5 testsuite
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: openmpi
Version: 13
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Jay Fenlason
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2009-09-21 00:18 UTC by Dominik 'Rathann' Mierzejewski
Modified: 2014-08-31 23:29 UTC
CC List: 4 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-06-27 14:24:12 UTC
Type: ---
Embargoed:


Attachments
log from a failed x86_64 build (73 bytes, text/plain)
2009-09-21 00:18 UTC, Dominik 'Rathann' Mierzejewski
log from a failed x86_64 build (72 bytes, text/plain)
2009-09-21 00:19 UTC, Dominik 'Rathann' Mierzejewski
build log from a failed x86_64 build (1.44 MB, text/plain)
2009-12-06 01:00 UTC, Dominik 'Rathann' Mierzejewski
log from a failed local mock build (rawhide/x86_64) (1.45 MB, text/plain)
2009-12-06 01:56 UTC, Dominik 'Rathann' Mierzejewski

Description Dominik 'Rathann' Mierzejewski 2009-09-21 00:18:20 UTC
Created attachment 361848
log from a failed x86_64 build

Description of problem:
The freefem++-3.5 testsuite fails consistently on rawhide, but it passes with openmpi-1.3.3-2.fc11 on F11.

Version-Release number of selected component (if applicable):
openmpi-1.3.3-5

How reproducible:
Always

Steps to Reproduce:
1. /usr/bin/koji build --scratch dist-f12 'cvs://cvs.fedoraproject.org/cvs/pkgs?rpms/freefem++/devel#freefem++-3_5-1_fc12'
  
Actual results:
[xenbuilder4.fedora.phx.redhat.com:18024] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file ess_hnp_module.c at line 130
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
  orte_plm_base_select failed
  --> Returned value Not found (-13) instead of ORTE_SUCCESS

... and so on.

Expected results:
The tests should pass.

Additional info:
Some more failed scratch builds:
https://koji.fedoraproject.org/koji/taskinfo?taskID=1693624
https://koji.fedoraproject.org/koji/taskinfo?taskID=1693607
https://koji.fedoraproject.org/koji/taskinfo?taskID=1693589

Note these strange lines in the log:
++ MPI_COMPILER='openmpi-x86_64%{_cc_name_suffix}'
...
++ MPI_HOME=/usr/lib64/openmpi@
...

Comment 1 Dominik 'Rathann' Mierzejewski 2009-09-21 00:19:33 UTC
Created attachment 361849
log from a failed x86_64 build

Comment 2 Jay Fenlason 2009-10-01 19:08:52 UTC
I get an error when I attempt to access the link in the attachment.  Can you attach the actual log instead of just a link to it?

I wasn't able to get freefem++ to build locally on my rawhide box until I enclosed the %build section in "exec bash << EOF" and "EOF" lines. Apparently the /etc/profile.d/modules.sh script contains some bash-isms that don't work when run by /bin/sh. That might be a bug in the environment-modules package. However, once I edited the spec file, freefem++-3.5-2 built against openmpi-1.3.3-6.fc12 without problems.
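A minimal shell sketch of one detail worth keeping in mind with the "exec bash << EOF" wrapper described above: whether the here-document delimiter is quoted decides which shell expands $VARIABLES in the body. With an unquoted delimiter, /bin/sh expands them while reading the here-document, before bash runs anything; with a quoted delimiter ('EOF'), bash expands them itself, so it sees values assigned inside the body (for example by "module load"). The function names are hypothetical and exist only for this demonstration.

```shell
#!/bin/sh
# Hypothetical demo: the same body fed to bash through a here-document,
# once with an unquoted delimiter and once with a quoted one.

# Unquoted delimiter: the calling shell expands $MSG while reading the
# here-document, so bash receives the literal text: echo "outer"
demo_unquoted_delimiter() {
    MSG=outer
    bash <<EOF
MSG=inner
echo "$MSG"
EOF
}

# Quoted delimiter: the body reaches bash unexpanded, so bash expands
# $MSG after its own assignment and prints "inner".
demo_quoted_delimiter() {
    MSG=outer
    bash <<'EOF'
MSG=inner
echo "$MSG"
EOF
}
```

Calling demo_unquoted_delimiter prints "outer", while demo_quoted_delimiter prints "inner". In a wrapped %build section the same distinction would apply to any $MPI_HOME-style reference placed after "module load". RPM macros are expanded by rpmbuild before either shell sees the script, so quoting the delimiter does not affect them.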

Comment 3 Bug Zapper 2009-11-16 12:41:52 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 12 development cycle.
Changing version to '12'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 4 Dominik 'Rathann' Mierzejewski 2009-12-06 00:58:17 UTC
New builds are failing in exactly the same way on koji/dist-f13. See for yourself:
koji build --scratch --arch=x86_64 dist-f13 'cvs://cvs.fedoraproject.org/cvs/pkgs?rpms/freefem++/devel#HEAD'

It works fine in mock.

Comment 5 Dominik 'Rathann' Mierzejewski 2009-12-06 01:00:58 UTC
Created attachment 376386
build log from a failed x86_64 build

Build log from: http://koji.fedoraproject.org/koji/taskinfo?taskID=1851955

Comment 6 Dominik 'Rathann' Mierzejewski 2009-12-06 01:56:29 UTC
Created attachment 376391
log from a failed local mock build (rawhide/x86_64)

I was wrong: the local mock rawhide build fails, too.

Comment 7 Dominik 'Rathann' Mierzejewski 2010-01-15 20:33:07 UTC
Ping? It still fails with current rawhide (in mock).

Comment 8 Doug Ledford 2010-01-15 21:01:09 UTC
I'm not sure that this can reasonably be expected to work. The error is a failure to run the openmpi runtime during the build process. While it is expected that you can build an app that uses openmpi in a build root, there is no guarantee that you can run that app in the build root. It may have worked in the past, and that may have been because, purely by coincidence, the defaults for a totally unconfigured openmpi install allow you to run a single-process MPI job. We can look into it, but I make no promises that running an MPI job in a build root will be officially supported.

Comment 9 Dominik 'Rathann' Mierzejewski 2010-01-15 23:40:50 UTC
Is it possible to configure the openmpi environment in an automated way as part of the build process then? I don't mind adding a few lines of shell script before running the testsuite in %check.

Comment 10 Bug Zapper 2010-03-15 12:51:14 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 13 development cycle.
Changing version to '13'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 11 Jay Fenlason 2010-10-14 20:21:10 UTC
I don't believe you can run openmpi programs on the builders, because you can't make any assumptions about the network configurations on the build machines.  (Or even that they have networking at all.)

Comment 12 Bug Zapper 2011-06-02 17:42:45 UTC
This message is a reminder that Fedora 13 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 13.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '13'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 13's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 13 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 13 Bug Zapper 2011-06-27 14:24:12 UTC
Fedora 13 changed to end-of-life (EOL) status on 2011-06-25. Fedora 13 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 14 Jonathan Underwood 2012-01-13 16:16:40 UTC
I realize this bug is closed, but I just hit this problem myself while building some packages for internal consumption and found a workaround.

The issue is described more clearly here:

http://permalink.gmane.org/gmane.comp.clustering.open-mpi.user/966

The workaround I used was simply to add BuildRequires: rsh to my spec file. An alternative, I suppose, would be to touch /usr/bin/rsh.
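A hedged sketch of a %check guard along these lines, assuming (as the workaround above implies) that for a single-host run Open MPI's launcher only needs an rsh or ssh binary to be found on PATH. have_launcher_agent and run_if_launcher are hypothetical helper names, not part of Open MPI or RPM; this variant skips the MPI tests with a note instead of letting orte_init fail when neither binary is present.

```shell
#!/bin/sh
# Hypothetical helpers for a %check section. Assumption: for a single-host
# run, Open MPI's launcher only needs to find ssh or rsh on PATH.

# Succeed if ssh or rsh can be found on PATH.
have_launcher_agent() {
    for agent in ssh rsh; do
        if command -v "$agent" >/dev/null 2>&1; then
            return 0
        fi
    done
    return 1
}

# Run the given command (e.g. the test suite) only when a launcher agent
# is available; otherwise print a note to stderr and succeed without running it.
run_if_launcher() {
    if have_launcher_agent; then
        "$@"
    else
        echo "neither ssh nor rsh found on PATH; skipping: $*" >&2
        return 0
    fi
}
```

In a %check section this might be called as "run_if_launcher make check", where "make check" stands in for however the package actually runs its test suite. With BuildRequires: rsh in place, the guard simply runs the tests.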

I do wonder if openmpi-devel should Require the rsh package so that (single-host) openmpi jobs can be run during %build/%check.

Anyway, posting this here in case someone else runs into the problem in the future.

