Bug 1799473

Summary: gromacs: FTBFS in Fedora rawhide/f32
Product: [Fedora] Fedora
Reporter: Fedora Release Engineering <releng>
Component: gromacs
Assignee: Christoph Junghans <junghans>
Status: CLOSED RAWHIDE
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 32
CC: dakingun, dominik, junghans, orion
Target Milestone: ---
Flags: junghans: needinfo+
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-02-23 23:34:30 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1750908
Attachments:
Description  Flags
build.log    none
root.log     none
state.log    none

Description Fedora Release Engineering 2020-02-06 17:16:54 UTC
gromacs failed to build from source in Fedora rawhide/f32

https://koji.fedoraproject.org/koji/taskinfo?taskID=41318028


For details on the mass rebuild see:

https://fedoraproject.org/wiki/Fedora_32_Mass_Rebuild
Please fix gromacs at your earliest convenience and set the bug's status to
ASSIGNED when you start fixing it. If the bug remains in NEW state for 8 weeks,
gromacs will be orphaned. If it still fails to build, gromacs will be retired
before Fedora 33 is branched.

For more details on the FTBFS policy, please visit:
https://fedoraproject.org/wiki/Fails_to_build_from_source

Comment 1 Fedora Release Engineering 2020-02-06 17:16:57 UTC
Created attachment 1659221 [details]
build.log

file build.log too big, will only attach last 32768 bytes

Comment 2 Fedora Release Engineering 2020-02-06 17:16:59 UTC
Created attachment 1659222 [details]
root.log

file root.log too big, will only attach last 32768 bytes

Comment 3 Fedora Release Engineering 2020-02-06 17:17:01 UTC
Created attachment 1659223 [details]
state.log

Comment 4 Christoph Junghans 2020-02-06 17:43:04 UTC
The ppc64le error:
-- Could not find any flag to build test source (this could be due to either the compiler or binutils)
CMake Error at cmake/gmxManageSimd.cmake:51 (message):
  Cannot find IBM VSX compiler flag.  Use a newer compiler, or disable SIMD
  support (slower).
Call Stack (most recent call first):
  cmake/gmxManageSimd.cmake:265 (gmx_give_fatal_error_when_simd_support_not_found)
  CMakeLists.txt:719 (gmx_manage_simd)
-- Configuring incomplete, errors occurred!

Something is wrong with the SIMD flag detection: CMake cannot find a compiler flag that enables IBM VSX.

On aarch64 the error is:
Mdrun cannot use the requested (or automatic) number of ranks, retrying with 8.
Abnormal return value for ' gmx mdrun    -nb cpu   -notunepme >mdrun.out 2>&1' was 1
Retrying mdrun with better settings...
.....
98% tests passed, 1 tests failed out of 46
Label Time Summary:
GTest              =  33.68 sec*proc (40 tests)
IntegrationTest    =   6.53 sec*proc (5 tests)
MpiTest            =   2.22 sec*proc (3 tests)
SlowTest           =  20.76 sec*proc (1 test)
UnitTest           =   6.39 sec*proc (34 tests)
Total Test time (real) = 2697.26 sec
The following tests FAILED:
         43 - regressiontests/kernel (Timeout)
Errors while running CTest
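
For triage across architectures, the failed-test list can be pulled out of a saved build.log automatically. The following is only a sketch: the function name failed_tests and the SAMPLE text are illustrative, and it assumes the CTest summary keeps the layout shown above ("The following tests FAILED:" followed by one "number - name (reason)" line per test).

```python
import re

# Matches lines such as "         43 - regressiontests/kernel (Timeout)".
FAILED_LINE = re.compile(r"^\s*(\d+)\s+-\s+(\S+)\s+\(([^)]+)\)\s*$")


def failed_tests(log_text):
    """Return (number, name, reason) tuples from the CTest failure summary."""
    results = []
    in_section = False
    for line in log_text.splitlines():
        if line.strip() == "The following tests FAILED:":
            in_section = True
            continue
        if in_section:
            match = FAILED_LINE.match(line)
            if match is None:
                break  # the first line that is not a test entry ends the list
            results.append((int(match.group(1)), match.group(2), match.group(3)))
    return results


# Excerpt copied from the aarch64 log above (hypothetical variable name).
SAMPLE = """98% tests passed, 1 tests failed out of 46
The following tests FAILED:
         43 - regressiontests/kernel (Timeout)
Errors while running CTest
"""

if __name__ == "__main__":
    for number, name, reason in failed_tests(SAMPLE):
        print(f"test {number}: {name} ({reason})")
```

Separating the reason (here "Timeout") from the test name makes it easier to tell a slow builder apart from a genuine test failure.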

Comment 5 Christoph Junghans 2020-02-06 17:53:57 UTC
Using "mock -r fedora-rawhide-ppc64le --no-clean gromacs-2019.5-2.fc32.1.src.rpm"
I get:
+++ /usr/bin/ps -p 160 -ocomm=
Signal 4 (ILL) caught by ps (3.3.15).
/usr/bin/ps:ps/display.c:66: please report this bug
++ my_shell=

Comment 6 Ben Cotton 2020-02-11 17:06:25 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 32 development cycle.
Changing version to 32.

Comment 7 Christoph Junghans 2020-02-12 03:02:03 UTC
Details on the aarch64 error:
22/27 Test #22: UtilityMpiUnitTests ..............***Failed    0.52 sec
Invalid error code (-2) (error ring index 127 invalid)
INTERNAL ERROR: invalid error code fffffffe (Ring Index out of range) in MPID_nem_tcp_init:373
Invalid error code (-2) (error ring index 127 invalid)
INTERNAL ERROR: invalid error code fffffffe (Ring Index out of range) in MPID_nem_tcp_init:373
Fatal error in PMPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(586)..............:
MPID_Init(224).....................: channel initialization failed
MPIDI_CH3_Init(105)................:
MPID_nem_init(324).................:
MPID_nem_tcp_init(175).............:
MPID_nem_tcp_get_business_card(401):
MPID_nem_tcp_init(373).............: gethostbyname failed, 9642102373514ac7b8330d80c6ee96d2 (errno 0)
Invalid error code (-2) (error ring index 127 invalid)
Fatal error in PMPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(586)..............:
MPID_Init(224).....................: channel initialization failed
MPIDI_CH3_Init(105)................:
MPID_nem_init(324).................:
MPID_nem_tcp_init(175).............:
MPID_nem_tcp_get_business_card(401):
MPID_nem_tcp_init(373).............: gethostbyname failed, 9642102373514ac7b8330d80c6ee96d2 (errno 0)

So this seems to be a bug in mpich.
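
The trace ends in "gethostbyname failed" for what looks like the builder's own hostname. Whether that name resolves inside the build environment can be checked independently of MPICH. This is a minimal sketch under that assumption; the helper name hostname_resolves is made up for illustration and is not part of MPICH or gromacs.

```python
import socket


def hostname_resolves(name=None):
    """Return (name, address), with address None if the name does not resolve."""
    if name is None:
        name = socket.gethostname()  # the name the local machine reports
    try:
        return name, socket.gethostbyname(name)
    except OSError:  # socket.gaierror and socket.herror derive from OSError
        return name, None


if __name__ == "__main__":
    name, address = hostname_resolves()
    if address is None:
        print(f"{name} does not resolve")
    else:
        print(f"{name} resolves to {address}")
```

If this reports that the name does not resolve inside the chroot, the failure is reproducible without running the gromacs test suite.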

Comment 8 Christoph Junghans 2020-02-14 22:39:25 UTC
ppc64le issue reported upstream: https://redmine.gromacs.org/issues/3380

Comment 9 Christoph Junghans 2020-02-14 22:40:13 UTC
(In reply to Christoph Junghans from comment #7)
> So this seems to be a bug in mpich.

MPICH issue patched here: https://src.fedoraproject.org/rpms/mpich/pull-request/2

Comment 10 Fedora Release Engineering 2020-02-16 04:26:47 UTC
Dear Maintainer,

your package has not been built successfully in 32. Action is required from you.

If you can fix your package to build, perform a build in koji, and either create
an update in bodhi, or close this bug without creating an update, if updating is
not appropriate [1]. If you are working on a fix, set the status to ASSIGNED to
acknowledge this. Following the latest policy for such packages [2], your package
will be orphaned if this bug remains in NEW state more than 8 weeks.

A week before the mass branching of Fedora 33 according to the schedule [3],
any packages not successfully rebuilt at least on Fedora 31 will be
retired regardless of the status of this bug.

[1] https://fedoraproject.org/wiki/Updates_Policy
[2] https://docs.fedoraproject.org/en-US/fesco/Fails_to_build_from_source_Fails_to_install/
[3] https://fedoraproject.org/wiki/Releases/33/Schedule