Bug 479941

Summary: errors / warnings (udapl related) when running openmpi jobs
Product: Red Hat Enterprise Linux 5
Reporter: Mehdi Bozzo-Rey <mbozzore>
Component: openmpi
Assignee: Doug Ledford <dledford>
Status: CLOSED ERRATA
QA Contact: Martin Jenner <mjenner>
Severity: medium
Priority: low
Version: 5.3
CC: cfeller, fenlason, gozen
Target Milestone: rc
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2010-03-30 08:56:43 UTC

Description Mehdi Bozzo-Rey 2009-01-14 07:59:17 UTC
Description of problem: Running openmpi jobs without selecting a BTL prints a large number of uDAPL-related errors and warnings.


Version-Release number of selected component (if applicable): 5.3 rc2


How reproducible:


Steps to Reproduce:
1. Install openmpi on nodes that have InfiniBand (IB).
2. Run a basic example without specifying a BTL.
  
Actual results: Many uDAPL errors and warnings are printed for each process; the job itself still completes.


Expected results: No errors or warnings.


Additional info:

[mbozzore@compute-0-11 examples]$ mpirun -np 2 --hostfile ./hosts ./hello_c
DAT: library load failure: /usr/lib64/libdaplcma.so.1: undefined symbol: dat_registry_add_provider
--------------------------------------------------------------------------

WARNING: Failed to open "OpenIB-cma" [DAT_PROVIDER_NOT_FOUND:DAT_NAME_NOT_REGISTERED].
This may be a real error or it may be an invalid entry in the uDAPL
Registry which is contained in the dat.conf file. Contact your local
System Administrator to confirm the availability of the interfaces in
the dat.conf file.
--------------------------------------------------------------------------
DAT: library load failure: /usr/lib64/libdaplcma.so.1: undefined symbol: dat_registry_add_provider
--------------------------------------------------------------------------

WARNING: Failed to open "OpenIB-cma-1" [DAT_PROVIDER_NOT_FOUND:DAT_NAME_NOT_REGISTERED].
This may be a real error or it may be an invalid entry in the uDAPL
Registry which is contained in the dat.conf file. Contact your local
System Administrator to confirm the availability of the interfaces in
the dat.conf file.
--------------------------------------------------------------------------
DAT: library load failure: /usr/lib64/libdaplscm.so.1: undefined symbol: dat_registry_add_provider
--------------------------------------------------------------------------

WARNING: Failed to open "OpenIB-mthca0-1" [DAT_PROVIDER_NOT_FOUND:DAT_NAME_NOT_REGISTERED].
This may be a real error or it may be an invalid entry in the uDAPL
Registry which is contained in the dat.conf file. Contact your local
System Administrator to confirm the availability of the interfaces in
the dat.conf file.
--------------------------------------------------------------------------
DAT: library load failure: /usr/lib64/libdaplscm.so.1: undefined symbol: dat_registry_add_provider
--------------------------------------------------------------------------

WARNING: Failed to open "OpenIB-mthca0-2" [DAT_PROVIDER_NOT_FOUND:DAT_NAME_NOT_REGISTERED].
This may be a real error or it may be an invalid entry in the uDAPL
Registry which is contained in the dat.conf file. Contact your local
System Administrator to confirm the availability of the interfaces in
the dat.conf file.
--------------------------------------------------------------------------
DAT: library load failure: /usr/lib64/libdaplscm.so.1: undefined symbol: dat_registry_add_provider
--------------------------------------------------------------------------

WARNING: Failed to open "OpenIB-mlx4_0-1" [DAT_PROVIDER_NOT_FOUND:DAT_NAME_NOT_REGISTERED].
This may be a real error or it may be an invalid entry in the uDAPL
Registry which is contained in the dat.conf file. Contact your local
System Administrator to confirm the availability of the interfaces in
the dat.conf file.
--------------------------------------------------------------------------
DAT: library load failure: /usr/lib64/libdaplscm.so.1: undefined symbol: dat_registry_add_provider
--------------------------------------------------------------------------

WARNING: Failed to open "OpenIB-mlx4_0-2" [DAT_PROVIDER_NOT_FOUND:DAT_NAME_NOT_REGISTERED].
This may be a real error or it may be an invalid entry in the uDAPL
Registry which is contained in the dat.conf file. Contact your local
System Administrator to confirm the availability of the interfaces in
the dat.conf file.
--------------------------------------------------------------------------
DAT: library load failure: /usr/lib64/libdaplscm.so.1: undefined symbol: dat_registry_add_provider
--------------------------------------------------------------------------

WARNING: Failed to open "OpenIB-ipath0-1" [DAT_PROVIDER_NOT_FOUND:DAT_NAME_NOT_REGISTERED].
This may be a real error or it may be an invalid entry in the uDAPL
Registry which is contained in the dat.conf file. Contact your local
System Administrator to confirm the availability of the interfaces in
the dat.conf file.
--------------------------------------------------------------------------
DAT: library load failure: /usr/lib64/libdaplscm.so.1: undefined symbol: dat_registry_add_provider
--------------------------------------------------------------------------

WARNING: Failed to open "OpenIB-ipath1-1" [DAT_PROVIDER_NOT_FOUND:DAT_NAME_NOT_REGISTERED].
This may be a real error or it may be an invalid entry in the uDAPL
Registry which is contained in the dat.conf file. Contact your local
System Administrator to confirm the availability of the interfaces in
the dat.conf file.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
[0,1,0]: uDAPL on host 10.1.1.13 was unable to find any NICs.
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
DAT: library load failure: /usr/lib64/libdaplcma.so.1: undefined symbol: dat_registry_add_provider
--------------------------------------------------------------------------

WARNING: Failed to open "OpenIB-cma" [DAT_PROVIDER_NOT_FOUND:DAT_NAME_NOT_REGISTERED].
This may be a real error or it may be an invalid entry in the uDAPL
Registry which is contained in the dat.conf file. Contact your local
System Administrator to confirm the availability of the interfaces in
the dat.conf file.
--------------------------------------------------------------------------
DAT: library load failure: /usr/lib64/libdaplcma.so.1: undefined symbol: dat_registry_add_provider
--------------------------------------------------------------------------

WARNING: Failed to open "OpenIB-cma-1" [DAT_PROVIDER_NOT_FOUND:DAT_NAME_NOT_REGISTERED].
This may be a real error or it may be an invalid entry in the uDAPL
Registry which is contained in the dat.conf file. Contact your local
System Administrator to confirm the availability of the interfaces in
the dat.conf file.
--------------------------------------------------------------------------
DAT: library load failure: /usr/lib64/libdaplscm.so.1: undefined symbol: dat_registry_add_provider
--------------------------------------------------------------------------

WARNING: Failed to open "OpenIB-mthca0-1" [DAT_PROVIDER_NOT_FOUND:DAT_NAME_NOT_REGISTERED].
This may be a real error or it may be an invalid entry in the uDAPL
Registry which is contained in the dat.conf file. Contact your local
System Administrator to confirm the availability of the interfaces in
the dat.conf file.
--------------------------------------------------------------------------
DAT: library load failure: /usr/lib64/libdaplscm.so.1: undefined symbol: dat_registry_add_provider
--------------------------------------------------------------------------

WARNING: Failed to open "OpenIB-mthca0-2" [DAT_PROVIDER_NOT_FOUND:DAT_NAME_NOT_REGISTERED].
This may be a real error or it may be an invalid entry in the uDAPL
Registry which is contained in the dat.conf file. Contact your local
System Administrator to confirm the availability of the interfaces in
the dat.conf file.
--------------------------------------------------------------------------
DAT: library load failure: /usr/lib64/libdaplscm.so.1: undefined symbol: dat_registry_add_provider
--------------------------------------------------------------------------

WARNING: Failed to open "OpenIB-mlx4_0-1" [DAT_PROVIDER_NOT_FOUND:DAT_NAME_NOT_REGISTERED].
This may be a real error or it may be an invalid entry in the uDAPL
Registry which is contained in the dat.conf file. Contact your local
System Administrator to confirm the availability of the interfaces in
the dat.conf file.
--------------------------------------------------------------------------
DAT: library load failure: /usr/lib64/libdaplscm.so.1: undefined symbol: dat_registry_add_provider
--------------------------------------------------------------------------

WARNING: Failed to open "OpenIB-mlx4_0-2" [DAT_PROVIDER_NOT_FOUND:DAT_NAME_NOT_REGISTERED].
This may be a real error or it may be an invalid entry in the uDAPL
Registry which is contained in the dat.conf file. Contact your local
System Administrator to confirm the availability of the interfaces in
the dat.conf file.
--------------------------------------------------------------------------
DAT: library load failure: /usr/lib64/libdaplscm.so.1: undefined symbol: dat_registry_add_provider
--------------------------------------------------------------------------

WARNING: Failed to open "OpenIB-ipath0-1" [DAT_PROVIDER_NOT_FOUND:DAT_NAME_NOT_REGISTERED].
This may be a real error or it may be an invalid entry in the uDAPL
Registry which is contained in the dat.conf file. Contact your local
System Administrator to confirm the availability of the interfaces in
the dat.conf file.
--------------------------------------------------------------------------
DAT: library load failure: /usr/lib64/libdaplscm.so.1: undefined symbol: dat_registry_add_provider
--------------------------------------------------------------------------

WARNING: Failed to open "OpenIB-ipath1-1" [DAT_PROVIDER_NOT_FOUND:DAT_NAME_NOT_REGISTERED].
This may be a real error or it may be an invalid entry in the uDAPL
Registry which is contained in the dat.conf file. Contact your local
System Administrator to confirm the availability of the interfaces in
the dat.conf file.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
[0,1,1]: uDAPL on host 10.1.1.13 was unable to find any NICs.
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------
Hello, world, I am 0 of 2
Hello, world, I am 1 of 2
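
The output above repeats the same two message shapes many times, so a short script can condense it. The following is only a sketch, not part of the original report: the function name summarize_udapl_warnings and the SAMPLE excerpt are hypothetical, and the patterns are written against the exact wording of the messages shown above.

```python
import re
from collections import Counter

# Patterns for the two message shapes seen in the mpirun output above.
LOAD_FAILURE = re.compile(
    r"^DAT: library load failure: (\S+): undefined symbol: (\S+)"
)
OPEN_FAILURE = re.compile(r'^WARNING: Failed to open "([^"]+)" \[([^\]]+)\]')


def summarize_udapl_warnings(log_text):
    """Count load failures per (library, symbol) and open failures per provider."""
    libraries = Counter()
    providers = Counter()
    for line in log_text.splitlines():
        line = line.strip()
        match = LOAD_FAILURE.match(line)
        if match:
            libraries[(match.group(1), match.group(2))] += 1
            continue
        match = OPEN_FAILURE.match(line)
        if match:
            providers[match.group(1)] += 1
    return libraries, providers


# Hypothetical excerpt: four lines copied from the output above.
SAMPLE = """\
DAT: library load failure: /usr/lib64/libdaplcma.so.1: undefined symbol: dat_registry_add_provider
WARNING: Failed to open "OpenIB-cma" [DAT_PROVIDER_NOT_FOUND:DAT_NAME_NOT_REGISTERED].
DAT: library load failure: /usr/lib64/libdaplscm.so.1: undefined symbol: dat_registry_add_provider
WARNING: Failed to open "OpenIB-mthca0-1" [DAT_PROVIDER_NOT_FOUND:DAT_NAME_NOT_REGISTERED].
"""

if __name__ == "__main__":
    libraries, providers = summarize_udapl_warnings(SAMPLE)
    for (library, symbol), count in sorted(libraries.items()):
        print(f"{count}x {library}: missing {symbol}")
    for name, count in sorted(providers.items()):
        print(f"{count}x provider {name}")
```

Run against the full output, this would show at a glance that every failure involves the same missing symbol in one of two provider libraries.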



The job runs without warnings when the TCP or OpenIB BTL is selected explicitly:

[mbozzore@compute-0-11 examples]$ mpirun -np 2 --hostfile ./hosts ./hello_cls /
[mbozzore@compute-0-11 examples]$ mpirun -np 2 --hostfile ./hosts --mca btl tcp,self ./hello_c
Hello, world, I am 0 of 2
Hello, world, I am 1 of 2
[mbozzore@compute-0-11 examples]$ mpirun -np 2 --hostfile ./hosts --mca btl openib,self ./hello_c
Hello, world, I am 0 of 2
Hello, world, I am 1 of 2
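
For scripted runs, the explicit BTL selection used in the commands above can be wrapped in a small helper. This is a sketch under stated assumptions: build_mpirun_command is a hypothetical name, and the "^udapl" value relies on Open MPI's MCA syntax for excluding a component, which was not tried in this report and is not confirmed to silence the warnings.

```python
import shlex


def build_mpirun_command(program, np, hostfile, btl=None):
    """Return an mpirun argument list, optionally pinning the BTL selection."""
    cmd = ["mpirun", "-np", str(np), "--hostfile", hostfile]
    if btl:
        # "--mca btl <list>" restricts Open MPI to the listed BTL components;
        # a leading "^" excludes the listed components instead.
        cmd += ["--mca", "btl", btl]
    cmd.append(program)
    return cmd


if __name__ == "__main__":
    # The first two selections are the ones tested above; "^udapl" is an
    # untested assumption (exclude only the uDAPL BTL).
    for btl in ("tcp,self", "openib,self", "^udapl"):
        print(shlex.join(build_mpirun_command("./hello_c", 2, "./hosts", btl)))
```

The helper only builds the argument list, so it can be handed to subprocess.run without going through a shell.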

Comment 9 errata-xmlrpc 2010-03-30 08:56:43 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2010-0292.html