Bug 1514542 - NWChem is broken on Fedora 27
Summary: NWChem is broken on Fedora 27
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: nwchem
Version: 28
Hardware: All
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: marcindulak
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On: 1432661
Blocks:
Reported: 2017-11-17 18:13 UTC by Henrique C. S. Junior
Modified: 2018-07-04 18:43 UTC
CC List: 3 users

Fixed In Version: nwchem-6.8.1-4.fc28
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-07-04 18:43:41 UTC
Type: Bug
Embargoed:


Attachments
Output (60.09 KB, text/plain)
2017-11-21 16:29 UTC, Henrique C. S. Junior
Input (1.07 KB, text/plain)
2017-11-21 16:30 UTC, Henrique C. S. Junior

Description Henrique C. S. Junior 2017-11-17 18:13:19 UTC
Description of problem:
Impossible to install NWChem on Fedora 27 due to missing dependencies.

Version-Release number of selected component (if applicable):


How reproducible:
Try to install parallel NWChem

Steps to Reproduce:
sudo dnf install 'nwchem*'

Actual results:
Error: 
 Problem 1: conflicting requests
  - nothing provides libga.so.0()(64bit)(mpich-x86_64) needed by nwchem-mpich-6.6.27746-32.fc27.x86_64
 Problem 2: conflicting requests
  - nothing provides libga.so.0()(64bit)(openmpi-x86_64) needed by nwchem-openmpi-6.6.27746-32.fc27.x86_64
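When a transaction fails like this, the missing capabilities can be pulled out of the dnf output and then looked up one at a time, for example with `dnf provides '<capability>'`. A minimal sketch as a sed filter; the function name `missing_provides` is hypothetical, and it assumes dnf's English "nothing provides ... needed by ..." wording seen in this report:

```shell
#!/bin/sh
# Print "<missing capability> <package that needs it>" for every
# "nothing provides X needed by Y" line in dnf output.
# Reads the files given as arguments, or stdin when there are none.
missing_provides() {
    sed -n 's/^.*nothing provides \(.*\) needed by \(.*\)$/\1 \2/p' "$@"
}
```

Usage: `sudo dnf install 'nwchem*' 2>&1 | missing_provides`.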

Comment 1 marcindulak 2017-11-18 20:42:15 UTC
I'm having problems building a working version of nwchem on fedora 27.

Neither 6.6 https://koji.fedoraproject.org/koji/taskinfo?taskID=23207023
nor 6.8 works properly https://koji.fedoraproject.org/koji/taskinfo?taskID=23207119

I'm getting

0:0:ga_diag_std_seq: dsyev failed:: 0

when running a simple test, with both openmpi and mpich:

module load mpi/openmpi-x86_64
echo -e "geometry\nH 0 0 0\nH 0 0 1\nend\nbasis\nH library STO-3G\nend\ntask dft energy" > test.nw && mpiexec -np 2 `which nwchem_openmpi` test.nw
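The one-liner above can also be kept as a small script, which makes it easier to rerun against each candidate build. A sketch in POSIX sh, assuming the `mpi/openmpi-x86_64` module has already been loaded and `nwchem_openmpi` is on `PATH`, as in the report; the function names are hypothetical, and the check only looks for the `dsyev failed` message quoted above:

```shell
#!/bin/sh
# Write the minimal H2 DFT input from the report to the file given as $1.
write_h2_input() {
    cat > "$1" <<'EOF'
geometry
H 0 0 0
H 0 0 1
end
basis
H library STO-3G
end
task dft energy
EOF
}

# Run the input on 2 MPI ranks with the Open MPI build.
# Returns 1 if the GA eigensolver error appears, else mpiexec's exit status.
run_h2_check() {
    input=${1:-test.nw}
    write_h2_input "$input"
    out=$(mpiexec -np 2 "$(command -v nwchem_openmpi)" "$input" 2>&1)
    status=$?
    printf '%s\n' "$out"
    if printf '%s\n' "$out" | grep -q 'dsyev failed'; then
        echo "FAIL: ga_diag_std_seq reported a dsyev failure" >&2
        return 1
    fi
    return "$status"
}
```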

It may have something to do with the ga 5.6.1 version available in Fedora 27.
Maybe you want to investigate on the nwchem forum; apparently a similar error has been reported in the past. Meanwhile, I can try to build nwchem with the bundled ga instead.

Comment 2 marcindulak 2017-11-19 16:34:04 UTC
Here is a version built using internally bundled ga https://koji.fedoraproject.org/koji/taskinfo?taskID=23231588 - works for me.

Comment 3 Henrique C. S. Junior 2017-11-21 12:41:18 UTC
Hi, Marcin, sorry for the delay.
It is working for me too.

Do you need any assistance, let's say, contacting and reporting to upstream or performing more tests?

Thank you for the quick fix.

Comment 4 marcindulak 2017-11-21 13:08:20 UTC
There is one thing you could do: test whether scalapack features work properly with https://koji.fedoraproject.org/koji/taskinfo?taskID=23231588

This is a nwchem version built with the ga bundled by nwchem, but I'm not sure whether this ga picked up any scalapack libraries during the build, and if so, which ones.

Comment 5 Henrique C. S. Junior 2017-11-21 16:29:46 UTC
Created attachment 1356853
Output

Comment 6 Henrique C. S. Junior 2017-11-21 16:30:14 UTC
Created attachment 1356855
Input

Comment 7 Henrique C. S. Junior 2017-11-21 16:31:38 UTC
It looks like NWChem heavily uses Scalapack in the diagonalization part of PW calculations. It looks fine in my test. Please take a look at the attachments.

Comment 8 marcindulak 2017-11-21 20:36:12 UTC
Which part of the output proves scalapack is used?

Comment 9 Henrique C. S. Junior 2017-11-22 10:15:30 UTC
You're right, my mistake here (I was reading part of a book that stated that scalapack is used in PW calculations, but although it cites NWChem, it does not state that this is how NWChem does it).

Let me contact Edoardo Apra.

Comment 10 marcindulak 2017-11-22 10:32:13 UTC
It would be better to search the nwchem forum for how to verify the scalapack functionality, and to ask there if no examples are available.

Comment 11 Henrique C. S. Junior 2017-11-22 11:07:16 UTC
It is done.
http://www.nwchem-sw.org/index.php/Special:AWCforum/st/id2488

Let's wait.

Comment 12 marcindulak 2017-11-22 12:39:31 UTC
You can cross-link this bugzilla report on the nwchem forum too.

Comment 13 Edoardo Apra 2017-11-22 21:39:33 UTC
Henrique/Marcin,
What GA was used in this RPM that exhibits the eigensolver failure? 
If I read correctly, you are not using the bundled GA, but something from a Fedora RPM, right?

Comment 14 Henrique C. S. Junior 2017-11-22 21:56:15 UTC
(In reply to marcindulak from comment #2)
> Here is a version built using internally bundled ga
> https://koji.fedoraproject.org/koji/taskinfo?taskID=23231588 - works for me.

Dear Edoardo, Marcin built a version that uses the bundled GA, and that one is working, but he wants to test whether Scalapack is working as expected too.

Comment 15 marcindulak 2017-11-22 22:20:07 UTC
(In reply to Edoardo Apra from comment #13)
> Henrique/Marcin,
> What GA was used in this RPM that exhibits the eigensolver failure? 
> If I read correctly, you are not using the bundle GA, but a something from a
> Fedora RPM, right?

Since ga is not well maintained in Fedora/EPEL (see bug #1432661), I'm looking into using the ga bundled by nwchem, as https://bugzilla.redhat.com/show_bug.cgi?id=1514542#c4 describes.

I don't know whether the failure to build a working nwchem executable against the ga 5.6.1 RPMs currently available in Fedora 27 is due to a mismatch between nwchem 6.6/6.8 and ga 5.6.1, or to an improper build of the Fedora 27 ga 5.6.1 against scalapack.

Since I would need to modify the build process of the bundled ga in nwchem.spec to use Fedora's scalapack/blas, the question is what is the set of nwchem tests that exercise scalapack.

Here are the tests I'm running on 2 cores during the nwchem RPM build process:
https://src.fedoraproject.org/cgit/rpms/nwchem.git/tree/nwchem.spec#n426

timeout --preserve-status --kill-after 10 1800 time ./doafewqmtests.mpi 2

Comment 16 Edoardo Apra 2017-11-23 01:21:14 UTC
Yes, some of the tests included in doafewqmtests.mpi do stress scalapack.

However, the following one uses large matrices (but it is not a long test):

./runtests.mpi.unix procs 2 dft_siosi3

By the way, the NWChem repository has just moved to github under

https://github.com/nwchemgit/nwchem

Cheers, Edo

PS I have just updated my recipe to build NWChem 6.8 on Fedora 27

https://github.com/nwchemgit/nwchem/wiki/Compiling-NWChem#nwchem-68-on-centos-71fedora-27

Comment 17 marcindulak 2017-11-23 18:42:38 UTC
I need a test that forcibly tests scalapack, and fails if scalapack is not available.

Will nwchem perform any fallback if ga fails to discover scalapack during the build process?

Comment 18 Edoardo Apra 2017-11-27 19:46:45 UTC
Once the Global Arrays are built (in the src/tools directory), the NWChem build process checks whether Global Arrays have detected the presence of Scalapack. If Global Arrays have failed to find a usable Scalapack library, NWChem uses the built-in Peigs library.

Here is the line of src/config/makefile.h that checks for the Global Arrays detection of Scalapack:

_USE_SCALAPACK = $(shell cat ${NWCHEM_TOP}/src/tools/build/config.h | awk ' /HAVE_SCALAPACK\ 1/ {print "Y"}')
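For checking a build tree by hand, the same test can be run from the shell. A sketch assuming GA's generated config.h sits at the path used in makefile.h; `ga_has_scalapack` is a hypothetical name. Like the makefile line, it prints `Y` when `HAVE_SCALAPACK 1` is present and nothing otherwise; it additionally returns a non-zero status in the second case so a packaging script can stop on it:

```shell
#!/bin/sh
# Print Y if GA's config.h contains "HAVE_SCALAPACK 1" (same pattern as the
# awk test in makefile.h); otherwise print nothing and return 1.
# Takes the path to config.h as $1, defaulting to the NWChem tree layout.
ga_has_scalapack() {
    config_h="${1:-$NWCHEM_TOP/src/tools/build/config.h}"
    if grep -q 'HAVE_SCALAPACK 1' "$config_h"; then
        echo Y
    else
        return 1
    fi
}
```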

Comment 19 marcindulak 2017-11-27 19:59:28 UTC
Does GA perform this switch dynamically? I mean: I start nwchem, GA discovers that a .so is missing or mismatched, and switches to Peigs?
Is there a way to see this happens somewhere in nwchem output?

Comment 20 Edoardo Apra 2017-11-27 20:04:50 UTC
You cannot see it in the output of a NWChem run at this time, only during the build phase.
Would you like to have a print option that writes this information at the beginning of an NWChem output?

Comment 21 marcindulak 2017-11-27 20:59:51 UTC
Yes, and in addition, if scalapack is used some summary of blacs context.

Comment 22 Edoardo Apra 2017-11-28 03:47:00 UTC
Let me know if this is what you are looking for.
(I am not sure I understand your question about BLACS, please provide more details about it)

          Job information
           ---------------

    hostname        = lagrange
    program         = /tmp/nwchem/bin/LINUX64/nwchem
    date            = Mon Nov 27 19:44:56 2017

    compiled        = Mon_Nov_27_19:44:50_2017
    source          = /tmp/nwchem
    nwchem branch   = Development
    nwchem revision = nwchem_on_git-7-gb15e696289a21bd0a3c2e2bda421a2539db39847
    ga revision     = nwchem_on_git-7-gb15e696
    use scalapack   = F
    input           = nwchem.nw
    prefix          = h2o.
    data base       = ./h2o.db
    status          = startup
    nproc           =        1
    time left       =     -1s
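With a line like this in the job header (it ships in the 6.8 beta3 tarball and later), the packaging test requested earlier can fail outright when ScaLAPACK is not in use. A sketch assuming the `use scalapack = T|F` layout shown above, with `T` printed when ScaLAPACK is enabled (Fortran logical style, an assumption not confirmed in this thread); both function names are hypothetical:

```shell
#!/bin/sh
# Print the value (T or F) of the "use scalapack" field in an NWChem output.
scalapack_flag() {
    awk -F'=' '/^[ \t]*use scalapack[ \t]*=/ { gsub(/[ \t]/, "", $2); print $2; exit }' "$1"
}

# Return 0 only if the output reports "use scalapack = T".
require_scalapack() {
    [ "$(scalapack_flag "$1")" = "T" ]
}
```

Usage: `require_scalapack dft_siosi3.out || exit 1` after a test run, where the output file name is only an example.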

Comment 23 marcindulak 2017-11-28 08:27:33 UTC
I mean nprow/npcol https://info.gwdg.de/wiki/doku.php?id=wiki:hpc:scalapack
and also the block size http://www.netlib.org/utk/papers/scalapack/node19.html
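For reference on what such a summary might contain: a common rule of thumb for ScaLAPACK is a process grid that is as close to square as possible, with nprow * npcol equal to the number of processes. The sketch below picks such a grid; it only illustrates that rule of thumb, it is not the choice GA or NWChem actually makes, `square_grid` is a hypothetical name, and it does not address the block size:

```shell
#!/bin/sh
# Print "nprow npcol" with nprow * npcol = nproc and nprow the largest
# divisor of nproc that is <= sqrt(nproc), so nprow <= npcol.
square_grid() {
    nproc=$1
    nprow=1
    i=1
    while [ $((i * i)) -le "$nproc" ]; do
        if [ $((nproc % i)) -eq 0 ]; then
            nprow=$i
        fi
        i=$((i + 1))
    done
    echo "$nprow $((nproc / nprow))"
}
```

For example, `square_grid 6` gives `2 3` and `square_grid 16` gives `4 4`; a prime process count degenerates to a 1 x N grid.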

I imagine GA (?) will perform a heuristic choice of these based on the matrix size?

But "use scalapack" is already good.

Comment 24 Edoardo Apra 2017-11-29 02:12:30 UTC
Here is the URL for the new nwchem-6.8 beta3 tarball. It contains the "use scalapack" printout shown earlier.

https://github.com/nwchemgit/nwchem/releases/tag/v6.8-beta.3

Comment 25 Edoardo Apra 2017-12-14 21:55:23 UTC
Marcin/Henrique
NWChem 6.8 is now available for download

https://github.com/nwchemgit/nwchem/releases/tag/v6.8-release

Comment 26 marcindulak 2018-05-06 11:27:16 UTC
Still broken in F28, see bug #1573918. NWChem needs rebuilding.

Comment 27 Henrique C. S. Junior 2018-05-07 07:45:09 UTC
The discussion about providing a flatpak has started in NWChem's GitHub. Maybe this is the best way to make the package easier to distribute for everyone.

Comment 28 marcindulak 2018-06-09 07:41:46 UTC
Rebuilding the 6.6.27746 version still results in segfaults on Fedora 28:
https://koji.fedoraproject.org/koji/taskinfo?taskID=27497158

and updating the spec to the latest nwchem 6.8 is blocked by https://bugzilla.redhat.com/show_bug.cgi?id=1432661

Comment 29 Edoardo Apra 2018-06-11 20:37:16 UTC
NWChem 6.6.27746: this is likely because, while the NWChem build uses 32-bit integers for BLAS (BLAS_SIZE=4), the GA library from the Fedora 28 RPM was compiled with BLAS_SIZE=8 (the default value, confirmed after inspecting ga.spec in ga-5.6.1-2.fc28.src.rpm).
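To see why an integer-width mismatch can corrupt values rather than fail cleanly, consider integers written as 4 bytes and read back as 8 bytes. The sketch below only illustrates that effect with `od`; it is not a reproduction of the NWChem/GA failure, and it assumes a little-endian host such as x86_64:

```shell
#!/bin/sh
# Illustration only: the same bytes read with 4-byte vs 8-byte integers.
f=$(mktemp)

# Two consecutive 32-bit integers, 2 and 3, as little-endian bytes,
# the way code using 4-byte integers would store them.
printf '\002\000\000\000\003\000\000\000' > "$f"

# Read with the intended width: two 4-byte integers, 2 and 3.
od -An -t d4 "$f"

# Read as code built for 8-byte integers would: one value,
# 3 * 2^32 + 2 = 12884901890 on a little-endian host.
od -An -t d8 "$f"

rm -f "$f"
```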

Since the current NWChem 6.8 build is blocked by the lack of compatible GA RPMs, what prevents you from using the bundled GA shipped with NWChem 6.8?

By the way, a number of fixes released after NWChem 6.8 became available are in the hotfix/release-6-8 branch (aka 6.8.1).

Comment 30 marcindulak 2018-06-12 07:54:13 UTC
I've asked fedora-devel about the possibility of using external ga source for the nwchem build

https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/thread/KUYUMGYUM73TGXBCSYLBKDPN3IHXLLOW/#KUYUMGYUM73TGXBCSYLBKDPN3IHXLLOW

Comment 31 Fedora Update System 2018-07-02 11:01:32 UTC
nwchem-6.8.1-4.fc28 has been submitted as an update to Fedora 28. https://bodhi.fedoraproject.org/updates/FEDORA-2018-dccd733c0a

Comment 32 Fedora Update System 2018-07-03 17:53:21 UTC
nwchem-6.8.1-4.fc28 has been pushed to the Fedora 28 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-dccd733c0a

Comment 33 Fedora Update System 2018-07-04 18:43:41 UTC
nwchem-6.8.1-4.fc28 has been pushed to the Fedora 28 stable repository. If problems still persist, please make note of it in this bug report.

