Bug 2137389 - gpaw openmpi tests fail with no output
Summary: gpaw openmpi tests fail with no output
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: gpaw
Version: rawhide
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: marcindulak
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On: 2141137 2142304
Blocks:
Reported: 2022-10-24 18:13 UTC by marcindulak
Modified: 2023-01-19 20:33 UTC

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2023-01-19 20:33:20 UTC
Type: Bug
Embargoed:



Description marcindulak 2022-10-24 18:13:06 UTC
Description of problem:

Compare the failed f38 build https://koji.fedoraproject.org/koji/buildinfo?buildID=2074798 with the working f37 build https://koji.fedoraproject.org/koji/buildinfo?buildID=2074799. In the failed f38 case, build.log contains the following; the pytest output stops at "collecting ...":

```
...
+ timeout --preserve-status --kill-after 10 1800 time mpiexec -np 2 pytest -m ci -v
+ tee gpaw-test2_openmpi.log
============================= test session starts ==============================
platform linux -- Python 3.11.0rc2, pytest-7.1.3, pluggy-1.0.0 -- /usr/bin/python3
cachedir: .pytest_cache
rootdir: /builddir/build/BUILD/gpaw-22.8.0/python3, configfile: pytest.ini, testpaths: gpaw
collecting ... ============================= test session starts ==============================
platform linux -- Python 3.11.0rc2, pytest-7.1.3, pluggy-1.0.0 -- /usr/bin/python3
cachedir: .pytest_cache
rootdir: /builddir/build/BUILD/gpaw-22.8.0/python3, configfile: pytest.ini, testpaths: gpaw
collecting ... ~/build/BUILD/gpaw-22.8.0
...
```

How reproducible: in koji


Additional info:

Between the two builds, openmpi differs only at the spec release level (4.1.4-4.fc37 vs 4.1.4-5.fc38). Other packages show similar release-only changes, but openssh has a major version change (8.8p1-7.fc37 vs 9.0p1-6.fc38).
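To tell release-only rebuilds apart from real version bumps when comparing the two buildroots, a helper along these lines might be useful. This is only a sketch: `split_vr` and `classify_change` are hypothetical names, the package list is limited to the two packages mentioned above, and the dist tag is assumed to have the usual `.fcNN` form.

```python
import re


def split_vr(vr):
    """Split 'version-release.fcNN' into (version, release without dist tag)."""
    version, _, release = vr.rpartition("-")
    # Assumption: the dist tag is a trailing '.fcNN' on the release field.
    release = re.sub(r"\.fc\d+$", "", release)
    return version, release


def classify_change(old_vr, new_vr):
    """Return 'version', 'release' or 'same' for a pair of version-release strings."""
    old_version, old_release = split_vr(old_vr)
    new_version, new_release = split_vr(new_vr)
    if old_version != new_version:
        return "version"
    if old_release != new_release:
        return "release"
    return "same"


# Version-release strings quoted in this report (f37 vs f38).
packages = {
    "openmpi": ("4.1.4-4.fc37", "4.1.4-5.fc38"),
    "openssh": ("8.8p1-7.fc37", "9.0p1-6.fc38"),
}

for name, (old, new) in packages.items():
    print(name, classify_change(old, new))
```

Packages reported as "version" would be the first candidates to look at; a "release" change can still matter, since a rebuild may pick up different dependencies.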

The RPMs need to be fetched onto a rawhide (f38) system so that the gpaw openmpi tests can be debugged locally.
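For local debugging it may help to distinguish a hang from an ordinary failure without waiting for the full 1800 seconds used in the spec. The wrapper below is a minimal sketch using only the Python standard library; `run_with_timeout` is a hypothetical helper, not part of gpaw or the spec file.

```python
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (status, output); status is 'ok', 'failed' or 'hung'."""
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into stdout, like the build log
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired as exc:
        # Output captured before the timeout may be bytes or None.
        out = exc.stdout or ""
        if isinstance(out, bytes):
            out = out.decode(errors="replace")
        return "hung", out
    status = "ok" if proc.returncode == 0 else "failed"
    return status, proc.stdout


# Example call, mirroring the command from the build log with a shorter limit:
# status, output = run_with_timeout(
#     ["mpiexec", "-np", "2", "pytest", "-m", "ci", "-v"], 300)
# print(status)
# print(output[-2000:])
```

Note that on timeout `subprocess.run` kills only the direct child process, so MPI ranks started by mpiexec may need to be cleaned up separately.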

Comment 1 marcindulak 2022-11-12 11:25:57 UTC
With the libfabric-1.16.1-3.fc38 update from bug #2141137, the x86_64, aarch64, ppc64le and s390x builds no longer hang in mpiexec, but the i686 build fails with a segmentation fault: https://koji.fedoraproject.org/koji/taskinfo?taskID=94085441

```
timeout --preserve-status --kill-after 10 1800 time mpiexec -np 2 pytest -m ci -v
...
rootdir: /builddir/build/BUILD/gpaw-22.8.0/python3, configfile: pytest.ini, testpaths: gpaw
collecting ... --------------------------------------------------------------------------
A system call failed during shared memory initialization that should
not have.  It is likely that your MPI job will now either abort or
experience performance degradation.
  Local host:  642e8feb2ec5460ca024ce70a0cd26fb
  System call: mmap(2) 
  Error:       Bad address (errno 14)
--------------------------------------------------------------------------
Fatal Python error: Segmentation fault
Current thread 0xf7874700 (most recent call first):
  File "/builddir/build/BUILD/gpaw-22.8.0/python3/gpaw/broadcast_imports.py", line 52 in marshal_broadcast
  File "/builddir/build/BUILD/gpaw-22.8.0/python3/gpaw/broadcast_imports.py", line 141 in broadcast
  File "/builddir/build/BUILD/gpaw-22.8.0/python3/gpaw/broadcast_imports.py", line 162 in disable
  File "/builddir/build/BUILD/gpaw-22.8.0/python3/gpaw/broadcast_imports.py", line 172 in __exit__
  File "/builddir/build/BUILD/gpaw-22.8.0/python3/gpaw/__init__.py", line 50 in <module>
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap_external>", line 940 in exec_module
  File "<frozen importlib._bootstrap>", line 690 in _load_unlocked
  File "<frozen importlib._bootstrap>", line 1149 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1178 in _find_and_load
  File "<frozen importlib._bootstrap>", line 1206 in _gcd_import
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1128 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1178 in _find_and_load
  File "<frozen importlib._bootstrap>", line 1206 in _gcd_import
  File "<frozen importlib._bootstrap>", line 241 in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1128 in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 1178 in _find_and_load
  File "<frozen importlib._bootstrap>", line 1206 in _gcd_import
  File "/usr/lib/python3.11/importlib/__init__.py", line 126 in import_module
  File "/usr/lib/python3.11/site-packages/_pytest/pathlib.py", line 533 in import_path
  File "/usr/lib/python3.11/site-packages/_pytest/config/__init__.py", line 607 in _importconftest
  File "/usr/lib/python3.11/site-packages/_pytest/config/__init__.py", line 575 in _getconftestmodules
  File "/usr/lib/python3.11/site-packages/_pytest/main.py", line 539 in gethookproxy
  File "/usr/lib/python3.11/site-packages/_pytest/main.py", line 575 in _collectfile
  File "/usr/lib/python3.11/site-packages/_pytest/main.py", line 723 in collect
  File "/usr/lib/python3.11/site-packages/_pytest/runner.py", line 369 in <lambda>
  File "/usr/lib/python3.11/site-packages/_pytest/runner.py", line 338 in from_call
  File "/usr/lib/python3.11/site-packages/_pytest/runner.py", line 369 in pytest_make_collect_report
  File "/usr/lib/python3.11/site-packages/pluggy/_callers.py", line 39 in _multicall
  File "/usr/lib/python3.11/site-packages/pluggy/_manager.py", line 80 in _hookexec
  File "/usr/lib/python3.11/site-packages/pluggy/_hooks.py", line 265 in __call__
  File "/usr/lib/python3.11/site-packages/_pytest/runner.py", line 537 in collect_one_node
  File "/usr/lib/python3.11/site-packages/_pytest/main.py", line 643 in perform_collect
  File "/usr/lib/python3.11/site-packages/_pytest/main.py", line 332 in pytest_collection
  File "/usr/lib/python3.11/site-packages/pluggy/_callers.py", line 39 in _multicall
  File "/usr/lib/python3.11/site-packages/pluggy/_manager.py", line 80 in _hookexec
  File "/usr/lib/python3.11/site-packages/pluggy/_hooks.py", line 265 in __call__
  File "/usr/lib/python3.11/site-packages/_pytest/main.py", line 321 in _main
  File "/usr/lib/python3.11/site-packages/_pytest/main.py", line 268 in wrap_session
  File "/usr/lib/python3.11/site-packages/_pytest/main.py", line 315 in pytest_cmdline_main
  File "/usr/lib/python3.11/site-packages/pluggy/_callers.py", line 39 in _multicall
  File "/usr/lib/python3.11/site-packages/pluggy/_manager.py", line 80 in _hookexec
  File "/usr/lib/python3.11/site-packages/pluggy/_hooks.py", line 265 in __call__
  File "/usr/lib/python3.11/site-packages/_pytest/config/__init__.py", line 164 in main
  File "/usr/lib/python3.11/site-packages/_pytest/config/__init__.py", line 187 in console_main
  File "/usr/bin/pytest", line 8 in <module>
```