Bug 1801313 - [abrt] clblast-tuners: clblast::CLCudaAPIError::Check(): clblast_tuner_xgemm killed by SIGABRT
Summary: [abrt] clblast-tuners: clblast::CLCudaAPIError::Check(): clblast_tuner_xgemm ...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: clblast
Version: 31
Hardware: x86_64
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Jerry James
QA Contact: Fedora Extras Quality Assurance
URL: https://retrace.fedoraproject.org/faf...
Whiteboard: abrt_hash:912e02fe4a2514c2f10ba55f653...
Depends On:
Blocks:
 
Reported: 2020-02-10 15:53 UTC by David
Modified: 2020-02-27 17:30 UTC
CC List: 1 user

Fixed In Version: clblast-1.5.1-1.fc31
Clone Of:
Environment:
Last Closed: 2020-02-27 17:30:13 UTC
Type: ---
Embargoed:


Attachments
File: backtrace (25.73 KB, text/plain), 2020-02-10 15:53 UTC, David
File: cgroup (351 bytes, text/plain), 2020-02-10 15:53 UTC, David
File: core_backtrace (3.56 KB, text/plain), 2020-02-10 15:53 UTC, David
File: cpuinfo (2.33 KB, text/plain), 2020-02-10 15:53 UTC, David
File: dso_list (2.82 KB, text/plain), 2020-02-10 15:53 UTC, David
File: environ (5.95 KB, text/plain), 2020-02-10 15:53 UTC, David
File: limits (1.29 KB, text/plain), 2020-02-10 15:53 UTC, David
File: machineid (135 bytes, text/plain), 2020-02-10 15:53 UTC, David
File: maps (19.82 KB, text/plain), 2020-02-10 15:53 UTC, David
File: mountinfo (4.11 KB, text/plain), 2020-02-10 15:53 UTC, David
File: open_fds (330 bytes, text/plain), 2020-02-10 15:53 UTC, David
File: proc_pid_status (1.34 KB, text/plain), 2020-02-10 15:53 UTC, David


Links
System: Github CNugteren CLBlast issues
ID: 374
Private: 0
Priority: None
Status: open
Summary: Uncaught exception in tuner
Last Updated: 2020-02-17 23:36:03 UTC

Description David 2020-02-10 15:53:06 UTC
Description of problem:
Launched the program without access to the GPU by setting CUDA_VISIBLE_DEVICES=''. It crashed immediately.

Version-Release number of selected component:
clblast-tuners-1.5.0-3.fc31

Additional info:
reporter:       libreport-2.11.3
backtrace_rating: 4
cmdline:        clblast_tuner_xgemm
crash_function: clblast::CLCudaAPIError::Check
executable:     /usr/bin/clblast_tuner_xgemm
journald_cursor: s=d1a52455c6e14670938f25393295ed7b;i=40bcc;b=03ca2d63652a45f9a5fd1b2871c6981c;m=4141cff210;t=59e38e430f476;x=3646678d7e3aa038
kernel:         5.4.17-200.fc31.x86_64
rootdir:        /
runlevel:       N 5
type:           CCpp
uid:            1000

Truncated backtrace:
Thread no. 1 (5 frames)
 #6 clblast::CLCudaAPIError::Check at /usr/src/debug/clblast-1.5.0-3.fc31.x86_64/src/clpp11.hpp:82
 #7 clblast::Platform::NumDevices at /usr/src/debug/clblast-1.5.0-3.fc31.x86_64/src/clpp11.hpp:191
 #8 clblast::Device::Device at /usr/src/debug/clblast-1.5.0-3.fc31.x86_64/src/clpp11.hpp:237
 #9 clblast::Tuner<float>(int, char**, int, std::function<clblast::TunerDefaults (int)>, std::function<clblast::TunerSettings (int, clblast::Arguments<float> const&)>, std::function<void (int, clblast::Arguments<float> const&)>, std::function<std::vector<clblast::Constraint, std::allocator<clblast::Constraint> > (int)>, std::function<clblast::LocalMemSizeInfo (int)>, std::function<void (int, clblast::Kernel&, clblast::Arguments<float> const&, std::vector<clblast::Buffer<float>, std::allocator<clblast::Buffer<float> > >&)>) at /usr/include/c++/9/bits/std_function.h:685
 #10 StartVariation<1> at /usr/include/c++/9/new:174

Comment 1 David 2020-02-10 15:53:10 UTC
Created attachment 1662177
File: backtrace

Comment 2 David 2020-02-10 15:53:11 UTC
Created attachment 1662178
File: cgroup

Comment 3 David 2020-02-10 15:53:13 UTC
Created attachment 1662179
File: core_backtrace

Comment 4 David 2020-02-10 15:53:14 UTC
Created attachment 1662180
File: cpuinfo

Comment 5 David 2020-02-10 15:53:15 UTC
Created attachment 1662181
File: dso_list

Comment 6 David 2020-02-10 15:53:16 UTC
Created attachment 1662182
File: environ

Comment 7 David 2020-02-10 15:53:18 UTC
Created attachment 1662183
File: limits

Comment 8 David 2020-02-10 15:53:19 UTC
Created attachment 1662184
File: machineid

Comment 9 David 2020-02-10 15:53:21 UTC
Created attachment 1662185
File: maps

Comment 10 David 2020-02-10 15:53:22 UTC
Created attachment 1662186
File: mountinfo

Comment 11 David 2020-02-10 15:53:24 UTC
Created attachment 1662187
File: open_fds

Comment 12 David 2020-02-10 15:53:25 UTC
Created attachment 1662188
File: proc_pid_status

Comment 13 Jerry James 2020-02-10 21:19:21 UTC
It looks like the tuning code tried to count the number of CUDA/OpenCL devices on the system. When the count came out zero, it threw an exception, CLCudaAPIError. Nothing catches that exception, so the C++ runtime fell back to its default handling of an uncaught exception: std::terminate, whose default handler calls std::abort and raises SIGABRT.

I have asked upstream to catch that exception. In the meantime, all I can suggest is that you don't try to tune 0 devices. :-) Thanks for the report.

Comment 14 Fedora Update System 2020-02-19 16:20:59 UTC
FEDORA-2020-ea0c27ac1b has been submitted as an update to Fedora 31. https://bodhi.fedoraproject.org/updates/FEDORA-2020-ea0c27ac1b

Comment 15 Fedora Update System 2020-02-20 05:45:18 UTC
clblast-1.5.1-1.fc31 has been pushed to the Fedora 31 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-ea0c27ac1b

Comment 16 Fedora Update System 2020-02-27 17:30:13 UTC
clblast-1.5.1-1.fc31 has been pushed to the Fedora 31 stable repository. If problems still persist, please make note of it in this bug report.

