Bug 1801313

Summary: [abrt] clblast-tuners: clblast::CLCudaAPIError::Check(): clblast_tuner_xgemm killed by SIGABRT
Product: [Fedora] Fedora Reporter: David <davidmenhur>
Component: clblast Assignee: Jerry James <loganjerry>
Status: CLOSED ERRATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 31 CC: loganjerry
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Unspecified   
URL: https://retrace.fedoraproject.org/faf/reports/bthash/317ded0c879fc38968df04fcbae07b9bd7a6392a
Whiteboard: abrt_hash:912e02fe4a2514c2f10ba55f65355b6827889a5f;
Fixed In Version: clblast-1.5.1-1.fc31 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-02-27 17:30:13 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description              Flags
File: backtrace          none
File: cgroup             none
File: core_backtrace     none
File: cpuinfo            none
File: dso_list           none
File: environ            none
File: limits             none
File: machineid          none
File: maps               none
File: mountinfo          none
File: open_fds           none
File: proc_pid_status    none

Description David 2020-02-10 15:53:06 UTC
Description of problem:
Launched the program with no GPU visible to it, by setting CUDA_VISIBLE_DEVICES=''. It crashed immediately.

Version-Release number of selected component:
clblast-tuners-1.5.0-3.fc31

Additional info:
reporter:       libreport-2.11.3
backtrace_rating: 4
cmdline:        clblast_tuner_xgemm
crash_function: clblast::CLCudaAPIError::Check
executable:     /usr/bin/clblast_tuner_xgemm
journald_cursor: s=d1a52455c6e14670938f25393295ed7b;i=40bcc;b=03ca2d63652a45f9a5fd1b2871c6981c;m=4141cff210;t=59e38e430f476;x=3646678d7e3aa038
kernel:         5.4.17-200.fc31.x86_64
rootdir:        /
runlevel:       N 5
type:           CCpp
uid:            1000

Truncated backtrace:
Thread no. 1 (5 frames)
 #6 clblast::CLCudaAPIError::Check at /usr/src/debug/clblast-1.5.0-3.fc31.x86_64/src/clpp11.hpp:82
 #7 clblast::Platform::NumDevices at /usr/src/debug/clblast-1.5.0-3.fc31.x86_64/src/clpp11.hpp:191
 #8 clblast::Device::Device at /usr/src/debug/clblast-1.5.0-3.fc31.x86_64/src/clpp11.hpp:237
 #9 clblast::Tuner<float>(int, char**, int, std::function<clblast::TunerDefaults (int)>, std::function<clblast::TunerSettings (int, clblast::Arguments<float> const&)>, std::function<void (int, clblast::Arguments<float> const&)>, std::function<std::vector<clblast::Constraint, std::allocator<clblast::Constraint> > (int)>, std::function<clblast::LocalMemSizeInfo (int)>, std::function<void (int, clblast::Kernel&, clblast::Arguments<float> const&, std::vector<clblast::Buffer<float>, std::allocator<clblast::Buffer<float> > >&)>) at /usr/include/c++/9/bits/std_function.h:685
 #10 StartVariation<1> at /usr/include/c++/9/new:174

Comment 1 David 2020-02-10 15:53:10 UTC
Created attachment 1662177 [details]
File: backtrace

Comment 2 David 2020-02-10 15:53:11 UTC
Created attachment 1662178 [details]
File: cgroup

Comment 3 David 2020-02-10 15:53:13 UTC
Created attachment 1662179 [details]
File: core_backtrace

Comment 4 David 2020-02-10 15:53:14 UTC
Created attachment 1662180 [details]
File: cpuinfo

Comment 5 David 2020-02-10 15:53:15 UTC
Created attachment 1662181 [details]
File: dso_list

Comment 6 David 2020-02-10 15:53:16 UTC
Created attachment 1662182 [details]
File: environ

Comment 7 David 2020-02-10 15:53:18 UTC
Created attachment 1662183 [details]
File: limits

Comment 8 David 2020-02-10 15:53:19 UTC
Created attachment 1662184 [details]
File: machineid

Comment 9 David 2020-02-10 15:53:21 UTC
Created attachment 1662185 [details]
File: maps

Comment 10 David 2020-02-10 15:53:22 UTC
Created attachment 1662186 [details]
File: mountinfo

Comment 11 David 2020-02-10 15:53:24 UTC
Created attachment 1662187 [details]
File: open_fds

Comment 12 David 2020-02-10 15:53:25 UTC
Created attachment 1662188 [details]
File: proc_pid_status

Comment 13 Jerry James 2020-02-10 21:19:21 UTC
It looks like the tuning code tried to count the number of CUDA/OpenCL devices on the system.  When the count came out zero, it threw an exception, CLCudaAPIError.  Nothing catches that exception, so you got the default C++ abort due to an uncaught exception.

I have asked upstream to please catch that exception.  In the meantime, all I can suggest is that you don't try to tune 0 devices. :-)  Thanks for the report.
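
For illustration only, here is a minimal sketch of what "catching that exception" could look like. It is not the upstream fix and does not use CLBlast's real classes: DeviceQueryError, CountDevices, and RunTunerSafely are hypothetical names standing in for the error type, the device count query, and the tuner entry point. The idea is just that a failed device query becomes an error message and a non-zero exit code rather than an abort from an uncaught exception.

```cpp
#include <cstddef>
#include <iostream>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the library's error type (NOT CLBlast's real class).
class DeviceQueryError : public std::runtime_error {
 public:
  explicit DeviceQueryError(const std::string& what) : std::runtime_error(what) {}
};

// Hypothetical device count query. It throws when no device is visible,
// mimicking the failure path seen in the backtrace.
std::size_t CountDevices(std::size_t visible_devices) {
  if (visible_devices == 0) {
    throw DeviceQueryError("no OpenCL/CUDA devices found");
  }
  return visible_devices;
}

// Hypothetical tuner entry point. The try/catch turns the exception into
// a diagnostic and a non-zero return value, so the process never reaches
// std::terminate.
int RunTunerSafely(std::size_t visible_devices) {
  try {
    const std::size_t n = CountDevices(visible_devices);
    std::cout << "Tuning on " << n << " device(s)\n";
    return 0;
  } catch (const DeviceQueryError& e) {
    std::cerr << "Error: " << e.what() << "\n";
    return 1;
  }
}
```

A program's main() could return the result of RunTunerSafely directly, so running with zero visible devices would exit with status 1 rather than being killed by SIGABRT.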

Comment 14 Fedora Update System 2020-02-19 16:20:59 UTC
FEDORA-2020-ea0c27ac1b has been submitted as an update to Fedora 31. https://bodhi.fedoraproject.org/updates/FEDORA-2020-ea0c27ac1b

Comment 15 Fedora Update System 2020-02-20 05:45:18 UTC
clblast-1.5.1-1.fc31 has been pushed to the Fedora 31 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2020-ea0c27ac1b

Comment 16 Fedora Update System 2020-02-27 17:30:13 UTC
clblast-1.5.1-1.fc31 has been pushed to the Fedora 31 stable repository. If problems still persist, please make note of it in this bug report.