Bug 1897347

Summary: Calling lapack subroutines does not work after upgrade to Fedora 33 (from 32)
Product: [Fedora] Fedora
Reporter: Ivan <ivan.balog>
Component: lapack
Assignee: Tom "spot" Callaway <spotrh>
Status: CLOSED WORKSFORME
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: unspecified
Version: 33
CC: frantisek.kluknavsky, spotrh
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Last Closed: 2021-02-23 15:37:49 UTC
Type: Bug

Description Ivan 2020-11-12 20:33:57 UTC
Description of problem:

After upgrading from Fedora 32 to Fedora 33, calling the LAPACK subroutine "dgetrf" from a Fortran program produces a segmentation fault at run time.


Version-Release number of selected component (if applicable):

Related packages:

gcc-gfortran.x86_64                    10.2.1-6.fc33 
lapack.x86_64                        3.9.0-5.fc33
kernel.x86_64                      5.8.18-300.fc33

Interestingly, when I run exactly the same program in a "toolbox" container with a Fedora 32 image, with exactly the same gfortran and lapack and kernel version 5.8.18-200.fc32, the program runs as expected.

How reproducible:

Happens always


Steps to Reproduce:

1. Take this simple f90 program:

``
program fin_diff_mtx

implicit none

integer(4) :: m
parameter(m=3)
real(8) :: c(1:2*m,-m:m,-m:m)

real(8) :: a(-m:m,-m:m)
integer(4) :: i,j,k!,n
integer(4) :: jfact
integer(4) :: lworkin 

parameter(lworkin=50)

real(8) :: workin(lworkin)
integer(4) :: infoin
integer(4) :: ipivin(2*m+1)

!n=2*m+1

do k=0,2*m

   a(:,:)=0.d0

   do i=0,2*m

      a(i-m,-m)=1.d0
      jfact=1

      do j=1,2*m

         jfact=jfact*j
         a(i-m,j-m)=(dfloat(i-k)**dfloat(j))/jfact

      end do
   end do

   !invert a for each k
   call dgetrf(2*m+1,2*m+1,a,2*m+1,ipivin,infoin)  
   call dgetri(2*m+1,a,2*m+1,ipivin,workin,lworkin,infoin)

   c(1:2*m,k-m,-m:m)=a(1-m:m,-m:m)

end do

end program fin_diff_mtx
"

2. Compile it on Fedora 33 with 

``
gfortran -g -o fin_diff fin_diff_mtx.f90 -llapack -lblas
"

3. Run it and watch it crash when it calls lapack subroutines dgetrf and dgetri.
This is what the console says:

``
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0x7f5f61c59c4f in ???
#1  0x7f5f622548b0 in ???
#2  0x7f5f62470924 in ???
#3  0x7f5f624706a0 in ???
#4  0x7f5f624706a0 in ???
#5  0x7f5f624706a0 in ???
#6  0x7f5f624706a0 in ???
#7  0x7f5f624706a0 in ???
#8  0x7f5f624706a0 in ???
#9  0x7f5f62470c44 in ???
#10  0x4012b5 in fin_diff_mtx
	at fin_diff_mtx.f90:64
#11  0x4013bd in main
	at fin_diff_mtx.f90:71
Segmentation fault (core dumped)
"  

4. Repeat steps 1-3 on Fedora 32 with the same versions of the relevant components and see that the program finishes without error.
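
One way to compare the two environments is to list which BLAS/LAPACK shared objects the compiled binary resolves at run time. The following is a minimal sketch and not part of the original report: the `filter_blas` helper and the `BIN` variable are hypothetical names, `./fin_diff` is the output name from step 2, and the actual output depends on the machine.

```shell
#!/bin/sh
# Sketch: show which BLAS/LAPACK shared objects a binary resolves at run time.
# BIN is a hypothetical variable; it defaults to the binary built in step 2.
BIN="${BIN:-./fin_diff}"

# filter_blas keeps only the lines that mention a BLAS- or LAPACK-like library
# (this also matches names such as libflexiblas or libopenblas).
filter_blas() {
    grep -Ei 'lapack|blas' || true
}

if [ -x "$BIN" ] && command -v ldd >/dev/null 2>&1; then
    ldd "$BIN" | filter_blas
else
    echo "skipping: $BIN not found or ldd unavailable"
fi
```

Running this on both the Fedora 33 host and inside the Fedora 32 container would show whether the same library files are being picked up in each case.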

Comment 1 Tom "spot" Callaway 2020-11-13 16:10:35 UTC
Hmm. I can't reproduce this on my new Fedora 33 x86_64 install (running bare metal with all updates applied):

[spot@localhost sandbox]$ uname -a
Linux localhost.localdomain 5.8.18-300.fc33.x86_64 #1 SMP Mon Nov 2 19:09:05 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
[spot@localhost sandbox]$ rpm -q kernel gcc-gfortran lapack blas
kernel-5.8.15-301.fc33.x86_64
kernel-5.8.18-300.fc33.x86_64
gcc-gfortran-10.2.1-6.fc33.x86_64
lapack-3.9.0-5.fc33.x86_64
blas-3.9.0-5.fc33.x86_64
[spot@localhost sandbox]$ gfortran -g -o fin_diff fin_diff_mtx.f90 -llapack -lblas
[spot@localhost sandbox]$ ./fin_diff 
[spot@localhost sandbox]$

Comment 2 Ivan 2021-02-19 09:44:22 UTC
Thanks for the effort. No wonder you could not reproduce it: as far as I can tell from searching the
net, my two (completely different) computers seem to be the only two machines in the universe with
this problem.

Anyway, I have found a workaround and maybe a clue as to what went wrong. Apparently
F33 introduced FlexiBLAS as a replacement for both LAPACK and BLAS. Although LAPACK and BLAS nominally
still work on F33, maybe some linker configuration file was changed enough to cause a problem?

Installing FlexiBLAS and linking against it at compile time seems to produce a program
that works on the machine where linking to LAPACK and BLAS separately does not.
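
The exact command used for the workaround is not quoted in this report. As a hedged sketch, assuming the FlexiBLAS development files are installed (on Fedora this is believed to be the `flexiblas-devel` package) and that the library is linked with `-lflexiblas`, the compile step might look like the following. The `link_flags` helper and the `SRC`, `OUT` and `CMD` variables are hypothetical names used only for illustration.

```shell
#!/bin/sh
# Sketch: build the reproducer against FlexiBLAS instead of -llapack -lblas.
# Assumption: -lflexiblas is the link flag; the report does not state it.
SRC="fin_diff_mtx.f90"
OUT="fin_diff"

# link_flags prints the FlexiBLAS flag when asked for it,
# otherwise the original pair of flags from the report.
link_flags() {
    case "$1" in
        flexiblas) echo "-lflexiblas" ;;
        *)         echo "-llapack -lblas" ;;
    esac
}

CMD="gfortran -g -o $OUT $SRC $(link_flags flexiblas)"
echo "$CMD"

# Only attempt the build when both the compiler and the source are present.
if command -v gfortran >/dev/null 2>&1 && [ -f "$SRC" ]; then
    $CMD || echo "build failed; are the FlexiBLAS development files installed?"
fi
```

Calling `link_flags` with any other argument prints the original `-llapack -lblas` pair, which makes it easy to rebuild both variants and compare them on the affected machine.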

Comment 3 Tom "spot" Callaway 2021-02-23 15:37:49 UTC
Very weird! I'm glad you found a working solution. If you have this problem again, feel free to reopen.