Bug 1709538 - Incorrect code generation, stack overwrite with devtoolset-8-gcc-8.3.1-3.el6.x86_64 - works with devtoolset-8-gcc-8.2.1-7.el6.x86_64
Summary: Incorrect code generation, stack overwrite with devtoolset-8-gcc-8.3.1-3.el6....
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Developer Toolset
Classification: Red Hat
Component: gcc
Version: DTS 8.0 RHEL 6
Hardware: All
OS: Linux
medium
medium
Target Milestone: alpha
: 8.1
Assignee: Marek Polacek
QA Contact: Alexandra Petlanová Hájková
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2019-05-13 18:45 UTC by Paulo Andrade
Modified: 2019-12-10 07:49 UTC (History)
11 users (show)

Fixed In Version: devtoolset-9-gcc-9.1.1-1.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-12-10 07:49:25 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
dtrtri.f (3.73 KB, text/plain)
2019-05-13 18:45 UTC, Paulo Andrade
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 4148061 0 Troubleshoot None devtoolset-8 has a serious bug that causes a failure in a R package (RxODE) 2019-05-15 15:26:40 UTC
Red Hat Product Errata RHEA-2019:4134 0 None None None 2019-12-10 07:49:45 UTC

Description Paulo Andrade 2019-05-13 18:45:05 UTC
Created attachment 1568074 [details]
dtrtri.f

# scl enable devtoolset-8 bash
# /opt/rh/devtoolset-8/root/usr/bin/gfortran  -fopenmp -fpic  -O2 -g  -c dtrtri.f -o dtrtri.o

  Checking all options enabled with -O2, it will not generate this
pattern:

 270:   48 c7 84 24 b8 00 00    movq   $0x1,0xb8(%rsp)
 277:   00 01 00 00 00 
 27c:   4c 89 ee                mov    %r13,%rsi
 27f:   4c 89 f7                mov    %r14,%rdi
 282:   48 c7 84 24 b0 00 00    movq   $0x1,0xb0(%rsp)

only with -fno-optimize-sibling-calls

  It works with -O1 or -O0.

  The problem appears to be that it only saves this stack space:

  1d:   48 83 ec 78             sub    $0x78,%rsp

  On a somewhat complex environment, in a chroot, building several third party
packages, debugging this code:

"""
template<typename eT, typename T1>
inline
bool
auxlib::inv_tr(Mat<eT>& out, const Base<eT,T1>& X, const uword layout)
  {
  arma_extra_debug_sigprint();

  #if defined(ARMA_USE_LAPACK)
    {
    out = X.get_ref();

    arma_debug_check( (out.is_square() == false), "inv(): given matrix must be square sized" );

    if(out.is_empty())  { return true; }

    arma_debug_assert_blas_size(out);

    char     uplo = (layout == 0) ? 'U' : 'L';
    char     diag = 'N';
    blas_int n    = blas_int(out.n_rows);
    blas_int info = 0;

    arma_extra_debug_print("lapack::trtri()");
    lapack::trtri(&uplo, &diag, &n, out.memptr(), &n, &info);

    if(info != 0)  { return false; }

    if(layout == 0)
      {
      out = trimatu(out);  // upper triangular
      }
    else
      {
      out = trimatl(out);  // lower triangular
      }

    return true;
    }
"""

I see this:

"""
Breakpoint 1, arma::auxlib::inv_tr<double, arma::Mat<double> > (layout=0, X=..., out=...) at /case_02366438/binaries/R-3.5.2/lib64/R/library/RcppArmadillo/include/armadillo_bits/auxlib_meat.hpp:235
235	    out = X.get_ref();
Missing separate debuginfos, use: debuginfo-install libgcc-4.4.7-23.el6.x86_64 libgfortran5-8.2.1-1.3.1.el6_10.x86_64 libgomp-4.4.7-23.el6.x86_64 libquadmath-7.2.1-1.2.1.el6.x86_64 libstdc++-4.4.7-23.el6.x86_64 ncurses-libs-5.7-4.20090207.el6.x86_64 readline-6.0-4.el6.x86_64
(gdb) p  &diag
$1 = 0x7ffffffd5887 ""
(gdb) p &uplo
$2 = 0x7ffffffd5886 ""
(gdb) watch *(char*)$1
Hardware watchpoint 2: *(char*)$1
(gdb) watch *(char*)$2
Hardware watchpoint 3: *(char*)$2
(gdb) c
Continuing.

Hardware watchpoint 3: *(char*)$2

Old value = 0 '\000'
New value = 85 'U'
0x00007fffebf9aad1 in arma::auxlib::inv_tr<double, arma::Mat<double> > (layout=0, X=..., out=...) at /case_02366438/binaries/R-3.5.2/lib64/R/library/RcppArmadillo/include/armadillo_bits/debug.hpp:1140
1140	arma_assert_blas_size(const T1& A)
(gdb) c
Continuing.

Hardware watchpoint 2: *(char*)$1

Old value = 0 '\000'
New value = 78 'N'
0x00007fffebf9aad6 in arma::auxlib::inv_tr<double, arma::Mat<double> > (layout=0, X=..., out=...) at /case_02366438/binaries/R-3.5.2/lib64/R/library/RcppArmadillo/include/armadillo_bits/debug.hpp:1140
1140	arma_assert_blas_size(const T1& A)
(gdb) c
Continuing.

Hardware watchpoint 2: *(char*)$1

Old value = 78 'N'
New value = 0 '\000'
dtrtri (uplo=..., diag=..., n=1, a=..., lda=1, info=0, _uplo=<optimized out>, _diag=<optimized out>) at dlapack.f:162116
162116	      END
"""

The second write to "$1" is zeroing the "diag" variable, because an out
of bounds write in the stack.

Testing on other environments with gcc 8 and checking the assembly,
the "movq $1,0xXX(%rsp)" pattern does not appear.

  The watchpoint triggers in the movq instruction.

(gdb) x/10i $rip-32
   0x7fffede7707e <dtrtri_+622>:	mov    %ebx,%eax
   0x7fffede77080 <dtrtri_+624>:	movq   $0x1,0xb8(%rsp)
   0x7fffede7708c <dtrtri_+636>:	mov    %r13,%rsi
   0x7fffede7708f <dtrtri_+639>:	mov    %r14,%rdi
   0x7fffede77092 <dtrtri_+642>:	movq   $0x1,0xb0(%rsp)
=> 0x7fffede7709e <dtrtri_+654>:	add    $0x78,%rsp
   0x7fffede770a2 <dtrtri_+658>:	pop    %rbx
   0x7fffede770a3 <dtrtri_+659>:	pop    %rbp
   0x7fffede770a4 <dtrtri_+660>:	pop    %r12
   0x7fffede770a6 <dtrtri_+662>:	pop    %r13

Comment 2 Jakub Jelinek 2019-05-14 08:12:13 UTC
I don't see any out of bounds stores in that routine, it is a normal tail call.  Both the dtrtri_ and dtrti2_ functions have actually under the hood 8 64-bit arguments (because the first two arguments have character type and thus are passed as a pointer/reference and then at the end of arguments the length of the string).  The first 6 arguments are passed on x86_64 in registers, the remaining two on the stack, so there is 16-byte argument stack area, below that 8-byte return address on the stack, below that 48 bytes where the dtrtri function saves the call saved registers and below that 120 bytes of local stack in the function.  Before performing a tail call, the function modifies the argument registers to hold whatever is needed, also updates the argument area (those are the two stored 1s in there), then updates stack pointer by that 120 bytes and pops the 48 bytes from the stack and performs the jump (tail call).

You haven't shown the definition of lapack::trtri, whether all the required arguments to Fortran dtrtri_ are actually passed (including the 1L, 1L at the end) because both character variables are single byte).  So, one possibility is that this is a bug in whatever defines the lapack:trtri.  And the second option is that it provides the arguments, but as 32-bit only, which was the gfortran ABI in GCC 7 and earlier, but in GCC 8 it changed to size_t:

See http://gcc.gnu.org/gcc-8/changes.html:
"Character variables longer than HUGE(0) elements are now possible on 64-bit targets. Note that this changes the procedure call ABI for all procedures with character arguments on 64-bit targets, as the type of the hidden character length argument has changed. The hidden character length argument is now of type INTEGER(C_SIZE_T)."

If lapack::trtri etc. is provided by some RHEL package, the bug is in there, if it is provided by something else, it needs to be fixed there.

Comment 3 Jakub Jelinek 2019-05-14 09:18:11 UTC
Also see https://gcc.gnu.org/onlinedocs/gcc-8.3.0/gfortran/Argument-passing-conventions.html where the argument passing is documented.

Comment 4 Paulo Andrade 2019-05-14 13:17:19 UTC
Thanks for confirming it is a not a gcc or gfortran bug.
I will forward the information to the authors of the related code.
Apparently it uses a single blas_int data type, that needs to be updated.

lapack::trtri is defined at 
https://gitlab.com/conradsnicta/armadillo-code/blob/9.400.x/include/armadillo_bits/wrapper_lapack.hpp#L97

Looks like the code (an R wrapper) at https://github.com/RcppCore/RcppArmadillo

needs to define ARMA_BLAS_LONG based on GCC version (or more likely not use blas_int
for the string length argument type)
https://gitlab.com/conradsnicta/armadillo-code/blob/9.400.x/include/armadillo_bits/config.hpp#L71

in its bundled version of armadillo https://github.com/RcppCore/RcppArmadillo/tree/master/inst/include/armadillo_bits

as well as upstream armadillo needs to update the code.

Either way, on a quick hack, defining ARMA_BLAS_LONG pass all tests.

Again, many thanks!

Comment 5 Jakub Jelinek 2019-05-14 13:41:20 UTC
I don't think changing what blas_int is is the correct fix, the problem is that all the calls to Fortran functions/subroutines that have CHARACTER arguments need extra arguments (and the type of those arguments needs to be then dependent on the gfortran version), for older gfortran int, for newer size_t and the callers need to be adjusted to actually pass those extra arguments.

Comment 6 Conrad S 2019-05-14 13:58:20 UTC
I'm the upstream author of Armadillo. It's not exactly clear what is the bug here.

Armadillo doesn't use any extra hidden arguments when calling Fortran functions from BLAS and LAPACK.

In Armadillo, lapack::trtri(&uplo, &diag, &n, out.memptr(), &n, &info) is defined as:

```
  template<typename eT>
  inline
  void
  trtri(char* uplo, char* diag, blas_int* n, eT* a, blas_int* lda, blas_int* info)
    {
    arma_type_check(( is_supported_blas_type<eT>::value == false ));
    
    if(is_float<eT>::value)
      {
      typedef float T;
      arma_fortran(arma_strtri)(uplo, diag, n, (T*)a, lda, info);
      }
    else
    if(is_double<eT>::value)
      {
      typedef double T;
      arma_fortran(arma_dtrtri)(uplo, diag, n, (T*)a, lda, info);
      }
    ... // more declarations for complex number versions
    }
```

In turn, arma_fortran(arma_dtrtri) is defined as:

```
void arma_fortran(arma_dtrtri)(const char* uplo, const char* diag, const blas_int* n, double* a, const blas_int* lda, blas_int* info);
```

There is no hidden variable here.
The only difference between lapack::trtri() and arma_fortran(arma_dtrtri)() is the use of const char* instead of char*.

arma_fortran() is a macro that converts the name `arma_dtrtri` to `dtrtri_` (note the underscore), for the use case of Armadillo embedded within RcppArmadillo.

So what exactly is incorrect here?

Comment 7 Conrad S 2019-05-14 14:05:52 UTC
I'm not familiar with "DTS 8.0 RHEL 6", though it seems to provide newer compilers (gcc 8) on the (rather old) RHEL 6.

Is it possible that there is a mismatch between the ABI provided by BLAS/LAPACK libraries on RHEL 6 (compiled with gcc 4.4?) and the assumed ABI by gcc 8?

Or perhaps "DTS 8.0 RHEL 6" provides its own BLAS/LAPACK libraries compiled with gcc 8 ?

Comment 8 Paulo Andrade 2019-05-14 15:03:59 UTC
  Reopening the bug.

  I got confused, due to not understanding fortran internals and the C bindings,
and assumed the two extra integer arguments were the string lengths while attempting
to locate myself :)

  This needs to be addressed by gfortran experts/developers.

  The source code dtrtri.f earlier attached is just a cut&paste from
https://raw.githubusercontent.com/wch/r-source/tags/R-3-5-2/src/modules/lapack/dlapack.f
so, I do not see clearly where is the hidden binding for the argument telling
the strings length.

  Marek, can you make it clear? Or forward to gfortran developers?

  From my understanding, the problem is now being hidden with
https://github.com/nlmixrdevelopment/RxODE/commit/c02dbadcc4a6d7d4e43408ba3fd3577a82c34b24
For extra data I see

https://github.com/nlmixrdevelopment/RxODE/issues/84 
https://github.com/RcppCore/RcppArmadillo/issues/254

  The problem happens with gcc-8.3 because of the "movq $1,0xXX(%rsp)" while other
gcc-8.x generate different code; the other workaround of using -fno-optimize-sibling-calls
is again apparently only hiding the problem, that somewhat match code generated by other
gcc-8.x.

Comment 9 Jakub Jelinek 2019-05-14 15:14:21 UTC
See the https://gcc.gnu.org/onlinedocs/gcc-8.3.0/gfortran/Argument-passing-conventions.html I've mentioned above, which explains it in detail.
The arguments are:
      SUBROUTINE DTRTRI( UPLO, DIAG, N, A, LDA, INFO )
      CHARACTER          DIAG, UPLO
      INTEGER            INFO, LDA, N
      DOUBLE PRECISION   A( LDA, * )
which the compiler passes as if it was a C function
void dtrtri_ (char *uplo, char *diag, int *n, double *a, int *lda, int *info, size_t _uplo, size_t _diag)
where those _uplo and _diag arguments are the hidden arguments described above, added for the character arguments.
Those arguments should be passed the length of the corresponding character arguments, so 1L, 1L in this case.

Comment 10 Jakub Jelinek 2019-05-14 15:20:01 UTC
From what I see in the issues you've linked, it seems the upstream of whatever C/C++ wrappers call these fortran functions doesn't know about the hidden arguments at all, the patch to disable tail calls is just a hack, the undefined behavior of not calling those functions properly is still there and can break any time.

Comment 11 Conrad S 2019-05-14 15:54:06 UTC
@Jakub Jelinek:  The problem with these "hidden variables" is that they appear completely compiler dependent.

Every example on the internet which shows how to call LAPACK functions from C or C++ has absolutely no mention of these hidden variables.

I suspect there are many other libraries and frameworks which call LAPACK functions in exactly the same fashion as Armadillo.  Without checking their source code I can't say for certain, but I suspect this would include widely used packages such as Octave, R, as well as scipy and/or numpy (Python).

It's also not clear why Armadillo "just works" on Fedora 29 (x86-64) which has g++ 8.3.  I'm tempted to think that things would be falling apart left, right and center if these hidden variables are not being taken into account.

Comment 12 Jakub Jelinek 2019-05-14 16:10:54 UTC
gfortran works that way since forever, e.g. in gfortran 4.0 when gfortran has been added to GCC.
And, from what I can see, Intel ifort does the same, have a look at:
https://fortran.godbolt.org/z/UWM5RP
The thing is, if you have character(len=*), then some kind of hidden argument is required, if you don't pass the length, then the callee doesn't know how long the character dummy argument is.
And, usually the Fortran ABIs are designed in a way that whether the length is * or explicit constant doesn't affect the ABI, so you can in the caller use len=* and in the callee some particular length or vice versa.
If you look at the assembly in the above godbolt link, f1 shows that with len=* there is really no other way but to return that hidden argument, for len=5 the compiler can optimize and ignore that passed argument, but it is still just an optimization and undefined behavior if you don't pass it.  And f3 shows that both gfortran and ifort when calling a function with len=5 character dummy argument the hidden argument is passed.

Comment 13 Jakub Jelinek 2019-05-14 16:18:11 UTC
In xlc fortran https://www-01.ibm.com/support/docview.wss?uid=swg27045454&aid=9 the same thing is documented as well:
"Whenever Fortran passes a string argument to another routine, it appends ahidden argument that provides the length of the string argument. This lengthargument must be explicitly declared in C. The C code should not assume a nullterminator; the supplied or declared length should always be used."
So, can you show a compiler that doesn't actually use hidden string lengths?

Comment 14 Conrad S 2019-05-14 16:47:01 UTC
The ABI change is rather disconcerting, as it breaks existing workflows/assumptions that were built up over decades (especially how to call BLAS and LAPACK functions from C). Worse, for users not familiar with compiler internals, it's not immediately clear how things break or why.

It is rather surprising that a lot of software worked (and seemingly continues to work) even when not taking into account these pesky hidden variables. If we take the example of Armadillo, I still can't reproduce the issue on Fedora 29.

It seems that the R development team is taking the stance of enabling "-fno-optimize-sibling-calls" to disable problematic optimisations for Fortran code.
https://github.com/RcppCore/RcppArmadillo/issues/254#issuecomment-492278697

It might be pragmatic to make -fno-optimize-sibling-calls the default when compiling Fortran code, as it appears many people are still unaware of the implications of the recent ABI change.

Comment 15 Jakub Jelinek 2019-05-14 16:57:19 UTC
That is misunderstanding what happens.  You could get the exact same problem before the ABI change, just in that case the two extra arguments don't have size_t types, but int, but could very well be modified any time, with or without -fno-optimize-sibling-calls, the argument slots are owned by the callee, not the caller and if you don't pass the arguments at all, whatever ends up on the stack at that point might be modified.  You might be lucky and nothing important is in there, or unlucky and you get crashes, silent wrong code.

The only fix is pass the hidden string length arguments, anything else might appear to work, but might break next month or year.

It is just functions/subroutines that have character dummy arguments, or functions that return character, or some dummy arguments with OPTIONAL and VALUE, or COARRAYs that have the hidden extra arguments, the rest (which is very likely most of the LAPACK etc.) don't.

Comment 16 Conrad S 2019-05-14 17:17:22 UTC
I'm not clear on "character dummy arguments" -- there are many LAPACK functions which take character arguments, and they're used for carrying information (so they're not "dummy" in the sense of being useless).

The problem with the ABI change in gfortan is two-fold:

1. RHEL 6 has been in production for a long while (since 2010) and people expect things to work as they have for the past 10 years. devtoolset-8-gcc-8.3.1 breaks this expectation. Surely RHEL is about stability, no?  What is the practical gain for creating breakages in user-code?  Similar argument applies to RHEL 7, which is supposed to last 2024.

2. Things worked seemingly without problems under previous assumptions for decades when using C/C++ to call LAPACK functions with character arguments (where each character argument holds only 1 character). Since there is no official fortran ABI, the de-facto use establishes how the functions should behave in such cases. So why break this assumption?  What is the practical gain here?  Surely there must be a way to acknowledge that "1 character" arguments are a common enough case that should work as established via de-facto use, and that special handling should be taken.

Comment 17 Jakub Jelinek 2019-05-14 18:15:13 UTC
Dummy argument is a standard fortran term (C/C++ talks just about function arguments).
The ABI change only changed the size of these hidden arguments, not that they need to be passed.
All of gfortran, ifort, xlc fortran ABIs need those arguments to be passed, it is not something specific to gfortran, or even latest gfortran.
Developer toolset is a product that allows using a newer compiler than the system compiler (or other parts of the toolchain), there was no ABI change for software compiled with the system gfortran.
De-facto all the compilers need those arguments, the fact that sometimes you are lucky doesn't prove anything.
It is the same thing as defining
void foo (int a, int b, int c, int d, int e, int f, int g, int h) { ... }
in C in one TU and declaring
void foo (int a);
in another TU and calling it foo (5);
You might be lucky if it "works", or it can crash, silently misbehave.
If you don't want to use a hidden length argument for 1 character arguments, use C binding on the Fortran side.

Comment 18 Conrad S 2019-05-14 18:21:00 UTC
Directly relevant discussion (and summary of context) at GCC Bugzilla:

Incompatibility between gfortran and C lapack calls 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90329

Quote:

I would suggest that we make -fno-optimize-sibling-calls
the default on these branches.  Maintaining binary compatibility
(even if it is bug compatibility) with existing packages is
something we should strive for, especially with such
important software packages as BLAS and LAPACK. These packages
are one important reason why people still use Fortran, and
I would hate to push them towards flang with this.

Comment 19 Conrad S 2019-05-15 02:16:58 UTC
This has the potential to affect the just released RHEL 8.

Please do not underestimate of impact of breaking de-facto assumed C interfaces to BLAS and LAPACK.

The vast majority of scientific/engineering software uses BLAS and LAPACK (and/or OpenBLAS) either directly or indirectly.  (There is large chain of dependencies on LAPACK et al in a modern Linux distro).

The same applies for probably a decent chunk of business process software that does any kind of data analysis (eg. statistical sales trends, etc). This includes software provided by ISVs for the RHEL platform.

Software in C and C++ (and likely Java) has been written over _several decades_ with the assumption that existing de-facto C interfaces to Fortran BLAS and LAPACK routines "just work".

Whether this relied on "undefined behaviour" is besides the point.  The behaviour exhibited by gcc 4.x, 5, 6 (and 7?) is in fact the de-facto correct behaviour when encountering LAPACK functions with single char variables.

It would be a reasonable assumption to state that the vast majority of C and C++ developers of user software (and probably most Fortran developers) are completely unaware of the hidden variables within the Fortran ABI.

From a user's point of view, software that uses LAPACK, and has worked without problems on RHEL 4, 5, 6, 7, is not broken. Along comes RHEL 8 and breaks things. The user would be right to think that RHEL 8 has a regression, and not their software.

Rather than sticking to the retort of "but you've relied on effectively undefined behaviour for 30 years!", I would suggest taking a more pragmatic approach and acknowledging that the de-facto assumed C interfaces are in fact what people want. In that sense, the de-facto assumed C interfaces are the "correct" ones.

There is precedent for doing this. The Linux kernel has a policy of not breaking user space, even if that means working around bugs in user space.

Quoting Linus Torvalds, via https://felipec.wordpress.com/2011/11/21/no-project-is-more-important-than-its-users/
"The biggest thing any program can do is not the technical details of the program itself; it’s how useful the program is to users. So any time any program (like the kernel or any other project), breaks the user experience, to me, that’s the absolute worst failure that a software project can make."

Comment 20 Paulo Andrade 2019-05-15 12:53:11 UTC
I believe this is basically a summary of the problem:

o Historically fortran to C wrappers used to assume variadic C functions;
o Not all abis can work as variadic and non variadic; extra arguments passed
  on registers and stack at the end of prototype;
o Fortran kind of has/use a variadic prototype, with '*';
o The spec says string length must be passed as a declared C parameter;
o The problem is not really with strings, but one character arguments. ? (question)
  If that is the case, this is another inconsistency.

The "correct" fix would be to fix all wrapper prototypes to include
the string length parameter(s), so code can be optimized, run faster, etc.

The "incorrect" fix would be to assume all wrapper declarations are variadic,
this would create incompatibility with code the follows the spec declaring the
parameter. And likely would require some kind of ugly hack somewhere between
gfortan and gcc to work on all architectures.

A third option would be for all projects using wrappers to manually mark
functions that pass hidden arguments as variadic, e.g.
void trtri(char* uplo, char* diag, blas_int* n, eT* a, blas_int* lda, blas_int* info, ...)
still not being compatible with code that follows the spec.

All 3 options would still be hiding the problem, create incompatible libraries
and break at some point, with very difficult to debug problems. The "correct"
fix means fixing prototypes everywhere, in every project, what may not happen
fast enough. The recent gfortran change to use size_t instead of int32_t for
the string length just made the problem more visible, but it already existed.

Comment 21 Jakub Jelinek 2019-05-15 13:03:56 UTC
It is actually not variadic vs. non-variadic, that is something that has been wrong just inside of the compiler and that prevented tail call optimization.
The hidden string length arguments are not variadic, they are simply artificial arguments added to the end of the normal argument list, one for each character argument (and similar other cases).

The correct fix is to change the prototypes in C/C++ from
void trtri(char* uplo, char* diag, blas_int* n, eT* a, blas_int* lda, blas_int* info);
to
void trtri(char* uplo, char* diag, blas_int* n, eT* a, blas_int* lda, blas_int* info, blas_charlen _uplo, blas_charlen _diag);
etc., where
#ifdef __GNUC__
#ifdef __GNUC__ > 7
typedef __SIZE_TYPE__ blas_charlen;
#else
typedef int blas_charlen;
#endif
#else
// whatever type is needed for __INTEL_COMPILER, xlC or whatever other compilers you want to support.
#endif
and call trtri(uplo, diag, n, a, lda, info, 1, 1); instead of trtri(uplo, diag, n, a, lda, info);
Whether the argument values are 1 or something else depends on the actual length of the passed strings.

Comment 22 Thomas König 2019-05-15 13:26:31 UTC
The _real real_ fix would be for everybody to use the standard
C binding interface, which has been around since Fortran 2003
and is guaranteed to work.  A BIND(C) Fortran procedure actually
has the same function declaration that people have been
expecting on the C side, so

      SUBROUTINE FOO(C)
      CHARACTER*1 C
      END

would correspond to void foo(char +c), so no hidden
character lenghts.

Drawbacks: This would require changes to existing
Fortran 77 code, and it is not possible to pass a
Fortran character with length > 1 to a BIND(C) routine.
It _is_ possible to pass an array of characters, so
this

      SUBROUTINE FOO(C)
      CHARACTER*1 C(*)
      END

would work.

The next best thing is to add prototypes which matches what
compilers are emitting. They are actually quite consistent,
going back at least to f2c. For a few days, gfortran has had
an option on trunk and gcc-9 which emits these prototypes,
to facilitate exactly this task.

So, run

rm all.h
for a in *.f; do gfortran -fsyntax-only -fc-prototypes-external $a >> all.h; done

#include "all.h" in your C/C++ sources and fix any breakage that
may occur. (This feature is extremely new, so if you encounter
a bug, please let fortran.org know).  You might want to pipe
the output through sed to replace any occurrence of size_t with whatever
type name you prefer for your string length.

Comment 23 Conrad S 2019-05-16 06:07:59 UTC
@Thomas König - Thanks for the suggestions. Unfortunately adding the BIND(C)
declaration for externally facing BLAS and LAPACK functions may result in
breaking software that actually takes into account the hidden variables.

Usage of both versions of dtrtri() should be possible,
ie. with and without using the "hidden" variables:
dtrtri(uplo, diag, n, a, lda, info);
dtrtri(uplo, diag, n, a, lda, info, 1, 1);
where uplo and diag are the single char variables.

These usage patterns has worked for many years (especially the first one),
so there is an expectation that they would continue working.

The sheer amount of software using LAPACK et al
makes applying the real solutions difficult.
It's an unfortunate situation, but here we are.

Comment 25 Jakub Jelinek 2019-05-29 16:19:03 UTC
Workaround backported upstream in http://gcc.gnu.org/r271744 (for GCC 8) and http://gcc.gnu.org/r271743 (for GCC 9).

Comment 26 Jakub Jelinek 2019-05-30 09:39:14 UTC
And http://gcc.gnu.org/r271772 (GCC 8) / http://gcc.gnu.org/r271771 (GCC 9).

Comment 27 Marek Polacek 2019-05-30 12:51:38 UTC
Further typo fix:  http://gcc.gnu.org/r271779

Comment 28 Marek Polacek 2019-07-09 19:09:16 UTC
Patches available.

Comment 32 errata-xmlrpc 2019-12-10 07:49:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:4134


Note You need to log in before you can comment on or make changes to this bug.