Bug 1956587 - Fedora 33 GCC 10.3.1 internal compiler error compiling pytorch
Summary: Fedora 33 GCC 10.3.1 internal compiler error compiling pytorch
Status: NEW
Alias: None
Product: Fedora
Classification: Fedora
Component: gcc
Version: 33
Hardware: x86_64
OS: Linux
Target Milestone: ---
Assignee: Jakub Jelinek
QA Contact: Fedora Extras Quality Assurance
Reported: 2021-05-04 02:10 UTC by Mike Neilly
Modified: 2021-05-06 00:39 UTC

ii files for failing GCC 10/NVCC compile (2.83 MB, application/gzip)
2021-05-06 00:22 UTC, Mike Neilly
Preprocessed source generated by crash (436.16 KB, application/gzip)
2021-05-06 00:24 UTC, Mike Neilly
GNU Compiler Collection 100240 0 P3 RESOLVED Compiler crashes with segmentation fault on a chrono library using nvcc 2021-05-06 00:39:57 UTC

Description Mike Neilly 2021-05-04 02:10:15 UTC
Description of problem:

I'm trying to build pytorch on Fedora 33 with cuda-toolkit-11-2 and receiving an "internal compiler error" as follows:

#8 327.9 [ 60%] Built target nccl_slim_external
#8 340.4 /usr/include/c++/10/chrono: In substitution of ‘template<class _Rep, class _Period> template<class _Period2> using __is_harmonic = std::__bool_constant<(std::ratio<((_Period2::num / std::chrono::duration<_Rep, _Period>::_S_gcd(_Period2::num, _Period::num)) * (_Period::den / std::chrono::duration<_Rep, _Period>::_S_gcd(_Period2::den, _Period::den))), ((_Period2::den / std::chrono::duration<_Rep, _Period>::_S_gcd(_Period2::den, _Period::den)) * (_Period::num / std::chrono::duration<_Rep, _Period>::_S_gcd(_Period2::num, _Period::num)))>::den == 1)> [with _Period2 = _Period2; _Rep = _Rep; _Period = _Period]’:
#8 340.4 /usr/include/c++/10/chrono:473:154: required from here
#8 340.4 /usr/include/c++/10/chrono:428:27: internal compiler error: Segmentation fault
#8 340.4 428 | _S_gcd(intmax_t __m, intmax_t __n) noexcept
#8 340.4 | ^~~~~~
#8 340.4 Please submit a full bug report,
#8 340.4 with preprocessed source if appropriate.
#8 340.4 See http://bugzilla.redhat.com/bugzilla for instructions.
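
For readers unfamiliar with the construct named in the diagnostic, here is a simplified sketch of its shape: a class template with a static constexpr gcd helper, and a member alias template that calls that helper inside a std::ratio argument. The names sketch::duration_like, gcd, and is_harmonic are hypothetical stand-ins for libstdc++'s std::chrono::duration, _S_gcd, and __is_harmonic. This is not the reduced test case and is not expected to reproduce the crash by itself; the failure in this report was only observed when the file is compiled through nvcc with GCC as the host compiler.

```cpp
#include <cstdint>
#include <ratio>
#include <type_traits>

namespace sketch {

// Hypothetical stand-in for std::chrono::duration. Only the two pieces
// that appear in the diagnostic are modelled here.
template <class Rep, class Period>
struct duration_like {
    // Iterative Euclid, playing the role of _S_gcd in the diagnostic.
    // Assumes both arguments are positive, as std::ratio periods are here.
    static constexpr std::intmax_t gcd(std::intmax_t m, std::intmax_t n) noexcept {
        while (n != 0) {
            std::intmax_t rem = m % n;
            m = n;
            n = rem;
        }
        return m;
    }

    // True when Period2 is an exact multiple of Period, i.e. when the
    // reduced quotient Period2 / Period has denominator 1. The expression
    // follows the one printed in the "In substitution of" message.
    template <class Period2>
    using is_harmonic = std::integral_constant<bool,
        std::ratio<
            (Period2::num / gcd(Period2::num, Period::num)) *
                (Period::den / gcd(Period2::den, Period::den)),
            (Period2::den / gcd(Period2::den, Period::den)) *
                (Period::num / gcd(Period2::num, Period::num))
        >::den == 1>;
};

}  // namespace sketch

// One second is an exact multiple of one millisecond ...
static_assert(sketch::duration_like<long, std::milli>::is_harmonic<std::ratio<1>>::value,
              "seconds are harmonic with milliseconds");
// ... but one microsecond is not.
static_assert(!sketch::duration_like<long, std::milli>::is_harmonic<std::micro>::value,
              "microseconds are not harmonic with milliseconds");
```

The key point is that the gcd call sits inside a dependent template argument, so it is only evaluated when the alias is substituted, which is the stage the compiler reports it was in when it crashed.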

Version-Release number of selected component (if applicable):

Fedora 33, GCC 10.3.1

How reproducible:

Create Dockerfile as follows:

FROM fedora:33

RUN dnf install -y \
    dnf-plugins-core \
    git \
    g++ \
    cmake3 \
    python3-wheel \
    python3-pyyaml \
    python3-devel \
    python3-typing-extensions

RUN dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/fedora33/x86_64/cuda-fedora33.repo

RUN dnf install -y cuda-toolkit-11-2


docker build -t fedora33pt .
docker run -it fedora33pt /bin/bash
cd /tmp
git clone https://github.com/pytorch/pytorch.git
cd pytorch
git checkout 82d245faef88c7aa5d5c47771801789c85a2165b # the commit I used; pinning to it may or may not be necessary
git submodule update --init --recursive
python3 setup.py bdist_wheel

Actual results:

internal compiler error: segmentation fault

Expected results:

no compiler error

Additional info:

Comment 1 Marek Polacek 2021-05-05 18:06:14 UTC
Is there a chance you can provide the preprocessed source file?  Just run the compiler incantation that fails with -save-temps and post the .ii file here, thanks.

Comment 2 Mike Neilly 2021-05-06 00:22:33 UTC
Created attachment 1780040
ii files for failing GCC 10/NVCC compile

Comment 3 Mike Neilly 2021-05-06 00:24:42 UTC
Created attachment 1780041
Preprocessed source generated by crash

Comment 4 Mike Neilly 2021-05-06 00:25:37 UTC
The command used to generate the .ii files:

[root@30a48f0aa8f1 gloo_cuda.dir]# pwd
[root@30a48f0aa8f1 gloo_cuda.dir]# /usr/local/cuda/bin/nvcc /tmp/pytorch/third_party/gloo/gloo/cuda_private.cu -c -o /tmp/pytorch/build/third_party/gloo/gloo/CMakeFiles/gloo_cuda.dir//./gloo_cuda_generated_cuda_private.cu.o -ccbin /usr/bin/cc -m64 -Xfatbin -compress-all -DONNX_NAMESPACE=onnx_torch -gencode arch=compute_35,code=sm_35 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_86,code=compute_86 -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=integer_sign_change,--diag_suppress=useless_using_declaration,--diag_suppress=set_but_not_used,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=implicit_return_from_non_void_function,--diag_suppress=unsigned_compare_with_zero,--diag_suppress=declared_but_not_referenced,--diag_suppress=bad_friend_decl -std=c++14 -Xcompiler -fPIC --expt-relaxed-constexpr --expt-extended-lambda -Wno-deprecated-gpu-targets -Xcudafe --diag_suppress=cc_clobber_ignored -Xcudafe --diag_suppress=integer_sign_change -Xcudafe --diag_suppress=useless_using_declaration -Xcudafe --diag_suppress=set_but_not_used -DNVCC -I/usr/local/cuda/include -I/tmp/pytorch/cmake/../third_party/googletest/googlemock/include -I/tmp/pytorch/cmake/../third_party/googletest/googletest/include -I/tmp/pytorch/third_party/protobuf/src -I/tmp/pytorch/third_party/gemmlowp -I/tmp/pytorch/third_party/neon2sse -I/tmp/pytorch/third_party/XNNPACK/include -I/tmp/pytorch/cmake/../third_party/benchmark/include -I/tmp/pytorch/third_party -I/tmp/pytorch/cmake/../third_party/eigen -I/usr/include/python3.9 -I/tmp/pytorch/cmake/../third_party/pybind11/include -I/tmp/pytorch/third_party/gloo -I/tmp/pytorch/build/third_party/gloo -save-temps
/usr/include/c++/10/chrono: In substitution of ‘template<class _Rep, class _Period> template<class _Period2> using __is_harmonic = std::__bool_constant<(std::ratio<((_Period2::num / std::chrono::duration<_Rep, _Period>::_S_gcd(_Period2::num, _Period::num)) * (_Period::den / std::chrono::duration<_Rep, _Period>::_S_gcd(_Period2::den, _Period::den))), ((_Period2::den / std::chrono::duration<_Rep, _Period>::_S_gcd(_Period2::den, _Period::den)) * (_Period::num / std::chrono::duration<_Rep, _Period>::_S_gcd(_Period2::num, _Period::num)))>::den == 1)> [with _Period2 = _Period2; _Rep = _Rep; _Period = _Period]’:
/usr/include/c++/10/chrono:473:154:   required from here
/usr/include/c++/10/chrono:428:27: internal compiler error: Segmentation fault
  428 |  _S_gcd(intmax_t __m, intmax_t __n) noexcept
      |                           ^~~~~~
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://bugzilla.redhat.com/bugzilla> for instructions.
Preprocessed source stored into /tmp/cceKnkJW.out file, please attach this to your bugreport.

Comment 5 Mike Neilly 2021-05-06 00:26:42 UTC
I do see there is a GCC issue filed on this failing on Ubuntu as well that I hadn't seen before: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100240

Comment 6 Marek Polacek 2021-05-06 00:39:57 UTC
Great, thanks.  It's probably the same problem as in PR100240, though this ICE started with r231913.

