Bug 1956587 - Fedora 33 GCC 10.3.1 internal compiler error compiling pytorch
Summary: Fedora 33 GCC 10.3.1 internal compiler error compiling pytorch
Keywords:
Status: NEW
Alias: None
Product: Fedora
Classification: Fedora
Component: gcc
Version: 33
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Jakub Jelinek
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-05-04 02:10 UTC by Mike Neilly
Modified: 2021-05-06 00:39 UTC

Fixed In Version:
Doc Type:
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: Bug


Attachments
ii files for failing GCC 10/NVCC compile (2.83 MB, application/gzip)
2021-05-06 00:22 UTC, Mike Neilly
Preprocessed source generated by crash (436.16 KB, application/gzip)
2021-05-06 00:24 UTC, Mike Neilly


Links
System: GNU Compiler Collection
ID: 100240
Private: 0
Priority: P3
Status: RESOLVED
Summary: Compiler crashes with segmentation fault on a chrono library using nvcc
Last Updated: 2021-05-06 00:39:57 UTC

Description Mike Neilly 2021-05-04 02:10:15 UTC
Description of problem:

I'm trying to build pytorch on Fedora 33 with cuda-toolkit-11-2 and receiving an "internal compiler error" as follows:

#8 327.9 [ 60%] Built target nccl_slim_external
#8 340.4 /usr/include/c++/10/chrono: In substitution of ‘template<class _Rep, class _Period> template<class _Period2> using __is_harmonic = std::__bool_constant<(std::ratio<((_Period2::num / std::chrono::duration<_Rep, _Period>::_S_gcd(_Period2::num, _Period::num)) * (_Period::den / std::chrono::duration<_Rep, _Period>::_S_gcd(_Period2::den, _Period::den))), ((_Period2::den / std::chrono::duration<_Rep, _Period>::_S_gcd(_Period2::den, _Period::den)) * (_Period::num / std::chrono::duration<_Rep, _Period>::_S_gcd(_Period2::num, _Period::num)))>::den == 1)> [with _Period2 = _Period2; _Rep = _Rep; _Period = _Period]’:
#8 340.4 /usr/include/c++/10/chrono:473:154:   required from here
#8 340.4 /usr/include/c++/10/chrono:428:27: internal compiler error: Segmentation fault
#8 340.4   428 |  _S_gcd(intmax_t __m, intmax_t __n) noexcept
#8 340.4       |                           ^~~~~~
#8 340.4 Please submit a full bug report,
#8 340.4 with preprocessed source if appropriate.
#8 340.4 See http://bugzilla.redhat.com/bugzilla for instructions.


Version-Release number of selected component (if applicable):

Fedora 33, GCC 10.3.1

How reproducible:

Create a Dockerfile as follows:

FROM fedora:33

RUN dnf install -y \
    dnf-plugins-core \
    git \
    g++ \
    cmake3 \
    python3-wheel \
    python3-pyyaml \
    python3-devel \
    python3-typing-extensions \
    wget

RUN dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/fedora33/x86_64/cuda-fedora33.repo

RUN dnf install -y cuda-toolkit-11-2


Then build the image, start a container, and run the build inside it:

docker build -t fedora33pt .
docker run -it fedora33pt /bin/bash
cd /tmp
git clone https://github.com/pytorch/pytorch.git
cd pytorch
git checkout 82d245faef88c7aa5d5c47771801789c85a2165b # the commit I used; pinning it may or may not be necessary
git submodule update --init --recursive
python3 setup.py bdist_wheel



Actual results:

internal compiler error: segmentation fault


Expected results:

no compiler error



Additional info:

Comment 1 Marek Polacek 2021-05-05 18:06:14 UTC
Is there a chance you can provide the preprocessed source file?  Just run the compiler incantation that fails with -save-temps and post the .ii file here, thanks.

Comment 2 Mike Neilly 2021-05-06 00:22:33 UTC
Created attachment 1780040
ii files for failing GCC 10/NVCC compile

Comment 3 Mike Neilly 2021-05-06 00:24:42 UTC
Created attachment 1780041
Preprocessed source generated by crash

Comment 4 Mike Neilly 2021-05-06 00:25:37 UTC
The command used to generate the .ii files.

[root@30a48f0aa8f1 gloo_cuda.dir]# pwd
/tmp/pytorch/build/third_party/gloo/gloo/CMakeFiles/gloo_cuda.dir
[root@30a48f0aa8f1 gloo_cuda.dir]# /usr/local/cuda/bin/nvcc /tmp/pytorch/third_party/gloo/gloo/cuda_private.cu -c -o /tmp/pytorch/build/third_party/gloo/gloo/CMakeFiles/gloo_cuda.dir//./gloo_cuda_generated_cuda_private.cu.o -ccbin /usr/bin/cc -m64 -Xfatbin -compress-all -DONNX_NAMESPACE=onnx_torch -gencode arch=compute_35,code=sm_35 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -gencode arch=compute_86,code=compute_86 -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=integer_sign_change,--diag_suppress=useless_using_declaration,--diag_suppress=set_but_not_used,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=implicit_return_from_non_void_function,--diag_suppress=unsigned_compare_with_zero,--diag_suppress=declared_but_not_referenced,--diag_suppress=bad_friend_decl -std=c++14 -Xcompiler -fPIC --expt-relaxed-constexpr --expt-extended-lambda -Wno-deprecated-gpu-targets -Xcudafe --diag_suppress=cc_clobber_ignored -Xcudafe --diag_suppress=integer_sign_change -Xcudafe --diag_suppress=useless_using_declaration -Xcudafe --diag_suppress=set_but_not_used -DNVCC -I/usr/local/cuda/include -I/tmp/pytorch/cmake/../third_party/googletest/googlemock/include -I/tmp/pytorch/cmake/../third_party/googletest/googletest/include -I/tmp/pytorch/third_party/protobuf/src -I/tmp/pytorch/third_party/gemmlowp -I/tmp/pytorch/third_party/neon2sse -I/tmp/pytorch/third_party/XNNPACK/include -I/tmp/pytorch/cmake/../third_party/benchmark/include -I/tmp/pytorch/third_party -I/tmp/pytorch/cmake/../third_party/eigen -I/usr/include/python3.9 
-I/tmp/pytorch/cmake/../third_party/pybind11/include -I/tmp/pytorch/third_party/gloo -I/tmp/pytorch/build/third_party/gloo -save-temps
/usr/include/c++/10/chrono: In substitution of ‘template<class _Rep, class _Period> template<class _Period2> using __is_harmonic = std::__bool_constant<(std::ratio<((_Period2::num / std::chrono::duration<_Rep, _Period>::_S_gcd(_Period2::num, _Period::num)) * (_Period::den / std::chrono::duration<_Rep, _Period>::_S_gcd(_Period2::den, _Period::den))), ((_Period2::den / std::chrono::duration<_Rep, _Period>::_S_gcd(_Period2::den, _Period::den)) * (_Period::num / std::chrono::duration<_Rep, _Period>::_S_gcd(_Period2::num, _Period::num)))>::den == 1)> [with _Period2 = _Period2; _Rep = _Rep; _Period = _Period]’:
/usr/include/c++/10/chrono:473:154:   required from here
/usr/include/c++/10/chrono:428:27: internal compiler error: Segmentation fault
  428 |  _S_gcd(intmax_t __m, intmax_t __n) noexcept
      |                           ^~~~~~
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://bugzilla.redhat.com/bugzilla> for instructions.
Preprocessed source stored into /tmp/cceKnkJW.out file, please attach this to your bugreport.
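
For context on the alias named in the diagnostic above: the expression printed for __is_harmonic tests whether the quotient _Period2 / _Period reduces to a ratio whose denominator is 1, which is the condition for an implicit, lossless conversion between integer-based durations. What follows is a minimal C++14 sketch of that arithmetic only. The names sketch_gcd and sketch_is_harmonic are hypothetical and are not libstdc++ code, and the sketch is not claimed to reproduce the ICE.

```cpp
#include <chrono>
#include <cstdint>
#include <ratio>
#include <type_traits>

// Hypothetical helper (not libstdc++'s _S_gcd): Euclid's algorithm,
// assuming non-negative inputs as duration periods always are.
constexpr std::intmax_t sketch_gcd(std::intmax_t m, std::intmax_t n) {
    return n == 0 ? m : sketch_gcd(n, m % n);
}

// Hypothetical trait mirroring the expression in the diagnostic:
// value is true when From / To reduces to a ratio with denominator 1.
template <class From, class To>
struct sketch_is_harmonic {
    static constexpr std::intmax_t g_num = sketch_gcd(From::num, To::num);
    static constexpr std::intmax_t g_den = sketch_gcd(From::den, To::den);
    using quotient = std::ratio<(From::num / g_num) * (To::den / g_den),
                                (From::den / g_den) * (To::num / g_num)>;
    static constexpr bool value = quotient::den == 1;
};

// milliseconds -> microseconds: quotient is 1000/1, so the check passes.
static_assert(sketch_is_harmonic<std::milli, std::micro>::value,
              "ms to us should be harmonic");
// microseconds -> milliseconds: quotient is 1/1000, so the check fails.
static_assert(!sketch_is_harmonic<std::micro, std::milli>::value,
              "us to ms should not be harmonic");

// The standard library agrees about which implicit conversions exist.
static_assert(std::is_convertible<std::chrono::milliseconds,
                                  std::chrono::microseconds>::value,
              "ms converts implicitly to us");
static_assert(!std::is_convertible<std::chrono::microseconds,
                                   std::chrono::milliseconds>::value,
              "us does not convert implicitly to ms");
```

Under these assumptions, the first gcd divides the two numerators and the second divides the two denominators, which is the same shape as the two _S_gcd calls shown in the error message.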

Comment 5 Mike Neilly 2021-05-06 00:26:42 UTC
I see there is also a GCC bug filed for the same failure on Ubuntu, which I hadn't seen before: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100240

Comment 6 Marek Polacek 2021-05-06 00:39:57 UTC
Great, thanks.  It's probably the same problem as in PR100240, though this ICE started with r231913.

