Bug 2216594 - Blender GPU rendering silently crashes with HIP enabled
Keywords:
Status: NEW
Alias: None
Product: Fedora
Classification: Fedora
Component: rocclr
Version: 38
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Jeremy Newton
Reported: 2023-06-22 00:09 UTC by Luya Tshimbalanga
Modified: 2023-07-01 03:08 UTC (History)

Attachments
Backtrace using sysprof (1.20 MB, model/x.stl-binary)
2023-06-22 00:10 UTC, Luya Tshimbalanga

Description Luya Tshimbalanga 2023-06-22 00:09:21 UTC
With HIP rendering enabled, using the latest rocm-hip update (https://koji.fedoraproject.org/koji/buildinfo?buildID=2217754) and the patched Blender build (https://koji.fedoraproject.org/koji/buildinfo?buildID=2218279), rendering fails silently without a backtrace.

Reproducible: Always

Steps to Reproduce:
1. Install Blender and the rocm-hip component, then start the application.
2. Make sure HIP is enabled for the AMD hardware and switch rendering to GPU Compute.
3. Render a model.
Actual Results:  
Silent failure

Expected Results:  
Rendering should be successful

See attachment
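
For anyone trying to reproduce this from a terminal, the steps above can be approximated with a background render. The sketch below only builds the command line and does not launch Blender. It assumes Blender's documented `-b`, `-E` and `-f` arguments and the Cycles `--cycles-device` option; `scene.blend` and the function name are placeholders, not something from this report. A background render may not exercise the same code path as the interactive session described here, so a clean run would not rule the bug out.

```python
import shlex


def blender_hip_render_cmd(blend_file, frame=1, blender="blender"):
    """Build a background-render command that asks Cycles to use HIP.

    blend_file is a placeholder path; the only thing taken from the report
    is the idea of rendering a scene with HIP as the Cycles device.
    """
    return [
        blender,
        "-b", blend_file,          # run without the GUI and load the scene
        "-E", "CYCLES",            # select the Cycles render engine
        "-f", str(frame),          # render a single frame
        "--",                      # Blender stops parsing its own options here
        "--cycles-device", "HIP",  # read by Cycles from the trailing arguments
    ]


# Print the command so it can be pasted into a shell.
print(shlex.join(blender_hip_render_cmd("scene.blend")))
```

Running the printed command from a terminal keeps Blender's console output visible, which may help when the GUI gives no feedback.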

Comment 1 Luya Tshimbalanga 2023-06-22 00:10:33 UTC
Created attachment 1972007
Backtrace using sysprof

Comment 2 Luya Tshimbalanga 2023-06-22 00:14:57 UTC
Adding Tom Rix to take a look at the crash.

Comment 3 Tom Rix 2023-06-22 03:09:32 UTC
Is this a blender build --with rocm?
The other recent change on blender is that the toolchain changed to clang; has that been ruled out?
I have a local blender build going with the latest rocclr and will try to reproduce this in the morning.

Comment 4 Luya Tshimbalanga 2023-06-22 06:22:12 UTC
Yes, according to the spec file (rawhide as an example): https://src.fedoraproject.org/rpms/blender/c/caea81a044c4574dcfae5da78c1488df9727dc03?branch=rawhide
According to this log, the build still uses the gcc compiler by default and clang for the rocclr component: https://kojipkgs.fedoraproject.org//packages/blender/3.5.1/7.fc38/data/logs/x86_64/build.log

Comment 5 Tom Rix 2023-06-22 13:00:33 UTC
My system is rawhide, so bear with me; I know the problem was reported on f38.
Building blender locally, with and without --with rocm, works now.
A mock build of the default fails with:
Error: Transaction test error:
  file /usr/lib64/libpcre.so conflicts between attempted installs of pcre-devel-8.45-1.fc38.3.x86_64 and openCOLLADA-devel-1.6.70-4.fc39.x86_64

The basic startup of blender works for both.
Can you suggest a file to load or a rendering stress test I could run?

Comment 6 Luya Tshimbalanga 2023-06-22 15:01:41 UTC
Any file using Cycles rendering, such as the ones at https://www.blender.org/download/demo-files/#cycles

Comment 7 Luya Tshimbalanga 2023-06-22 15:03:07 UTC
I think the issue may affect rawhide as well.

Comment 8 Tom Rix 2023-06-25 23:50:25 UTC
My card may be too old to run this; it falls back to CPU :(
rocminfo
  Name:                    gfx803                             
  Uuid:                    GPU-XX                             
  Marketing Name:          AMD Radeon RX 550 / 550 Series     
  Vendor Name:             AMD

Comment 9 Luya Tshimbalanga 2023-06-26 01:57:24 UTC
Your card is a Polaris, which is unsupported by rocclr (the minimum requirement is gfx900, i.e. the Vega series).

Name:                    gfx1030                            
Uuid:                    GPU-XX               
Marketing Name:          AMD Radeon RX 6950 XT              
Vendor Name:             AMD   

The issue also affects APUs such as the Ryzen 7 5825U.
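
The "too old" check above can be scripted against `rocminfo` output. This is a rough sketch, not an official support test: the gfx900 threshold is simply the minimum stated in this comment, and the parsing assumes the LLVM AMDGPU naming convention, where the last two characters of a target name are single hex digits (minor, stepping) and the leading digits are the major version. The function names are made up for illustration.

```python
import re

# Minimum target as stated in this comment (gfx900, Vega); treat it as the
# reporter's claim rather than an authoritative support matrix.
MIN_TARGET = "gfx900"


def parse_gfx(name):
    """Split a target such as 'gfx90c' into (major, minor, stepping)."""
    m = re.fullmatch(r"gfx(\d+)([0-9a-f])([0-9a-f])", name.strip())
    if m is None:
        raise ValueError(f"unrecognised target name: {name!r}")
    # Major is decimal; minor and stepping are single hex digits.
    return int(m.group(1)), int(m.group(2), 16), int(m.group(3), 16)


def meets_minimum(name, minimum=MIN_TARGET):
    """True if the target is at or above the assumed minimum."""
    return parse_gfx(name) >= parse_gfx(minimum)


def gfx_targets(rocminfo_text):
    """Return gfx target names found on 'Name:' lines of rocminfo output.

    'Marketing Name:' and 'Vendor Name:' lines are skipped because the
    pattern requires 'Name:' to be the first word on the line.
    """
    pattern = r"^[ \t]*Name:[ \t]+(gfx[0-9a-f]+)[ \t]*$"
    return re.findall(pattern, rocminfo_text, re.MULTILINE)


sample = """\
  Name:                    gfx803
  Marketing Name:          AMD Radeon RX 550 / 550 Series
"""
for target in gfx_targets(sample):
    print(target, "meets minimum" if meets_minimum(target) else "below minimum")
```

Comparing the parsed tuples rather than the raw strings keeps names with a hex suffix (for example `gfx90c`) and four-digit names (for example `gfx1030`) in the right order.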

Comment 10 Luya Tshimbalanga 2023-06-27 15:34:05 UTC
(In reply to Tom Rix from comment #5)
> my system is rawhide, so bear with me, i know the problem was reported on
> f38.
> Local building blender with and without -with rocm works now
> mock of the default fails with
> Error: Transaction test error:
>   file /usr/lib64/libpcre.so conflicts between attempted installs of
> pcre-devel-8.45-1.fc38.3.x86_64 and openCOLLADA-devel-1.6.70-4.fc39.x86_64
> 
> The basic startup of blender works for both
> can you suggest a file to load or a rendering stress test I could run ?

I forgot to mention that you can use mock to build for Fedora 38 using the command: "fedpkg --release=f38 scratch-build --target=f38-build-side-69204 --arch=x86_64 --srpm"

Comment 11 Luya Tshimbalanga 2023-07-01 03:08:33 UTC
Detailed traceback from journalctl when running GPU Cycles rendering (in this example on a Radeon RX 6950 XT):

Jun 20 16:54:51 systemd-coredump[72648]: Process 72540 (blender) of user 1000 dumped core.
                                         Module blender from rpm blender-3.5.1-7.fc38.x86_64
                                         #1  0x0000557fd8ddf657 _ZL14print_resourceRSoRKN7blender3gpu6shader16ShaderCreateInfo8ResourceEb (blender + 0x2bca657)
                                         #2  0x0000557fd8e2ac78 _ZNK7blender3gpu8GLShader17resources_declareB5cxx11ERKNS0_6shader16ShaderCreateInfoE (blender + 0x2c15c78)
                                         #3  0x0000557fd8d78125 GPU_shader_create_from_info (blender + 0x2b63125)
                                         #4  0x0000557fd72368bd OVERLAY_grid_cache_init (blender + 0x10218bd)
                                         #5  0x0000557fd723bff1 _ZL18OVERLAY_cache_initPv.lto_priv.0 (blender + 0x1026ff1)
                                         #6  0x0000557fd71ae2d5 drw_engines_cache_init.lto_priv.0 (blender + 0xf992d5)
                                         #7  0x0000557fd71e7e64 DRW_draw_render_loop_2d_ex (blender + 0xfd2e64)
                                         #8  0x0000557fd7f37907 image_main_region_draw.lto_priv.0 (blender + 0x1d22907)
                                         #9  0x0000557fd75c0ad5 ED_region_do_draw (blender + 0x13abad5)
                                         #10 0x0000557fd7074b65 wm_draw_update (blender + 0xe5fb65)
                                         #11 0x0000557fd6ad943c main (blender + 0x8c443c)
                                         #14 0x0000557fd6b20835 _start (blender + 0x90b835)
Jun 20 16:54:49 audit[72540]: ANOM_ABEND auid=1000 uid=1000 gid=1000 ses=3 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 pid=72540 comm="blender" exe="/usr/bin/blender" sig=11 res=1
Jun 20 16:54:49 kernel: blender[72646]: segfault at 0 ip 0000000000000000 sp 00007f869ce66078 error 14 in blender[557fd6215000+459a000] likely on CPU 21 (core 11, socket 0)
Jun 20 16:54:49 audit[72540]: ANOM_ABEND auid=1000 uid=1000 gid=1000 ses=3 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 pid=72540 comm="blender" exe="/usr/bin/blender" sig=11 res=1
Jun 20 16:54:49 blender.desktop[72540]: Writing: /tmp/bmw27_gpu.crash.txt
Jun 20 16:54:49 blender.desktop[72540]: Read blend: /home/luya/Documents/design stuff/blender/bmw27/bmw27_gpu.blend
Jun 20 16:54:49 blender.desktop[72540]: Read prefs: /home/luya/.config/blender/3.5/config/userpref.blend
Jun 20 16:54:43 systemd[2030]: Started app-gnome-blender-72540.scope - Application launched by gnome-shell
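
The kernel line in the log above carries a page-fault error code, which the kernel prints in hex. The sketch below pulls the fields out of such a line and names the set bits. It is a reading aid under stated assumptions: the bit meanings are the standard x86 page-fault error-code bits 0 to 4, and the regular expression and function name are illustrative rather than part of any tool.

```python
import re

# Standard x86 page-fault error-code bits (only bits 0-4 are covered here).
PF_FLAGS = {
    0: "protection violation (page was present)",
    1: "write",
    2: "user mode",
    3: "reserved bit",
    4: "instruction fetch",
}

SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
)


def decode_segfault(line):
    """Return the faulting address, instruction pointer and set flag names."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m.group("err"), 16)  # the kernel prints this field in hex
    return {
        "addr": int(m.group("addr"), 16),
        "ip": int(m.group("ip"), 16),
        "flags": [name for bit, name in PF_FLAGS.items() if err & (1 << bit)],
    }


line = ("kernel: blender[72646]: segfault at 0 ip 0000000000000000 "
        "sp 00007f869ce66078 error 14 in blender[557fd6215000+459a000]")
print(decode_segfault(line))
```

For the line from this report the set bits are "user mode" and "instruction fetch", with both the faulting address and the instruction pointer at 0. That pattern is consistent with a call through a null function pointer, although the log alone does not say which pointer.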

