Bug 2227061 - uint64_t-variant patch interferes with rocfft build
Summary: uint64_t-variant patch interferes with rocfft build
Keywords:
Status: NEW
Alias: None
Product: Fedora
Classification: Fedora
Component: rocclr
Version: 39
Hardware: Unspecified
OS: Unspecified
Assignee: Jeremy Newton
Reported: 2023-07-27 16:00 UTC by Tim Flink
Modified: 2023-08-16 21:30 UTC (History)

Type: Bug

Description Tim Flink 2023-07-27 16:00:50 UTC
There is a patch in the rocclr package which, according to its comments, was added to help with building blender. It turns out that the patch interferes with the current packaging effort for rocfft: building the rocfft kernel cache fails. The upstream issue is https://github.com/ROCmSoftwarePlatform/rocFFT/issues/422

The patch in question is https://src.fedoraproject.org/rpms/rocclr/blob/rawhide/f/0001-add-uint64_t-variant-for-__ffsll.patch

Reproducing is simple: build upstream rocfft on a Fedora system using Fedora-packaged dependencies and the build will fail. If rocclr is rebuilt without the uint64_t-variant patch, the rocfft build finishes (eventually; the kernel cache step takes at least an hour on my system).

Disabling the kernel cache isn't a realistic solution, because any kernels that aren't cached will be built at runtime instead. On my system (Ryzen 7 5700X, 64GB memory), the kernel cache build takes over an hour. While building the kernels at runtime wouldn't take that long, it still seems like an unreasonable demand of users.

Additionally, I have been trying to triage an issue that was exclusive to Fedora, with rocfft built without the cached kernels. Simple code from the documentation (https://rocfft.readthedocs.io/en/rocm-5.6.0/#example) would throw errors and 100% of the test suite would fail with rocfft built against rocclr as packaged in Fedora. It isn't filed anywhere because I was still triaging the issue.

When I built rocfft against the de-patched rocclr, the example code no longer errored out and produced the same results as when it is built and run on Debian sid with Debian-packaged dependencies and on RHEL 9.2 with AMD-supplied dependencies. Additionally, the test suite passes when rocfft is built against rocclr without the patch.

I can provide details on the runtime errors I was seeing if desired.

Comment 1 Fedora Release Engineering 2023-08-16 08:06:01 UTC
This bug appears to have been reported against 'rawhide' during the Fedora Linux 39 development cycle.
Changing version to 39.

Comment 2 Jeremy Newton 2023-08-16 19:42:22 UTC
I believe Tom Rix landed this patch and I pulled it in.
I can revert it, but I would need to look into this a bit more to understand the issue.

Sorry, I've been sick, so I'm un-burying myself from unanswered emails.

Comment 3 Tom Rix 2023-08-16 21:30:16 UTC
This change was to fix a build error with blender.
Reverting it will likely break blender again.
IIRC - the default handler was a template for any int type, while the rocm handler handled just 2 types.
A better solution would be for the rocm handler to be more like the template handler and handle any int type.
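
For illustration only, a handler that accepts any int type might look roughly like the sketch below. This is host-side C++, not the rocclr/HIP code and not the patch under discussion. The name `ffs_any` is hypothetical, and the semantics (1-based index of the lowest set bit, 0 for a zero input) are assumed to match the usual ffs convention.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical helper (not part of HIP or rocclr): find-first-set for any
// integer type. Returns the 1-based position of the least significant set
// bit, or 0 when no bit is set.
template <typename T>
int ffs_any(T x) {
    static_assert(std::is_integral<T>::value,
                  "ffs_any requires an integer type");
    // Work on the unsigned counterpart so negative values shift predictably.
    // Note: std::make_unsigned does not accept bool.
    using U = typename std::make_unsigned<T>::type;
    U u = static_cast<U>(x);
    if (u == 0) {
        return 0;
    }
    int pos = 1;
    while ((u & U{1}) == 0) {
        u >>= 1;
        ++pos;
    }
    return pos;
}
```

With a single template like this, a call such as `ffs_any(std::uint64_t{8})` resolves the same way as a call with `long long` or `unsigned long long`, so no per-type variant needs to be added. A real device-side version would need the appropriate HIP function attributes and would probably use a compiler builtin rather than a loop.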

