Bug 2307795 - Plasma Coredump installation of openjph/libopenjph
Summary: Plasma Coredump installation of openjph/libopenjph
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: openjph
Version: rawhide
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Simone Caronni
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: MultimediaSIG
Reported: 2024-08-25 16:52 UTC by Gerald Cox
Modified: 2024-12-21 03:40 UTC
CC List: 14 users

Fixed In Version: openjph-0.18.2-1.el8 openjph-0.18.2-1.fc40 openjph-0.18.2-1.fc41 openjph-0.18.2-1.el9
Clone Of:
Environment:
Last Closed: 2024-12-21 00:31:38 UTC
Type: ---
Embargoed:


Attachments
plasma coredump (53.19 KB, text/plain)
2024-08-25 16:54 UTC, Gerald Cox
journalctl output for plasmashell (29.25 KB, text/plain)
2024-08-25 17:15 UTC, Gerald Cox
Kcrash report (15.75 KB, text/plain)
2024-08-25 17:31 UTC, Gerald Cox
Crash file after patch applied (15.75 KB, text/plain)
2024-09-05 17:01 UTC, Gerald Cox
patch as requested (746 bytes, patch)
2024-09-05 17:02 UTC, Gerald Cox
kcrash for libheif recompile with openjph (15.75 KB, text/plain)
2024-09-06 17:26 UTC, Gerald Cox


Links
System ID Private Priority Status Summary Last Updated
Github aous72 OpenJPH issues 152 0 None closed Installation causes KDE Plasma 6 to crash 2024-11-17 17:00:47 UTC
KDE Software Compilation 492220 0 NOR RESOLVED Plasma Coredump after installation of openjph/libopenjph 2024-11-17 17:00:51 UTC

Description Gerald Cox 2024-08-25 16:52:44 UTC
After an update, SDDM shows a black screen.  I then switched to GDM, and Plasma fails with a coredump.  I tried to back out the update, but apparently the dependencies are such that it's practically impossible, so in the meantime I'm using GNOME.  I've attached the coredump.

Reproducible: Always

Comment 1 Gerald Cox 2024-08-25 16:54:35 UTC
Created attachment 2044828
plasma coredump

Happens immediately upon login.

Comment 2 Gerald Cox 2024-08-25 17:15:36 UTC
Created attachment 2044829
journalctl output for plasmashell

Comment 3 Gerald Cox 2024-08-25 17:31:35 UTC
Created attachment 2044831
Kcrash report

Comment 4 Gerald Cox 2024-08-25 21:16:51 UTC
Changing priority and title.  Found that installation of openjph/libopenjph causes KDE Plasma to crash.

[KCrash Handler]
#4  0x00007f4e640bd295 in _sub_I_65535_0.0 () at /lib64/libopenjph.so.0.15
#5  0x00007f4ea9d76437 in call_init (l=<optimized out>, argc=2, argv=0x7ffddc116e38, env=0x5561b168f430) at dl-init.c:74
#6  call_init (l=<optimized out>, argc=2, argv=0x7ffddc116e38, env=0x5561b168f430) at dl-init.c:26
#7  0x00007f4ea9d7652d in _dl_init (main_map=0x7f4e240380d0, argc=2, argv=0x7ffddc116e38, env=0x5561b168f430) at dl-init.c:121
#8  0x00007f4ea9d725c2 in __GI__dl_catch_exception (exception=exception@entry=0x0, operate=operate@entry=0x7f4ea9d7d560 <call_dl_init>, args=args@entry=0x7f4e655feed0) at dl-catch.c:211
#9  0x00007f4ea9d7d4fc in dl_open_worker (a=a@entry=0x7f4e655ff080) at dl-open.c:829
#10 0x00007f4ea9d72523 in __GI__dl_catch_exception (exception=exception@entry=0x7f4e655ff060, operate=operate@entry=0x7f4ea9d7d460 <dl_open_worker>, args=args@entry=0x7f4e655ff080) at dl-catch.c:237
#11 0x00007f4ea9d7d904 in _dl_open (file=0x7f4e2400e400 "/usr/lib64/libheif/libheif-jphenc.so", mode=<optimized out>, caller_dlopen=0x7f4e721c17bd <PluginLibrary_Unix::load_from_file(char const*)+29>, nsid=<optimized out>, argc=2, argv=0x7ffddc116e38, env=0x5561b168f430) at dl-open.c:905

Reporting upstream.

Comment 5 Aous Naman 2024-08-27 01:35:35 UTC
Hi Gerald,

If Antonio's comment (KDE) is correct about the code being compiled with (-march=native), then it should not be.

Native might generate some code that is not supported on another machine.  For example, if the machine on which you are compiling OpenJPH supports AVX512, then the library might use these instructions.  However, when you later run the code on a machine that does not support it, an illegal-instruction exception will occur.
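
One way to see which extensions a given set of compiler flags bakes in unconditionally is to look at the compiler's predefined macros. The following is a minimal sketch, assuming GCC or Clang on x86; the function name compiled_in_extensions is hypothetical and is not part of OpenJPH.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Lists the SIMD extensions the compiler was allowed to use everywhere in
// this translation unit. GCC and Clang predefine these macros according to
// -march / -m<ext> flags, so building with -march=native on an AVX-512
// machine would add "avx512f" here.
std::vector<std::string> compiled_in_extensions() {
  std::vector<std::string> v;
#ifdef __SSE2__
  v.push_back("sse2");
#endif
#ifdef __SSE4_2__
  v.push_back("sse4.2");
#endif
#ifdef __AVX__
  v.push_back("avx");
#endif
#ifdef __AVX2__
  v.push_back("avx2");
#endif
#ifdef __AVX512F__
  v.push_back("avx512f");
#endif
  return v;
}
```

Anything reported here may appear in any function of the binary, including code that runs before run-time CPU detection has had a chance to choose a safe path.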

Also, does "RESOLVED DOWNSTREAM" mean the problem has been resolved?

Kind regards,
Aous.

Comment 6 Gerald Cox 2024-08-27 15:50:57 UTC
(In reply to Aous Naman from comment #5)
> Hi Gerald,
> 
> If Antonio's comment (KDE) is correct about the code being compiled with
> (-march=native), then it should not be.
> 
> Native might generate some code that is not supported on another machine. 
> For example, if the machine on which you are compiling OpenJPH supports
> AVX512, then the library might use these instructions.  However, when you
> later run the code on a machine that does not support it, an
> illegal-instruction exception will occur.
> 
> Also, does "RESOLVED DOWNSTREAM" mean the problem has been resolved?
> 
> Kind regards,
> Aous.

Hi Aous,
Thanks so much for your quick response.  Antonio's comment is correct. I checked
the SPEC file and -DOJPH_DISABLE_AVX512 was OFF.  I changed it to ON, recompiled
and it works fine.

RESOLVED DOWNSTREAM means the solution isn't with KDE.  It would also be the solution
in the github ticket.  I'll close it out.

Thanks again for your help.  I'll let the packager know the options are causing issues
with some machines.

Comment 7 Gerald Cox 2024-08-27 15:57:24 UTC
Correction: both DISABLE_SIMD and DISABLE_AVX512 must be ON for it to work for me.

Comment 8 Simone Caronni 2024-08-29 17:44:45 UTC
Will have a look and recompile OpenJPH if needed.

Comment 9 Gerald Cox 2024-08-30 15:51:21 UTC
I looked at the code, and DISABLE_SIMD=ON does turn off all the SIMD extensions.  But if SIMD is left enabled (DISABLE_SIMD=OFF), then all of them are enabled.  In my case, I checked /proc/cpuinfo and found that my processor does not support AVX2 or NEON; at least they didn't show up in the CPU flags.  So I modified the spec file as follows and that fixed the issue for me:

%cmake \
 -DOJPH_DISABLE_SIMD=OFF \
 -DOJPH_DISABLE_AVX2=ON \
 -DOJPH_DISABLE_NEON=ON \
 -DOJPH_DISABLE_AVX512=ON

Comment 10 Gerald Cox 2024-08-30 16:18:54 UTC
After checking the supported architectures, it looks like the package
probably should conform to the x86-64 v1 baseline:
https://fedoraproject.org/wiki/Architectures#Primary_Architectures
which means excluding AVX, SSE4, SSSE3, etc.

Comment 11 Aous Naman 2024-08-31 00:43:55 UTC
Hi Gerald,

Thank you for keeping an eye on this.

I am not sure what is going on.

The library has multiple paths for different SIMD instructions (SSE, AVX, ... etc.).
The code in src/core/common/ojph_arch.h and src/core/others/ojph_arch.cpp detects at run time what kind of SIMD is supported by the CPU, and selects the fastest path.
-DOJPH_DISABLE_SIMD disables all fast paths.
The others disable or enable specific paths.
If the library crashes on loading, then it is possible that the code in these two files is causing the crash.
If the library crashes on usage, then it is following a path that is not supported by the CPU -- this is less likely.

Perhaps, make a debug build of the library, and identify which line in the source code is crashing, and what the illegal instruction is.

It can be good for me to look at what is going on.

Kind regards,
Aous.

Comment 12 Gerald Cox 2024-08-31 01:57:52 UTC
Hi Aous,

It's not an issue with your code.  It's a packaging issue, and the Fedora maintainer is aware and will resolve it.
We just need to set the options such that they agree with the Fedora Primary Architecture standards.  See comment #10.
If you're interested in knowing more, check out:

https://fedoraproject.org/wiki/Architectures#Primary_Architectures

Thanks so much for your help with this, but there isn't any action that you need to take.

Comment 13 Gerald Cox 2024-08-31 02:06:32 UTC
Hi Aous,

Just a thought: you might consider changing the defaults to match the V1 architecture standards, i.e.

-DOJPH_DISABLE_SIMD=OFF \
-DOJPH_DISABLE_AVX2=ON \
-DOJPH_DISABLE_NEON=ON \
-DOJPH_DISABLE_AVX=ON \
-DOJPH_DISABLE_SSE4=ON \
-DOJPH_DISABLE_SSSE3=ON \
-DOJPH_DISABLE_AVX512=ON

That way the vast majority of processors will be supported without any changes.  If people want to 
enable advanced features for the cpu, they can.

That would avoid having unsupported features activated.

Comment 14 Simone Caronni 2024-09-03 15:19:26 UTC
@aous if there are no explicit OJPH_DISABLE_XXX lines in the CMake configuration, does this leave *any* hardware extension enabled by default?

Comment 15 Simone Caronni 2024-09-03 15:21:03 UTC
@gbcox do you know if aarch64 supports NEON instructions by default?

Comment 16 Aous Naman 2024-09-03 23:34:19 UTC
(In reply to Simone Caronni from comment #14)
> @aous if there are no explicit OJPH_DISABLE_XXX lines in the CMake
> configuration, does this leave *any* hardware extension enabled by default?

OJPH_DISABLE_SIMD=ON 
should disable all extensions, and make the code C++ only.

The complete list is
OJPH_DISABLE_SIMD
OJPH_DISABLE_SSE
OJPH_DISABLE_SSE2
OJPH_DISABLE_SSSE3 
OJPH_DISABLE_SSE4
OJPH_DISABLE_AVX
OJPH_DISABLE_AVX2
OJPH_DISABLE_AVX512
OJPH_DISABLE_NEON

I think all existing hardware supports SSE/SSE2 and possibly SSSE3.  
No code has been written to support NEON instructions (it is just a place holder for future work); also the OJPH_DISABLE_NEON macro is not processed unless you are building on ARM -- I may change OJPH_DISABLE_NEON to something else.

It is also worth noting that the code, at runtime, detects CPU features to decide what code path to use.  In other words, you do not need to disable AVX512 if your machine does not support it, because the code will detect that it is not supported and will not use AVX512 code.  In fact, the machine I work on does not support AVX512 and I do NOT disable AVX512.
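
The run-time selection described here can be illustrated with a minimal sketch. It is not OpenJPH's code: the function names are hypothetical, and it assumes GCC or Clang, whose `target` attribute and `__builtin_cpu_supports` built-in let a single function be compiled for AVX2 while the rest of the file stays at the baseline.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Portable fallback: plain C++, safe on any CPU.
static std::int64_t sum_scalar(const std::int32_t* p, std::size_t n) {
  std::int64_t s = 0;
  for (std::size_t i = 0; i < n; ++i) s += p[i];
  return s;
}

#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
// Same loop, but the compiler may emit AVX2 instructions inside this one
// function only. It must never be called on a CPU without AVX2.
__attribute__((target("avx2")))
static std::int64_t sum_avx2(const std::int32_t* p, std::size_t n) {
  std::int64_t s = 0;
  for (std::size_t i = 0; i < n; ++i) s += p[i];
  return s;
}
#endif

using sum_fn = std::int64_t (*)(const std::int32_t*, std::size_t);

// Picks the fastest path that the running CPU reports as supported.
static sum_fn select_sum() {
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
  if (__builtin_cpu_supports("avx2")) return sum_avx2;
#endif
  return sum_scalar;
}

// Public entry point: the selection runs once, on the first call.
std::int64_t sum(const std::int32_t* p, std::size_t n) {
  static const sum_fn fn = select_sum();
  return fn(p, n);
}
```

The important property is that AVX2 instructions are confined to sum_avx2; if the whole library were built with AVX2 enabled globally, the compiler could also use it in the detection code itself, and the check would come too late.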

I think it is important to find out where the problem is happening in the code; the line number and the illegal instruction would be great.
My suspicion is that it is happening during hardware feature detection -- Is the problem happening on x64 or aarch64?

I am happy to help, but there is little I can work with now.

Kind regards,
Aous.

Comment 17 Simone Caronni 2024-09-04 07:55:59 UTC
Hi Aous, thanks for the detailed explanation.

I've removed all lines that disable the extensions, leaving autodetection in place.

@Gerald we'll wait for your debugging.

Thanks!

Comment 18 Gerald Cox 2024-09-04 15:54:12 UTC
(In reply to Simone Caronni from comment #15)
> @gbcox do you know if aarch64 supports NEON instructions by default?

I think so, this is from GEMINI:
Yes, AArch64 supports NEON instructions by default. NEON is a fundamental part of the AArch64 architecture, providing advanced SIMD (Single Instruction, Multiple Data) capabilities for efficient processing of data in parallel.

NEON instructions are designed to accelerate various tasks, including:

    Multimedia processing: Image and video manipulation, audio processing
    Scientific computing: Matrix operations, signal processing
    Cryptography: Encryption and decryption algorithms

To leverage NEON instructions in your code, you can use:

    Compiler intrinsics: These provide a C-like interface to NEON instructions, making it easier to write portable code.
    Assembly language: For more direct control over the generated code, you can write NEON instructions directly in assembly.

Note: While NEON is a standard part of AArch64, specific implementations may vary slightly in terms of supported features or performance. However, the core NEON instruction set is universally available across AArch64 processors.

Comment 19 Gerald Cox 2024-09-04 16:13:50 UTC
(In reply to Simone Caronni from comment #17)
> Hi Aous, thanks for the detailed explanation.
> 
> I've removed all lines that disables the extensions, leaving autodetection
> in place.
> 
> @Gerald we'll wait for your debugging.
> 
> Thanks!

Hey Simone,
Didn't work.  I modified the SPEC file as follows:

%cmake \
 -DOJPH_DISABLE_SIMD=OFF \
 -DOJPH_DISABLE_AVX2=ON \
 -DOJPH_DISABLE_AVX=ON \
 -DOJPH_DISABLE_SSE4=ON \
 -DOJPH_DISABLE_SSSE3=ON \
 -DOJPH_DISABLE_AVX512=ON

I kept SIMD=OFF because I didn't want to disable all extensions; I wanted to keep
those specified in: https://fedoraproject.org/wiki/Architectures#Primary_Architectures
(i.e. SSE, SSE2)

The ones that are not in the primary architectures document, I set to "=ON"
to make sure they are excluded.

I went through CMakeLists.txt and didn't see anything that would 
automagically disable anything. It looks to me like all options are
enabled, which is what is causing the issue.  

I believe we want the options specified in V1 of the architecture document, 
which is why I did it this way.  Here are the options from CMakeLists.txt, and
you can see that everything is enabled by default.

option(OJPH_DISABLE_SIMD "Disables the use of SIMD instructions -- agnostic to architectures" OFF)
option(OJPH_DISABLE_SSE "Disables the use of SSE SIMD instructions and associated files" OFF)
option(OJPH_DISABLE_SSE2 "Disables the use of SSE2 SIMD instructions and associated files" OFF)
option(OJPH_DISABLE_SSSE3 "Disables the use of SSSE3 SIMD instructions and associated files" OFF)
option(OJPH_DISABLE_SSE4 "Disables the use of SSE4 SIMD instructions and associated files" OFF)
option(OJPH_DISABLE_AVX "Disables the use of AVX SIMD instructions and associated files" OFF)
option(OJPH_DISABLE_AVX2 "Disables the use of AVX2 SIMD instructions and associated files" OFF)
option(OJPH_DISABLE_AVX512 "Disables the use of AVX512 SIMD instructions and associated files" OFF)
option(OJPH_DISABLE_NEON "Disables the use of NEON SIMD instructions and associated files" OFF)

Comment 20 Gerald Cox 2024-09-04 16:18:55 UTC
(In reply to Aous Naman from comment #16)
> (In reply to Simone Caronni from comment #14)
> > @aous if there are no explicit OJPH_DISABLE_XXX lines in the CMake
> > configuration, does this leave *any* hardware extension enabled by default?
> 
> OJPH_DISABLE_SIMD=ON 
> should disable all extensions, and make the code C++ only.
> 
> The complete list is
> OJPH_DISABLE_SIMD
> OJPH_DISABLE_SSE
> OJPH_DISABLE_SSE2
> OJPH_DISABLE_SSSE3 
> OJPH_DISABLE_SSE4
> OJPH_DISABLE_AVX
> OJPH_DISABLE_AVX2
> OJPH_DISABLE_AVX512
> OJPH_DISABLE_NEON
> 
> I think all existing hardware supports SSE/SSE2 and possibly SSSE3.  

I just turned off support for anything that wasn't in the Fedora Architecture guideline.

 
> It is also worth noting that the code, at runtime, detects CPU features to
> decide what code path to use.  In other words, you do not need to disable
> AVX512 if your machine does not support it, because the code will detect
> that it is not supported and will not use AVX512 code.  In fact, the machine
> I work on does not support AVX512 and I do NOT disable AVX512.

Yeah, that isn't working.  

> 
> I think it is important to find out where the problem is happening in the
> code; the line number and the illegal instruction would be great.
> My suspicion is that it is happening during hardware feature detection --
> Is the problem happening on x64 or aarch64?

I haven't tested on aarch64.  I can only confirm it is happening on x64.

Would turning on TESTS assist in finding the line number, or is that for
something else?

> 
> I am happy to help, but there is little I can work with now.

Understood.  Thanks for your assistance!  Much appreciated.

Comment 21 Aous Naman 2024-09-05 01:50:33 UTC
@gbcox 
Hi Gerald,

Thank you for your feedback.

I think I found the culprit; it is in the CPU feature detection code.
In /src/core/others/ojph_arch.cpp, I use the xgetbv instruction, which is a v3 instruction.
This code always runs, unless OJPH_DISABLE_SIMD=ON.
Question: can I use xgetbv after the library is loaded, not during the load?
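
On the question of probing after load rather than during it, one common C++ pattern is to replace a namespace-scope initializer with a function-local static, which is initialized on first call (and thread-safely since C++11). The sketch below only illustrates that pattern under stated assumptions: detect_cpu_level is a hypothetical stand-in for the real probing code, and nothing here is claimed to be the change OpenJPH eventually made.

```cpp
#include <cassert>

// Hypothetical stand-in for the real probing code (CPUID, xgetbv, ...).
// The counter exists only so the example can show when detection runs.
static int g_detect_calls = 0;
static int detect_cpu_level() {
  ++g_detect_calls;
  return 1;
}

// Load-time variant, shown for contrast and intentionally commented out.
// A namespace-scope initializer like this runs from the shared library's
// static-initialization code, i.e. inside dlopen():
//   static const int g_cpu_level = detect_cpu_level();

// First-use variant: nothing runs at load time; detection happens once,
// the first time a caller actually asks for the level.
int cpu_level() {
  static const int level = detect_cpu_level();
  return level;
}
```

Deferring the call does not make an unsupported instruction legal; it only moves the point of failure from library load to first use, so programs that load the library but never call into it are unaffected.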

I need a day or two to think about the best fix.
I will also install Fedora to get a better understanding.

@negativo17, @gbcox 
Does this mean that Fedora does not support AVX/AVX2 instructions? 
Perhaps some guidance here is useful to get the best performance possible.

Kind regards,
Aous

Comment 22 Gerald Cox 2024-09-05 02:58:13 UTC
(In reply to Aous Naman from comment #21)
> @gbcox 
> Hi Gerald,
> 
> Thank you for your feedback.
> 
> I think I found the culprit; it is in the CPU feature detection code.
> In /src/core/others/ojph_arch.cpp, I use the xgetbv instruction, which is a
> v3 instruction.
> This code always runs, unless OJPH_DISABLE_SIMD=ON.
> Question: can I use xgetbv after the library is loaded, not during the load?

I found this on xgetbv: https://stackoverflow.com/questions/72522885/are-the-xgetbv-and-cpuid-checks-sufficient-to-guarantee-avx2-support

> 
> I need a day or two to think about the best fix.
> I will also install Fedora to get a better understanding.
Remember, for me this problem does not show up in the GNOME/GDM environment.  It only fails in KDE PLASMA/SDDM environment.  

> 
> @negativo17, @gbcox 
> Does this mean that Fedora does not support AVX/AVX2 instructions? 
> Perhaps some guidance here is useful to get the best performance possible.

For x86_64, v1 baseline is the primary architecture for Fedora.  From the guideline:
These are architectures with the majority of the users, the most common architectures. Build failures on these architectures are fatal: no packages push to the repositories if they fail to build for a primary architecture. Fedora package maintainers are required to make sure that their package builds properly for this architecture (or is properly ExcludeArch'd).  https://fedoraproject.org/wiki/Architectures#Primary_Architectures

To directly answer your question, AVX/AVX2 are NOT in the V1 Baseline.  SSE3, SSSE3, SSE4_1 and SSE4_2 are also NOT in the V1 Baseline. 
Here is the link: https://en.wikipedia.org/wiki/X86-64#Microarchitecture_levels
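
To relate a particular CPU to these levels, the flags line from /proc/cpuinfo can be checked against the feature sets each level requires. This is a minimal sketch, assuming Linux flag spellings (pni for SSE3, abm for LZCNT); the function names are hypothetical, and the per-level lists are a simplified reading of the microarchitecture-level definitions linked above.

```cpp
#include <cassert>
#include <initializer_list>
#include <set>
#include <sstream>
#include <string>

// Splits the value of a /proc/cpuinfo "flags" line into a set of names.
std::set<std::string> parse_flags(const std::string& line) {
  std::istringstream in(line);
  std::set<std::string> out;
  for (std::string f; in >> f;) out.insert(f);
  return out;
}

// True if every name in `need` is present in `have`.
static bool has_all(const std::set<std::string>& have,
                    std::initializer_list<const char*> need) {
  for (const char* n : need)
    if (have.count(n) == 0) return false;
  return true;
}

// Returns the highest x86-64 level (1..4) whose features are all present.
// Level 1 is assumed for any x86-64 CPU.
int x86_64_level(const std::set<std::string>& f) {
  if (!has_all(f, {"cx16", "lahf_lm", "popcnt", "pni",
                   "sse4_1", "sse4_2", "ssse3"}))
    return 1;
  if (!has_all(f, {"avx", "avx2", "bmi1", "bmi2", "f16c",
                   "fma", "abm", "movbe", "xsave"}))
    return 2;
  if (!has_all(f, {"avx512f", "avx512bw", "avx512cd",
                   "avx512dq", "avx512vl"}))
    return 3;
  return 4;
}
```

A CPU that has AVX but not AVX2 comes out as level 2 under this check, which is why a binary built for the v1 baseline cannot assume any of SSE3 through AVX2 without testing for them at run time.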

Here is what is compatible with the V1 baseline:
 -DOJPH_DISABLE_SIMD=OFF \
 -DOJPH_DISABLE_AVX2=ON \
 -DOJPH_DISABLE_AVX=ON \
 -DOJPH_DISABLE_SSE4=ON \
 -DOJPH_DISABLE_SSSE3=ON \
 -DOJPH_DISABLE_AVX512=ON

Comment 23 Aous Naman 2024-09-05 09:49:36 UTC
Hi Gerald,

Thank you for your reply, and for the link
https://stackoverflow.com/questions/72522885/are-the-xgetbv-and-cpuid-checks-sufficient-to-guarantee-avx2-support

I installed Fedora, and tested OpenJPH on its own, and everything was fine -- no fiddling with instruction set.
I switched to Fedora KDE Plasma, and tested OpenJPH on its own, and everything was fine.
On Fedora KDE Plasma, I also compiled libheif and tested heif-enc and that was fine as well.

Thinking more, here are some questions.

1) At the top of the crash report, you see libheif-jphenc.so, which tries to load libopenjph.so.0.15 and then crashes.
Is it possible that libheif-jphenc is binary-incompatible with libopenjph.so.0.15, expecting a different ABI; i.e., libheif was compiled for a different version of libopenjph?

2) Now I think that xgetbv might not be the problem.  In OpenJPH, I test for the availability of the xgetbv instruction before using it.  So if the OS is set up correctly, it should show that the instruction is not available.

// According to Intel documentation, if osxsave is available, then xgetbv should be available.
// osxsave is bit 27 (0 base).
osxsave_avail = ((mmx_abcd[2] & 0x08000000) == 0x08000000);
if (osxsave_avail)
{
  xcr_val = read_xcr(0); // <-- calling xgetbv
  ...
}
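
The guard quoted above can be extended into a small self-contained sketch. It is a hedged illustration rather than OpenJPH's implementation: the names are hypothetical, the hardware-reading part assumes GCC or Clang on x86 (`<cpuid.h>` and inline assembly), and the decision logic is kept in pure functions so it can be checked without touching the CPU.

```cpp
#include <cassert>
#include <cstdint>
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#include <cpuid.h>
#endif

// CPUID leaf 1, ECX: bit 27 = OSXSAVE, bit 28 = AVX.
constexpr std::uint32_t kOsxsave = 1u << 27;  // same value as 0x08000000
constexpr std::uint32_t kAvx = 1u << 28;
// XCR0: bit 1 = SSE (XMM) state, bit 2 = AVX (YMM) state.
constexpr std::uint64_t kXcr0XmmYmm = 0x6;

// xgetbv may only be executed when the OS has enabled XSAVE (OSXSAVE set).
bool xgetbv_allowed(std::uint32_t leaf1_ecx) {
  return (leaf1_ecx & kOsxsave) != 0;
}

// AVX is usable only if the CPU has it AND the OS saves XMM and YMM state.
bool avx_usable(std::uint32_t leaf1_ecx, std::uint64_t xcr0) {
  return xgetbv_allowed(leaf1_ecx) && (leaf1_ecx & kAvx) != 0 &&
         (xcr0 & kXcr0XmmYmm) == kXcr0XmmYmm;
}

#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
// Reads the real values; xgetbv is reached only after the OSXSAVE check.
bool avx_usable_on_this_cpu() {
  unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
  if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) return false;
  if (!xgetbv_allowed(ecx)) return false;
  std::uint32_t lo = 0, hi = 0;
  __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
  return avx_usable(ecx, (static_cast<std::uint64_t>(hi) << 32) | lo);
}
#endif
```

With this ordering, a CPU or hypervisor that does not advertise OSXSAVE never executes xgetbv at all, so the probe itself cannot be the source of an illegal-instruction fault.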

Perhaps, you can manually modify the code in /src/core/others/ojph_arch.cpp by changing

  ////////////////////////////////////////////////////////////////////////////
  uint64_t read_xcr(uint32_t index)
  {
  #ifdef OJPH_COMPILER_MSVC
    return _xgetbv(index);
  #else
    uint32_t eax = 0, edx = 0;
    __asm__ ( "xgetbv" : "=a" (eax), "=d" (edx) : "c" (index) );
    return ((uint64_t)edx << 32) | eax;
  #endif
  }
  
to

  ////////////////////////////////////////////////////////////////////////////
  uint64_t read_xcr(uint32_t index)
  {
    return 0;
  }  

and see if this solves the problem.

Thank you, Gerald.

Kind regards,
Aous.

Comment 24 Gerald Cox 2024-09-05 16:59:45 UTC
(In reply to Aous Naman from comment #23)
 
> I installed Fedora, and tested OpenJPH on its own, and everything was fine
> -- no fiddling with instruction set.
> I switched to Fedora KDE Plasma, and tested OpenJPH on its own, and
> everything was fine.
> On Fedora KDE Plasma, I also compiled libheif and tested heif-enc and that
> was fine as well.
Yeah, that might be because of our processor differences.  Mine is:

CPU: AMD FX(tm)-8350 (8) @ 3.54 GHz
GPU: AMD Radeon HD 7850 / R7 265 / R9 270 1024SP [Discrete]

> 
> Thinking more, here are some questions.
> 
> 1) At the top of the crash report, you see libheif-jphenc.so, which tries to
> load libopenjph.so.0.15 and then crashes.
> Is it possible that libheif-jphenc is binary-incompatible with
> libopenjph.so.0.15, expecting a different ABI; i.e., libheif was compiled for a
> different version of libopenjph?

My libheif is from negativo, so that would be a question for Simone.
I wouldn't believe that should be the case however since when I manually 
disable options:


%cmake \
 -DOJPH_DISABLE_SIMD=OFF \
 -DOJPH_DISABLE_AVX2=ON \
 -DOJPH_DISABLE_AVX=ON \
 -DOJPH_DISABLE_SSE4=ON \
 -DOJPH_DISABLE_SSSE3=ON \
 -DOJPH_DISABLE_AVX512=ON

everything works fine.


> 
> 2) Now I think that xgetbv might not be the problem.  In OpenJPH, I test for
> the availability of the xgetbv instruction before using it.  So if the OS is
> set up correctly, it should show that the instruction is not available.
> 
> // According to Intel documentation, if osxsave is available, then xgetbv
> should be available.
> // osxsave is bit 27 (0 base).
> osxsave_avail = ((mmx_abcd[2] & 0x08000000) == 0x08000000);
> if (osxsave_avail)
> {
>   xcr_val = read_xcr(0); // <-- calling xgetbv
>   ...
> }
> 
> Perhaps, you can manually modify the code in /src/core/others/ojph_arch.cpp
> by changing
> 
>  
> ////////////////////////////////////////////////////////////////////////////
>   uint64_t read_xcr(uint32_t index)
>   {
>   #ifdef OJPH_COMPILER_MSVC
>     return _xgetbv(index);
>   #else
>     uint32_t eax = 0, edx = 0;
>     __asm__ ( "xgetbv" : "=a" (eax), "=d" (edx) : "c" (index) );
>     return ((uint64_t)edx << 32) | eax;
>   #endif
>   }
>   
> to
> 
>  
> ////////////////////////////////////////////////////////////////////////////
>   uint64_t read_xcr(uint32_t index)
>   {
>     return 0;
>   }  
> 
> and see if this solves the problem.
 
I created a patch to do the above and it still crashes with what I believe to
be the same result.  I've attached the KCrash output for the crash with the patch
applied, and have also included the patch to make sure I did what you requested.

If you can think of anything else to try, please LMK and I'll try to assist.

I do believe that if we simply set:

%cmake \
 -DOJPH_DISABLE_SIMD=OFF \
 -DOJPH_DISABLE_AVX2=ON \
 -DOJPH_DISABLE_AVX=ON \
 -DOJPH_DISABLE_SSE4=ON \
 -DOJPH_DISABLE_SSSE3=ON \
 -DOJPH_DISABLE_AVX512=ON

that will satisfactorily circumvent the issue.

Comment 25 Gerald Cox 2024-09-05 17:01:17 UTC
Created attachment 2045528
Crash file after patch applied

Comment 26 Gerald Cox 2024-09-05 17:02:28 UTC
Created attachment 2045529
patch as requested

Comment 27 Aous Naman 2024-09-06 04:51:07 UTC
Hi Gerald,

Thank you for testing the patch; it all looks good to me.

If 
%cmake \
 -DOJPH_DISABLE_SIMD=OFF \
 -DOJPH_DISABLE_AVX2=ON \
 -DOJPH_DISABLE_AVX=ON \
 -DOJPH_DISABLE_SSE4=ON \
 -DOJPH_DISABLE_SSSE3=ON \
 -DOJPH_DISABLE_AVX512=ON
works for you, then that is fine with me.

I still do not understand why this is the case; this points to a bug in my configuration.

Thank you again.

Kind regards,
Aous.

Comment 28 Simone Caronni 2024-09-06 11:12:04 UTC
Hi everyone, thanks for debugging this.

A bit of context:

> Thinking more, here are some questions.
> 1) At the top of the crash report, you see libheif-jphenc.so, which tries to load libopenjph.so.0.15 and then crashes.
> Is it possible that libheif-jphenc is binary-incompatible with libopenjph.so.0.15, expecting a different ABI; i.e., libheif was compiled for a different version of libopenjph?

I've pushed OpenJPH as an update to Fedora, but in the meanwhile, I'm hosting it also in my repository and linking libheif to it until it reaches stable updates in Fedora. libheif in Fedora is not yet linked to OpenJPH.

@Gerald, let me check my libheif package; maybe there is some extra commit or fix, not yet in a libheif release, that addresses this.

Comment 29 Gerald Cox 2024-09-06 17:25:13 UTC
(In reply to Simone Caronni from comment #28)
> 
> I've pushed OpenJPH as an update to Fedora, but in the meanwhile, I'm
> hosting it also in my repository and linking libheif to it until it reaches
> stable updates in Fedora. libheif in Fedora is not yet linked to OpenJPH.
Simone, did you forget to push?  I still see the old versions on both negativo
and fedora.  Just an FYI...

> 
> @Gerald let me check my libheif package maybe there is some extra commit or
> fix which is not yet in a libheif release to address this.

I tried using the source copies from negativo (both openjph and libheif) and modified
the source of openjph to do the automatic selection, i.e.

%build
%cmake
%cmake_build

and then recompiled using the new devel from openjph and got the same result.  So
I'm thinking going down the libheif path might be a red herring.

I've attached the dump from this experiment.

I've changed back to:
%cmake \
 -DOJPH_DISABLE_SIMD=OFF \
 -DOJPH_DISABLE_AVX2=ON \
 -DOJPH_DISABLE_AVX=ON \
 -DOJPH_DISABLE_SSE4=ON \
 -DOJPH_DISABLE_SSSE3=ON \
 -DOJPH_DISABLE_AVX512=ON

and it is working fine.

Aous, no worries; we have a good workaround.

Comment 30 Gerald Cox 2024-09-06 17:26:49 UTC
Created attachment 2045627
kcrash for libheif recompile with openjph

Comment 31 Nerijus Baliūnas 2024-09-09 15:13:51 UTC
I have 2 old PCs - one with Intel Core i5-3570K CPU, flags fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm cpuid_fault epb ssbd ibrs ibpb stibp tpr_shadow flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts vnmi md_clear flush_l1d. Telegram starts ok here.

Another one with Intel Core i7-4720HQ CPU, flags fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm cpuid_fault epb ssbd ibrs ibpb stibp tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt dtherm ida arat pln pts vnmi md_clear flush_l1d. Telegram crashes with Illegal instruction here; see bug 2307698.

Comment 32 Simone Caronni 2024-09-10 13:49:14 UTC
@nerijus.net nothing is using OpenJPH in Fedora yet, the only thing using it is libheif compiled from my repository, and then some other package having a dependency on libheif.

Are you by chance using my repository as you mentioned in the Bodhi update? If yes, can you try to remove the libheif plugin for OpenJPH or switch to Fedora's libheif and the latest OpenJPH build in updates-testing?

I fear the bug is in libheif, not here in OpenJPH. Thanks.

Comment 34 Gerald Cox 2024-09-10 16:28:39 UTC
@Simone, @Aous:  I've tested today with the updated libheif from negativo (1.18.2-2) and openjph (0.15.0-6)
and I'm happy to report the auto detection appears to be working now.  No more plasma crashes at login.

@nerijus I'm also using Telegram (albeit the version from Telegram, not from rpmfusion) and it is working
fine.

I don't know what happened with the testing I reported in comment #29.  I thought I had built with the
appropriate version of openjph - apparently not.  

So it appears the problem is now fixed.  I'm OK because I use the negativo version of libheif.  Does the Fedora 
repo version of libheif also need to be recompiled?

Comment 35 Nerijus Baliūnas 2024-09-10 17:48:04 UTC
You are right - after downgrading from libheif-1:1.18.2-2.fc40.x86_64 from fedora-multimedia to libheif-1.17.6-1.fc40.x86_64 from updates telegram starts OK.

Comment 36 Simone Caronni 2024-09-11 06:48:30 UTC
@nerijus.net The issue is as follows. I usually add stuff to my repo first and then submit it for inclusion in Fedora after sorting out all the details:

- I built the OpenJPH package with some optimizations in my repo
- Built libheif linked to it in my repo
- Pushed an update in Fedora with OpenJPH with autodetection of extensions enabled (with a newer release tag than my repo, to update it)

Then people started using my libheif together with the Fedora OpenJPH, which was built with different compile options, and this is where the issue started: anything using libheif would pick up the not-yet-recompiled libheif from my repository.

Now I've put the same version in my repository and rebuilt libheif as well, and you can see the issue is gone. I'll push the updates as soon as I can, since it's solved on my side and it's not a bug on the Fedora side.

Thanks everyone!

Comment 37 Peter Robinson 2024-09-25 21:13:16 UTC
> To directly answer your question, AVX/AVX2 are NOT in the V1 Baseline. 
> SSE3, SSSE3, SSE4_1 and SSE4_2 are also NOT in the V1 Baseline. 
> Here is the link:
> https://en.wikipedia.org/wiki/X86-64#Microarchitecture_levels

In Fedora it means you can't depend on the AVX extensions; hence they have to be detected at run time for optimisation, and the code must be able to fall back to other instructions if they are not available.

Comment 38 Simone Caronni 2024-11-16 22:10:36 UTC
@aous72, OpenJPH 0.18.0 does indeed fix the crash!

Comment 39 Aous Naman 2024-11-16 23:05:16 UTC
Thank you for this confirmation.

If you are interested in what the problem was, have a look at
https://github.com/aous72/OpenJPH/issues/157#issuecomment-2466540994

Kind regards,
Aous.

Comment 40 Simone Caronni 2024-11-17 19:14:41 UTC
Hi @aous72 , sorry but we have another one. I counted on victory too soon:

https://bodhi.fedoraproject.org/updates/FEDORA-2024-8b82acf2f8#comment-3816667

Again, libheif compiled with openjph makes programs crash. Btw, here we are on GCC, not Clang. Thanks.

Comment 41 Aous Naman 2024-11-18 00:51:01 UTC
Happy to help, but it is not clear why this happens, or what instruction is causing the problem -- the openjph library was compiled with no symbols.
I will try to replicate the setup on my side this week, and see.

Cheers,
Aous.

Comment 42 Dominik 'Rathann' Mierzejewski 2024-11-18 10:36:10 UTC
I was told this was occurring on a qemu VM configured with Nehalem CPU (-machine q35,smm=on -cpu Nehalem) running Fedora 42/rawhide and it was crashing with SIGILL.

Comment 43 Aous Naman 2024-11-24 10:16:17 UTC
Hi guys,

I have run the following test -- writing from memory.
Compiler: GCC 11.4.0
openjph 0.18.0; I compiled it and installed it to the machine.
libheif: v1.19.5, compiled and installed, after openjph.
For libheif, I only added support for openjph, and png.
I used heif-enc to encode an image using openjph.
Using sde64, I tested with Nehalem and a few older CPUs: Penryn, Merom, and Pentium 4 Prescott.
sde (sde64) from intel https://www.intel.com/content/www/us/en/developer/articles/tool/software-development-emulator.html is used to emulate these CPUs.

There was no problem.

Perhaps, it is worth saying that although versions 0.17 and 0.18 start with 0 (their major version number is 0), I think they are not ABI-compatible.

Please let me know if there are specific tests you want me to run.

Kind regards,
Aous.

PS: I searched for why SIGILL happens, and it is either a privileged or a malformed instruction.

Comment 44 Sergio Basto 2024-11-24 15:37:04 UTC
libheif-1.19.5-1.fc42 was pushed to rawhide; can you check it?

Comment 45 Aous Naman 2024-11-24 22:47:03 UTC
Thank you Sergio. Happy to.

I cannot find it; I can only find libheif-1.19.3-3.fc42.

Cheers,
Aous.

Comment 46 Sergio Basto 2024-11-25 00:18:21 UTC
https://bodhi.fedoraproject.org/updates/FEDORA-2024-717d9b9432 should be available today/tomorrow in the rawhide compose of 20241125

Comment 47 Aous Naman 2024-11-25 04:33:53 UTC
Hi Sergio,

Sorry for the long email.

I installed Fedora-Workstation-Live-x86_64-Rawhide-20241124.n.0.iso in qemu with this configuration
qemu-system-x86_64 \
        -m 4G -boot d \
        -machine q35,smm=on \
        -cpu Nehalem \
        -enable-kvm \
        -smp 2 \
        -net nic \
        -net user \
        -hda ./hdd.img \
#       -cdrom ${HOME}/Downloads/Fedora-Workstation-Live-x86_64-Rawhide-20241124.n.0.iso

---> libheif-1.19.3-3.fc42 does not have the openjph plugin compiled/installed.

The distribution comes with openjph-0.18.0-1.fc42.x86_64.
---> The included version crashes when the command ojph_compress or ojph_expand is invoked.  Running valgrind produces

==3964== Memcheck, a memory error detector
==3964== Copyright (C) 2002-2024, and GNU GPL'd, by Julian Seward et al.
==3964== Using Valgrind-3.24.0 and LibVEX; rerun with -h for copyright info
==3964== Command: ojph_compress
==3964==
vex amd64->IR: unhandled instruction bytes: 0xC5 0xF9 0x6F 0x5 0x7B 0xE0 0x4 0x0 0x66 0x89
vex amd64->IR:   REX=0 REX.W=0 REX.R=0 REX.X=0 REX.B=0
vex amd64->IR:   VEX=0 VEX.L=0 VEX.nVVVV=0x0 ESC=NONE
vex amd64->IR:   PFX.66=0 PFX.F2=0 PFX.F3=0
==3964== valgrind: Unrecognised instruction at address 0x487478d.
==3964==    at 0x487478D: _sub_I_65535_0.0 (ojph_block_encoder.cpp:192)
==3964==    by 0x4005356: call_init (dl-init.c:74)
==3964==    by 0x4005356: call_init (dl-init.c:26)
==3964==    by 0x400542C: _dl_init (dl-init.c:121)
==3964==    by 0x401D5DF: ??? (in /usr/lib64/ld-linux-x86-64.so.2)
==3964== Your program just tried to execute an instruction that Valgrind
==3964== did not recognise.  There are two possible reasons for this.
==3964== 1. Your program has a bug and erroneously jumped to a non-code
==3964==    location.  If you are running Memcheck and you just saw a
==3964==    warning about a bad jump, it's probably your program's fault.
==3964== 2. The instruction is legitimate but Valgrind doesn't handle it,
==3964==    i.e. it's Valgrind's fault.  If you think this is the case or
==3964==    you are not sure, please let us know and we'll try to fix it.
==3964== Either way, Valgrind will now raise a SIGILL signal which will
==3964== probably kill your program.
==3964==
==3964== Process terminating with default action of signal 4 (SIGILL): dumping core
==3964==  Illegal opcode at address 0x487478D
==3964==    at 0x487478D: _sub_I_65535_0.0 (ojph_block_encoder.cpp:192)
==3964==    by 0x4005356: call_init (dl-init.c:74)
==3964==    by 0x4005356: call_init (dl-init.c:26)
==3964==    by 0x400542C: _dl_init (dl-init.c:121)
==3964==    by 0x401D5DF: ??? (in /usr/lib64/ld-linux-x86-64.so.2)
==3964==
==3964== HEAP SUMMARY:
==3964==     in use at exit: 73,728 bytes in 1 blocks
==3964==   total heap usage: 1 allocs, 0 frees, 73,728 bytes allocated
==3964==
==3964== LEAK SUMMARY:
==3964==    definitely lost: 0 bytes in 0 blocks
==3964==    indirectly lost: 0 bytes in 0 blocks
==3964==      possibly lost: 0 bytes in 0 blocks
==3964==    still reachable: 73,728 bytes in 1 blocks
==3964==         suppressed: 0 bytes in 0 blocks
==3964== Rerun with --leak-check=full to see details of leaked memory
==3964==
==3964== For lists of detected and suppressed errors, rerun with: -s
==3964== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Illegal instruction (core dumped)

So, the distribution ships an openjph that does not work correctly; when libheif is built on this version, it does not work either.
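
The "unhandled instruction bytes" in the valgrind output can be decoded by hand. The sketch below is a deliberately narrow decoder, not a general disassembler: the function name `decode_vex2` is hypothetical, and it only handles a two-byte VEX prefix (0xC5) followed by an opcode and a RIP-relative ModRM operand, which is the shape of the bytes reported above.

```python
def decode_vex2(code):
    """Decode a two-byte-VEX instruction with a RIP-relative memory operand.

    Narrow on purpose: requires prefix 0xC5 and only reads a displacement
    when ModRM.mod == 0 and ModRM.rm == 5 (RIP + disp32 in 64-bit mode).
    """
    if code[0] != 0xC5:
        raise ValueError("not a two-byte VEX prefix")
    p = code[1]
    fields = {
        "R": ((p >> 7) & 1) ^ 1,         # stored inverted in the prefix
        "vvvv": ((p >> 3) & 0xF) ^ 0xF,  # stored inverted in the prefix
        "L": (p >> 2) & 1,               # 0 = 128-bit, 1 = 256-bit
        "pp": p & 3,                     # 0 = none, 1 = 0x66, 2 = 0xF3, 3 = 0xF2
        "opcode": code[2],               # looked up in the 0F opcode map
    }
    modrm = code[3]
    fields["mod"] = modrm >> 6
    fields["reg"] = (modrm >> 3) & 7
    fields["rm"] = modrm & 7
    if fields["mod"] == 0 and fields["rm"] == 5:
        fields["disp32"] = int.from_bytes(code[4:8], "little", signed=True)
    return fields


# Bytes copied from the "unhandled instruction bytes" line above.
raw = bytes([0xC5, 0xF9, 0x6F, 0x05, 0x7B, 0xE0, 0x04, 0x00, 0x66, 0x89])
decoded = decode_vex2(raw)
print(decoded)
```

With L=0, pp=1 (the 0x66 prefix), opcode 0x6F and reg=0, this reads, as far as I can tell, as `vmovdqa xmm0, [rip+0x4e07b]`, i.e. a VEX-encoded (AVX) load; the trailing 0x66 0x89 belong to the next instruction. A Nehalem CPU has no AVX, which is consistent with the SIGILL, and the backtrace places the instruction in code run from `_dl_init`, before any run-time CPU check can take effect.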

When I download the source package openjph-0.18.0-1.fc42.src.rpm and compile it myself on the machine, the commands ojph_compress and ojph_expand work correctly.

Personally, I suspect this is a compiler issue -- I always feel uncomfortable saying this, because I think it is more likely my problem.
On macOS Ventura and later running on Intel, I had a problem with ojph_block_decoder.cpp, and had to break it into ojph_block_decoder32.cpp and ojph_block_decoder64.cpp.  On older macOS, there was no problem.  Note that the macOS version determines which Xcode you use.
I suspect if I break ojph_block_encoder.cpp into ojph_block_encoder32.cpp and ojph_block_encoder64.cpp, the problem will go away.
I will do this in the next few days to produce 0.18.1.

Please let me know what you think.

Cheers,
Aous.

Comment 48 Sergio Basto 2024-11-25 14:10:58 UTC
Fedora-Workstation-Live-x86_64-Rawhide-20241125.n.0.iso has this change: [1]
I'd like you to test it with libheif-1.19.5, please.

Thank you 

[1]
Package:      libheif-1.19.5-1.fc42
Old package:  libheif-1.19.3-3.fc42
Summary:      HEIF and AVIF file format decoder and encoder
RPMs:         heif-pixbuf-loader libheif libheif-devel libheif-tools
Size:         3.22 MiB
Size change:  44.57 KiB
Changelog:
  * Sun Nov 24 2024 Packit <hello> - 1.19.5-1
  - Update to version 1.19.5
  - Resolves: rhbz#2327307

Comment 49 Aous Naman 2024-11-26 01:00:23 UTC
Hi Sergio,

The library libheif-1.19.5 has no problems, but it does NOT have openjph.

I installed Fedora-Workstation-Live-x86_64-Rawhide-20241125.n.0.iso.
I ran:
> sudo dnf install libheif-tools
Then I ran:
> heif-enc --htj2k lena.png -o lena.heif
I got:
> No HT-J2K encoder available.
No crashes.

Cheers,
Aous.

PS: qemu with -machine q35,smm=on -cpu Nehalem

Comment 50 Dominik 'Rathann' Mierzejewski 2024-11-26 12:40:35 UTC
Hi, Aous!

(In reply to Aous Naman from comment #49)
> The library libheif-1.19.5 has no problems, but it does NOT have openjph.

Correct, it's disabled in the official repos because of this very bug.
For testing with openjph, the package needs to be rebuilt with the bcond
at the top of the spec file set to 1.

I've done that here: https://src.fedoraproject.org/rpms/libheif/pull-request/11

Try installing the scratch build from "Fedora CI - scratch build" link there once it completes.

Thanks a lot for trying to help us figure this out. It's much appreciated!

Comment 51 Aous Naman 2024-11-26 20:46:39 UTC
Hi Dominik,

Thank you for putting this in.

I tested, and "heif-enc" crashed, with an illegal instruction.

I tracked the problem.  It is in the binary of libopenjph-0.18.0-1.fc42 -- libheif uses libopenjph internally.
I extracted openjph-0.18.0-1.fc42.src.rpm, compiled it on the same machine using cmake and gcc/g++, installed libopenjph to /usr/lib64, and 
> heif-enc --htj2k lena.png -o lena.heif
worked correctly.

Kind regards,
Aous.

Comment 52 Dominik 'Rathann' Mierzejewski 2024-11-26 22:21:41 UTC
Thanks for testing, Aous.
Have you used the same cmake configuration and compiler flags?
They can be extracted from the build logs at: https://koji.fedoraproject.org/koji/buildinfo?buildID=2583378

Comment 53 Aous Naman 2024-11-27 06:33:32 UTC
Thank you for the suggestion.

I compiled with the flags of the build logs, and libopenjph-0.18.0-1.fc42 crashes on load.
I discovered that removing -flto=auto -ffat-lto-objects from CFLAGS and CXXFLAGS fixes the problem -- heif-enc runs properly.

-ffat-lto-objects is ignored if -flto=auto is removed.
I did not test removing the flags from only one of CFLAGS or CXXFLAGS; I removed them from both.
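
To repeat this experiment with the flags taken from a build log, the LTO options can be filtered out mechanically. A small sketch follows; the helper name `strip_lto` is hypothetical, and it only recognises `-flto`, `-flto=...` and `-ffat-lto-objects`, which are the options mentioned above.

```python
import shlex


def strip_lto(flags):
    """Return the flag string with the LTO-related options removed."""
    kept = [
        tok
        for tok in shlex.split(flags)
        if not (
            tok == "-flto"
            or tok.startswith("-flto=")
            or tok == "-ffat-lto-objects"
        )
    ]
    return shlex.join(kept)


if __name__ == "__main__":
    sample = "-O2 -flto=auto -ffat-lto-objects -fexceptions -g"
    print(strip_lto(sample))
```

The result can then be exported as both CFLAGS and CXXFLAGS before running cmake.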

In the coming days, I will explore modifying openjph to fix this issue.

Kind regards,
Aous.

Comment 54 Dominik 'Rathann' Mierzejewski 2024-11-27 10:28:32 UTC
(In reply to Aous Naman from comment #53)
[...]
> I compiled with the flags of the build logs, and libopenjph-0.18.0-1.fc42
> crashes on load.

Ok. It's good that you can reproduce the issue as well.

> I discovered that removing -flto=auto -ffat-lto-objects from CFLAGS and
> CXXFLAGS fixes the problem -- heif-enc runs properly.

Interesting. We could disable LTO temporarily while we're investigating the issue.
LTO is known to expose previously missed bugs.

I checked the build log, too, and saw this warning:
...
/usr/bin/g++ -fPIC -O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -m64 -march=x86-64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -mtls-dialect=gnu2 -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -DNDEBUG -Wl,-z,relro -Wl,--as-needed  -Wl,-z,pack-relative-relocs -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -Wl,--build-id=sha1 -specs=/usr/lib/rpm/redhat/redhat-package-notes -shared -Wl,-soname,libopenjph.so.0.18 -o libopenjph.so.0.18.0 CMakeFiles/openjph.dir/codestream/ojph_codeblock.cpp.o CMakeFiles/openjph.dir/codestream/ojph_codeblock_fun.cpp.o CMakeFiles/openjph.dir/codestream/ojph_codestream.cpp.o CMakeFiles/openjph.dir/codestream/ojph_codestream_gen.cpp.o CMakeFiles/openjph.dir/codestream/ojph_codestream_local.cpp.o CMakeFiles/openjph.dir/codestream/ojph_params.cpp.o CMakeFiles/openjph.dir/codestream/ojph_precinct.cpp.o CMakeFiles/openjph.dir/codestream/ojph_resolution.cpp.o CMakeFiles/openjph.dir/codestream/ojph_subband.cpp.o CMakeFiles/openjph.dir/codestream/ojph_tile.cpp.o CMakeFiles/openjph.dir/codestream/ojph_tile_comp.cpp.o CMakeFiles/openjph.dir/coding/ojph_block_common.cpp.o CMakeFiles/openjph.dir/coding/ojph_block_decoder32.cpp.o CMakeFiles/openjph.dir/coding/ojph_block_decoder64.cpp.o CMakeFiles/openjph.dir/coding/ojph_block_encoder.cpp.o CMakeFiles/openjph.dir/others/ojph_arch.cpp.o CMakeFiles/openjph.dir/others/ojph_file.cpp.o CMakeFiles/openjph.dir/others/ojph_mem.cpp.o CMakeFiles/openjph.dir/others/ojph_message.cpp.o CMakeFiles/openjph.dir/transform/ojph_colour.cpp.o CMakeFiles/openjph.dir/transform/ojph_transform.cpp.o CMakeFiles/openjph.dir/codestream/ojph_codestream_sse.cpp.o 
CMakeFiles/openjph.dir/transform/ojph_colour_sse.cpp.o CMakeFiles/openjph.dir/transform/ojph_transform_sse.cpp.o CMakeFiles/openjph.dir/codestream/ojph_codestream_sse2.cpp.o CMakeFiles/openjph.dir/transform/ojph_colour_sse2.cpp.o CMakeFiles/openjph.dir/transform/ojph_transform_sse2.cpp.o CMakeFiles/openjph.dir/coding/ojph_block_decoder_ssse3.cpp.o CMakeFiles/openjph.dir/codestream/ojph_codestream_avx.cpp.o CMakeFiles/openjph.dir/transform/ojph_colour_avx.cpp.o CMakeFiles/openjph.dir/transform/ojph_transform_avx.cpp.o CMakeFiles/openjph.dir/codestream/ojph_codestream_avx2.cpp.o CMakeFiles/openjph.dir/transform/ojph_colour_avx2.cpp.o CMakeFiles/openjph.dir/transform/ojph_transform_avx2.cpp.o CMakeFiles/openjph.dir/coding/ojph_block_decoder_avx2.cpp.o CMakeFiles/openjph.dir/coding/ojph_block_encoder_avx2.cpp.o CMakeFiles/openjph.dir/coding/ojph_block_encoder_avx512.cpp.o CMakeFiles/openjph.dir/transform/ojph_transform_avx512.cpp.o
/builddir/build/BUILD/openjph-0.18.0-build/OpenJPH-0.18.0/src/core/coding/ojph_block_decoder32.cpp:581:12: warning: type ‘struct frwd_struct’ violates the C++ One Definition Rule [-Wodr]
  581 |     struct frwd_struct {
      |            ^
/builddir/build/BUILD/openjph-0.18.0-build/OpenJPH-0.18.0/src/core/coding/ojph_block_decoder_ssse3.cpp:582:12: note: a different type is defined in another translation unit
  582 |     struct frwd_struct {
      |            ^
/builddir/build/BUILD/openjph-0.18.0-build/OpenJPH-0.18.0/src/core/coding/ojph_block_decoder32.cpp:583:12: note: the first difference of corresponding definitions is field ‘tmp’
  583 |       ui64 tmp;         //!<temporary buffer of read data
      |            ^
/builddir/build/BUILD/openjph-0.18.0-build/OpenJPH-0.18.0/src/core/coding/ojph_block_decoder_ssse3.cpp:584:11: note: a field of same name but different type is defined in another translation unit
  584 |       ui8 tmp[48];      //!<temporary buffer of read data + 16 extra
      |           ^
/builddir/build/BUILD/openjph-0.18.0-build/OpenJPH-0.18.0/src/core/coding/ojph_block_encoder.cpp:342:12: warning: type ‘struct vlc_struct’ violates the C++ One Definition Rule [-Wodr]
  342 |     struct vlc_struct {
      |            ^
/builddir/build/BUILD/openjph-0.18.0-build/OpenJPH-0.18.0/src/core/coding/ojph_block_encoder_avx2.cpp:308:12: note: a different type is defined in another translation unit
  308 |     struct vlc_struct {
      |            ^
/builddir/build/BUILD/openjph-0.18.0-build/OpenJPH-0.18.0/src/core/coding/ojph_block_encoder.cpp:349:11: note: the first difference of corresponding definitions is field ‘tmp’
  349 |       int tmp;       //temporary storage of coded bits
      |           ^
/builddir/build/BUILD/openjph-0.18.0-build/OpenJPH-0.18.0/src/core/coding/ojph_block_encoder_avx2.cpp:315:12: note: a field of same name but different type is defined in another translation unit
  315 |       ui64 tmp;       //temporary storage of coded bits
      |            ^
/builddir/build/BUILD/openjph-0.18.0-build/OpenJPH-0.18.0/src/core/coding/ojph_block_encoder.cpp:342:12: note: type ‘int’ should match type ‘ui64’
  342 |     struct vlc_struct {
      |            ^

This looks worth investigating.
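
To illustrate why the -Wodr warnings matter: the One Definition Rule requires a type that is defined in several translation units under the same name to be defined identically, and a violation is undefined behaviour. The two `frwd_struct` and the two `vlc_struct` definitions do not even agree on the size of `tmp`. A rough sketch using Python's ctypes, modelling ONLY the `tmp` field quoted in the warnings (the remaining fields are not shown in the log, so they are left out, and the class names are made up):

```python
import ctypes


class FrwdScalar(ctypes.Structure):
    # ojph_block_decoder32.cpp: ui64 tmp;
    _fields_ = [("tmp", ctypes.c_uint64)]


class FrwdSsse3(ctypes.Structure):
    # ojph_block_decoder_ssse3.cpp: ui8 tmp[48];
    _fields_ = [("tmp", ctypes.c_uint8 * 48)]


class VlcScalar(ctypes.Structure):
    # ojph_block_encoder.cpp: int tmp;
    _fields_ = [("tmp", ctypes.c_int)]


class VlcAvx2(ctypes.Structure):
    # ojph_block_encoder_avx2.cpp: ui64 tmp;
    _fields_ = [("tmp", ctypes.c_uint64)]


for a, b in [(FrwdScalar, FrwdSsse3), (VlcScalar, VlcAvx2)]:
    print(a.__name__, ctypes.sizeof(a), "vs", b.__name__, ctypes.sizeof(b))
```

Without LTO each translation unit only ever sees its own definition, so the mismatch can go unnoticed; with LTO the link step sees both at once. Whether that is what triggers the crash here is a separate question.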

[...]
> In the coming days, I will explore modifying openjph to fix this issue.

Thanks a lot for checking. Note that it could also be a bug in the Fedora
toolchain (gcc/ld), but I'd try fixing the above warning first.

Comment 55 Aous Naman 2024-12-05 01:28:49 UTC
Thank you, Dominik, for looking into this, and for removing the LTO flag from the OpenJPH compilation.
I only found the time to come back to this issue today.

If you recall, OpenJPH can benefit from SIMD CPU features, but only after testing that the host CPU supports them.

I could not fix the illegal instruction -- it is an AVX512 instruction -- even after getting rid of the above -Wodr warnings.
LTO somehow moves AVX512 and AVX instructions from other branches of the code into places where they are not supposed to be.
Perhaps, if I was more careful with function names, the problem would go away -- who knows.

In the end, I found this: https://gcc.gnu.org/wiki/LinkTimeOptimization.
See point 6 in the to-do list, which I quote:
"Design and implement rules for handling mixed command line options.
  The basic problem here is what to do when the flags used to generate the initial IL are different than the flags used for final code generation:
    $ gcc -flto -O2 -msse2 -c file.c
    $ gcc -flto -o file file.o
"
Perhaps, this is what is causing the problem.

Kind regards,
Aous.

Comment 56 Aous Naman 2024-12-05 04:55:56 UTC
Hi Dominik,

I may have spoken too early.
I have a version that works with LTO on my machine.  
I will test it with the Fedora build flags, and if it works, I will push version 0.18.1.
I should know by the end of today my time.

Kind regards,
Aous.

Comment 57 Aous Naman 2024-12-05 09:24:09 UTC
Hi Dominik,

I published version 0.18.1, which I believe addresses the illegal instruction issue.
I tested it with the flags from the build logs.

Please let me know how it goes -- happy to test again.

Kind regards,
Aous.

Comment 58 Fedora Update System 2024-12-12 17:20:50 UTC
FEDORA-2024-f45da0d608 (openjph-0.18.2-1.fc41) has been submitted as an update to Fedora 41.
https://bodhi.fedoraproject.org/updates/FEDORA-2024-f45da0d608

Comment 59 Fedora Update System 2024-12-12 17:20:51 UTC
FEDORA-EPEL-2024-9565caa26b (openjph-0.18.2-1.el9) has been submitted as an update to Fedora EPEL 9.
https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2024-9565caa26b

Comment 60 Fedora Update System 2024-12-12 17:20:52 UTC
FEDORA-EPEL-2024-552f5c7e86 (openjph-0.18.2-1.el8) has been submitted as an update to Fedora EPEL 8.
https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2024-552f5c7e86

Comment 61 Fedora Update System 2024-12-13 01:48:56 UTC
FEDORA-EPEL-2024-552f5c7e86 has been pushed to the Fedora EPEL 8 testing repository.

You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2024-552f5c7e86

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 62 Fedora Update System 2024-12-13 02:11:13 UTC
FEDORA-EPEL-2024-9565caa26b has been pushed to the Fedora EPEL 9 testing repository.

You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2024-9565caa26b

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 63 Aous Naman 2024-12-13 02:22:51 UTC
I tested openjph-0.18.2-1.fc42 -- no crashes.

Comment 64 Fedora Update System 2024-12-13 02:39:04 UTC
FEDORA-2024-b779423a3f has been pushed to the Fedora 40 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2024-b779423a3f`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2024-b779423a3f

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 65 Fedora Update System 2024-12-13 02:46:29 UTC
FEDORA-2024-f45da0d608 has been pushed to the Fedora 41 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2024-f45da0d608`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2024-f45da0d608

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Comment 66 Fedora Update System 2024-12-21 00:31:38 UTC
FEDORA-EPEL-2024-552f5c7e86 (openjph-0.18.2-1.el8) has been pushed to the Fedora EPEL 8 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 67 Fedora Update System 2024-12-21 03:05:29 UTC
FEDORA-2024-b779423a3f (openjph-0.18.2-1.fc40) has been pushed to the Fedora 40 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 68 Fedora Update System 2024-12-21 03:36:07 UTC
FEDORA-2024-f45da0d608 (openjph-0.18.2-1.fc41) has been pushed to the Fedora 41 stable repository.
If problem still persists, please make note of it in this bug report.

Comment 69 Fedora Update System 2024-12-21 03:40:25 UTC
FEDORA-EPEL-2024-9565caa26b (openjph-0.18.2-1.el9) has been pushed to the Fedora EPEL 9 stable repository.
If problem still persists, please make note of it in this bug report.

