Bug 2357062

Summary: Potential Interaction Between Nvidia Driver and GlibC
Product: [Fedora] Fedora
Reporter: lipidconcentrategaming
Component: glibc
Assignee: Carlos O'Donell <codonell>
Status: NEW
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: unspecified
Version: 42
CC: arjun.is, codonell, dj, fberat, fweimer, jlaw, josmyers, mcermak, mcoufal, mfabian, mori, pfrankli, sam, sipoyare, skolosov, suraj.ghimire7
Keywords: Desktop, Regression
Hardware: x86_64
OS: Linux

Attachments:
Core Dump glxinfo32 (flags: none)
Core Dump Steam (flags: none)
Core Dump eglinfo (flags: none)
Nvidia Bug Report (flags: none)

Description lipidconcentrategaming 2025-04-02 23:52:48 UTC
After upgrading to Fedora 42 Beta, I have been trying to find a fix for 32-bit OpenGL programs refusing to launch and segfaulting in dlopen. I first found this report:
https://bugzilla.redhat.com/show_bug.cgi?id=2346633
After reading it, I filed an issue with libglvnd:
https://gitlab.freedesktop.org/glvnd/libglvnd/-/issues/252
It seems that this issue is not caused by libglvnd and may instead be caused by glibc.

Reproducible: Always

Steps to Reproduce:
1. Install the Nvidia drivers from RPM Fusion.
2. Run glxinfo32, eglinfo, and/or the RPM Fusion steam package. Each of them segfaults.

Actual Results:
The drivers install, and nvidia-smi and other tools work.
glxinfo32 and Steam (which is 32-bit) fail to launch and segfault.
eglinfo segfaults partway through.

Expected Results:
glxinfo32 should not segfault and should display its output.
Steam should open normally.
eglinfo should display info past Device #1.

The Steam Flatpak version 1.0.0.81 (stable release) works, but the Steam Flatpak version 1.0.0.82 (beta release) does not open at all.
The Flatpak beta of Steam is the same version as the native RPM Fusion package, if that is of any help.
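
The reproduction steps above can be checked with a small script. This is only a sketch: it assumes the three programs are on PATH under the names used in this report (glxinfo32, eglinfo, steam), and it relies on the Python subprocess convention that a negative return code is the negated number of the signal that killed the child. The helper names are made up for illustration.

```python
import shutil
import signal
import subprocess


def describe_exit(returncode: int) -> str:
    """Turn a subprocess return code into a short human-readable verdict."""
    if returncode < 0:
        # subprocess reports death-by-signal as the negated signal number.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"


def check(command: list[str], timeout: float = 30.0) -> str:
    """Run one command, discard its output, and describe how it ended."""
    if shutil.which(command[0]) is None:
        return "not installed"
    try:
        proc = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run stops the child before raising this exception.
        return "was still running at the timeout (no crash at startup)"
    return describe_exit(proc.returncode)


if __name__ == "__main__":
    # Command names are taken from this report; adjust them for your system.
    for cmd in (["glxinfo32"], ["eglinfo"], ["steam"]):
        print(f"{cmd[0]}: {check(cmd)}")
```

On an affected system the expectation would be "killed by SIGSEGV" for each command, which matches the attached core dumps.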

Comment 1 lipidconcentrategaming 2025-04-02 23:54:30 UTC
Created attachment 2083157
Core Dump glxinfo32

Comment 2 lipidconcentrategaming 2025-04-02 23:55:08 UTC
Created attachment 2083158
Core Dump Steam

Comment 3 lipidconcentrategaming 2025-04-02 23:55:51 UTC
Created attachment 2083159
Core Dump eglinfo

Comment 4 lipidconcentrategaming 2025-04-03 01:41:32 UTC
Created attachment 2083164
Nvidia Bug Report

Comment 5 Florian Weimer 2025-04-03 04:48:35 UTC
Something patched the _dl_debug_update function in the dynamic loader:

Dump of assembler code for function _dl_debug_update:
   0xf7f95880 <+0>:	call   0xf7fb6c01 <__x86.get_pc_thunk.cx>
   0xf7f95885 <+5>:	add    $0x32767,%ecx
   0xf7f9588b <+11>:	mov    0x4(%esp),%edx
   0xf7f9588f <+15>:	test   %edx,%edx
   0xf7f95891 <+17>:	je     0xf7f958a8 <_dl_debug_update+40>
   0xf7f95893 <+19>:	imul   $0x58,%edx,%eax
   0xf7f95896 <+22>:	lea    0x54(%ecx,%eax,1),%eax
   0xf7f9589d <+29>:	cmpl   $0x0,0x4(%eax)
   0xf7f958a1 <+33>:	je     0xf7f958b4 <_dl_debug_update+52>
   0xf7f958a3 <+35>:	jmp    0xf6510320
   0xf7f958a8 <+40>:	lea    0x660(%ecx),%eax
   0xf7f958ae <+46>:	cmpl   $0x0,0x4(%eax)
   0xf7f958b2 <+50>:	jne    0xf7f958a3 <_dl_debug_update+35>
   0xf7f958b4 <+52>:	imul   $0x58,%edx,%edx
   0xf7f958b7 <+55>:	mov    0x14(%ecx,%edx,1),%edx
   0xf7f958be <+62>:	mov    %edx,0x4(%eax)
   0xf7f958c1 <+65>:	ret
End of assembler dump.

The original is:

Dump of assembler code for function _dl_debug_update:
   0x00002880 <+0>:	call   0x23c01 <__x86.get_pc_thunk.cx>
   0x00002885 <+5>:	add    $0x32767,%ecx
   0x0000288b <+11>:	mov    0x4(%esp),%edx
   0x0000288f <+15>:	test   %edx,%edx
   0x00002891 <+17>:	je     0x28a8 <_dl_debug_update+40>
   0x00002893 <+19>:	imul   $0x58,%edx,%eax
   0x00002896 <+22>:	lea    0x54(%ecx,%eax,1),%eax
   0x0000289d <+29>:	cmpl   $0x0,0x4(%eax)
   0x000028a1 <+33>:	je     0x28b4 <_dl_debug_update+52>
   0x000028a3 <+35>:	ret
   0x000028a4 <+36>:	lea    0x0(%esi,%eiz,1),%esi
   0x000028a8 <+40>:	lea    0x660(%ecx),%eax
   0x000028ae <+46>:	cmpl   $0x0,0x4(%eax)
   0x000028b2 <+50>:	jne    0x28a3 <_dl_debug_update+35>
   0x000028b4 <+52>:	imul   $0x58,%edx,%edx
   0x000028b7 <+55>:	mov    0x14(%ecx,%edx,1),%edx
   0x000028be <+62>:	mov    %edx,0x4(%eax)
   0x000028c1 <+65>:	ret
End of assembler dump.

This jump is not present in the original:

   0xf7f958a3 <+35>:	jmp    0xf6510320

The code at this address disassembles to:

   0xf6510320:	push   %ebx
   0xf6510321:	sub    $0x8,%esp
   0xf6510324:	mov    0xf78b52ec,%ebx
   0xf651032a:	movl   $0x1,0xf78b52e8
   0xf6510334:	mov    0xc(%ebx),%eax
   0xf6510337:	cmp    $0x1,%eax
   0xf651033a:	je     0xf6510360
   0xf651033c:	cmp    $0x2,%eax
   0xf651033f:	je     0xf65103e0
   0xf6510345:	test   %eax,%eax
   0xf6510347:	je     0xf6510380
   0xf6510349:	movl   $0x0,0xf78b52e8
   0xf6510353:	add    $0x8,%esp
   0xf6510356:	pop    %ebx
   0xf6510357:	ret

Apparently, this function does not preserve %eax (although it's hard to tell with those jumps, and I didn't disassemble further), so the intended return value from _dl_debug_update gets corrupted.
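
To make the %eax point concrete: _dl_debug_update returns a pointer, which on i386 travels in %eax, and by +35 the original code has already placed its result there. The sketch below scans the visible part of the injected code for registers used as a destination operand. It is a rough heuristic over the listing text only (the three je targets were not disassembled, so only the fall-through path is covered), and the function name is invented for this example.

```python
# Visible part of the injected code, copied from the listing above.
HOOK_LISTING = """
0xf6510320: push   %ebx
0xf6510321: sub    $0x8,%esp
0xf6510324: mov    0xf78b52ec,%ebx
0xf651032a: movl   $0x1,0xf78b52e8
0xf6510334: mov    0xc(%ebx),%eax
0xf6510337: cmp    $0x1,%eax
0xf651033a: je     0xf6510360
0xf651033c: cmp    $0x2,%eax
0xf651033f: je     0xf65103e0
0xf6510345: test   %eax,%eax
0xf6510347: je     0xf6510380
0xf6510349: movl   $0x0,0xf78b52e8
0xf6510353: add    $0x8,%esp
0xf6510356: pop    %ebx
0xf6510357: ret
"""

# Mnemonics in this listing that do not write a general-purpose register
# through their last operand (compares, branches, push, ret).
NO_REGISTER_WRITE = {"cmp", "test", "push", "je", "ret"}


def written_registers(listing: str) -> set[str]:
    """Collect registers that appear as an AT&T-syntax destination operand."""
    written = set()
    for line in listing.strip().splitlines():
        _address, _, instruction = line.partition(":")
        parts = instruction.split(None, 1)
        mnemonic = parts[0]
        if mnemonic in NO_REGISTER_WRITE or len(parts) == 1:
            continue
        # In AT&T syntax the destination is the last operand.
        destination = parts[1].split(",")[-1].strip()
        if destination.startswith("%"):
            written.add(destination)
    return written


print(sorted(written_registers(HOOK_LISTING)))
```

Of the registers reported, %ebx is saved and restored by the push/pop pair and %esp by the matching sub/add. %eax is overwritten by the load at 0xf6510334, and nothing on this path restores the previous value before the ret.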

Realistically, there isn't much we can do about this on the glibc side.
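
For reference, the size of the patch can be worked out from the two listings. The sketch below assumes the inserted instruction is the usual x86 near jump (opcode 0xE9 followed by a 32-bit displacement relative to the end of the instruction), which is consistent with it occupying +35 through +39 in the patched dump. The helper names are hypothetical.

```python
import struct

JMP_REL32_OPCODE = 0xE9  # x86 near jump with a 32-bit relative displacement


def encode_jmp_rel32(site: int, target: int) -> bytes:
    """Encode `jmp target` placed at address `site` (32-bit address space)."""
    # The displacement is measured from the end of the 5-byte instruction.
    displacement = (target - (site + 5)) & 0xFFFFFFFF
    return bytes([JMP_REL32_OPCODE]) + struct.pack("<I", displacement)


def decode_jmp_rel32(site: int, code: bytes) -> int:
    """Return the absolute target of a 5-byte `jmp rel32` located at `site`."""
    if len(code) != 5 or code[0] != JMP_REL32_OPCODE:
        raise ValueError("not a 5-byte jmp rel32")
    (displacement,) = struct.unpack("<i", code[1:])  # signed displacement
    return (site + 5 + displacement) & 0xFFFFFFFF


# Addresses taken from the patched listing.
SITE = 0xF7F958A3    # _dl_debug_update+35
TARGET = 0xF6510320  # start of the injected code

patch = encode_jmp_rel32(SITE, TARGET)

# Offsets from the original listing: ret at +35, padding lea at +36,
# next real instruction at +40.
ret_size = 36 - 35
padding_size = 40 - 36

print(patch.hex(" "))
print(len(patch) == ret_size + padding_size)
```

The 5-byte jump covers exactly the 1-byte ret at +35 plus the 4-byte padding lea at +36. Everything from +40 onward is identical in both dumps.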

Comment 6 lipidconcentrategaming 2025-04-03 04:52:53 UTC
That is interesting to know and that is way more information than I had to work with before.
Is there any recommendation on where I should report this?

Comment 7 Florian Weimer 2025-04-03 05:04:39 UTC
(In reply to lipidconcentrategaming from comment #6)
> That is interesting to know and that is way more information than I had to
> work with before.
> Is there any recommendation on where I should report this?

I asked on the libglvnd ticket if the Nvidia drivers are known to do this. It could be something else.

Comment 8 lipidconcentrategaming 2025-04-03 05:05:52 UTC
(In reply to Florian Weimer from comment #7)
> (In reply to lipidconcentrategaming from comment #6)
> > That is interesting to know and that is way more information than I had to
> > work with before.
> > Is there any recommendation on where I should report this?
> 
> I asked on the libglvnd ticket if the Nvidia drivers are known to do this.
> It could be something else.

Sounds good.

Comment 9 Florian Weimer 2025-04-03 05:31:15 UTC
There's a report that associates this crash with a binutils change in NOP code generation:

https://forums.developer.nvidia.com/t/slackware64-current-multilib-after-update-to-glibc-2-39-32-bit-glx-programs-segfault/281769/2

Comment 10 lipidconcentrategaming 2025-04-03 05:43:58 UTC
(In reply to Florian Weimer from comment #9)
> There's a report that associates this crash with a binutils change in NOP
> code generation:
> 
> https://forums.developer.nvidia.com/t/slackware64-current-multilib-after-update-to-glibc-2-39-32-bit-glx-programs-segfault/281769/2

I had read that report earlier and considered downgrading binutils and glibc, but realized I couldn't actually do that.

Comment 11 leigh scott 2025-04-03 10:24:06 UTC
I have added the bug links to https://bugzilla.rpmfusion.org/show_bug.cgi?id=7180

Comment 12 leigh scott 2025-04-09 08:19:33 UTC
(In reply to Florian Weimer from comment #5)
> Something patched the _dl_debug_update function in the dynamic loader:
> [...]
> Realistically, there isn't much we can do about this on the glibc side.

Is this the same issue?

Comment 13 leigh scott 2025-04-09 08:20:46 UTC
(In reply to leigh scott from comment #12)
> Is this the same issue?

I forgot the link :-)

https://www.phoronix.com/news/Glibc-WA-Steam-Exec-Stack

Comment 14 Florian Weimer 2025-04-09 08:29:27 UTC
(In reply to leigh scott from comment #13)
> Is this the same issue?
>
> https://www.phoronix.com/news/Glibc-WA-Steam-Exec-Stack

I don't think so.