Bug 1527587 - On Dell Optiplex 960, Dell Precision T5600 Systems Applying RHBA-2017:3296 Prevents Smart Card Logins
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: pcsc-lite-ccid
Version: 7.4
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Bob Relyea
QA Contact: Asha Akkiangady
URL:
Whiteboard:
Duplicates: 1548232
Depends On:
Blocks: 1477664
Reported: 2017-12-19 15:06 UTC by aheverle
Modified: 2021-12-10 15:30 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-12-05 17:19:36 UTC
Target Upstream Version:
Embargoed:


Attachments
Output from the pklogin_finder debug command (2.32 KB, text/plain)
2018-04-12 17:52 UTC, Jason Fonseca
Output from GDM running the pam_pkcs11 plugin and failing (3.18 KB, text/plain)
2018-04-12 18:09 UTC, Jason Fonseca


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1528418 0 unspecified CLOSED glibc: Merge error in XSAVE dynamic linker trampoline patch 2021-02-22 00:41:40 UTC

Internal Links: 1528418

Description aheverle 2017-12-19 15:06:43 UTC
After applying RHBA-2017:3296 to a number of systems, users with Dell Optiplex 960 workstations were unable to unlock their screens or log in using their smart cards.  When the user attempts to log in the following errors are reported in the journal:

Dec 12 16:31:35 hostname gdm-smartcard][9620]: sign_value() failed:
Dec 12 16:31:35 hostname gdm-smartcard][9620]: pam_pkcs11(gdm-smartcard:auth): sign_value() failed:

The issue occurs when trying to log into GDM, or unlock the screen (also GDM) using a smart card.  Console login via smart card is possible.

If we "yum downgrade glibc nscd glibc-common glibc-headers glibc-devel" then users are able to log in or unlock their screens again.  Alternatively, if we edit /etc/pam_pkcs11/pam_pkcs11.conf and change cert_policy = crl_auto,  signature; to cert_policy = none; users can log in, but this disables any sort of card verification.

Comment 2 Florian Weimer 2017-12-19 16:38:14 UTC
Was the system rebooted after applying the upgrade?

What was the glibc version before the update?

Comment 5 Florian Weimer 2017-12-21 15:39:32 UTC
For further debugging, we need to know which dynamic linker trampoline is selected by this CPU.

For example, on a somewhat similar Wolfdale CPU, I get this:

$ gdb /bin/true
…
(gdb) break _dl_fixup
Function "_dl_fixup" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (_dl_fixup) pending.
(gdb) r
Starting program: /bin/true 

Breakpoint 1, 0x00007ffff7de9810 in _dl_fixup () from /lib64/ld-linux-x86-64.so.2
(gdb) bt
#0  0x00007ffff7de9810 in _dl_fixup () from /lib64/ld-linux-x86-64.so.2
#1  0x00007ffff7df0e93 in _dl_runtime_resolve_fxsave () from /lib64/ld-linux-x86-64.so.2
#2  0x0000000000401409 in _start ()
(gdb) 

It would be interesting to see whether the affected system reports _dl_runtime_resolve_fxsave or _dl_runtime_resolve_xsave as the trampoline.
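
The interactive session above can also be run in one step. The following is a minimal sketch, assuming gdb is installed and that the backtrace has the same shape as the example; the helper names trampoline_from_bt and show_trampoline are made up for illustration and are not part of gdb or glibc.

```shell
#!/bin/sh
# Hypothetical helper: print the first _dl_runtime_resolve_* symbol
# found in a gdb backtrace read from standard input.
trampoline_from_bt() {
    grep -o '_dl_runtime_resolve_[A-Za-z0-9_]*' | head -n 1
}

# Hypothetical wrapper: run the same gdb steps as the session above in
# batch mode. "set breakpoint pending on" answers the pending-breakpoint
# question in advance, since batch mode cannot prompt.
show_trampoline() {
    gdb -batch \
        -ex 'set breakpoint pending on' \
        -ex 'break _dl_fixup' \
        -ex run \
        -ex bt \
        "${1:-/bin/true}" 2>&1 | trampoline_from_bt
}
```

Running show_trampoline on the affected system should print either _dl_runtime_resolve_fxsave or _dl_runtime_resolve_xsave. If the target binary resolves all symbols at startup instead of lazily, the breakpoint may never be hit and nothing is printed.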

Comment 8 Florian Weimer 2017-12-21 18:12:44 UTC
The _dl_runtime_resolve_fxsave implementation looks okay:

Dump of assembler code for function _dl_runtime_resolve_fxsave:
   0x00007ffff7df0e50 <+0>:     push   %rbx
   0x00007ffff7df0e51 <+1>:     mov    %rsp,%rbx
   0x00007ffff7df0e54 <+4>:     and    $0xfffffffffffffff0,%rsp
   0x00007ffff7df0e58 <+8>:     sub    $0x240,%rsp
   0x00007ffff7df0e5f <+15>:    mov    %rax,(%rsp)
   0x00007ffff7df0e63 <+19>:    mov    %rcx,0x8(%rsp)
   0x00007ffff7df0e68 <+24>:    mov    %rdx,0x10(%rsp)
   0x00007ffff7df0e6d <+29>:    mov    %rsi,0x18(%rsp)
   0x00007ffff7df0e72 <+34>:    mov    %rdi,0x20(%rsp)
   0x00007ffff7df0e77 <+39>:    mov    %r8,0x28(%rsp)
   0x00007ffff7df0e7c <+44>:    mov    %r9,0x30(%rsp)
   0x00007ffff7df0e81 <+49>:    fxsave 0x40(%rsp)
   0x00007ffff7df0e86 <+54>:    mov    0x10(%rbx),%rsi
   0x00007ffff7df0e8a <+58>:    mov    0x8(%rbx),%rdi
   0x00007ffff7df0e8e <+62>:    callq  0x7ffff7de9810 <_dl_fixup>
   0x00007ffff7df0e93 <+67>:    mov    %rax,%r11
   0x00007ffff7df0e96 <+70>:    fxrstor 0x40(%rsp)
   0x00007ffff7df0e9b <+75>:    mov    0x30(%rsp),%r9
   0x00007ffff7df0ea0 <+80>:    mov    0x28(%rsp),%r8
   0x00007ffff7df0ea5 <+85>:    mov    0x20(%rsp),%rdi
   0x00007ffff7df0eaa <+90>:    mov    0x18(%rsp),%rsi
   0x00007ffff7df0eaf <+95>:    mov    0x10(%rsp),%rdx
   0x00007ffff7df0eb4 <+100>:   mov    0x8(%rsp),%rcx
   0x00007ffff7df0eb9 <+105>:   mov    (%rsp),%rax
   0x00007ffff7df0ebd <+109>:   mov    %rbx,%rsp
   0x00007ffff7df0ec0 <+112>:   mov    (%rsp),%rbx
   0x00007ffff7df0ec4 <+116>:   add    $0x18,%rsp
   0x00007ffff7df0ec8 <+120>:   bnd jmpq *%r11
End of assembler dump.

fxsave needs 512 (0x200) bytes, and there are that many bytes at %rsp + 0x40 due to earlier allocation of 0x240 bytes.
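
The size check can be reproduced with plain shell arithmetic. This is only a worked version of the numbers in the disassembly above; the variable names are illustrative.

```shell
#!/bin/sh
frame_size=$((0x240))     # sub $0x240,%rsp
fxsave_offset=$((0x40))   # fxsave 0x40(%rsp)
fxsave_size=512           # bytes written by FXSAVE

# Bytes between the FXSAVE offset and the end of the allocated frame.
available=$((frame_size - fxsave_offset))
echo "available after offset: $available bytes (needed: $fxsave_size)"

# The seven saved integer registers sit at offsets 0x0 through 0x30,
# 8 bytes each, so that area must end before the FXSAVE area begins.
gpr_end=$((0x30 + 8))
echo "integer save area ends at $gpr_end, FXSAVE area starts at $fxsave_offset"
```

The frame leaves exactly 512 bytes for FXSAVE, and the integer register slots end at byte 56, below the FXSAVE area at byte 64, so the two regions do not overlap.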

However, the intent is that XSAVE is used on this machine, and glibc upstream master selects that.  FXSAVE is selected here instead because of an incorrectly nested if statement in init_cpu_features in sysdeps/x86/cpu-features.c: XSAVE is only selected if both AVX and XMM support are available.
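
The effect of the nesting can be illustrated with a small model. This is a simplified sketch and not the glibc source: the function names are hypothetical, and the feature checks are reduced to two 0/1 arguments (XSAVE usable, AVX usable).

```shell
#!/bin/sh
# Hypothetical model, not glibc code. Arguments: has_xsave has_avx (1 or 0).

# Nested form, as described above: the XSAVE choice is only reached
# when the AVX check has already succeeded.
select_trampoline_nested() {
    has_xsave=$1
    has_avx=$2
    if [ "$has_avx" -eq 1 ]; then
        if [ "$has_xsave" -eq 1 ]; then
            echo xsave
            return
        fi
    fi
    echo fxsave
}

# Intended form: XSAVE is chosen whenever it is usable, with or without AVX.
select_trampoline_intended() {
    has_xsave=$1
    if [ "$has_xsave" -eq 1 ]; then
        echo xsave
    else
        echo fxsave
    fi
}

# A CPU that reports xsave but not avx, like the one in this report.
echo "nested:   $(select_trampoline_nested 1 0)"
echo "intended: $(select_trampoline_intended 1 0)"
```

For a CPU with XSAVE but without AVX, the nested form falls through to fxsave while the intended form picks xsave, which matches the difference between the RHEL build and upstream described in this comment.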

What is puzzling here is that FXSAVE does not work as expected.  It should not because it saves all XMM vector registers.  We need to fix the selection and see if using XSAVE makes a difference.

Comment 10 Florian Weimer 2017-12-21 18:14:52 UTC
cpuinfo extract:

processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 23
model name	: Intel(R) Core(TM)2 Duo CPU     E8400  @ 3.00GHz
stepping	: 10
microcode	: 0xa0c
cpu MHz		: 2992.225
cache size	: 6144 KB
physical id	: 0
siblings	: 2
core id		: 0
cpu cores	: 2
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm tpr_shadow vnmi flexpriority dtherm
bogomips	: 5984.45
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:

I found a Core 2 Duo E8600 CPU which is very similar (also stepping 10), at microcode level 0xa0b, and it passes the elf/tst-sse tests, so it's not an easy matter of clobbering all vector registers.
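
To compare machines quickly, the relevant flags can be checked from the shell. This is a minimal sketch; has_flag is a hypothetical helper, and the flags string below is a shortened copy of the E8400 line from the extract above rather than live output.

```shell
#!/bin/sh
# has_flag FLAG "FLAGS LINE": succeed only if FLAG appears as a whole
# word, so that "xsave" does not match "xsaveopt".
has_flag() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# On a live system the line could be read with:
#   flags=$(grep -m1 '^flags' /proc/cpuinfo)
# Shortened sample taken from the cpuinfo extract in this comment:
flags="flags : fpu sse sse2 ssse3 sse4_1 xsave lahf_lm"

for f in xsave avx; do
    if has_flag "$f" "$flags"; then
        echo "$f: yes"
    else
        echo "$f: no"
    fi
done
```

On the E8400 line this reports xsave as present and avx as absent, which is the combination discussed in comment 8.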

Comment 12 Carlos O'Donell 2017-12-21 18:22:11 UTC
(In reply to Florian Weimer from comment #8)
> What is puzzling here is that FXSAVE does not work as expected.  It should
> not because it saves all XMM vector registers.  We need to fix the selection
> and see if using XSAVE makes a difference.

Did you mean to write the following: "It should work because fxsave saves all XMM vector registers."?

Comment 16 Florian Weimer 2017-12-21 19:20:53 UTC
(In reply to Carlos O'Donell from comment #12)
> (In reply to Florian Weimer from comment #8)
> > What is puzzling here is that FXSAVE does not work as expected.  It should
> > not because it saves all XMM vector registers.  We need to fix the selection
> > and see if using XSAVE makes a difference.
> 
> Did you mean to write the following: "It should work because fxsave saves
> all XMM vector registers."?

Right, it should work/should not fail.

Comment 22 Carlos O'Donell 2017-12-22 00:05:01 UTC
(In reply to aheverle from comment #0)
> After applying RHBA-2017:3296 to a number of systems, users with Dell
> Optiplex 960 workstations were unable to unlock their screens or log in
> using their smart cards.  When the user attempts to log in the following
> errors are reported in the journal:
> 
> Dec 12 16:31:35 hostname gdm-smartcard][9620]: sign_value() failed:
> Dec 12 16:31:35 hostname gdm-smartcard][9620]:
> pam_pkcs11(gdm-smartcard:auth): sign_value() failed:
> 
> The issue occurs when trying to log into GDM, or unlock the screen (also
> GDM) using a smart card.  Console login via smart card is possible.
> 
> If we "yum downgrade glibc nscd glibc-common glibc-headers glibc-devel" then
> users are able to log in or unlock their screens again.  Alternatively, if
> we edit /etc/pam_pkcs11/pam_pkcs11.conf and change cert_policy = crl_auto, 
> signature; to cert_policy = none; users can log in, but this disables any
> sort of card verification.

We have a test fix we would like you to try first. If this fixes the issue then we will have to go back to Intel to discuss some of the behaviour of fxsave on the older Wolfdale CPU.

Create the following /etc/yum.repos.d/rhbz1527587.repo
~~~
[rhbz1527587]
name=RHEL 7.5 hotfix for bug 1527587
baseurl=https://people.redhat.com/codonell/rhel-7.5-rhbz1527587
enabled=1
gpgcheck=0
protect=1
~~~
Then yum upgrade should just upgrade you to the testfix version.

Comment 26 Jason Fonseca 2018-01-09 23:02:58 UTC
I tried the patch above.

With the Dell Keyboard smart card reader (M/N: KB813) logging in using the smart card works.  SUDO via pam_pkcs11 and unlocking the screen do not work.  The following messages appear in the journal:

Dec 22 10:09:02 hostname pcscd[1674]: 04016391 ifdhandler.c:117:CreateChannelByNameOrChannel() failed
Dec 22 10:09:02 hostname pcscd[1674]: 00000013 readerfactory.c:1009:RFInitializeReader() Open Port 0x200000 Failed (usb:413c/2101:libudev:0:/dev/bus/usb/003/002)
Dec 22 10:09:02 hostname pcscd[1674]: 00000004 readerfactory.c:312:RFAddReader() Dell Dell Smart Card Reader Keyboard init failed.

If I install a HID OMNIKEY 3121 reader, then login, sudo, and unlock work properly.

Comment 27 Carlos O'Donell 2018-01-10 19:23:17 UTC
(In reply to Jason Fonseca from comment #26)
> I tried the patch above.
> 
> With the Dell Keyboard smart card reader (M/N: KB813) logging in using the
> smart card works.  SUDO via pam_pkcs11 and unlocking the screen do not
> work.  The following messages appear in the journal:
> 
> Dec 22 10:09:02 hostname pcscd[1674]: 04016391
> ifdhandler.c:117:CreateChannelByNameOrChannel() failed
> Dec 22 10:09:02 hostname pcscd[1674]: 00000013
> readerfactory.c:1009:RFInitializeReader() Open Port 0x200000 Failed
> (usb:413c/2101:libudev:0:/dev/bus/usb/003/002)
> Dec 22 10:09:02 hostname pcscd[1674]: 00000004
> readerfactory.c:312:RFAddReader() Dell Dell Smart Card Reader Keyboard init
> failed.
> 
> If I install a HID OMNIKEY 3121 reader, then login, sudo, and unlock work
> properly.

It is not entirely clear to me what you are stating here.

Does the test package make progress on any of the use cases that were previously failing?

Comment 28 Jason Fonseca 2018-01-11 15:26:01 UTC
I retested the three versions of glibc glibc-headers glibc-common glibc-devel nscd again and the testfix.1.bz1527587 does not fix the issue.  Here are the results:

glibc-2.17-196.el7.x86_64: GDM smart card login works, console smart card login works
glibc-2.17-196.el7_4.2.x86_64: GDM smart card login doesn't work, console smart card login works
glibc-2.17-220.el7.0.0.testfix.1.bz1527587.x86_64: GDM smart card login doesn't work, console smart card login works

For these tests I rebooted the system between each package change.  I only used the Dell keyboard smart card reader.

Comment 29 Carlos O'Donell 2018-01-15 21:54:35 UTC
(In reply to Jason Fonseca from comment #28)
> I retested the three versions of glibc glibc-headers glibc-common
> glibc-devel nscd again and the testfix.1.bz1527587 does not fix the issue. 
> Here are the results:
> 
> glibc-2.17-196.el7.x86_64: GDM smart card login works, console smart card
> login works
> glibc-2.17-196.el7_4.2.x86_64: GDM smart card login doesn't work, console
> smart card login works
> glibc-2.17-220.el7.0.0.testfix.1.bz1527587.x86_64: GDM smart card login
> doesn't work, console smart card login works
> 
> For these tests I rebooted the system between each package change.  I only
> used the Dell keyboard smart card reader.

Are you certain glibc is the only thing that changed?

I am going to put together one more test fix for you.

While I do that can you please triple check that upgrading *only* glibc causes the issue? That on a system with working GDM+console smart card login that upgrading from 196.el7 to 220.el7 (hotfix) results in non-working GDM smart card login?

Are you running nscd? If you are running nscd, please restart nscd (or reboot) after upgrade.
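
One way to confirm that only the glibc packages changed is to compare package lists taken before and after the upgrade. The following is a minimal sketch; changed_packages is a hypothetical helper, and the snapshot files are assumed to come from "rpm -qa > FILE" on the test system.

```shell
#!/bin/sh
# changed_packages BEFORE AFTER
# Print every package line that appears in only one of the two
# snapshots. rpm -qa output order is not stable, so the lines are
# sorted before comparing.
changed_packages() {
    cat "$1" "$2" | sort | uniq -u
}

# Intended use on the test system (not run here):
#   rpm -qa > /tmp/before.txt
#   yum upgrade
#   rpm -qa > /tmp/after.txt
#   changed_packages /tmp/before.txt /tmp/after.txt
```

If the output lists anything other than the old and new versions of glibc, glibc-common, glibc-devel, glibc-headers, and nscd, then something else changed along with glibc.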

Comment 30 Jason Fonseca 2018-01-16 21:54:49 UTC
(In reply to Carlos O'Donell from comment #29)
> Are you certain glibc is the only thing that changed?

The five glibc packages were the only ones to change for the test in Comment 28.  There may be other packages involved in some of the other issues we are seeing (Comment 26).  I'll see if downgrading the others affects those issues.

> I am going to put together one more test fix for you.
> 
> While I do that can you please triple check that upgrading *only* glibc
> causes the issue? That on a system with working GDM+console smart card login
> that upgrading from 196.el7 to 220.el7 (hotfix) results in non-working GDM
> smart card login?

I installed the following packages on a separate OptiPlex 960.  After rebooting smart-card login via GDM did not work.

glibc-2.17-220.el7.0.0.testfix.1.bz1527587.x86_64.rpm
glibc-common-2.17-220.el7.0.0.testfix.1.bz1527587.x86_64.rpm
glibc-devel-2.17-220.el7.0.0.testfix.1.bz1527587.x86_64.rpm
glibc-headers-2.17-220.el7.0.0.testfix.1.bz1527587.x86_64.rpm
nscd-2.17-220.el7.0.0.testfix.1.bz1527587.x86_64.rpm

When I downgraded the above packages to 196.el7 and rebooted, GDM smart-card logins worked again.
 
> Are you running nscd? If you are running nscd, please restart nscd (or
> reboot) after upgrade.

nscd is disabled on the systems.

Comment 32 Jason Fonseca 2018-01-30 20:06:51 UTC
I have another data point.  We use coolkey with pam_pkcs11 and pcscd.  If I switch the test system over to opensc I get the following error for any authentication.

login[2620]: pam_pkcs11(login:auth): Failed to initialize crypto

This was with the testfix.1.bz1527587 packages installed.

Comment 33 Carlos O'Donell 2018-02-19 04:06:19 UTC
(In reply to Jason Fonseca from comment #32)
> I have another data point.  We use coolkey with pam_pkcs11 and pcscd.  If I
> switch the test system over to opensc I get the following error for any
> authentication.
> 
> login[2620]: pam_pkcs11(login:auth): Failed to initialize crypto
> 
> This was with the testfix.1.bz1527587 packages installed.

We have put together a newer set of packages that include some fixes for applications running with minimal thread stacks. Please tell us if these packages make a difference for you.

In truth we're grasping at straws here because we don't know what difference the packages could have made that might really impact you in the ways you describe. If indeed some subsystem is running out of memory it may fail in such a way to result in authentication being denied as you describe, but we'd also expect a log or record of the crash.

Create the following /etc/yum.repos.d/rhbz1527587-v2.repo
~~~
[rhbz1527587]
name=RHEL 7.5 Beta for bug 1527587 v2
baseurl=https://people.redhat.com/codonell/rhel-7.5-rhbz1527587-v2
enabled=1
gpgcheck=0
protect=1
~~~

Then yum upgrade should just upgrade you to the new version.

You *should* be on the following version then: glibc-2.17-222.

Please tell us if this helps.

Comment 35 Jason Fonseca 2018-02-20 21:27:41 UTC
I installed the new packages and restarted the system.  GDM logins still fail:

Feb 20 14:30:09 host gdm-smartcard][2067]: sign_value() failed:
Feb 20 14:30:09 host gdm-smartcard][2067]: pam_pkcs11(gdm-smartcard:auth): sign_value() failed:

Is there some way I can generate debugging output from gdm-smartcard that would be helpful?

Comment 36 Carlos O'Donell 2018-02-20 22:35:22 UTC
(In reply to Jason Fonseca from comment #35)
> I installed the new packages and restarted the system.  GDM logins still
> fail:
> 
> Feb 20 14:30:09 host gdm-smartcard][2067]: sign_value() failed:
> Feb 20 14:30:09 host gdm-smartcard][2067]: pam_pkcs11(gdm-smartcard:auth):
> sign_value() failed:
> 
> Is there some way I can generate debugging output from gdm-smartcard that
> would be helpful?

While it appears that changing the glibc version makes a difference, it isn't clear exactly what difference it makes; we are not aware of any change that might directly impact a signing algorithm like this.

At this point we need to really pass this to the relevant component author to have them diagnose what changed with regard to their expectations on the APIs they are using. Someone has to diagnose why sign_value() from pkcs11_lib.c fails, and the best people to do that are the pam_pkcs11 maintainers.

Assigning to pam_pkcs11 to see if they have any detailed input on how to debug this.

I am not going to disappear here, I will be watching this bug to see exactly how the glibc interaction plays a role in the failure on your end.

Comment 37 Jason Fonseca 2018-02-22 16:39:09 UTC
Thanks for your help.  It does seem to be a glibc <-> gdm <-> pam_pkcs11 interaction issue.

Comment 39 Nikos Mavrogiannopoulos 2018-03-27 11:43:43 UTC
*** Bug 1548232 has been marked as a duplicate of this bug. ***

Comment 41 aheverle 2018-03-27 16:13:43 UTC
What is the output of:
 # pklogin_finder debug

(Note: this will output the PIN. It's OK to obscure that value; I don't need it to diagnose the issue.) The PIN line will read 'PIN = [{whatever the pin is}]'.

Comment 43 Bob Relyea 2018-03-27 16:18:42 UTC
What is the output of:
pklogin_finder debug
(Note: this will output the PIN. It's OK to obscure that value; I don't need it to diagnose the issue.) The PIN line will read 'PIN = [{whatever the pin is}]'.

Comment 44 Jason Fonseca 2018-04-12 17:52:04 UTC
Created attachment 1421006 [details]
Output from the pklogin_finder debug command

I have attached the output from the pklogin_finder command.  This command completes successfully on the affected host.

Comment 45 Jason Fonseca 2018-04-12 18:09:54 UTC
Created attachment 1421023 [details]
Output from GDM running the pam_pkcs11 plugin and failing

I have attached the output from trying to log into GDM with the debug flag added to the pam_pkcs11 plugin in the /etc/pam.d/smartcard-auth file.

Comment 46 Bob Relyea 2018-06-01 18:13:37 UTC
Jason, does using the latest coolkey from z-stream fix the issue? 

I'm moving the component to pcsc-lite-ccid. If the latest coolkey doesn't fix the issue, it's probably an issue in pcsc-lite-ccid running on certain Dell readers.

Comment 47 Jason Fonseca 2018-06-08 19:20:03 UTC
Due to this and other issues, we've retired all of the Optiplex 960s.  So, unfortunately, I no longer have that hardware to test on.  I noticed that someone added Precision T5600s to the ticket and we do still have some of those.  We did not experience the GDM issue on those, but do get a sign_value() error when pam_pkcs11 is called for SUDO.

Jun 08 08:20:24 host sudo[4005]: pam_pkcs11(sudo-i:auth): sign_value() failed:

I don't know if this is the same issue or a different one.  The systems are running coolkey-1.1.0-37.el7.x86_64.  Is that the latest one you referred to above?

