Bug 1380866 - dracut-fips breaks systemd (via libgcrypt)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: libgcrypt
Version: 27
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Tomas Mraz
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-09-30 20:39 UTC by Micah Abbott
Modified: 2018-07-04 14:10 UTC (History)

Fixed In Version: libgcrypt-1.8.1-3.fc27
Clone Of:
Environment:
Last Closed: 2017-12-12 11:25:39 UTC
Type: Bug
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Bugzilla | 1377226 | 1 | None | None | None | 2021-01-20 06:05:38 UTC
Red Hat Bugzilla | 1542453 | 0 | unspecified | CLOSED | libgcrypt-1.8.1-3 breaks gnupg2 on some systems (old kernels?) | 2021-02-22 00:41:40 UTC
Red Hat Bugzilla | 1561051 | 0 | unspecified | CLOSED | F27 Server doesn't boot up in FIPS mode. | 2021-02-22 00:41:40 UTC


Description Micah Abbott 2016-09-30 20:39:24 UTC
Originally filed here - https://pagure.io/fedora-atomic/issue/22


After doing a compose of F25 AH with commit 4633217 in the 'fedora-atomic' repo, I see the following packages added:

Added:
  dracut-fips-044-77.fc25.x86_64
  hmaccalc-0.9.14-4.fc24.x86_64

When I rebooted into the compose that included those packages, I saw the following kernel panic on the console:

Fatal: no entropy gathering module detected
[    1.065689] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[    1.065689]
[    1.066568] CPU: 0 PID: 1 Comm: init Not tainted 4.8.0-0.rc7.git0.1.fc25.x86_64 #1
[    1.067455] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.1-1.fc24 04/01/2014
[    1.069326]  0000000000000086 000000007f9cf7c5 ffff9ffbbd3f7c20 ffffffffb43e55ad
[    1.071354]  ffff9ffbbc899e00 ffffffffb4c312b8 ffff9ffbbd3f7ca8 ffffffffb41b616c
[    1.072861]  ffffffff00000010 ffff9ffbbd3f7cb8 ffff9ffbbd3f7c50 000000007f9cf7c5
[    1.074329] Call Trace:
[    1.074741]  [<ffffffffb43e55ad>] dump_stack+0x63/0x86
[    1.075524]  [<ffffffffb41b616c>] panic+0xe4/0x226
[    1.076246]  [<ffffffffb40a5430>] do_exit+0xb10/0xb10
[    1.076876]  [<ffffffffb40a54b7>] do_group_exit+0x47/0xb0
[    1.077677]  [<ffffffffb40b08c9>] get_signal+0x289/0x630
[    1.078474]  [<ffffffffb4026077>] do_signal+0x37/0x6b0
[    1.079258]  [<ffffffffb40cc090>] ? wake_up_state+0x10/0x20
[    1.080079]  [<ffffffffb40ae18a>] ? signal_wake_up_state+0x2a/0x30
[    1.080983]  [<ffffffffb40ae2c0>] ? complete_signal+0x100/0x1f0
[    1.081882]  [<ffffffffb40aec5e>] ? send_signal+0x3e/0x80
[    1.082679]  [<ffffffffb400329c>] exit_to_usermode_loop+0x8c/0xd0
[    1.083877]  [<ffffffffb4003b38>] prepare_exit_to_usermode+0x38/0x40
[    1.085091]  [<ffffffffb4801c6f>] retint_user+0x8/0x10
[    1.086254] Kernel Offset: 0x33000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[    1.088023] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[    1.088023]

I tried booting with fips=1 and met the same issue.

Comment 1 Colin Walters 2016-10-03 19:05:59 UTC
We're likely just going to back out the dracut-fips change for now.

Comment 2 Jonathan Lebon 2016-10-03 19:24:14 UTC
This is due to libgcrypt failing to find /dev/urandom during initialization. Still trying to verify whether removing dracut-fips will fix this.

Comment 3 Colin Walters 2016-10-03 20:36:02 UTC
https://pagure.io/fedora-atomic/commits/f25

Comment 4 Colin Walters 2016-10-03 20:44:02 UTC
So, at least in Fedora 25:

# rpm -q systemd
systemd-231-4.fc25.x86_64
# ldd /usr/lib/systemd/systemd|grep gcr
	libgcrypt.so.20 => /lib64/libgcrypt.so.20 (0x00007f1a6e053000)

Which is just going to explode with fips mode because libgcrypt now tries to open /dev/urandom out of a *constructor* which is going to run before systemd has done anything.

I think two things should happen:

1) libgcrypt should stop using constructors for this
2) systemd should reconsider linking to libgcrypt, or in general factor out the "base system bootstrap" into its own binary with more minimal dependencies
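
To make the failure mode above concrete, here is a minimal sketch of a load-time constructor that probes /dev/urandom before main() runs. It is not libgcrypt's code: the names `selftest_ctor`, `constructor_ran` and `entropy_available_at_load` are hypothetical, and the GCC/Clang `constructor` attribute is assumed.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical state for illustration only; these are not libgcrypt symbols. */
static int ctor_ran = 0;
static int entropy_ok = 0;

/* Runs when the object is loaded, before main() gets control.
 * Anything it needs (here: /dev/urandom) must already exist at that point. */
__attribute__((constructor))
static void selftest_ctor(void)
{
    ctor_ran = 1;

    int fd = open("/dev/urandom", O_RDONLY);
    if (fd >= 0) {
        entropy_ok = 1;
        close(fd);
    }
    /* A library that treats a missing entropy source as fatal would
     * terminate the process here instead of just recording the result. */
}

/* Accessors so a caller can inspect what happened at load time. */
int constructor_ran(void) { return ctor_ran; }
int entropy_available_at_load(void) { return entropy_ok; }
```

By the time any function in the program is called, `constructor_ran()` already returns 1, so the program has no chance to mount /dev or create device nodes first. When the process in question is PID 1 and the constructor terminates it, the kernel reports "Attempted to kill init!", as in the trace in the description.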

Comment 5 Colin Walters 2016-10-03 20:47:37 UTC
Apparently this patch is a downstream only one that was introduced by http://pkgs.fedoraproject.org/cgit/rpms/libgcrypt.git/commit/?id=040c39b7c3527508fe238a794d4f18487ab48f58
with no rationale at all, so...systemd crew, feel free to toss to libgcrypt.

Comment 6 Colin Walters 2016-10-03 20:50:52 UTC
Ah, but this actually happens now because libsystemd-shared depends on libgcrypt, whereas in F24 that didn't exist, so /usr/lib/systemd/systemd didn't link to libgcrypt.

Comment 7 Tomas Mraz 2016-10-04 07:16:16 UTC
(In reply to Colin Walters from comment #4)
> So, at least in Fedora 25:
> 
> # rpm -q systemd
> systemd-231-4.fc25.x86_64
> # ldd /usr/lib/systemd/systemd|grep gcr
> 	libgcrypt.so.20 => /lib64/libgcrypt.so.20 (0x00007f1a6e053000)
> 
> Which is just going to explode with fips mode because libgcrypt now tries to
> open /dev/urandom out of a *constructor* which is going to run before
> systemd has done anything.
> 
> I think two things should happen:
> 
> 1) libgcrypt should stop using constructors for this

Unfortunately that is not possible because of a strict requirement in FIPS mode. The FIPS IG requires to run the self tests in the constructor of the shared library - the self tests need to initialize the RNG, which in turn needs /dev/urandom or another entropy source. We could patch libgcrypt to use the getrandom() syscall instead if that would help though.
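
For illustration, a minimal sketch of reading random bytes through getrandom() rather than through a device node. This is not the libgcrypt patch; `fill_random` is a hypothetical helper, and the glibc wrapper from `<sys/random.h>` (glibc 2.25 or later) is assumed.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/random.h>   /* getrandom(); assumes glibc 2.25 or later */
#include <sys/types.h>

/* Hypothetical helper: fill buf with len random bytes using getrandom().
 * Returns 0 on success, -1 on failure with errno set by getrandom(). */
int fill_random(unsigned char *buf, size_t len)
{
    size_t done = 0;

    while (done < len) {
        /* flags == 0: read from the urandom source, blocking only until
         * the kernel's entropy pool has been initialized once. */
        ssize_t n = getrandom(buf + done, len - done, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal, retry */
            return -1;          /* e.g. ENOSYS on kernels without the syscall */
        }
        done += (size_t)n;
    }
    return 0;
}
```

Because the syscall does not go through the filesystem, it works even when /dev is still empty, which is the situation a constructor running inside PID 1 is in. The trade-off is that with flags set to 0 the call can block early in boot until the pool is initialized.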

Comment 8 Colin Walters 2016-10-04 14:04:18 UTC
> The FIPS IG requires to run the self tests in the constructor of the shared library 

That's pretty ridiculous.  Why can't it be on first function call?

> We could patch libgcrypt to use the getrandom() syscall instead if that would help though.

Yeah, it would likely fix this.

Comment 9 Tomas Mraz 2016-10-04 14:20:52 UTC
(In reply to Colin Walters from comment #8)
> > The FIPS IG requires to run the self tests in the constructor of the shared library 
> 
> That's pretty ridiculous.  Why can't it be on first function call?

You would have to ask NIST CMVP not me.

> > We could patch libgcrypt to use the getrandom() syscall instead if that would help though.
> 
> Yeah, it would likely fix this.

If so, I'll work on a patch.

Comment 10 Colin Walters 2016-10-04 16:45:32 UTC
We've reverted adding dracut-fips by default.

https://pagure.io/fedora-atomic/c/696e5182d206ca62f71de7230fdee5f00cfc354a?branch=master

Comment 11 Fedora End Of Life 2017-11-16 19:41:37 UTC
This message is a reminder that Fedora 25 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 25. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora 'version'
of '25'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 25 is end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged to change the 'version' to a later Fedora
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.

Comment 12 Nikos Mavrogiannopoulos 2017-12-01 21:08:46 UTC
I actually fell into this today and my laptop became inaccessible. It was impossible for the system to enter rescue or emergency mode. I didn't have the system in FIPS mode though, only dracut-fips was installed, so shouldn't the libgcrypt errors be advisory in that case rather than kill systemd?

Comment 13 Nikos Mavrogiannopoulos 2017-12-01 21:09:36 UTC
(In reply to Nikos Mavrogiannopoulos from comment #12)
> I actually fell into this today and my laptop became inaccessible. It was
> impossible for the system to enter rescue or emergency mode. I didn't have
> the system in FIPS mode though, 

s/system/kernel

Comment 14 Colin Walters 2017-12-01 21:46:55 UTC
(In reply to Nikos Mavrogiannopoulos from comment #12)
> I actually fell into this today and my laptop became inaccessible.

<advertisement>Note if you were using https://pagure.io/workstation-ostree-config it'd have been a simple matter of choosing the previous boot entry to roll back.</advertisement>

(Also currently we always generate the initramfs server side and test it there so even if you *did* somehow get dracut-fips installed you'd have to explicitly do `rpm-ostree initramfs --enable` to do client side builds)

Comment 15 Tomas Mraz 2017-12-04 15:20:14 UTC
Nikos, can you please check whether the build at https://koji.fedoraproject.org/koji/buildinfo?buildID=1007368 fixes the problem on Fedora 27? libgcrypt 1.8.x now calls getrandom instead of pulling data from /dev/urandom, but it still tried to open the /dev/urandom device just in case getrandom returned ENOSYS.
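
One way to avoid the unconditional open is to touch /dev/urandom only after getrandom has actually reported ENOSYS. The sketch below shows that pattern under stated assumptions: `get_entropy` and `read_urandom` are hypothetical names, the glibc `getrandom()` wrapper is assumed, and this is not necessarily how the linked build is implemented.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/random.h>   /* getrandom(); assumes glibc 2.25 or later */
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical fallback: read len bytes from /dev/urandom.
 * The device is opened on demand, never at load time. */
static int read_urandom(unsigned char *buf, size_t len)
{
    int fd = open("/dev/urandom", O_RDONLY);
    if (fd < 0)
        return -1;

    size_t done = 0;
    while (done < len) {
        ssize_t n = read(fd, buf + done, len - done);
        if (n < 0 && errno == EINTR)
            continue;           /* interrupted by a signal, retry */
        if (n <= 0) {
            close(fd);
            return -1;
        }
        done += (size_t)n;
    }
    close(fd);
    return 0;
}

/* Hypothetical entry point: prefer getrandom(), and fall back to the
 * device node only when the kernel does not implement the syscall. */
int get_entropy(unsigned char *buf, size_t len)
{
    size_t done = 0;

    while (done < len) {
        ssize_t n = getrandom(buf + done, len - done, 0);
        if (n >= 0) {
            done += (size_t)n;
            continue;
        }
        if (errno == EINTR)
            continue;
        if (errno == ENOSYS)
            return read_urandom(buf + done, len - done);
        return -1;
    }
    return 0;
}
```

On a kernel that provides getrandom, this path never opens a file, so a missing /dev/urandom node is not an error by itself.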

Comment 16 Nikos Mavrogiannopoulos 2017-12-05 10:30:20 UTC
Tried in VM. No luck. I'll try to get some logs.

Comment 17 Nikos Mavrogiannopoulos 2017-12-05 10:39:56 UTC
What I got in the system log is:

Dec 01 19:16:22 dhcp-10-40-1-102.brq.redhat.com systemd[1]: systemd 234 running in system mode. (+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +
Dec 01 19:16:22 dhcp-10-40-1-102.brq.redhat.com systemd[1]: Detected architecture x86-64.
Dec 01 19:16:23 dhcp-10-40-1-102.brq.redhat.com systemd-sysctl[707]: Fatal: no entropy gathering module detected
...
Dec 01 19:16:23 dhcp-10-40-1-102.brq.redhat.com systemd[680]: /usr/lib/systemd/system-generators/systemd-gpt-auto-generator terminated by signal ABRT.

After that I see the following in the log, and the system is stuck.

Dec 01 19:16:23 dhcp-10-40-1-102.brq.redhat.com systemd[1]: Starting Create Static Device Nodes in /dev...
Dec 01 19:16:23 dhcp-10-40-1-102.brq.redhat.com fedora-readonly[718]: /usr/lib/systemd/fedora-readonly: line 14: warning: command substitution: ignored null byte in input
Dec 01 19:16:23 dhcp-10-40-1-102.brq.redhat.com audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=fedora-readonly comm="systemd" exe="/usr/lib/systemd/
Dec 01 19:16:23 dhcp-10-40-1-102.brq.redhat.com systemd[1]: Started Configure read-only root support.
Dec 01 19:16:23 dhcp-10-40-1-102.brq.redhat.com systemd[1]: Starting Load/Save Random Seed...
Dec 01 19:16:23 dhcp-10-40-1-102.brq.redhat.com systemd-journald[703]: Time spent on flushing to /var is 18.703ms for 934 entries.

Comment 18 Fedora Update System 2017-12-06 14:57:15 UTC
libgcrypt-1.8.1-3.fc27 has been submitted as an update to Fedora 27. https://bodhi.fedoraproject.org/updates/FEDORA-2017-642e467163

Comment 19 Fedora Update System 2017-12-10 00:31:45 UTC
libgcrypt-1.8.1-3.fc27 has been pushed to the Fedora 27 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2017-642e467163

Comment 20 Fedora Update System 2017-12-12 11:25:39 UTC
libgcrypt-1.8.1-3.fc27 has been pushed to the Fedora 27 stable repository. If problems still persist, please make note of it in this bug report.

