Bug 1572944

Summary:

4.16.4-200.fc27.x86_64 takes minutes to finish crng init on some systems

Product:

[Fedora] Fedora

Reporter:

James Ralston <ralston>

Component:

kernel

Assignee:

Kernel Maintainer List <kernel-maint>

Status:

CLOSED ERRATA

QA Contact:

Fedora Extras Quality Assurance <extras-qa>

Severity:

unspecified

Docs Contact:

Priority:

unspecified

Version:

CC:

airlied, awilliam, bskeggs, bugzilla.berk, dbenoit, dustymabe, eduard.vopicka, ewk, hdegoede, hugh, ichavero, itamar, jarodwilson, jcline, jforbes, jglisse, jlebon, john.j5live, jonathan, josef, jreznik, kernel-maint, labbott, linville, mattdm, mchehab, miabbott, mihai, mjg59, nhorman, nmavrogi, noloader, praiskup, steved, tmraz, tytso, walters, zenczykowski

Target Milestone:

---

Keywords:

CommonBugs, Reopened

Target Release:

---

Hardware:

Unspecified

OS:

Unspecified

Whiteboard:

PrioritizedBug https://fedoraproject.org/wiki/Common_F28_bugs#boot-random-block

Fixed In Version:

kernel-4.16.6-302.fc28 kernel-4.16.6-202.fc27 kernel-4.17.4-200.fc28 kernel-4.18.8-200.fc28 kernel-4.18.9-100.fc27 kernel-4.18.9-300.fc29

Last Closed:

2018-07-11 20:19:56 UTC

Type:

Bug

Attachments:

Description	Flags
Proposed kernel patch if you really must use Jitter entropy	none

Description James Ralston 2018-04-29 06:09:07 UTC

Description of problem:

On my system (AMD A10-7800 Radeon R7, ASUSTeK A88XM-E), 4.16.4-200.fc27.x86_64 hangs during the boot process. It makes it this far:

[    2.326086] usb 3-2: New USB device found, idVendor=046d, idProduct=c24c
[    2.328038] usb 3-2: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[    2.328984] usb 3-2: Product: G400s Optical Gaming Mouse
[    2.329915] usb 3-2: Manufacturer: Logitech
[    2.338356] input: Logitech G400s Optical Gaming Mouse as /devices/pci0000:00/0000:00:12.0/usb3/3-2/3-2:1.0/0003:046D:C24C.0004/input/input4
[    2.340400] hid-generic 0003:046D:C24C.0004: input,hidraw3: USB HID v1.10 Mouse [Logitech G400s Optical Gaming Mouse] on usb-0000:00:12.0-2/input0
[    2.345363] hid-generic 0003:046D:C24C.0005: hiddev98,hidraw4: USB HID v1.10 Device [Logitech G400s Optical Gaming Mouse] on usb-0000:00:12.0-2/input1

According to the dmesg output from the previous kernel, 4.16.3-200.fc27.x86_64, the next lines should be:

[    2.386486] [drm] radeon kernel modesetting enabled.
[    2.387492] checking generic (e0000000 300000) vs hw (e0000000 10000000)
[    2.387493] fb: switching to radeondrmfb from EFI VGA
[    2.388569] Console: switching to colour dummy device 80x25
[    2.412982] [drm] initializing kernel modesetting (KAVERI 0x1002:0x130F 0x1043:0x85CB 0x00).
[    2.413020] [drm] doorbell mmio base: 0xF0000000
[    2.413023] [drm] doorbell mmio size: 8388608
[    2.413081] ATOM BIOS: 113
[    2.413128] radeon 0000:00:01.0: VRAM: 1024M 0x0000000000000000 - 0x000000003FFFFFFF (1024M used)
[    2.413132] radeon 0000:00:01.0: GTT: 2048M 0x0000000040000000 - 0x00000000BFFFFFFF
[    2.413137] [drm] Detected VRAM RAM=1024M, BAR=256M

…but 4.16.4-200.fc27.x86_64 never gets there.

If I type on the keyboard, letters appear on the screen, and I can Ctrl-Alt-Del to reboot. But otherwise, the kernel is hung.

My guess is that there's a regression somewhere in the video driver code.

Version-Release number of selected component (if applicable):

4.16.4-200.fc27.x86_64

Comment 1 Steve 2018-04-30 16:00:38 UTC

I can confirm that kernels after 4.16.3 will not boot on my system.

Boot stops at "A start job is running for Hold until boot process finishes up" (time/no limit).

I have an Intel Atom D525 CPU with Nvidia ION/PCIe/SSE2.

kernel-4.16.3-200.fc27.x86_64 boots; kernel-4.16.4-200.fc27.x86_64 and kernel-4.16.5-200.fc27.x86_64 do not.

Comment 2 David Benoit 2018-04-30 17:11:55 UTC

I am experiencing the same issue of the start job holding for a boot process to finish.

i7-7600U CPU
Intel HD Graphics

Additional Info:
The system will not boot to graphical.target, but I am able to boot into multi-user.target.  From multi-user, the display server can be started seemingly without issue.

Comment 3 Jeremy Cline 2018-04-30 17:17:32 UTC

This could be due to the changes in /dev/random in 4.16.4 which causes the boot to hang until the CRNG is seeded if something on startup requires it. Can you try booting and letting it sit for a bit? If it still doesn't finish, you can provide some entropy by typing on the keyboard until it gets enough entropy to seed the CRNG.

Comment 4 Steve 2018-04-30 17:54:46 UTC

(In reply to Jeremy Cline from comment #3)
> This could be due to the changes in /dev/random in 4.16.4 which causes the
> boot to hang until the CRNG is seeded if something on startup requires it.
> Can you try booting and letting it sit for a bit? If it still doesn't
> finish, you can provide some entropy by typing on the keyboard until it gets
> enough entropy to seed the CRNG.

Strange, by typing somewhat on the keyboard the system comes (fully) up.

Comment 5 Jonathan Lebon 2018-04-30 20:07:46 UTC

> This could be due to the changes in /dev/random in 4.16.4 which causes the boot to hang until the CRNG is seeded if something on startup requires it. Can you try booting and letting it sit for a bit? If it still doesn't finish, you can provide some entropy by typing on the keyboard until it gets enough entropy to seed the CRNG.

Thanks for this hint! This is an issue for cloud images which now take much longer to have SSH come up (on the order of 3-4 minutes). I suspect this is at least in part due to the three `sshd-keygen` services that need entropy to generate host keys (one for each of rsa, ecdsa, ed25519), though I think there are other services triggering this since even on reboot I see a delay.

One can also see this clearly happen retrospectively as the journal is essentially frozen until "kernel: random: crng init done" is printed, and then it goes on. I'm not sure what the correct approach is here, but it makes for a painful OOTB experience in the cloud scenario. :(
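
As a rough illustration of spotting that stall after the fact, here is a minimal Python sketch that pulls the "crng init done" timestamp out of dmesg-style text. The function name and the sample input are made up for this example, and it only understands the bracketed "[seconds]" dmesg format, not the journal's date-prefixed format.

```python
import re

# Matches kernel log lines such as "[   81.461227] random: crng init done".
CRNG_DONE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+random: crng init done\s*$")


def crng_init_time(dmesg_text):
    """Return seconds since boot at which the CRNG was seeded, or None."""
    for line in dmesg_text.splitlines():
        match = CRNG_DONE.match(line)
        if match:
            return float(match.group(1))
    return None


# Hypothetical two-line sample in dmesg format.
sample = (
    "[    2.349551] hid-generic: example device line\n"
    "[   81.461227] random: crng init done\n"
)
print(crng_init_time(sample))
```

A large value here, with nothing else logged in between, suggests the boot was waiting on the CRNG rather than on a slow service.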

Comment 6 Colin Walters 2018-04-30 20:29:08 UTC

See also https://bugzilla.redhat.com/show_bug.cgi?id=1389351

Comment 7 Jeremy Cline 2018-04-30 21:03:23 UTC

This is the result of the fix for CVE-2018-1108[0]. If everyone who runs across this can provide the year, model, make, and CPU type of their hardware, that would be helpful to upstream[1].

You can use "systemd-analyze" to discover what services are taking a long time to start up. Regardless of what happens in the kernel, it would be good to minimize the number of services that rely on the CRNG at startup.

[0] https://bugzilla.redhat.com/show_bug.cgi?id=1567306
[1] https://marc.info/?l=linux-kernel&m=152503960222567&w=2

Comment 8 James Ralston 2018-05-01 05:23:33 UTC

I can confirm that I am getting hit by the fix for CVE-2018-1108 on my desktop system. If I wait long enough and/or mash the keyboard, the boot process will eventually continue:

[    2.343362] input: Logitech G400s Optical Gaming Mouse as /devices/pci0000:00/0000:00:12.0/usb3/3-2/3-2:1.0/0003:046D:C24C.0004/input/input4
[    2.345272] hid-generic 0003:046D:C24C.0004: input,hidraw3: USB HID v1.10 Mouse [Logitech G400s Optical Gaming Mouse] on usb-0000:00:12.0-2/input0
[    2.349551] hid-generic 0003:046D:C24C.0005: hiddev98,hidraw4: USB HID v1.10 Device [Logitech G400s Optical Gaming Mouse] on usb-0000:00:12.0-2/input1
[   81.461227] random: crng init done
[   81.520340] systemd[1]: systemd 234 running in system mode. (+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN2 +IDN default-hierarchy=hybrid)

I didn't see this issue with 4.16.4-200.fc27.x86_64 on an Intel laptop I have (which also runs Fedora 27), but I can think of at least two differences that may potentially account for that:

1. The Intel laptop may have a hardware entropy source that the kernel can leverage.

2. I don't use LUKS on the Intel laptop, but I do use LUKS on my desktop system. (The block on crng init happened well before the prompt for the LUKS password, though.)

I built my own system, via hardware purchase in November 2014. Processor and motherboard:

AMD A10-7800 Radeon R7
ASUSTeK A88XM-E

Processor flags:

fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext perfctr_core perfctr_nb bpext ptsc cpb hw_pstate vmmcall fsgsbase bmi1 xsaveopt arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold overflow_recov

FWIW, the amd-rng module asserts that I have no hardware entropy source:

$ modprobe amd-rng
modprobe: ERROR: could not insert 'amd_rng': No such device

systemd-analyze says:

Startup finished in 6.290s (firmware) + 5.626s (loader) + 1min 21.500s (kernel) + 12.841s (initrd) + 58.474s (userspace) = 2min 44.731s

Is there any way to tell what part of the boot process blocked on the crng init?

Comment 9 Jeremy Cline 2018-05-01 12:37:27 UTC

(In reply to James Ralston from comment #8)
> systemd-analyze says:
> 
> Startup finished in 6.290s (firmware) + 5.626s (loader) + 1min 21.500s
> (kernel) + 12.841s (initrd) + 58.474s (userspace) = 2min 44.731s
> 
> Is there any way to tell what part of the boot process blocked on the crng
> init?

Yes, the "systemd-analyze blame" subcommand shows all running services ordered by initialization time, and "systemd-analyze critical-chain" shows a tree of the time-critical chain of units for the default target. It should be pretty clear what service blocked on crng init when compared with a boot of the older kernel.
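
To make that comparison between two boots easier, the mixed-unit durations that "systemd-analyze blame" prints can be converted to seconds. The following Python sketch is only an illustration: the helper names are invented here, and it handles just the h/min/s/ms units, so any other unit raises ValueError.

```python
import re

# Seconds per unit for the duration tokens handled by this sketch.
UNIT_SECONDS = {"h": 3600.0, "min": 60.0, "s": 1.0, "ms": 0.001}
TOKEN = re.compile(r"(\d+(?:\.\d+)?)(h|min|ms|s)$")


def duration_seconds(text):
    """Convert a duration such as '5min 19.437s' or '940ms' to seconds."""
    total = 0.0
    for token in text.split():
        match = TOKEN.match(token)
        if match is None:
            raise ValueError("unrecognized duration token: %r" % token)
        total += float(match.group(1)) * UNIT_SECONDS[match.group(2)]
    return total


def parse_blame_line(line):
    """Split one blame line into (unit_name, seconds)."""
    *duration_tokens, unit_name = line.split()
    return unit_name, duration_seconds(" ".join(duration_tokens))


# Hypothetical blame line, formatted the way systemd-analyze prints it.
print(parse_blame_line("    1min 30.190s example.service"))
```

Running this over the blame output of a good boot and a bad boot, then diffing per unit, shows which services absorbed the wait.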

Comment 10 Jonathan Lebon 2018-05-01 13:12:40 UTC

This is in an OpenStack cloud VM running F28AH:

> If everyone who runs across this can provide the year, model, make, and CPU type of their hardware, that would be helpful to upstream[1].

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 85
model name      : Intel Xeon Processor (Skylake)
stepping        : 4
microcode       : 0x1
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 arat pku ospke

> You can use "systemd-analyze" to discover what services are taking a long time to start up.

# systemd-analyze blame | head
    5min 19.437s cloud-init-local.service
    1min 30.190s gssproxy.service
    1min 30.172s chronyd.service
         10.664s cloud-init.service
          2.424s cloud-config.service
          1.950s lvm2-pvscan@252:2.service
          1.586s registries.service
          1.416s cloud-final.service
           940ms docker-storage-setup.service
           876ms initrd-switch-root.service

# systemd-analyze critical-chain
The time after the unit is active or started is printed after the "@" character.
The time the unit takes to start is printed after the "+" character.

multi-user.target @5min 34.274s
└─sshd.service @5min 34.882s +14ms
  └─sshd-keygen.target @5min 34.878s

Comment 11 Colin Walters 2018-05-01 13:28:54 UTC

For virt hosts IMO you really want to use virtio-rng.  There are lots of links for this; see:

https://wiki.qemu.org/Features/VirtIORNG
https://github.com/vagrant-libvirt/vagrant-libvirt#random-number-generator-passthrough
https://wiki.openstack.org/wiki/LibvirtVirtioRng#Random_number_generator_device
https://github.com/clalancette/oz/pull/260
etc.

Let's try using that OpenStack property?  The vagrant-libvirt one gets me to crng init in 3 seconds versus 13.

Comment 12 Matthew Miller 2018-05-01 13:36:32 UTC

See also bug #1572916, for 4.17.0+ on Rawhide.

Comment 13 Jonathan Lebon 2018-05-01 14:40:52 UTC

Yeah, with `hw_rng_model=virtio`, I get much saner times:

# systemd-analyze critical-chain
The time after the unit is active or started is printed after the "@" character.
The time the unit takes to start is printed after the "+" character.

multi-user.target @16.739s
└─sshd.service @17.036s +12ms
  └─sshd-keygen.target @17.034s

This is a good workaround for platforms that allow this.
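
To check from inside a guest whether a hardware RNG such as virtio-rng is actually visible, one can read the kernel's hw_random sysfs files. This is a minimal Python sketch, assuming the usual /sys/class/misc/hw_random location; the function name is invented for this example, and the base path is a parameter so it can be pointed elsewhere.

```python
from pathlib import Path

# Usual sysfs location of the hw_random core's attributes.
HW_RANDOM_SYSFS = Path("/sys/class/misc/hw_random")


def hwrng_status(base=HW_RANDOM_SYSFS):
    """Return the available hardware RNGs and the currently selected one."""

    def read(name):
        try:
            return (Path(base) / name).read_text().strip()
        except OSError:
            return ""  # file missing or unreadable: treat as empty

    available = read("rng_available").split()
    current = read("rng_current")
    if current in ("", "none"):
        current = None
    return {"available": available, "current": current}


print(hwrng_status())
```

If "current" comes back as None on a VM, the guest has no hardware RNG to draw from and will depend on interrupt timing alone to seed the CRNG.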

Comment 14 Adam Williamson 2018-05-01 15:25:29 UTC

I don't think somehow telling everyone who ever runs into this to set a magic non-default property on their cloud instances, assuming they even have control over this on all cloud platforms, is a practical solution.

It's good that the workaround exists but it's not sufficient. It is *never* "OK" for a user's experience to be "this system was working fine, then I installed a stable update, now I need to do some magic thing I never had to do before to keep the system working as well as it was working before".

Comment 15 Dusty Mabe 2018-05-01 15:30:55 UTC

(In reply to Adam Williamson from comment #14)

> It's good that the workaround exists but it's not sufficient. It is *never*
> "OK" for a user's experience to be "this system was working fine, then I
> installed a stable update, now I need to do some magic thing I never had to
> do before to keep the system working as well as it was working before".

Agree. I would say the only exception is if it's a critical critical critical security flaw, but for the most part, keep existing functionality.

Comment 16 James Ralston 2018-05-02 07:07:24 UTC

(In reply to Jeremy Cline from comment #9)

> (In reply to James Ralston from comment #8)
>
> > Is there any way to tell what part of the boot process blocked on the crng
> > init?
> 
> Yes, the "systemd-analyze blame" subcommand shows all running services
> ordered by initialization time, and "systemd-analyze critical-chain" shows a
> tree of the time-critical chain of units for the default target.

Thanks; that's useful.

> It should be pretty clear what service blocked on crng init when compared
> with a boot of the older kernel.

There's no difference, because the block isn't occurring in a service. The block occurs in the kernel itself:

May 01 20:50:08 localhost.localdomain kernel: random: get_random_u32 called from bsp_init_amd+0x1e6/0x230 with crng_init=0

The offending code is in arch/x86/kernel/cpu/amd.c:

if (c->x86 == 0x15) {
        unsigned long upperbit;
        u32 cpuid, assoc;

        cpuid    = cpuid_edx(0x80000005);
        assoc    = cpuid >> 16 & 0xff;
        upperbit = ((cpuid >> 24) << 10) / assoc;

        va_align.mask     = (upperbit - 1) & PAGE_MASK;
        va_align.flags    = ALIGN_VA_32 | ALIGN_VA_64;

        /* A random value per boot for bit slice [12:upper_bit) */
        va_align.bits = get_random_int() & va_align.mask;
}
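
For readers who want to see what that mask arithmetic yields, here is a small Python transcription of the calculation. The CPUID value below is hypothetical (chosen to describe a 64 KiB, 2-way L1 instruction cache with the size in bits 31:24 and the associativity in bits 23:16), and secrets.randbits() merely stands in for the kernel's get_random_int().

```python
import secrets

PAGE_SIZE = 4096
PAGE_MASK = ~(PAGE_SIZE - 1)  # clears the low 12 bits, as in the kernel


def va_align_mask(cpuid_edx):
    """Mirror the va_align.mask computation from the C excerpt."""
    assoc = (cpuid_edx >> 16) & 0xFF               # cache associativity
    upperbit = ((cpuid_edx >> 24) << 10) // assoc  # bytes per cache way
    return (upperbit - 1) & PAGE_MASK


# Hypothetical CPUID 0x80000005 EDX value: 64 KiB, 2-way.
example_edx = 0x40020140
mask = va_align_mask(example_edx)

# Userspace stand-in for get_random_int(); only the masked bits survive.
bits = secrets.randbits(32) & mask

print(hex(mask), hex(bits))
```

With that input the mask covers bits 12 through 14, so only three random bits are consumed, yet obtaining them is what triggers the crng_init=0 warning.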

But looking at this:

https://fedoraproject.org/wiki/Common_F28_bugs#boot-random-block

…the reason why I'm getting burned by this is because although I'm not booting in FIPS mode, I have the dracut-fips package installed, and that apparently reduces the entropy pool to such an extent that the get_random_int() call in bsp_init_amd() blocks, and thus deadlocks the system until I provide enough entropy by mashing the keyboard.

(BTW, it is *completely non-intuitive* that the mere presence of the dracut-fips package changes the behavior of the kernel, *without* my passing fips=1 to the kernel.)

If I remove dracut-fips and re-run dracut to regenerate my initramfs, I don't have this issue: I still see the warning about bsp_init_amd() calling get_random_u32() with crng_init=0, but the system will boot to the point where it prompts me for the password to unlock my LUKS-encrypted volumes. And the act of typing that password reliably triggers the "random: crng init done" message.

I'm still skeptical that this is the best way to address CVE-2018-1108, though.  For starters, we have RHEL7 systems we are required to run in fips=1 mode, and it sounds like the current fix could make it impossible to boot those systems without manual intervention if Red Hat backports the CVE-2018-1108 fix to RHEL7.

Perhaps a better way to address CVE-2018-1108 would be to provide a kernel command line argument to select the old (broken) behavior of crng_ready() or the new (correct) behavior? For now, the default could be the old behavior. At some point in the future, once the userspace/kernel/FIPS issues are addressed, the default could be changed to prefer the new behavior?

Comment 17 Fedora Update System 2018-05-02 11:37:59 UTC

kernel-4.16.6-302.fc28 has been submitted as an update to Fedora 28. https://bodhi.fedoraproject.org/updates/FEDORA-2018-cf65b3a7a2

Comment 19 Fedora Update System 2018-05-02 11:42:31 UTC

kernel-4.16.6-202.fc27 has been submitted as an update to Fedora 27. https://bodhi.fedoraproject.org/updates/FEDORA-2018-6dacc4732c

Comment 21 Jeremy Cline 2018-05-02 11:53:38 UTC

(In reply to James Ralston from comment #16)
> I'm still skeptical that this is the best way to address CVE-2018-1108,
> though.  For starters, we have RHEL7 systems we are required to run in
> fips=1 mode, and it sounds like the current fix could make it impossible to
> boot those systems without manual intervention if Red Hat backports the
> CVE-2018-1108 fix to RHEL7.
> 
> Perhaps a better way to address CVE-2018-1108 would be to provide a kernel
> command line argument to select the old (broken) behavior of crng_ready() or
> the new (correct) behavior? For now, the default could be the old behavior.
> At some point in the future, once the userspace/kernel/FIPS issues are
> addressed, the default could be changed to prefer the new behavior?

For the short term (since the issue is still being discussed upstream), I've reverted the fix for CVE-2018-1108 in kernel-4.16.6-302.fc28 and kernel-4.16.6-202.fc27.

Comment 22 Adam Williamson 2018-05-02 15:28:21 UTC

James: the complication with dracut-fips isn't that it changes the behaviour of the kernel. What we think is happening (Laura and Patrick figured this out) is that if the initramfs is built with FIPS support, this causes a blocking use of the kernel RNG *very early in the boot process* - right when systemd is starting up, in fact. That's what https://bugzilla.redhat.com/show_bug.cgi?id=1572916#c13 is about. The earliness of this makes this a particularly bad case for two reasons: one, it means that using VirtIO-rng in a VM doesn't help because the module isn't yet loaded by this point. Two, it's so early that the kernel is very low on entropy at that point, and can't really get much more especially in a VM unless you do some keyboard mashing or whatever.

That's all AIUI, of course.
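
For anyone who wants to probe this from userspace, a non-blocking getrandom() call reports whether the CRNG has been seeded yet without hanging. A minimal Python sketch follows, assuming Linux and Python 3.6 or later; the function name is made up for this example.

```python
import os


def crng_ready():
    """Return True if the kernel CRNG is seeded, False if a read would block."""
    try:
        # GRND_NONBLOCK makes getrandom() fail with EAGAIN instead of
        # waiting when the pool has not been initialized yet.
        os.getrandom(1, os.GRND_NONBLOCK)
    except BlockingIOError:
        return False
    return True


print(crng_ready())
```

A service that can tolerate a delay could poll this instead of issuing a blocking read early in boot.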

Comment 23 Laura Abbott 2018-05-02 15:52:06 UTC

The reverts are a solution nobody is particularly happy about but blocking stable Fedora releases is not great so we took the most pragmatic option. We're still working with upstream to get a proper solution.

Comment 24 Jeremy Cline 2018-05-03 13:28:49 UTC

*** Bug 1573801 has been marked as a duplicate of this bug. ***

Comment 25 Fedora Update System 2018-05-03 19:30:33 UTC

kernel-4.16.6-202.fc27 has been pushed to the Fedora 27 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-6dacc4732c

Comment 26 Fedora Update System 2018-05-03 20:21:30 UTC

kernel-4.16.6-302.fc28 has been pushed to the Fedora 28 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-cf65b3a7a2

Comment 27 Eduard Vopicka 2018-05-04 12:10:36 UTC

kernel-4.16.6-302.fc28 is OK on my Lenovo T440s.

Comment 28 Fedora Update System 2018-05-05 20:33:45 UTC

kernel-4.16.6-302.fc28 has been pushed to the Fedora 28 stable repository. If problems still persist, please make note of it in this bug report.

Comment 29 Fedora Update System 2018-05-07 04:15:16 UTC

kernel-4.16.6-202.fc27 has been pushed to the Fedora 27 stable repository. If problems still persist, please make note of it in this bug report.

Comment 30 Saverio Proto 2018-06-29 12:51:47 UTC

Hello,
I can reproduce this on Fedora 28

[fedora@fed ~]$ uname -a
Linux fed 4.17.2-200.fc28.x86_64 #1 SMP Mon Jun 18 20:09:31 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

I am making cloud images of Fedora 28 for OpenStack. It would be great to have a fix that could also be upstreamed to the OpenStack diskimage-builder.

The boot is stuck until I access the NoVNC console and hit some keys on the keyboard. As soon as I do that, I get the message "random: crng init done" and the system finishes booting correctly very quickly.

thanks

Saverio

Comment 31 Jeremy Cline 2018-06-29 13:55:09 UTC

Hi Saverio,

The patches reverting the change got dropped in the 4.17 rebase. I've brought them back for the time being and they should be included in the next stable build. I'd recommend investigating how to configure virtio-rng for your VMs on Openstack as that will help with this, as well.

Comment 32 Justin M. Forbes 2018-07-03 13:22:44 UTC

The real place to fix this is in the diskimage-builder; it needs to pass virtio-rng. I will leave the reverts in for 4.17 but they should go away by the 4.18 rebase.  The other place known to not give access to virtio-rng is Google, but they are the ones who pushed this change, so they seem to think it is more important than being able to boot in their cloud right now. I expect they will have a fix soon.

Comment 33 Fedora Update System 2018-07-03 21:19:39 UTC

kernel-tools-4.17.4-200.fc28 kernel-4.17.4-200.fc28 has been submitted as an update to Fedora 28. https://bodhi.fedoraproject.org/updates/FEDORA-2018-d82a45d9ab

Comment 34 Fedora Update System 2018-07-03 21:20:45 UTC

kernel-tools-4.17.4-100.fc27 kernel-4.17.4-100.fc27 has been submitted as an update to Fedora 27. https://bodhi.fedoraproject.org/updates/FEDORA-2018-0f7448b3ab

Comment 35 Berk 2018-07-04 10:52:26 UTC

Hi there, 

I'm still having this issue with kernel 4.17.4. I'm on a ThinkPad X131e and the only way to skip this long boot process is to mash keys on the keyboard, as mentioned here: https://fedoraproject.org/wiki/Common_F28_bugs#Boot_process_is_very_slow_or_appears_to_hang_with_kernel_4.16.4_onwards

Also, dmesg has this error: extension failed to load: ThinkPad Battery Extension 

Is there a permanent fix for this? 

Thanks!

Comment 36 Fedora Update System 2018-07-04 16:23:03 UTC

kernel-4.17.4-100.fc27, kernel-tools-4.17.4-100.fc27 has been pushed to the Fedora 27 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-0f7448b3ab

Comment 37 Fedora Update System 2018-07-04 18:21:59 UTC

kernel-4.17.4-200.fc28, kernel-tools-4.17.4-200.fc28 has been pushed to the Fedora 28 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-d82a45d9ab

Comment 38 Neil Horman 2018-07-05 13:39:20 UTC

FWIW, we're working on incorporating the jitter entropy daemon into the upstream rng-tools and intend to backport it to fedora ASAP:
https://github.com/nhorman/rng-tools/pull/17

Jitter is an entropy generator that uses high-resolution timers to create random data.  A solution to this problem might be, once this feature is backported, to simply run rngd.  Any hardware that supports high-resolution timers (including guests) should be able to use this feature to create sufficient entropy to unblock the boot process.


Note however, it would still be preferable to use virtio-rng, as that means you would just need to run a single rngd instance on the host, rather than one on each guest.

Comment 39 Colin Walters 2018-07-05 14:32:59 UTC

> FWIW, we're working on incorporating the jitter entropy daemon into the upstream rng-tools

I'm not sure how much that's going to help given how many instances of this are stalls very early on in userspace.  If we're going to gather entropy in userspace from *local* sources (which I still find weird given that as Lennart commented, we're shuffling data from one part of the kernel to the other), then it seems like it'll need to be forked off very early on in the initramfs at least, right?

The libgcrypt case is truly insane though - we just need to go to the committee and tell them that requiring it to be a library constructor is fundamentally broken.

And virtualization providers/hosts need to be providing entropy in an easily accessible way, and virt tools need to be sure they're enabling it.

Comment 40 Eduard Vopicka 2018-07-05 19:15:13 UTC

The problem with booting my Lenovo T440s notebook is back. To proceed with boot, I must type and type and type...

I really do not think that it is a good idea to just remove the reverts again with the 4.18 rebase without putting another solution into effect.

Comment 41 Maciej Żenczykowski 2018-07-06 07:38:58 UTC

[root@varda ~]# uname -a
Linux varda 4.17.3-200.fc28.x86_64 #1 SMP Tue Jun 26 14:17:07 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
[root@varda ~]# systemd-analyze blame | head
   15min 25.768s cloud-init-local.service
    1min 30.176s gssproxy.service
    1min 30.174s chronyd.service
          3.279s network.service
          3.095s initrd-switch-root.service
          1.723s cloud-init.service
          1.119s iptables.service
           880ms cloud-config.service
           782ms dev-sda1.device
           528ms cloud-final.service

[root@nike ~]# uname -a
Linux nike 4.17.3-200.fc28.x86_64 #1 SMP Tue Jun 26 14:17:07 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
[root@nike ~]# systemd-analyze blame | head
3h 14min 58.268s cloud-init-local.service
    1min 30.145s gssproxy.service
    1min 30.142s chronyd.service
          3.296s network.service
          3.231s initrd-switch-root.service
          1.666s cloud-init.service
          1.200s cloud-config.service
          1.165s ip6tables.service
           992ms dev-sda1.device
           649ms systemd-journal-flush.service

(and my 3rd VM hasn't yet finished coming up many many hours later)

Comment 42 Maciej Żenczykowski 2018-07-06 08:08:32 UTC

btw. what about adding something like 'rdrand-gen -n 1024 > /dev/random' (requires RdRand rpm) to the boot process or cloud-init-local.service?

Comment 43 Maciej Żenczykowski 2018-07-06 22:12:28 UTC

I'm just going to update that:
  dnf update --enablerepo=updates-testing kernel-core kernel-headers
which upgraded to 4.17.4-200.fc28 does indeed fix the problem.

Comment 44 Maciej Żenczykowski 2018-07-07 00:36:00 UTC

btw. I'm surprised to learn that systemd random seed restoration does not increase entropy (via RNDADDENTROPY ioctl)...?

[root@athina ~]# strace -ff /usr/lib/systemd/systemd-random-seed load
openat(AT_FDCWD, "/proc/sys/kernel/random/poolsize", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
read(3, "4096\n", 1024)                 = 5
close(3)                                = 0
stat("/var/lib/systemd", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
openat(AT_FDCWD, "/var/lib/systemd/random-seed", O_RDWR|O_CREAT|O_NOCTTY|O_CLOEXEC, 0600) = 3
openat(AT_FDCWD, "/dev/urandom", O_RDWR|O_NOCTTY|O_CLOEXEC) = 4
read(3, "\236\321\315\207p\333\357X\224\213\326 \327\307_\10\342\240\254U\37\33{|\30\377\303\6\330\303?\254"..., 512) = 512
lseek(3, 0, SEEK_SET)                   = 0
write(4, "\236\321\315\207p\333\357X\224\213\326 \327\307_\10\342\240\254U\37\33{|\30\377\303\6\330\303?\254"..., 512) = 512
fchmod(3, 0600)                         = 0
fchown(3, 0, 0)                         = 0
read(4, "\304\373\276h4L9#\0\224\311\5\271\203\371Q\3q\19\322`\21N\306\305\234ho\372\336\260"..., 512) = 512
write(3, "\304\373\276h4L9#\0\224\311\5\271\203\371Q\3q\19\322`\21N\306\305\234ho\372\336\260"..., 512) = 512
close(4)                                = 0
close(3)                                = 0
exit_group(0)                           = ?
+++ exited with 0 +++

Comment 45 Maciej Żenczykowski 2018-07-07 00:54:16 UTC

Maybe something like this should be added to the boot scripts?

rdrand-gen -n 4096 | python -c $'import fcntl, os, struct, sys\nRNDADDENTROPY = 1074287107\nfd = os.open("/dev/random", os.O_WRONLY)\nwhile True:\n  rnd=sys.stdin.read(512)\n  if not rnd: break\n  fcntl.ioctl(fd, RNDADDENTROPY, struct.pack("ii%is" % len(rnd), 8 * len(rnd), len(rnd), rnd))\nos.close(fd)'

cat /proc/sys/kernel/random/entropy_avail

(based partially on https://github.com/netom/onetimepad/blob/master/rndaddentropy.py)
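
For reference, here is a Python 3 sketch of where that magic number comes from and how the ioctl argument is laid out. It assumes the common asm-generic ioctl encoding used on x86, and that RNDADDENTROPY is _IOW('R', 0x03, int[2]) taking an entropy count in bits, a buffer size in bytes, and then the buffer. The helper names are invented here; add_entropy() needs root and is deliberately not called.

```python
import fcntl
import os
import struct


def iow(type_char, nr, size):
    """Encode a write-direction ioctl number (asm-generic layout)."""
    ioc_write = 1
    return (ioc_write << 30) | (size << 16) | (ord(type_char) << 8) | nr


# The size field covers only the two leading ints of the struct.
RNDADDENTROPY = iow("R", 0x03, struct.calcsize("ii"))


def pack_rand_pool_info(data, bits_per_byte=8):
    """Build the argument: entropy_count (bits), buf_size (bytes), buf."""
    return struct.pack(
        "ii%ds" % len(data), bits_per_byte * len(data), len(data), data
    )


def add_entropy(data, device="/dev/random"):
    """Mix data into the pool AND credit it as entropy (requires root)."""
    fd = os.open(device, os.O_WRONLY)
    try:
        fcntl.ioctl(fd, RNDADDENTROPY, pack_rand_pool_info(data))
    finally:
        os.close(fd)


print(RNDADDENTROPY)
```

Crediting 8 bits per byte, as the one-liner does, is only sound when the input really is unpredictable; a plain write() to the device mixes the data in without crediting anything, which matches the strace output in comment 44.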

Comment 46 Maciej Żenczykowski 2018-07-07 07:41:16 UTC

'Fixed' it via replacing the random seed loading script from systemd to make use of rdrand-gen (from RdRand rpm) and to increase entropy via ioctl while feeding it in.

So this is perhaps one avenue of attack?

[root@varda ~]# uname -a
Linux varda 4.17.3-200.fc28.x86_64 #1 SMP Tue Jun 26 14:17:07 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

[root@varda ~]# systemd-analyze blame | head
          3.279s initrd-switch-root.service
          2.873s cloud-init-local.service
          2.746s systemd-journal-flush.service
          2.687s network.service
          1.333s cloud-init.service
          1.090s iptables.service
           910ms cloud-config.service
           766ms dev-sda1.device
           563ms systemd-random-seed.service
           514ms cloud-final.service

[root@varda ~]# cat rnd 
#!/bin/bash

declare -r RND_SEED_DIR='/var/lib/systemd'
declare -r RND_SEED_FILE="${RND_SEED_DIR}/random-seed"

entropy() {
  echo "$(< /proc/sys/kernel/random/entropy_avail)"
}

poolsize_bits() {
  echo "$(< /proc/sys/kernel/random/poolsize)"
}

poolsize_bytes() {
  echo "$(( $(poolsize_bits) / 8 ))"
}

rnd_seed_save() {
  [[ -d "${RND_SEED_DIR}" ]] || return

  dd if=/dev/urandom of="${RND_SEED_FILE}" bs="$(poolsize_bytes)" count=1 status=none
  chown 0:0 "${RND_SEED_FILE}"
  chmod 0600 "${RND_SEED_FILE}"
  #ls -al "${RND_SEED_FILE}"
}

rnd_seed_load() {
  [[ -d "${RND_SEED_DIR}" ]] || return
  [[ -f "${RND_SEED_FILE}" ]] || return

  dd if="${RND_SEED_FILE}" of=/dev/urandom bs="$(poolsize_bytes)" count=1 status=none
  rnd_seed_save
}

rnd_flush() {
  dd if=/dev/random bs=$(( $(entropy) / 8 )) count=1 status=none | xxd
}

prog() {
cat <<EOF
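import os, fcntl, struct, sys
# RNDADDENTROPY = _IOW('R', 0x03, int[2]); unlike a plain write, it credits entropy.
RNDADDENTROPY = 1074287107
# Read raw bytes on both Python 2 and Python 3.
stdin = getattr(sys.stdin, "buffer", sys.stdin)
fd = os.open("/dev/random", os.O_WRONLY)
while True:
  rnd = stdin.read(512)
  if not rnd: break
  # struct rand_pool_info: entropy_count (bits), buf_size (bytes), buf
  fcntl.ioctl(fd, RNDADDENTROPY, struct.pack("ii%is" % len(rnd), 8 * len(rnd), len(rnd), rnd))
os.close(fd)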
EOF
}

rnd_fill() {
  {
    dd if=/dev/hwrng bs=512 count=4 status=none 2>/dev/null
    rdrand-gen -n "$(( 4096 * 4 ))"
  } | python -c "$(prog)"
}

main() {
  local -r arg="$1"
  shift

  case "${arg}" in
    load) rnd_seed_load; rnd_fill;;
    restore) rnd_seed_load;;
    save) rnd_seed_save;;
    entropy) entropy;;
    poolsize) echo "poolsize bits: $(poolsize_bits) bytes: $(poolsize_bytes)";;
    poolsize_bits) poolsize_bits;;
    poolsize_bytes) poolsize_bytes;;
    flush) entropy; rnd_flush; entropy;;
    fill) entropy; rnd_fill; entropy;;
    *) echo "Usage: $0 [load|restore|save|entropy|poolsize|poolsize_bits|poolsize_bytes|flush|fill]" 1>&2;;
  esac
}

main "$@"; exit

# Analyze timing via:
#
# systemd-analyze blame | head
#
# systemctl status systemd-random-seed.service
#
# Install via:
#
# chcon unconfined_u:object_r:lib_t:s0 /root/rnd
# mv /usr/lib/systemd/systemd-random-seed /usr/lib/systemd/systemd-random-seed.old
# ln -s /root/rnd /usr/lib/systemd/systemd-random-seed
#
#
# Test via:
#
# ./fast-reboot 'Fedora (4.17.3-200.fc28.x86_64) 28 (Cloud Edition)'

Comment 47 Colin Walters 2018-07-07 12:34:46 UTC

> btw. I'm surprised to learn that systemd random seed restoration does not increase entropy (via RNDADDENTROPY ioctl)...?

That is surprising to me too; I asked on the list:

https://lists.freedesktop.org/archives/systemd-devel/2018-July/041008.html

Comment 48 Fedora Update System 2018-07-11 20:19:56 UTC

kernel-4.17.4-200.fc28, kernel-tools-4.17.4-200.fc28 has been pushed to the Fedora 28 stable repository. If problems still persist, please make note of it in this bug report.

Comment 49 Theodore Tso 2018-07-15 04:18:06 UTC

Created attachment 1458938 [details]
Proposed kernel patch if you really must use Jitter entropy

I'm really not a fan of Jitter entropy generation, but I understand why Fedora is trying to use it as a workaround.

But if you must use it, I suggest a patch like this.

Comment 50 Maciej Żenczykowski 2018-08-07 05:42:53 UTC

I've confirmed that the 'bad' (ie. without the CVE mitigation/bugfix rollback) kernel also boots quickly on a VM with virtio-rng enabled (but very slowly without).

Comment 51 Maciej Żenczykowski 2018-09-12 08:40:22 UTC

Looks like it's back...

[root@varda ~]# uname -a
Linux varda 4.18.5-200.fc28.x86_64 #1 SMP Tue Sep 4 15:56:14 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

[root@varda ~]# lspci
00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
00:01.0 ISA bridge: Intel Corporation 82371AB/EB/MB PIIX4 ISA (rev 03)
00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 03)
00:03.0 Non-VGA unclassified device: Red Hat, Inc. Virtio SCSI
00:04.0 Ethernet controller: Red Hat, Inc. Virtio network device

[root@varda ~]# dmesg | egrep -i rng
[    0.000000] random: get_random_bytes called from start_kernel+0x93/0x558 with crng_init=0
[  602.254183] random: crng init done

[root@varda ~]# systemd-analyze blame | head
    9min 54.042s cloud-init-local.service
    1min 30.070s chronyd.service
    1min 30.055s gssproxy.service
          3.717s dnf-makecache.service

(and this is the first of 3 vms to come back from a reboot into the new kernel)
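
To compare boots across kernels, the time at which the CRNG became ready can be pulled out of dmesg output like the above. This is a minimal sketch that assumes the two message forms seen in this bug ("crng init done" and "crng done (trusting CPU's manufacturer)"); the function name is hypothetical.

```python
import re

# Matches "[  602.254183] random: crng init done" and
# "[    0.045220] random: crng done (trusting CPU's manufacturer)".
_CRNG_DONE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+random: crng (?:init )?done")


def crng_ready_seconds(dmesg_text):
    """Return the uptime in seconds at which the CRNG was ready, or None."""
    for line in dmesg_text.splitlines():
        match = _CRNG_DONE.match(line)
        if match:
            return float(match.group(1))
    return None
```

Feeding it the dmesg excerpt above gives roughly 602 seconds, which lines up with the 9min 54s that cloud-init-local.service spent blocked.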

Comment 52 Maciej Żenczykowski 2018-09-12 17:19:10 UTC

After approximately 9 hours worth of waiting... got tired of it.
Stuffed keypresses into remote serial console, and sure enough both other VMs get unblocked and continue to boot.

[32237.164756] random: crng init done
[32237.168337] random: 7 urandom warning(s) missed due to ratelimiting
[32240.889906] cloud-init[565]: Cloud-init v. 17.1 running 'init-local' at Wed, 12 Sep 2018 08:15:06 +0000. Up 12.94 seconds.

[32515.468006] random: crng init done
[32515.471652] random: 7 urandom warning(s) missed due to ratelimiting
[32519.002762] cloud-init[568]: Cloud-init v. 17.1 running 'init-local' at Wed, 12 Sep 2018 08:15:15 +0000. Up 23.36 seconds.

Comment 53 Maciej Żenczykowski 2018-09-12 17:32:05 UTC

It would be nice to get one of: 
 https://kernel.googlesource.com/pub/scm/linux/kernel/git/tytso/random/+/9b25436662d5fb4c66eb527ead53cab15f596ee0%5E%21/
and whatever support is needed for it ie.
 https://kernel.googlesource.com/pub/scm/linux/kernel/git/tytso/random/+/39a8883a2b989d1d21bd8dd99f5557f0c5e89694%5E%21/

[and I guess we'd have to set the kernel commandline flag 'random.trust_cpu=on']

or
  https://lkml.org/lkml/2018/9/6/1003

I think either of those would fix the problem, both would probably be even better.
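
A quick way to see whether a given boot had the flag set is to look at the kernel command line. A minimal sketch, assuming the parameter is spelled 'random.trust_cpu' as above and that the last occurrence wins; the accepted spellings below are a simplification of the kernel's boolean parsing, and the function name is hypothetical.

```python
def trust_cpu_from_cmdline(cmdline):
    """Return True/False if random.trust_cpu is set on the command line, else None."""
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == "random.trust_cpu" and sep:
            value = val  # keep the last occurrence
    if value is None:
        return None  # not given: the compile-time default applies
    lowered = value.lower()
    if lowered in ("1", "y", "yes", "on"):
        return True
    if lowered in ("0", "n", "no", "off"):
        return False
    return None  # unrecognized value
```

On a running system the input would be the contents of /proc/cmdline.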

Comment 54 Adam Williamson 2018-09-12 19:09:11 UTC

Maciej: was this with virtio-rng still enabled?

Comment 55 Maciej Żenczykowski 2018-09-13 08:29:49 UTC

No: while AFAICT the Fedora 4.18.5-200.fc28.x86_64 kernel (most likely) supports virtio-rng (4.17 did, no reason why this would change), the cloud virtualization platform I'm running on (GCE) does not expose a virtio random pci device.  (side note: I imagine that since adding a pci device changes the shape of a VM it may be a very time consuming change to make: think of software rollbacks in underlying virtualization infrastructure, how to handle VM migrations and what happens if the extra pci device breaks some extant VM image somewhere in some subtle way - possibly just by virtue of pci devices being numbered differently now, plus additional fun like what happens if the guest can exhaust the host's entropy pool, etc...)

btw. this has the additional annoying aspect that the system just blocks, goes idle and basically stops generating *any* entropy.  If the system just decided to read pseudo-random sectors off the primary drive, it would at least have a chance to generate some randomness via disk activity (possibly more likely with the above scsi add_randomness patch) and just generic 'something is happening' activity.  Otherwise it appears it can just sit idle for 9+ hours (at which point I gave up and force-typed a few hundred characters at the remote serial console).  I'm actually not sure why one of the 3 VMs recovers so much faster (in only 10 minutes with no external action on my part).  I *think* this might have been the broadwell one vs the other two being haswell, so perhaps there's some extra influence of underlying hardware (I imagine newer cpu may also mean tons of other things like network or disk are newer too)...
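
The "read pseudo-random sectors" idea could look roughly like the sketch below. It is only an illustration of the suggestion, not something any component in this bug does: the function name and defaults are hypothetical, and reads served from the page cache would cause no disk activity at all, so a real version would need to bypass the cache.

```python
import os
import random


def read_random_sectors(path, reads=32, sector_size=512, seed=None):
    """Read `reads` sectors at pseudo-random offsets; return total bytes read."""
    rng = random.Random(seed)
    fd = os.open(path, os.O_RDONLY)
    try:
        # lseek to the end works for block devices, where st_size is 0.
        size = os.lseek(fd, 0, os.SEEK_END)
        sectors = size // sector_size
        if sectors == 0:
            return 0
        total = 0
        for _ in range(reads):
            offset = rng.randrange(sectors) * sector_size
            total += len(os.pread(fd, sector_size, offset))
        return total
    finally:
        os.close(fd)
```

Whether the resulting interrupts actually credit entropy depends on the storage driver, which is what the scsi add_randomness patch mentioned above is about.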

also of note: I reinstalled my 'improved /usr/lib/systemd/systemd-random-seed' script from comment 46 and of course now things once again boot fine... so this is (of course) indeed rng starvation again. With my fix we now treat the saved random seed as a source of entropy while we restore it, and also feed in rdrand cpu generated data if the cpu provides it (since my VMs are on haswell and broadwell machines, it does, but it wouldn't on sandybridge or older ones).

Comment 56 Laura Abbott 2018-09-14 00:34:58 UTC

I did a backport of the patches to use the cpu's RNG for feeding the pool because I'm tired of playing whack-a-mole. This is on by default but those who really don't trust it can turn it off on the command line. It should be in 4.18.8.

Comment 57 Adam Williamson 2018-09-14 00:43:35 UTC

Do you think that's worth a Beta FE (given that we slip^H^H^H^Hre-scheduled to the target #1 date today)?

Comment 58 Pavel Raiskup 2018-09-14 05:30:04 UTC

*** Bug 1628599 has been marked as a duplicate of this bug. ***

Comment 59 Maciej Żenczykowski 2018-09-14 08:33:11 UTC

(this bug should probably be reopened, or a new one filed against 4.18.5)

Comment 60 Laura Abbott 2018-09-14 17:53:10 UTC

If people can test and confirm it works, I'm for a FE. 4.18.8 should be out this weekend and I should have it built by Monday.

Comment 61 Maciej Żenczykowski 2018-09-14 23:18:30 UTC

Sure, but https://koji.fedoraproject.org/koji/buildinfo?buildID=1143476 ie. 4.18.7-200.fc28 is still the latest thing I can see... and afaict per the changelog it doesn't have any relevant fixes.

Comment 62 Adam Williamson 2018-09-14 23:28:46 UTC

^^^

"I should have it built by Monday."

Comment 63 Maciej Żenczykowski 2018-09-15 18:21:24 UTC

Ah, I see, I wasn't aware how things worked.

I was confused by references to the upstream stable 4.18.8 which isn't maintained by RH/Fedora.

Anyway, 4.18.8 is out now and doesn't include the changes, but the fedora .spec repo now includes backports of the above 2 patches:

https://src.fedoraproject.org/rpms/kernel/c/c96d1d09f0be343683b92f7d057cfba2f758ac07?branch=f27

https://src.fedoraproject.org/rpms/kernel/c/b1cc6d82ff4fedf3259faa3f0b475c9f95fa6474?branch=f28

https://src.fedoraproject.org/rpms/kernel/c/aa224e7033e6a25b5baee1e053be24c98a3db43e?branch=f29

And for 4.19 Linus already pulled the fixes straight from Ted:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3243a89dcbd8f5810b72ee0903d349bd000c4c9d
which I believe is already in 4.19-rc3

However, even with the patches:
  config RANDOM_TRUST_CPU
    bool "Trust the CPU manufacturer to initialize Linux's CRNG"
    depends on X86 || S390 || PPC
    default n

So 4.19+ kernels will still need RANDOM_TRUST_CPU=y set in the generic config portion.
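
Whether a built kernel actually has the option enabled can be checked against its config file. A minimal sketch, assuming the usual "CONFIG_FOO=y" / "# CONFIG_FOO is not set" layout; the function name is hypothetical.

```python
def kconfig_value(config_text, symbol):
    """Return the value of CONFIG_<symbol> ("y", "m", ...), "n" if unset, None if absent."""
    enabled = "CONFIG_%s=" % symbol
    disabled = "# CONFIG_%s is not set" % symbol
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith(enabled):
            return line[len(enabled):]
        if line == disabled:
            return "n"
    return None
```

On Fedora the input would typically be the contents of /boot/config-$(uname -r), queried as kconfig_value(text, "RANDOM_TRUST_CPU").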

Comment 64 Maciej Żenczykowski 2018-09-16 23:31:57 UTC

Looks like it works.

[root@varda ~]# uname -a
Linux varda 4.18.8-200.fc28.x86_64 #1 SMP Sun Sep 16 18:15:41 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
[root@varda ~]# uptime
 23:31:36 up 0 min,  1 user,  load average: 0.20, 0.06, 0.02
[root@varda ~]# systemd-analyze blame | head
          3.216s initrd-switch-root.service
          3.083s cloud-init-local.service
          2.896s network.service
          1.956s cloud-init.service
          1.247s rc-local.service
          1.184s iptables.service
          1.127s cloud-config.service
          1.006s systemd-journal-flush.service
           878ms dev-sda1.device
           585ms cloud-final.service

Comment 65 Maciej Żenczykowski 2018-09-16 23:33:44 UTC

Also:
[root@varda ~]# dmesg | egrep -i rng
[    0.000000] random: get_random_bytes called from start_kernel+0x93/0x558 with crng_init=0
[    0.045220] random: crng done (trusting CPU's manufacturer)

(same on other 2 VMs as well, and, yeah I uninstalled my hacked up init script first)

Comment 66 Fedora Update System 2018-09-17 15:04:39 UTC

kernel-headers-4.18.8-200.fc28 kernel-4.18.8-200.fc28 has been submitted as an update to Fedora 28. https://bodhi.fedoraproject.org/updates/FEDORA-2018-f93103ae20

Comment 67 Fedora Update System 2018-09-17 15:05:58 UTC

kernel-headers-4.18.8-100.fc27 kernel-4.18.8-100.fc27 has been submitted as an update to Fedora 27. https://bodhi.fedoraproject.org/updates/FEDORA-2018-3b2c23b946

Comment 68 Fedora Update System 2018-09-17 18:28:30 UTC

kernel-4.18.8-100.fc27, kernel-headers-4.18.8-100.fc27 has been pushed to the Fedora 27 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-3b2c23b946

Comment 69 Maciej Żenczykowski 2018-09-17 18:46:12 UTC

(possibly worth noting: https://lkml.org/lkml/2018/9/6/1003 referenced in comment #53 was merged into scsi maintainer's tree: https://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git/log/?h=4.19/scsi-fixes )

Comment 70 Fedora Update System 2018-09-17 19:25:06 UTC

kernel-4.18.8-200.fc28, kernel-headers-4.18.8-200.fc28 has been pushed to the Fedora 28 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-f93103ae20

Comment 71 Fedora Update System 2018-09-18 22:01:22 UTC

kernel-headers-4.18.8-300.fc29 kernel-4.18.8-300.fc29 has been submitted as an update to Fedora 29. https://bodhi.fedoraproject.org/updates/FEDORA-2018-85f2e498e7

Comment 72 Fedora Update System 2018-09-20 15:56:49 UTC

kernel-headers-4.18.9-100.fc27 kernel-4.18.9-100.fc27 has been submitted as an update to Fedora 27. https://bodhi.fedoraproject.org/updates/FEDORA-2018-d77cc41f35

Comment 73 Fedora Update System 2018-09-20 16:17:16 UTC

kernel-4.18.8-300.fc29, kernel-headers-4.18.8-300.fc29 has been pushed to the Fedora 29 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-85f2e498e7

Comment 74 Fedora Update System 2018-09-20 19:13:22 UTC

kernel-4.18.8-200.fc28, kernel-headers-4.18.8-200.fc28 has been pushed to the Fedora 28 stable repository. If problems still persist, please make note of it in this bug report.

Comment 75 Maciej Żenczykowski 2018-09-20 19:54:56 UTC

For the curious:

The current kernel on a VM with virtio-rng (with default kcmdline, so effectively random.trust_cpu=on):

Linux varda 4.18.8-200.fc28.x86_64 #1 SMP Sun Sep 16 18:15:41 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

# dmesg | egrep -i 'virtio|rng|random'
[    0.000000] random: get_random_bytes called from start_kernel+0x93/0x558 with crng_init=0
[    0.040975] random: crng done (trusting CPU's manufacturer)
[    0.870831] virtio-pci 0000:00:03.0: virtio_pci: leaving for legacy driver
[    0.879379] virtio-pci 0000:00:04.0: virtio_pci: leaving for legacy driver
[    0.890812] virtio-pci 0000:00:05.0: virtio_pci: leaving for legacy driver

and when kexec'ed into same kernel but with 'random.trust_cpu=off' boot time is still quick and we see:

# dmesg | egrep -i 'virtio|rng|random'
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-4.18.8-200.fc28.x86_64 root=UUID=9be70055-35f2-4a57-b120-5a003dfdb504 ro no_timer_check console=tty1 console=ttyS0,115200n8 console=ttyS1 LANG=en_US.UTF-8 initrd=/boot/initramfs-4.18.8-200.fc28.x86_64.img kexeced random.trust_cpu=off
[    0.000000] random: get_random_bytes called from start_kernel+0x93/0x558 with crng_init=0
[    0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-4.18.8-200.fc28.x86_64 root=UUID=9be70055-35f2-4a57-b120-5a003dfdb504 ro no_timer_check console=tty1 console=ttyS0,115200n8 console=ttyS1 LANG=en_US.UTF-8 initrd=/boot/initramfs-4.18.8-200.fc28.x86_64.img kexeced random.trust_cpu=off
[    0.878502] virtio-pci 0000:00:03.0: virtio_pci: leaving for legacy driver
[    0.887885] virtio-pci 0000:00:04.0: virtio_pci: leaving for legacy driver
[    0.901269] virtio-pci 0000:00:05.0: virtio_pci: leaving for legacy driver
[    1.058585] random: fast init done
[    1.059770] random: crng init done

and we can indeed see we no longer trust the cpu manufacturer and now are only initting *after* the virtio-rng device is found and initialized.

(ie. everything is working as intended)
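
The two cases can be told apart mechanically from the log text. A minimal sketch, assuming only the two message strings quoted in this comment; the function name and the returned labels are hypothetical.

```python
def crng_init_mode(dmesg_text):
    """Classify how the CRNG was initialized, based on the first matching dmesg line."""
    for line in dmesg_text.splitlines():
        if "random: crng done (trusting CPU's manufacturer)" in line:
            return "trust_cpu"  # seeded from the CPU RNG early in boot
        if "random: crng init done" in line:
            return "entropy"  # waited for collected entropy, e.g. from virtio-rng
    return None
```

For the first log above this returns "trust_cpu", and for the random.trust_cpu=off boot it returns "entropy".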

Comment 76 Fedora Update System 2018-09-21 08:33:19 UTC

kernel-4.18.9-100.fc27, kernel-headers-4.18.9-100.fc27 has been pushed to the Fedora 27 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-d77cc41f35

Comment 77 Fedora Update System 2018-09-26 20:17:21 UTC

kernel-4.18.9-100.fc27, kernel-headers-4.18.9-100.fc27 has been pushed to the Fedora 27 stable repository. If problems still persist, please make note of it in this bug report.

Comment 78 Fedora Update System 2018-09-26 20:21:14 UTC

kernel-4.18.9-300.fc29, kernel-headers-4.18.9-300.fc29 has been pushed to the Fedora 29 stable repository. If problems still persist, please make note of it in this bug report.

Comment 79 Maciej Żenczykowski 2019-01-26 00:50:15 UTC

GCE (Google Compute Engine) now supports project/instance metadata 'google-compute-enable-virtio-rng=true' to get a virtio-rng device.

(you can also enable project-wide with =true and disable on specific instances with =false, it'll probably take a VM restart for the change to go live)

AFAICT it should be perfectly safe to enable on Linux VMs, but I'm not clear on the driver impact for other OS's (like Windows).



# cat /etc/fedora-release; uname -a; lspci
Fedora release 29 (Twenty Nine)
Linux zoom 4.20.3-200.fc29.x86_64 #1 SMP Thu Jan 17 15:19:35 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
00:01.0 ISA bridge: Intel Corporation 82371AB/EB/MB PIIX4 ISA (rev 03)
00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 03)
00:03.0 Non-VGA unclassified device: Red Hat, Inc. Virtio SCSI
00:04.0 Ethernet controller: Red Hat, Inc. Virtio network device
00:05.0 Unclassified device [00ff]: Red Hat, Inc. Virtio RNG

(that final device is new)
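
Seeing the PCI device does not by itself show that the kernel is using it. A minimal sketch for checking the hw_random framework's view, assuming the usual sysfs files rng_available and rng_current under /sys/class/misc/hw_random; the function name is hypothetical.

```python
import os


def hwrng_status(sysfs_dir="/sys/class/misc/hw_random"):
    """Return the current hwrng and the list of available ones (empty if none)."""
    def read(name):
        try:
            with open(os.path.join(sysfs_dir, name)) as handle:
                return handle.read().strip()
        except OSError:
            return None  # file or directory missing: no hw_random support exposed

    available = read("rng_available")
    return {
        "current": read("rng_current"),
        "available": available.split() if available else [],
    }
```

On a VM with the virtio-rng device bound, the expectation is that a virtio entry shows up in "available" and is selected as "current".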

---

For anyone curious I also spent a fair bit of time fixing Android networking tests (which run in UML or QEMU) with recent kernels.
You can see the sequence of fixes at https://android.googlesource.com/kernel/tests/+log/42f963407a4c6e6630f0e72b1a8eae741fbb3eef and earlier - it took a little bit of finagling to get things working reliably.