Bug 1781902

Summary: rhcos-4.2.0-x86_64-vmware.ova boot sequence hangs on kernel: random
Product: OpenShift Container Platform
Reporter: Sara Ferguson <sferguso>
Component: RHCOS
Assignee: Ben Howard <behoward>
Status: CLOSED DUPLICATE
QA Contact: Michael Nguyen <mnguyen>
Severity: low
Docs Contact:
Priority: medium
Version: 4.2.0
CC: aconway, bbreard, dgowran, dornelas, dustymabe, imcleod, jligon, nstielau, rgregory, smilner, syangsao, walters
Target Milestone: ---
Keywords: Reopened
Target Release: 4.5.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-05-04 15:34:46 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1186913

Comment 7 Alan Conway 2019-12-27 18:51:13 UTC
I am seeing something similar on a bare-metal cluster installed with rhcos-4.2.0-x86_64-metal-bios.raw.gz

My worker and bootstrap nodes all boot up OK, but my master node boots very slowly with many
"systemd: uninitialized urandom read (16 bytes read)" warnings, finally reaches "random: crng init done",
and then hangs there for more than an hour.
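
A minimal sketch, not taken from this report, of how one might probe from userspace whether the kernel CRNG is already initialized on an affected node. It assumes Linux and Python 3.6+, where os.getrandom() with GRND_NONBLOCK raises BlockingIOError instead of blocking while the pool is uninitialized. The function name crng_ready and the injectable getrandom parameter are hypothetical, added only so the logic can be exercised off-host.

```python
import os

# GRND_NONBLOCK is 0x0001 in the Linux UAPI; fall back to that value when
# the constant is missing (for example on a non-Linux development machine).
GRND_NONBLOCK = getattr(os, "GRND_NONBLOCK", 0x0001)


def crng_ready(getrandom=None):
    """Return True if a non-blocking getrandom() succeeds, False if it would block.

    `getrandom` is a hypothetical injection point for testing; by default
    the real os.getrandom is used (Linux only).
    """
    if getrandom is None:
        getrandom = getattr(os, "getrandom", None)
        if getrandom is None:
            raise NotImplementedError("os.getrandom is not available on this platform")
    try:
        getrandom(1, GRND_NONBLOCK)
    except BlockingIOError:
        # The kernel reported EAGAIN: the entropy pool is not initialized yet.
        return False
    return True


if __name__ == "__main__":
    print("CRNG initialized:", crng_ready())
```

Run early in boot (for example from a debug shell), a False result would be consistent with services stalling on random reads, though it does not by itself identify which service is blocked.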

Comment 8 Micah Abbott 2020-02-26 19:55:10 UTC
Since RHCOS uses the same kernel as RHEL8, I'd be interested to see if folks are seeing the same problems with their infrastructure when installing RHEL8.

There isn't much RHCOS can do to provide additional entropy beyond what has already been suggested: using virtio-rng [0].

If this issue persists and is causing problems, please re-open.


[0] https://wiki.openstack.org/wiki/LibvirtVirtioRng
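
To compare RHCOS and RHEL 8 hosts on the same infrastructure, it may help to record what entropy sources each one sees. The following is a rough sketch under stated assumptions: it reads the standard Linux paths /proc/sys/kernel/random/entropy_avail and /sys/class/misc/hw_random/rng_current and rng_available, and treats any missing file as "unknown". The function name entropy_report and the root parameter are hypothetical; root exists only so the code can be pointed at a copied directory tree.

```python
from pathlib import Path


def entropy_report(root="/"):
    """Collect basic entropy information from procfs and sysfs under `root`."""
    base = Path(root)

    def read(rel):
        # Return the stripped file contents, or None if the file is absent
        # or unreadable (for example, no hw_random support on this kernel).
        try:
            return (base / rel).read_text().strip()
        except OSError:
            return None

    avail = read("proc/sys/kernel/random/entropy_avail")
    available = read("sys/class/misc/hw_random/rng_available")
    return {
        "entropy_avail": int(avail) if avail is not None else None,
        "rng_current": read("sys/class/misc/hw_random/rng_current"),
        "rng_available": available.split() if available else [],
    }


if __name__ == "__main__":
    for key, value in entropy_report().items():
        print(f"{key}: {value}")
```

An empty rng_available list on a guest would suggest that no hardware or paravirtual RNG (such as virtio-rng) is exposed to it.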

Comment 9 Sam Yangsao 2020-02-28 21:23:42 UTC
Re-opening.  This is affecting multiple VMware installs.

Initial installation of RHCOS on vSphere can take around 15 minutes before it reboots and kicks off the installation. We need a fix for this: either we add virtio-rng to the RHCOS image and enable it, or we find some other way around the problem. The installer should run straight through rather than hanging here.

Let me know if it would help; I can also test in our vSphere lab.

Thanks!

Comment 10 Colin Walters 2020-02-28 23:20:36 UTC
There is much more extensive discussion here: https://github.com/openshift/machine-config-operator/issues/854
Note in particular:
https://github.com/openshift/machine-config-operator/issues/854#issuecomment-549858053
Direct link:
https://lwn.net/Articles/800509/

That change would greatly help here, though we want better entropy by default than jitter.

Comment 12 Colin Walters 2020-04-20 15:04:58 UTC
Getting the kernel jitter entropy in should fix this:
https://bugzilla.redhat.com/show_bug.cgi?id=1778762

We should still try to figure out how to get entropy from the hypervisor on VMware, but that fallback will address the core pain.

Comment 13 Declan Gowran 2020-04-23 08:28:02 UTC
Seeing similar "random:" messages on AWS when autoscaling with RHEL 7:

Please enter passphrase for disk nvme1n1--VolGroup-nvme1n1--LogVol (nvme1n1-CryptVol) on /AppMount!:[   15.940869] EXT4-fs (dm-0): mounted filesystem with ordered data mode. Opts: (null)
[   16.265315] EXT4-fs (nvme0n1p1): mounted filesystem with ordered data mode. Opts: (null)
[   14.582571] systemd-fsck[822]: /dev/nvme0n1p1: clean, 298/32768 files, 36993/131072 blocks
[   17.220842] type=1305 audit(1587568330.921:3): audit_pid=898 old=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:auditd_t:s0 res=1
[   38.129844] random: crng init done
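
To compare boots across hosts, the time at which the CRNG finished initializing can be pulled out of a console or dmesg capture like the one above. This is only an illustrative sketch: the helper name crng_init_seconds is hypothetical, and it assumes the kernel prints the bracketed timestamp immediately before "random: crng init done" as it does in this log.

```python
import re

# Matches e.g. "[   38.129844] random: crng init done" anywhere in the text,
# so it still works when a console prompt is fused onto the same line.
CRNG_DONE = re.compile(r"\[\s*(\d+\.\d+)\]\s+random: crng init done")


def crng_init_seconds(log_text):
    """Return the kernel timestamp (seconds since boot) of CRNG init, or None."""
    match = CRNG_DONE.search(log_text)
    return float(match.group(1)) if match else None


if __name__ == "__main__":
    sample = (
        "[   17.220842] type=1305 audit(1587568330.921:3): audit_pid=898 old=0\n"
        "[   38.129844] random: crng init done\n"
    )
    print(crng_init_seconds(sample))
```

For the log in this comment the helper returns 38.129844, that is, roughly 21 seconds after the preceding audit message.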

Comment 15 Colin Walters 2020-05-04 15:34:46 UTC
https://bugzilla.redhat.com/show_bug.cgi?id=1778762
and 
https://bugzilla.redhat.com/show_bug.cgi?id=1830280
are the fixes for this.

*** This bug has been marked as a duplicate of bug 1778762 ***