Probably for historical reasons, the engine was using the SHA1PRNG implementation for random number generation. SHA1PRNG is a pseudo-random number generator implemented within the JVM (more details at [1]). It is a blocking implementation (it can block the entire process if there is not enough entropy), which could cause issues when the engine is running inside a VM. We should switch to NativePRNG, which blocks only for seed generation. More information about the Java implementations is available at [1] and [2].
[1] https://docs.oracle.com/javase/8/docs/technotes/guides/security/StandardNames.html#SecureRandom
[2] https://tersesystems.com/blog/2015/12/17/the-right-way-to-use-securerandom/
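The difference between a blocking and a non-blocking source can be observed from a shell, roughly as sketched here. The sketch assumes the usual Linux behaviour where NativePRNG serves ordinary random bytes from /dev/urandom and uses /dev/random only for seed generation; that mapping is my understanding and is not stated in this report. The helper name read_random and the 5-second limit are made up for the example.

```shell
#!/bin/sh
# Hypothetical helper: try to read COUNT bytes from DEVICE and give up
# after 5 seconds, so a blocking source cannot hang the shell.
read_random() {
    device=$1
    count=$2
    if timeout 5 dd if="$device" of=/dev/null bs=1 count="$count" 2>/dev/null; then
        echo "$device: read $count bytes"
    else
        echo "$device: no result within 5 seconds (blocked or failed)"
        return 1
    fi
}

# Example calls (uncomment to run):
# read_random /dev/urandom 32
# read_random /dev/random 32
```

On a VM that is starved of entropy and runs an older kernel, the /dev/random call is the one expected to time out, while the /dev/urandom call should return immediately.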
Adding CodeChange, moving to VERIFIED.
Hi Daniel, Meital, I just spoke with Martin, and this does require performance verification on a hosted-engine installation (the fix should make it much faster). Can you take a look? Moving back to ON_QA.
Martin: we would need verification instructions. What should we check on hosted engine? Lukas, Meital: we don't have HE environments in the scale & perf team. If performance monitoring is needed, you could provide us with such an environment and we could monitor it.
(In reply to Daniel Gur from comment #3)
> Martin: We would need verification instructions what to check while on
> Hosted engine?
>
> Lukas, Metal - We don't have HE envs at scale&perf team.
> If Performance monitoring is needed you could provide us with such env we
> could monitor it

BZ1421472 is the original bug; I found this issue while investigating it. So I think you can use the same reproduction steps to verify it.
Meital, this is related to BZ1421472, the original bug found by Nikolai from your team. Could he please handle the validation?
Please provide exact reproduction steps for the verification.
(In reply to Nikolai Sednev from comment #6)
> Please provide exact reproduction steps for the verification.

Once again, I don't have exact reproduction steps. As a part of BZ1421472 it was claimed that engine installation takes too much time if there is not enough entropy. The fixes for BZ1540909 and BZ1540907 change the engine/aaa-jdbc behavior so that the whole process no longer blocks until entropy is high enough, so we should see a significantly faster installation.
(In reply to Martin Perina from comment #7)
> (In reply to Nikolai Sednev from comment #6)
> > Please provide exact reproduction steps for the verification.
>
> Once again, I don't have exact reproduction steps. As a part of BZ1421472 it
> was claimed that engine installation takes too much time if there is not
> enough entropy. Fixes for BZ1540909 and BZ1540907 changes the
> engine/aaa-jdbc behavior to not block the whole process until entropy is
> high enough, so we should see significantly faster installation

Deployment is much different now; it is done over Ansible. How should these fixes be verified? With Ansible or vintage?

Steps were as follows:
1. Deploy HE on a single host over NFS and add 2 NFS data storage domains to it.
2. Get the HE storage_domain auto-imported.
3. Check on the host:
cat /run/ovirt-hosted-engine-ha/vm.conf | grep random
devices={device:virtio,specParams:{source:random},model:virtio,type:rng}
4. Run on the host:
time sh -xc 'while true; do dd if=/dev/random of=/dev/null bs=1 count=5; cat /proc/sys/kernel/random/entropy_avail; sleep 1; done'
and after several minutes exit with Ctrl+C.

The problem was that this loop reported lower entropy values than expected.

What does the current fix do? Should the entropy values be raised?
(In reply to Nikolai Sednev from comment #8)
> (In reply to Martin Perina from comment #7)
> > (In reply to Nikolai Sednev from comment #6)
> > > Please provide exact reproduction steps for the verification.
> >
> > Once again, I don't have exact reproduction steps. As a part of BZ1421472 it
> > was claimed that engine installation takes too much time if there is not
> > enough entropy. Fixes for BZ1540909 and BZ1540907 changes the
> > engine/aaa-jdbc behavior to not block the whole process until entropy is
> > high enough, so we should see significantly faster installation
>
> Now deployment is much different, now its done over ansible.
> How these fixes should be verified?
> Ansible or vintage?

I have no idea how those types differ, but I assume they both execute engine-setup. If so, then it doesn't matter which one you choose.

> Steps were as follows:
> 1.Deploy HE on a single host, over NFS and add 2 NFS data storage domains to
> it.
> 2.Get HE storage_domain auto-imported.
> 3.Check on host "cat /run/ovirt-hosted-engine-ha/vm.conf | grep random
> devices={device:virtio,specParams:{source:random},model:virtio,type:rng}"
> 4.Run on host "time sh -xc 'while true; do dd if=/dev/random of=/dev/null
> bs=1 count=5; cat /proc/sys/kernel/random/entropy_avail; sleep 1; done' "
> and after several minutes exit using "ctrl+c" exit sequence.
>
> Problem was in that "time sh -xc 'while true; do dd if=/dev/random
> of=/dev/null bs=1 count=5; cat /proc/sys/kernel/random/entropy_avail; sleep
> 1; done' " reported too small entropy values than expected.
>
> What current fix does? Should the entropy values get raised?

With the fixes for BZ1540909 and BZ1540907, the process (either engine or aaa-jdbc) doesn't block if there is not enough entropy when we try to get random numbers for encryption purposes. So if the engine host (standalone engine) or VM (hosted engine) doesn't have enough entropy, the whole engine-setup execution should no longer take more than 10 minutes, as was reported.
I still don't understand the verification steps, sorry. Please define exact verification steps. If SHE deployment in less than 10 minutes is the pass criterion for verification, then the result might be heavily influenced by the Ansible deployment, which would make it impossible to tell a pass from a fail.
Try to install 4.2.1 HE with low entropy and measure time of engine installation. Repeat the same with 4.2.2 (please try to have more or less the same entropy value on installation).

Expected results:
1. Installation should fail with any encryption error (MUST)
2. Installation should be faster (NICE TO HAVE)
I believe step 1 should read: should _not_ fail with any encryption error. Step 2 is basically: Run the setup inside a VM with no hwrng device and keep draining the entropy counter during the setup by constantly reading from /dev/random.
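The draining approach described above could be scripted roughly as follows. This is only a sketch, not a tested verification procedure: run_with_entropy_drain is a hypothetical helper name, the 64-byte read size is arbitrary, and the commented invocation assumes engine-setup is started directly on the engine VM.

```shell
#!/bin/sh
# Hypothetical helper: run a command while a background loop keeps reading
# from /dev/random, then report how long the command took.
run_with_entropy_drain() {
    # Background drainer. Its output is discarded so it never holds the
    # caller's stdout open. On older kernels these reads block when the
    # entropy pool is empty, which is the condition we want to provoke.
    ( while true; do
          dd if=/dev/random of=/dev/null bs=1 count=64
      done ) >/dev/null 2>&1 &
    drain_pid=$!

    # Record the kernel's entropy estimate at the start of the run.
    echo "entropy_avail at start: $(cat /proc/sys/kernel/random/entropy_avail 2>/dev/null)"

    start=$(date +%s)
    status=0
    "$@" || status=$?
    end=$(date +%s)

    # Stop the drainer and reap it; failures here are not interesting.
    kill "$drain_pid" 2>/dev/null || true
    wait "$drain_pid" 2>/dev/null || true

    echo "elapsed seconds: $((end - start))"
    return "$status"
}

# Assumed invocation on the engine VM (uncomment to run):
# run_with_entropy_drain engine-setup
```

Running the same wrapper on the old and the new build gives two elapsed times that can be compared, while the exit status shows whether the setup itself failed.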
I see that there is ovirt-engine-extension-aaa-jdbc-1.1.6-1.el7ev.noarch on the engine, which came from rhvm-appliance-4.2-20180202.0.el7.noarch. Can you verify that this is a compatible package for the verification?

Looks much better now on these components:
ovirt-hosted-engine-ha-2.2.7-1.el7ev.noarch
ovirt-hosted-engine-setup-2.2.13-1.el7ev.noarch
rhvm-appliance-4.2-20180202.0.el7.noarch
Linux 3.10.0-861.el7.x86_64 #1 SMP Wed Mar 14 10:21:01 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.5 (Maipo)

On engine:
nsednev-he-1 ~]# time sh -xc 'while true; do dd if=/dev/random of=/dev/null bs=1 count=5; cat /proc/sys/kernel/random/entropy_avail; sleep 1; done'
+ true
+ dd if=/dev/random of=/dev/null bs=1 count=5
5+0 records in
5+0 records out
5 bytes (5 B) copied, 0.000299757 s, 16.7 kB/s
+ cat /proc/sys/kernel/random/entropy_avail
1159
+ sleep 1
+ true
+ dd if=/dev/random of=/dev/null bs=1 count=5
5+0 records in
5+0 records out
5 bytes (5 B) copied, 0.000292873 s, 17.1 kB/s
+ cat /proc/sys/kernel/random/entropy_avail
1159
+ sleep 1
^C
real 0m8.656s
user 0m0.021s
sys 0m0.047s

ovirt-engine-extension-aaa-jdbc-1.1.6-1.el7ev.noarch

On ha-host:
[root@alma03 ~]# cat /run/ovirt-hosted-engine-ha/vm.conf | grep random
devices={alias:rng0,specParams:{source:urandom},deviceId:b8e60f7a-46f5-491d-a140-48b13ded816f,address:{type:pci,slot:0x09,bus:0x00,domain:0x0000,function:0x0},device:virtio,model:virtio,type:rng}
[root@alma03 ~]# time sh -xc 'while true; do dd if=/dev/random of=/dev/null bs=1 count=5; cat /proc/sys/kernel/random/entropy_avail; sleep 1; done'
+ true
+ dd if=/dev/random of=/dev/null bs=1 count=5
5+0 records in
5+0 records out
5 bytes (5 B) copied, 0.000200491 s, 24.9 kB/s
+ cat /proc/sys/kernel/random/entropy_avail
3466
^C
real 0m20.512s
user 0m0.024s
sys 0m0.071s
(In reply to Nikolai Sednev from comment #13)
> I see that there is ovirt-engine-extension-aaa-jdbc-1.1.6-1.el7ev.noarch on
> engine, which came from rhvm-appliance-4.2-20180202.0.el7.noarch. Can you
> verify that this is compatible package for the verication?

The fix should be tested with ovirt-engine-extension-aaa-jdbc-1.1.7, as mentioned in the "Fixed In Version" field.
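A quick check of the installed package version before starting the verification might look like the sketch below. The helper names version_at_least and check_aaa_jdbc are hypothetical, and the comparison relies on the -V (version sort) option of GNU sort; only the package name and the 1.1.7 threshold come from this bug.

```shell
#!/bin/sh
# Hypothetical helper: succeed if version $1 is at least version $2.
# Uses GNU sort -V; the smaller of the two versions sorts first.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: query the installed aaa-jdbc version with rpm and
# compare it with 1.1.7, the version that carries the fix.
check_aaa_jdbc() {
    installed=$(rpm -q --qf '%{VERSION}\n' ovirt-engine-extension-aaa-jdbc) || {
        echo "ovirt-engine-extension-aaa-jdbc is not installed"
        return 2
    }
    if version_at_least "$installed" 1.1.7; then
        echo "aaa-jdbc $installed contains the fix"
    else
        echo "aaa-jdbc $installed is older than 1.1.7"
        return 1
    fi
}

# Run on the engine VM (uncomment to run):
# check_aaa_jdbc
```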
Hi Ryan, I see that our latest rhvm-appliance-4.2-20180202.0.el7.noarch still contains ovirt-engine-extension-aaa-jdbc-1.1.6-1.el7ev.noarch. When can we get the required ovirt-engine-extension-aaa-jdbc-1.1.7 with the appliance?
Hey Nikolai - This will automatically happen as part of the image build this week
The new build still did not update the appliance:
alma03 ~]# rpm -qa | grep appliance
rhvm-appliance-4.2-20180202.0.el7.noarch
[root@alma03 ~]# yum list | grep appliance
rhvm-appliance.noarch 2:4.2-20180202.0.el7 @rhv-4.2.2
rhevm-appliance.noarch 20161214.0-1.el7ev rhv-4.2.2

I'm running with these components now:
ovirt-hosted-engine-setup-2.2.15-1.el7ev.noarch
ovirt-hosted-engine-ha-2.2.9-1.el7ev.noarch
rhvm-appliance-4.2-20180202.0.el7.noarch
Works fine on these components on hosts:
ovirt-hosted-engine-setup-2.2.18-1.el7ev.noarch
ovirt-hosted-engine-ha-2.2.10-1.el7ev.noarch
rhvm-appliance-4.2-20180420.0.el7.noarch
Linux 3.10.0-862.el7.x86_64 #1 SMP Wed Mar 21 18:14:51 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.5 (Maipo)

On engine:
ovirt-engine-extension-aaa-jdbc-1.1.7-1.el7ev.noarch
ovirt-engine-setup-4.2.3.2-0.1.el7.noarch
Linux 3.10.0-862.el7.x86_64 #1 SMP Wed Mar 21 18:14:51 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.5 (Maipo)

Moving to verified. If you still see any issue with this, please reopen.

Results from the engine:
nsednev-he-1 ~]# time sh -xc 'while true; do dd if=/dev/random of=/dev/null bs=1 count=5; cat /proc/sys/kernel/random/entropy_avail; sleep 1; done'
+ true
+ dd if=/dev/random of=/dev/null bs=1 count=5
5+0 records in
5+0 records out
5 bytes (5 B) copied, 0.000297491 s, 16.8 kB/s
+ cat /proc/sys/kernel/random/entropy_avail
1331
+ sleep 1
+ true
+ dd if=/dev/random of=/dev/null bs=1 count=5
5+0 records in
5+0 records out
5 bytes (5 B) copied, 0.000309364 s, 16.2 kB/s
+ cat /proc/sys/kernel/random/entropy_avail
1331
+ sleep 1
+ true
+ dd if=/dev/random of=/dev/null bs=1 count=5
5+0 records in
5+0 records out
5 bytes (5 B) copied, 0.000307854 s, 16.2 kB/s
+ cat /proc/sys/kernel/random/entropy_avail
1268
+ sleep 1
+ true
+ dd if=/dev/random of=/dev/null bs=1 count=5
5+0 records in
5+0 records out
5 bytes (5 B) copied, 0.000288398 s, 17.3 kB/s
+ cat /proc/sys/kernel/random/entropy_avail
1269
+ sleep 1
^C+ true
+ dd if=/dev/random of=/dev/null bs=1 count=5
5+0 records in
5+0 records out
5 bytes (5 B) copied, 0.000293675 s, 17.0 kB/s
+ cat /proc/sys/kernel/random/entropy_avail
1270
+ sleep 1
+ true
+ dd if=/dev/random of=/dev/null bs=1 count=5
5+0 records in
5+0 records out
5 bytes (5 B) copied, 0.000295667 s, 16.9 kB/s
+ cat /proc/sys/kernel/random/entropy_avail
1271
+ sleep 1
+ true
+ dd if=/dev/random of=/dev/null bs=1 count=5
5+0 records in
5+0 records out
5 bytes (5 B) copied, 0.000309655 s, 16.1 kB/s
+ cat /proc/sys/kernel/random/entropy_avail
1271
+ sleep 1
+ true
+ dd if=/dev/random of=/dev/null bs=1 count=5
5+0 records in
5+0 records out
5 bytes (5 B) copied, 0.000307519 s, 16.3 kB/s
+ cat /proc/sys/kernel/random/entropy_avail
1271
+ sleep 1
^C
real 0m7.814s
user 0m0.013s
sys 0m0.034s

Results from host:
alma04 ~]# time sh -xc 'while true; do dd if=/dev/random of=/dev/null bs=1 count=5; cat /proc/sys/kernel/random/entropy_avail; sleep 1; done'
+ true
+ dd if=/dev/random of=/dev/null bs=1 count=5
5+0 records in
5+0 records out
5 bytes (5 B) copied, 0.000198377 s, 25.2 kB/s
+ cat /proc/sys/kernel/random/entropy_avail
3292
+ sleep 1
+ true
+ dd if=/dev/random of=/dev/null bs=1 count=5
5+0 records in
5+0 records out
5 bytes (5 B) copied, 0.000204135 s, 24.5 kB/s
+ cat /proc/sys/kernel/random/entropy_avail
3228
+ sleep 1
+ true
+ dd if=/dev/random of=/dev/null bs=1 count=5
5+0 records in
5+0 records out
5 bytes (5 B) copied, 0.000196619 s, 25.4 kB/s
+ cat /proc/sys/kernel/random/entropy_avail
3229
+ sleep 1
+ true
+ dd if=/dev/random of=/dev/null bs=1 count=5
5+0 records in
5+0 records out
5 bytes (5 B) copied, 0.00020499 s, 24.4 kB/s
+ cat /proc/sys/kernel/random/entropy_avail
3229
+ sleep 1
+ true
+ dd if=/dev/random of=/dev/null bs=1 count=5
5+0 records in
5+0 records out
5 bytes (5 B) copied, 0.000204448 s, 24.5 kB/s
+ cat /proc/sys/kernel/random/entropy_avail
3231
+ sleep 1
^C
real 0m4.289s
user 0m0.008s
sys 0m0.020s
This bugzilla is included in oVirt 4.2.2 release, published on March 28th 2018. Since the problem described in this bug report should be resolved in oVirt 4.2.2 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.