Created attachment 1904905
issue log files

Description of problem:
Hosted engine deploy failed since "Task Configure OVN for oVirt failed to execute."

engine.log
---------------------------------
2022-08-11 15:38:34,198+08 ERROR [org.ovirt.engine.core.common.utils.ansible.AnsibleExecutor] (EE-ManagedThreadFactory-engine-Thread-1) [02593842-eb65-430c-ba41-30a1bdd94537] Exception: Task Configure OVN for oVirt failed to execute. Please check logs for more details: /var/log/ovirt-engine/host-deploy/ovirt-host-deploy-ansible-20220811153411-hp-dl388g9-04.lab.eng.pek2.redhat.com-02593842-eb65-430c-ba41-30a1bdd94537.log
2022-08-11 15:38:34,203+08 ERROR [org.ovirt.engine.core.bll.hostdeploy.InstallVdsInternalCommand] (EE-ManagedThreadFactory-engine-Thread-1) [02593842-eb65-430c-ba41-30a1bdd94537] Host installation failed for host '527d0666-73e5-4ca2-a9cb-de7a469ab29d', 'hp-dl388g9-04.lab.eng.pek2.redhat.com': Task Configure OVN for oVirt failed to execute. Please check logs for more details: /var/log/ovirt-engine/host-deploy/ovirt-host-deploy-ansible-20220811153411-hp-dl388g9-04.lab.eng.pek2.redhat.com-02593842-eb65-430c-ba41-30a1bdd94537.log

host-deploy.log
-------------------------------
2022-08-11 15:36:49 CST - TASK [ovirt-provider-ovn-driver : Configure OVN for oVirt] *********************
2022-08-11 15:38:34 CST - { "uuid" : "63d8bcf8-43be-4d2c-9617-11c79740d9a7", "counter" : 406, "stdout" : "fatal: [hp-dl388g9-04.lab.eng.pek2.redhat.com]: FAILED! => {\"changed\": true, \"cmd\": [\"vdsm-tool\", \"ovn-config\", \"192.168.222.176\", \"10.73.73.105\", \"hp-dl388g9-04.lab.eng.pek2.redhat.com\"], \"delta\": \"0:01:44.103965\", \"end\": \"2022-08-11 15:38:32.513523\", \"msg\": \"non-zero return code\", \"rc\": 1, \"start\": \"2022-08-11 15:36:48.409558\", \"stderr\": \"Created symlink /etc/systemd/system/multi-user.target.wants/openvswitch-ipsec.service → /usr/lib/systemd/system/openvswitch-ipsec.service.\\nJob for openvswitch-ipsec.service failed because a timeout was exceeded.\\nSee \\\"systemctl status openvswitch-ipsec.service\\\" and \\\"journalctl -xe\\\" for details.\\nTraceback (most recent call last):\\n File \\\"/usr/bin/vdsm-tool\\\", line 209, in main\\n return tool_command[cmd][\\\"command\\\"](*args)\\n File \\\"/usr/lib/python3.6/site-packages/vdsm/tool/ovn_config.py\\\", line 75, in ovn_config\\n exec_command(cmd, 'Failed to configure OVN controller.')\\n File \\\"/usr/lib/python3.6/site-packages/vdsm/tool/ovn_config.py\\\", line 141, in exec_command\\n raise EnvironmentError(error_msg)\\nOSError: Failed to configure OVN controller.\", \"stderr_lines\": [\"Created symlink /etc/systemd/system/multi-user.target.wants/openvswitch-ipsec.service → /usr/lib/systemd/system/openvswitch-ipsec.service.\", \"Job for openvswitch-ipsec.service failed because a timeout was exceeded.\", \"See \\\"systemctl status openvswitch-ipsec.service\\\" and \\\"journalctl -xe\\\" for details.\", \"Traceback (most recent call last):\", \" File \\\"/usr/bin/vdsm-tool\\\", line 209, in main\", \" return tool_command[cmd][\\\"command\\\"](*args)\", \" File \\\"/usr/lib/python3.6/site-packages/vdsm/tool/ovn_config.py\\\", line 75, in ovn_config\", \" exec_command(cmd, 'Failed to configure OVN controller.')\", \" File \\\"/usr/lib/python3.6/site-packages/vdsm/tool/ovn_config.py\\\", line 141, in exec_command\", \" raise EnvironmentError(error_msg)\", \"OSError: Failed to configure OVN controller.\"], \"stdout\": \"\", \"stdout_lines\": []}",

Version-Release number of selected component (if applicable):
RHVH-4.5-20220809.0-RHVH-x86_64-dvd1.iso
rhvm-appliance-4.5-20220529.0.el8ev.x86_64
ovirt-hosted-engine-setup-2.6.5-1.el8ev.noarch
ovirt-hosted-engine-ha-2.5.0-1.el8ev.noarch

How reproducible:
100%

Steps to Reproduce:
1. Clean install RHVH-4.5-20220809.0-RHVH-x86_64-dvd1.iso
2. Deploy hosted engine

Actual results:
hosted-engine deploy failed since "Failed to configure OVN controller"

Expected results:
hosted-engine deploy successful

Additional info:
Also appliance is super old, have you updated engine packages before proceeding with engine installation during HE installation?
(In reply to comment #5)

We have reproduced this issue with RHEL and RHV-H hosts and a recent appliance (20220603.1.el8ev).
(In reply to Martin Perina from comment #5)
> Also appliance is super old, have you updated engine packages before
> proceeding with engine installation during HE installation?

This is the latest appliance QE can use. Confirmed with Dev.
(In reply to Wei Wang from comment #11)
> (In reply to Martin Perina from comment #5)
> > Also appliance is super old, have you updated engine packages before
> > proceeding with engine installation during HE installation?
>
> This is the latest appliance QE can use. Confirmed with Dev.

I know, but have you updated the packages included in the appliance image before running engine-setup?

1. Run hosted-engine --deploy --ansible-extra-vars=he_pause_before_engine_setup=true
2. When the installation is paused, perform the following:
   a. Connect to the engine VM
   b. Set up the latest RHV 4.5.2-5 build repositories
   c. Update the engine-setup packages
3. Continue the installation

I ask because the appliance from the end of May contains really old engine packages.
Found the problem:

During startup, openvswitch-ipsec.service depends on the output of "certutil -L -d sql:/etc/ipsec.d/" (/usr/share/openvswitch/scripts/ovs-monitor-ipsec: line 674), which is very slow and most probably doesn't work properly with kernel-4.18.0-372.23.1.el8_6.x86_64, so the start of the service times out on it. With the older kernel-4.18.0-372.19.1.el8_6.x86_64 (tested) it works properly.

Checked strace with the ".23.1.el8_6" kernel and it seems that certutil is very slow in the getrandom() syscall:

[pid 431392] getrandom("\x97\x05\x50\xbf\x95\xf9\x70\x4f\x6b\x0d\x46\x53", 32, GRND_RANDOM) = 12
[pid 431392] getrandom("\xbe\x8e\x74\xed\x72\x7c", 20, GRND_RANDOM) = 6
[pid 431392] getrandom("\x54\xaa\x58\x75\xd4\x68", 14, GRND_RANDOM) = 6
[pid 431392] getrandom("\x20\x32\xe0\xa0\x58\x16", 8, GRND_RANDOM) = 6
[pid 431392] getrandom("\xfa\x40", 2, GRND_RANDOM) = 2
[pid 431392] getrandom("\x29\x7b\x65\x1a", 32, GRND_RANDOM) = 4
[pid 431392] getrandom("\x0b\xd9\x50\x81\x58\x31", 28, GRND_RANDOM) = 6
[pid 431392] getrandom("\xee\x87\x0c\xd7\x68\x7d", 22, GRND_RANDOM) = 6
[pid 431392] getrandom("\x61\x5d\x6e\x75\xdc\xef", 16, GRND_RANDOM) = 6
[pid 431392] getrandom("\xf2\x2b\xb2\x08\x69\xb9", 10, GRND_RANDOM) = 6
[pid 431392] getrandom("\x5e\xff\x41\xd2", 4, GRND_RANDOM) = 4
[pid 431392] getrandom("\x8a\xb1", 32, GRND_RANDOM) = 2
.... (omitted hundreds of lines) ....

Every line is printed after approx. 5 seconds, instead of "immediately" as on ".19.1.el8_6".

Conclusion:
It looks like a bug in the kernel, as everything works with kernel-4.18.0-372.19.1.el8_6.x86_64.
The issue is reproducible with the newer kernel-4.18.0-372.23.1.el8_6.x86_64.
Downgrading the kernel will fix the issue.
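For reference, the strace pattern above (request 32 bytes, get 12, then request the remaining 20, and so on) is consistent with a caller that retries getrandom() until its buffer is full. A minimal Python sketch of such a loop with timing follows. It is not the actual certutil/NSS code; fill_random is a hypothetical helper name, and only os.getrandom and the GRND_RANDOM flag come from the trace.

```python
import os
import time


def fill_random(nbytes, flags=0):
    """Collect nbytes via getrandom(2), retrying after short reads.

    Hypothetical helper for illustration only.
    Returns (data, number_of_calls, elapsed_seconds).
    """
    buf = bytearray()
    calls = 0
    start = time.monotonic()
    while len(buf) < nbytes:
        # Ask only for what is still missing, as seen in the strace output
        # (32, then 20, then 14, ...).
        chunk = os.getrandom(nbytes - len(buf), flags)
        calls += 1
        buf += chunk
    return bytes(buf), calls, time.monotonic() - start
```

Calling fill_random(32, os.GRND_RANDOM) and fill_random(32) on an affected host and comparing the call count and elapsed time should show whether the blocking pool is the slow part. Note that the GRND_RANDOM variant may block for a long time when entropy is low.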
can you try on a different hardware? And on the same hardware with bare RHEL? just check the available entropy in /proc/sys/kernel/random/entropy_avail
possibly fixed in nss 3.79.0-11

* Thu Aug 11 2022 Bob Relyea <rrelyea> - 3.79.0-11
- Fix QA found failures:
  - remove extra '+' from sslpolicy.txt file causing test error values
  - only use GRND_RANDOM if the kernel is in FIPS mode.

please confirm
(In reply to Michal Skrivanek from comment #15)

The changelog entry is wrong; the fixed version is nss 3.79.0-10.
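To illustrate the changelog item "only use GRND_RANDOM if the kernel is in FIPS mode", here is a rough Python sketch of that decision. This is an assumption about the shape of the logic, not the NSS implementation (NSS is written in C). The function names are hypothetical, and the sketch assumes the kernel FIPS flag is exposed at /proc/sys/crypto/fips_enabled.

```python
import os

# Assumed location of the kernel FIPS flag on RHEL-like systems.
FIPS_FLAG_PATH = "/proc/sys/crypto/fips_enabled"


def kernel_in_fips_mode(path=FIPS_FLAG_PATH):
    """Return True if the flag file contains "1" (hypothetical helper)."""
    try:
        with open(path) as flag_file:
            return flag_file.read().strip() == "1"
    except OSError:
        # Flag file missing or unreadable: treat as non-FIPS.
        return False


def getrandom_flags(fips):
    """Pick getrandom(2) flags: GRND_RANDOM only in FIPS mode."""
    return os.GRND_RANDOM if fips else 0
```

With this gating, a non-FIPS host would call os.getrandom(n, 0) and avoid the slow GRND_RANDOM path seen in the strace output.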
(In reply to Michal Skrivanek from comment #14)
> can you try on a different hardware? And on the same hardware with bare RHEL?
> just check the available entropy in /proc/sys/kernel/random/entropy_avail

Test with RHEL8.6, the bug can be reproduced.

Test steps:
1. Clean install rhel8.6
2. According to http://ci-web.eng.lab.tlv.redhat.com/docs/master/Guide/install_guide/index.html, deploy hosted engine

Result:
Hosted engine deploy failed as the same error.

More info:
[root@hp-dlxxgx-xx ~]# cat /proc/sys/kernel/random/entropy_avail
21
(In reply to Wei Wang from comment #17)
> (In reply to Michal Skrivanek from comment #14)
> > can you try on a different hardware? And on the same hardware with bare RHEL?
> > just check the available entropy in /proc/sys/kernel/random/entropy_avail
>
> Test with RHEL8.6, the bug can be reproduced.
>
> Test steps:
> 1. Clean install rhel8.6
> 2. According to
> http://ci-web.eng.lab.tlv.redhat.com/docs/master/Guide/install_guide/index.html, deploy hosted engine
>
> Result:
> Hosted engine deploy failed as the same error.
>
>
> More info:
> [root@hp-dlxxgx-xx ~]# cat /proc/sys/kernel/random/entropy_avail
> 21

Available entropy is still very very low, not sure why, but probably you will need to install rngd daemon to increase it, otherwise pretty much no cryptography function can be used:

https://access.redhat.com/articles/1314933
(In reply to Martin Perina from comment #18)
> (In reply to Wei Wang from comment #17)
> > (In reply to Michal Skrivanek from comment #14)
> > > can you try on a different hardware? And on the same hardware with bare RHEL?
> > > just check the available entropy in /proc/sys/kernel/random/entropy_avail
> >
> > Test with RHEL8.6, the bug can be reproduced.
> >
> > Test steps:
> > 1. Clean install rhel8.6
> > 2. According to
> > http://ci-web.eng.lab.tlv.redhat.com/docs/master/Guide/install_guide/index.html, deploy hosted engine
> >
> > Result:
> > Hosted engine deploy failed as the same error.
> >
> >
> > More info:
> > [root@hp-dlxxgx-xx ~]# cat /proc/sys/kernel/random/entropy_avail
> > 21
>
> Available entropy is still very very low, not sure why, but probably you
> will need to install rngd daemon to increase it, otherwise pretty much no
> cryptography function can be used:
>
> https://access.redhat.com/articles/1314933

Hi Martin,

Thanks for your help. After using this daemon, the available entropy is increased.

[root@hp-dlxxxgx-xx ~]# cat /proc/sys/kernel/random/entropy_avail
3801

And deployment of hosted engine is successful with the increased entropy.
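The manual check above can be scripted as a pre-deployment sanity test. A minimal Python sketch follows, assuming the 4.18 kernel from this report. The helper names are hypothetical, and the threshold of 1000 is an illustrative assumption chosen only because it separates the two values seen in this bug (21 and 3801); it is not an official limit, and other kernel versions may report entropy on a different scale.

```python
ENTROPY_PATH = "/proc/sys/kernel/random/entropy_avail"

# Assumption for illustration only, not an official limit.
LOW_ENTROPY_THRESHOLD = 1000


def read_entropy(path=ENTROPY_PATH):
    """Return the kernel's available-entropy estimate as an int."""
    with open(path) as entropy_file:
        return int(entropy_file.read().strip())


def entropy_is_low(value, threshold=LOW_ENTROPY_THRESHOLD):
    """True if the entropy estimate is below the chosen threshold."""
    return value < threshold
```

On the affected host, entropy_is_low(read_entropy()) would have flagged the value 21 before deployment and passed the value 3801 observed once rngd was running.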
we believe the fix happened in nss (comment #16), which is now included in nightly RHVH build redhat-virtualization-host-4.5.2-202208170132_8.6
Tested with RHVH-4.5-20220817.0-RHVH-x86_64-dvd1.iso: the bug is fixed. Hosted engine deployment is successful.
According to comment 21, move it to "VERIFIED"
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: RHV RHEL Host (ovirt-host) [ovirt-4.5.2] security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:6392
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days