Bug 2117558 - hosted-engine deploy failed since "Failed to configure OVN controller"
Summary: hosted-engine deploy failed since "Failed to configure OVN controller"
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-hosted-engine-setup
Version: 4.5.2
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ovirt-4.5.2
Target Release: ---
Assignee: Asaf Rachmani
QA Contact: Wei Wang
URL:
Whiteboard:
Depends On: 2118334
Blocks:
 
Reported: 2022-08-11 10:13 UTC by Wei Wang
Modified: 2023-09-15 01:57 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Clones: 2118334
Environment:
Last Closed: 2022-09-08 11:26:41 UTC
oVirt Team: Integration
Target Upstream Version:
Embargoed:


Attachments
issue log files (2.62 MB, application/gzip)
2022-08-11 10:13 UTC, Wei Wang


Links
System                  ID              Private  Priority  Status  Summary  Last Updated
Red Hat Issue Tracker   RHV-47822       0        None      None    None     2022-08-11 10:24:12 UTC
Red Hat Product Errata  RHSA-2022:6392  0        None      None    None     2022-09-08 11:27:09 UTC

Description Wei Wang 2022-08-11 10:13:41 UTC
Created attachment 1904905
issue log files

Description of problem:
Hosted engine deployment failed with the error "Task Configure OVN for oVirt failed to execute."

engine.log
---------------------------------
2022-08-11 15:38:34,198+08 ERROR [org.ovirt.engine.core.common.utils.ansible.AnsibleExecutor] (EE-ManagedThreadFactory-engine-Thread-1) [02593842-eb65-430c-ba41-30a1bdd94537] Exception: Task Configure OVN for oVirt failed to execute. Please check logs for more details: /var/log/ovirt-engine/host-deploy/ovirt-host-deploy-ansible-20220811153411-hp-dl388g9-04.lab.eng.pek2.redhat.com-02593842-eb65-430c-ba41-30a1bdd94537.log
2022-08-11 15:38:34,203+08 ERROR [org.ovirt.engine.core.bll.hostdeploy.InstallVdsInternalCommand] (EE-ManagedThreadFactory-engine-Thread-1) [02593842-eb65-430c-ba41-30a1bdd94537] Host installation failed for host '527d0666-73e5-4ca2-a9cb-de7a469ab29d', 'hp-dl388g9-04.lab.eng.pek2.redhat.com': Task Configure OVN for oVirt failed to execute. Please check logs for more details: /var/log/ovirt-engine/host-deploy/ovirt-host-deploy-ansible-20220811153411-hp-dl388g9-04.lab.eng.pek2.redhat.com-02593842-eb65-430c-ba41-30a1bdd94537.log

host-deploy.log
-------------------------------
2022-08-11 15:36:49 CST - TASK [ovirt-provider-ovn-driver : Configure OVN for oVirt] *********************
2022-08-11 15:38:34 CST - {
  "uuid" : "63d8bcf8-43be-4d2c-9617-11c79740d9a7",
  "counter" : 406,
  "stdout" : "fatal: [hp-dl388g9-04.lab.eng.pek2.redhat.com]: FAILED! => {\"changed\": true, \"cmd\": [\"vdsm-tool\", \"ovn-config\", \"192.168.222.176\", \"10.73.73.105\", \"hp-dl388g9-04.lab.eng.pek2.redhat.com\"], \"delta\": \"0:01:44.103965\", \"end\": \"2022-08-11 15:38:32.513523\", \"msg\": \"non-zero return code\", \"rc\": 1, \"start\": \"2022-08-11 15:36:48.409558\", \"stderr\": \"Created symlink /etc/systemd/system/multi-user.target.wants/openvswitch-ipsec.service → /usr/lib/systemd/system/openvswitch-ipsec.service.\\nJob for openvswitch-ipsec.service failed because a timeout was exceeded.\\nSee \\\"systemctl status openvswitch-ipsec.service\\\" and \\\"journalctl -xe\\\" for details.\\nTraceback (most recent call last):\\n  File \\\"/usr/bin/vdsm-tool\\\", line 209, in main\\n    return tool_command[cmd][\\\"command\\\"](*args)\\n  File \\\"/usr/lib/python3.6/site-packages/vdsm/tool/ovn_config.py\\\", line 75, in ovn_config\\n    exec_command(cmd, 'Failed to configure OVN controller.')\\n  File \\\"/usr/lib/python3.6/site-packages/vdsm/tool/ovn_config.py\\\", line 141, in exec_command\\n    raise EnvironmentError(error_msg)\\nOSError: Failed to configure OVN controller.\", \"stderr_lines\": [\"Created symlink /etc/systemd/system/multi-user.target.wants/openvswitch-ipsec.service → /usr/lib/systemd/system/openvswitch-ipsec.service.\", \"Job for openvswitch-ipsec.service failed because a timeout was exceeded.\", \"See \\\"systemctl status openvswitch-ipsec.service\\\" and \\\"journalctl -xe\\\" for details.\", \"Traceback (most recent call last):\", \"  File \\\"/usr/bin/vdsm-tool\\\", line 209, in main\", \"    return tool_command[cmd][\\\"command\\\"](*args)\", \"  File \\\"/usr/lib/python3.6/site-packages/vdsm/tool/ovn_config.py\\\", line 75, in ovn_config\", \"    exec_command(cmd, 'Failed to configure OVN controller.')\", \"  File \\\"/usr/lib/python3.6/site-packages/vdsm/tool/ovn_config.py\\\", line 141, in exec_command\", \"    raise EnvironmentError(error_msg)\", \"OSError: Failed to configure OVN controller.\"], \"stdout\": \"\", \"stdout_lines\": []}",
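
The "stdout" field in the host-deploy log wraps an Ansible failure record of the form "fatal: [host]: FAILED! => {json}". Below is a minimal Python sketch for pulling the host name and the result dictionary out of such a string, assuming the field value has already been extracted and unescaped. The function name and the shortened sample string are illustrative only and are not part of any oVirt tool.

```python
import json


def parse_ansible_failure(stdout):
    """Split 'fatal: [host]: FAILED! => {...}' into (host, result dict)."""
    # Everything after the first " => " is the JSON payload.
    head, sep, payload = stdout.partition(" => ")
    if not sep:
        raise ValueError("not an Ansible failure record")
    # The host name sits between the first pair of square brackets.
    host = head[head.index("[") + 1:head.index("]")]
    return host, json.loads(payload)


# Shortened, hypothetical sample modelled on the log entry in this report.
sample = (
    'fatal: [host.example.com]: FAILED! => {"rc": 1, '
    '"msg": "non-zero return code", '
    '"stderr_lines": ["Job for openvswitch-ipsec.service failed '
    'because a timeout was exceeded."]}'
)

host, result = parse_ansible_failure(sample)
for line in result["stderr_lines"]:
    print(host, "|", line)
```

Reading "stderr_lines" this way makes the systemd timeout message easier to spot than in the escaped one-line form.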


Version-Release number of selected component (if applicable):
RHVH-4.5-20220809.0-RHVH-x86_64-dvd1.iso
rhvm-appliance-4.5-20220529.0.el8ev.x86_64
ovirt-hosted-engine-setup-2.6.5-1.el8ev.noarch
ovirt-hosted-engine-ha-2.5.0-1.el8ev.noarch

How reproducible:
100%

Steps to Reproduce:
1. Clean install RHVH-4.5-20220809.0-RHVH-x86_64-dvd1.iso
2. Deploy hosted engine
3.

Actual results:
hosted-engine deploy fails with "Failed to configure OVN controller"

Expected results:
hosted-engine deploy succeeds

Additional info:

Comment 5 Martin Perina 2022-08-11 13:43:28 UTC
Also, the appliance is very old. Did you update the engine packages before proceeding with the engine installation during the HE installation?

Comment 9 Jiri Macku 2022-08-14 21:09:01 UTC
In reply to comment #5
We have reproduced this issue with RHEL and RHV-H hosts and a recent appliance (20220603.1.el8ev).

Comment 11 Wei Wang 2022-08-15 01:17:36 UTC
(In reply to Martin Perina from comment #5)
> Also appliance is super old, have you updated engine packages before
> proceeding with engine installation during HE installation?

This is the latest appliance QE can use. Confirmed with Dev.

Comment 12 Martin Perina 2022-08-15 08:39:48 UTC
(In reply to Wei Wang from comment #11)
> (In reply to Martin Perina from comment #5)
> > Also appliance is super old, have you updated engine packages before
> > proceeding with engine installation during HE installation?
> 
> This is the  latest appliance QE can use. Confirmed with Dev.

I know, but have you updated the packages included in the appliance image before running engine-setup?

1. Run hosted-engine --deploy --ansible-extra-vars=he_pause_before_engine_setup=true
2. When the installation is paused, perform the following:
    a. Connect to the engine VM
    b. Set up the latest RHV 4.5.2-5 build repositories
    c. Update the engine-setup packages
3. Continue the installation

This matters because the appliance from the end of May contains very old engine packages.

Comment 13 Petr Kubica 2022-08-15 13:58:20 UTC
Found the problem:

During startup, openvswitch-ipsec.service depends on the output of "certutil -L -d sql:/etc/ipsec.d/"
(/usr/share/openvswitch/scripts/ovs-monitor-ipsec: line 674),
which is very slow and most probably does not work properly with kernel-4.18.0-372.23.1.el8_6.x86_64,
so the service start times out.

With the older kernel-4.18.0-372.19.1.el8_6.x86_64 (tested) it works properly.


I checked strace with the ".23.1.el8_6" kernel, and certutil appears to be very slow on the getrandom() syscall:

[pid 431392] getrandom("\x97\x05\x50\xbf\x95\xf9\x70\x4f\x6b\x0d\x46\x53", 32, GRND_RANDOM) = 12
[pid 431392] getrandom("\xbe\x8e\x74\xed\x72\x7c", 20, GRND_RANDOM) = 6
[pid 431392] getrandom("\x54\xaa\x58\x75\xd4\x68", 14, GRND_RANDOM) = 6
[pid 431392] getrandom("\x20\x32\xe0\xa0\x58\x16", 8, GRND_RANDOM) = 6
[pid 431392] getrandom("\xfa\x40", 2, GRND_RANDOM) = 2
[pid 431392] getrandom("\x29\x7b\x65\x1a", 32, GRND_RANDOM) = 4
[pid 431392] getrandom("\x0b\xd9\x50\x81\x58\x31", 28, GRND_RANDOM) = 6
[pid 431392] getrandom("\xee\x87\x0c\xd7\x68\x7d", 22, GRND_RANDOM) = 6
[pid 431392] getrandom("\x61\x5d\x6e\x75\xdc\xef", 16, GRND_RANDOM) = 6
[pid 431392] getrandom("\xf2\x2b\xb2\x08\x69\xb9", 10, GRND_RANDOM) = 6
[pid 431392] getrandom("\x5e\xff\x41\xd2", 4, GRND_RANDOM) = 4
[pid 431392] getrandom("\x8a\xb1", 32, GRND_RANDOM) = 2
.... (omitted hundreds of lines) ....

Each line is printed after approximately 5 seconds, instead of "immediately" as on ".19.1.el8_6".

Conclusion:
It looks like a kernel bug, as everything works with kernel-4.18.0-372.19.1.el8_6.x86_64.
The issue is reproducible with the newer kernel-4.18.0-372.23.1.el8_6.x86_64.
Downgrading the kernel fixes the issue.
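
The strace excerpt in this comment shows short reads: a 32-byte request returns 12 bytes, then the remaining 20, 14, 8 and 2 bytes are requested in turn. The Python sketch below replays that pattern to count how many calls one request needs. The helper names are hypothetical, the loop is only an illustration of the retry pattern visible in the trace and is not certutil's or NSS's actual code, and blocking_pool_chunk() is Linux-only and may block on a host with an empty entropy pool.

```python
import os


def read_fully(nbytes, read_chunk):
    """Call read_chunk(remaining) until nbytes are collected.

    Returns (data, number_of_calls)."""
    buf = bytearray()
    calls = 0
    while len(buf) < nbytes:
        chunk = read_chunk(nbytes - len(buf))
        calls += 1
        if not chunk:
            raise OSError("random source returned no data")
        buf += chunk
    return bytes(buf), calls


def blocking_pool_chunk(remaining):
    """Real reader: getrandom() with GRND_RANDOM, as seen in the strace."""
    return os.getrandom(remaining, os.GRND_RANDOM)


def replay(sizes):
    """Fake reader that returns chunks of the given sizes (placeholder bytes)."""
    it = iter(sizes)

    def read_chunk(remaining):
        return b"\x00" * min(next(it), remaining)

    return read_chunk


# Return values of the first five getrandom() calls in the strace excerpt.
data, calls = read_fully(32, replay([12, 6, 6, 6, 2]))
print(len(data), "bytes in", calls, "calls")
```

At roughly 5 seconds per call, as reported in this comment, a single 32-byte request served in 5 calls would take about 25 seconds, which is consistent with the service start timing out.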

Comment 14 Michal Skrivanek 2022-08-15 18:14:22 UTC
Can you try on different hardware? And on the same hardware with bare RHEL?
Just check the available entropy in /proc/sys/kernel/random/entropy_avail.
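
The entropy check requested here can also be scripted. The Python sketch below reads the counter from /proc/sys/kernel/random/entropy_avail; the 256-bit threshold and the function names are assumptions chosen for illustration, not values taken from this report or from kernel documentation.

```python
from pathlib import Path

ENTROPY_PATH = "/proc/sys/kernel/random/entropy_avail"


def entropy_avail(path=ENTROPY_PATH):
    """Return the kernel's available-entropy estimate as an integer."""
    return int(Path(path).read_text().strip())


def is_starved(bits, threshold=256):
    """True if the estimate is below the (hypothetical) threshold."""
    return bits < threshold


if __name__ == "__main__":
    bits = entropy_avail()
    state = "low" if is_starved(bits) else "ok"
    print(f"entropy_avail={bits} ({state})")
```

The path parameter exists only so the reader function can be pointed at a test file on hosts without /proc.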

Comment 15 Michal Skrivanek 2022-08-16 08:49:24 UTC
possibly fixed in nss 3.79.0-11

* Thu Aug 11 2022 Bob Relyea <rrelyea> - 3.79.0-11
- Fix QA found failures:
-  remove extra '+' from sslpolicy.txt file causing test error values
-  only use GRND_RANDOM if the kernel is in FIPS mode.

please confirm
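
The changelog line "only use GRND_RANDOM if the kernel is in FIPS mode" describes a flag-selection policy. The Python sketch below illustrates that policy only; it is not the NSS implementation. The function names are hypothetical, and the sketch assumes the kernel exposes its FIPS state in /proc/sys/crypto/fips_enabled.

```python
import os

FIPS_PATH = "/proc/sys/crypto/fips_enabled"
# Fall back to the Linux constant value if this platform's os module lacks it.
GRND_RANDOM = getattr(os, "GRND_RANDOM", 0x0002)


def kernel_in_fips_mode(path=FIPS_PATH):
    """True if the FIPS flag file exists and contains '1'."""
    try:
        with open(path) as f:
            return f.read().strip() == "1"
    except OSError:
        # Missing or unreadable file: treat as non-FIPS.
        return False


def getrandom_flags(path=FIPS_PATH):
    """Request GRND_RANDOM only when the kernel is in FIPS mode."""
    return GRND_RANDOM if kernel_in_fips_mode(path) else 0


if __name__ == "__main__":
    print("getrandom flags:", getrandom_flags())
```

With flags of 0, getrandom() no longer sets GRND_RANDOM, the flag that appears on every slow call in the strace in comment 13.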

Comment 16 Michal Skrivanek 2022-08-16 08:56:37 UTC
(In reply to Michal Skrivanek from comment #15)

The changelog entry is wrong; the fixed version is nss 3.79.0-10.

Comment 17 Wei Wang 2022-08-16 10:04:11 UTC
(In reply to Michal Skrivanek from comment #14)
> can you try on a different hardware? And on the same hardware with bare RHEL?
> just check the available entropy in /proc/sys/kernel/random/entropy_avail

Tested with RHEL 8.6; the bug can be reproduced.

Test steps:
1. Clean install RHEL 8.6
2. Deploy hosted engine according to http://ci-web.eng.lab.tlv.redhat.com/docs/master/Guide/install_guide/index.html

Result:
Hosted engine deploy failed with the same error.


More info:
[root@hp-dlxxgx-xx ~]# cat /proc/sys/kernel/random/entropy_avail
21

Comment 18 Martin Perina 2022-08-16 10:56:15 UTC
(In reply to Wei Wang from comment #17)
> (In reply to Michal Skrivanek from comment #14)
> > can you try on a different hardware? And on the same hardware with bare RHEL?
> > just check the available entropy in /proc/sys/kernel/random/entropy_avail
> 
> Test with RHEL8.6, the bug can be reproduced.
> 
> Test steps:
> 1. Clean install rhel8.6
> 2. According to
> http://ci-web.eng.lab.tlv.redhat.com/docs/master/Guide/install_guide/index.html,
> deploy hosted engine
> 
> Result:
> Hosted engine deploy failed as the same error.
> 
> 
> More info:
> [root@hp-dlxxgx-xx ~]# cat /proc/sys/kernel/random/entropy_avail
> 21

Available entropy is still very low. I am not sure why, but you will probably need to install the rngd daemon to increase it; otherwise pretty much no cryptographic function can be used:

https://access.redhat.com/articles/1314933

Comment 19 Wei Wang 2022-08-17 06:57:40 UTC
(In reply to Martin Perina from comment #18)
> (In reply to Wei Wang from comment #17)
> > (In reply to Michal Skrivanek from comment #14)
> > > can you try on a different hardware? And on the same hardware with bare RHEL?
> > > just check the available entropy in /proc/sys/kernel/random/entropy_avail
> > 
> > Test with RHEL8.6, the bug can be reproduced.
> > 
> > Test steps:
> > 1. Clean install rhel8.6
> > 2. According to
> > http://ci-web.eng.lab.tlv.redhat.com/docs/master/Guide/install_guide/index.html,
> > deploy hosted engine
> > 
> > Result:
> > Hosted engine deploy failed as the same error.
> > 
> > 
> > More info:
> > [root@hp-dlxxgx-xx ~]# cat /proc/sys/kernel/random/entropy_avail
> > 21
> 
> Available entropy is still very very low, not sure why, but probably you
> will need to install rngd daemon to increase it, otherwise pretty much no
> cryptography function can be used:
> 
> https://access.redhat.com/articles/1314933

Hi Martin,

Thanks for your help. After enabling this daemon, the available entropy increased.

[root@hp-dlxxxgx-xx ~]# cat /proc/sys/kernel/random/entropy_avail
3801

Hosted engine deployment is successful with the increased entropy.

Comment 20 Michal Skrivanek 2022-08-17 08:37:55 UTC
We believe the fix landed in nss (comment #16), which is now included in the nightly RHVH build redhat-virtualization-host-4.5.2-202208170132_8.6.

Comment 21 Wei Wang 2022-08-18 04:21:25 UTC
Tested with RHVH-4.5-20220817.0-RHVH-x86_64-dvd1.iso; the bug is fixed.
Hosted engine deploy is successful.

Comment 25 Wei Wang 2022-08-18 09:58:01 UTC
According to comment 21, moving this bug to "VERIFIED".

Comment 29 errata-xmlrpc 2022-09-08 11:26:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: RHV RHEL Host (ovirt-host) [ovirt-4.5.2] security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:6392

Comment 30 Red Hat Bugzilla 2023-09-15 01:57:15 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 365 days

