Description of problem:
At the start of the deployment process (in the ansible flow), if an old ovirtmgmt bridge exists it needs to be removed and recreated.

Suggestion: the logic should be added in ansible/get_network_interfaces.yml

How reproducible:
100%

Steps to Reproduce:
1. Run hosted-engine --deploy on a machine on which HE was previously deployed and cleaned (i.e. a machine that still has an ovirtmgmt bridge configured on it before deploying HE again)
2. When reaching the step of choosing a NIC to connect to the bridge ("Please indicate a nic to set ovirtmgmt bridge on"), see that the ovirtmgmt bridge is still present

Actual results:
The bridge is not removed during deployment

Expected results:
The bridge should be removed at the beginning of the deployment
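A minimal sketch of the suggested logic for ansible/get_network_interfaces.yml. This is not the actual implementation; the task names and the registered variable name are illustrative, and it assumes the removal approach rather than the detection approach:

```yaml
# Hypothetical sketch: detect a leftover ovirtmgmt bridge before
# gathering candidate interfaces, and remove it if present.
- name: Check for a pre-existing ovirtmgmt bridge
  stat:
    path: /sys/class/net/ovirtmgmt
  register: mgmt_bridge

- name: Remove the leftover ovirtmgmt bridge
  command: ip link delete ovirtmgmt type bridge
  when: mgmt_bridge.stat.exists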
Still not sure whether it is better to remove the management bridge or to simply detect the interface already in use by it.
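The "detect the interface in use" alternative amounts to reading the bridge's member ports from sysfs: the kernel lists every enslaved NIC under /sys/class/net/&lt;bridge&gt;/brif/. A minimal sketch (the helper name is mine, not from the setup code):

```python
import os

def bridge_ports(bridge, sys_net="/sys/class/net"):
    """Return the member interfaces of a Linux bridge.

    The kernel exposes each enslaved port as an entry under
    /sys/class/net/<bridge>/brif/, so listing that directory is
    enough to recover the NIC the bridge sits on. Returns an empty
    list when the bridge does not exist or has no ports.
    """
    brif = os.path.join(sys_net, bridge, "brif")
    if not os.path.isdir(brif):
        return []
    return sorted(os.listdir(brif))
```

On the host in this report, detection along these lines would point back at enp5s0f0 (ovirtmgmt carries that NIC's MAC address), which matches the NIC the installer pre-selects during re-deployment below.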
Verification steps provided by Ido:
1. hosted-engine --deploy (with default answers) until it finishes successfully
2. ovirt-hosted-engine-cleanup
3. See that ovirtmgmt is still configured (it should be)
4. hosted-engine --deploy
5. When choosing a NIC, see that it is the NIC that ovirtmgmt is configured on
alma03 ~]# ovirt-hosted-engine-cleanup
This will de-configure the host to run ovirt-hosted-engine-setup from scratch.
Caution, this operation should be used with care.
Are you sure you want to proceed? [y/n]
y
 -=== Destroy hosted-engine VM ===-
 -=== Stop HA services ===-
 -=== Shutdown sanlock ===-
shutdown force 1 wait 0
shutdown done 0
 -=== Disconnecting the hosted-engine storage domain ===-
Traceback (most recent call last):
  File "/usr/lib64/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/disconnect_storage_server.py", line 27, in <module>
    ha_cli.disconnect_storage_server()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 294, in disconnect_storage_server
    sserver.disconnect_storage_server()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_server.py", line 325, in disconnect_storage_server
    connectionParams=conList,
  File "/usr/lib/python2.7/site-packages/vdsm/client.py", line 278, in _call
    raise TimeoutError(method, kwargs, timeout)
vdsm.client.TimeoutError: Request StoragePool.disconnectStorageServer with args {'connectionParams': [{'port': '3260', 'connection': '10.35.146.129', 'iqn': 'iqn.2008-05.com.xtremio:xio00153500071-514f0c50023f6c00', 'user': '', 'tpgt': '1', 'password': '', 'id': '9e177df8-91db-4b8b-81af-28d56d856dba'}], 'storagepoolID': '00000000-0000-0000-0000-000000000000', 'domainType': 3} timed out after 900 seconds
 -=== De-configure VDSM networks ===-
 -=== Stop other services ===-
 -=== De-configure external daemons ===-
 -=== Removing configuration files ===-
? /etc/init/libvirtd.conf already missing
- removing /etc/libvirt/nwfilter/vdsm-no-mac-spoofing.xml
- removing /etc/ovirt-hosted-engine/answers.conf
- removing /etc/ovirt-hosted-engine/hosted-engine.conf
- removing /etc/vdsm/vdsm.conf
- removing /etc/pki/vdsm/certs/cacert.pem
- removing /etc/pki/vdsm/certs/vdsmcert.pem
- removing /etc/pki/vdsm/keys/vdsmkey.pem
- removing /etc/pki/vdsm/libvirt-spice/ca-cert.pem
- removing /etc/pki/vdsm/libvirt-spice/server-cert.pem
- removing /etc/pki/vdsm/libvirt-spice/server-key.pem
- removing /etc/pki/CA/cacert.pem
- removing /etc/pki/libvirt/clientcert.pem
- removing /etc/pki/libvirt/private/clientkey.pem
? /etc/pki/ovirt-vmconsole/*.pem already missing
- removing /var/cache/libvirt/qemu
? /var/run/ovirt-hosted-engine-ha/* already missing
You have new mail in /var/spool/mail/root
[root@alma03 ~]#

Stuck in this way for more than 20 minutes...

MainThread::INFO::2018-03-18 18:27:43,227::states::413::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) Engine vm was unexpectedly shut down
MainThread::INFO::2018-03-18 18:27:45,336::hosted_engine::614::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_stop_domain_monitor) Stopped VDSM domain monitor
MainThread::INFO::2018-03-18 18:27:45,336::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down

alma03 ~]# date
Sun Mar 18 18:43:03 IST 2018

Components on host:
ovirt-hosted-engine-ha-2.2.7-1.el7ev.noarch
ovirt-hosted-engine-setup-2.2.13-1.el7ev.noarch
rhvm-appliance-4.2-20180202.0.el7.noarch
Linux 3.10.0-861.el7.x86_64 #1 SMP Wed Mar 14 10:21:01 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.5 (Maipo)

alma03 ~]# hosted-engine --check-deployed
[root@alma03 ~]# hosted-engine --vm-status
The hosted engine configuration has not been retrieved from shared storage. Please ensure that ovirt-ha-agent is running and the storage server is reachable.
[root@alma03 ~]# ifconfig
eno1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet6 fe80::242b:39d3:578c:d564  prefixlen 64  scopeid 0x20<link>
        ether e0:db:55:fc:cf:43  txqueuelen 1000  (Ethernet)
        RX packets 1003  bytes 361080 (352.6 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 2973  bytes 523824 (511.5 KiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
        device memory 0x91720000-9173ffff

eno2: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        ether e0:db:55:fc:cf:44  txqueuelen 1000  (Ethernet)
        RX packets 1006  bytes 362160 (353.6 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 3  bytes 180 (180.0 B)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
        device memory 0x91700000-9171ffff

enp5s0f0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        ether a0:36:9f:3a:c4:f0  txqueuelen 1000  (Ethernet)
        RX packets 16511891  bytes 17944160116 (16.7 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 40751706  bytes 57531917022 (53.5 GiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

enp5s0f1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        ether a0:36:9f:3a:c4:f2  txqueuelen 1000  (Ethernet)
        RX packets 207057  bytes 12752137 (12.1 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 968  bytes 70625 (68.9 KiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 144239  bytes 44688495 (42.6 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 144239  bytes 44688495 (42.6 MiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

ovirtmgmt: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.35.92.3  netmask 255.255.252.0  broadcast 10.35.95.255
        inet6 fe80::a236:9fff:fe3a:c4f0  prefixlen 64  scopeid 0x20<link>
        inet6 2620:52:0:235c:a236:9fff:fe3a:c4f0  prefixlen 64  scopeid 0x0<global>
        ether a0:36:9f:3a:c4:f0  txqueuelen 1000  (Ethernet)
        RX packets 6790584  bytes 15766897409 (14.6 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 3747898  bytes 55515343503 (51.7 GiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

virbr0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 192.168.122.1  netmask 255.255.255.0  broadcast 192.168.122.255
        ether 52:54:00:5b:00:6a  txqueuelen 1000  (Ethernet)
        RX packets 4787  bytes 4064510 (3.8 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 4387  bytes 3587733 (3.4 MiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

After waiting for the command to finish for more than 20 minutes:

alma03 ~]# hosted-engine --check-deployed
The hosted engine has not been deployed

ovirtmgmt: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.35.92.3  netmask 255.255.252.0  broadcast 10.35.95.255
        inet6 fe80::a236:9fff:fe3a:c4f0  prefixlen 64  scopeid 0x20<link>
        inet6 2620:52:0:235c:a236:9fff:fe3a:c4f0  prefixlen 64  scopeid 0x0<global>
        ether a0:36:9f:3a:c4:f0  txqueuelen 1000  (Ethernet)
        RX packets 6792778  bytes 15767031001 (14.6 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 3748174  bytes 55515464303 (51.7 GiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

During re-deployment the previously chosen NIC was used:

--== HOST NETWORK CONFIGURATION ==--

[ INFO ] Bridge ovirtmgmt already created
         Please indicate a pingable gateway IP address [10.35.95.254]:
[ INFO ] TASK [Gathering Facts]
[ INFO ] ok: [localhost]
[ INFO ] TASK [Detecting interface on existing management bridge]
[ INFO ] ok: [localhost]
[ INFO ] TASK [Get all active network interfaces]
[ INFO ] TASK [Filter bonds with bad naming]
[ INFO ] TASK [Generate output list]
[ INFO ] ok: [localhost]
         Please indicate a nic to set ovirtmgmt bridge on: (enp5s0f0) [enp5s0f0]:
.
.
.
[ INFO ] Hosted Engine successfully deployed

Moving to verified: the required functionality is working and redeployment works fine.
I've opened a separate bug on the slow response and the exception: https://bugzilla.redhat.com/show_bug.cgi?id=1557793
This bugzilla is included in the oVirt 4.2.2 release, published on March 28th 2018. Since the problem described in this bug report should be resolved in oVirt 4.2.2, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.