Bug 1546652 - HE setup: Ansible: detect the interface used on an existing management bridge and propose just that one
Summary: HE setup: Ansible: detect the interface used on an existing management bridge and propose just that one
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-hosted-engine-setup
Classification: oVirt
Component: Network
Version: 2.2.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ovirt-4.2.2
Target Release: ---
Assignee: Ido Rosenzwig
QA Contact: Nikolai Sednev
URL:
Whiteboard:
Depends On:
Blocks: 1458709
 
Reported: 2018-02-19 09:14 UTC by Ido Rosenzwig
Modified: 2018-03-29 11:08 UTC
CC List: 4 users

Fixed In Version: ovirt-hosted-engine-setup-2.2.12
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-03-29 11:08:42 UTC
oVirt Team: Integration
Embargoed:
rule-engine: ovirt-4.2+


Attachments: None


Links
System        ID     Branch                         Status  Summary                                                 Last Updated
oVirt gerrit  88046  master                         MERGED  ansible: network: detect interface on existing bridge  2018-09-03 09:35:26 UTC
oVirt gerrit  88167  ovirt-hosted-engine-setup-2.2  MERGED  ansible: network: detect interface on existing bridge  2018-02-26 11:10:35 UTC

Description Ido Rosenzwig 2018-02-19 09:14:31 UTC
Description of problem:
At the start of the deployment process (in the Ansible flow),
if an old ovirtmgmt bridge exists, it needs to be removed and recreated.

Suggestion: the logic should be added to ansible/get_network_interfaces.yml
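
A minimal sketch of what that logic could look like in ansible/get_network_interfaces.yml, relying on the kernel exposing a bridge's enslaved ports as symlinks under /sys/class/net/<bridge>/brif/. The task and variable names (bridge_port_links, valid_network_interfaces) are illustrative assumptions, not taken from the merged patch:

# Illustrative sketch only; variable names are assumptions, not the merged change.
- name: Detect interface on existing management bridge
  find:
    paths: /sys/class/net/ovirtmgmt/brif
    file_type: link
  register: bridge_port_links

- name: Propose only the NIC already enslaved to the bridge
  set_fact:
    valid_network_interfaces: "{{ bridge_port_links.files | map(attribute='path') | map('basename') | list }}"
  when: bridge_port_links.matched > 0

If the bridge does not exist, find matches nothing and the normal interface discovery can proceed unchanged.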


How reproducible:
100%

Steps to Reproduce:
1. Run hosted-engine --deploy on a machine where HE was previously deployed and then cleaned up (i.e. the ovirtmgmt bridge is still configured on it from the earlier deployment).
2. When reaching the step of choosing a NIC to connect to the bridge ("Please indicate a nic to set ovirtmgmt bridge on"), observe that the ovirtmgmt bridge is still present.


Actual results:
The bridge is not removed during the deployment.

Expected results:
The bridge should be removed at the beginning of the deployment

Comment 1 Simone Tiraboschi 2018-02-19 09:17:53 UTC
Still not sure whether it would be better to remove the management bridge or to simply detect the interface already in use.
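
For the "detect the interface in use" option, Ansible's gathered facts already describe bridge membership: for a bridge device, the per-interface fact carries type "bridge" and an "interfaces" list of the enslaved ports. A hedged sketch of that approach (the bridge_interface fact name is illustrative, not from the actual patch):

- name: Detecting interface on existing management bridge
  set_fact:
    bridge_interface: "{{ ansible_facts['ovirtmgmt']['interfaces'] | first }}"
  when:
    - "'ovirtmgmt' in ansible_facts"
    - "ansible_facts['ovirtmgmt']['type'] == 'bridge'"

The task sequence in the verified run below ("Gathering Facts", then "Detecting interface on existing management bridge") is consistent with this kind of fact-based detection.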

Comment 3 Nikolai Sednev 2018-03-06 15:14:03 UTC
Verification steps provided by Ido:
1. Run hosted-engine --deploy (with default answers) until it finishes successfully.
2. Run ovirt-hosted-engine-cleanup.
3. Verify that ovirtmgmt is still configured (it should be; example check commands follow this list).
4. Run hosted-engine --deploy again.
5. When prompted to choose a NIC, verify that the proposed NIC is the one the ovirtmgmt bridge is configured on.
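
Example commands (illustrative, not part of the official steps) for checking steps 3 and 5 from the host shell; enp5s0f0 is simply the NIC from this report's environment:

# Step 3: the ovirtmgmt bridge device should still exist after cleanup
ip -o link show ovirtmgmt
# Step 5: list the NIC(s) enslaved to the bridge, e.g. enp5s0f0
ls /sys/class/net/ovirtmgmt/brif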

Comment 7 Nikolai Sednev 2018-03-18 17:33:00 UTC
[root@alma03 ~]# ovirt-hosted-engine-cleanup
 This will de-configure the host to run ovirt-hosted-engine-setup from scratch. 
Caution, this operation should be used with care.

Are you sure you want to proceed? [y/n]
y
  -=== Destroy hosted-engine VM ===- 
  -=== Stop HA services ===- 
  -=== Shutdown sanlock ===- 
shutdown force 1 wait 0
shutdown done 0
  -=== Disconnecting the hosted-engine storage domain ===- 


Traceback (most recent call last):
  File "/usr/lib64/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_setup/disconnect_storage_server.py", line 27, in <module>
    ha_cli.disconnect_storage_server()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 294, in disconnect_storage_server
    sserver.disconnect_storage_server()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_server.py", line 325, in disconnect_storage_server
    connectionParams=conList,
  File "/usr/lib/python2.7/site-packages/vdsm/client.py", line 278, in _call
    raise TimeoutError(method, kwargs, timeout)
vdsm.client.TimeoutError: Request StoragePool.disconnectStorageServer with args {'connectionParams': [{'port': '3260', 'connection': '10.35.146.129', 'iqn': 'iqn.2008-05.com.xtremio:xio00153500071-514f0c50023f6c00', 'user': '', 'tpgt': '1', 'password': '', 'id': '9e177df8-91db-4b8b-81af-28d56d856dba'}], 'storagepoolID': '00000000-0000-0000-0000-000000000000', 'domainType': 3} timed out after 900 seconds
  -=== De-configure VDSM networks ===- 
  -=== Stop other services ===- 
  -=== De-configure external daemons ===- 
  -=== Removing configuration files ===- 
? /etc/init/libvirtd.conf already missing
- removing /etc/libvirt/nwfilter/vdsm-no-mac-spoofing.xml
- removing /etc/ovirt-hosted-engine/answers.conf
- removing /etc/ovirt-hosted-engine/hosted-engine.conf
- removing /etc/vdsm/vdsm.conf
- removing /etc/pki/vdsm/certs/cacert.pem
- removing /etc/pki/vdsm/certs/vdsmcert.pem
- removing /etc/pki/vdsm/keys/vdsmkey.pem
- removing /etc/pki/vdsm/libvirt-spice/ca-cert.pem
- removing /etc/pki/vdsm/libvirt-spice/server-cert.pem
- removing /etc/pki/vdsm/libvirt-spice/server-key.pem
- removing /etc/pki/CA/cacert.pem
- removing /etc/pki/libvirt/clientcert.pem
- removing /etc/pki/libvirt/private/clientkey.pem
? /etc/pki/ovirt-vmconsole/*.pem already missing
- removing /var/cache/libvirt/qemu
? /var/run/ovirt-hosted-engine-ha/* already missing
You have new mail in /var/spool/mail/root
[root@alma03 ~]# 

The cleanup stayed stuck like this for more than 20 minutes...

MainThread::INFO::2018-03-18 18:27:43,227::states::413::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(consume) Engine vm was unexpectedly shut down
MainThread::INFO::2018-03-18 18:27:45,336::hosted_engine::614::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_stop_domain_monitor) Stopped VDSM domain monitor
MainThread::INFO::2018-03-18 18:27:45,336::agent::89::ovirt_hosted_engine_ha.agent.agent.Agent::(run) Agent shutting down

[root@alma03 ~]# date
Sun Mar 18 18:43:03 IST 2018

Components on host:
ovirt-hosted-engine-ha-2.2.7-1.el7ev.noarch
ovirt-hosted-engine-setup-2.2.13-1.el7ev.noarch
rhvm-appliance-4.2-20180202.0.el7.noarch
Linux 3.10.0-861.el7.x86_64 #1 SMP Wed Mar 14 10:21:01 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.5 (Maipo)

[root@alma03 ~]# hosted-engine --check-deployed
[root@alma03 ~]# hosted-engine --vm-status
The hosted engine configuration has not been retrieved from shared storage. Please ensure that ovirt-ha-agent is running and the storage server is reachable.
[root@alma03 ~]# ifconfig
eno1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet6 fe80::242b:39d3:578c:d564  prefixlen 64  scopeid 0x20<link>
        ether e0:db:55:fc:cf:43  txqueuelen 1000  (Ethernet)
        RX packets 1003  bytes 361080 (352.6 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 2973  bytes 523824 (511.5 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device memory 0x91720000-9173ffff  

eno2: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        ether e0:db:55:fc:cf:44  txqueuelen 1000  (Ethernet)
        RX packets 1006  bytes 362160 (353.6 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 3  bytes 180 (180.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device memory 0x91700000-9171ffff  

enp5s0f0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        ether a0:36:9f:3a:c4:f0  txqueuelen 1000  (Ethernet)
        RX packets 16511891  bytes 17944160116 (16.7 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 40751706  bytes 57531917022 (53.5 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

enp5s0f1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        ether a0:36:9f:3a:c4:f2  txqueuelen 1000  (Ethernet)
        RX packets 207057  bytes 12752137 (12.1 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 968  bytes 70625 (68.9 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 144239  bytes 44688495 (42.6 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 144239  bytes 44688495 (42.6 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

ovirtmgmt: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.35.92.3  netmask 255.255.252.0  broadcast 10.35.95.255
        inet6 fe80::a236:9fff:fe3a:c4f0  prefixlen 64  scopeid 0x20<link>
        inet6 2620:52:0:235c:a236:9fff:fe3a:c4f0  prefixlen 64  scopeid 0x0<global>
        ether a0:36:9f:3a:c4:f0  txqueuelen 1000  (Ethernet)
        RX packets 6790584  bytes 15766897409 (14.6 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 3747898  bytes 55515343503 (51.7 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

virbr0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        inet 192.168.122.1  netmask 255.255.255.0  broadcast 192.168.122.255
        ether 52:54:00:5b:00:6a  txqueuelen 1000  (Ethernet)
        RX packets 4787  bytes 4064510 (3.8 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 4387  bytes 3587733 (3.4 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

After waiting for the command to finish for more than 20 minutes:
[root@alma03 ~]# hosted-engine --check-deployed
The hosted engine has not been deployed

ovirtmgmt: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.35.92.3  netmask 255.255.252.0  broadcast 10.35.95.255
        inet6 fe80::a236:9fff:fe3a:c4f0  prefixlen 64  scopeid 0x20<link>
        inet6 2620:52:0:235c:a236:9fff:fe3a:c4f0  prefixlen 64  scopeid 0x0<global>
        ether a0:36:9f:3a:c4:f0  txqueuelen 1000  (Ethernet)
        RX packets 6792778  bytes 15767031001 (14.6 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 3748174  bytes 55515464303 (51.7 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0


During re-deployment, the previously used NIC was detected and proposed:
          --== HOST NETWORK CONFIGURATION ==--
         
[ INFO  ] Bridge ovirtmgmt already created
          Please indicate a pingable gateway IP address [10.35.95.254]: 
[ INFO  ] TASK [Gathering Facts]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Detecting interface on existing management bridge]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Get all active network interfaces]
[ INFO  ] TASK [Filter bonds with bad naming]
[ INFO  ] TASK [Generate output list]
[ INFO  ] ok: [localhost]
          Please indicate a nic to set ovirtmgmt bridge on: (enp5s0f0) [enp5s0f0]: 
.
.
.
[ INFO  ] Hosted Engine successfully deployed

Moving to VERIFIED, since the required functionality is working and redeployment works fine.

I've opened a separate bug for the slow response and the exception: https://bugzilla.redhat.com/show_bug.cgi?id=1557793

Comment 9 Sandro Bonazzola 2018-03-29 11:08:42 UTC
This bug is included in the oVirt 4.2.2 release, published on March 28th 2018.

Since the problem described in this bug report should be
resolved in the oVirt 4.2.2 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

