Bug 1555654 - Ansible RHV 4.2.2-4 Installer fails on task [Wait for the management bridge to appear on the host]
Summary: Ansible RHV 4.2.2-4 Installer fails on task [Wait for the management bridge t...
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-hosted-engine-setup
Version: 4.2.2
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Simone Tiraboschi
QA Contact: Nikolai Sednev
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-03-14 19:26 UTC by Tom Gamull
Modified: 2022-04-16 09:18 UTC
CC: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-05-31 10:09:52 UTC
oVirt Team: Integration
Target Upstream Version:
Embargoed:


Attachments
ovirt install log (309.87 KB, text/plain), 2018-03-14 19:37 UTC, Tom Gamull
Engine Logs from ovirtmgmt bridge failure (96.71 KB, application/x-gzip), 2018-03-15 13:07 UTC, Tom Gamull
JournalCTL from Node installing hosted-engine (18.29 KB, application/x-gzip), 2018-03-15 13:10 UTC, Tom Gamull


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1549642 0 unspecified CLOSED Race condition between host up at engine eyes and SuperVdsm.ServerCallback::add_sourceroute on DHCP configured hosts 2021-02-22 00:41:40 UTC
Red Hat Issue Tracker RHV-45737 0 None None None 2022-04-16 09:18:35 UTC

Internal Links: 1549642

Description Tom Gamull 2018-03-14 19:26:38 UTC
Description of problem:
Installing RHV 4.2.2-4 on RHEL 7.5z fails at:
[ INFO  ] TASK [Wait for the management bridge to appear on the host]
The ovirtmgmt bridge link is never created or listed.

Version-Release number of selected component (if applicable):
RHV 4.2.2-4
RHEL 7.5z

How reproducible:
Every install

Steps to Reproduce:
1. Configure a RHEL host and run hosted-engine --deploy using a Gluster engine volume (see the invocation sketch below)
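
For reference, the deployment is started on the host with the interactive installer; the storage type and the Gluster volume path for the engine volume are supplied when prompted:

hosted-engine --deploy
# storage type and the Gluster engine volume path (e.g. host:/engine) are answered interactively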

Actual results:

Failed on [ INFO  ] TASK [Wait for the management bridge to appear on the host]

Expected results:
Pass and continue

Additional info:
Using the engineering/dev 7.5z compose and Gluster storage.
Followed this guide: http://ci-web.eng.lab.tlv.redhat.com/docs/4.2/Guide/install_guide/index.html

The command the task polls runs, but returns no output:
Mar 14 15:24:25 virt04.gamull.com python[10388]: ansible-command Invoked with warn=True executable=None _uses_shell=False _raw_params=ip link show ovirtmgmt removes=None creates=None chdir=None stdin=None
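
The same poll can be reproduced by hand; a minimal sketch of the loop the task effectively performs, assuming the default bridge name ovirtmgmt and a 10-second retry interval:

# Poll for the management bridge the way the installer task does
for i in $(seq 1 30); do
    ip link show ovirtmgmt && break
    echo "attempt $i: ovirtmgmt bridge not present yet"
    sleep 10
done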

installer log
2018-03-14 15:13:26,673-0400 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:94 vlan_id_out: {'stderr_lines': [], u'changed': True, u'end': u'2018-03-14 15:13:25.933657', u'stdout': u'', u'cmd': u"ip -d link show bond0 | grep vlan | grep -Po 'id \\K[\\d]+' | cat", 'failed': False, u'delta': u'0:00:00.010243', u'stderr': u'', u'rc': 0, 'stdout_lines': [], u'start': u'2018-03-14 15:13:25.923414'}
2018-03-14 15:13:26,774-0400 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:94 changed: False
2018-03-14 15:13:26,874-0400 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:100 TASK [Set engine pub key as authorized key without validating the TLS/SSL certificates]
2018-03-14 15:13:28,176-0400 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:100 changed: [localhost]
2018-03-14 15:13:28,478-0400 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:100 TASK [include_tasks]
2018-03-14 15:13:28,678-0400 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:100 ok: [localhost]
2018-03-14 15:13:28,879-0400 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:100 TASK [Obtain SSO token using username/password credentials]
2018-03-14 15:13:31,186-0400 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:100 ok: [localhost]
2018-03-14 15:13:31,487-0400 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:100 TASK [Enable GlusterFS at cluster level]
2018-03-14 15:13:31,688-0400 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:100 skipping: [localhost]
2018-03-14 15:13:31,889-0400 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:100 TASK [Set VLAN ID at datacenter level]
2018-03-14 15:13:32,190-0400 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:100 skipping: [localhost]
2018-03-14 15:13:32,391-0400 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:100 TASK [Force host-deploy in offline mode]
2018-03-14 15:13:33,593-0400 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:100 ok: [localhost]
2018-03-14 15:13:33,895-0400 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:100 TASK [Add host]
2018-03-14 15:13:35,598-0400 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:100 changed: [localhost]
2018-03-14 15:13:35,799-0400 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:100 TASK [Wait for the engine to start host install process]
2018-03-14 15:13:43,112-0400 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:100 ok: [localhost]
2018-03-14 15:13:43,414-0400 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:94 TASK [debug]
2018-03-14 15:13:43,715-0400 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:94 host_result: {'failed': False, 'attempts': 2, u'changed': False, u'ansible_facts': {u'ovirt_hosts': [{u'comment': u'', u'update_available': False, u'protocol': u'stomp', u'affinity_labels': [], u'hooks': [], u'max_scheduling_memory': 0, u'cluster': {u'href': u'/ovirt-engine/api/clusters/d33f86e0-27ba-11e8-8d9a-00163e3bda1c', u'id': u'd33f86e0-27ba-11e8-8d9a-00163e3bda1c'}, u'href': u'/ovirt-engine/api/hosts/7f3dd1ab-9bb6-4265-833d-7843f97cd749', u'spm': {u'priority': 5, u'status': u'none'}, u'port': 54321, u'external_status': u'ok', u'statistics': [], u'certificate': {u'organization': u'gamull.com', u'subject': u'O=gamull.com,CN=virt04.gamull.com'}, u'nics': [], u'storage_connection_extensions': [], u'id': u'7f3dd1ab-9bb6-4265-833d-7843f97cd749', u'hardware_information': {u'supported_rng_sources': []}, u'memory': 0, u'ksm': {u'enabled': False}, u'se_linux': {}, u'type': u'rhel', u'status': u'installing', u'tags': [], u'katello_errata': [], u'external_network_provider_configurations': [], u'ssh': {u'port': 22, u'fingerprint': u'SHA256:NiuhWH3WCP8RkUJ1Pa7Nhnl2mD+tTAz6hZwduZFJBlE'}, u'address': u'virt04.gamull.com', u'numa_nodes': [], u'device_passthrough': {u'enabled': False}, u'unmanaged_networks': [], u'permissions': [], u'numa_supported': False, u'power_management': {u'kdump_detection': True, u'enabled': False, u'pm_proxies': [], u'automatic_pm_enabled': True}, u'name': u'virt04.gamull.com', u'devices': [], u'summary': {u'total': 0}, u'auto_numa_status': u'unknown', u'transparent_huge_pages': {u'enabled': False}, u'network_attachments': [], u'os': {u'custom_kernel_cmdline': u''}, u'cpu': {u'speed': 0.0, u'topology': {}}, u'kdump_status': u'unknown'}]}}
2018-03-14 15:13:43,815-0400 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:94 changed: False
2018-03-14 15:13:43,916-0400 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:100 TASK [Wait for the management bridge to appear on the host]


[root@virt04 ~]# ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eno1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP mode DEFAULT group default qlen 1000
    link/ether 00:1d:09:6c:4c:18 brd ff:ff:ff:ff:ff:ff
3: eno2: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP mode DEFAULT group default qlen 1000
    link/ether 00:1d:09:6c:4c:1a brd ff:ff:ff:ff:ff:ff
19: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 00:1d:09:6c:4c:18 brd ff:ff:ff:ff:ff:ff
20: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether c6:c1:07:f5:fd:74 brd ff:ff:ff:ff:ff:ff
21: tun0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1360 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 100
    link/none 
22: virbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 52:54:00:a5:30:47 brd ff:ff:ff:ff:ff:ff
23: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast master virbr0 state DOWN mode DEFAULT group default qlen 1000
    link/ether 52:54:00:a5:30:47 brd ff:ff:ff:ff:ff:ff
24: vnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master virbr0 state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether fe:16:3e:3b:da:1c brd ff:ff:ff:ff:ff:ff

Comment 1 Tom Gamull 2018-03-14 19:37:10 UTC
Created attachment 1408133 [details]
ovirt install log

Comment 2 Sandro Bonazzola 2018-03-15 07:43:41 UTC
Can you please confirm you're testing the 4.2.2-4 internal compose (corresponding to 4.2.2 RC3 upstream)?

Comment 4 Simone Tiraboschi 2018-03-15 08:41:29 UTC
Hi Tom,
was your host configured with DHCP?
If so, I think it's just a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1549642, which was fixed with ovirt-hosted-engine-setup-2.2.13 and vdsm-4.20.21.
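
A quick way to check whether the host already carries those fixes:

rpm -q ovirt-hosted-engine-setup vdsm
# the fix mentioned above is expected in ovirt-hosted-engine-setup >= 2.2.13 and vdsm >= 4.20.21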

Comment 5 Tom Gamull 2018-03-15 11:12:52 UTC
I'll get versions from rpm shortly.

I have a bond that is static, and I'll also upload the ifcfg files soon.
I'm wondering if the bridge packages weren't installed. I could also try a single interface this morning if that helps.

Comment 6 Simone Tiraboschi 2018-03-15 11:20:04 UTC
Thanks,
are you able to connect to the engine VM via SSH (only from the host where you ran hosted-engine-setup, since the bootstrap VM runs on a NATed network) and fetch /var/log/ovirt-engine/engine.log and the host-deploy logs (/var/log/ovirt-engine/host-deploy/*) from there?
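
A sketch of collecting those logs in one step (the bootstrap VM address below is a placeholder; substitute the local address printed during deployment):

# <bootstrap-vm-addr> is hypothetical; use the bootstrap engine VM's local address
ssh root@<bootstrap-vm-addr> 'tar czf - /var/log/ovirt-engine/engine.log /var/log/ovirt-engine/host-deploy' > engine-logs.tar.gz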

Comment 7 Tom Gamull 2018-03-15 12:24:13 UTC
Interfaces:
[root@virt04 ~]# cat /etc/sysconfig/network-scripts/ifcfg-bond0
BONDING_OPTS="downdelay=0 miimon=100 mode=balance-tlb updelay=0"
TYPE=Bond
BONDING_MASTER=yes
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=none
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=no
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
IPV6_PRIVACY=no
IPV6_ADDR_GEN_MODE=stable-privacy
NAME=bond0
UUID=5549fcc8-16a3-468a-86bb-e1d0af15238f
DEVICE=bond0
ONBOOT=yes
IPADDR=10.1.10.84
PREFIX=24
GATEWAY=10.1.10.1
DNS1=10.1.10.1
DOMAIN=gamull.com
ZONE=public

[root@virt04 ~]# cat /etc/sysconfig/network-scripts/ifcfg-bond0-slave1
HWADDR=00:1D:09:6C:4C:18
TYPE=Ethernet
NAME=bond0-slave1
UUID=5d0bb9dd-1116-446a-8168-abd34e66f903
DEVICE=eno1
ONBOOT=yes
MASTER=bond0
SLAVE=yes
[root@virt04 ~]# cat /etc/sysconfig/network-scripts/ifcfg-bond0-slave2
HWADDR=00:1D:09:6C:4C:1A
TYPE=Ethernet
NAME=bond0-slave2
UUID=b332fdae-0c0a-4d68-aa31-ecad3b2f84bb
DEVICE=eno2
ONBOOT=yes
MASTER=bond0
SLAVE=yes
[root@virt04 ~]# cat /etc/sysconfig/network-scripts/ifcfg-eno1
TYPE=Ethernet
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=dhcp
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
IPV6_ADDR_GEN_MODE=stable-privacy
NAME=eno1
UUID=9aeb33b1-48b4-4542-83d0-87c4ba02ce99
DEVICE=eno1
ONBOOT=no
ZONE=public
[root@virt04 ~]# cat /etc/sysconfig/network-scripts/ifcfg-eno2
TYPE=Ethernet
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=dhcp
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
IPV6_ADDR_GEN_MODE=stable-privacy
NAME=eno2
UUID=6dd1adf6-fd16-4ce2-99ea-e5737d88a4de
DEVICE=eno2
ONBOOT=no
ZONE=public
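
To confirm what the bond and its slaves look like at runtime with this configuration, the standard RHEL 7 tools can be used, for example:

cat /proc/net/bonding/bond0    # bonding mode, MII status, active slaves
nmcli device status            # which connections NetworkManager has activated
ip -d link show bond0          # detailed link info; the installer queries this device for a VLAN ID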

Comment 8 Tom Gamull 2018-03-15 12:28:10 UTC
I'll try to update these components to the latest; they are a minor version behind.
[root@virt04 ~]# rpm -qa ovirt-hosted-engine-setup
ovirt-hosted-engine-setup-2.2.12-1.el7ev.noarch
[root@virt04 ~]# rpm -qa vdsm
vdsm-4.20.20-1.el7ev.x86_64

Comment 9 Tom Gamull 2018-03-15 13:07:09 UTC
Created attachment 1408441 [details]
Engine Logs from ovirtmgmt bridge failure

Hosted Engine deployment fails during host addition, while creating the management bridge.

Comment 10 Tom Gamull 2018-03-15 13:10:05 UTC
Created attachment 1408442 [details]
JournalCTL from Node installing hosted-engine

journalctl from virt04.gamull.com (host installing hosted-engine)

Comment 11 Yaniv Kaul 2018-03-15 13:18:13 UTC
Unspecified severity (?!), deferring from 4.2.2

Comment 14 Tom Gamull 2018-03-15 15:51:10 UTC
Using the new version leads to a failure earlier in the process, so I'm not sure why. I may try the single NIC next; I'm thinking the bond config is what's missing.

[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Enable GlusterFS at cluster level]
[ INFO  ] skipping: [localhost]
[ INFO  ] TASK [Set VLAN ID at datacenter level]
[ INFO  ] skipping: [localhost]
[ INFO  ] TASK [Force host-deploy in offline mode]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Add host]
[ INFO  ] changed: [localhost]
[ INFO  ] TASK [Wait for the host to be up]
[ ERROR ] fatal: [localhost]: FAILED! => {"ansible_facts": {"ovirt_hosts": [{"address": "virt04.gamull.com", "affinity_labels": [], "auto_numa_status": "unknown", "certificate": {"organization": "gamull.com", "subject": "O=gamull.com,CN=virt04.gamull.com"}, "cluster": {"href": "/ovirt-engine/api/clusters/155317e0-2865-11e8-8359-00163e48f909", "id": "155317e0-2865-11e8-8359-00163e48f909"}, "comment": "", "cpu": {"speed": 0.0, "topology": {}}, "device_passthrough": {"enabled": false}, "devices": [], "external_network_provider_configurations": [], "external_status": "ok", "hardware_information": {"supported_rng_sources": []}, "hooks": [], "href": "/ovirt-engine/api/hosts/bffd6dea-9451-4b43-9014-0f8af6d5ee05", "id": "bffd6dea-9451-4b43-9014-0f8af6d5ee05", "katello_errata": [], "kdump_status": "unknown", "ksm": {"enabled": false}, "max_scheduling_memory": 0, "memory": 0, "name": "virt04.gamull.com", "network_attachments": [], "nics": [], "numa_nodes": [], "numa_supported": false, "os": {"custom_kernel_cmdline": ""}, "permissions": [], "port": 54321, "power_management": {"automatic_pm_enabled": true, "enabled": false, "kdump_detection": true, "pm_proxies": []}, "protocol": "stomp", "se_linux": {}, "spm": {"priority": 5, "status": "none"}, "ssh": {"fingerprint": "SHA256:NiuhWH3WCP8RkUJ1Pa7Nhnl2mD+tTAz6hZwduZFJBlE", "port": 22}, "statistics": [], "status": "install_failed", "storage_connection_extensions": [], "summary": {"total": 0}, "tags": [], "transparent_huge_pages": {"enabled": false}, "type": "rhel", "unmanaged_networks": [], "update_available": false}]}, "attempts": 120, "changed": false}
[ INFO  ] TASK [include_tasks]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Remove local vm dir]
[ INFO  ] changed: [localhost]
[ INFO  ] TASK [Notify the user about a failure]
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "The system may not be provisioned according to the playbook results: please check the logs for the issue, fix accordingly or re-deploy from scratch.\n"}
[ ERROR ] Failed to execute stage 'Closing up': Failed executing ansible-playbook
[ INFO  ] Stage: Clean up
[ INFO  ] Cleaning temporary resources
[ INFO  ] TASK [Gathering Facts]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [include_tasks]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Remove local vm dir]
[ INFO  ] ok: [localhost]
[ INFO  ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20180315114415.conf'
[ INFO  ] Stage: Pre-termination
[ INFO  ] Stage: Termination
[ ERROR ] Hosted Engine deployment failed: please check the logs for the issue, fix accordingly or re-deploy from scratch.
          Log file is located at /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20180315111027-zimygj.log


From engine.log
2018-03-15 11:33:08,970-04 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (VdsDeploy) [6581e695] EVENT_ID: VDS_INSTALL_IN_PROGRESS_ERROR(511), An error has occurred during installation of Host virt04.gamull.com: Failed to execute stage 'Closing up': Failed to start service 'ovirt-imageio-daemon'.
2018-03-15 11:33:08,981-04 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (VdsDeploy) [6581e695] EVENT_ID: VDS_INSTALL_IN_PROGRESS(509), Installing Host virt04.gamull.com. Stage: Clean up.
2018-03-15 11:33:08,991-04 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (VdsDeploy) [6581e695] EVENT_ID: VDS_INSTALL_IN_PROGRESS(509), Installing Host virt04.gamull.com. Stage: Pre-termination.
2018-03-15 11:33:09,047-04 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (VdsDeploy) [6581e695] EVENT_ID: VDS_INSTALL_IN_PROGRESS(509), Installing Host virt04.gamull.com. Retrieving installation logs to: '/var/log/ovirt-engine/host-deploy/ovirt-host-deploy-20180315113309-virt04.gamull.com-6581e695.log'.
2018-03-15 11:33:10,083-04 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (VdsDeploy) [6581e695] EVENT_ID: VDS_INSTALL_IN_PROGRESS(509), Installing Host virt04.gamull.com. Stage: Termination.
2018-03-15 11:33:10,273-04 ERROR [org.ovirt.engine.core.uutils.ssh.SSHDialog] (EE-ManagedThreadFactory-engine-Thread-1) [6581e695] SSH error running command root.com:'umask 0077; MYTMP="$(TMPDIR="${OVIRT_TMPDIR}" mktemp -d -t ovirt-XXXXXXXXXX)"; trap "chmod -R u+rwX \"${MYTMP}\" > /dev/null 2>&1; rm -fr \"${MYTMP}\" > /dev/null 2>&1" 0; tar --warning=no-timestamp -C "${MYTMP}" -x &&  "${MYTMP}"/ovirt-host-deploy DIALOG/dialect=str:machine DIALOG/customization=bool:True': IOException: Command returned failure code 1 during SSH session 'root.com'
2018-03-15 11:33:10,274-04 ERROR [org.ovirt.engine.core.bll.hostdeploy.VdsDeployBase] (EE-ManagedThreadFactory-engine-Thread-1) [6581e695] Error during host virt04.gamull.com install
2018-03-15 11:33:10,276-04 ERROR [org.ovirt.engine.core.bll.hostdeploy.InstallVdsInternalCommand] (EE-ManagedThreadFactory-engine-Thread-1) [6581e695] Host installation failed for host 'bffd6dea-9451-4b43-9014-0f8af6d5ee05', 'virt04.gamull.com': Command returned failure code 1 during SSH session 'root.com'
2018-03-15 11:33:10,297-04 INFO  [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (EE-ManagedThreadFactory-engine-Thread-1) [6581e695] START, SetVdsStatusVDSCommand(HostName = virt04.gamull.com, SetVdsStatusVDSCommandParameters:{hostId='bffd6dea-9451-4b43-9014-0f8af6d5ee05', status='InstallFailed', nonOperationalReason='NONE', stopSpmFailureLogged='false', maintenanceReason='null'}), log id: 390bd6d5
2018-03-15 11:33:10,307-04 INFO  [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (EE-ManagedThreadFactory-engine-Thread-1) [6581e695] FINISH, SetVdsStatusVDSCommand, log id: 390bd6d5
2018-03-15 11:33:10,318-04 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-1) [6581e695] EVENT_ID: VDS_INSTALL_FAILED(505), Host virt04.gamull.com installation failed. Command returned failure code 1 during SSH session 'root.com'.
2018-03-15 11:33:10,335-04 INFO  [org.ovirt.engine.core.bll.hostdeploy.InstallVdsInternalCommand] (EE-ManagedThreadFactory-engine-Thread-1) [6581e695] Lock freed to object 'EngineLock:{exclusiveLocks='[bffd6dea-9451-4b43-9014-0f8af6d5ee05=VDS]', sharedLocks=''}'
2018-03-15 11:37:00,120-04 INFO  [org.ovirt.engine.core.bll.provider.network.SyncNetworkProviderCommand] (EE-ManagedThreadFactory-engineScheduled-Thread-43) [44fae39d] Lock Acquired to object 'EngineLock:{exclusiveLocks='[44952e08-3bc4-4b59-ab5e-768418530529=PROVIDER]', sharedLocks=''}'

deploy log
2018-03-15 11:33:08,960-0400 ERROR otopi.context context._executeMethod:152 Failed to execute stage 'Closing up': Failed to start service 'ovirt-imageio-daemon'
2018-03-15 11:33:08,960-0400 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND       **%EventEnd STAGE closeup METHOD otopi.plugins.ovirt_host_deploy.vdsm.packages.Plugin._start (odeploycons.packages.vdsm.started)
2018-03-15 11:33:08,961-0400 DEBUG otopi.context context.dumpEnvironment:859 ENVIRONMENT DUMP - BEGIN
2018-03-15 11:33:08,961-0400 DEBUG otopi.context context.dumpEnvironment:869 ENV BASE/error=bool:'True'
2018-03-15 11:33:08,961-0400 DEBUG otopi.context context.dumpEnvironment:869 ENV BASE/exceptionInfo=list:'[(<type 'exceptions.RuntimeError'>, RuntimeError("Failed to start service 'ovirt-imageio-daemon'",), <traceback object at 0x7fd4b9ee5dd0>)]'
2018-03-15 11:33:08,963-0400 DEBUG otopi.context context.dumpEnvironment:873 ENVIRONMENT DUMP - END
2018-03-15 11:33:08,963-0400 INFO otopi.context context.runSequence:741 Stage: Clean up
2018-03-15 11:33:08,964-0400 DEBUG otopi.context context.runSequence:745 STAGE cleanup
2018-03-15 11:33:08,965-0400 DEBUG otopi.context context._executeMethod:128 Stage cleanup METHOD otopi.plugins.otopi.dialog.answer_file.Plugin._generate_answer_file
2018-03-15 11:33:08,965-0400 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND       **%EventStart STAGE cleanup METHOD otopi.plugins.otopi.dialog.answer_file.Plugin._generate_answer_file (otopi.core.answer.file.generated)
2018-03-15 11:33:08,966-0400 DEBUG otopi.plugins.otopi.dialog.machine dialog.__logString:204 DIALOG:SEND       **%EventEnd STAGE cleanup METHOD otopi.plugins.otopi.dialog.answer_file.Plugin._generate_answer_file (otopi.core.answer.file.generated)
2018-03-15 11:33:08,967-0400 DEBUG otopi.context context.dumpEnvironment:859 ENVIRONMENT DUMP - BEGIN
2018-03-15 11:33:08,967-0400 DEBUG otopi.context context.dumpEnvironment:869 ENV DIALOG/answerFileContent=str:'# OTOPI answer file, generated by human dialog
[environment:default]
QUESTION/21/CUSTOMIZATION_COMMAND=str:noop
QUESTION/28/CUSTOMIZATION_COMMAND=str:env-query-multi -k NETWORK/iptablesRules
QUESTION/13/CUSTOMIZATION_COMMAND=str:env-query -k VDSM_CONFIG/addresses/management_port
QUESTION/26/CUSTOMIZATION_COMMAND=str:env-query -k KERNEL/cmdlineOld
QUESTION/6/CUSTOMIZATION_COMMAND=str:env-query -k NETWORK/sshKey
QUESTION/17/CUSTOMIZATION_COMMAND=str:env-query -k VDSM/checkVirtHardware
QUESTION/22/CUSTOMIZATION_COMMAND=str:noop
QUESTION/3/CUSTOMIZATION_COMMAND=str:env-query -k SYSTEM/clockSet
QUESTION/14/CUSTOMIZATION_COMMAND=str:env-query -k VDSM/engineHost
QUESTION/16/CUSTOMIZATION_COMMAND=str:env-query -k VDSM/vdsmMinimumVersion
QUESTION/12/CUSTOMIZATION_COMMAND=str:env-query -k VDSM_CONFIG/vars/ssl
QUESTION/5/CUSTOMIZATION_COMMAND=str:env-query -k NETWORK/sshUser
QUESTION/32/CUSTOMIZATION_COMMAND=str:env-query -k VMCONSOLE/caKey
QUESTION/11/CUSTOMIZATION_COMMAND=str:env-get -k VDSM/vdsmId
QUESTION/4/CUSTOMIZATION_COMMAND=str:env-query -k NETWORK/sshEnable
QUESTION/10/CUSTOMIZATION_COMMAND=str:env-get -k VDSM/ovirt-node
QUESTION/23/CUSTOMIZATION_COMMAND=str:noop
QUESTION/33/CUSTOMIZATION_COMMAND=str:install
QUESTION/18/CUSTOMIZATION_COMMAND=str:env-query -k VIRT/enable
QUESTION/27/CUSTOMIZATION_COMMAND=str:env-query -k NETWORK/iptablesEnable
QUESTION/25/CUSTOMIZATION_COMMAND=str:env-query -k KERNEL/cmdlineNew
QUESTION/15/CUSTOMIZATION_COMMAND=str:env-query -k VDSM/enginePort
QUESTION/8/CUSTOMIZATION_COMMAND=str:env-query -k GLUSTER/enable
QUESTION/31/CUSTOMIZATION_COMMAND=str:env-query -k VMCONSOLE/certificateEnrollment
QUESTION/2/CUSTOMIZATION_COMMAND=str:env-query -k OVIRT_ENGINE/correlationId
QUESTION/19/CUSTOMIZATION_COMMAND=str:env-query -k VDSM/certificateEnrollment
QUESTION/24/CUSTOMIZATION_COMMAND=str:noop
QUESTION/30/CUSTOMIZATION_COMMAND=str:env-query -k VMCONSOLE/enable
QUESTION/7/CUSTOMIZATION_COMMAND=str:noop
QUESTION/9/CUSTOMIZATION_COMMAND=str:env-get -k VDSM/ovirt-legacy-node
QUESTION/1/CUSTOMIZATION_COMMAND=str:env-get -k CORE/logFileName
QUESTION/20/CUSTOMIZATION_COMMAND=str:env-get -k KDUMP/supported
QUESTION/29/CUSTOMIZATION_COMMAND=str:env-get -k VMCONSOLE/support
'
2018-03-15 11:33:08,968-0400 DEBUG otopi.context context.dumpEnvironment:873 ENVIRONMENT DUMP - END
2018-03-15 11:33:08,968-0400 INFO otopi.context context.runSequence:741 Stage: Pre-termination
2018-03-15 11:33:08,969-0400 DEBUG otopi.context context.runSequence:745 STAGE pre-terminate

Comment 15 Simone Tiraboschi 2018-03-15 16:04:24 UTC
It failed here:

2018-03-15 11:33:08,960-0400 ERROR otopi.context context._executeMethod:152 Failed to execute stage 'Closing up': Failed to start service 'ovirt-imageio-daemon'

Can you please check journalctl entries for ovirt-imageio-daemon on the host?
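
For example, a minimal way to gather those entries on the host:

journalctl -u ovirt-imageio-daemon -b --no-pager
systemctl status ovirt-imageio-daemon -l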

Was that a clean deploy attempt or an upgrade from a previous beta?

Comment 16 Tom Gamull 2018-03-15 16:36:35 UTC
I'm using the following to clean up, but I'll do this again and restart the install. Let me reset this and try again.

/usr/sbin/ovirt-hosted-engine-cleanup
systemctl stop ovirt-ha-agent; systemctl stop ovirt-ha-broker; systemctl stop vdsmd

echo "stopping services"
service vdsmd stop 2>/dev/null
service supervdsmd stop 2>/dev/null
initctl stop libvirtd 2>/dev/null
echo "removing packages"
yum -y remove \*ovirt\* \*vdsm\* \*libvirt\* collectd \*cockpit\*
#yum -y remove \*cockpit\*
rm -fR /etc/*ovirt* /etc/*vdsm* /etc/*libvirt* /etc/pki/vdsm

# May need to clear .vdsm and ovirt entries under:
#   /var/lib
#   /root/.vdsm (?)
#   /etc
#   /tmp
#   /var/tmp/local and ansible entries

rm /etc/ovirt-hosted-engine/hosted-engine.conf
rm -f /etc/ovirt-hosted-engine/answers.conf
rm /etc/vdsm/vdsm.conf
rm /etc/pki/vdsm/*/*.pem
#rm /etc/pki/CA/cacert.pem
rm /etc/pki/libvirt/*.pem
rm /etc/pki/libvirt/private/*.pem

vi ~/.ssh/known_hosts
#remove old entries for HE

#MUST REBOOT
systemctl reboot

Comment 17 Tom Gamull 2018-03-15 19:16:54 UTC
I had used a fresh system, but I'm not reinstalling the OS on every attempt. Right now it looks like vdsm isn't getting fully configured. For example:

[root@virt04 ~]# vdsm-tool configure --force

Checking configuration status...

abrt is not configured for vdsm
lvm is configured for vdsm
libvirt is not configured for vdsm yet
FAILED: conflicting vdsm and libvirt-qemu tls configuration.
vdsm.conf with ssl=True requires the following changes:
libvirtd.conf: listen_tcp=0, auth_tcp="sasl", listen_tls=1
qemu.conf: spice_tls=1.
Manual override for multipath.conf detected - preserving current configuration
This manual override for multipath.conf was based on downrevved template. You are strongly advised to contact your support representatives
schema should be configured
Running configure...
Reconfiguration of abrt is done.
Reconfiguration of passwd is done.
Reconfiguration of sebool is done.
Reconfiguration of certificates is done.
Reconfiguration of libvirt is done.
Traceback (most recent call last):
  File "/usr/bin/vdsm-tool", line 219, in main
    return tool_command[cmd]["command"](*args)
  File "/usr/lib/python2.7/site-packages/vdsm/tool/__init__.py", line 38, in wrapper
    func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/tool/configurator.py", line 141, in configure
    _configure(c)
  File "/usr/lib/python2.7/site-packages/vdsm/tool/configurator.py", line 88, in _configure
    getattr(module, 'configure', lambda: None)()
  File "/usr/lib/python2.7/site-packages/vdsm/tool/configurators/bond_defaults.py", line 37, in configure
    sysfs_options_mapper.dump_bonding_options()
  File "/usr/lib/python2.7/site-packages/vdsm/network/link/bond/sysfs_options_mapper.py", line 46, in dump_bonding_options
    with open(sysfs_options.BONDING_DEFAULTS, 'w') as f:
IOError: [Errno 2] No such file or directory: '/var/run/vdsm/bonding-defaults.json'
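
One possible explanation (an assumption, not confirmed in this report) is that the manual cleanup removed vdsm's runtime directory, which this code path expects to exist. A sketch of checking and recreating it before retrying:

ls -ld /var/run/vdsm        # assumption: removed by the manual cleanup steps above
mkdir -p /var/run/vdsm      # recreate if missing; ownership should match a healthy host (assumed vdsm:kvm)
vdsm-tool configure --force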

Comment 18 Simone Tiraboschi 2018-03-15 20:34:13 UTC
(In reply to Tom Gamull from comment #17)
> I had used a fresh system but I'm not reinstalling the OS on every attempt. 
> Right now it looks like vdsm isn't getting fully configured.  For example
> 
> [root@virt04 ~]# vdsm-tool configure --force
...
> "/usr/lib/python2.7/site-packages/vdsm/network/link/bond/
> sysfs_options_mapper.py", line 46, in dump_bonding_options
>     with open(sysfs_options.BONDING_DEFAULTS, 'w') as f:
> IOError: [Errno 2] No such file or directory:
> '/var/run/vdsm/bonding-defaults.json'

I think it's worth tracking this in a separate, specific bug.

Comment 19 Tom Gamull 2018-03-16 11:25:42 UTC
Having to rebuild the physical machines is a slow process. I am fine with opening another ticket for that bug, but I'd like to go back to this one. I'd like advice on what next steps you'd like me to try based on where I am right now.

Current setup: 3 nodes installed with RHEL 7.5 beta ISO
* no bond this time, I used eno1 as the main interface and eno2 is not up (I can use this as dedicated storage/migration later but right now I'd like to isolate to eno1)
* vdo is installed but I haven't configured the storage yet.  

OPTIONS:
Option 1: 
* Since I had VDO issues before, I was considering not using VDO: take /dev/sda (1TB SATA) as a non-VDO Gluster replica-3 volume for the engine, leaving /dev/fioa (300GB Fusion-io device) and /dev/sdb (1TB SATA) as the RHHI-style LVM with cache (the Fusion-io doing the LVM cache) with VDO.
* With the above I could try RHV 4.2 beta and RHEL 7.5 beta from the consumer site and try to replicate the original issue in this BZ, or maybe it succeeds.
* If I get it working, I can try to upgrade.

--OR--

Option 2: 
* I use the 4.2.2 internal repos and the RHEL 7.5z kernel, then install VDO and the Gluster replica-3 volumes (so: upgrade packages first, then install Gluster, then RHV).
* My setup for this involves having to pull the latest RHV through a VPN. This doesn't seem to affect the install, but it could be causing issues.

For either option, is there a recommended sequence of install steps? Right now I'm using a process I had from Grafton (RHHI) for RHEL-based hosts (currently RHHI only uses RHV-H). This process essentially installs Gluster with replica-3 volumes using lvmcache. So far none of my issues seem to be with storage, but rather with networking.

Comment 20 Simone Tiraboschi 2018-05-31 10:09:52 UTC
That piece of code has been refactored by several patches, and this no longer seems reproducible today.
Feel free to reopen if you hit this again.

Comment 21 Franta Kust 2019-05-16 13:08:22 UTC
BZ<2>Jira Resync

