+++ This bug was initially created as a clone of Bug #1689838 +++

Description of problem:
======================
While upgrading the RHHI cluster using the ansible roles, the host running the HE gets stuck in the "Preparing for Maintenance" state and the upgrade fails. This has been seen twice.

Version-Release number of selected component
============================================
ovirt-ansible-infra-1.1.12-1.el7ev.noarch
ovirt-ansible-cluster-upgrade-1.1.12-1.el7ev.noarch
ansible-2.7.9-1.el7ae.noarch
ovirt-ansible-shutdown-env-1.0.3-1.el7ev.noarch
ovirt-ansible-roles-1.1.6-1.el7ev.noarch
ovirt-engine-4.3.2.1-0.1.el7.noarch

How reproducible:
================
Twice

Steps to Reproduce:
==================
1. Try upgrading the RHV hosts from 4.2 to 4.3 using the ansible playbook
2. Create an upgrade yaml with the required details (a sketch of such a playbook is included after the logs below)
3. Check the upgrade status on the host running the HE

Actual results:
==============
The upgrade fails.

Expected results:
================
The HE should be migrated and the host should move to Maintenance.

Additional info:
===============
Once the HE is migrated manually, the ansible upgrade works fine. However, after that the issue described in https://bugzilla.redhat.com/show_bug.cgi?id=1685951#c8 is still seen.

--- Additional comment from bipin on 2019-03-18 09:34:49 UTC ---

Ansible log:
===========
TASK [ovirt.cluster-upgrade : Upgrade host] ***********************************************************************************************************************************************************************
task path: /usr/share/ansible/roles/ovirt.cluster-upgrade/tasks/upgrade.yml:1
<127.0.0.1> ESTABLISH LOCAL CONNECTION FOR USER: root
<127.0.0.1> EXEC /bin/sh -c 'echo ~root && sleep 0'
<127.0.0.1> EXEC /bin/sh -c '( umask 77 && mkdir -p "` echo /root/.ansible/tmp/ansible-tmp-1552894166.44-226733322406964 `" && echo ansible-tmp-1552894166.44-226733322406964="` echo /root/.ansible/tmp/ansible-tmp-1552894166.44-226733322406964 `" ) && sleep 0'
Using module file /usr/share/ansible/roles/ovirt.cluster-upgrade/library/ovirt_host_28.py
<127.0.0.1> PUT /root/.ansible/tmp/ansible-local-8098X14nFa/tmp3E7lR2 TO /root/.ansible/tmp/ansible-tmp-1552894166.44-226733322406964/AnsiballZ_ovirt_host_28.py
<127.0.0.1> EXEC /bin/sh -c 'chmod u+x /root/.ansible/tmp/ansible-tmp-1552894166.44-226733322406964/ /root/.ansible/tmp/ansible-tmp-1552894166.44-226733322406964/AnsiballZ_ovirt_host_28.py && sleep 0'
<127.0.0.1> EXEC /bin/sh -c '/usr/bin/python2 /root/.ansible/tmp/ansible-tmp-1552894166.44-226733322406964/AnsiballZ_ovirt_host_28.py && sleep 0'
<127.0.0.1> EXEC /bin/sh -c 'rm -f -r /root/.ansible/tmp/ansible-tmp-1552894166.44-226733322406964/ > /dev/null 2>&1 && sleep 0'
The full traceback is:
Traceback (most recent call last):
  File "/tmp/ansible_ovirt_host_28_payload__UxiYv/__main__.py", line 531, in main
    reboot=module.params['reboot_after_upgrade'],
  File "/tmp/ansible_ovirt_host_28_payload__UxiYv/ansible_ovirt_host_28_payload.zip/ansible/module_utils/ovirt.py", line 749, in action
    poll_interval=self._module.params['poll_interval'],
  File "/tmp/ansible_ovirt_host_28_payload__UxiYv/ansible_ovirt_host_28_payload.zip/ansible/module_utils/ovirt.py", line 341, in wait
    raise Exception("Timeout exceed while waiting on result state of the entity.")
Exception: Timeout exceed while waiting on result state of the entity.
Engine log:
==========
2019-03-18 12:59:27,116+05 INFO [org.ovirt.engine.core.bll.hostdeploy.UpgradeHostCommand] (default task-11) [1c051922-5b29-4bd9-abb6-0e83bb06ec8b] Running command: UpgradeHostCommand internal: false. Entities affected : ID: 6fcdc52a-5ad6-41d2-b70a-35d9b2a721dd Type: VDSAction group EDIT_HOST_CONFIGURATION with role type ADMIN
2019-03-18 12:59:27,265+05 INFO [org.ovirt.engine.core.bll.MaintenanceNumberOfVdssCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-1) [1c051922-5b29-4bd9-abb6-0e83bb06ec8b] Lock Acquired to object 'EngineLock:{exclusiveLocks='', sharedLocks='[7e686ac4-4933-11e9-ac3a-004755204901=POOL]'}'
2019-03-18 12:59:27,352+05 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-11) [] EVENT_ID: HOST_UPGRADE_STARTED(840), Host rhsqa-grafton7-nic2.lab.eng.blr.redhat.com upgrade was started (User: admin@internal-authz).
2019-03-18 12:59:27,397+05 INFO [org.ovirt.engine.core.bll.MaintenanceNumberOfVdssCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-1) [1c051922-5b29-4bd9-abb6-0e83bb06ec8b] Running command: MaintenanceNumberOfVdssCommand internal: true. Entities affected : ID: 6fcdc52a-5ad6-41d2-b70a-35d9b2a721dd Type: VDSAction group MANIPULATE_HOST with role type ADMIN
2019-03-18 12:59:27,402+05 INFO [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-1) [1c051922-5b29-4bd9-abb6-0e83bb06ec8b] START, SetVdsStatusVDSCommand(HostName = rhsqa-grafton7-nic2.lab.eng.blr.redhat.com, SetVdsStatusVDSCommandParameters:{hostId='6fcdc52a-5ad6-41d2-b70a-35d9b2a721dd', status='PreparingForMaintenance', nonOperationalReason='NONE', stopSpmFailureLogged='true', maintenanceReason=''}), log id: 1609dee2
2019-03-18 12:59:27,402+05 INFO [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-1) [1c051922-5b29-4bd9-abb6-0e83bb06ec8b] VDS 'rhsqa-grafton7-nic2.lab.eng.blr.redhat.com' is spm and moved from up calling resetIrs.
2019-03-18 12:59:27,404+05 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.ResetIrsVDSCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-1) [1c051922-5b29-4bd9-abb6-0e83bb06ec8b] START, ResetIrsVDSCommand( ResetIrsVDSCommandParameters:{storagePoolId='7e686ac4-4933-11e9-ac3a-004755204901', ignoreFailoverLimit='false', vdsId='6fcdc52a-5ad6-41d2-b70a-35d9b2a721dd', ignoreStopFailed='false'}), log id: 1b071a98
2019-03-18 12:59:27,409+05 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-1) [1c051922-5b29-4bd9-abb6-0e83bb06ec8b] START, SpmStopVDSCommand(HostName = rhsqa-grafton7-nic2.lab.eng.blr.redhat.com, SpmStopVDSCommandParameters:{hostId='6fcdc52a-5ad6-41d2-b70a-35d9b2a721dd', storagePoolId='7e686ac4-4933-11e9-ac3a-004755204901'}), log id: 4d4a51b5
2019-03-18 12:59:27,415+05 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-1) [1c051922-5b29-4bd9-abb6-0e83bb06ec8b] SpmStopVDSCommand::Stopping SPM on vds 'rhsqa-grafton7-nic2.lab.eng.blr.redhat.com', pool id '7e686ac4-4933-11e9-ac3a-004755204901'
2019-03-18 12:59:27,423+05 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-1) [1c051922-5b29-4bd9-abb6-0e83bb06ec8b] FINISH, SpmStopVDSCommand, return: , log id: 4d4a51b5
2019-03-18 12:59:27,428+05 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.ResetIrsVDSCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-1) [1c051922-5b29-4bd9-abb6-0e83bb06ec8b] FINISH, ResetIrsVDSCommand, return: , log id: 1b071a98
2019-03-18 12:59:27,438+05 INFO [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-1) [1c051922-5b29-4bd9-abb6-0e83bb06ec8b] FINISH, SetVdsStatusVDSCommand, return: , log id: 1609dee2
2019-03-18 12:59:27,441+05 INFO [org.ovirt.engine.core.bll.MaintenanceNumberOfVdssCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-1) [1c051922-5b29-4bd9-abb6-0e83bb06ec8b] Lock freed to object 'EngineLock:{exclusiveLocks='', sharedLocks='[7e686ac4-4933-11e9-ac3a-004755204901=POOL]'}'
2019-03-18 12:59:27,524+05 INFO [org.ovirt.engine.core.bll.MaintenanceVdsCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-1) [1c051922-5b29-4bd9-abb6-0e83bb06ec8b] Running command: MaintenanceVdsCommand internal: true. Entities affected : ID: 6fcdc52a-5ad6-41d2-b70a-35d9b2a721dd Type: VDS
2019-03-18 12:59:27,563+05 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SetHaMaintenanceModeVDSCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-1) [1c051922-5b29-4bd9-abb6-0e83bb06ec8b] START, SetHaMaintenanceModeVDSCommand(HostName = rhsqa-grafton7-nic2.lab.eng.blr.redhat.com, SetHaMaintenanceModeVDSCommandParameters:{hostId='6fcdc52a-5ad6-41d2-b70a-35d9b2a721dd'}), log id: 40632e1f
2019-03-18 12:59:27,566+05 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.SetHaMaintenanceModeVDSCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-1) [1c051922-5b29-4bd9-abb6-0e83bb06ec8b] FINISH, SetHaMaintenanceModeVDSCommand, return: , log id: 40632e1f
2019-03-18 12:59:27,834+05 INFO [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-commandCoordinator-Thread-1) [1c051922-5b29-4bd9-abb6-0e83bb06ec8b] EVENT_ID: VDS_MAINTENANCE(15), Host rhsqa-grafton7-nic2.lab.eng.blr.redhat.com was switched to Maintenance Mode.
2019-03-18 12:59:28,057+05 INFO [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-45) [1c051922-5b29-4bd9-abb6-0e83bb06ec8b] Command 'MaintenanceNumberOfVdss' (id: 'a76d7e9f-1d92-4600-87e8-7966db533e02') waiting on child command id: '9744c2b4-7b7d-4be4-addb-e65ff6f4d645' type:'MaintenanceVds' to complete

Messages:
========
Mar 18 12:59:26 hostedenginesm3 python2: ansible-ovirt_host_28 Invoked with comment=None activate=True force=False power_management_enabled=None cluster=None fetch_nested=False hosted_engine=None id=None check_upgrade=True kdump_integration=None iscsi=None state=upgraded reboot_after_upgrade=True auth={'timeout': 0, 'url': 'https://hostedenginesm3.lab.eng.blr.redhat.com/ovirt-engine/api', 'insecure': True, 'kerberos': False, 'compress': True, 'headers': None, 'token': 'dC9d1hXBkFgfz-bgBkEI3rt4eF58m4qmyeD5pMBQIfYQ76zV9YPi-C8YyGbKYGgpke9N-NCMertX6M8be_pv3A', 'ca_file': '/etc/pki/ovirt-engine/ca.pem'} nested_attributes=[] address=None override_iptables=None password=NOT_LOGGING_PARAMETER wait=True public_key=False name=rhsqa-grafton7-nic2.lab.eng.blr.redhat.com spm_priority=None poll_interval=3 kernel_params=None timeout=3600 override_display=None
Mar 18 13:01:01 hostedenginesm3 systemd: Started Session 7 of user root.
Mar 18 13:59:32 hostedenginesm3 python2: ansible-ovirt_event_28 Invoked with origin=cluster_upgrade custom_id=320069251 storage_domain=None description=Upgrade of cluster Default failed. state=present severity=error user=None poll_interval=3 vm=None auth={'timeout': 0, 'url': 'https://hostedenginesm3.lab.eng.blr.redhat.com/ovirt-engine/api', 'insecure': True, 'kerberos': False, 'compress': True, 'headers': None, 'token': 'dC9d1hXBkFgfz-bgBkEI3rt4eF58m4qmyeD5pMBQIfYQ76zV9YPi-C8YyGbKYGgpke9N-NCMertX6M8be_pv3A', 'ca_file': '/etc/pki/ovirt-engine/ca.pem'} cluster=7e69dcec-4933-11e9-ac17-004755204901 fetch_nested=False nested_attributes=[] timeout=180 data_center=None host=None template=None id=None wait=True
Mar 18 13:59:32 hostedenginesm3 python2: ansible-ovirt_cluster Invoked with comment=None ha_reservation=None fence_skip_if_connectivity_broken=None mac_pool=None virt=None threads_as_cores=None gluster=None vm_reason=None fetch_nested=False migration_bandwidth_limit=None switch_type=None data_center=None ksm_numa=None scheduling_policy_properties=[{'name': 'HighUtilization', 'value': '80'}, {'name': 'CpuOverCommitDurationMinutes', 'value': '2'}] description=None cpu_arch=None rng_sources=None network=None state=present ksm=None external_network_providers=None migration_compressed=None ballooning=None migration_auto_converge=None fence_enabled=None migration_policy=None auth={'timeout': 0, 'url': 'https://hostedenginesm3.lab.eng.blr.redhat.com/ovirt-engine/api', 'insecure': True, 'kerberos': False, 'compress': True, 'headers': None, 'token': 'dC9d1hXBkFgfz-bgBkEI3rt4eF58m4qmyeD5pMBQIfYQ76zV9YPi-C8YyGbKYGgpke9N-NCMertX6M8be_pv3A', 'ca_file': '/etc/pki/ovirt-engine/ca.pem'} resilience_policy=None fence_connectivity_threshold=None spice_proxy=None nested_attributes=[] memory_policy=None migration_bandwidth=None fence_skip_if_sd_active=None scheduling_policy=none wait=True compatibility_version=None serial_policy_value=None name=Default host_reason=None poll_interval=3 cpu_type=None timeout=180 serial_policy=None trusted_service=None
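For reference, the "upgrade yaml" from step 2 of the reproduction steps is a playbook of roughly the following shape. This is a minimal sketch with placeholder engine URL, credentials and cluster name; the variable names are taken from the ovirt.cluster-upgrade role's README and should be verified against the installed role version.

---
# Minimal sketch of an upgrade playbook driving the ovirt.cluster-upgrade role.
# Engine URL, credentials and cluster name below are placeholders.
- name: Upgrade all hosts in a cluster
  hosts: localhost
  connection: local
  gather_facts: false

  vars:
    engine_url: https://engine.example.com/ovirt-engine/api
    engine_user: admin@internal
    engine_password: "{{ vault_engine_password }}"
    engine_cafile: /etc/pki/ovirt-engine/ca.pem

    cluster_name: Default
    check_upgrade: true          # re-check each host for available updates
    reboot_after_upgrade: true   # matches reboot_after_upgrade=True in the log above
    upgrade_timeout: 3600        # matches timeout=3600 in the log above

  roles:
    - ovirt.cluster-upgrade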
This needs to be tested, and based on the results it may need to be re-targeted.
Tested with ovirt-ansible-cluster-upgrade-1.2.3 and RHV Manager 4.4.1. The feature works well: it updates the cluster and proceeds to upgrade all the hosts in the cluster. As no real upgrade image was available, all testing was done with interim-build RHVH images. The host running the HE is also upgraded, since the HE is moved to another active host in the cluster.
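As a rough illustration of how the post-upgrade state was checked, a playbook along the following lines can list the hosts of the cluster and their status (and where the HE VM ended up). This is a hedged sketch with placeholder engine details; it assumes the ovirt_auth and ovirt_host_info modules from the oVirt Ansible modules shipped with the 4.4-era tooling.

---
# Hypothetical post-upgrade check: confirm the hosts are back 'up' after the
# cluster upgrade. Engine URL, credentials and cluster name are placeholders.
- name: Post-upgrade host status check
  hosts: localhost
  connection: local
  gather_facts: false

  tasks:
    - name: Obtain SSO token for the engine API
      ovirt_auth:
        url: https://engine.example.com/ovirt-engine/api
        username: admin@internal
        password: "{{ vault_engine_password }}"
        ca_file: /etc/pki/ovirt-engine/ca.pem

    - name: Collect hosts of the upgraded cluster
      ovirt_host_info:
        auth: "{{ ovirt_auth }}"
        pattern: cluster=Default
      register: cluster_hosts

    - name: Show host name and status
      debug:
        msg: "{{ item.name }} -> {{ item.status }}"
      loop: "{{ cluster_hosts.ovirt_hosts }}"

    - name: Log out of the engine API
      ovirt_auth:
        state: absent
        ovirt_auth: "{{ ovirt_auth }}"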
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (RHHI for Virtualization 1.8 bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2020:3314