+++ This bug was initially created as a clone of Bug #1679397 +++

Description of problem:
=======================
While upgrading from RHV-4.2 to RHV-4.3, the upgrade fails in an RHHI-V environment. The filesystems under /var/* also get unmounted during the process and never get mounted back unless we remount them manually.

Version-Release number of selected component (if applicable):
=============================================================
Upgrade path from RHV 4.2.8.3-0 to RHV 4.3.0.4-0.1

How reproducible:
=================
Every time

Steps to Reproduce:
===================
1. Install rhvh-4.2.8.3-0 on the RHHI-V hypervisors
2. Upgrade to the RHV 4.3 redhat-virtualization-host image in the UI
3. The upgrade fails

Actual results:
===============
The upgrade fails

Expected results:
=================
Should complete successfully

Additional info:
================
Could see the issue multiple times.

--- Additional comment from RHEL Product and Program Management on 2019-02-21 05:03:57 UTC ---

This bug is automatically being proposed for the RHHI-V 1.6 release of the Red Hat Hyperconverged Infrastructure for Virtualization product, by setting the release flag 'rhiv-1.6' to '?'.

If this bug should be proposed for a different release, please manually change the proposed release flag.

--- Additional comment from bipin on 2019-02-21 05:10:21 UTC ---

Messages:
=========
Feb 21 09:42:14 rhsqa-grafton7-nic2 vdsm[14053]: ERROR FINISH thread <Thread(logfile, started daemon 140162120496896)> failed
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/common/concurrent.py", line 194, in run
    ret = func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/common/logutils.py", line 336, in _run
    self._target.handle(record)
  File "/usr/lib64/python2.7/logging/__init__.py", line 749, in handle
    self.emit(record)
  File "/usr/lib64/python2.7/logging/handlers.py", line 429, in emit
    self.stream = self._open()
  File "/usr/lib/python2.7/site-packages/vdsm/common/logutils.py", line 131, in _open
    return logging.handlers.WatchedFileHandler._open(self)
  File "/usr/lib64/python2.7/logging/__init__.py", line 925, in _open
    stream = open(self.baseFilename, self.mode)
IOError: [Errno 2] No such file or directory: '/var/log/vdsm/vdsm.log'

Engine log:
===========
2019-02-21 09:39:40,691+05 ERROR [org.ovirt.engine.core.services.GlusterEventsWebHookServlet] (default task-4) [] Error processing event data
2019-02-21 09:39:40,691+05 ERROR [org.ovirt.engine.core.services.GlusterEventsWebHookServlet] (default task-2) [] Error processing event data
2019-02-21 09:39:40,712+05 ERROR [org.ovirt.engine.core.services.GlusterEventsWebHookServlet] (default task-2) [] Error processing event data
2019-02-21 09:39:40,712+05 ERROR [org.ovirt.engine.core.services.GlusterEventsWebHookServlet] (default task-1) [] Error processing event data
2019-02-21 09:39:40,831+05 ERROR [org.ovirt.engine.core.services.GlusterEventsWebHookServlet] (default task-1) [] Error processing event data
2019-02-21 09:39:40,835+05 ERROR [org.ovirt.engine.core.services.GlusterEventsWebHookServlet] (default task-1) [] Error processing event data
2019-02-21 09:39:40,917+05 ERROR [org.ovirt.engine.core.vdsbroker.VdsManager] (EE-ManagedThreadFactory-engineScheduled-Thread-13) [56fca8be] Timer update runtime info failed.
Exception:
2019-02-21 09:40:19,229+05 ERROR [org.ovirt.engine.core.services.GlusterEventsWebHookServlet] (default task-2) [] Error processing event data
2019-02-21 09:43:18,435+05 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) [] Connection timeout for host 'rhsqa-grafton7-nic2.lab.eng.blr.redhat.com', last response arrived 22501 ms ago.
2019-02-21 09:45:11,701+05 ERROR [org.ovirt.engine.core.services.GlusterEventsWebHookServlet] (default task-2) [] Error processing event data
2019-02-21 09:45:11,706+05 ERROR [org.ovirt.engine.core.services.GlusterEventsWebHookServlet] (default task-2) [] Error processing event data
2019-02-21 09:45:28,432+05 ERROR [org.ovirt.engine.core.services.GlusterEventsWebHookServlet] (default task-1) [] Error processing event data
2019-02-21 09:45:28,432+05 ERROR [org.ovirt.engine.core.services.GlusterEventsWebHookServlet] (default task-5) [] Error processing event data
2019-02-21 09:45:28,433+05 ERROR [org.ovirt.engine.core.services.GlusterEventsWebHookServlet] (default task-2) [] Error processing event data
2019-02-21 09:45:51,775+05 ERROR [org.ovirt.engine.core.bll.host.HostUpgradeManager] (EE-ManagedThreadFactory-commandCoordinator-Thread-5) [6c26c070-4a20-4dc7-aa67-bdf7c6d9f0fc] Failed to update host 'rhsqa-grafton7-nic2.lab.eng.blr.redhat.com'.
2019-02-21 09:45:51,849+05 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-commandCoordinator-Thread-5) [6c26c070-4a20-4dc7-aa67-bdf7c6d9f0fc] EVENT_ID: HOST_UPGRADE_FAILED(841), Failed to upgrade Host rhsqa-grafton7-nic2.lab.eng.blr.redhat.com (User: admin@internal-authz).
2019-02-21 09:45:54,186+05 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-99) [6c26c070-4a20-4dc7-aa67-bdf7c6d9f0fc] EVENT_ID: HOST_UPGRADE_FAILED(841), Failed to upgrade Host rhsqa-grafton7-nic2.lab.eng.blr.redhat.com (User: admin@internal-authz).
2019-02-21 09:50:31,836+05 ERROR [org.ovirt.engine.core.services.GlusterEventsWebHookServlet] (default task-5) [] Error processing event data
2019-02-21 09:50:31,841+05 ERROR [org.ovirt.engine.core.services.GlusterEventsWebHookServlet] (default task-5) [] Error processing event data
Before the update to RHV 4.3, the mounts are as follows:

[root@ ~]# df -Th
Filesystem                                                             Type            Size  Used  Avail  Use%  Mounted on
/dev/mapper/rhvh_rhsqa--grafton11--nic2-rhvh--4.2.8.1--0.20190129.0+1  ext4            786G  2.0G  744G    1%  /
devtmpfs                                                               devtmpfs        126G     0  126G    0%  /dev
tmpfs                                                                  tmpfs           126G   16K  126G    1%  /dev/shm
tmpfs                                                                  tmpfs           126G  2.1G  124G    2%  /run
tmpfs                                                                  tmpfs           126G     0  126G    0%  /sys/fs/cgroup
/dev/mapper/rhvh_rhsqa--grafton11--nic2-var                            ext4             15G  1.1G   13G    8%  /var
/dev/mapper/rhvh_rhsqa--grafton11--nic2-home                           ext4            976M  2.6M  907M    1%  /home
/dev/mapper/gluster_vg_sdb-gluster_lv_engine                           xfs             100G  7.3G   93G    8%  /gluster_bricks/engine
/dev/mapper/rhvh_rhsqa--grafton11--nic2-tmp                            ext4            976M  2.9M  906M    1%  /tmp
/dev/sda1                                                              ext4            976M  337M  573M   37%  /boot
/dev/mapper/rhvh_rhsqa--grafton11--nic2-var_crash                      ext4            9.8G   37M  9.2G    1%  /var/crash      <------ This is unmounted post upgrade
/dev/mapper/rhvh_rhsqa--grafton11--nic2-var_log                        ext4             15G  7.5G  6.5G   54%  /var/log        <------ This is unmounted post upgrade
/dev/mapper/rhvh_rhsqa--grafton11--nic2-var_log_audit                  ext4            2.0G   39M  1.8G    3%  /var/log/audit  <------ This is unmounted post upgrade
/dev/mapper/gluster_vg_sdc-gluster_lv_data                             xfs              12T   50M   12T    1%  /gluster_bricks/data
/dev/mapper/gluster_vg_sdc-gluster_lv_vmstore                          xfs             4.0T   14G  4.0T    1%  /gluster_bricks/vmstore
rhsqa-grafton10.lab.eng.blr.redhat.com:/engine                         fuse.glusterfs  100G  8.3G   92G    9%  /rhev/data-center/mnt/glusterSD/rhsqa-grafton10.lab.eng.blr.redhat.com:_engine
10.70.45.32:/vmstore                                                   fuse.glusterfs  4.0T   55G  4.0T    2%  /rhev/data-center/mnt/glusterSD/10.70.45.32:_vmstore
10.70.45.32:/data                                                      fuse.glusterfs   12T  123G   12T    2%  /rhev/data-center/mnt/glusterSD/10.70.45.32:_data
tmpfs                                                                  tmpfs            26G     0   26G    0%  /run/user/0

[root@rhsqa-grafton11 ~]# rpm -qa | grep host-image-update
redhat-virtualization-host-image-update-placeholder-4.2-8.1.el7.noarch
redhat-virtualization-host-image-update-4.2-20190129.0.el7_6.noarch

After upgrade:

[root@rhsqa-grafton11 ~]# df -Th
Filesystem                                                             Type            Size  Used  Avail  Use%  Mounted on
/dev/mapper/rhvh_rhsqa--grafton11--nic2-rhvh--4.2.8.1--0.20190129.0+1  ext4            786G  2.6G  744G    1%  /
devtmpfs                                                               devtmpfs        126G     0  126G    0%  /dev
tmpfs                                                                  tmpfs           126G   16K  126G    1%  /dev/shm
tmpfs                                                                  tmpfs           126G  2.1G  124G    2%  /run
tmpfs                                                                  tmpfs           126G     0  126G    0%  /sys/fs/cgroup
/dev/mapper/rhvh_rhsqa--grafton11--nic2-var                            ext4             15G  1.3G   13G   10%  /var
/dev/mapper/rhvh_rhsqa--grafton11--nic2-home                           ext4            976M  2.6M  907M    1%  /home
/dev/mapper/gluster_vg_sdb-gluster_lv_engine                           xfs             100G  7.3G   93G    8%  /gluster_bricks/engine
/dev/mapper/rhvh_rhsqa--grafton11--nic2-tmp                            ext4            976M  2.9M  906M    1%  /tmp
/dev/sda1                                                              ext4            976M  250M  660M   28%  /boot
tmpfs                                                                  tmpfs            26G     0   26G    0%  /run/user/0
/dev/mapper/gluster_vg_sdc-gluster_lv_data                             xfs              12T   50M   12T    1%  /gluster_bricks/data
/dev/mapper/gluster_vg_sdc-gluster_lv_vmstore                          xfs             4.0T   14G  4.0T    1%  /gluster_bricks/vmstore

[root@ ~]# rpm -qa | grep host-im
redhat-virtualization-host-image-update-4.3-20190221.0.el7_6.noarch
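Not part of the original report, but a minimal shell sketch for spotting which entries lost their mounts after the upgrade, assuming the /var/* filesystems are still listed in /etc/fstab on the node:

  # List fstab entries whose mount points are not currently mounted.
  # Uses only standard util-linux tools (awk, mountpoint) available on RHVH.
  awk '$1 !~ /^#/ && $2 ~ /^\// {print $2}' /etc/fstab | while read -r mp; do
      mountpoint -q "$mp" || echo "NOT MOUNTED: $mp"
  done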
Errors from engine.log

<snip>
2019-02-22 18:42:03,713+05 INFO  [org.ovirt.engine.core.common.utils.ansible.AnsibleExecutor] (EE-ManagedThreadFactory-commandCoordinator-Thread-9) [282ae240-799c-44bd-a4ad-5d3e84a0ab13] Ansible playbook command has exited with value: 2
2019-02-22 18:42:03,713+05 ERROR [org.ovirt.engine.core.bll.host.HostUpgradeManager] (EE-ManagedThreadFactory-commandCoordinator-Thread-9) [282ae240-799c-44bd-a4ad-5d3e84a0ab13] Failed to update host 'rhsqa-grafton11-nic2.lab.eng.blr.redhat.com'.
2019-02-22 18:42:03,716+05 INFO  [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-9) [282ae240-799c-44bd-a4ad-5d3e84a0ab13] START, SetVdsStatusVDSCommand(HostName = rhsqa-grafton11-nic2.lab.eng.blr.redhat.com, SetVdsStatusVDSCommandParameters:{hostId='b6f3758b-8c30-4cb7-928a-c441708b0659', status='InstallFailed', nonOperationalReason='NONE', stopSpmFailureLogged='false', maintenanceReason='null'}), log id: 4a18bd65
2019-02-22 18:42:03,722+05 INFO  [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-9) [282ae240-799c-44bd-a4ad-5d3e84a0ab13] FINISH, SetVdsStatusVDSCommand, return: , log id: 4a18bd65
2019-02-22 18:42:03,741+05 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-commandCoordinator-Thread-9) [282ae240-799c-44bd-a4ad-5d3e84a0ab13] EVENT_ID: HOST_UPGRADE_FAILED(841), Failed to upgrade Host rhsqa-grafton11-nic2.lab.eng.blr.redhat.com (User: admin@internal-authz).
2019-02-22 18:42:08,736+05 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand] (DefaultQuartzScheduler8) [54148dce] START, GlusterServersListVDSCommand(HostName = rhsqa-grafton12-nic2.lab.eng.blr.redhat.com, VdsIdVDSCommandParametersBase:{hostId='9a75e0ea-7785-42be-b90b-9a920c0994e8'}), log id: db7c0d
2019-02-22 18:42:09,074+05 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand] (DefaultQuartzScheduler8) [54148dce] FINISH, GlusterServersListVDSCommand, return: [10.70.45.34/22:CONNECTED, rhsqa-grafton11.lab.eng.blr.redhat.com:DISCONNECTED, rhsqa-grafton10.lab.eng.blr.redhat.com:CONNECTED], log id: db7c0d
2019-02-22 18:42:09,079+05 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand] (DefaultQuartzScheduler8) [54148dce] START, GlusterVolumesListVDSCommand(HostName = rhsqa-grafton12-nic2.lab.eng.blr.redhat.com, GlusterVolumesListVDSParameters:{hostId='9a75e0ea-7785-42be-b90b-9a920c0994e8'}), log id: 2935efc4
2019-02-22 18:42:09,315+05 WARN  [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturn] (DefaultQuartzScheduler8) [54148dce] Could not associate brick 'rhsqa-grafton10.lab.eng.blr.redhat.com:/gluster_bricks/engine/engine' of volume '7496dcc7-626e-49b2-a899-b89a2476cff1' with correct network as no gluster network found in cluster '6898ec14-296c-11e9-af05-004855204901'
2019-02-22 18:42:09,325+05 WARN  [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturn] (DefaultQuartzScheduler8) [54148dce] Could not associate brick 'rhsqa-grafton10.lab.eng.blr.redhat.com:/gluster_bricks/data/data' of volume '6fc37b8e-fdb7-448e-8432-62fa85bd99d4' with correct network as no gluster network found in cluster '6898ec14-296c-11e9-af05-004855204901'
2019-02-22 18:42:09,333+05 WARN  [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturn] (DefaultQuartzScheduler8) [54148dce] Could not associate brick 'rhsqa-grafton10.lab.eng.blr.redhat.com:/gluster_bricks/vmstore/vmstore' of volume 'cdb4006a-6379-444b-9d6a-d3126db5c984' with correct network as no gluster network found in cluster '6898ec14-296c-11e9-af05-004855204901'
2019-02-22 18:42:09,337+05 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand] (DefaultQuartzScheduler8) [54148dce] FINISH, GlusterVolumesListVDSCommand, return: {6fc37b8e-fdb7-448e-8432-62fa85bd99d4=org.ovirt.engine.core.common.businessentities.gluster.GlusterVolumeEntity@163d2c36, cdb4006a-6379-444b-9d6a-d3126db5c984=org.ovirt.engine.core.common.businessentities.gluster.GlusterVolumeEntity@3e9805a7, 7496dcc7-626e-49b2-a899-b89a2476cff1=org.ovirt.engine.core.common.businessentities.gluster.GlusterVolumeEntity@67f3ad42}, log id: 2935efc4
2019-02-22 18:42:10,325+05 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engineScheduled-Thread-95) [282ae240-799c-44bd-a4ad-5d3e84a0ab13] EVENT_ID: HOST_UPGRADE_FAILED(841), Failed to upgrade Host rhsqa-grafton11-nic2.lab.eng.blr.redhat.com (User: admin@internal-authz).
</snip>
/var/log/messages may carry a clue; attaching it to this bug.
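As a rough pointer (the grep pattern is only a guess at what systemd or the upgrade tooling would log, not taken from the attachment), this is the kind of search that may show when the /var/* mounts were dropped:

  # Look for mount/unmount activity around the upgrade window in the attached log.
  grep -iE 'unmount|umount|var_log|var_crash' /var/log/messages | less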
Created attachment 1537591 [details] /var/log/messages on RHVH node
I feel this is more likely an RHV/oVirt problem, as the upgrade is kicked off from the RHV Manager UI.
@kaustav, can you please check this as a priority?
After upgrading the node, a reboot is indeed required, but if the upgrade failed, can you please attach the imgbased.log?
Created attachment 1539421 [details] Log
Setting back the needinfo on Kaustav; it was canceled by mistake.
The log looks fine, but a reboot is needed of course
Could you please provide the logs from /var/log/ovirt-engine/host-deploy? They should give us more details on why the upgrade failed ...
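For reference, a small sketch for collecting those on the engine host (the directory comes from the comment above; the per-attempt file names may differ):

  # Newest host-deploy logs first; each upgrade attempt writes its own log.
  ls -lt /var/log/ovirt-engine/host-deploy/ | head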
(In reply to Yuval Turgeman from comment #11)
> The log looks fine, but a reboot is needed of course

Usually when the upgrade is triggered from the RHV Manager UI, the reboot is triggered automatically after the upgrade. But that is not happening here; the UI reports an error instead.
Created attachment 1540851 [details]
Yum.log

Providing the yum.log
adding back the needinfo on rnori.
If you did not reboot, you'll need to mount the var LV for the logs
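A minimal sketch of that manual remount, using the LV names from the df output earlier in this bug (VG and host names will differ per node):

  # Remount the logging filesystems that were dropped during the failed upgrade.
  mount /dev/mapper/rhvh_rhsqa--grafton11--nic2-var_log       /var/log
  mount /dev/mapper/rhvh_rhsqa--grafton11--nic2-var_log_audit /var/log/audit
  mount /dev/mapper/rhvh_rhsqa--grafton11--nic2-var_crash     /var/crash
  # Or simply remount everything still listed in fstab:
  mount -a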
Just adding back the needinfos... :)
Hi Sandro,

I see the bug is in ON_QA, but I also see the patch is in master, not in 4.3.
(In reply to bipin from comment #34)
> Hi Sandro,
>
> I see the bug in ON_QA. Also see the patch in master not in 4.3.

Engine is going to be branched today, so all current releases are still built from master.
*** Bug 1683161 has been marked as a duplicate of this bug. ***
Tested the upgrade from RHV 4.2.8 to RHV 4.3 (rhvh-4.3.0.5-0.20190328.0+1). No issues observed; VDO errors were noticed while stopping the VDO service, but they did not stop the upgrade.
This bug is included in the oVirt 4.3.2 release, published on March 19th 2019. Since the problem described in this bug report should be resolved in oVirt 4.3.2, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.