Description of problem:
Whenever RHV-H is upgraded from 4.1.2 to 4.1.3, the Hosted Engine HA state is left in Local Maintenance.

Version-Release number of selected component (if applicable):
ovirt-host-deploy-1.6.6-1.el7ev.noarch

How reproducible:
Always

Steps to Reproduce:
1. Install an HC setup with a RHV-H 4.1.2 async build.
2. Add all the required repos.
3. An upgrade symbol appears next to the hypervisor.
4. Click on it.

Actual results:
The RHV-H host gets upgraded to 4.1.3, leaving the Hosted Engine HA state in 'Local Maintenance'.

Expected results:
The RHV-H host gets upgraded to 4.1.3 and the Hosted Engine HA state is not left in 'Local Maintenance'.

Additional info:
Adding hosted-engine --vm-status before and after upgrade:

Output of hosted-engine --vm-status before upgrade:
=======================================================

[root@yarrow ~]# hosted-engine --vm-status


--== Host 1 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : yarrow.lab.eng.blr.redhat.com
Host ID                            : 1
Engine status                      : {"health": "good", "vm": "up", "detail": "up"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : b4359588
local_conf_timestamp               : 75583
Host timestamp                     : 75567
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=75567 (Thu Jul 6 15:09:26 2017)
    host-id=1
    score=3400
    vm_conf_refresh_time=75583 (Thu Jul 6 15:09:42 2017)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineUp
    stopped=False


--== Host 2 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : tettnang.lab.eng.blr.redhat.com
Host ID                            : 2
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 1800
stopped                            : False
Local maintenance                  : False
crc32                              : 7bfbbfd5
local_conf_timestamp               : 1440
Host timestamp                     : 1423
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=1423 (Thu Jul 6 15:09:07 2017)
    host-id=2
    score=1800
    vm_conf_refresh_time=1440 (Thu Jul 6 15:09:23 2017)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineDown
    stopped=False


--== Host 3 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : zod.lab.eng.blr.redhat.com
Host ID                            : 3
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 7caabb48
local_conf_timestamp               : 75597
Host timestamp                     : 75581
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=75581 (Thu Jul 6 15:09:23 2017)
    host-id=3
    score=3400
    vm_conf_refresh_time=75597 (Thu Jul 6 15:09:39 2017)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineDown
    stopped=False

Output of hosted-engine --vm-status after upgrade:
===================================================

[root@yarrow ~]# hosted-engine --vm-status


--== Host 1 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : yarrow.lab.eng.blr.redhat.com
Host ID                            : 1
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 0
stopped                            : False
Local maintenance                  : True
crc32                              : bc34659d
local_conf_timestamp               : 7624
Host timestamp                     : 7608
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=7608 (Thu Jul 6 17:50:33 2017)
    host-id=1
    score=0
    vm_conf_refresh_time=7624 (Thu Jul 6 17:50:48 2017)
    conf_on_shared_storage=True
    maintenance=True
    state=LocalMaintenance
    stopped=False


--== Host 2 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : tettnang.lab.eng.blr.redhat.com
Host ID                            : 2
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 1800
stopped                            : False
Local maintenance                  : False
crc32                              : 521f80d4
local_conf_timestamp               : 11121
Host timestamp                     : 11105
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=11105 (Thu Jul 6 17:50:29 2017)
    host-id=2
    score=1800
    vm_conf_refresh_time=11121 (Thu Jul 6 17:50:45 2017)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineDown
    stopped=False


--== Host 3 status ==--

conf_on_shared_storage             : True
Status up-to-date                  : True
Hostname                           : zod.lab.eng.blr.redhat.com
Host ID                            : 3
Engine status                      : {"health": "good", "vm": "up", "detail": "up"}
Score                              : 3400
stopped                            : False
Local maintenance                  : False
crc32                              : 77b3a2d6
local_conf_timestamp               : 85262
Host timestamp                     : 85246
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=85246 (Thu Jul 6 17:50:28 2017)
    host-id=3
    score=3400
    vm_conf_refresh_time=85262 (Thu Jul 6 17:50:44 2017)
    conf_on_shared_storage=True
    maintenance=False
    state=EngineUp
    stopped=False


cat /var/lib/ovirt-hosted-engine-ha/ha.conf
local_maintenance=True
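The persisted flag shown above can be checked directly on the host. A minimal sketch (the path is the one from the report; the `HA_CONF` override is only there for illustration and testing):

```shell
#!/bin/sh
# Check the HA agent's persisted local-maintenance flag.
# Default path is the one from this report; set HA_CONF to test elsewhere.
HA_CONF="${HA_CONF:-/var/lib/ovirt-hosted-engine-ha/ha.conf}"

if grep -qi '^local_maintenance=true' "$HA_CONF" 2>/dev/null; then
    echo "HE local maintenance: active"
else
    echo "HE local maintenance: not active"
fi
```

On an affected host this prints "active" even while the engine UI already shows the host as Up.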
Can you check for a regression in the host activation flow? It is supposed to move the host out of local maintenance.
So it is not a regression in the host activation flow. The problem is:
1) Move the host to maintenance via the engine (this activates the HE "LocalMaintenance" state).
2) Upgrade the host via the engine. After the upgrade the host moves straight to the Up state, so from the engine side the host is UP, but from the HE side the host still has the "LocalMaintenance" state, because no one ran the activate command on the engine side.
See also a bug with a similar problem - https://bugzilla.redhat.com/show_bug.cgi?id=1468875
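Until the engine clears the state automatically, it can be cleared manually on the affected host with the standard `hosted-engine` CLI. A hedged sketch (the `command -v` guard is an illustration-only addition so the snippet degrades to a dry-run message on machines without the CLI):

```shell
#!/bin/sh
# Clear the HE agent's local-maintenance state after the engine already
# shows the host as Up. Dry-runs where the hosted-engine CLI is absent.
clear_he_maintenance() {
    if command -v hosted-engine >/dev/null 2>&1; then
        hosted-engine --set-maintenance --mode=none
        # Confirm the flag was cleared:
        hosted-engine --vm-status | grep -i 'local maintenance'
    else
        echo "would run: hosted-engine --set-maintenance --mode=none"
    fi
}

clear_he_maintenance
```

This is a workaround, not a fix; the fix is for the upgrade flow to run the activate step on the engine side.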
Denis, is this going to land in 4.2.0? If not, please re-target.
This is severe and should not be targeted so far in the future. The HE maintenance mode should be locked to the engine maintenance mode if the engine is up. Maintaining this during an upgrade is elementary. Retargeting.
Moving back to ASSIGNED, since this patch targets ovirt-host-deploy but no patches are attached to this bug and no open patches are pushed in gerrit against ovirt-host-deploy. Please update the status of this bug, setting the correct product and adding references to the patches being pushed for review.
(In reply to Sandro Bonazzola from comment #5)
> Moving back to ASSIGNED since this patch is on ovirt-host-deploy but no
> patches are attaced to this bug and no open patches are pushed in gerrit
> against ovirt-host-deploy.
> Please update the status of this bug setting the correct product and adding
> references to the patches being pushed for review.

It's tricky: the product should be one of ovirt-engine, ovirt-hosted-engine-ha, or ovirt-hosted-engine-setup. I have patches for ovirt-hosted-engine-ha and ovirt-hosted-engine-setup. For this bug (and other bugs) we also need the ovirt-engine patch here as well: https://gerrit.ovirt.org/#/c/86645/
Tested with ovirt-hosted-engine-setup-2.2.19. When updating the host from one nightly build to another, the state of the node returns to Up.
This bugzilla is included in the oVirt 4.2.2 release, published on March 28th 2018. Since the problem described in this bug report should be resolved in the oVirt 4.2.2 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.