Bug 1536286
| Summary: | Hosted Engine HA state is in Local Maintenance when upgrading RHV-H | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Germano Veit Michel <gveitmic> |
| Component: | ovirt-engine | Assignee: | Denis Chaplygin <dchaplyg> |
| Status: | CLOSED DUPLICATE | QA Contact: | Ying Cui <ycui> |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 4.1.8 | CC: | alukiano, aperotti, bugs, dchaplyg, dfediuck, jbelka, knarra, lsurette, mavital, mkalinin, msivak, nsednev, nsoffer, rbalakri, rgolan, Rhev-m-bugs, srevivo, stirabos, ycui, ykaul, ylavi |
| Target Milestone: | ovirt-4.1.10 | Flags: | lsvaty: testing_plan_complete- |
| Target Release: | --- | | |
| Hardware: | x86_64 | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | 1469143 | Environment: | |
| Last Closed: | 2018-02-28 13:19:05 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | SLA | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1469143, 1489982 | | |
| Bug Blocks: | 1540310 | | |
Description
Germano Veit Michel
2018-01-19 03:20:56 UTC
Germano, sounds like my bz#1489982.

(In reply to Marina from comment #2)
> Germano, sounds like my bz#1489982.

Indeed. So your BZ was a dup of Bug #1469143, which wasn't closed when it was fixed. Then I cloned the original BZ downstream. Also, your BZ says this was fixed in 4.2, but the original BZ is targeted to 4.3, and I reproduced this on 4.1.8. Can it get any more confusing? ;)

Should we close them all, or do we want to get this fixed in 4.1.10? I think it should be fixed in 4.1.10 too, because after a round of upgrades all HE hosts might be in maintenance mode, defeating HA, so it's quite serious. What do you think?

This is severe and should not be targeted so far in the future. The maintenance mode for HE should be locked to the engine maintenance mode, if the engine is up. Maintaining this during an upgrade is elementary. Retargeting.

Nikolai, we need to figure out if this is still broken and where. Can you please try reproducing it with a 4.1.8 -> 4.1.9 upgrade? It might be RHEV-H specific too.

(In reply to Martin Sivák from comment #6)
> Nikolai, we need to figure out if this is still broken and where. Can you
> please try reproducing it with 4.1.8 -> 4.1.9 upgrade? It might be RHEV-H
> specific too.

It's an HC-specific issue. Kasturi Narra, please provide your input. Jiri, have you seen such an issue during your latest upgrade set of tests?

Hey, why is it HC specific? I believe what happens here is that when the host comes out of engine-side maintenance due to an upgrade or reinstall, it should also cancel the HE local maintenance, that's all. Today it enables HE local maintenance once we put the host into maintenance in the RHV UI, through the engine, but it never cancels the HE maintenance when the host is auto-activated back on the engine side. And this is the problem.

(In reply to Marina from comment #9)
> Hey, why is it HC specific?
> I believe what happens here is that when the host comes out of engine-side
> maintenance due to an upgrade or reinstall, it should also cancel the HE local
> maintenance, that's all. Today it enables HE local maintenance once we put
> the host into maintenance in the RHV UI, through the engine, but it never cancels
> the HE maintenance when the host is auto-activated back on the engine side.
> And this is the problem.

So there is a confirmation from your side that this is not HC specific. Regular RHEL/RHVH HA hosts will be hitting the same issue during the upgrade. Martin, please review comment #9.

Nikolai, we asked for a test of this to see if it really is happening and where. There is conflicting information with regard to RHEV-H and branches (4.1 vs 4.2). Since all we have now are opinions, I would like someone from QE to provide some hard data before we decide what to do with all the linked bugs.
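For reference, a minimal shell sketch (run as root on a hosted-engine host; not specific to this reproduction) of how the HA agent's local maintenance flag is handled independently of the host's status in the engine. The assumption, based on the behaviour described above, is that the engine sets the equivalent flag when the host is moved to maintenance but never performs the clearing step when the host is activated again:

```bash
# On a host that the engine already shows as "Up" after the upgrade,
# the HA agent can still report its own, independent maintenance flag:
hosted-engine --vm-status | grep -iE 'local maintenance|score'

# Setting this flag is the effect of putting the host into maintenance
# in the webadmin UI; per this bug, no matching clear happens on activation:
hosted-engine --set-maintenance --mode=local
```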
Before upgrade
==============
```
# nodectl info
layers:
  rhvh-4.1-0.20180102.0:
    rhvh-4.1-0.20180102.0+1
bootloader:
  default: rhvh-4.1-0.20180102.0+1
  entries:
    rhvh-4.1-0.20180102.0+1:
      index: 0
      title: rhvh-4.1-0.20180102.0
      kernel: /boot/rhvh-4.1-0.20180102.0+1/vmlinuz-3.10.0-693.11.6.el7.x86_64
      args: "ro crashkernel=auto rd.lvm.lv=rhvh_alma05/rhvh-4.1-0.20180102.0+1 rd.lvm.lv=rhvh_alma05/swap rhgb quiet LANG=en_US.UTF-8 img.bootid=rhvh-4.1-0.20180102.0+1"
      initrd: /boot/rhvh-4.1-0.20180102.0+1/initramfs-3.10.0-693.11.6.el7.x86_64.img
      root: /dev/rhvh_alma05/rhvh-4.1-0.20180102.0+1
current_layer: rhvh-4.1-0.20180102.0+1
```
After upgrade
=============
```
# nodectl info
layers:
  rhvh-4.1-0.20180126.0:
    rhvh-4.1-0.20180126.0+1
  rhvh-4.1-0.20180102.0:
    rhvh-4.1-0.20180102.0+1
bootloader:
  default: rhvh-4.1-0.20180126.0+1
  entries:
    rhvh-4.1-0.20180102.0+1:
      index: 1
      title: rhvh-4.1-0.20180102.0
      kernel: /boot/rhvh-4.1-0.20180102.0+1/vmlinuz-3.10.0-693.11.6.el7.x86_64
      args: "ro crashkernel=auto rd.lvm.lv=rhvh_alma06/swap rd.lvm.lv=rhvh_alma06/rhvh-4.1-0.20180102.0+1 rhgb quiet LANG=en_US.UTF-8 img.bootid=rhvh-4.1-0.20180102.0+1"
      initrd: /boot/rhvh-4.1-0.20180102.0+1/initramfs-3.10.0-693.11.6.el7.x86_64.img
      root: /dev/rhvh_alma06/rhvh-4.1-0.20180102.0+1
    rhvh-4.1-0.20180126.0+1:
      index: 0
      title: rhvh-4.1-0.20180126.0
      kernel: /boot/rhvh-4.1-0.20180126.0+1/vmlinuz-3.10.0-693.17.1.el7.x86_64
      args: "ro crashkernel=auto rd.lvm.lv=rhvh_alma06/swap rd.lvm.lv=rhvh_alma06/rhvh-4.1-0.20180126.0+1 rhgb quiet LANG=en_US.UTF-8 img.bootid=rhvh-4.1-0.20180126.0+1"
      initrd: /boot/rhvh-4.1-0.20180126.0+1/initramfs-3.10.0-693.17.1.el7.x86_64.img
      root: /dev/rhvh_alma06/rhvh-4.1-0.20180126.0+1
current_layer: rhvh-4.1-0.20180126.0+1
```
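As a side note, a short sketch of how the running layer can be confirmed from the node itself after the reboot (a minimal, read-only check; it assumes the layer and package names shown in this reproduction):

```bash
# Which image layer the node actually booted into
nodectl info | grep current_layer

# Sanity-check the layered image setup
nodectl check

# The image-update package that delivered the new layer
rpm -q redhat-virtualization-host-image-update
```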
1) Host UP
2) Host has a repository with new packages:
   Check for available updates on host alma06.qa.lab.tlv.redhat.com was completed successfully with message 'found updates for packages redhat-virtualization-host-image-update-4.1-20180126.0.el7_4'.
3) Click on the upgrade link ("A new version is available. Upgrade"); engine events:
```
Feb 19, 2018 1:53:21 PM  Host alma06.qa.lab.tlv.redhat.com upgrade was completed successfully.
Feb 19, 2018 1:53:20 PM  Host alma06.qa.lab.tlv.redhat.com was restarted using SSH by the engine.
Feb 19, 2018 1:53:19 PM  Installing Host alma06.qa.lab.tlv.redhat.com. Stage: Termination.
Feb 19, 2018 1:53:19 PM  Installing Host alma06.qa.lab.tlv.redhat.com. Retrieving installation logs to: '/var/log/ovirt-engine/host-deploy/ovirt-host-mgmt-20180219065319-alma06.qa.lab.tlv.redhat.com-f75d262d-cc5f-4d2c-bf2d-4ddc0c24988c.log'.
Feb 19, 2018 1:53:19 PM  Installing Host alma06.qa.lab.tlv.redhat.com. Stage: Pre-termination.
Feb 19, 2018 1:53:19 PM  Installing Host alma06.qa.lab.tlv.redhat.com. Stage: Closing up.
Feb 19, 2018 1:53:19 PM  Installing Host alma06.qa.lab.tlv.redhat.com. Stage: Transaction commit.
Feb 19, 2018 1:53:19 PM  Installing Host alma06.qa.lab.tlv.redhat.com. Stage: Misc configuration.
Feb 19, 2018 1:53:18 PM  Installing Host alma06.qa.lab.tlv.redhat.com. Yum Verify: 2/2: redhat-virtualization-host-image-update-placeholder.noarch 0:4.1-8.1.el7 - od.
Feb 19, 2018 1:53:18 PM  Installing Host alma06.qa.lab.tlv.redhat.com. Yum Verify: 1/2: redhat-virtualization-host-image-update.noarch 0:4.1-20180126.0.el7_4 - u.
Feb 19, 2018 1:53:18 PM  Installing Host alma06.qa.lab.tlv.redhat.com. Yum erase: 2/2: redhat-virtualization-host-image-update-placeholder.
Feb 19, 2018 1:45:28 PM  Installing Host alma06.qa.lab.tlv.redhat.com. Yum obsoleting: 1/2: redhat-virtualization-host-image-update-4.1-20180126.0.el7_4.noarch.
Feb 19, 2018 1:45:28 PM  Installing Host alma06.qa.lab.tlv.redhat.com. Yum Status: Running Transaction.
Feb 19, 2018 1:45:28 PM  Installing Host alma06.qa.lab.tlv.redhat.com. Yum Status: Running Test Transaction.
```
4) Host is Up under the engine, but is in LocalMaintenance state under hosted-engine --vm-status (a sketch of a manual workaround follows the status output below):
```
--== Host 2 status ==--

conf_on_shared_storage : True
Status up-to-date      : True
Hostname               : alma06.qa.lab.tlv.redhat.com
Host ID                : 2
Engine status          : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                  : 0
stopped                : False
Local maintenance      : True
crc32                  : 69c202ba
local_conf_timestamp   : 3758
Host timestamp         : 3758
Extra metadata (valid at timestamp):
    metadata_parse_version=1
    metadata_feature_version=1
    timestamp=3758 (Mon Feb 19 15:07:31 2018)
    host-id=2
    score=0
    vm_conf_refresh_time=3758 (Mon Feb 19 15:07:31 2018)
    conf_on_shared_storage=True
    maintenance=True
    state=LocalMaintenance
    stopped=False
```
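Until a fix is available, the stale flag shown above can be cleared manually on the affected host. A sketch of the workaround (run as root on that host); the agent should leave LocalMaintenance and the score should recover shortly afterwards:

```bash
# Clear the leftover local maintenance flag set during the upgrade flow
hosted-engine --set-maintenance --mode=none

# Confirm the agent leaves LocalMaintenance and the score recovers
hosted-engine --vm-status
```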
I wonder if this could be closed as a DUP of #1489982.

Can you please check what the states are when you upgrade a RHEV-H host?
1) You put the host into maintenance using the webadmin button
2) You update the node
3) The node reboots
4) Does it stay in maintenance mode (in the engine), or does it move to Up automatically?

It stays in maintenance state.

*** This bug has been marked as a duplicate of bug 1489982 ***