Bug 906389
Summary: | engine: we are fencing a host when putting it in maintenance after failed reinstall. | | |
---|---|---|---|
Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Dafna Ron <dron> |
Component: | ovirt-engine | Assignee: | Eli Mesika <emesika> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Tareq Alayan <talayan> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | | |
Version: | 3.1.2 | CC: | acathrow, bazulay, dyasny, iheim, lpeer, masayag, ofrenkel, pstehlik, Rhev-m-bugs, talayan, yeylon, ykaul, yzaslavs |
Target Milestone: | --- | Keywords: | Regression, Reopened |
Target Release: | 3.2.0 | | |
Hardware: | x86_64 | | |
OS: | Linux | | |
Whiteboard: | infra | | |
Fixed In Version: | sf14 | Doc Type: | Bug Fix |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2013-02-18 13:50:56 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | Infra | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Attachments: | | | |

(In reply to comment #0)
> Created attachment 690983 [details]
> log
>
> Description of problem:
>
> How reproducible:
>
> 100%
>
> Steps to Reproduce:
> 1. install a clean rhel on a host which is already installed in rhevm (host
> needs to be configured with power management)

what was the status of the Host before running the yum update ?

> 2. do not register the host and try to reinstall it in rhevm

I assume you refer to RHN registration.

> 3. after install fails -> put the host in maintenance

> > 1. install a clean rhel on a host which is already installed in rhevm (host
> > needs to be configured with power management)
>
> what was the status of the Host before running the yum update ?

there was no yum update - it was a complete installation of a clean OS.
but, the host has to be in maintenance state so you can re-install it in rhevm.

> > 2. do not register the host and try to reinstall it in rhevm
>
> I assume you refer to RHN registration.

since a host which is not registered with RHN will fail the install, it is an easy way to fail the install in the early stages (so yes).

This BZ looks like a duplicate of Bug 894231,

Dafna, can you please open a bz for
> one is that we can use rest to reinstall a host in non-operational state

For this BZ, block re install for non-operational hosts, and block maintenance for "install failed" hosts will be the solution, as per Comment 19

I'm acking this report with the reqs above

(In reply to comment #21)
> Dafna, can you please open a bz for
> > one is that we can use rest to reinstall a host in non-operational state
>
> For this BZ, block re install for non-operational hosts,
> and block maintenance for "install failed"

Had checked the code, AFAIK, this is currently working like that ...

> hosts will be the solution, as per Comment
> 19
>
> I'm acking this report with the reqs above

fixing in commit: 5242f13

failed QA (sf12)
================
steps:
Reinstalled host
Failed installation
put host in maintenance

from event tab:
2013-Mar-22, 12:50 Host aqua6 is rebooting.
2013-Mar-22, 12:50 Host aqua6 was started by Engine.
2013-Mar-22, 12:49 Manual fence for host aqua6 was started.
2013-Mar-22, 12:49 Host aqua6 was stopped by Engine.
2013-Mar-22, 12:49 Host aqua6 is non-responsive.
2013-Mar-22, 12:49 Host aqua6 was switched to Maintenance mode by admin@internal.

attaching engine log.

Created attachment 714481 [details]
engine.log.sf11

I think the discussion above missed the major point of this scenario.

We must make sure:
- "Install Failed" host should not be allowed to move to maintenance, because there may be various situations under "install failed" that can not be handled.
- Such a host can be removed and then reinstalled. (this should be the official way of handling this scenario)

(In reply to comment #28)
> I think the discussion above missed the major point of this scenario.
>
> We must make sure:
> - "Install Failed" host should not be allowed to move to maintenance,
> because there may be various situations under "install failed" that can not
> be handled.

This is already implemented in MaintenanceVdsCommand::canDoAction

> - Such a host can be removed and then reinstalled. (this should be the
> official
> way of handling this scenario)

This is already implemented in RemoveVdsCommand::canDoAction

(In reply to comment #29)
> (In reply to comment #28)
> > I think the discussion above missed the major point of this scenario.
> >
> > We must make sure:
> > - "Install Failed" host should not be allowed to move to maintenance,
> > because there may be various situations under "install failed" that can not
> > be handled.
>
> This is already implemented in MaintenanceVdsCommand::canDoAction
>
> > - Such a host can be removed and then reinstalled. (this should be the
> > official
> > way of handling this scenario)
>
> This is already implemented in RemoveVdsCommand::canDoAction

The above is valid for 3.1 as well...

Barak, following our talk on this BZ, please advise how to proceed ...

fixed in commit: f813bb9

Should fix : enabling moving non-operational host to maintenance

(In reply to comment #36)
> Should fix : enabling moving non-operational host to maintenance

fixed in commit: 95d2c0e

verified.
- Host failed to install
- Host can be removed or re-installed
- Host cannot go to Maintenance state

3.2 has been released

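
The rules the thread converges on (an "install failed" host may be removed or reinstalled but not moved to maintenance, a non-operational host may be moved to maintenance but not reinstalled, and a host in maintenance may be reinstalled) can be summarised as a small status-to-action table. The sketch below is illustrative only: it is Python rather than the engine's own code, and `HostStatus`, `ALLOWED_ACTIONS` and `can_do_action` are hypothetical names chosen here, not the actual canDoAction implementations referred to in the comments.

```python
from enum import Enum


class HostStatus(Enum):
    # Only the statuses discussed in this report; names are illustrative.
    MAINTENANCE = "Maintenance"
    NON_OPERATIONAL = "NonOperational"
    INSTALL_FAILED = "InstallFailed"


# Hypothetical rule table distilled from the comments in this report.
# It is an assumption about the intended behaviour, not the engine's code.
ALLOWED_ACTIONS = {
    HostStatus.MAINTENANCE: {"reinstall"},
    HostStatus.NON_OPERATIONAL: {"maintenance"},
    HostStatus.INSTALL_FAILED: {"reinstall", "remove"},
}


def can_do_action(status: HostStatus, action: str) -> bool:
    """Return True if the action is allowed for a host in the given status."""
    return action in ALLOWED_ACTIONS.get(status, set())
```

With this table, `can_do_action(HostStatus.INSTALL_FAILED, "maintenance")` is rejected, which matches the verification note that such a host cannot go to Maintenance state.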
Created attachment 690983 [details]
log

Description of problem:

one of the hosts on rhevm-3 got a kernel panic and we decided to reinstall the OS.
after I reinstalled the OS I tried reinstalling the host in rhevm and the re-install failed.
when I tried putting the host in maintenance it went down for reboot.
looking at the log: once the host changes state to preparing for maintenance, the engine sends a query to the host and gets a network exception, because the rhevm network has not been installed yet; the host is then fenced.

Version-Release number of selected component (if applicable):
si26.1

How reproducible:
100%

Steps to Reproduce:
1. install a clean rhel on a host which is already installed in rhevm (host needs to be configured with power management)
2. do not register the host and try to reinstall it in rhevm
3. after install fails -> put the host in maintenance

Actual results:
host is fenced when we put it in maintenance

Expected results:
host should not be fenced.

Additional info: logs

2013-01-31 15:52:09,421 INFO [org.ovirt.engine.core.bll.MaintananceNumberOfVdssCommand] (pool-3-thread-5) [27e46562] Running command: MaintananceNumberOfVdssCommand internal: false. Entities affected : ID: 50077e50-ffcf-11e0-9807-00145e832c40 Type: VDS
2013-01-31 15:52:09,438 INFO [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (pool-3-thread-5) [27e46562] START, SetVdsStatusVDSCommand(HostName = master-vds13, HostId = 50077e50-ffcf-11e0-9807-00145e832c40, status=PreparingForMaintenance, nonOperationalReason=NONE), log id: 6dc3d1c7
2013-01-31 15:52:09,462 INFO [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (pool-3-thread-5) [27e46562] FINISH, SetVdsStatusVDSCommand, log id: 6dc3d1c7
2013-01-31 15:52:09,485 ERROR [org.ovirt.engine.core.vdsbroker.VdsManager] (QuartzScheduler_Worker-95) VDS::handleNetworkException Server failed to respond, vds_id = 50077e50-ffcf-11e0-9807-00145e832c40, vds_name = master-vds13, error = VDSNetworkException:
2013-01-31 15:52:09,563 ERROR [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (QuartzScheduler_Worker-95) vds::refreshVdsStats Failed getVdsStats, vds = 50077e50-ffcf-11e0-9807-00145e832c40 : master-vds13, error = VDSNetworkException: VDSNetworkException:
2013-01-31 15:52:09,580 INFO [org.ovirt.engine.core.bll.MaintananceVdsCommand] (pool-3-thread-5) [27e46562] Running command: MaintananceVdsCommand internal: true. E
2013-01-31 15:52:09,682 INFO [org.ovirt.engine.core.bll.FencingExecutor] (pool-3-thread-3) Executing <Status> Power Management command, Proxy Host:master-vds8, Agent:bladecenter, Target Host:master-vds13, Management IP:qabc3-mgmt.qa.lab.tlv.redhat.com, User:USERID, Options:port,secure=False,slot=6
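
To check another engine.log for the same sequence described in this report (status set to PreparingForMaintenance, then a VDSNetworkException, then a power management command from FencingExecutor), a small script along these lines may help. It is only a sketch: the `ENTRY` pattern, the `STAGES` labels and `trace_stages` are names assumed here for illustration, and the sample entries are taken from the excerpt in this report with the line-wrap breaks rejoined.

```python
import re

# Timestamp, level and logger class at the start of an engine.log entry.
ENTRY = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+\[(?P<logger>[\w.]+)\]"
)

# Hypothetical stage labels paired with the substring that identifies each stage.
STAGES = [
    ("maintenance requested", "PreparingForMaintenance"),
    ("network exception", "VDSNetworkException"),
    ("power management command", "FencingExecutor"),
]

# Entries from the log excerpt in this report (wrapped fragments rejoined).
SAMPLE = [
    "2013-01-31 15:52:09,438 INFO [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] "
    "(pool-3-thread-5) [27e46562] START, SetVdsStatusVDSCommand(HostName = master-vds13, "
    "HostId = 50077e50-ffcf-11e0-9807-00145e832c40, status=PreparingForMaintenance, "
    "nonOperationalReason=NONE), log id: 6dc3d1c7",
    "2013-01-31 15:52:09,485 ERROR [org.ovirt.engine.core.vdsbroker.VdsManager] "
    "(QuartzScheduler_Worker-95) VDS::handleNetworkException Server failed to respond, "
    "vds_id = 50077e50-ffcf-11e0-9807-00145e832c40, vds_name = master-vds13, "
    "error = VDSNetworkException:",
    "2013-01-31 15:52:09,682 INFO [org.ovirt.engine.core.bll.FencingExecutor] "
    "(pool-3-thread-3) Executing <Status> Power Management command, Proxy Host:master-vds8, "
    "Agent:bladecenter, Target Host:master-vds13, "
    "Management IP:qabc3-mgmt.qa.lab.tlv.redhat.com, User:USERID, "
    "Options:port,secure=False,slot=6",
]


def trace_stages(lines):
    """Return (timestamp, label) for the first entry matching each stage, in log order."""
    seen = set()
    found = []
    for line in lines:
        m = ENTRY.match(line)
        if not m:
            continue  # skip continuation lines and anything without a timestamp
        for label, needle in STAGES:
            if label not in seen and needle in line:
                seen.add(label)
                found.append((m.group("ts"), label))
                break
    return found


if __name__ == "__main__":
    for ts, label in trace_stages(SAMPLE):
        print(ts, label)
```

If all three stages show up within the same second or so for a host whose install has failed, the log is consistent with the scenario reported here; the script does not by itself prove that the fence was caused by the maintenance request.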