Bug 906389 - engine: we are fencing a host when putting it in maintenance after failed reinstall.
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.1.2
Hardware: x86_64 Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.2.0
Assigned To: Eli Mesika
QA Contact: Tareq Alayan
Whiteboard: infra
Keywords: Regression, Reopened
Depends On:
Blocks:

Reported: 2013-01-31 09:55 EST by Dafna Ron
Modified: 2016-02-10 14:29 EST
CC: 13 users

See Also:
Fixed In Version: sf14
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-02-18 08:50:56 EST
Type: Bug
oVirt Team: Infra


Attachments
log (348.14 KB, application/x-xz), 2013-01-31 09:55 EST, Dafna Ron
engine.log.sf11 (20.14 KB, application/x-gzip), 2013-03-22 06:56 EDT, Tareq Alayan


External Trackers
oVirt gerrit 12284
oVirt gerrit 13746
oVirt gerrit 14127

Description Dafna Ron 2013-01-31 09:55:41 EST
Created attachment 690983 [details]
log

Description of problem:

One of the hosts on rhevm-3 got a kernel panic and we decided to reinstall the OS.
After I reinstalled the OS I tried reinstalling the host in rhevm, and the reinstall failed.
When I then tried putting the host in maintenance, it went down for reboot.
Looking at the log: from the moment the host changes state to PreparingForMaintenance we send a query to the host and get a network exception; since the rhevm network has not yet been set up on the freshly installed host, the host is fenced.

Version-Release number of selected component (if applicable):

si26.1

How reproducible:

100%

Steps to Reproduce:
1. install a clean rhel on a host which is already installed in rhevm  (host needs to be configured with power management) 
2. do not register the host and try to reinstall it in rhevm
3. after install fails -> put the host in maintenance 
  
Actual results:

host is fenced when we put it in maintenance 

Expected results:

host should not be fenced. 

Additional info:logs

2013-01-31 15:52:09,421 INFO  [org.ovirt.engine.core.bll.MaintananceNumberOfVdssCommand] (pool-3-thread-5) [27e46562] Running command: MaintananceNumberOfVdssCommand internal: false. Entities affected :  ID: 50077e50-ffcf-11e0-9807-00145e832c40 Type: VDS
2013-01-31 15:52:09,438 INFO  [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (pool-3-thread-5) [27e46562] START, SetVdsStatusVDSCommand(HostName = master-vds13, HostId = 50077e50-ffcf-11e0-9807-00145e832c40, status=PreparingForMaintenance, nonOperationalReason=NONE), log id: 6dc3d1c7
2013-01-31 15:52:09,462 INFO  [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (pool-3-thread-5) [27e46562] FINISH, SetVdsStatusVDSCommand, log id: 6dc3d1c7
2013-01-31 15:52:09,485 ERROR [org.ovirt.engine.core.vdsbroker.VdsManager] (QuartzScheduler_Worker-95) VDS::handleNetworkException Server failed to respond,  vds_id = 50077e50-ffcf-11e0-9807-00145e832c40, vds_name = master-vds13, error = VDSNetworkException:

2013-01-31 15:52:09,563 ERROR [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (QuartzScheduler_Worker-95) vds::refreshVdsStats Failed getVdsStats,  vds = 50077e50-ffcf-11e0-9807-00145e832c40 : master-vds13, error = VDSNetworkException: VDSNetworkException:
2013-01-31 15:52:09,580 INFO  [org.ovirt.engine.core.bll.MaintananceVdsCommand] (pool-3-thread-5) [27e46562] Running command: MaintananceVdsCommand internal: true. E

2013-01-31 15:52:09,682 INFO  [org.ovirt.engine.core.bll.FencingExecutor] (pool-3-thread-3) Executing <Status> Power Management command, Proxy Host:master-vds8, Agent:bladecenter, Target Host:master-vds13, Management IP:qabc3-mgmt.qa.lab.tlv.redhat.com, User:USERID, Options:port,secure=False,slot=6
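The decision path visible in this log can be condensed into a short sketch. This is not the actual engine code; all names here are hypothetical and only illustrate the bug: the fencing decision keys off the network exception and power management configuration alone, without asking whether the host's (re)install ever completed:

```java
// Hypothetical condensation of the fencing decision seen in the log.
// None of these names come from the real ovirt-engine code base.
public class FencingDecision {

    // Buggy path: an unresponsive host with power management configured
    // is fenced, even when it is only "unresponsive" because a freshly
    // reinstalled OS has no rhevm network yet.
    public static boolean shouldFenceBuggy(boolean networkException,
                                           boolean pmConfigured) {
        return networkException && pmConfigured;
    }

    // Expected behavior per this report: a host whose (re)install failed
    // must not be fenced on a network exception.
    public static boolean shouldFenceExpected(boolean networkException,
                                              boolean pmConfigured,
                                              boolean installFailed) {
        return networkException && pmConfigured && !installFailed;
    }
}
```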
Comment 2 Barak 2013-02-03 06:00:45 EST
 (In reply to comment #0)
> Created attachment 690983 [details]
> log
> 
> Description of problem:
> 

> How reproducible:
> 
> 100%
> 
> Steps to Reproduce:
> 1. install a clean rhel on a host which is already installed in rhevm  (host
> needs to be configured with power management)

what was the status of the host before running the yum update?

 
> 2. do not register the host and try to reinstall it in rhevm

I assume you refer to RHN registration.

> 3. after install fails -> put the host in maintenance 
>
Comment 3 Dafna Ron 2013-02-03 06:57:53 EST
> > 1. install a clean rhel on a host which is already installed in rhevm  (host
> > needs to be configured with power management)
> 
> what was the status of the Host before running the yum update ?

There was no yum update - it was a complete installation of a clean OS.
But the host has to be in maintenance state so you can re-install it in rhevm.
> 
>  
> > 2. do not register the host and try to reinstall it in rhevm
> 
> I assume you refer to RHN registration.
> 

Since a host which is not registered with RHN will fail the install, it is an easy way to fail the install in the early stages (so yes).
Comment 5 Barak 2013-02-10 05:37:18 EST
This BZ looks like a duplicate of Bug 894231.
Comment 21 Dan Yasny 2013-02-21 07:24:08 EST
Dafna, can you please open a bz for 
> one is that we can use rest to reinstall a host in non-operational state

For this BZ, blocking reinstall for non-operational hosts and blocking maintenance for "install failed" hosts will be the solution, as per comment 19.

I'm acking this report with the reqs above
Comment 23 Eli Mesika 2013-02-21 08:04:45 EST
(In reply to comment #21)
> Dafna, can you please open a bz for 
> > one is that we can use rest to reinstall a host in non-operational state
> 
> For this BZ,  block re install for non-operational hosts, 
>and block maintenance for "install failed"

I checked the code; AFAIK this is currently working like that ...

> hosts will be the solution, as per Comment
> 19
> 
> I'm acking this report with the reqs above
Comment 24 Eli Mesika 2013-02-25 13:29:57 EST
fixing in commit: 5242f13
Comment 26 Tareq Alayan 2013-03-22 06:54:52 EDT
failed QA (sf12)
================

steps:
Reinstalled host
Failed installation 
put host in maintenance

from event tab:
2013-Mar-22, 12:50 Host aqua6 is rebooting.
	
2013-Mar-22, 12:50 Host aqua6 was started by Engine.
	
2013-Mar-22, 12:49 Manual fence for host aqua6 was started.
		
2013-Mar-22, 12:49 Host aqua6 was stopped by Engine.
	
2013-Mar-22, 12:49 Host aqua6 is non-responsive.
	
2013-Mar-22, 12:49 Host aqua6 was switched to Maintenance mode by admin@internal.

attaching engine log.
Comment 27 Tareq Alayan 2013-03-22 06:56:53 EDT
Created attachment 714481 [details]
engine.log.sf11
Comment 28 Barak 2013-03-24 06:41:17 EDT
I think the discussion above missed the major point of this scenario.

We must make sure:
- An "Install Failed" host should not be allowed to move to maintenance, because there may be various situations under "install failed" that cannot be handled.
- Such a host can be removed and then reinstalled. (This should be the official way of handling this scenario.)
Comment 29 Eli Mesika 2013-03-24 16:40:05 EDT
(In reply to comment #28)
> I think the discussion above missed the major point of this scenario.
> 
> We must make sure:
> - "Install Failed" host should not be allowed to move to maintenance,
> because there may be various situations under "install failed" that can not
> be handled.

This is already implemented in MaintenanceVdsCommand::canDoAction

> - Such a host can be removed and than reinstalled. (this should be the
> official 
> way of handling this scenario)

This is already implemented in RemoveVdsCommand::canDoAction
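As a rough illustration of the two canDoAction guards described above (simplified, and with assumed enum and method names; the real MaintenanceVdsCommand and RemoveVdsCommand validations check considerably more state):

```java
// Simplified sketch of the two guards: maintenance is blocked for a host
// in "install failed", while removal (followed by re-adding the host) is
// the supported way out. All names here are assumptions for illustration.
public class HostLifecycleGuard {

    public enum HostStatus { UP, NON_OPERATIONAL, INSTALL_FAILED, MAINTENANCE }

    // MaintenanceVdsCommand-style check: refuse maintenance for a host
    // stuck in INSTALL_FAILED (its real state is unknown to the engine).
    public static boolean canMoveToMaintenance(HostStatus status) {
        return status != HostStatus.INSTALL_FAILED;
    }

    // RemoveVdsCommand-style check: removal is allowed from MAINTENANCE
    // and from INSTALL_FAILED, so the host can be removed and reinstalled.
    public static boolean canRemove(HostStatus status) {
        return status == HostStatus.MAINTENANCE
            || status == HostStatus.INSTALL_FAILED;
    }
}
```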
Comment 30 Eli Mesika 2013-04-02 08:50:44 EDT
(In reply to comment #29)
> (In reply to comment #28)
> > I think the discussion above missed the major point of this scenario.
> > 
> > We must make sure:
> > - "Install Failed" host should not be allowed to move to maintenance,
> > because there may be various situations under "install failed" that can not
> > be handled.
> 
> This is already implemented in MaintenanceVdsCommand::canDoAction
> 
> > - Such a host can be removed and than reinstalled. (this should be the
> > official 
> > way of handling this scenario)
> 
> This is already implemented in RemoveVdsCommand::canDoAction

The above is valid for 3.1 as well...
Comment 31 Eli Mesika 2013-04-06 15:43:02 EDT
Barak, following our talk on this BZ, please advise how to proceed ...
Comment 34 Eli Mesika 2013-04-15 02:57:35 EDT
fixed in commit: f813bb9
Comment 36 Eli Mesika 2013-04-22 08:41:20 EDT
Should fix : enabling moving non-operational host to maintenance
Comment 37 Eli Mesika 2013-04-22 09:51:37 EDT
(In reply to comment #36)
> Should fix : enabling moving non-operational host to maintenance

fixed in commit: 95d2c0e
Comment 38 Tareq Alayan 2013-05-19 09:37:33 EDT
verified. 

- Host failed to install 
- Host can be removed or re-installed
- Host cannot go to Maintenance state
Comment 39 Itamar Heim 2013-06-11 04:22:26 EDT
3.2 has been released
