Bug 837539 - Cannot confirm "Host has been rebooted" when having a single host in the system.
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.0.3
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 3.2.0
Assignee: Eli Mesika
QA Contact: Ilanit Stein
URL:
Whiteboard: infra
Duplicates: 895827 918463
Depends On:
Blocks: Simon-RFE-Tracker 948448
 
Reported: 2012-07-04 07:59 UTC by Alexander Chuzhoy
Modified: 2016-02-10 19:43 UTC
CC: 17 users

Fixed In Version: sf11
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
oVirt Team: Infra
Target Upstream Version:
Embargoed:


Attachments
Attached is the log from the rhevm. (1.01 MB, application/octet-stream)
2012-07-04 08:02 UTC, Alexander Chuzhoy


Links
oVirt gerrit 13045 | Private: 0 | Priority: None | Status: None | Summary: None | Last Updated: Never

Description Alexander Chuzhoy 2012-07-04 07:59:12 UTC
Description of problem:

I have a host (rhevh) in the system, and I want to remove it and install it from scratch.
I reinstalled the host without moving it to maintenance, and now there is no action I can take in the setup to fix the state and get the host back to an up state.

More details:

1. Have a single host in the system holding the SPM role.
2. Reinstall the host (via PXE) without bringing it to maintenance mode first.

Currently the host's status is "Non responsive".

I want to remove the host from the setup and re-add it.

I can't remove the host from RHEVM (the button is disabled).
I can't manually confirm the host was rebooted (and release the SPM role, which should hopefully enable me to remove the host from the setup).

So there is no real action I can take to get out of this.


Comment 1 Alexander Chuzhoy 2012-07-04 08:02:31 UTC
Created attachment 596166 [details]
Attached is the log from the rhevm.

Comment 3 Barak 2012-07-15 15:31:42 UTC
Sasha,

Did you upgrade the rhevh (not through rhevm, of course)?

Did you re-register the rhevh?

Comment 5 lpeer 2012-07-16 08:12:34 UTC
(In reply to comment #3)
> Sasha,
> 
> Did you upgrade the rhevh (not through rhevm, of course)?

Sasha reinstalled the host - not through RHEVM

> 
> Did you re-register the rhevh?

He can't register it, as the host id is already in RHEVM; he has to remove the host in RHEVM and add a new host to the system.

Worth adding the info I got from Sasha (when trying to help him with his setup):
after manually releasing the SPM role in the DB, he could mark the host as manually fenced and remove it from his setup.
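For reference, the manual DB workaround described above boils down to clearing the SPM owner on the storage pool row in the engine database. The sketch below is hypothetical: the table and column names (`storage_pool`, `spm_vds_id`) are assumptions, not taken from this bug.

```python
# Hypothetical sketch of the DB workaround described above: clearing the
# SPM owner on the storage pool row. The table and column names
# (storage_pool, spm_vds_id) are assumptions, not confirmed by this bug.

def release_spm_sql(pool_name: str) -> str:
    """Build an UPDATE statement that clears the SPM owner for one pool."""
    return (
        "UPDATE storage_pool "
        "SET spm_vds_id = NULL "
        f"WHERE name = '{pool_name}';"
    )

print(release_spm_sql("Default"))
# -> UPDATE storage_pool SET spm_vds_id = NULL WHERE name = 'Default';
```

After a statement like this runs, the engine no longer considers the dead host the SPM, which is what unblocked the manual fence and removal in Sasha's setup.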

Comment 6 Eli Mesika 2012-07-19 13:47:11 UTC
(In reply to comment #5)
> (In reply to comment #3)
> > Sasha,
> > 
> > Did you upgrade the rhevh (not through rhevm, of course)?
> 
> Sasha reinstalled the host - not through RHEVM
> 
> > 
> > Did you re-register the rhevh?
> 
> he can't register it as the host id is already in RHEVM, he has to remove
> the host in RHEVM and add a new host to the system.
> 
> worth adding the info I got from sasha (when trying to help him with his
> setup) -
> after manually releasing the SPM role in the DB he could mark the host as
> manually fenced and remove it from his setup.

Since I have also faced this more than once, I wonder if the solution should be to enable resetting the host's SPM flag, so that the user can mark the "Approve Host has been rebooted" checkbox for a manual fence and get the SPM election process to run

Comment 7 Barak 2012-07-30 07:30:34 UTC
I assume this is what happened:

1 - The RHEVH was the SPM
2 - It was reinstalled via PXE (meaning a totally new installation)
3 - The RHEVH tried to re-register, but since the host UUID was already in
    the RHEVM db, the approval sequence was not an option.
4 - The RHEVH stayed in 'Non responsive' due to the lack of an appropriate
    certificate.

Douglas - IIRC a new PXE installation will not upgrade the rhevh automatically, is this correct ?
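Step 3 above can be sketched as a simple membership check: the freshly installed host presents a UUID the engine already knows, so the normal approval flow never starts. The UUID values and the dict here are purely illustrative, not engine code.

```python
# Illustrative sketch of step 3 in comment 7: a known host UUID blocks the
# normal approval sequence. Values and names are assumptions, not engine code.

registered_hosts = {"11111111-aaaa": "rhevh1"}   # engine's view: uuid -> name

def may_enter_approval(host_uuid: str) -> bool:
    """Only a UUID unknown to the engine may enter the approval flow."""
    return host_uuid not in registered_hosts

print(may_enter_approval("11111111-aaaa"))  # already registered -> False
print(may_enter_approval("22222222-bbbb"))  # brand-new host -> True
```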

Comment 8 Douglas Schilling Landgraf 2012-08-01 22:04:00 UTC
Hi Barak/Eli,

   Sorry for the delay. By default, it will start the TUI with upgrade options. We need to pass reinstall or firstboot on the PXE command line.

Thanks
Douglas

Comment 9 Barak 2012-08-27 09:45:46 UTC
According to comments 7 & 8:

- Obviously the above is not a supported scenario.
- The only way out of this status is to 'force-remove' the host and then re-register it.
- As force remove has other implications on different flows, and we are late in the game, I would move it to rhevm-future (+ adding RN and a KB for it)

Andrew ?
Itamar ?

Comment 10 Itamar Heim 2012-08-27 19:08:50 UTC
I'm still missing why the host can't be manually fenced and then moved to maintenance after the manual fence (since a manual fence means it is neither the SPM nor has any VMs), then removed from the system normally?

Comment 20 Simon Grinberg 2012-12-03 11:10:38 UTC
The requirement is simple:
We've added the user-based confirmation in order to allow the user to fence the host himself and inform us that this was done.

The above should always work regardless of the host's 'role' in the system, even SPM, as long as the host is non-responsive; the engine trusts the user that he actually did it, like any force operation. This is an engine-side-only operation.

So:
1. Host is non-responsive 
2. User - 'Confirms host has been rebooted' 
3. Host immediately moves to maintenance with all implications, meaning:
   3.1 All VMs on the host are moved to down. 
   3.2 If it was SPM, the SPM role is cleared 

Use cases covered:
1. Any systems without fencing devices 
2. Single host use case
3. Malfunctioning fence device
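The flow specified in this comment can be modeled as a small state transition; the class and function names below are illustrative, not actual engine code.

```python
# Minimal model of the 'Confirm host has been rebooted' flow as specified
# in comment 20. Names are illustrative, not actual ovirt-engine code.

class Host:
    def __init__(self, name, is_spm=False, vms=None):
        self.name = name
        self.status = "NonResponsive"
        self.is_spm = is_spm
        self.vms = list(vms or [])   # VMs last reported on this host

def confirm_host_rebooted(host):
    """Engine-side-only manual fence: trust the user, clean up state."""
    if host.status != "NonResponsive":
        raise ValueError("allowed only for non-responsive hosts")
    downed = list(host.vms)          # 3.1: all VMs on the host go Down
    host.vms.clear()
    host.is_spm = False              # 3.2: the SPM role is cleared
    host.status = "Maintenance"      # 3: host moves to maintenance
    return downed

h = Host("rhevh1", is_spm=True, vms=["vm1"])
print(confirm_host_rebooted(h), h.status, h.is_spm)
# -> ['vm1'] Maintenance False
```

Note that this models the spec as written in comment 20; as the verification in comment 33 later shows, the shipped behavior leaves the maintenance move to the user.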

Comment 23 Simon Grinberg 2013-01-21 15:28:49 UTC
*** Bug 895827 has been marked as a duplicate of this bug. ***

Comment 24 Simon Grinberg 2013-03-13 09:51:02 UTC
*** Bug 918463 has been marked as a duplicate of this bug. ***

Comment 28 Eli Mesika 2013-03-18 10:46:10 UTC
fixed in commit: 998c176

Comment 31 Ilanit Stein 2013-03-21 14:57:50 UTC
Does this bug also resolve https://bugzilla.redhat.com/show_bug.cgi?id=918463, which was marked as its duplicate?

Comment 32 Eli Mesika 2013-03-24 07:56:46 UTC
(In reply to comment #31)
> Does this bug also resolve https://bugzilla.redhat.com/show_bug.cgi?id=918463,
> which was marked as its duplicate?

Yes, in the case of a single host in the system that is the SPM

Comment 33 Ilanit Stein 2013-04-04 06:09:55 UTC
Verification on sf-12: this covers comment 20, except the host's move to maintenance.

Problem reproduction:

A single SPM host in "up" state with a running VM.
Without removing the host from rhevm, the host was installed on another rhevm
and turned "Non Responsive". The VM remained "up". Storage

1. 'Confirm host has been rebooted'

2. Event: "Manual fence did not revoke the selected SPM (cyan-vdse.qa.lab.tlv.redhat.com) since the master storage domain was not active or could not use another host for the fence operation.
SPM cleared to normal, status remained non-responsive."

3. Host didn't move to maintenance, and remained "Non Responsive"!

4. Host SPM role is cleared 

5. VM moved to down

6. Move host manually to maintenance - succeeds

7. Reinstall host in order to turn to "up" eventually.

Comment 34 Eli Mesika 2013-04-04 08:11:26 UTC
(In reply to comment #33)
> Verification on sf-12:  Cover comment 20, except host move to maintenance.
> 
> Problem reproduction:
> 
> SPM "up" single Host and running VM.
> without removing the host from rhevm, host installed on another rhevm,
> and turned into "Non Responsive". VM remained "up". Storage  

How is this related to the original scenario?
The bug was reported on the same rhevm...


> 
> 1. 'Confirm host has been rebooted'
> 
> 2. Event: "Manual fence did not revoke the selected SPM
> (cyan-vdse.qa.lab.tlv.redhat.com) since the master storage domain was not
> active or could not use another host for the fence operation.
> SPM cleared to normal, status remained non-responsive."
> 
> 3. Host didn't move to maintenance, and remained "Non Responsive" !
> 
> 4. Host SPM role is cleared 
> 
> 5. VM moved to down
> 
> 6. Move host manually to maintenance - succeeds
> 
> 7. Reinstall host in order to turn to "up" eventually.

Comment 35 Ilanit Stein 2013-04-04 13:54:55 UTC
Changing the bug to verified, as confirmed with simong that this is OK:
1. The scenario used for verification is in comment 33.
2. The host's move to maintenance is not done automatically by 'Confirm host has been rebooted'. It only clears the SPM role, and the user can move the host to maintenance right afterwards.
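The behavior actually verified here differs from the comment-20 spec in one step: manual fence clears the SPM role and downs the VMs, but the host stays Non Responsive until the user moves it to maintenance. A small illustrative model (names are assumptions, not engine code):

```python
# Model of the verified behavior: 'Confirm host has been rebooted' clears
# SPM and downs the VMs, but does NOT move the host to maintenance; that
# is a separate user-initiated step. Illustrative only.

host = {"status": "NonResponsive", "is_spm": True, "vms": ["vm1"]}

def confirm_rebooted(h):
    h["vms"] = []             # VMs are moved to Down
    h["is_spm"] = False       # SPM role is cleared
    return h                  # status is NOT changed automatically

def move_to_maintenance(h):
    h["status"] = "Maintenance"   # separate, user-initiated step
    return h

confirm_rebooted(host)
print(host["status"], host["is_spm"])   # NonResponsive False
move_to_maintenance(host)
print(host["status"])                   # Maintenance
```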

Comment 36 Itamar Heim 2013-06-11 09:35:01 UTC
3.2 has been released


