Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1057232

Summary: engine: cannot change to a different cpu family after installation without removing host from cluster
Product: [Retired] oVirt
Reporter: Dafna Ron <dron>
Component: ovirt-engine-core
Assignee: Martin Perina <mperina>
Status: CLOSED CURRENTRELEASE
QA Contact: Lukas Svaty <lsvaty>
Severity: medium
Docs Contact:
Priority: unspecified
Version: unspecified
CC: acathrow, bazulay, emesika, gklein, iheim, knesenko, lsvaty, mavital, michal.skrivanek, nsednev, yeylon
Target Milestone: ---
Keywords: Reopened
Target Release: 3.4.0
Hardware: x86_64
OS: Linux
Whiteboard: virt
Fixed In Version: ovirt-3.4.0-ga
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-03-31 12:31:50 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments: log (flags: none)

Description Dafna Ron 2014-01-23 16:52:24 UTC
Created attachment 854478: log

Description of problem:

I installed a host in a cluster set to an Intel CPU type.
After installation, when the host becomes non-operational, I can change the cluster's CPU to a different CPU in the same family, but if I try to move to AMD we get a CanDoAction failure:

2014-01-23 11:40:05,742 WARN  [org.ovirt.engine.core.bll.UpdateVdsGroupCommand] (ajp--127.0.0.1-8702-7) [c1d0213] CanDoAction of action UpdateVdsGroup failed. Reasons:VAR__TYPE__CLUSTER,VAR__ACTION__UPDATE,VDS_GROUP_CANNOT_UPDATE_CPU_ILLEGAL
2014-01-23 11:40:14,357 WARN  [org.ovirt.engine.core.bll.UpdateVdsGroupCommand] (ajp--127.0.0.1-8702-5) [5a017050] CanDoAction of action UpdateVdsGroup failed. Reasons:VAR__TYPE__CLUSTER,VAR__ACTION__UPDATE,VDS_GROUP_CANNOT_UPDATE_CPU_ILLEGAL
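
The Reasons field in the log lines above is a comma-separated list of message keys. A minimal Python sketch for pulling those keys out of such a line, assuming the layout shown above; the function name and the regular expression are illustrative and are not part of ovirt-engine:

```python
import re

# Assumed layout, taken from the two log lines above:
#   "... CanDoAction of action <Action> failed. Reasons:<KEY>,<KEY>,..."
CAN_DO_ACTION_RE = re.compile(
    r"CanDoAction of action (?P<action>\w+) failed\. Reasons:(?P<reasons>\S+)"
)


def parse_can_do_action_failure(line):
    """Return (action, [reason keys]), or None if the line does not match."""
    match = CAN_DO_ACTION_RE.search(line)
    if match is None:
        return None
    return match.group("action"), match.group("reasons").split(",")


sample = (
    "2014-01-23 11:40:14,357 WARN  "
    "[org.ovirt.engine.core.bll.UpdateVdsGroupCommand] "
    "(ajp--127.0.0.1-8702-5) [5a017050] CanDoAction of action UpdateVdsGroup "
    "failed. Reasons:VAR__TYPE__CLUSTER,VAR__ACTION__UPDATE,"
    "VDS_GROUP_CANNOT_UPDATE_CPU_ILLEGAL"
)
print(parse_can_do_action_failure(sample))
```

The last key, VDS_GROUP_CANNOT_UPDATE_CPU_ILLEGAL, is the one that corresponds to the webadmin error quoted below.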

webadmin error: 

Error while executing action: Cannot change Cluster CPU type when there are Hosts attached to this Cluster.

Version-Release number of selected component (if applicable):

ovirt-engine-3.4.0-0.5.beta1.el6.noarch

How reproducible:

100%

Steps to Reproduce:
1. Install an AMD host into a cluster whose CPU type is from the Intel family.
2. After installation, move the host to maintenance and change the cluster CPU to a different Intel CPU (this succeeds).
3. With the host in maintenance, change the cluster CPU to a different family (this fails).

Actual results:

we cannot change cpu family without removing host from cluster

Expected results:

if host is in maintenance we should be able to change family. 
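
The gap between the actual and expected results can be written as two validation rules. The Python sketch below is illustrative only: the class and function names are hypothetical, the CPU type is simplified to a vendor family string, and none of it mirrors the real code in UpdateVdsGroupCommand.

```python
from dataclasses import dataclass, field
from enum import Enum


class HostStatus(Enum):
    UP = "up"
    MAINTENANCE = "maintenance"
    NON_OPERATIONAL = "non-operational"


@dataclass
class Cluster:
    cpu_family: str  # simplified to a vendor family such as "intel" or "amd"
    host_statuses: list = field(default_factory=list)


def reported_rule(cluster, new_family):
    """Behaviour reported here: a family change needs an empty cluster."""
    return new_family == cluster.cpu_family or not cluster.host_statuses


def requested_rule(cluster, new_family):
    """Behaviour requested here: a family change is also allowed when every
    attached host is in maintenance (all() is True for an empty cluster)."""
    if new_family == cluster.cpu_family:
        return True
    return all(s is HostStatus.MAINTENANCE for s in cluster.host_statuses)


# One host attached, already moved to maintenance, as in step 3 above.
cluster = Cluster("intel", [HostStatus.MAINTENANCE])
print(reported_rule(cluster, "amd"), requested_rule(cluster, "amd"))
```

Under the requested rule a cluster with any host that is not in maintenance would still reject the change, so running hosts are never affected.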

Additional info: log

Comment 1 Itamar Heim 2014-01-26 08:11:30 UTC
Setting target release to current version for consideration and review. please
do not push non-RFE bugs to an undefined target release to make sure bugs are
reviewed for relevancy, fix, closure, etc.

Comment 2 Martin Perina 2014-02-17 10:48:16 UTC
Changing cluster CPU family when all hosts are in Maintenance is IMO problematic. The only case where it's useful is the one in reproducing steps (select wrong CPU family when creating cluster and find this out when 1st host is installed). In all other cases (you have running cluster, put all hosts to Maintenance and try to change CPU family) I don't see how this can be needed in production.

And all these changing CPU family cases can be solved using this: create new cluster, move hosts to new cluster, delete old cluster.

So I close this as WONTFIX, but if some more reasons are found to implement this, please feel free to reopen.

Comment 3 Dafna Ron 2014-02-17 11:53:14 UTC
I am not sure why this limitation has been added or why changing a cpu family is suddenly problematic. 
if a host is inconsistent with the cpu selected it will become non-operational - this has always been the flow and it's still the flow now.  
I don't see a reason why a user has to create a new cluster in case they made a mistake in the cpu family. not very user friendly in my eyes and I assure you it can happen.

Comment 4 Martin Perina 2014-02-17 12:08:35 UTC
(In reply to Dafna Ron from comment #3)
> I am not sure why this limitation has been added or why changing a cpu
> family is suddenly problematic. 

It's not problematic to implement this. But except case provided in reproducing steps, I don't see any other production cases where this would be beneficial.

> if a host is inconsistent with the cpu selected it will become
> non-operational - this has always been the flow and it's still the flow now.

That's true

> 
> I don't see a reason why a user has to create a new cluster in case they
> made a mistake in the cpu family. not very user friendly in my eyes and I
> assure you it can happen.

IMO case provided in reproducing steps is corner case. In all other cases IMO allowing to change CPU family can cause more harm than good ...

Comment 5 Dafna Ron 2014-02-17 12:50:56 UTC
(In reply to Martin Perina from comment #4)
> (In reply to Dafna Ron from comment #3)
> > I am not sure why this limitation has been added or why changing a cpu
> > family is suddenly problematic. 
> 
> It's not problematic to implement this. But except case provided in
> reproducing steps, I don't see any other production cases where this would
> be beneficial.
> 
> > if a host is inconsistent with the cpu selected it will become
> > non-operational - this has always been the flow and it's still the flow now.
> 
> That's true
> 
> > 
> > I don't see a reason why a user has to create a new cluster in case they
> > made a mistake in the cpu family. not very user friendly in my eyes and I
> > assure you it can happen.

not true... making a mistake in selecting the family is not a corner case. 
> 
> IMO case provided in reproducing steps is corner case. In all other cases
> IMO allowing to change CPU family can cause more harm than good ...

Please elaborate? do we have any bugs for other more destructive cases?

Comment 6 Martin Perina 2014-02-17 13:14:10 UTC
(In reply to Dafna Ron from comment #5)
> (In reply to Martin Perina from comment #4)
> > (In reply to Dafna Ron from comment #3)
> > > I am not sure why this limitation has been added or why changing a cpu
> > > family is suddenly problematic. 
> > 
> > It's not problematic to implement this. But except case provided in
> > reproducing steps, I don't see any other production cases where this would
> > be beneficial.
> > 
> > > if a host is inconsistent with the cpu selected it will become
> > > non-operational - this has always been the flow and it's still the flow now.
> > 
> > That's true
> > 
> > > 
> > > I don't see a reason why a user has to create a new cluster in case they
> > > made a mistake in the cpu family. not very user friendly in my eyes and I
> > > assure you it can happen.
> 
> not true... making a mistake in selecting the family is not a corner case. 

For me it's corner case if you count numbers of correct and incorrect CPU family settings once defining cluster. Please ask yourself, how many times did you make such a mistake when creating cluster?

> > 
> > IMO case provided in reproducing steps is corner case. In all other cases
> > IMO allowing to change CPU family can cause more harm than good ...
> 
> Please elaborate? do we have any bugs for other more destructive cases?

If I omit case described in reproducing steps, I can see these cases:

1) Cluster is empty (no hosts) -> you can change any values you want (that's true even for current code)

2) Cluster has some hosts. Now I would make an assumption that some hosts are Up so this won't be the same case as in reproducing steps. Agreed? So I will move all my hosts to Maintenance, change CPU family and Activate them again. Result: all hosts became Non responsive due to incompatible CPU family. So why would I want to allow this change?

Comment 7 Dafna Ron 2014-02-17 13:31:55 UTC
(In reply to Martin Perina from comment #6)
> (In reply to Dafna Ron from comment #5)
> > (In reply to Martin Perina from comment #4)
> > > (In reply to Dafna Ron from comment #3)
> > > > I am not sure why this limitation has been added or why changing a cpu
> > > > family is suddenly problematic. 
> > > 
> > > It's not problematic to implement this. But except case provided in
> > > reproducing steps, I don't see any other production cases where this would
> > > be beneficial.
> > > 
> > > > if a host is inconsistent with the cpu selected it will become
> > > > non-operational - this has always been the flow and it's still the flow now.
> > > 
> > > That's true
> > > 
> > > > 
> > > > I don't see a reason why a user has to create a new cluster in case they
> > > > made a mistake in the cpu family. not very user friendly in my eyes and I
> > > > assure you it can happen.
> > 
> > not true... making a mistake in selecting the family is not a corner case. 
> 
> For me it's corner case if you count numbers of correct and incorrect CPU
> family settings once defining cluster. Please ask yourself, how many times
> did you make such a mistake when creating cluster?

lots of times actually :) user error happens a lot which is why we have a very nice error in the webadmin for host moved to not operational state because of wrong cpu type after installation. 
> 
> > > 
> > > IMO case provided in reproducing steps is corner case. In all other cases
> > > IMO allowing to change CPU family can cause more harm than good ...
> > 
> > Please elaborate? do we have any bugs for other more destructive cases?
> 
> If I omit case described in reproducing steps, I can see these cases:
> 
> 1) Cluster is empty (no hosts) -> you can change any values you want (that's
> true even for current code)

not sure what you mean in this... this is not a destructive case. I know we can change the cluster family if there are no hosts in the cluster - not really related to this case. 
> 
> 2) Cluster has some hosts. Now I would make an assumption that some hosts
> are Up so this won't be the same case as in reproducing steps. Agreed? So I
> will move all my hosts to Maintenance, change CPU family and Activate them
> again. Result: all hosts became Non responsive due to incompatible CPU
> family. So why would I want to allow this change?

1. wrong cpu moves hosts to 'non-operational' and not to 'not-responsive' and it has a very clear error in the webadmin. 
2. again, not really sure what is destructive about this case? if the user changed the cpu type while the hosts were in maintenance and gets an error on wrong cpu type once he starts the hosts it's easily fixed and we have a very clear error message in the UI. however, not allowing a user to change a family without creating a new cluster is not user friendly at all...
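
The flow debated in this comment (hosts in maintenance, cluster CPU family changed, hosts activated) can be simulated in a few lines of Python. This is a simplified sketch: treating compatibility as a plain family match is an assumption made for brevity, and every name in it is hypothetical.

```python
def status_after_activation(host_family, cluster_family):
    """A host whose CPU does not match the cluster becomes non-operational
    (not non-responsive); otherwise it comes up."""
    return "up" if host_family == cluster_family else "non-operational"


def activate_all(host_families, cluster_family):
    """Activate every host and report the resulting status per host."""
    return {
        name: status_after_activation(family, cluster_family)
        for name, family in host_families.items()
    }


hosts = {"host1": "intel", "host2": "intel"}

# Wrong family chosen while the hosts were in maintenance.
after_mistake = activate_all(hosts, "amd")

# Recovery: back to maintenance, restore the family, activate again.
after_fix = activate_all(hosts, "intel")

print(after_mistake)
print(after_fix)
```

In this model the mistake is fully reversible: no host state is lost, and restoring the family brings every host back up.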

Comment 8 Itamar Heim 2014-02-17 13:48:22 UTC
michal - please review

Comment 9 Martin Perina 2014-02-17 13:58:14 UTC
(In reply to Dafna Ron from comment #7)
> (In reply to Martin Perina from comment #6)
> > (In reply to Dafna Ron from comment #5)
> > > (In reply to Martin Perina from comment #4)
> > > > (In reply to Dafna Ron from comment #3)
> > > > > I am not sure why this limitation has been added or why changing a cpu
> > > > > family is suddenly problematic. 
> > > > 
> > > > It's not problematic to implement this. But except case provided in
> > > > reproducing steps, I don't see any other production cases where this would
> > > > be beneficial.
> > > > 
> > > > > if a host is inconsistent with the cpu selected it will become
> > > > > non-operational - this has always been the flow and it's still the flow now.
> > > > 
> > > > That's true
> > > > 
> > > > > 
> > > > > I don't see a reason why a user has to create a new cluster in case they
> > > > > made a mistake in the cpu family. not very user friendly in my eyes and I
> > > > > assure you it can happen.
> > > 
> > > not true... making a mistake in selecting the family is not a corner case. 
> > 
> > For me it's corner case if you count numbers of correct and incorrect CPU
> > family settings once defining cluster. Please ask yourself, how many times
> > did you make such a mistake when creating cluster?
> 
> lots of times actually :) user error happens a lot which is why we have a
> very nice error in the webadmin for host moved to not operational state
> because of wrong cpu type after installation. 

OK, I tested on myself and it never happened to me ...

> > 
> > > > 
> > > > IMO case provided in reproducing steps is corner case. In all other cases
> > > > IMO allowing to change CPU family can cause more harm than good ...
> > > 
> > > Please elaborate? do we have any bugs for other more destructive cases?
> > 
> > If I omit case described in reproducing steps, I can see these cases:
> > 
> > 1) Cluster is empty (no hosts) -> you can change any values you want (that's
> > true even for current code)
> 
> not sure what you mean in this... this is not a destructive case. I know we
> can change the cluster family if there are no hosts in the cluster - not
> really related to this case. 

I just stated cases that can happen, 'destructive' term is irrelevant here ...

> > 
> > 2) Cluster has some hosts. Now I would make an assumption that some hosts
> > are Up so this won't be the same case as in reproducing steps. Agreed? So I
> > will move all my hosts to Maintenance, change CPU family and Activate them
> > again. Result: all hosts became Non responsive due to incompatible CPU
> > family. So why would I want to allow this change?
> 
> 1. wrong cpu moves hosts to 'non-operational' and not to 'not-responsive'
> and it has a very clear error in the webadmin. 

Correct, but it's not important here. User has to move them to Maintenance and change CPU back in order to reactivate them again

> 2. again, not really sure what is destructive about this case? if the user
> changed the cpu type while the hosts were in maintenance and gets an error
> on wrong cpu type once he starts the hosts it's easily fixed and we have a
> very clear error message in the UI. however, not allowing a user to change a
> family without creating a new cluster is not user friendly at all...

I didn't say anything about destructive. All I wanted to express: is this kind of error happening so many times that it will be beneficial to allow this change considering that there's an easy workaround?

Comment 10 Dafna Ron 2014-02-17 14:34:32 UTC
Martin, I still cannot understand why we need this CanDoAction. 
if the user selects the wrong cpu family and tries to activate the host - they cannot with a very clear error message. 
if the user tries to change cpu family while the hosts are active - they cannot with a very clear error message. 
this limitation, which was recently added, creates a complication for the user so why not remove it?

Comment 11 Martin Perina 2014-02-18 08:20:15 UTC
(In reply to Dafna Ron from comment #10)
> Martin, I still cannot understand why we need this CanDoAction. 
> if the user selects the wrong cpu family and tries to activate the host -
> they cannot with a very clear error message. 
> if the user tries to change cpu family while the hosts are active - they
> cannot with a very clear error message. 
> this limitation, which was recently added, creates a complication for the
> user so why not remove it?

I looked at code history and this limitation (cannot change CPU family while there are hosts in cluster) exists from the beginning.

As I said before, I can do the change (allow change CPU family while all hosts in cluster are in Maintenance), I'm just not sure if we should allow it.

Please wait for Michal's opinion on this case

Comment 12 Michal Skrivanek 2014-02-27 16:03:48 UTC
if the patch is small/simple i don't see a problem allowing it. Once we opened the gates of hell for hosted engine we can as well allow this:)

Comment 13 Martin Perina 2014-03-03 13:49:45 UTC
Too much automation, merged only to master

Comment 14 Martin Perina 2014-03-12 13:24:33 UTC
Included in oVirt 3.4 RC2

Comment 15 Lukas Svaty 2014-03-18 20:33:58 UTC
verified in av3

Comment 16 Sandro Bonazzola 2014-03-31 12:31:50 UTC
this is an automated message: moving to Closed CURRENT RELEASE since oVirt 3.4.0 has been released