Bug 1372594

Summary:	The SchedulingDisabled nodes are schedulable after upgrade
Product:	OpenShift Container Platform	Reporter:	Anping Li <anli>
Component:	Cluster Version Operator	Assignee:	Scott Dodson <sdodson>
Status:	CLOSED ERRATA	QA Contact:	Anping Li <anli>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	3.3.0	CC:	aos-bugs, jokerman, mmccomas
Target Milestone:	---
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:	Previously, an upgrade reset each node's schedulability to the state defined in the inventory used for the upgrade. If the scheduling state had been modified since the inventory file was created, this change was unexpected for administrators. The upgrade process now preserves the current schedulability state, so nodes do not change state after an upgrade.	Story Points:	---
Clone Of:
Clones:	1375718		Environment:
Last Closed:	2016-09-27 09:47:19 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1375718

Description Anping Li 2016-09-02 07:48:13 UTC

Description of problem:
Nodes marked SchedulingDisabled become schedulable after upgrade. By default, all nodes that run masters are SchedulingDisabled after installation.

Version-Release number of selected component (if applicable):
atomic-openshift-utils-3.3.20-1.git.0.d15a8dc.el7.noarch

How reproducible:
always

Steps to Reproduce:
1. Install a two-node OSE 3.2 cluster and check the node status.
[root@anli-working host6]# oc get nodes
NAME                      STATUS                     AGE
host6master.example.com   Ready,SchedulingDisabled   41d
host6node.example.com     Ready                      41d

2. Upgrade to OCP 3.3 and check the node status.
[root@anli-working host6]# oc get nodes
NAME                      STATUS                     AGE
host6master.example.com   Ready                      41d
host6node.example.com     Ready                      41d

Actual results:
host6master.example.com is no longer SchedulingDisabled after the upgrade.

Expected results:
The node scheduling status is the same as before the upgrade.
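
A check like the following could confirm the expected result by comparing the two `oc get nodes` listings from the steps to reproduce. This is only a sketch: the function names are hypothetical, and it assumes the three-column text output shown above (NAME, STATUS, AGE) with comma-separated status flags.

```python
def parse_node_status(output):
    """Map node name -> set of status flags from `oc get nodes` text output."""
    statuses = {}
    for line in output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) >= 2:
            statuses[fields[0]] = set(fields[1].split(","))
    return statuses


def schedulability_changes(before, after):
    """Return the nodes whose SchedulingDisabled flag differs between listings."""
    b = parse_node_status(before)
    a = parse_node_status(after)
    return sorted(
        name
        for name in b
        if name in a
        and ("SchedulingDisabled" in b[name]) != ("SchedulingDisabled" in a[name])
    )


# Listings copied from the steps to reproduce.
BEFORE = """\
NAME                      STATUS                     AGE
host6master.example.com   Ready,SchedulingDisabled   41d
host6node.example.com     Ready                      41d
"""

AFTER = """\
NAME                      STATUS                     AGE
host6master.example.com   Ready                      41d
host6node.example.com     Ready                      41d
"""

if __name__ == "__main__":
    print(schedulability_changes(BEFORE, AFTER))
```

An empty list means every node kept its scheduling state across the upgrade.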

Additional info:

Comment 1 Scott Dodson 2016-09-02 20:04:58 UTC

Hi Anping,

We weren't able to reproduce the scenario you've described, where the master became schedulable after the upgrade when it wasn't before. Can you share your inventory? The only way I can think of to get the results you got is if the master had openshift_schedulable=true in the inventory but was manually set unschedulable before the upgrade.


Regardless, I think the previous behavior could leave an environment in a "correct" but unexpected state, where everything is reset to the values in the inventory rather than to how things were prior to upgrading. Because of this, we've implemented a change that records each node's schedulability before the upgrade and restores the node to that state afterward, ignoring what is in the inventory. I think this is ultimately the right thing to do, even if it doesn't strictly follow what's defined in the inventory.
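
The record-then-restore idea can be illustrated roughly as follows. This is not the code from the pull request (openshift-ansible is written as Ansible playbooks); it is a minimal Python sketch with hypothetical function names. It assumes node state is read from `oc get nodes -o json`, where a node is unschedulable when `spec.unschedulable` is true, and that the state is restored with `oadm manage-node <node> --schedulable=<true|false>`. The sample JSON is trimmed and made up for illustration.

```python
import json


def record_schedulability(nodes_json):
    """Map node name -> True if schedulable, from `oc get nodes -o json` output."""
    items = json.loads(nodes_json)["items"]
    return {
        # spec.unschedulable is absent on schedulable nodes, so default to False.
        item["metadata"]["name"]: not item.get("spec", {}).get("unschedulable", False)
        for item in items
    }


def restore_commands(recorded):
    """Build the commands that would put each node back in its recorded state."""
    return [
        ["oadm", "manage-node", name,
         "--schedulable=" + ("true" if schedulable else "false")]
        for name, schedulable in sorted(recorded.items())
    ]


# Trimmed, illustrative sample of the JSON captured before the upgrade.
SAMPLE = json.dumps({
    "items": [
        {"metadata": {"name": "host6master.example.com"},
         "spec": {"unschedulable": True}},
        {"metadata": {"name": "host6node.example.com"},
         "spec": {}},
    ]
})

if __name__ == "__main__":
    recorded = record_schedulability(SAMPLE)  # taken before the upgrade
    # ... the upgrade would run here ...
    for cmd in restore_commands(recorded):    # applied after the upgrade
        print(" ".join(cmd))
```

The sketch only prints the commands rather than running them, so it can be tried without a cluster. The inventory value is never consulted, which matches the behavior described above.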

https://github.com/openshift/openshift-ansible/pull/2406

What do you think?

Comment 3 Anping Li 2016-09-05 09:20:32 UTC

Scott, the fix works well. Yes, for an upgrade it is better to follow the rule: only modify what must be modified.

Comment 5 errata-xmlrpc 2016-09-27 09:47:19 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1933