1372594 – The SchedulingDisabled nodes are schedulable after upgrade

Summary: The SchedulingDisabled nodes are schedulable after upgrade

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Cluster Version Operator
Sub Component:
Version:	3.3.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	Scott Dodson
QA Contact:	Anping Li
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1375718

Reported:	2016-09-02 07:48 UTC by Anping Li
Modified:	2017-03-08 18:26 UTC
CC List:	3 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:	Previously, an upgrade reset each node's schedulability to the state defined in the inventory used for the upgrade. If the scheduling state had been modified after the inventory file was created, the reset could surprise admins. The upgrade process now preserves the current schedulability state, so nodes do not change state after an upgrade.
Clone Of:
Clones:	1375718
Environment:
Last Closed:	2016-09-27 09:47:19 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2016:1933	0	normal	SHIPPED_LIVE	Red Hat OpenShift Container Platform 3.3 Release Advisory	2016-09-27 13:24:36 UTC

Description Anping Li 2016-09-02 07:48:13 UTC

Description of problem:
Nodes that were SchedulingDisabled become schedulable after upgrade. By default, all nodes that host masters are SchedulingDisabled after installation.

Version-Release number of selected component (if applicable):
atomic-openshift-utils-3.3.20-1.git.0.d15a8dc.el7.noarch

How reproducible:
always

Steps to Reproduce:
1. Install a two-node OSE 3.2 cluster and check node status.
[root@anli-working host6]# oc get nodes
NAME                      STATUS                     AGE
host6master.example.com   Ready,SchedulingDisabled   41d
host6node.example.com     Ready                      41d

2. Upgrade to OCP 3.3 and check node status.
[root@anli-working host6]# oc get nodes
NAME                      STATUS                     AGE
host6master.example.com   Ready                      41d
host6node.example.com     Ready                      41d

Actual results:
The master node is no longer SchedulingDisabled after the upgrade (see the output in step 2).


Expected results:
The node scheduling status is the same as before the upgrade.
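
The check described above can be automated. The following is a minimal sketch, not part of the original report: it parses the text output of `oc get nodes` captured before and after the upgrade and lists nodes whose schedulability changed. The helper names `parse_schedulability` and `changed_nodes` are hypothetical, and the sample text is copied from the steps to reproduce.

```python
from typing import Dict, List


def parse_schedulability(oc_output: str) -> Dict[str, bool]:
    """Map node name -> True if schedulable, based on `oc get nodes` text output."""
    result: Dict[str, bool] = {}
    lines = [ln for ln in oc_output.strip().splitlines() if ln.strip()]
    for line in lines[1:]:  # skip the NAME/STATUS/AGE header row
        fields = line.split()
        name, status = fields[0], fields[1]
        # STATUS is a comma-separated list such as "Ready,SchedulingDisabled"
        result[name] = "SchedulingDisabled" not in status.split(",")
    return result


def changed_nodes(before: Dict[str, bool], after: Dict[str, bool]) -> List[str]:
    """Return the sorted names of nodes whose schedulability differs."""
    return sorted(n for n in before if n in after and before[n] != after[n])


# Sample output taken from steps 1 and 2 of this report.
BEFORE = """\
NAME                      STATUS                     AGE
host6master.example.com   Ready,SchedulingDisabled   41d
host6node.example.com     Ready                      41d
"""

AFTER = """\
NAME                      STATUS                     AGE
host6master.example.com   Ready                      41d
host6node.example.com     Ready                      41d
"""

changed = changed_nodes(parse_schedulability(BEFORE), parse_schedulability(AFTER))
print(changed)  # prints ['host6master.example.com']
```

An empty list would mean the upgrade left every node's scheduling status unchanged, which is the expected result.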

Additional info:

Comment 1 Scott Dodson 2016-09-02 20:04:58 UTC

Hi Anping,

We weren't able to reproduce the scenario you've described, where the master was set to schedulable when it wasn't prior to the upgrade. Can you share your inventory? The only way I can think that you'd get the results you got is if the master had openshift_schedulable=true in the inventory but then was manually set unschedulable before the upgrade.


Regardless, I think the previous behavior could have left an environment in a "correct" but unexpected state, where everything is reset to the values in the inventory rather than how things were prior to upgrading. Because of this, we've implemented a change that records the node's schedulability prior to the upgrade process and restores the node to that state after the upgrade, ignoring what was in the inventory. I think this is ultimately the right thing to do even if it's not strictly doing what's defined in the inventory.

https://github.com/openshift/openshift-ansible/pull/2406

What do you think?
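
For illustration only, the record-and-restore idea in this comment can be sketched as below. This is not the code from the linked pull request: the cluster is modeled as a plain dictionary, and the names `apply_inventory`, `upgrade_resetting_state`, and `upgrade_preserving_state` are hypothetical. The scenario assumes the case suggested above, where the inventory marks the master as schedulable but an admin later set it unschedulable.

```python
from typing import Dict

NodeState = Dict[str, bool]  # node name -> is the node schedulable?


def apply_inventory(cluster: NodeState, inventory: NodeState) -> None:
    """Overwrite each known node's schedulability with the inventory value."""
    for node, schedulable in inventory.items():
        if node in cluster:
            cluster[node] = schedulable


def upgrade_resetting_state(cluster: NodeState, inventory: NodeState) -> None:
    """Model of the previous behavior: nodes end up as the inventory says."""
    apply_inventory(cluster, inventory)


def upgrade_preserving_state(cluster: NodeState, inventory: NodeState) -> None:
    """Model of the changed behavior: record state, upgrade, then restore."""
    recorded = dict(cluster)             # 1. record schedulability before upgrading
    apply_inventory(cluster, inventory)  # 2. upgrade steps may reset the state
    for node, schedulable in recorded.items():
        cluster[node] = schedulable      # 3. restore, ignoring the inventory


# Assumed scenario: inventory says the master is schedulable, but it was
# manually set unschedulable after the inventory was written.
inventory = {"host6master.example.com": True, "host6node.example.com": True}

old = {"host6master.example.com": False, "host6node.example.com": True}
upgrade_resetting_state(old, inventory)

new = {"host6master.example.com": False, "host6node.example.com": True}
upgrade_preserving_state(new, inventory)

print(old["host6master.example.com"], new["host6master.example.com"])
```

In this model the first upgrade makes the master schedulable, while the second leaves it unschedulable, matching the state before the upgrade.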

Comment 3 Anping Li 2016-09-05 09:20:32 UTC

Scott, the fix works well. Yes, for upgrades it is better to follow the rule: only modify what must be modified.

Comment 5 errata-xmlrpc 2016-09-27 09:47:19 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1933
