Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1279303

Summary:	[3.5] Race condition is seen when updating a batch of nodes in a cluster using "oadm manage-node"
Product:	OpenShift Container Platform	Reporter:	Johnny Liu <jialiu>
Component:	oc	Assignee:	Fabiano Franz <ffranz>
Status:	CLOSED ERRATA	QA Contact:	Mike Fiedler <mifiedle>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	3.1.0	CC:	aos-bugs, gpei, jialiu, jokerman, mifiedle, mmccomas, sdodson, tdawson, xtian
Target Milestone:	---	Flags:	sdodson: needinfo-
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:	Cause: a race condition occurred when marking a batch of nodes (the issue was verified on 8 or more nodes) as schedulable or unschedulable with "oadm manage-node --schedulable=<true|false>". Consequence: several nodes could not be updated and failed with the "object has been modified" error. Fix: patch only the "unschedulable" field of the node object instead of performing a full update. Result: all nodes can be marked schedulable or unschedulable successfully (tested on a 60-node cluster).	Story Points:	---
Clone Of:
Clones:	1416512		Environment:
Last Closed:	2017-04-12 19:04:34 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1416506, 1416509, 1416512

Description Johnny Liu 2015-11-09 05:44:58 UTC

Description of problem:
When a cluster has many nodes and the "oadm manage-node" command is used to update an attribute on all of them, several nodes cannot be updated. The error message says "the object has been modified; please apply your changes to the latest version and try again", which suggests that a race condition is preventing the nodes' attribute from being updated.

Version-Release number of selected component (if applicable):
atomic-openshift-3.1.0.0-0.git.0.0e71938.el7aos.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Set up a cluster with 60 nodes.
2. Run "oadm manage-node" to disable scheduling on all the nodes.
# oadm manage-node --selector="region=primary" --schedulable=false
NAME                  LABELS                                                           STATUS                     AGE
test1.cluster.local   kubernetes.io/hostname=10.66.81.15,region=primary,zone=default   Ready,SchedulingDisabled   2d
test10.cluster.local   kubernetes.io/hostname=10.66.80.12,region=primary,zone=default   Ready,SchedulingDisabled   2d
test11.cluster.local   
<--snip-->
Error from server: node "test15.cluster.local" cannot be updated: the object has been modified; please apply your changes to the latest version and try again
Error from server: node "test21.cluster.local" cannot be updated: the object has been modified; please apply your changes to the latest version and try again
Error from server: node "test22.cluster.local" cannot be updated: the object has been modified; please apply your changes to the latest version and try again
<--snip-->


Actual results:
Several nodes failed to update.

Expected results:
All the nodes should be updated successfully.

Additional info:

Comment 2 Fabiano Franz 2016-02-11 19:54:28 UTC

I could not reproduce this locally; sending to QE to check whether it is already fixed.

Comment 11 Fabiano Franz 2017-01-17 00:08:32 UTC

Fixed in https://github.com/openshift/origin/pull/12486.
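According to the Doc Text, the fix patches only the "unschedulable" field of the node instead of sending a full update. In this model, a patch that carries no resourceVersion is applied to whatever version is current, so a concurrent write to some other field does not make it fail. The sketch below is again a hypothetical in-memory model, not the change in the pull request: `Store`, `Heartbeat`, `PatchUnschedulable`, and `markAll` are illustrative names. It patches 60 nodes while a simulated concurrent writer touches each one.

```go
package main

import (
	"fmt"
	"sync"
)

// Node is a hypothetical, simplified node object (not the real API type).
type Node struct {
	ResourceVersion int
	Unschedulable   bool
	Heartbeats      int
}

// Store is a hypothetical in-memory stand-in for the API server.
type Store struct {
	mu    sync.Mutex
	nodes map[string]*Node
}

// NewStore creates n nodes named like the ones in this report.
func NewStore(n int) *Store {
	s := &Store{nodes: make(map[string]*Node, n)}
	for i := 1; i <= n; i++ {
		s.nodes[fmt.Sprintf("test%d.cluster.local", i)] = &Node{ResourceVersion: 1}
	}
	return s
}

// Heartbeat simulates another writer modifying the node (assumed status update).
func (s *Store) Heartbeat(name string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	n := s.nodes[name]
	n.Heartbeats++
	n.ResourceVersion++
}

// PatchUnschedulable changes only the unschedulable field on the current
// version of the node; there is no resourceVersion precondition to violate.
func (s *Store) PatchUnschedulable(name string, v bool) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	n, ok := s.nodes[name]
	if !ok {
		return fmt.Errorf("node %q not found", name)
	}
	n.Unschedulable = v
	n.ResourceVersion++
	return nil
}

// markAll patches every node while heartbeats run concurrently and
// returns the number of nodes that failed to update.
func markAll(s *Store, v bool) int {
	names := make([]string, 0, len(s.nodes))
	for name := range s.nodes {
		names = append(names, name)
	}
	var wg sync.WaitGroup
	var mu sync.Mutex
	failures := 0
	for _, name := range names {
		wg.Add(2)
		go func(n string) { defer wg.Done(); s.Heartbeat(n) }(name)
		go func(n string) {
			defer wg.Done()
			if err := s.PatchUnschedulable(n, v); err != nil {
				mu.Lock()
				failures++
				mu.Unlock()
			}
		}(name)
	}
	wg.Wait()
	return failures
}

func main() {
	s := NewStore(60)
	fmt.Println("failures:", markAll(s, true))
}
```

Running it prints `failures: 0`: every node ends up unschedulable, with its resourceVersion bumped once by the simulated heartbeat and once by the patch, regardless of the order in which the two writes land.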

Comment 12 Troy Dawson 2017-01-20 22:54:57 UTC

This has been merged into OCP and is in OCP v3.5.0.7 or newer.

Comment 13 Johnny Liu 2017-01-22 12:15:31 UTC

Verified this bug with atomic-openshift-3.5.0.7-1.git.0.390ef18.el7: PASS.

Comment 14 Fabiano Franz 2017-01-24 23:54:34 UTC

This fix was backported to 3.3 and 3.4, so sending to QA again to be tested in those versions.

Comment 17 Fabiano Franz 2017-01-25 16:30:56 UTC

Opened separate bugs to track 3.3 and 3.4:

https://bugzilla.redhat.com/show_bug.cgi?id=1416506
https://bugzilla.redhat.com/show_bug.cgi?id=1416509

Comment 21 errata-xmlrpc 2017-04-12 19:04:34 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0884