Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1279303

Summary: [3.5] Race condition is seen when updating a batch of nodes in cluster using "oadm manage-node"
Product: OpenShift Container Platform    Reporter: Johnny Liu <jialiu>
Component: oc    Assignee: Fabiano Franz <ffranz>
Status: CLOSED ERRATA    QA Contact: Mike Fiedler <mifiedle>
Severity: medium    Docs Contact:
Priority: medium
Version: 3.1.0    CC: aos-bugs, gpei, jialiu, jokerman, mifiedle, mmccomas, sdodson, tdawson, xtian
Target Milestone: ---    Flags: sdodson: needinfo-
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:    Doc Type: Bug Fix
Doc Text:
Cause: A race condition occurred when updating a batch of nodes (the issue was verified on 8+ nodes) to schedulable or unschedulable with "oadm manage-node --schedulable=<true|false>".
Consequence: Several nodes could not be updated and failed with the "object has been modified" error.
Fix: Use a patch on the "unschedulable" field of the node object instead of a full update.
Result: All nodes can be properly set to schedulable and/or unschedulable (tested on a 60-node cluster).
Story Points: ---
Clone Of:
Clones: 1416512
Environment:
Last Closed: 2017-04-12 19:04:34 UTC    Type: Bug
Regression: ---    Mount Type: ---
Documentation: ---    CRM:
Verified Versions:    Category: ---
oVirt Team: ---    RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---    Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1416506, 1416509, 1416512    

Description Johnny Liu 2015-11-09 05:44:58 UTC
Description of problem:
When a cluster has many nodes and the "oadm manage-node" command is used to update an attribute on all of them, several nodes cannot be updated. The error message says "the object has been modified; please apply your changes to the latest version and try again", so it seems a race condition is preventing those nodes' attributes from being updated.

Version-Release number of selected component (if applicable):
atomic-openshift-3.1.0.0-0.git.0.0e71938.el7aos.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Set up a cluster with 60 nodes.
2. Run the "oadm manage-node" command to disable scheduling for all the nodes.
# oadm manage-node --selector="region=primary" --schedulable=false
NAME                  LABELS                                                           STATUS                     AGE
test1.cluster.local   kubernetes.io/hostname=10.66.81.15,region=primary,zone=default   Ready,SchedulingDisabled   2d
test10.cluster.local   kubernetes.io/hostname=10.66.80.12,region=primary,zone=default   Ready,SchedulingDisabled   2d
test11.cluster.local   
<--snip-->
Error from server: node "test15.cluster.local" cannot be updated: the object has been modified; please apply your changes to the latest version and try again
Error from server: node "test21.cluster.local" cannot be updated: the object has been modified; please apply your changes to the latest version and try again
Error from server: node "test22.cluster.local" cannot be updated: the object has been modified; please apply your changes to the latest version and try again
<--snip-->


Actual results:
Several nodes failed to be updated.

Expected results:
All the nodes should be updated successfully.

Additional info:

Comment 2 Fabiano Franz 2016-02-11 19:54:28 UTC
I could not reproduce this locally; sending to QE to check whether it is already fixed.

Comment 11 Fabiano Franz 2017-01-17 00:08:32 UTC
Fixed in https://github.com/openshift/origin/pull/12486.

Comment 12 Troy Dawson 2017-01-20 22:54:57 UTC
This has been merged into OCP and is in OCP v3.5.0.7 or newer.

Comment 13 Johnny Liu 2017-01-22 12:15:31 UTC
Verified this bug with atomic-openshift-3.5.0.7-1.git.0.390ef18.el7, and PASS.

Comment 14 Fabiano Franz 2017-01-24 23:54:34 UTC
This fix was backported to 3.3 and 3.4, so sending to QA again to be tested in those versions.

Comment 17 Fabiano Franz 2017-01-25 16:30:56 UTC
Opened separate bugs to track 3.3 and 3.4:

https://bugzilla.redhat.com/show_bug.cgi?id=1416506
https://bugzilla.redhat.com/show_bug.cgi?id=1416509

Comment 21 errata-xmlrpc 2017-04-12 19:04:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0884