Bug 1279303 - [3.5] Race condition is seen when updating a batch of nodes in cluster using "oadm manage-node"
Status: CLOSED ERRATA
Product: OpenShift Container Platform
Classification: Red Hat
Component: Command Line Interface
Version: 3.1.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Assigned To: Fabiano Franz
QA Contact: Mike Fiedler
Depends On:
Blocks: 1416506 1416509 1416512
Reported: 2015-11-09 00:44 EST by Johnny Liu
Modified: 2017-07-24 10 EDT

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: A race condition occurs when updating a batch of nodes (verified on 8 or more nodes) to schedulable or unschedulable with "oadm manage-node --schedulable=<true|false>".
Consequence: Several nodes could not be updated and failed with the "object has been modified" error.
Fix: Use a patch on the "unschedulable" field of the node object instead of a full update.
Result: All nodes can be properly set to schedulable or unschedulable (tested on a 60-node cluster).
Story Points: ---
Clone Of:
Clones: 1416512
Environment:
Last Closed: 2017-04-12 15:04:34 EDT
Type: Bug
sdodson: needinfo-


Description Johnny Liu 2015-11-09 00:44:58 EST
Description of problem:
When a cluster has many nodes and the "oadm manage-node" command is used to update an attribute on all of them, several nodes cannot be updated. The error message says "the object has been modified; please apply your changes to the latest version and try again", so a race condition appears to be preventing the node attribute from being updated.
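
The conflict described above can be illustrated with a small simulation. This is only a sketch under assumptions: `NodeStore`, `Conflict`, and the `heartbeat` status field are hypothetical names, not part of OpenShift, and the "other writer" is assumed to be something like a periodic node status update. It shows how a read-modify-write update is rejected when another writer changes the same object between the read and the write.

```python
import copy


class Conflict(Exception):
    """Raised when an update carries a stale resourceVersion."""


class NodeStore:
    """Hypothetical in-memory stand-in for the API server's node storage."""

    def __init__(self):
        self._nodes = {}

    def create(self, name):
        self._nodes[name] = {
            "name": name,
            "resourceVersion": 1,
            "spec": {"unschedulable": False},
            "status": {"heartbeat": 0},  # assumed field, for illustration only
        }

    def get(self, name):
        # Return a copy so callers work on their own snapshot.
        return copy.deepcopy(self._nodes[name])

    def update(self, node):
        # Full update: accepted only if the caller's snapshot is still current.
        current = self._nodes[node["name"]]
        if node["resourceVersion"] != current["resourceVersion"]:
            raise Conflict(
                'node "%s" cannot be updated: the object has been modified'
                % node["name"]
            )
        stored = copy.deepcopy(node)
        stored["resourceVersion"] += 1
        self._nodes[node["name"]] = stored


store = NodeStore()
store.create("test15.cluster.local")

# The CLI reads the node...
cli_copy = store.get("test15.cluster.local")

# ...and before it writes back, another writer updates the same node.
other = store.get("test15.cluster.local")
other["status"]["heartbeat"] += 1
store.update(other)

# The CLI's write now carries a stale resourceVersion and is rejected.
cli_copy["spec"]["unschedulable"] = True
try:
    store.update(cli_copy)
    outcome = "updated"
except Conflict as err:
    outcome = str(err)

print(outcome)
```

With more nodes in the batch, more time passes between each read and its write, which is one plausible reason the failure shows up mainly on larger clusters.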

Version-Release number of selected component (if applicable):
atomic-openshift-3.1.0.0-0.git.0.0e71938.el7aos.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Set up a cluster with 60 nodes.
2. Run the "oadm manage-node" command to disable scheduling on all the nodes:
# oadm manage-node --selector="region=primary" --schedulable=false
NAME                   LABELS                                                           STATUS                     AGE
test1.cluster.local    kubernetes.io/hostname=10.66.81.15,region=primary,zone=default   Ready,SchedulingDisabled   2d
test10.cluster.local   kubernetes.io/hostname=10.66.80.12,region=primary,zone=default   Ready,SchedulingDisabled   2d
test11.cluster.local   
<--snip-->
Error from server: node "test15.cluster.local" cannot be updated: the object has been modified; please apply your changes to the latest version and try again
Error from server: node "test21.cluster.local" cannot be updated: the object has been modified; please apply your changes to the latest version and try again
Error from server: node "test22.cluster.local" cannot be updated: the object has been modified; please apply your changes to the latest version and try again
<--snip-->


Actual results:
Several nodes failed to update.

Expected results:
All the nodes should be updated successfully.

Additional info:
Comment 2 Fabiano Franz 2016-02-11 14:54:28 EST
I could not reproduce locally, sending to QE to check if this is already fixed.
Comment 11 Fabiano Franz 2017-01-16 19:08:32 EST
Fixed in https://github.com/openshift/origin/pull/12486.
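
As a rough illustration of why patching only the "unschedulable" field avoids the conflict, the sketch below applies a JSON-merge-style patch to whatever the latest stored object is. The report does not say which patch type the fix uses, so the merge semantics here are an assumption, and `merge_patch` and the `heartbeat` field are hypothetical names used only for this example.

```python
import copy


def merge_patch(target, patch):
    """Apply a JSON-merge-style patch: recurse into dicts, None deletes a key."""
    if not isinstance(patch, dict):
        return copy.deepcopy(patch)
    result = copy.deepcopy(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)
        else:
            result[key] = merge_patch(result.get(key), value)
    return result


# Latest stored state of the node, including changes made by other writers
# after the CLI last looked at it (values are made up for illustration).
latest = {
    "name": "test15.cluster.local",
    "resourceVersion": 7,
    "spec": {"unschedulable": False},
    "status": {"heartbeat": 42},
}

# The patch names only the field the CLI wants to change. It carries no
# snapshot of the rest of the object, so there is nothing to go stale.
patch = {"spec": {"unschedulable": True}}

patched = merge_patch(latest, patch)
print(patched["spec"], patched["status"])
```

Because the patch is merged into the current object rather than replacing it, fields changed by other writers are left as they are.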
Comment 12 Troy Dawson 2017-01-20 17:54:57 EST
This has been merged into OCP and is in OCP v3.5.0.7 or newer.
Comment 13 Johnny Liu 2017-01-22 07:15:31 EST
Verified this bug with atomic-openshift-3.5.0.7-1.git.0.390ef18.el7; result: PASS.
Comment 14 Fabiano Franz 2017-01-24 18:54:34 EST
This fix was backported to 3.3 and 3.4, so sending to QA again to be tested in those versions.
Comment 17 Fabiano Franz 2017-01-25 11:30:56 EST
Opened separate bugs to track 3.3 and 3.4:

https://bugzilla.redhat.com/show_bug.cgi?id=1416506
https://bugzilla.redhat.com/show_bug.cgi?id=1416509
Comment 21 errata-xmlrpc 2017-04-12 15:04:34 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0884
