Bug 1279303 - [3.5] Race condition is seen when updating a batch of nodes in cluster using "oadm manage-node"
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: oc
Version: 3.1.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Fabiano Franz
QA Contact: Mike Fiedler
URL:
Whiteboard:
Depends On:
Blocks: 1416506 1416509 1416512
Reported: 2015-11-09 05:44 UTC by Johnny Liu
Modified: 2017-07-24 14:11 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: A race condition occurred when updating a batch of nodes (verified with 8 or more nodes) to schedulable or unschedulable with "oadm manage-node --schedulable=<true|false>". Consequence: Several nodes could not be updated and failed with the "object has been modified" error. Fix: The command now patches the "unschedulable" field of the node object instead of performing a full update. Result: All nodes can be set schedulable or unschedulable as expected (tested on a 60-node cluster).
Clone Of:
: 1416512 (view as bug list)
Environment:
Last Closed: 2017-04-12 19:04:34 UTC
Target Upstream Version:
Embargoed:
sdodson: needinfo-


Links
System: Red Hat Product Errata
ID: RHBA-2017:0884
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Red Hat OpenShift Container Platform 3.5 RPM Release Advisory
Last Updated: 2017-04-12 22:50:07 UTC

Description Johnny Liu 2015-11-09 05:44:58 UTC
Description of problem:
When a cluster has many nodes and the "oadm manage-node" command is used to update an attribute on all of them, several nodes cannot be updated. The error message is "the object has been modified; please apply your changes to the latest version and try again", so a race condition appears to be preventing the node attribute from being updated.
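
For illustration only: the error text matches what an optimistic-concurrency check returns when a write carries a stale object version. The Python sketch below uses a hypothetical in-memory store (NodeStore and Conflict are made-up names, not the OpenShift API) and assumes the competing writer is some other client updating the same node, such as a status heartbeat. It shows how a read-modify-write can lose to a concurrent update.

```python
import copy


class Conflict(Exception):
    """Raised when an update carries a stale resourceVersion."""


class NodeStore:
    """Hypothetical in-memory stand-in for the API server's node storage."""

    def __init__(self):
        self._nodes = {}

    def create(self, name):
        self._nodes[name] = {
            "name": name,
            "resourceVersion": 1,
            "spec": {"unschedulable": False},
            "status": {"heartbeat": 0},
        }

    def get(self, name):
        # Return a copy so callers work on their own snapshot.
        return copy.deepcopy(self._nodes[name])

    def update(self, node):
        # Full update: reject the write if the caller's snapshot is stale.
        current = self._nodes[node["name"]]
        if node["resourceVersion"] != current["resourceVersion"]:
            raise Conflict(
                "the object has been modified; please apply your "
                "changes to the latest version and try again"
            )
        stored = copy.deepcopy(node)
        stored["resourceVersion"] += 1
        self._nodes[node["name"]] = stored


store = NodeStore()
store.create("test15.cluster.local")

# The admin tool reads the node first.
admin_copy = store.get("test15.cluster.local")

# Assumption: another writer updates the same node before the admin writes.
other_copy = store.get("test15.cluster.local")
other_copy["status"]["heartbeat"] = 1
store.update(other_copy)

# The admin's snapshot is now stale, so its full update is rejected.
admin_copy["spec"]["unschedulable"] = True
try:
    store.update(admin_copy)
    result = "updated"
except Conflict as err:
    result = f"conflict: {err}"
print(result)
```

With many nodes updated in one batch, the window between read and write is longer for nodes late in the list, which would be consistent with only some nodes failing.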

Version-Release number of selected component (if applicable):
atomic-openshift-3.1.0.0-0.git.0.0e71938.el7aos.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Set up a cluster with 60 nodes.
2. Run the "oadm manage-node" command to disable scheduling on all the nodes.
# oadm manage-node --selector="region=primary" --schedulable=false
NAME                  LABELS                                                           STATUS                     AGE
test1.cluster.local   kubernetes.io/hostname=10.66.81.15,region=primary,zone=default   Ready,SchedulingDisabled   2d
test10.cluster.local   kubernetes.io/hostname=10.66.80.12,region=primary,zone=default   Ready,SchedulingDisabled   2d
test11.cluster.local   
<--snip-->
Error from server: node "test15.cluster.local" cannot be updated: the object has been modified; please apply your changes to the latest version and try again
Error from server: node "test21.cluster.local" cannot be updated: the object has been modified; please apply your changes to the latest version and try again
Error from server: node "test22.cluster.local" cannot be updated: the object has been modified; please apply your changes to the latest version and try again
<--snip-->


Actual results:
Several nodes failed to be updated.

Expected results:
All the nodes should be updated successfully.

Additional info:

Comment 2 Fabiano Franz 2016-02-11 19:54:28 UTC
I could not reproduce this locally; sending to QE to check whether it is already fixed.

Comment 11 Fabiano Franz 2017-01-17 00:08:32 UTC
Fixed in https://github.com/openshift/origin/pull/12486.
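
As a rough sketch of why a field-level patch avoids the conflict: a patch names only the field to change and is applied to whatever the latest stored object is, so the client never submits a stale full copy. The Python below is not the code from the pull request. It assumes a JSON merge patch (the patch type actually used is not stated in this report) and uses a hypothetical merge_patch helper and made-up node data.

```python
import json


def merge_patch(target, patch):
    """Apply a JSON merge patch (RFC 7386 semantics) and return a new value."""
    if not isinstance(patch, dict):
        # A non-object patch replaces the target outright.
        return patch
    result = dict(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            # null in a merge patch means "delete this key".
            result.pop(key, None)
        else:
            result[key] = merge_patch(result.get(key), value)
    return result


# Latest server-side state (made-up data), already changed by another writer
# after the admin tool last looked at the node.
server_node = {
    "metadata": {"name": "test15.cluster.local", "resourceVersion": "42"},
    "spec": {"unschedulable": False},
    "status": {"heartbeat": 7},
}

# The patch carries only the one field being changed.
patch_body = json.dumps({"spec": {"unschedulable": True}})

patched = merge_patch(server_node, json.loads(patch_body))
print(patch_body)
print(patched["spec"], patched["status"])
```

Because the patch body contains no copy of the rest of the object, the other writer's change to the status survives and there is no version to compare against.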

Comment 12 Troy Dawson 2017-01-20 22:54:57 UTC
This has been merged into ocp and is in OCP v3.5.0.7 or newer.

Comment 13 Johnny Liu 2017-01-22 12:15:31 UTC
Verified this bug with atomic-openshift-3.5.0.7-1.git.0.390ef18.el7; the test passed.

Comment 14 Fabiano Franz 2017-01-24 23:54:34 UTC
This fix was backported to 3.3 and 3.4, so sending to QA again to be tested in those versions.

Comment 17 Fabiano Franz 2017-01-25 16:30:56 UTC
Opened separate bugs to track 3.3 and 3.4:

https://bugzilla.redhat.com/show_bug.cgi?id=1416506
https://bugzilla.redhat.com/show_bug.cgi?id=1416509

Comment 21 errata-xmlrpc 2017-04-12 19:04:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0884

